1.1 Problem statement
Hypertext Transfer Protocol (HTTP) has been in use by the World-Wide Web global information initiative since 1990. The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data transfer across the Internet [1]. HTTP/1.1 has been the standard since its introduction in 1999, but in the twenty-first century it has struggled to take full advantage of the performance TCP offers and to keep up with increasingly demanding web pages. HTTP clients and browsers have had to be creative in finding solutions that decrease page load times [2]. One such solution has browsers open multiple TCP connections so that several requests can be issued in parallel. However, using too many connections effectively negates TCP congestion control and causes congestion events that hurt both performance and the network. Furthermore, using so many connections to load the resources of a single page is fundamentally unfair and wasteful: some of the data travelling over the wire is unnecessarily duplicated.
Other HTTP/1.1 workarounds include spriting, in which multiple images are combined into one larger image; concatenation, in which JavaScript or CSS files are merged into a single resource; resource inlining, in which the number of outbound requests is reduced by embedding resources within the document itself; and domain sharding, which is described further in the discussion section. These workarounds, however, can cause problems of their own (such as the congestion caused by multiple connections) and are symptoms of underlying problems in the protocol [3].
HTTP/2.0 was introduced in 2015 to eliminate the need for such hacks to speed up the web. Ideally, the HTTP/2.0 protocol would always be faster than HTTP/1.1, but studies have shown that in a highly congested network, a multiplexed stream does not compete for bandwidth better than HTTP/1.1 connections [4]. This raises the question: is HTTP/2.0 generally faster than HTTP/1.1, as advertised?
1.2 Goals of the Study
The objective of this investigation is to survey the adoption of HTTP/2.0 and to discover whether HTTP/2.0 is indeed faster than HTTP/1.1. We begin by investigating the coverage of the HTTP versions, surveying popular websites to see how many of them support HTTP/2.0. We then choose a subset of these websites to measure and compare the performance of both versions of HTTP in terms of data transfer speed, in order to ascertain whether HTTP/2.0 does indeed produce a net performance gain. Finally, we discuss the features of HTTP/2.0 that can lead to performance differences.
2. Methodology
2.1 Selecting websites for study
113 websites were chosen from alexa.com, which lists the 500 most popular websites [5]. The factors taken into consideration in choosing these websites were host location – we attempted to diversify our choices across multiple regions rather than just North America – and recognizability – we favored websites that were familiar to us.
2.2 Tools used
The command-line tool cURL [6] was used to send HTTP requests and retrieve the details of the response. cURL can report an in-depth analysis of an HTTP request, including the HTTP status code returned, header size, download speed, upload speed, and the time taken to load the entire website, among other metrics. Using its options, we can specify the version of HTTP to use, be it 1.1 or 2. A bash script (Appendix, Script 1) was created to automate capturing the timing and packet information from the cURL command. It takes as input a text file containing the list of websites and a second text file, passed as an option to the cURL call, that specifies the fields to parse out and write to a CSV file. These include the URL, the HTTP status code, the HTTP version the website returned, the total time needed for the website to load, and the other fields listed below.
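As a rough illustration of this setup (a sketch, not the appendix script itself), the per-site measurement can be driven by a cURL --write-out format file. The %{...} names below are real cURL --write-out variables; the file names are placeholders chosen for this example:

```shell
# Build a --write-out format file: one CSV row per site.
# (fmt.txt and results.csv are illustrative names, not from the study.)
cat > fmt.txt <<'EOF'
%{url_effective},%{http_code},%{http_version},%{size_download},%{speed_download},%{size_header},%{time_namelookup},%{time_starttransfer},%{time_total}\n
EOF
# Per site, the script would then run (not executed here, to avoid network access):
#   curl --http2 -s -o /dev/null -w "@fmt.txt" "$url" >> results.csv
echo "format fields: $(grep -o '%{' fmt.txt | wc -l)"
```

Passing the format file with `-w "@fmt.txt"` keeps the field list separate from the script, matching the two-input-file design described above.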
2.3 Experiments Conducted
HTTP/2.0 coverage is measured using the option '--http2' and observing whether the server returns a response message in HTTP/2.0 format. If the server responds with an HTTP/1.1 message, the server is deemed not to support the HTTP/2.0 protocol. Using the option '--http1.1', we can similarly measure the HTTP/1.1 coverage of websites. After measuring coverage, the response speeds and header sizes are compared for the websites that support both HTTP/2.0 and HTTP/1.1.
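The coverage check can be sketched as follows. In practice the negotiated version would come from cURL's %{http_version} --write-out variable (e.g. `curl --http2 -s -o /dev/null -w '%{http_version}' "$url"`); here it is hard-coded so the decision logic can be shown without network access:

```shell
# Hard-coded stand-in for the value cURL's %{http_version} would report.
http_version="2"

# A site "supports HTTP/2.0" if the server actually negotiated version 2
# rather than falling back to an HTTP/1.1 response.
if [ "$http_version" = "2" ]; then
  echo "supports HTTP/2.0"
else
  echo "fell back to HTTP/1.1: no HTTP/2.0 support"
fi
```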
The cURL options [6] used to examine the packets returned were chosen as follows:
• size_download: The total amount of bytes that were downloaded.
• speed_download: The average download speed, in bytes per second, that cURL measured for the complete download.
• http_code: The numerical response code found in the last retrieved HTTP(S) or FTP(S) transfer.
• size_header: The total amount of bytes of the downloaded headers.
• time_namelookup: The time, in seconds, it took from the start until the name resolving was completed.
• time_starttransfer: The time, in seconds, it took from the start until the first byte was just about to be transferred. This includes time_pretransfer and also the time the server needed to calculate the result.
• time_total: The total time, in seconds, that the full operation lasted.
Figure 1: HTTP Request Timings with cURL [7]
By taking the difference between time_total and time_starttransfer, we can find the time taken for content transfer, as shown in Figure 1. We can also find the time spanning TCP connection through content transfer by subtracting time_namelookup from time_total.
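As a worked example of this arithmetic, using one illustrative row of timing values (made up for illustration, not taken from our data set):

```shell
# Illustrative cURL timing values for one request (seconds).
time_namelookup=0.012
time_starttransfer=0.310
time_total=0.640

# Content transfer time = time_total - time_starttransfer.
content_transfer=$(awk -v t="$time_total" -v s="$time_starttransfer" 'BEGIN{printf "%.3f", t - s}')
# TCP connection through content transfer = time_total - time_namelookup.
after_dns=$(awk -v t="$time_total" -v n="$time_namelookup" 'BEGIN{printf "%.3f", t - n}')

echo "content transfer time: ${content_transfer}s"       # 0.640 - 0.310 = 0.330
echo "TCP connect through transfer: ${after_dns}s"       # 0.640 - 0.012 = 0.628
```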
3. Results
2 of the 113 websites were filtered out because they returned garbage data for size_download, speed_download, etc., which would have skewed our results. Of the remaining 111 websites, 71 returned a '200 OK' response code, while the rest gave the responses summarized in Figure 2. From the descriptions, it is apparent that response codes other than 200 indicate that a proper web page with assets was not actually received. Therefore, in examining coverage and calculating response and header sizes, only the websites that gave 200 response codes were considered.
Figure 2: HTTP Status Codes
The full tables showing the response times and other metrics for all 111 websites are given in the Appendix: Figure 4 for the query using the '--http2' option and Figure 5 for the query using the '--http1.1' option.
3.1 Coverage
64 out of 71 websites support HTTP/2.0, which is 90.1% of the legitimate data set. We also noted that all of the websites with the top level domain ".cn" did not support HTTP/2.0.
3.2 Speed
Examining the 64 websites that support both protocols, we found that the average content transfer time (time_total - time_starttransfer) was 0.3773 seconds for HTTP/1.1 and 0.3289 seconds for HTTP/2.0, a 12.8% speedup. This supports our hypothesis that HTTP/2.0 has faster response times than HTTP/1.1. The load times that also include TCP connection and content generation (time_total - time_namelookup) were 1.168 and 1.081 seconds for HTTP/1.1 and HTTP/2.0 respectively, a 7.4% speedup.
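These speedup figures can be reproduced from the reported averages with speedup = (old - new) / old:

```shell
# Content transfer: 0.3773 s (HTTP/1.1) vs 0.3289 s (HTTP/2.0).
ct=$(awk 'BEGIN{printf "%.1f", (0.3773 - 0.3289) / 0.3773 * 100}')
# Load time incl. TCP connection and content generation: 1.168 s vs 1.081 s.
lt=$(awk 'BEGIN{printf "%.1f", (1.168 - 1.081) / 1.168 * 100}')

echo "content-transfer speedup: ${ct}%"   # 12.8%
echo "load-time speedup: ${lt}%"          # 7.4%
```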
Noting, however, that size_download tended to be smaller overall for HTTP/2.0 (96033 bytes vs. 100301 bytes), we also examined the speed_download values, which were 70177 bytes/sec for HTTP/2.0 and 65944 bytes/sec for HTTP/1.1. In terms of throughput, HTTP/2.0 transferred 291.982 KB/s and HTTP/1.1 transferred 265.838 KB/s, a 9.8% speedup that accounts for the difference in download sizes.
3.3 Header Size
The average header size was 1210 bytes for HTTP/2.0 and 1239 bytes for HTTP/1.1; as expected, HTTP/2.0 headers were smaller. These results are summarized in Figure 3.
Figure 3: Measurements for HTTP/1.1 vs HTTP/2.0
4. Discussion
Figure 4: Loading assets HTTP/1.1 vs HTTP/2.0 [8]
One major factor in HTTP/2.0's performance boost is its fully multiplexed nature. As mentioned in the problem statement, HTTP/1.1 can only send one request at a time on a TCP connection, so each request-response pair costs at least one round-trip time (RTT). HTTP/2.0, by contrast, can send multiple requests for data in parallel over a single TCP connection, as demonstrated in Figure 4 above. Multiplexing is especially important because most modern browsers limit the number of connections made to one server [9].
Another reason for the performance bump is that HTTP/2.0 is a binary protocol, whereas HTTP/1.1 is text-based. Binary protocols have the advantages of being more efficient to parse, more compact (and thus faster to send), and much less error-prone, as there is no need to accommodate multiple ways of parsing a message [1]. Because of the affordances built into HTTP/1.1 to handle whitespace, capitalization, line endings, blank lines, and so forth, there are four different ways to parse an HTTP/1.1 message, compared to the single way of parsing an HTTP/2.0 message.
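To illustrate why binary framing is unambiguous to parse: per RFC 7540, every HTTP/2 frame begins with a fixed 9-byte header (24-bit payload length, 8-bit type, 8-bit flags, then a reserved bit and a 31-bit stream ID), so a parser simply reads fields at fixed offsets. A small sketch, hand-encoding one such header:

```shell
# Encode a frame header: length 8, type 0 (DATA), flags 1 (END_STREAM), stream 1.
# Octal escapes: \010 is decimal 8.
printf '\000\000\010\000\001\000\000\000\001' > frame.bin

# Decode the fields back by reading bytes at their fixed offsets.
length=$(od -An -tu1 -N3 frame.bin | awk 'NR==1{print $1 * 65536 + $2 * 256 + $3}')
ftype=$(od -An -tu1 -j3 -N1 frame.bin | awk 'NR==1{print $1}')
echo "payload length: $length, frame type: $ftype"   # payload length: 8, frame type: 0
```

No whitespace, capitalization, or line-ending handling is involved: each field has exactly one representation.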
Additionally, HTTP/2.0 has significantly smaller header sizes thanks to the specialized HPACK algorithm [10], a format designed to represent HTTP header fields more efficiently in the context of HTTP/2.0, while also offering better protection against compression-based attacks such as CRIME. On average, HTTP/2.0 header sizes are about 3% smaller [11].
HTTP/2.0 also allows servers to "push responses proactively into client caches instead of waiting for a new request for each resource" [12]. Typically, a browser requests a page from the server, the server sends the page, and the browser then requests each piece of embedded media it discovers in that page. With server push in HTTP/2.0, when a browser requests a page, the server can send the page together with its embedded media, increasing page loading speed.

Lastly, domain sharding is not needed with HTTP/2.0. Domain sharding is a technique for splitting resources across multiple domains to improve page load time [13]. Browsers limit the number of simultaneous downloads per host, so when a page needs many resources from one host, some downloads are delayed. With domain sharding, the user's browser connects to two or more different domains to download in parallel the resources needed to render the web page [14]. This increases download speed but requires extra connection-setup overhead. In HTTP/2.0, the same parallelism is achieved through multiplexing, which does not carry the overhead cost of domain sharding.
One possibility for future experiments is to run the script multiple times on different days and at different times of day. This would allow us to average the results and so account for the effect that traffic may have on response times. If the level of traffic when the HTTP/1.1 results were measured was lower or higher than when the HTTP/2.0 results were measured, there may be differences in response speed that are independent of the underlying protocol and instead reflect the state of network congestion at the time the data was acquired.
5. Conclusion
From the results of our experiment, we discovered that about 90% of the top websites from which we retrieved valid data support HTTP/2.0. Header sizes were smaller for HTTP/2.0, as expected, and the overall response time was faster by about 13%, also as expected. Even taking into account that the average download size of the HTTP/2.0 requests was smaller than that of HTTP/1.1, the average retrieval speed was greater for HTTP/2.0 by 9.8%. In conclusion, using HTTP/2.0 does indeed lead to better performance than HTTP/1.1.