The negotiation process is easy to understand. First, when the browser requests content from the Web server, the server tells the browser which content can be cached. Once the browser knows that a piece of content can be cached, the next time it needs that content it does not request the whole thing from the server; instead, it asks the server whether it may use its local cache. On receiving this query, the server must give a definite answer: either allow the browser to use the local cache, or return the latest content to the browser.
HTTP requires times to be expressed in GMT (Greenwich Mean Time), while our country uses the GMT+8 time zone, so the times in HTTP headers are 8 hours behind our local time. This does not affect the normal operation of the HTTP cache at all.
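As a small illustration of the GMT convention, the following Python sketch converts a local time in a GMT+8 zone into the date format used in HTTP headers. The function name `to_http_date` and the sample time are our own choices for the example, not something HTTP prescribes.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# A wall-clock time in a GMT+8 zone (sample value chosen for illustration).
local_zone = timezone(timedelta(hours=8))
local_time = datetime(2009, 3, 20, 15, 53, 2, tzinfo=local_zone)


def to_http_date(moment: datetime) -> str:
    """Render an aware datetime as an HTTP date, which is always in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


http_date = to_http_date(local_time)
print(http_date)  # Fri, 20 Mar 2009 07:53:02 GMT
```

The local clock reads 15:53, but the header carries 07:53 GMT. Both describe the same instant, which is why the 8-hour difference is harmless.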
The Web server adds the following header to the HTTP response:
Last-Modified: Fri, 20 Mar 2009 07:53:02 GMT
The next time the browser requests this content, it adds the following header to the HTTP request:
If-Modified-Since: Fri, 20 Mar 2009 07:53:02 GMT
This means that the browser is asking the Web server: “Has the content I requested been updated since this time?” The Web server now carries an important responsibility: it must check whether the content has been updated since that time and report the result to the browser. This process is equivalent to a traditional cache expiration check. For static content the Web server can handle it easily: it only needs to obtain the last modification time of the static file and compare it with the time in the browser's query.
What we need to pay attention to here is the response status code. When the content has not changed, the server answers with 304 Not Modified, which tells the browser that the content has not been updated and that the locally cached copy can be used. In that case the Web server does not send the body of the content to the browser.
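The check described above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the logic of any particular Web server: the function name `check_if_modified_since` is hypothetical, and the file's modification time is passed in as an aware datetime rather than read from disk.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def check_if_modified_since(last_modified: datetime,
                            if_modified_since: Optional[str]) -> Tuple[int, dict]:
    """Return (status, headers) for a static resource.

    last_modified: the file's modification time as an aware datetime.
    if_modified_since: the raw request header value, or None if absent.
    """
    # HTTP dates have one-second resolution, so drop sub-second precision.
    last_modified = last_modified.astimezone(timezone.utc).replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable date: behave as if the header were absent
        if since is not None and since.tzinfo is not None and last_modified <= since:
            return 304, headers  # Not Modified: no body is sent

    return 200, headers  # the full content would follow


modified = datetime(2009, 3, 20, 7, 53, 2, tzinfo=timezone.utc)
status, _ = check_if_modified_since(modified, "Fri, 20 Mar 2009 07:53:02 GMT")
print(status)  # 304
```

A query time earlier than the file's modification time, or a request without the header, yields 200 and the full content.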
HTTP/1.1 also supports another cache negotiation method, ETag. It is very similar to the previous method, but instead of the last modification time of the content it uses a string of characters, called an ETag, to mark the content. The principle is that if the ETag of a piece of content has not changed, then the content has not been updated.
The ETag is generated by the Web server. Apache, for example, adds an ETag header to the HTTP response for a static file.
Once the browser has the ETag for this content, the next time it requests the content it adds an If-None-Match header carrying that ETag to the HTTP request, to ask the server whether the content has changed.
The server then recalculates the ETag of the content and compares it with the ETag in the HTTP request. If they are the same, it returns a 304 status code; if they differ, it returns the latest content to the browser.
HTTP/1.1 does not specify the format or the calculation method of the ETag, so the Web server is free to define both. A simple method, for example, is to use the MD5 value of the file content as the ETag. In short, anything that can identify the content will do, for example:
ETag: "1944822255"
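To make the MD5 idea concrete, here is a minimal Python sketch of ETag negotiation. The function names `make_etag` and `respond` are hypothetical, and the comparison is simplified: it ignores weak validators (the `W/` prefix) that a real server would also have to handle.

```python
import hashlib
from typing import Optional, Tuple


def make_etag(content: bytes) -> str:
    """Build a quoted ETag from the MD5 digest of the content."""
    return '"' + hashlib.md5(content).hexdigest() + '"'


def respond(content: bytes, if_none_match: Optional[str]) -> Tuple[int, dict, bytes]:
    """Return (status, headers, body) for a request that may carry If-None-Match."""
    etag = make_etag(content)  # recalculated on every request
    headers = {"ETag": etag}
    if if_none_match is not None:
        # The header may list several ETags separated by commas, or be "*".
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # unchanged: no body is sent
    return 200, headers, content


first_status, first_headers, _ = respond(b"hello", None)
second_status, _, second_body = respond(b"hello", first_headers["ETag"])
print(first_status, second_status)  # 200 304
```

If the content changes between the two requests, the recalculated ETag no longer matches and the second request receives 200 with the new body.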
Cache negotiation based on the last modification time has some disadvantages. For example, some files are rewritten frequently even though their contents do not change. With negotiation based on the last modification time, the browser retrieves the whole content again every time the file's modification time changes, regardless of whether the content really changed. As another example, the same file may be stored on multiple Web servers, with users' requests distributed among these servers in rotation to achieve load balancing. The last modification time of the same file on these servers can hardly be guaranteed to be exactly the same, so the user retrieves the whole content again every time a request lands on a different server. If we instead use an ETag algorithm that marks the content itself, we can avoid these problems.
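The load-balancing case can be illustrated with a short Python sketch. The two deployment times below are invented for the example, and the ETag is assumed to be derived from the content alone (an MD5 digest here); a server that mixes file metadata into its ETag would not get this benefit.

```python
import hashlib
from datetime import datetime, timezone
from email.utils import format_datetime


def validators(content: bytes, mtime: datetime) -> dict:
    """Return both kinds of validator for one copy of a file."""
    return {
        "Last-Modified": format_datetime(mtime, usegmt=True),
        "ETag": '"' + hashlib.md5(content).hexdigest() + '"',
    }


content = b"body { margin: 0; }"
# Hypothetical deployment times: the same bytes reached the two servers
# a few minutes apart, so the file modification times differ.
server_a = validators(content, datetime(2009, 3, 20, 7, 53, 2, tzinfo=timezone.utc))
server_b = validators(content, datetime(2009, 3, 20, 7, 56, 40, tzinfo=timezone.utc))

same_time = server_a["Last-Modified"] == server_b["Last-Modified"]
same_etag = server_a["ETag"] == server_b["ETag"]
print(same_time, same_etag)  # False True
```

A browser that cached the file from the first server would fail the time-based check on the second, but its ETag would still match.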
Another HTTP header is Expires. It tells the browser when the content expires, which implies that until then the browser does not need to ask the server again and can use its local cache directly.
The benefits are obvious. When the browser does not need to contact the server at all, the bandwidth and the server's processing cost are saved completely, which is the best outcome we could hope for.
The Expires header is like a manager who is good at delegating. Once the browser sees that some content carries an Expires header, it has real authority: it does not need to ask the server before the expiration time and can make its own decisions. The Last-Modified header, by contrast, keeps the browser constrained: it has to ask the server every time, even when doing so is pointless.
Expires uses a format similar to Last-Modified and gives the absolute time at which the content expires, for example:
Expires: Sun, 10 Feb 2002 16:00:00 GMT
For static content, the Web server does not turn on support for Expires by default. We need to configure it.
The expiration time specified by Expires is based on the system time of the Web server. If the user's local time is inconsistent with the server time, the validity check of the local cache will be affected.
It is easy to imagine an example: the server sets some content to expire in 1 hour, but the browser's clock is 2 hours ahead of the server's, so the browser considers the content expired immediately. Of course, the operating systems we use (such as Windows) generally keep a standard time based on GMT and derive the local time from it by applying the time zone offset, and HTTP also uses GMT, so the time zone by itself will not cause a difference of several hours between the browser and the server. However, no one can guarantee that a user's clock agrees with your server's, and sometimes your server's clock may even be wrong. Either case disrupts the browser cache and makes our painstaking efforts go to waste.
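The effect of a wrong clock can be reproduced with a short Python sketch. The function `is_fresh_by_expires` is a hypothetical name, and the timestamps are sample values: the server sets a lifetime of 1 hour, and we evaluate it once with an accurate browser clock and once with a clock that is 2 hours ahead.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, browser_now: datetime) -> bool:
    """A cached copy is usable only while the browser's clock is before Expires."""
    return browser_now < parsedate_to_datetime(expires_header)


server_now = datetime(2009, 3, 24, 4, 51, 3, tzinfo=timezone.utc)
expires = "Tue, 24 Mar 2009 05:51:03 GMT"  # server time plus 1 hour

accurate_clock = is_fresh_by_expires(expires, server_now)
fast_clock = is_fresh_by_expires(expires, server_now + timedelta(hours=2))
print(accurate_clock, fast_clock)  # True False
```

With the fast clock the content is treated as expired the moment it arrives, so the browser never benefits from the cache.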
Fortunately, HTTP/1.1 has another header that makes up for this deficiency of Expires: Cache-Control.
Its max-age directive specifies the cache lifetime as a relative time in seconds. The browser counts this time on its own clock, starting from when it received the response, so a difference between the browser's clock and the server's no longer matters.
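A minimal sketch of how a relative lifetime sidesteps the clock problem, written in Python. The helper names `parse_max_age` and `is_fresh_by_max_age` are our own, times are plain seconds read from the browser's clock, and the sketch deliberately ignores other directives and the Age header that a full HTTP cache would also take into account.

```python
import re
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Extract the max-age value in seconds from a Cache-Control header, if any."""
    match = re.search(r'(?:^|,)\s*max-age\s*=\s*"?(\d+)"?', cache_control, re.IGNORECASE)
    return int(match.group(1)) if match else None


def is_fresh_by_max_age(cache_control: str, received_at: float, now: float) -> bool:
    """Both timestamps come from the browser's own clock, in seconds."""
    max_age = parse_max_age(cache_control)
    return max_age is not None and (now - received_at) < max_age


received_at = 1_000_000.0  # arbitrary reading of the browser's clock
still_fresh = is_fresh_by_max_age("max-age=3600", received_at, received_at + 1800)
expired = is_fresh_by_max_age("max-age=3600", received_at, received_at + 3600)
print(still_fresh, expired)  # True False
```

Only the elapsed time on one clock is compared with max-age, so it makes no difference whether that clock agrees with the server's.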
In fact, for static content the Web server automatically adds a Cache-Control header to the response when Expires is turned on, for HTTP/1.1 compatibility. As an example, we request a GIF image hosted on an Apache server, for which we have set the following expiration policy:
ExpiresByType image/gif "access plus 1 hours"
Next, we request the URL of this image in the browser and inspect the HTTP response headers, which are as follows:
HTTP/1.1 200 OK
Date: Tue, 24 Mar 2009 04:51:03 GMT
Server: Apache/2.2.11 (Unix) PHP/5.2.1 DAV/2 SVN/1.4.3
Last-Modified: Wed, 27 Feb 2008 18:11:26 GMT
ETag: "7815c-303-44727bbbf0f80"
Accept-Ranges: bytes
Content-Length: 771
Cache-Control: max-age=3600
Expires: Tue, 24 Mar 2009 05:51:03 GMT
Keep-Alive: timeout=30, max=97
Connection: Keep-Alive
Content-Type: image/gif
From the above header information, it can be calculated that the Expires time is exactly one hour after the Date time, and the value of max-age is 3600 seconds.
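The arithmetic can be checked directly. The short Python sketch below parses the Date and Expires values quoted above and computes the difference in seconds; the variable names are our own.

```python
from email.utils import parsedate_to_datetime

date_header = "Tue, 24 Mar 2009 04:51:03 GMT"
expires_header = "Tue, 24 Mar 2009 05:51:03 GMT"

# Subtracting two aware datetimes gives a timedelta.
lifetime = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
lifetime_seconds = int(lifetime.total_seconds())
print(lifetime_seconds)  # 3600
```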
It is worth mentioning that current mainstream browsers treat HTTP/1.1 as their first choice, so when the HTTP response contains both Expires and Cache-Control, the browser gives priority to Cache-Control. When there is no Cache-Control, the browser follows the instructions of Expires.
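This priority rule can be sketched as a small Python function. The name `freshness_lifetime` is hypothetical, the headers are assumed to arrive as a dictionary with exactly these capitalized keys, and only max-age is considered among the Cache-Control directives.

```python
from email.utils import parsedate_to_datetime
from typing import Optional


def freshness_lifetime(headers: dict) -> Optional[int]:
    """Return the cache lifetime in seconds, preferring max-age over Expires."""
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip().strip('"')
        if name.strip().lower() == "max-age" and value.isdigit():
            return int(value)  # Cache-Control wins when present

    if "Expires" in headers and "Date" in headers:
        try:
            delta = (parsedate_to_datetime(headers["Expires"])
                     - parsedate_to_datetime(headers["Date"]))
        except (TypeError, ValueError):
            return None  # unusable dates: no explicit lifetime
        return max(0, int(delta.total_seconds()))

    return None  # neither header gives a lifetime


response = {
    "Date": "Tue, 24 Mar 2009 04:51:03 GMT",
    "Expires": "Tue, 24 Mar 2009 05:51:03 GMT",
    "Cache-Control": "max-age=60",  # deliberately different from Expires
}
chosen = freshness_lifetime(response)
print(chosen)  # 60
```

The sample response deliberately makes the two headers disagree so that the precedence is visible. Removing the Cache-Control entry makes the function fall back to Expires.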
Browser Requests and Cache Cleanup
For mainstream browsers, there are generally three ways to request a page:
The first method can be called a forced refresh, usually triggered with Ctrl+F5. It makes the Web page and all of its components send requests directly to the Web server without any cache negotiation, so as to obtain the latest version of all content.
You can also hold down Ctrl and click the browser’s refresh button to get the same result. In actual use, few users will do this.
The second method is an ordinary refresh with F5, which we use often. It is equivalent to clicking the browser's refresh button. It lets the browser attach the necessary cache negotiation headers to the request, but does not let the browser use the local cache directly. In other words, Last-Modified takes effect, but Expires does not.
The third method is to click the “Go” button next to the browser's address bar, or to reach the page through a hyperlink.
I think this is the method everyone uses most. Another operation is equivalent to it: typing the URL into the browser's address bar and pressing Enter. This is common in Firefox, which has no “Go” button. These methods let the browser obtain the page with the fewest requests, because the browser uses its local cache directly for all content that has not expired. The Expires header is therefore effective only for this method, so there is no need to be surprised that the browser does not use the local cache after you press F5.
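The three request methods can be summarized as a small decision function. This Python sketch is a simplified model of the behavior described above, not the actual logic of any browser; the names `RequestMode` and `plan_request` and the three result strings are our own.

```python
from enum import Enum


class RequestMode(Enum):
    FORCED_REFRESH = "forced refresh"  # latest version, no negotiation
    REFRESH = "refresh"                # ordinary refresh (F5)
    NAVIGATE = "navigate"              # "Go" button, hyperlink, or Enter


def plan_request(mode: RequestMode, cached: bool, fresh: bool) -> str:
    """Decide how to obtain one piece of content.

    Returns "full" (plain request), "conditional" (request carrying cache
    negotiation headers), or "use_cache" (no request at all).
    """
    if mode is RequestMode.FORCED_REFRESH or not cached:
        return "full"
    if mode is RequestMode.REFRESH:
        return "conditional"  # Last-Modified works, Expires is ignored
    return "use_cache" if fresh else "conditional"


plans = [plan_request(mode, cached=True, fresh=True) for mode in RequestMode]
print(plans)  # ['full', 'conditional', 'use_cache']
```

For content that is cached and not yet expired, only navigation avoids the request entirely, which matches the observation that Expires has no visible effect after pressing F5.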