November 2008 Newsletter

What Does Your Web Server Say to the Search Engines?

**** The 4specs Perspective

This newsletter contains suggestions on how to configure your web server to better serve your users and to help search engine robots index your web site. It is probably our most technical newsletter. You will need to discuss it with your web designer/consultant or IT department to see what can be implemented.

White Hat SEO - All of our newsletters are focused on white hat concepts of search engine optimization. Remember - the good guys in the cowboy movies always wore white hats? Bruce Clay has an excellent discussion on this:
[link no longer available]

Browser Dialog - Your user's browser has a hidden dialog with your server. You can control most of the dialog through proper server setup. This is an area that many web designers ignore, or whose significance they do not understand.

Check What Your Browser Says - If you'd like to see the dialog from your browser to a server, click on the link below:
http://www.ericgiguere.com/tools/http-header-viewer.html

Return Dialog - In return, the server has a dialog back to the browser about the information being sent. Using the Web Developer plug-in for Firefox, here are the headers I received when I downloaded an early version of this page:


Date: Sun, 12 Oct 2008 18:29:39 GMT 
Server: Apache/2.2.9 
Last-Modified: Sun, 12 Oct 2008 18:29:32 GMT 
Etag: "164a88-bf7-45912904b9b00" 
Accept-Ranges: bytes 
Cache-Control: max-age=43200 
Expires: Mon, 13 Oct 2008 06:29:39 GMT 
Vary: Accept-Encoding 
Content-Encoding: gzip 
Content-Length: 1309 
Content-Type: text/html 

200 OK

Use this tool if you would like to see the headers from your server:
http://www.rexswain.com/httpview.html

From this dialog, the search engine learns a lot about your server, which helps it determine how frequently to return and download the page. I suggest you give as clear a picture of your information as you can, to help the search engine as well as the user. You can improve the user's experience and the search engine's performance for your site by making minor changes to your server configuration. Here is more information about each field in the dialog:

Date - the date and time the server sent the response
Server - 4specs runs on a FreeBSD computer running Apache 2.2.9
Last-Modified - The file was last modified a few seconds before I downloaded the file [see below]
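Etag - a 24 character identifier for this version of the page - when the page is modified, the ETag will change. On Apache the default ETag is built from the file's inode number, size and modification time rather than from a hash of the content.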
Accept-Ranges - not important to this discussion
Cache-Control - [see below]
Expires - [See below]
Vary - meaning the response will vary with the Accept-Encoding request header - in this case gzip [see below]
Content-Encoding - [see below]
Content-Length - helps the search engine and browser determine the size of the file.
Content-Type - will vary depending on whether you are requesting a web page, a Word or Excel document, or an image
200 - is the response code. Could be others such as a 301, 302, 404, etc. [see below]
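
As a rough illustration of reading these fields programmatically, here is a minimal Python sketch that parses the header listing shown above using only the standard library. The function name parse_headers is made up for this example, and the header text is simply copied from the listing.

```python
from email.parser import Parser
from email.utils import parsedate_to_datetime

# Response headers copied from the listing above.
RAW_HEADERS = """\
Date: Sun, 12 Oct 2008 18:29:39 GMT
Server: Apache/2.2.9
Last-Modified: Sun, 12 Oct 2008 18:29:32 GMT
Etag: "164a88-bf7-45912904b9b00"
Accept-Ranges: bytes
Cache-Control: max-age=43200
Expires: Mon, 13 Oct 2008 06:29:39 GMT
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 1309
Content-Type: text/html
"""


def parse_headers(raw: str) -> dict:
    """Return the header fields as a dict keyed by lower-case field name."""
    message = Parser().parsestr(raw, headersonly=True)
    return {name.lower(): value for name, value in message.items()}


headers = parse_headers(RAW_HEADERS)

# HTTP dates are parsed into timezone-aware datetimes, so they can be subtracted.
served = parsedate_to_datetime(headers["date"])
modified = parsedate_to_datetime(headers["last-modified"])
seconds_since_edit = int((served - modified).total_seconds())

print("Bytes sent (compressed):", int(headers["content-length"]))
print("Seconds between last edit and download:", seconds_since_edit)
```

The same approach works on headers copied from either of the header viewing tools linked in this newsletter.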

Last-Modified and Etag - These inform the user (an individual's browser or a search engine robot) when the page was last modified and provide an identifier for that version of the page. When the user returns to the page, the browser sends the Last-Modified date and ETag back to the server in the If-Modified-Since and If-None-Match request headers. If the page has not been modified, the server sends a 304 code, saying the page has not changed, instead of sending the page itself. This makes it much faster for the user, as they do not need to download the page or image again.
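
The decision the server makes on a return visit can be sketched in a few lines of Python. This is a simplified illustration, not Apache's actual code: the function name conditional_status is hypothetical, the ETag comparison handles only a single exact value (no lists, weak ETags or "*"), and well-formed dates are assumed.

```python
from email.utils import parsedate_to_datetime

# Values the server currently holds for the page (taken from the listing above).
PAGE_ETAG = '"164a88-bf7-45912904b9b00"'
PAGE_LAST_MODIFIED = "Sun, 12 Oct 2008 18:29:32 GMT"


def conditional_status(request_headers: dict, etag: str, last_modified: str) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    client_etag = request_headers.get("If-None-Match")
    if client_etag is not None:
        # When the client sends an ETag, that comparison decides the outcome.
        return 304 if client_etag.strip() == etag else 200

    since = request_headers.get("If-Modified-Since")
    if since is not None:
        # Unchanged if the page was last modified no later than the client's copy.
        unchanged = parsedate_to_datetime(last_modified) <= parsedate_to_datetime(since)
        return 304 if unchanged else 200

    # First visit: no validators were sent, so the full page goes out.
    return 200


first_visit = conditional_status({}, PAGE_ETAG, PAGE_LAST_MODIFIED)
return_visit = conditional_status(
    {"If-None-Match": PAGE_ETAG}, PAGE_ETAG, PAGE_LAST_MODIFIED
)
print(first_visit, return_visit)
```

A static file server such as Apache does this for you. Pages generated on the fly need equivalent logic in the application, which is the special handling mentioned in the next paragraph.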

A content management system generally cannot provide the Last-Modified date, as the page is generated on the fly and has no last-modified date. This is why I recommend that the top 20-30 pages be static HTML even if a content management system is used behind the top pages. A page generated with ASP or PHP also cannot return the proper codes without special handling. Again, I recommend the top pages be static HTML.

My bias is to static pages as best for the user and best for search engine results. 4specs has about 1,200 pages of static html all maintained in Dreamweaver.

Google's guidelines make the same point: make sure your web server supports the If-Modified-Since HTTP header sent by the browser. This feature allows your web server to tell Google whether your content has changed since it last crawled your site, and supporting it saves you bandwidth and overhead. Here is the statement from Google - look for Technical Guidelines and consider all of their recommendations:
https://support.google.com/webmasters/

You can see if your server properly sends a 304 code using this tool:
[link no longer valid]

While you are testing, also check that your server returns a proper 404 code for missing pages. Some servers are not configured to do so. It is important to tell the search engines that the page or image no longer exists. Use this tool to request a page you know does not exist - x4r5.html for example. If you are not serving the proper header, adjust the server to do so:
http://www.rexswain.com/httpview.html

Cache-Control - tells the browser how long the page or images remain valid. When it is properly set, the browser will not recheck the page on a return visit within the time you say to cache. There is no reason not to set a long cache time for images to make return visits faster.

I have the HTML, CSS and JavaScript files on 4specs set for a 12-hour expiry so people will not download a file a second time in a single working day. The images are set for a 5-year expiry. The only caution is that if an image changes you need to change its file name. This is easy to set up on a server and can be checked with one of the Rex Swain links above.
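
To show how Cache-Control and Expires relate, here is a small Python sketch of the arithmetic. It is not the 4specs configuration, which lives in Apache. The function names, the list of image suffixes and the choice to give every non-image file the 12-hour lifetime are assumptions made for this example.

```python
from datetime import timedelta
from email.utils import format_datetime, parsedate_to_datetime

TWELVE_HOURS = 12 * 60 * 60            # 43200 seconds, as in max-age=43200
FIVE_YEARS = 5 * 365 * 24 * 60 * 60    # lifetime used here for images

# Assumed list of image suffixes; adjust to the files your site serves.
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def max_age_for(path: str) -> int:
    """Pick a cache lifetime in seconds from the file name."""
    if path.lower().endswith(IMAGE_SUFFIXES):
        return FIVE_YEARS
    return TWELVE_HOURS


def expires_header(date_header: str, max_age_seconds: int) -> str:
    """Compute the Expires value: the response Date plus the cache lifetime."""
    served = parsedate_to_datetime(date_header)
    return format_datetime(served + timedelta(seconds=max_age_seconds), usegmt=True)


expires = expires_header("Sun, 12 Oct 2008 18:29:39 GMT", max_age_for("index.html"))
print("Cache-Control: max-age=%d" % max_age_for("index.html"))
print("Expires:", expires)
```

With the Date from the listing above and a 12-hour lifetime, the computed Expires value matches the Expires header the 4specs server sent.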

Content-Encoding and Gzip - Just as you can zip a file to make it smaller, the server can serve the HTML page as a compressed file - called gzip or deflate. The browser will tell the server that it can accept gzip or deflate encoding. If you clicked on the second link above you should have seen this. 4specs has used gzip for over 6 years, and it is one of the reasons 4specs feels fast. A page with 60,000 bytes of HTML will be downloaded as a 12,000 byte gzipped file. This will make the page appear faster.
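
The negotiation can be sketched in Python as follows. This is a simplified illustration rather than what Apache does internally: the function name negotiate_body and the sample page are invented for the example, and quality values such as "gzip;q=0" are ignored.

```python
import gzip

# A made-up, repetitive page standing in for a real HTML file.
PAGE = (
    b"<html><body><table>\n"
    + b"<tr><td>Example product listing</td></tr>\n" * 300
    + b"</table></body></html>\n"
)


def negotiate_body(html: bytes, accept_encoding: str):
    """Return (body, extra_headers) honouring the client's Accept-Encoding."""
    offered = {
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    }
    if "gzip" in offered:
        # The client can unzip, so send the smaller body and say so.
        return gzip.compress(html), {
            "Content-Encoding": "gzip",
            "Vary": "Accept-Encoding",
        }
    # Older clients get the page as-is; Vary still tells caches why.
    return html, {"Vary": "Accept-Encoding"}


body, extra_headers = negotiate_body(PAGE, "gzip, deflate")
print("Original bytes:", len(PAGE))
print("Bytes sent:    ", len(body))
```

How much a real page shrinks depends on its content; repetitive markup such as tables and lists tends to compress especially well.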

Some smarter search engines like Google even prefer compressed content, as it is less data (and therefore cheaper) for them to download and index. Depending on how your server works, this may be easy, difficult or not possible.

If you are in a shared server environment - especially on an inexpensive server package, say under $30 per month - you probably will not have access to set up gzipping. Large hosts (Yahoo, GoDaddy and others) are unlikely to permit you to set up gzipping, but if you work with a small provider or ad agency, they may be willing to make the changes.

If you are on a UNIX dedicated server or a VPS (Virtual Private Server) using Apache 2.0, this is easy to set up and involves changing just a few lines of code in the configuration files.

If you are on a UNIX dedicated server or a VPS (Virtual Private Server) using Apache 1.x, you will have to install mod_gzip to get this to work. I would recommend upgrading the server to Apache 2.0. Apache 2.0 has been a stable program for over a year.

If you are on a dedicated Microsoft server, gzip is more difficult and less commonly done. Here is some specific information:
[link no longer valid]

While you are working in the server configuration, I recommend you clean up any canonical problems and redirect all possible page links to a single URL. For example, redirect example.com to www.example.com and redirect www.example.com/index.html to www.example.com/. This eliminates duplicate URLs pointing to the same information. This may be important for your search engine rankings, although Google is better at handling duplicates than before. This is easy to do on an Apache server. Here is how to do it on a Microsoft server:
[link no longer works]
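
The two redirect rules in the example above can be sketched in Python to make the logic concrete. In practice you would set them up in the server configuration rather than in code. The function names are hypothetical, the use of a 301 (permanent) redirect is my assumption about the appropriate code, and host names with a port number are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str, bare_host: str = "example.com") -> str:
    """Map the duplicate forms of a URL onto the single preferred form."""
    parts = urlsplit(url)

    # Rule 1: example.com becomes www.example.com
    host = parts.netloc.lower()
    if host == bare_host:
        host = "www." + bare_host

    # Rule 2: a trailing index.html is dropped, leaving the directory URL
    path = parts.path or "/"
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]

    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


def redirect_for(url: str):
    """Return (301, target) if the URL should be redirected, else None."""
    target = canonical_url(url)
    return None if target == url else (301, target)


print(redirect_for("http://example.com/index.html"))
print(redirect_for("http://www.example.com/"))
```

Whatever form the rules take on your server, every duplicate should redirect in a single step to the one URL you want indexed.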

Your web designer or IT department should be able to work out what is required to achieve the suggested headers. I can provide you with the code I use at 4specs - just ask - although our code is specific to an Apache server.

Let me know if you have any questions or suggestions.

--------------------------------------

Colin Gilboy
Publisher - 4specs
Contact us