Long before a search engine can determine whether a webpage will satisfy a user’s search query, it must be able to locate the content. As we noted in an earlier section of the tutorial, search engines gather information about websites by sending out software “robots” (a.k.a. spiders or crawlers) to scan the Internet. To find their way from one site to another, and to navigate within a single site, search engine robots locate and follow links. These automated programs locate information, retrieve the data and store what is found in massive databases. Once additional software analyzes or “indexes” the acquired material, it can be made available to respond to users’ search queries.
Marketers need to understand how search engines build their information repositories if their website content is to be fully included. A marketer has no chance of using search engines to drive traffic to a site if information on the site is not contained within a search engine’s database. This, of course, means the site must be accessible to search engine robots.
But how does a search engine know where to find websites? In some cases a marketer can send a message to the search engine letting it know the web address of content. For instance, all three major search engines (Google, MSN and Yahoo) have online forms that allow marketers to submit a site’s main URL to trigger crawling. But this only gets the search engine spider to a site’s front door. To reach pages inside the site, the robot must either: 1) follow links found internally on the site, or 2) follow links appearing on an external site (i.e., another website). Unless a website consists of only one or a few pages, it is unlikely that all pages can be found through external links. Rather, search engine robots must rely on the site itself for guidance in locating content found inside the site. This means a site must contain an internal linking system to guide a search engine as it indexes the site.
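To make the distinction between internal and external links concrete, here is a minimal Python sketch of the link-gathering step a robot might perform on a single page. It is an illustration only, not how any particular search engine works; the names `LinkCollector` and `split_links`, the sample HTML and the example.com addresses are all assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def split_links(page_url, html):
    """Return (internal, external) lists of absolute URLs found in the page."""
    collector = LinkCollector()
    collector.feed(html)
    site_host = urlparse(page_url).netloc
    internal, external = [], []
    for href in collector.hrefs:
        # Resolve relative links against the address of the page they appear on.
        absolute = urljoin(page_url, href)
        if urlparse(absolute).netloc == site_host:
            internal.append(absolute)
        else:
            external.append(absolute)
    return internal, external


# Hypothetical page content used only for illustration.
SAMPLE_HTML = """
<a href="/products.html">Our Products</a>
<a href="about/team.html">Our Team</a>
<a href="http://partner.example.org/">A Partner</a>
"""

internal, external = split_links("http://www.example.com/index.html", SAMPLE_HTML)
print(internal)  # the two pages on www.example.com
print(external)  # the partner site
```

A page that no link points to never shows up in either list, which is why an inside page without internal links is effectively invisible to a robot that arrived at the front door.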
To ensure search engine robots can find webpages through internal links, marketers should consider the following issues:
We should note that some of the material discussed below will require adjustments to a website’s operational side (e.g., adjustments on the web server). Marketers who are not familiar with the technical aspects of operating a website are encouraged to discuss these issues with their technical contacts.
Building a Menu System
A website should be designed so that both users and search engine robots can move around it easily and reach all material contained within the site. In most cases navigating a site is handled via a menu system that includes links to important internal content areas. Some sites, particularly sites with a relatively small number of pages, provide access to nearly all internal pages through a single main menu. Alternatively, sites with many content areas often follow a hierarchical or drill-down design in which a main menu contains links to important areas and, once in an area, users are presented with a menu of links to further sub-areas. In some cases sub-areas may contain even more menus.
When designing a menu system, marketers should take the following into consideration:
- Design for Users But Within SE Marketing Parameters – Menu systems should, first and foremost, be built for site visitors and be both intuitive (containing what a visitor expects and allowing visitors to get where they want to go) and consistent (following a similar pattern and design from one page to another). However, as we will soon discuss, marketers should be cognizant of the potential obstacles menu designs present to search engine crawlers. So while a menu should be built for the website’s targeted audience, it should be done with an understanding of how search engines see the menu.
- Text Menus May Be Better Than Image Menus – One additional bit of caution deals with menus created using images. As we discussed in previous tutorials, search engines are quite adept at recognizing content produced as text but often fall short at recognizing images. Many sites use a menu system built on linked images rather than linked text. For instance, a menu may show an image containing the words “Our Products” that, when clicked, takes a user to the marketer’s product page. Yet search engines do not view this as text. We will discuss the importance of this in greater detail in a later tutorial, but for now understand that search engines not only follow links but also attempt to gain an understanding of each link. In this example, a search engine attempts to understand what the link is pointing to (a product page). While this is easily done with text links, it is more difficult with image links. If image links need to be used, the marketer should understand the importance of the ALT attribute (often called the “ALT tag”) that is associated with the image link. Better yet, create menus that are text menus but use images as background and not as links.
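The difference between text links and image links can be sketched in a few lines of Python. The example below reads a small menu the way a text-only program would, recording for each link whatever words it can find: the link text itself, or the ALT text of an image inside the link. The class name `LinkTextReader` and the sample menu are hypothetical, and real search engines are certainly more sophisticated than this.

```python
from html.parser import HTMLParser


class LinkTextReader(HTMLParser):
    """Records, for each link, the words a text-only reader could use."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, description) pairs
        self._href = None    # href of the link currently open, if any
        self._parts = []     # pieces of text found inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._href = attrs["href"]
            self._parts = []
        elif tag == "img" and self._href is not None:
            # An image contributes words only through its alt attribute.
            self._parts.append(attrs.get("alt") or "")

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            description = " ".join(" ".join(self._parts).split())
            self.links.append((self._href, description))
            self._href = None


# Hypothetical menu: one text link, one image link without ALT text,
# and one image link with ALT text.
MENU_HTML = (
    '<a href="/products.html">Our Products</a>'
    '<a href="/contact.html"><img src="contact.gif"></a>'
    '<a href="/about.html"><img src="about.gif" alt="About Us"></a>'
)

reader = LinkTextReader()
reader.feed(MENU_HTML)
for href, description in reader.links:
    print(href, "->", repr(description))
```

In this sketch the image link without ALT text yields an empty description, so the reader can follow the link but learns nothing about where it leads.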
Creating a Sitemap
While a good menu structure is important to aiding the crawling activity of search robots, for most medium-to-large sites, menus alone are often not sufficient to house links to all content contained in a site. This is particularly the case where a site has a large number of areas that are not easily pointed to from a main menu. Additionally, as we noted, menus delivered dynamically or as an image may not be easily understood by search engines.
For these situations it may be helpful to include a sitemap. A sitemap offers a single-page guide to site content and is presented in the form of text links. A sitemap can help search engines locate hard-to-find pages. In fact, many search engines now strongly encourage sitemaps as a way to speed up the process of locating site content.
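As a rough illustration of a sitemap page made of text links, the Python sketch below turns a list of page titles and addresses into a simple HTML list. The function name `build_sitemap`, the page titles and the paths are assumptions invented for this example; a real site would draw the list from its own content.

```python
from html import escape


def build_sitemap(pages):
    """Return a simple HTML fragment listing every (title, url) pair as a text link."""
    items = "\n".join(
        f'  <li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in pages
    )
    return f"<h1>Sitemap</h1>\n<ul>\n{items}\n</ul>\n"


# Hypothetical pages, including ones buried too deep for the main menu.
PAGES = [
    ("Widgets", "/products/widgets.html"),
    ("Widget Care Guide", "/support/guides/widget-care.html"),
    ("Press Releases 2005", "/company/press/2005.html"),
]

sitemap_html = build_sitemap(PAGES)
print(sitemap_html)
```

Because every entry is an ordinary text link, a robot that reaches the sitemap page can reach each listed page in one further step.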
Using Page Redirects
There are times a marketer must change the file location of a webpage. This can occur for several reasons including:
- Domain Name Change – The marketer’s website domain has been changed often as the result of a corporate reorganization, such as a merger between two companies or the consolidation of several domains under a single domain.
- URL Renaming – The marketer may decide to rename a file in order to gain search engine advantages (to be discussed in a later tutorial).
- Page Replacement – The marketer has removed a webpage and would now like visitors to see something different (e.g., old product page to new product page).
In instances of URL or file name changes, or page removals, where the material is still to be viewed (i.e., not deleted), web marketers must direct search engines to the new location or risk not having the webpage found by search engine robots. Depending on the server platform on which the website is hosted, the process typically requires manually entering instructions that tell the web server to direct requests to the new location. For instance, for web servers operating in the Apache web server environment, instructions to automatically redirect a URL request are handled in a file called “.htaccess”. For users and search engines, the result of the redirect is transparent.
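The logic behind a redirect can be sketched independently of any particular web server. The Python example below is not Apache configuration and does not show the contents of an “.htaccess” file; it only illustrates the decision a server makes once it has been told that a page has moved. The mapping, the page paths and the function name `respond` are hypothetical.

```python
# Hypothetical mapping of retired paths to their replacements.
REDIRECTS = {
    "/old-products.html": "/products/index.html",
    "/widget-v1.html": "/products/widget-v2.html",
}

# Hypothetical set of pages that currently exist on the site.
EXISTING_PAGES = {
    "/index.html",
    "/products/index.html",
    "/products/widget-v2.html",
}


def respond(path, existing_pages):
    """Return (status_code, headers) the server would send for a request path."""
    if path in REDIRECTS:
        # 301 is the HTTP status for a permanent move; the Location header
        # tells the browser or robot where the content now lives.
        return 301, {"Location": REDIRECTS[path]}
    if path in existing_pages:
        return 200, {}
    return 404, {}


print(respond("/old-products.html", EXISTING_PAGES))
print(respond("/products/index.html", EXISTING_PAGES))
print(respond("/nowhere.html", EXISTING_PAGES))
```

A browser or robot that receives the 301 response simply requests the new address, which is why the move goes unnoticed by the visitor.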
Managing Broken Links
While web marketers should do their best to direct users from old pages to new, invariably some links will lead users and search engines to a dead end. On the web this means the dreaded “page not found” screen will appear in the web browser when a web server is unable to locate the requested page. Marketers should understand that a “page not found” message (technically called a 404 error) is not always due to mistakes on their own website. This error is triggered any time a web address cannot be located on a server. So any broken link pointing to the marketer’s website, whether on the marketer’s site or on an external site, will produce this error if the URL is not reachable. For example, if a major news website posts a favorable article about a marketer’s product but mistakenly lists a wrong web address, the 404 server error will appear when readers click the link from the news website. In this situation, search engine robots crawling the news site may follow the link to the marketer’s site but will quickly leave when they encounter the “page not found” error, thus preventing the indexing of the page at least until a robot returns to the marketer’s website, which may be some time later.
To overcome the “page not found” dead end, marketers are wise to produce a customized 404 error page. This page should give the appearance of a regular webpage found on the site and should contain menus so that if a person or search engine robot reaches this page they are aware that a real site does exist. This approach will also allow a search robot to index other parts of the site by following the links.
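A customized error page of this kind might be assembled along the following lines. This Python sketch assumes a hypothetical menu and a hypothetical function name, `not_found_page`; it returns the 404 status together with a page body that still carries the site’s menu links.

```python
from html import escape

# Hypothetical main menu shared with the rest of the site.
MENU = [
    ("Home", "/"),
    ("Products", "/products.html"),
    ("Sitemap", "/sitemap.html"),
]


def not_found_page(requested_path):
    """Return (status_code, body) for a request that matched no page."""
    links = "\n".join(
        f'  <li><a href="{url}">{label}</a></li>' for label, url in MENU
    )
    body = (
        "<h1>Page not found</h1>\n"
        # Escape the requested path so it is displayed as text, not markup.
        f"<p>We could not find {escape(requested_path)}. "
        "Try one of these areas:</p>\n"
        f"<ul>\n{links}\n</ul>\n"
    )
    return 404, body


status, body = not_found_page("/prodcuts.html")
print(status)
print(body)
```

The response still reports that the page is missing, but the visitor or robot is handed links to follow rather than a dead end.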
Restricting Crawling Activity
Just as marketers can guide search engine robots to locations they want indexed, marketers can also prevent robots from gathering information from certain areas of a site. There are two main reasons to control the files a search engine will access. First, some areas of a website may contain information that a marketer would not like to be made public. For instance, the company may be developing new information on a product but is not yet ready to make it publicly available.
A second key reason to restrict robot access is to reduce the amount of traffic on a website’s server. Too much traffic can place strain on the server resulting in slower loading of all webpages.
To restrict robot traffic, websites follow something called the Robots Exclusion Protocol. This involves placing on the site a file named “robots.txt” that contains instructions to let search engine robots know what is and is not accessible. For anyone interested in further information on how to write instructions for the robots.txt file this site has good information.
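To give a sense of what such instructions look like and how a well-behaved robot interprets them, the sketch below uses Python’s standard `urllib.robotparser` module on a small exclusion file of the kind conventionally named robots.txt. The directory names, the robot name “ExampleBot” and the example.com addresses are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical exclusion file: every robot ("*") is asked to stay out
# of two directories and may visit everything else.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Disallow: /internal/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant robot checks each address before requesting it.
public_ok = parser.can_fetch("ExampleBot", "http://www.example.com/products.html")
draft_ok = parser.can_fetch("ExampleBot", "http://www.example.com/drafts/new-product.html")

print(public_ok)  # True: nothing in the file restricts this page
print(draft_ok)   # False: the page sits under /drafts/
```

Note that the file only states a request; it is honored by robots that choose to follow the protocol and does not by itself block access to the listed areas.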