SEM: Accessible URLs

As we saw in our discussion of Site Navigation, it is vital that web marketers create a site structure that allows search engines to move easily through the site and locate content.  This means creating an internal linking structure that enables search engines to find all important pages.

As we’ve discussed, search engines seek out links, found both internally (within the site) and externally (found through another site), that are then followed (i.e., crawled by search engine robot software) in order to index a website (i.e., gather information).  Yet by itself a link is not enough to open the door and let a search engine do its thing.  Marketers must understand that the URL (i.e., web address) contained in the link can pose problems to search engine indexing activity.

In this part of our Search Engine Marketing tutorial we explore four important considerations for ensuring the URL contained in a link is accessible to search engine robots.  These considerations are:

  • Search Engine Friendly URLs
  • URL Naming
  • Session Identifiers
  • Password Protected Content

Search Engine Friendly URLs

As a search engine traverses a site in its attempt to locate content, it actively looks for links that contain a URL.  (It should be noted that not all links contain URLs.  Some, for instance, will take a site visitor to another place on the same page and not to a different page.)  Individual webpages are identified with a unique web address or Uniform Resource Locator (URL) that generally includes the site name followed by other identifiers.  In the beginning of the web these identifiers were associated with the location of a unique HTML file and often took the form: http://sitename/foldername/filename

These pages are considered “static” HTML pages since each page and all elements of the page are stored as a file and do not change unless the website operator changes individual files.  For instance, if the website owner wants to make a design change that affects the entire site she/he would need to adjust ALL site pages individually. 

However, as we’ve discussed in previous parts of this tutorial, many of today’s websites are generated dynamically through a combination of a programming language and an associated database that stores website information.  Yet even though many sites no longer store webpages as individual files (i.e., each page is created as the user visits the site), the web address of each webpage is still unique.

So what does this mean for search engine marketing?  In general, search engines have had little problem navigating through sites built with static HTML webpages but the same is not true for dynamic sites.  The problem is that dynamic sites present a web address that is based on the scripting program that is being run.  For instance, the URL for a search on Amazon.com for books dealing with “marketing strategy” may produce a URL that looks like this:

http://www.amazon.com/gp/search/ref=br_ss_hs/104-9233860-6656738?platform=gurupa&url=index%3Dblended&keywords=marketing+strategy

Historically, search engines have experienced difficulty in indexing pages with a dynamically generated web address (the reasons are mostly technical in nature and beyond the scope of this tutorial).  And while today’s search engines are much better at indexing these pages, marketers are still encouraged to avoid using dynamically generated URLs, especially on important content pages.
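To make the Amazon example above more concrete, the short Python sketch below takes a dynamic address apart into its script path and the parameters passed to that script.  The function name split_dynamic_url is our own and purely illustrative; it is not how any particular search engine processes URLs.

```python
from urllib.parse import parse_qsl, urlsplit


def split_dynamic_url(url):
    """Return the script path and a dict of the query parameters in a URL.

    Illustrative helper only: it shows how much of a dynamic address
    is made up of values handed to the scripting program.
    """
    parts = urlsplit(url)
    # parse_qsl decodes escapes such as %3D ("=") and "+" (space).
    return parts.path, dict(parse_qsl(parts.query))


amazon_url = (
    "http://www.amazon.com/gp/search/ref=br_ss_hs/104-9233860-6656738"
    "?platform=gurupa&url=index%3Dblended&keywords=marketing+strategy"
)
path, params = split_dynamic_url(amazon_url)
print(path)
print(params)
```

Here only the keywords parameter ("marketing strategy") says anything about the content of the page; the rest of the address describes the program being run.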

To correct this problem, marketers should seek website software that allows for the creation of “search engine friendly” (SEF) URLs.  Essentially SEF URLs rewrite the dynamic URL in a form that is more representative of the URL of a static HTML page.  For instance, for KnowThis.com the dynamic web address of this tutorial is:

http://www.knowthis.com/index.php?option=com_content&task=view&id=326&Itemid=646

By using methods of creating SEF URLs, the address for this tutorial takes on the more common HTML look:

http://www.knowthis.com/tutorials/search-engine-marketing/accessible-urls.htm

For web marketers who are not skilled at handling adjustments to web servers, moving their site to SEF URLs is something that should be done in consultation with their web operations staff or outside web hosting company.
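As a rough illustration of what SEF software does behind the scenes, the Python sketch below maps a friendly address back to the dynamic address the site's script actually understands.  The lookup table SEF_ROUTES and the function resolve_sef_url are hypothetical names used for this example; real website software typically handles this with its own routing or with web server rewrite rules.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical lookup table: friendly path -> parameters for the dynamic script.
# The values are taken from the KnowThis.com example above; the table itself
# is an assumption made for illustration.
SEF_ROUTES = {
    "/tutorials/search-engine-marketing/accessible-urls.htm": {
        "option": "com_content",
        "task": "view",
        "id": "326",
        "Itemid": "646",
    },
}


def resolve_sef_url(url):
    """Return the internal dynamic address for a friendly URL, or None if unknown."""
    path = urlsplit(url).path
    params = SEF_ROUTES.get(path)
    if params is None:
        return None
    return "/index.php?" + urlencode(params)


print(resolve_sef_url(
    "http://www.knowthis.com/tutorials/search-engine-marketing/accessible-urls.htm"
))
# prints: /index.php?option=com_content&task=view&id=326&Itemid=646
```

The visitor and the search engine only ever see the friendly address; the translation to the dynamic form happens on the server.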

URL Naming

The naming scheme of a URL should not only be presented in a manner that is friendly to search engine crawling activity, but it should also be descriptive of what is contained on the page.  As we will see, descriptive naming of URLs may serve to benefit not only the search engine but also people who are exposed to the URL.

In general, a descriptive URL name reflects the title of the page.  For instance, a page titled “FAQ” may carry the URL name www.knowthis.com/faq.  But what happens if the page is titled “Frequently Asked Questions” and not just “FAQ”?  The URL name could be www.knowthis.com/frequentlyaskedquestions but the lack of separation between the words in “frequently asked questions” presents two problems: one for humans and one for search engines.

The Human Problem

In cases where the URL itself is visibly displayed, the description of the page may not be readily apparent to anyone seeing it.  For example, as we will see in more detail in a later tutorial, getting sites around the Internet to link back to a site is very important in search engine marketing.  In most cases a webpage’s URL is not visibly displayed in the link but instead sits behind linked text such as KnowThis FAQs.  However, in other cases a site will simply display the full URL such as “Here is a site you should see: www.knowthis.com/frequentlyaskedquestions.”  Clearly, even though the page title is contained within the URL, the fact the words are run together may make it difficult for someone to quickly discern the purpose of the page behind the link.

The Search Engine Problem

It is believed that some search engines provide additional weighting to URLs that actually reflect the topic of the page but only if the URL is fully understood.  Unfortunately, run-on words are difficult for search engines to understand.  For instance, consider a page that contains an article possessing the following URL: 

http://www.knowthis.com/mattsmartinidealposition

Without separation between the words, the real title of the article is open to interpretation.  For instance, the title could actually be one of the following:

  • Matts Mart in Ideal Position – possibly referring to a retail store
  • Matt Smart in Ideal Position – possibly referring to a person named Matt Smart
  • Matts Martini Deal Position – possibly referring to a company named Matt’s Martini

To overcome these problems web marketers should learn to use descriptive URL naming and also to separate words in the URL.  The method for separation is one that is open to much debate in the search engine marketing community, though a “character” separator is probably best.  The options most frequently used for separation are the hyphen (dash) “-” character and the underscore “_” character.  While the debate on which character is best will not be taken up in this tutorial, it is important for web marketers to use some type of separator between words in a URL.  This applies both to the naming of individual content items and to the categories/folders in which a content item is contained.
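The naming advice above can be sketched in a few lines of Python.  The function make_url_name is a hypothetical helper written for this tutorial, and the hyphen default is simply one of the two common choices, not a recommendation on the separator debate.

```python
import re


def make_url_name(title, separator="-"):
    """Turn a page title into a lowercase URL name with separated words."""
    # Drop apostrophes so "Matt's" becomes "matts" rather than "matt-s".
    cleaned = title.lower().replace("'", "").replace("’", "")
    # Keep runs of letters and digits; everything else marks a word break.
    words = re.findall(r"[a-z0-9]+", cleaned)
    return separator.join(words)


print(make_url_name("Frequently Asked Questions"))
# prints: frequently-asked-questions
print(make_url_name("Matt’s Martini Deal Position", separator="_"))
# prints: matts_martini_deal_position
```

With a separator in place, the three possible readings of “mattsmartinidealposition” each produce a different, unambiguous URL name.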

Session Identifiers

Another potential concern with URLs is the use of so-called “session identifiers”.  A session identifier (session ID) is a unique value that some websites assign to each visitor (including search engine robots) when they enter a site, and it is often appended to a webpage’s URL.  These identifiers are intended to aid in research gathering by allowing the web marketer to track individual visitors as they maneuver through the site.

However, from a search engine’s point of view the inclusion of a unique session ID within the URL signals that a new webpage exists, since there is a new, unique URL.  Even though the page content is the same for all visitors to a page, the addition of the session ID to the URL suggests to search engines that a new page is now available on the site and, thus, the same content is indexed again as a new page.

Unfortunately, session IDs last only as long as the human or search robot visitor is on the site.  Consequently, every time a search engine returns to re-index a page, the page will not be found since the URL containing the old session ID is no longer valid.  This is important because the algorithms used by search engines to determine rankings for a keyword search are much more receptive to webpages where the content is associated with a stable URL.  With a stable URL, when the search engine lists the site in response to a keyword query, clicking on the link will actually direct the search engine user to a working page.  This, of course, improves the search engine’s ability to satisfy its users.  For search engines, rewarding sites that have stable URLs is one way to ensure quality when delivering search results.  They know users do not want to be presented with a search result that, when clicked, leads to a “page not found” message.

Obviously, web marketers who use session IDs to gain insight into site visitor activity must weigh the cost of dropping session IDs against the benefit of potentially improved search engine traffic.  The good news is that search engines are improving their ability to recognize session IDs and strip these from a URL without affecting the link.  But this ability is still inconsistent, and web marketers seeking improved search engine traffic are advised to consider removing session IDs from their site’s URLs.
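To show why session IDs create the appearance of many different pages, the Python sketch below strips a session ID from a URL so that two visitors’ addresses reduce to the same stable address.  The parameter names in SESSION_PARAMS and the example.com addresses are assumptions for illustration; actual names vary from site to site, and some sites place the ID in the path rather than in the query string, which this sketch does not handle.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed session parameter names; check what your own site software uses.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid", "jsessionid"}


def strip_session_id(url):
    """Return the URL with any known session ID query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


visitor_a = "http://www.example.com/faq?sid=abc123&page=2"
visitor_b = "http://www.example.com/faq?sid=xyz789&page=2"
print(strip_session_id(visitor_a))
# prints: http://www.example.com/faq?page=2
print(strip_session_id(visitor_a) == strip_session_id(visitor_b))
# prints: True
```

Without the session ID both visitors are looking at one page with one address, which is exactly what a search engine needs in order to index it reliably.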

Password Protected Content

One final consideration for allowing search engines to access a site is to understand that content that is password protected will not be indexed.  This applies whether individual content items (e.g., articles) are protected or the entire site is only accessible via password.  If the door is closed to the public it is most likely closed to search engines.

Sites whose business model restricts access to some or all content, but which still want search engines to locate information, should consider offering non-restricted access to summaries or abstracts of password-protected materials.  For instance, the site can make available to the general public the title of restricted content along with some additional details, such as the first paragraph of written material.  In this way search engines are exposed to a portion of the content, which is better than having no access at all.
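One simple way to picture this approach is a small routine that builds the public version of a restricted item from its title and opening paragraph.  The Python sketch below is only an assumption about how a site might do this; the function public_summary and the sample article text are invented for illustration.

```python
def public_summary(title, body, max_paragraphs=1):
    """Build the publicly visible part of a password-protected item.

    Paragraphs are assumed to be separated by blank lines.
    """
    paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
    return {
        "title": title,
        "abstract": "\n\n".join(paragraphs[:max_paragraphs]),
    }


# Invented sample content for demonstration purposes.
article_body = (
    "Pricing decisions affect every part of the marketing plan.\n\n"
    "The remainder of this report is available to subscribers."
)
summary = public_summary("Setting Price", article_body)
print(summary["title"])
print(summary["abstract"])
```

The page carrying this summary can be left open to the public, and to search engines, while the full item stays behind the password.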
