Ever wondered what the question marks and ampersands you see in many URLs are for? They serve a lot of purposes, and they can also cause a lot of SEO headaches if they aren’t considered as part of your search engine optimization auditing.
A web address is the location of a physical page on the web. The following is the web address of the Google Finance page: http://www.google.com/finance
If a question mark (?) exists in a URL, everything from the question mark onward is called the query string. A query string is typically made up of name/value pairs, sometimes loosely called parameters.
In the URL http://www.google.com/finance?catid=66529330, the web address is still the same, but now there is a query string (?catid=66529330) containing just one name/value pair. The parameter name is “catid” and the value is “66529330”.
Additional name/value pairs beyond the first set will always be separated with an ampersand (&) in the URL.
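To make the anatomy concrete, here is a short sketch using Python’s standard urllib.parse module to split a URL into its web address and query string parts. The “tab” parameter is hypothetical, added only to show two name/value pairs separated by an ampersand:

```python
from urllib.parse import urlparse, parse_qs

# A URL with two name/value pairs; "tab" is a hypothetical
# second parameter added for illustration.
url = "http://www.google.com/finance?catid=66529330&tab=news"

parsed = urlparse(url)
print(parsed.path)    # /finance  (the page being called)
print(parsed.query)   # catid=66529330&tab=news  (the query string)

# parse_qs turns the query string into a dict of name/value pairs
params = parse_qs(parsed.query)
print(params)         # {'catid': ['66529330'], 'tab': ['news']}
```

Note that parse_qs returns each value as a list, because the same parameter name may legally appear more than once in a query string.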
In all of these URL examples, we are actually calling the same web page from Google: the Finance page. In the URLs with query strings, however, we are passing in extra information that Google uses to determine what information to display, and how to display it.
There are five primary reasons that web developers choose to use query strings in their URLs.
Controlling Page Content
When dealing with a lot of content, it’s easier to build one physical web page that serves as a template, and pass in a value in the query string to control the data that fills that template. This is perhaps most common on retail websites. If you think retailers build a web page for every single product or category, think again. Typically retailers have only one category and product web page, but the content on this page changes based on the identifier that is passed into it.
Examples: http://www.mystore.com/product?productid=12345 and http://www.mystore.com/category?id=54321
Controlling Page View State
Many pages need features like sorting, number of items, and pagination. On the product category pages of a retail site, you may wish to sort by price, view sixty items at a time, and see what is on page two.
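View-state parameters like these are usually just appended to the content-controlling query string. The parameter names below (sort, pagesize, page) are hypothetical examples of what such a URL might look like, built with urllib.parse.urlencode:

```python
from urllib.parse import urlencode

# Hypothetical view-state parameters: sort order, items per page, page number.
view_state = {"sort": "price", "pagesize": 60, "page": 2}
url = "http://www.mystore.com/category?id=54321&" + urlencode(view_state)
print(url)
# http://www.mystore.com/category?id=54321&sort=price&pagesize=60&page=2
```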
Tracking Click Paths
Often, if you click “Add To Cart”, “Log In”, or “Register” on a website and complete the action, the site will send you back to the page you were on. Developers commonly use the URL query string to keep track of where to send you back.
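A common pattern is a “return URL” parameter carrying the previous page. The parameter name returnurl below is hypothetical (sites use all sorts of names), but the mechanics are typical: the current page’s URL is percent-encoded into the query string, and read back out after the action completes:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical "returnurl" parameter remembering where to send the visitor back.
current_page = "http://www.mystore.com/product?productid=12345"
login_url = "http://www.mystore.com/login?" + urlencode({"returnurl": current_page})
print(login_url)  # the product URL is percent-encoded inside the query string

# After login completes, the site reads returnurl back out and redirects:
back_to = parse_qs(urlparse(login_url).query)["returnurl"][0]
print(back_to)
```

Note that urlencode percent-encodes the embedded URL (its own ? and = become %3F and %3D), which is what keeps the inner query string from being confused with the outer one.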
Tracking Session State
Occasionally we also still run into a website that keeps track of a particular visitor by storing the session identifier in the URL query string. When you first add an item to a shopping cart, pretty much every website has to assign you a unique identifier. Most websites keep that identifier in a cookie behind the scenes, but some put it in the URL instead (for example, a hypothetical http://www.mystore.com/cart?sessionid=ABC123). This is how sites keep track of the items in your shopping cart without you having to be logged in.
Tracking Marketing Efforts
Most analytics packages have built-in support for what are called “campaign codes”. Whether it is a paid search, banner ad, email, or other type of online marketing campaign, marketers can pass values in a query string to a web page in order to measure the effectiveness of that tactic.
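Google Analytics, for instance, recognizes the utm_ family of campaign parameters. The parameter names below are real Google Analytics campaign codes; the values and the landing URL are hypothetical:

```python
from urllib.parse import urlencode

# Google Analytics campaign codes; the values are hypothetical.
campaign = {
    "utm_source": "newsletter",
    "utm_medium": "email",
    "utm_campaign": "spring_sale",
}
landing_url = "http://www.mystore.com/category?id=54321&" + urlencode(campaign)
print(landing_url)
```

Notice the SEO implication: the tagged URL delivers exactly the same page content as the untagged one, yet it is a different URL as far as a search engine is concerned.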
The most important SEO concept when dealing with URLs is that, by default, search engines view every unique URL as a different web page.
We (humans) know that two URLs such as http://www.mystore.com/category?id=54321&sort=price and http://www.mystore.com/category?id=54321&sort=name return the same content, just sorted differently.
Google, however, doesn’t know that by default, and if we let Google index all versions, we risk competing with ourselves in search results and potentially splitting the link juice across all the variations. Yes, the search engines have become better at recognizing these instances on their own, but why risk it? Typically, the only types of query strings I like search engines to index are the ones that control page content. Anything that controls view state, session state, or click paths, or that contains tracking codes, will cause me to put an SEO effort around keeping it out of search engines.
First, I will usually block the spiders from crawling the trouble URLs using statements like the ones below in the robots.txt file. The parameter names here are illustrative; substitute the ones your own site actually produces. The wildcard (*) pattern matching is supported by the major search engines:

User-agent: *
# Variable Excludes
Disallow: /*sessionid=
Disallow: /*sort=
Disallow: /*returnurl=
Disallow: /*utm_
Lately, I’ve also been touting the use of the canonical link tag (rel="canonical"), now supported by the major search engines, as a way to reinforce the proper default URL for a given page. This tag basically tells search engine spiders, “regardless of what URL you used to get to this page, here is the correct version for the search engine index.”
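As a sketch, the tag goes in the head of every variant of the page (sorted, paginated, campaign-tagged, and so on) and points at the one URL you want indexed; the product URL below reuses the earlier hypothetical example:

```html
<!-- Placed in the <head> of every URL variant of this product page -->
<link rel="canonical" href="http://www.mystore.com/product?productid=12345" />
```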
Other things to watch out for:
- Session IDs in URLs can cause spider traps that throw search engines into endless loops and keep them from indexing fresh content on your site.
- Calendars that use the URL query string to keep track of the dates currently being viewed can also cause a spider trap if end dates aren’t specified.
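The calendar case is worth sketching, because it shows how a trap arises without anyone intending one. In this hypothetical example, every “next month” link yields a brand-new crawlable URL, and with no end date a spider can follow the chain forever:

```python
from urllib.parse import urlencode

def next_month_link(year, month):
    """Build the hypothetical 'next month' link a calendar page might emit."""
    month += 1
    if month > 12:
        year, month = year + 1, 1
    return "http://www.mystore.com/calendar?" + urlencode({"year": year, "month": month})

# Each call produces a new unique URL; nothing ever stops the sequence.
url = next_month_link(2009, 12)
print(url)  # http://www.mystore.com/calendar?year=2010&month=1
```

Capping the range server-side (or simply not emitting a “next” link past some end date) closes the loop.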
Free tools like Xenu Link Sleuth or Sitemapbuilder.net can help you identify the cases where query strings are problematic.