Becoming a Spider In a Few Easy Steps

I know what you’re thinking, and no, I cannot help you launch webs from your hands, scale skyscrapers, or become Tobey Maguire (I’ll let you enjoy that mental image, though). On the other hand, approaching your site as if you were a spider, bot, crawler, or whatever you choose to call them may actually help you get a better understanding of how your most important visitors view your content.

As stated by Google and many in the search marketing community, 2010 is the year to reduce your page loading time. First things first, take note of your page speed and any issues in this area; we cannot think about on-site factors until we make sure the bots can get on the site. Several factors can lead to very long page load times, including more than a handful of externally referenced CSS and JavaScript files and an abundance of images or multimedia on site pages. Shoot for page sizes around 100 KB and no more than 200 KB. This can get tricky when you also want to give visitors a fair representation of your site. Ways to decrease your page sizes include image compression and consolidation of externally referenced files, to name a few. A tool that I am fond of in this area is the Web Speed Analysis tool. Remember, no one likes excess baggage, and your site visitors as well as the crawling search engines will enjoy your site much more when it loads quickly.
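If you want a quick, do-it-yourself version of that audit, here is a rough sketch in Python using only the standard library. It counts the externally referenced stylesheets, scripts, and images in a page and flags pages over the size budget discussed above. The function name `audit_page` and the 200 KB threshold default are my own choices for illustration, not part of any tool mentioned here.

```python
from html.parser import HTMLParser


class AssetCounter(HTMLParser):
    """Counts externally referenced stylesheets, scripts, and images."""

    def __init__(self):
        super().__init__()
        self.css = self.js = self.images = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet":
            self.css += 1
        elif tag == "script" and attrs.get("src"):
            self.js += 1
        elif tag == "img":
            self.images += 1


def audit_page(html, size_limit_kb=200):
    """Report page weight and external asset counts for one HTML page."""
    counter = AssetCounter()
    counter.feed(html)
    size_kb = len(html.encode("utf-8")) / 1024
    return {
        "size_kb": round(size_kb, 1),
        "css_files": counter.css,
        "js_files": counter.js,
        "images": counter.images,
        "over_limit": size_kb > size_limit_kb,
    }
```

Run it against your saved HTML source; if `css_files` and `js_files` climb past a handful each, that is your cue to start consolidating files.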

Next up, take a look at your W3C validation. We’ve cleaned up the file sizes, and now we have to ensure that the crawling engines can read and assess the information on your site. No one likes rush hour traffic, and that is what it feels like for a spider that hits stop-and-go traffic because your code is not clean. You can find code validation glitches with tools such as the Markup Validation Service; the report lists every code error on a page that might be a hiccup for a crawling bot. It is also good practice to use a site spidering tool to mimic the way a search engine spider crawls your site. I prefer OptiSpider and enjoy letting this tool show me the pages or potential glitches a real bot may encounter on the site. These tools can also surface those forgotten pages on your server that need to be redirected.
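The core of what a spidering tool does is simple enough to sketch. The toy crawler below walks a site the way a bot would, following links breadth-first, and reports any pages that exist but are never linked to (those forgotten pages mentioned above). To keep it self-contained it crawls an in-memory map of paths to HTML rather than making real HTTP requests; swapping in real fetches is left as an exercise. All names here are mine, assumed for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Pulls href targets out of anchor tags, like a spider would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def crawl(pages, start="/"):
    """Breadth-first crawl of a site given as {path: html}.

    Returns (pages reached by following links, orphaned pages).
    """
    seen, queue = set(), deque([start])
    while queue:
        path = queue.popleft()
        if path in seen or path not in pages:
            continue
        seen.add(path)
        extractor = LinkExtractor()
        extractor.feed(pages[path])
        queue.extend(extractor.links)
    orphans = set(pages) - seen
    return seen, orphans
```

Anything that lands in `orphans` is content a search engine spider can never find from your homepage, and a candidate for either an internal link or a redirect.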

In the next step, paint an hourglass on your back. OK, just kidding. We now know that search engines can access the site quickly and understand the code; now we just need to tell them where to go. The biggie here is one of the simplest: ensure that the robots.txt file does not command search engines to ignore the entire site or major folders/directories/pages of the site. For more information on this I usually point people to the Wikipedia robots exclusion reference. Now that we know where we don’t want them to go, we can focus on where they should go. Every site should include an XML sitemap. This markup language presents information about your site’s content to the most popular spider, “googlebot,” in its most appealing format, kind of like spider food. Next, make sure you have a standard HTML sitemap that links to all site pages, or at least the categorical/higher-level directories. It is best practice to link this page from the footer of every site page.
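To make the two files concrete, here are minimal hypothetical examples (the paths and domain are placeholders, not recommendations for your site). A robots.txt lives at the site root and should block only what you truly want hidden, while pointing spiders at your sitemap:

```text
# robots.txt at http://www.example.com/robots.txt
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Sitemap: http://www.example.com/sitemap.xml
```

And the XML sitemap itself, in the standard sitemaps.org format, with one `<url>` entry per page:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2010-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
</urlset>
```

The fastest way to wipe a site out of the index is a stray `Disallow: /`, so double-check that line before anything else.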

We’ve come a long way: fast-loading pages, understandable code, and a pathway for the spiders to travel. As we pretend to be spiders we must also put on our SEO caps and think about how they traverse the site. As you consider the information architecture and how the site is “webbing” out, I suggest you think of a handful or two of your most important pages. These pages or subjects will become the main navigation of the site, and additional information will web out from them. This shows the search engine spiders a content hierarchy, and sound practices such as identical navigation on all site pages (no alternating links in the main nav) and breadcrumb navigation drive the point home even further. Another sound tip is to think about keyword relevance. Visit a link within the copy on your homepage. Assuming you passed remedial SEO, you are linking to internal site content from a keyword-rich text link. If so, click on the link. Does this new, quickly loaded page feature the previous link text’s keyword in the title element, heading, and content? If not, then the homepage’s link text is incorrect, the linked page is incorrect, or your keyword usage on the linked page is incorrect.
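That title/heading/content check can be automated for each internal link. The sketch below, again standard-library Python with names of my own invention, takes an anchor text and the target page’s HTML and reports where the keyword does and does not appear. The body check is deliberately crude (it searches the raw markup), so treat the output as a prompt for a human look, not a verdict.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects the title and heading text of a page (simplified)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            if tag != "title":
                self.headings.append("")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current in ("h1", "h2", "h3"):
            self.headings[-1] += data


def anchor_matches_target(anchor_text, target_html):
    """Does the link text's keyword show up where it should on the target page?"""
    page = PageText()
    page.feed(target_html)
    kw = anchor_text.lower()
    return {
        "in_title": kw in page.title.lower(),
        "in_heading": any(kw in h.lower() for h in page.headings),
        "in_body": kw in target_html.lower(),  # crude: searches raw markup
    }
```

Any `False` in the result means either the anchor text or the target page needs rework, exactly the three failure cases listed above.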

This may take quite some time to complete across the site in regard to link text usage, but it is time well spent. You are painting a picture of relevancy for the crawling search engine, and the last thing we want is to make them stop and scratch their heads. Do spiders have heads? Sorry, I don’t know that one.

While we have spread the process of pretending to be a search engine spider across a few steps, in some cases it will take a fair amount of time to ensure you have a truly search engine friendly site. I have never, I repeat, never come across a site that passes all of these areas with flying colors. Then again, maybe that is a good thing, because I like having a job.

Having trouble becoming a spider? Expert SEO consultation is only a click away.