Introduction to Web Crawlers

A search engine such as Google, Yahoo, or Bing is an index of known websites. Each has its own method of crawling and indexing. Crawling is the process of a program reading web pages; such a program is called a web crawler, or bot. These bots read millions of pages each day and send information about each page back to be categorized alongside all the other pages out there. The bots look at many factors to determine what a page is about. The search engine then uses a completely different set of information to decide where the page will appear in its search results. No one outside the search engines knows the exact formula they use, but we have a pretty good idea of what it takes to show up at the top of the search results. Most of my information will apply to Google, since 70-80% of all searches are on Google.
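To make the idea of a bot reading pages and following links more concrete, here is a minimal sketch in Python. It is not how any real search engine's crawler works. The `FAKE_WEB` dictionary, the `LinkParser` class, and the `crawl` function are hypothetical names made up for this illustration, and the pages live in memory instead of being fetched over the network.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag found on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical stand-in for the web: URL -> HTML.
# A real bot would download each page over HTTP instead.
FAKE_WEB = {
    "/": '<a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<a href="/">Home</a>',
    "/blog": '<a href="/blog/seo-basics">SEO basics</a> <a href="/">Home</a>',
    "/blog/seo-basics": "<p>No links here.</p>",
}


def crawl(start, pages):
    """Visit every page reachable from start, breadth first, each page once."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)
        parser = LinkParser()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            # Only follow links to known pages we have not queued yet.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


print(crawl("/", FAKE_WEB))
```

The `seen` set is what keeps the bot from reading the same page twice when pages link back to each other.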

Anatomy of a Web Page

There are a few elements of a page you need to consider.

  1. Title: The title of the page is what shows up in Google’s search results. If the title of your page is “Home” or “About Us” or “My Blog”, you are missing a big piece of the puzzle. The title of each page needs to contain keywords that tell Google what the page is about.
  2. Filename: The filename, or URL, also helps Google categorize your page, but it is not as important as the title. If you can, name each file or blog post with descriptive keywords.
  3. Headings: These are section titles. If you have a longer page, you should divide it into sections with descriptive titles, and these titles should use h1, h2, h3, … tags. Text within these tags carries more weight than other text.
  4. Text: The rest of the text should be relevant content shared in a natural way. Don’t try to force keywords into your text. You don’t want to negatively affect the reader’s experience in an effort to get more traffic. That is what we call counterproductive.

Proofread each post or page you write with two audiences in mind: your readers and the search engines. Make it interesting for readers, and clear and straightforward for the search engines to categorize.
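The title and headings described above can be pulled out of a page with Python's standard `html.parser` module, which gives a rough view of what a bot might see first. This is only a sketch: the `PageOutline` class and the `SAMPLE` page are hypothetical, and real crawlers look at far more than these tags.

```python
from html.parser import HTMLParser


class PageOutline(HTMLParser):
    """Records the <title> text and the text of h1-h3 headings."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []    # list of (tag, text) pairs
        self._current = None  # tag currently being read, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag in self.HEADINGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            text = "".join(self._buffer).strip()
            if tag == "title":
                self.title = text
            else:
                self.headings.append((tag, text))
            self._current = None


# Hypothetical sample page used only for this illustration.
SAMPLE = """
<html><head><title>Introduction to Web Crawlers</title></head>
<body>
<h1>Introduction to Web Crawlers</h1>
<h2>Anatomy of a Web Page</h2>
<p>Body text goes here.</p>
<h2>Google PageRank</h2>
</body></html>
"""

outline = PageOutline()
outline.feed(SAMPLE)
print("Title:", outline.title)
for tag, text in outline.headings:
    print(tag, "-", text)
```

Running something like this against your own pages is a quick way to spot a title that just says “Home” or a long page with no headings at all.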

Google PageRank

Now that Google has categorized your site, blog post, or web page, it needs to decide where in the search results you will appear. The first element is relevance and the second is credibility. I could have the best article on search engines, but Google won’t place me high in its results unless it thinks my site is important. Google has provided a tool to help webmasters know how important Google thinks they are. This tool is PageRank. PageRank is a number from 0 to 10. Sites with a PageRank of 10 are the most important and sites with a PageRank of 0 are the least important. In simplistic terms, Google will put the most relevant page with the highest PageRank at the top. To increase your PageRank, you need sites with a high PageRank to link to your site. You can download the Google Toolbar and enable PageRank, which will show the PageRank of every page you visit. In future posts we will discuss techniques to get sites to link to you.
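To see why a link from an important site helps more than a link from an unimportant one, here is a small sketch of the simplified, textbook form of the PageRank calculation. It is not Google's actual ranking formula, and its scores are fractions that sum to 1, not the 0 to 10 number shown in the toolbar. The `pagerank` function, the damping value of 0.85, and the four-page `LINKS` graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank. links maps each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: two small sites link to "popular",
# and "popular" links to "blog".
LINKS = {
    "popular": ["blog"],
    "blog": ["popular"],
    "fan1": ["popular"],
    "fan2": ["popular"],
}

scores = pagerank(LINKS)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph, "blog" has only one incoming link, but because that link comes from the highest-scoring page it ends up well ahead of the two pages that nobody links to.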

Search Engine Optimization

This is the act of improving a site so that it shows up higher in search results. The first step is to identify target keywords. Usually you can start with about 20 keywords to target. Second, identify the existing pages of your site and the new pages you need to add; each page should target 1-3 keyword phrases. The final step is to get other sites to link to those pages. It is a long process to get to the top. Depending on the keywords, there is a lot of competition and money being poured into SEO, so if you have little or no budget it takes a lot of patience, persistence, and creativity. Imagine you wrote 3-5 articles per week to post on your blog. After one year you would have roughly 150-260 unique pages, and after three years roughly 470-780 pages. If you can be that persistent, there is no doubt that your audience will find you, your writing will improve, and you will have found success. There is never a better time to start than right now.
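The publishing arithmetic is easy to check with a few lines of Python. This sketch assumes 52 weeks of posting per year with no weeks off; `pages_after` is a hypothetical helper written just for this example.

```python
WEEKS_PER_YEAR = 52  # assumption: you post every week of the year


def pages_after(years, per_week_low=3, per_week_high=5):
    """Return (low, high) page counts after the given number of years."""
    weeks = years * WEEKS_PER_YEAR
    return per_week_low * weeks, per_week_high * weeks


for years in (1, 3):
    low, high = pages_after(years)
    print(f"After {years} year(s): {low}-{high} pages")
```

Change the per-week numbers to match your own schedule to see how quickly a steady habit adds up.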


2 Comments

Miles Pomeroy · August 25, 2009 at 8:03 pm

The following guide, put out by Google, provides a lot of hints including—and in addition to—those you mention.

http://www.google.com/webmasters/docs/search-engine-optimization-starter-guide.pdf

Micah · August 25, 2009 at 10:09 pm

That is a great resource. Another good place for up-to-date information is Matt Cutts’s blog: http://www.mattcutts.com/blog/ There are a lot of sites out there that will try to share tricks, and yes, there are tricks, but the best way to get traffic is to provide valuable content that people are looking for.
