Archive for the ‘Search Engines’ Category

The Anatomy of an Automated Search Engine

Monday, June 23rd, 2008

The creation of a large-scale search engine is an onerous task, one that involves enormous engineering challenges.

An ideal automated search engine crawls the Web quickly and revisits documents regularly to keep its copies up to date. Plenty of storage space is required to store the indices, and often the documents themselves, efficiently.

The ever-growing Internet generates enormous amounts of data and billions of queries daily. The indexing system of a search engine must process huge volumes of documents while using its storage space efficiently, and the query system must handle thousands of queries per second. Users should get the best possible navigation experience: the ability to find almost anything on the Web, with junk results excluded by high-precision ranking.

The anatomy of a search engine includes major applications such as those that allow for crawling the Web, indexing, and searching.

Web Crawling

Search engines today depend on spiders or robots (special software) designed to continuously search the Web to find new pages.

Web crawling is the most important aspect of a search engine and also the most challenging. It involves interaction with thousands of Web servers and name servers, and it is performed by many fast, distributed crawlers that continuously receive lists of URLs to crawl and store from a URL server. The crawlers start their travel with the most used servers and highly popular pages. Each crawler keeps hundreds of connections open at one time in order to retrieve Web pages quickly: for each page it has to look up the DNS, connect to the host, send a request, and receive a response. A crawler does not rank Web pages; it retrieves copies of them all and stores them, compressed, in a repository. They are later indexed and ranked based on different criteria. Everything from the visible text, images, alt tags, and other non-HTML content such as word processor documents is indexed.
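The fetch cycle described above (DNS lookup, connect, request, receive, store compressed) can be sketched with Python's standard library. This is an illustrative toy, not how any production crawler is built; the function names and the dictionary used as a "repository" are assumptions made for the example:

```python
import socket
import urllib.request
import zlib
from urllib.parse import urlparse

def resolve_host(url):
    """DNS lookup: map the URL's hostname to an IP address."""
    return socket.gethostbyname(urlparse(url).hostname)

def store_page(repository, url, body):
    """Store a compressed copy of the raw page bytes in the repository."""
    repository[url] = zlib.compress(body)

def fetch(url, repository):
    """One crawl step: resolve, connect, request, receive, store."""
    resolve_host(url)                                     # DNS lookup
    with urllib.request.urlopen(url, timeout=10) as resp: # connect + send request
        body = resp.read()                                # receive response
    store_page(repository, url, body)                     # compress and store

# Usage (requires network access):
#   repo = {}
#   fetch("http://example.com/", repo)
```

A real crawler would run many such fetches concurrently, keeping hundreds of connections open at once, which is where the distributed design mentioned above comes in.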

Crawlers usually revisit the same Web pages repeatedly to check that the site is stable and that its pages are being updated. If a certain Web page is not functioning at some point, the crawlers are usually programmed to come back later and try again. However, if a page is found to be down continuously, or rarely updated, they stay away for longer periods of time and crawl it less frequently.

Crawlers also have the capability of following all the links found on Web pages, which they can then visit either right away or later. (more…)
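Collecting the links on a fetched page for later visits can be sketched with Python's built-in HTML parser. The class name is an assumption for the example; relative links are resolved against the page's own URL so they can be queued like any other:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Gather absolute URLs from the anchor tags on one page."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's URL.
                    self.links.append(urljoin(self.base_url, value))

parser = LinkCollector("http://example.com/index.html")
parser.feed('<a href="/about.html">About</a> <a href="http://other.com/">Other</a>')
# parser.links now holds the absolute URLs to queue for a later crawl.
```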

Google To Boost Web Applications!

Sunday, April 6th, 2008

Online maps are extremely popular, and millions of people use them every day to find local businesses, obtain driving directions, view high-definition aerial images of places, or check real-time traffic information.

Google, Yahoo, Microsoft and AOL compete with each other to improve their online mapping services, which have become an important part of local search websites.

Google is the unbeaten leader in web computing: software delivered over the internet that runs inside a browser. However, many browser applications cannot do everything that powerful PC-based software can. Google has been trying to close that gap and is finally seeing the fruits of its efforts.

Google said that third-party developers can now use the programming interfaces to Google Earth, its 3-D visualization software. This will enable developers to embed Google Earth on websites.

Google Maps is currently used by thousands of websites whose applications do everything from pinpointing where a crime has taken place, to showing apartment rentals in various cities, to tracking the path of airplanes in flight. These sites will now be able to improve those applications with the better visualization software from Google Earth. Developers will also be able to use Google Earth’s 3-D imaging to create new applications for their sites. These applications will be embedded in the websites and accessible through a browser, and they will work even if users have not installed Google Earth on their computers. (more…)

Need Ideas For Your Website? Your Competition Is Your Best Tool!

Monday, February 18th, 2008

If you are looking to start an internet business, the first thing to do is choose a niche market. There is plenty of competition out there for every niche, but not everybody offers the same things you do.

Next, you need a good idea of whether a niche market is worth pursuing at all. This entails keyword research: coming up with keywords that people are searching for and finding out how much competition there is for each particular keyword.

It is no good going after generic keywords that have a lot of competition. For example, “Golf” is a generic term with millions of competing sites. If you target your keywords more precisely, such as “Golf balls” or “Travel Golf,” you will find fewer searches, but you will be aiming at a particular audience within the niche. Remember: it is always better to be a big fish in a small pond than a small fish in a big pond.

After you have chosen your niche, pick a few more keywords specific to it. Now it is time to check on your competition. What keywords and phrases are they using? Right-click, choose “View Source,” and look at the META description tag (you will find it near the top). Are these keyphrases helpful to you? Narrow down your list, then search Google for the keyphrases you are planning to use and see whether similar sites come up. If they do, you are on the right track. Some of these keyphrases will be used in your site’s content and some will be used as links.
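Pulling the META description out of a competitor's page source, the same tag the paragraph above suggests inspecting by hand via “View Source,” can be automated with Python's standard library. The class name and sample HTML are assumptions for the example:

```python
from html.parser import HTMLParser

class MetaDescriptionParser(HTMLParser):
    """Extract the content of <meta name="description" ...> from page source."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if d.get("name", "").lower() == "description":
                self.description = d.get("content")

html = '<head><meta name="description" content="Discount golf balls"></head>'
p = MetaDescriptionParser()
p.feed(html)
# p.description now holds the competitor's keyphrase-rich description.
```

Running this over a handful of competing pages gives you a quick list of the keyphrases your competition is actually targeting.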

To do well with your website, it is important to rank high in the search engines. Anyone who tells you otherwise is wrong. How many times have you gone to the third page of Google search results when looking for something? Most people click on only the first few results. (more…)

Google

Friday, April 20th, 2007

Google has quickly become the search engine of choice for a vast number of internet users, requiring web developers to focus heavily on optimizing page visibility for Google’s crawlers. Ensuring quick access to every file on a site can mean more pages get indexed, allowing for better keyword results. Many developers focus on directory structure to enhance results, a practice that can make most sites, especially those containing hundreds or thousands of pages, easier to manage. Still, a directory must be more than a random organization of folders and files. Good directories improve indexing speed and provide an organizational structure for the site. Consider the following examples when organizing your site. (more…)

Importance of Unique Content for a Website

Friday, March 16th, 2007

If content is king, unique content is the supreme master of the universe. It is important to have relevant content for both search engine optimization and visitors, but having unique, well-written content is crucial to the overall success of the online venture.

What is Unique Content?

Unique content, in its simplest form, is material on a website that is completely different from content anywhere else on the internet. It is unique. The term usually refers to written words on the page, but can apply to other areas, such as charts or graphics, as well.

Unique Content and SEO

As they have evolved, search engines have grown much more concerned with unique content. Although nobody knows the exact algorithms of the major search engines, it has become very clear that unique content is rewarded with higher rankings, while duplicate content, meaning content identical to material found on another website, is punished, sometimes quite severely.

Some websites that pull feeds from other websites or news services might seem to skirt the unique-content issue, but there is always more to a site than RSS feeds. The more original content on a page, the more highly search engines regard it. If any portion of a website contains material that appears elsewhere, the owner can expect a penalty.
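One simple way to see how duplicate text can be detected mechanically is to compare word shingles with Jaccard similarity. This is only a sketch under that assumption; real search engines use far more elaborate near-duplicate detection:

```python
def shingles(text, k=3):
    """Return the set of k-word sequences (shingles) in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Similarity of two texts: shared shingles over total shingles (0.0 to 1.0)."""
    sa, sb = shingles(a), shingles(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

original = "unique content is material that is completely different from anything else"
copy = "unique content is material that is completely different from anything else"
# Identical pages score 1.0; unrelated pages score near 0.0.
```

A page scoring close to 1.0 against an existing page is exactly the kind of duplicate content that, as noted above, search engines punish.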
(more…)