How Google Finds Web Pages for a Search Query
Crawling is the process by which Googlebot discovers new and updated pages on the web, fetching billions of pages. The program that does the fetching is called Googlebot (also known as a robot, bot, or spider). Googlebot uses an algorithmic process: computer programs determine which sites to crawl, how often, and how many pages to fetch from each site.
The crawl process begins with a list of web page URLs, generated from previous crawls and augmented with sitemap data provided by webmasters. As Googlebot visits each of these websites, it detects links on each page and adds them to its list of pages to crawl. New sites, changes to existing sites, and dead links are noted and used to update the Google index. Google doesn't accept payment to crawl a site more frequently, and it keeps the search side of its business separate from its revenue-generating AdWords service.
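The crawl loop described above can be sketched in a few lines of Python. This is an illustrative breadth-first crawler, not Google's actual implementation: the `fetch` callable, the frontier queue, and the page limit are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag, resolved against the page URL."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))

def crawl(seed_urls, fetch, max_pages=100):
    """Breadth-first crawl: start from a seed list (e.g. URLs from a
    previous crawl plus sitemap entries) and follow links on each page."""
    frontier = deque(seed_urls)   # pages waiting to be crawled
    seen = set(seed_urls)
    pages = {}                    # url -> html brought back
    while frontier and len(pages) < max_pages:
        url = frontier.popleft()
        html = fetch(url)         # returns None for dead links
        if html is None:
            continue              # dead link: noted, skipped
        pages[url] = html
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:  # newly discovered URLs join the list
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return pages

# Demo on a tiny in-memory "web" (a real crawler would fetch over HTTP).
fake_web = {
    "http://a.example/": '<a href="/b">page b</a> <a href="http://c.example/">site c</a>',
    "http://a.example/b": "a page with no links",
    "http://c.example/": '<a href="http://a.example/">back to a</a>',
}
crawled = crawl(["http://a.example/"], fake_web.get)
print(sorted(crawled))
# ['http://a.example/', 'http://a.example/b', 'http://c.example/']
```

Passing the fetch function in as a parameter keeps the traversal logic separate from the network layer, which is also what makes the sketch easy to test offline.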
Google offers webmaster tools that give site owners granular choices about how Google crawls their site: they can provide detailed instructions about how to process pages on their sites, request a recrawl, or opt out of crawling altogether using a file called "robots.txt". Google never accepts payment to crawl a site more frequently; the same tools are provided to all websites to ensure the best possible results for users.
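As a sketch of how the robots.txt opt-out works in practice, Python's standard `urllib.robotparser` can evaluate the rules a site owner publishes. The robots.txt content and URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt: the site owner opts the /private/ directory
# out of crawling for all user agents.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# A well-behaved crawler checks the rules before fetching each URL.
print(parser.can_fetch("Googlebot", "http://example.com/public/page.html"))   # True
print(parser.can_fetch("Googlebot", "http://example.com/private/page.html"))  # False
```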
Finding information by crawling
The web is like an ever-growing library with billions of books and no central filing system. We use software known as web crawlers to discover publicly available webpages. Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers.
Googlebot processes each of the pages it crawls to compile a massive index of all the words it sees and their location on each page. In addition, Google processes information included in key content tags and attributes. Googlebot can process many, but not all, content types; for example, it cannot process the content of some rich media files or dynamic pages.
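An index of words and their locations can be sketched as a simple inverted index. The data structure and sample pages below are illustrative, not Google's actual format.

```python
from collections import defaultdict

def build_index(pages):
    """Build an inverted index: word -> {url: [positions]}.
    `pages` maps URLs to already-extracted page text."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for position, word in enumerate(text.lower().split()):
            index[word].setdefault(url, []).append(position)
    return index

pages = {
    "http://example.com/a": "Google crawls the web",
    "http://example.com/b": "the web is a library",
}
index = build_index(pages)
print(index["web"])
# {'http://example.com/a': [3], 'http://example.com/b': [1]}
```

Storing word positions, not just page membership, is what later allows phrase queries and proximity scoring.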
Organizing information by indexing
When crawlers find a webpage, Google's systems render the content of the page, just as a browser does. Google takes note of key signals, from keywords to website freshness, and keeps track of it all in the Search index.
The Google Search index contains hundreds of billions of webpages and is well over 100,000,000 gigabytes in size. It’s like the index in the back of a book — with an entry for every word seen on every web page we index. When we index a web page, we add it to the entries for all of the words it contains.
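Continuing the book-index analogy, answering a multi-word query then amounts to intersecting the entries for each word. A minimal sketch, assuming a toy index that maps each word to the pages (and word positions) where it appears:

```python
def lookup(index, query):
    """Return the pages that contain every word of the query, by
    intersecting the index entries for each word (a simplified AND search)."""
    postings = [set(index.get(word, {})) for word in query.lower().split()]
    return set.intersection(*postings) if postings else set()

# A toy index: word -> {page: [positions]}.
index = {
    "web":     {"page1": [3], "page2": [1]},
    "library": {"page2": [4]},
}
print(lookup(index, "web library"))  # {'page2'}
```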
With the Knowledge Graph, Google continues to go beyond keyword matching to better understand the people, places, and things you care about. To do this, Google organizes not only information about webpages but other types of information too. Today, Google Search can help you search text from millions of books from major libraries, find travel times from your local public transit agency, or navigate data from public sources like the World Bank.
Serving results
When a user enters a query, Google's machines search the index for matching pages and return the results that Google believes are the most relevant to the user. Relevancy is determined by over 200 factors, one of which is PageRank. PageRank is a measure of the importance of a page based on the incoming links from other pages: in simple terms, each link to a page on your site from another site adds to your site's PageRank. Not all links are equal, however. Google works hard to improve the user experience by identifying spam and other practices that negatively impact search results; the best types of links are those given based on the quality of your content.
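The link-counting idea behind PageRank can be sketched with the classic power-iteration method; the damping factor of 0.85 is the conventional choice from the original PageRank paper. Real ranking combines this with the other 200+ factors, so this is only an illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively compute PageRank: each page shares its score equally
    among the pages it links to. `links` maps page -> list of outgoing links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:  # an incoming link adds to the target's rank
                    new_rank[t] += share
        rank = new_rank
    return rank

# Both "a" and "b" link to "c", so "c" earns the highest rank.
links = {"a": ["c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))  # c
```

Note how "a" outranks "b" even though each has the same number of outgoing links: "a" receives a link from the important page "c", which is exactly the sense in which not all links are equal.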
For your site to rank well in search results pages, it's important to make sure that Google can crawl and index it correctly. By following Google's webmaster guidelines, you can improve the PageRank of your site.
Google's related searches, spelling suggestions, and Google Suggest features are designed to help users save time by displaying related terms, common misspellings, and popular queries. The keywords used by these features are automatically generated by Google's web crawlers and search algorithms. If a site ranks well for a keyword, it's because Google has algorithmically determined that its content is more relevant to the user's query.
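A spelling-suggestion feature of this kind can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib` against a hypothetical vocabulary of indexed keywords; this is far simpler than Google's query-log-driven approach.

```python
import difflib

# Hypothetical vocabulary of keywords gathered while crawling.
vocabulary = ["crawler", "index", "pagerank", "webmaster", "sitemap"]

def suggest(query_word, vocabulary):
    """Suggest the closest known keyword for a possible misspelling,
    or None when nothing in the vocabulary is similar enough."""
    matches = difflib.get_close_matches(query_word.lower(), vocabulary, n=1)
    return matches[0] if matches else None

print(suggest("pagernak", vocabulary))  # pagerank
```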
To see the indexed pages in your site, use the site: operator, like this: site:google.com. (Note: do not use a space between the operator and the URL.) You can perform the search on a whole domain or limit it to a certain subdomain or subdirectory, for example site:google.com/webmaster. To exclude pages from your search, use a minus sign before the operator. For example, the search
site:google.com -site:adwords.google.com gives you all the indexed pages on the google.com domain without the pages from adwords.google.com. The cache: operator shows you an archived copy of a page indexed by Google. For example, cache:google.com displays the last indexed version of the Google homepage, along with information about the date the cache was created. You can also view a plain-text version of the page. This is useful because it shows how Googlebot sees the page.