White-Hat SEO: The Basic Principles and Process of Search Engines

10 Sep 2017 admin

What is the most important thing for a search engine? Some would say the accuracy of its search results; others, the richness of those results. But neither is the most critical point. For a search engine, the most critical thing is query time. Imagine that you query a keyword on Baidu and it takes five minutes to return your results: you would abandon Baidu very quickly.

To meet such demanding speed requirements (the query times of today's commercial search engines are measured in microseconds), a search engine serves queries from a cache. That is to say, the results you receive are not computed at the moment you search; they are pre-computed results already held in a server-side cache.
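To make the caching idea concrete, here is a minimal sketch of an in-memory LRU query cache in Python. Everything in it (the class name, the capacity, the example queries) is illustrative, not taken from any real engine; production engines layer distributed cache tiers and invalidate them as the index updates.

```python
from collections import OrderedDict
from typing import List, Optional

class QueryCache:
    """A tiny LRU cache mapping a query string to pre-computed results."""

    def __init__(self, capacity: int = 10_000) -> None:
        self.capacity = capacity
        self._store: "OrderedDict[str, List[str]]" = OrderedDict()

    def get(self, query: str) -> Optional[List[str]]:
        """Return cached results, or None on a cache miss."""
        if query not in self._store:
            return None
        self._store.move_to_end(query)        # mark as most recently used
        return self._store[query]

    def put(self, query: str, results: List[str]) -> None:
        """Store results, evicting the least recently used entry if full."""
        self._store[query] = results
        self._store.move_to_end(query)
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)   # drop least recently used

cache = QueryCache(capacity=2)
cache.put("seo basics", ["example.com/a", "example.com/b"])
print(cache.get("seo basics"))    # hit: served straight from memory
print(cache.get("unseen query"))  # miss: engine would compute, then put()
```

The point of the sketch is only that a cache hit avoids recomputing the result, which is what keeps query time predictable.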

So what does the general process of a search engine look like? It can be understood as three stages. Here we only explain and review that workflow in outline; some of the detailed technical specifics will be covered separately in other articles.

Page gathering

Page gathering is what we usually call spiders crawling the web. The pages a spider (Google calls its crawler Googlebot) is interested in fall into three categories, each handled as in the sketch after this list:

1. New pages that the spider has never crawled.
2. Pages the spider has crawled before, but whose content has since changed.
3. Pages the spider has crawled before, but which have since been deleted.
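As a hedged illustration of those three categories, here is a small Python sketch that sorts a fetched URL by comparing it against a toy index of content hashes. The `classify` helper, the index layout, and the example URLs are all hypothetical; real crawlers track far more state than a single hash.

```python
import hashlib
from typing import Dict, Optional

def classify(url: str, index: Dict[str, str], status: int,
             body: Optional[bytes]) -> str:
    """Sort one fetched URL into 'new', 'changed', 'deleted', or 'unchanged'."""
    if url not in index:
        return "new"                          # 1. never crawled before
    if status == 404 or body is None:
        return "deleted"                      # 3. crawled before, now gone
    if hashlib.sha256(body).hexdigest() != index[url]:
        return "changed"                      # 2. content differs from last crawl
    return "unchanged"

# A toy index: URL -> hash of the content seen on the last crawl.
index = {"http://example.com/old": hashlib.sha256(b"v1").hexdigest()}
print(classify("http://example.com/new-page", index, 200, b"hello"))  # new
print(classify("http://example.com/old", index, 200, b"v2"))          # changed
print(classify("http://example.com/old", index, 404, None))           # deleted
```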

How to discover and crawl these three kinds of pages effectively is the original intent and purpose of spider program design. This leads to a related question: where does a spider start crawling?

Any webmaster whose site has not been severely penalized can look through the site's server logs and find diligent spiders visiting. But have you ever considered, from a programming point of view, how the spider does it? Opinions on this differ. One view holds that spiders start from seed sites (or high-weight sites) and crawl layer by layer, from high weight to low. Another view holds that the URLs in the spider's crawl collection follow no obvious order; instead, the search engine calculates, from your site's content-update pattern, the best time to crawl your site, and grabs it then. Both views are contrasted in the sketch below.
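Here is a hedged Python sketch of both scheduling views: a weight-ordered frontier for the seed-site view, and an update-time-ordered frontier for the second view. The weights, timestamps, and site names are invented for illustration.

```python
import heapq
from typing import List, Tuple

def seed_first(seeds: List[Tuple[float, str]]) -> List[str]:
    """View 1: crawl outward from seeds, highest weight first."""
    heap = [(-weight, url) for weight, url in seeds]  # negate for a max-heap
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

def update_driven(sites: List[Tuple[float, str]]) -> List[str]:
    """View 2: crawl whichever site is predicted to update next."""
    heap = list(sites)                        # (predicted_update_ts, url)
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order

print(seed_first([(0.2, "blog.example"), (0.9, "hub.example")]))
# ['hub.example', 'blog.example'] -- weight decides the order
print(update_driven([(1700001000.0, "blog.example"),
                     (1700000000.0, "news.example")]))
# ['news.example', 'blog.example'] -- predicted update time decides
```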

Different search engines will differ in their crawl starting point; for Baidu, Mr. Zhao leans toward the latter view. The article "A method of index-page link completion mechanism" published on Baidu's official blog (address: stblog.baidu-tech.com/?p=2057) clearly points out that the spider will try to detect a page's release cycle and "check" it at a reasonable frequency, from which we can infer that in Baidu's index library…
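The "detect the release cycle, check at a reasonable frequency" behaviour that post describes can be approximated with a simple adaptive revisit interval: check sooner when the page changed since the last visit, back off when it did not. This is only a sketch of the idea; the bounds and the halving/doubling factors are assumptions, not Baidu's actual algorithm.

```python
MIN_INTERVAL_H = 1.0       # assumed floor: never poll more than hourly
MAX_INTERVAL_H = 24 * 30   # assumed ceiling: never wait more than ~a month

def next_interval(current_h: float, changed: bool) -> float:
    """Return the next revisit interval in hours (illustrative rule only)."""
    if changed:
        return max(MIN_INTERVAL_H, current_h / 2)  # page is lively: check sooner
    return min(MAX_INTERVAL_H, current_h * 2)      # page is quiet: back off

interval = 24.0
for changed in [True, True, False, False, False]:
    interval = next_interval(interval, changed)
    print(f"changed={changed} -> revisit in {interval:g} h")
```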
