How to find spider crawling patterns through web logs

For webmasters, getting included in Baidu's index is one of the biggest concerns. Understanding how Baidu's spider crawls your site is something you must master if you want to improve how much of it gets indexed. Many websites today run on virtual hosting that provides access logs. The log sits in the logfiles folder under the site's root directory, as text files named by date. There are already plenty of guides on identifying spiders by their HTTP return codes, so I won't repeat that here. Many hosts, however, now provide logs in a format that log-analysis software cannot read; instead the records look like the following:

, 03:28:34, GET, /goods.php, , 200, 34696390

The first field, 03:28:34, is the access time.

The second, GET, is the request method.

/goods.php is the page being accessed.

The third is the source IP of the visit (elided in the sample above).

The fourth, 200, means the page was accessed successfully.

The fifth, 34696390, is the size of the record.
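The field breakdown above can be sketched as a tiny parser. This is a minimal sketch assuming the comma-separated field order described in this post (time, method, URL, IP, status, size); a real host's log layout may differ, and the IP in the example record is made up for illustration.

```python
def parse_log_line(line):
    """Split one comma-separated log record into named fields."""
    time, method, url, ip, status, size = [p.strip() for p in line.split(",")]
    return {"time": time, "method": method, "url": url,
            "ip": ip, "status": status, "size": int(size)}

# Hypothetical record in the field order described above (the IP is invented).
record = parse_log_line("03:28:34, GET, /goods.php, 202.108.1.2, 200, 34696390")
print(record["status"])  # "200" means the request succeeded
```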

That is the log format. But how do you analyze it? One look and your head spins: every site's log runs to more than 1 MB, and nobody can scan thousands of records without going dizzy.

Pay attention, here is a tip. After long-term observation, I found that the server IPs Baidu's spider comes from all fall within a single network segment: they all start with 202.108, that is, addresses of the form 202.108.x.x. This segment is located at the Beijing Netcom cable building and belongs to the national Internet backbone, and as of now it has not changed. So open your log and use Ctrl+F to search for IPs in this segment. If you find any, note the access times, and you can work out the pattern of when Baidu's spider visits your site. That is a real lever for timing your updates.
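The Ctrl+F tip above can also be done as a small script: scan the log lines for the 202.108.x.x segment the post associates with Baidu's spider and collect the visit times. The field order follows the earlier breakdown; the sample records and their IPs are assumptions for illustration, not taken from a real log.

```python
def baidu_visit_times(lines):
    """Return the access times of log records whose source IP is in 202.108.x.x."""
    times = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        # Field order assumed from the post: time, method, URL, IP, status, size.
        if len(parts) >= 4 and parts[3].startswith("202.108."):
            times.append(parts[0])  # first field is the access time
    return times

# Two made-up records: one from the 202.108 segment, one from elsewhere.
sample = [
    "03:28:34, GET, /goods.php, 202.108.5.7, 200, 34696",
    "03:29:01, GET, /index.php, 61.135.9.9, 200, 1204",
]
print(baidu_visit_times(sample))  # → ['03:28:34']
```

Run this over every day's log and the visit times that repeat are the spider's schedule, which tells you when to publish updates.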

Finally, criticism is welcome; this is entirely my own original experience. Thanks for reading.
