September 18, 2008

Googlebot

Look at this crap:

(In order to deuglify the front page, I'm putting this below the fold.)

- -> /Chizumatic/cref.shtml/Chizumatic/else/reviews/else/nitpicks/nitpicks/tmw/KaleidoStar.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/translation/else/nitpicks/tmw/tmw/reviews/NinjaNonsense.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/else/reviews/translation/reviews/nitpicks/else/Mangas.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/else/tmw/reviews/tmw/nitpicks/reviews/Grenadier.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/else/reviews/translation/nitpicks/nitpicks/reviews/Vandread.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/else/reviews/translation/nitpicks/nitpicks/tmw/ALDVD06.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/else/tmw/reviews/reviews/nitpicks/tmw/DefenseOfSae.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/translation/nitpicks/tmw/reviews/translation/reviews/Colorful.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/translation/nitpicks/reviews/else/translation/tmw/Linux.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/else/reviews/translation/translation/nitpicks/tmw/Mahoromatic.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/tmw/reviews/nitpicks/tmw/translation/reviews/MiyukiChan.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/tmw/translation/translation/reviews/else/reviews/HappyLesson.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/translation/else/tmw/nitpicks/translation/reviews/Magikano.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/reviews/else/tmw/translation/tmw/reviews/Colorful.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/translation/else/reviews/reviews/nitpicks/else/quicktakes.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/tmw/nitpicks/reviews/translation/reviews/TwelveKingdoms1.shtml 66.249.70.167
- -> /Chizumatic/cref.shtml/Chizumatic/translation/nitpicks/reviews/else/translation/reviews/Shana.shtml 66.249.70.167

Damned if I know how the Googlebot got onto that kind of URL on my server, but the result is that it's stuck in a recursion loop. The way my server is set up, every one of those requests returns the "cref.shtml" file, which contains all sorts of links. The Googlebot has been trying to recursively crawl this bogus substructure for three hours now; I finally got fed up and blocked it at my firewall.
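One plausible mechanism for the ever-deepening paths in the log, offered as a sketch and not a diagnosis: if cref.shtml uses relative links, and the crawler once fetches it under a URL with something after the file name, each link resolves beneath the bogus path and the next copy of the page pushes it one level deeper. The host name and the two sample links below are assumptions made for illustration; only the path shapes come from the log.

```python
from urllib.parse import urljoin

# Hypothetical host; the path mirrors the log lines above.
BASE = "http://example.com/Chizumatic/cref.shtml/"

# Assumed relative links inside cref.shtml (illustrative only).
RELATIVE_LINKS = ["reviews/Vandread.shtml", "nitpicks/KaleidoStar.shtml"]


def crawl_chain(start, links, depth):
    """Follow one link per hop, resolving each link against the URL
    of the page it was found on, the way a crawler would."""
    url = start
    chain = [url]
    for hop in range(depth):
        # Every hop returns the same page, so the same relative links
        # are resolved again, one directory deeper than before.
        url = urljoin(url, links[hop % len(links)])
        chain.append(url)
    return chain


for url in crawl_chain(BASE, RELATIVE_LINKS, 3):
    print(url)
```

Resolved against the real URL (no trailing slash), the same links land in /Chizumatic/reviews/ and /Chizumatic/nitpicks/ as intended, which is why the page looks fine in a browser.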

In a couple of days I'll go in and unblock it again, and see if it is still obnoxious.

Posted by: Steven Den Beste in Site Stuff at 04:51 PM

1 Sounds like that page should be specifically called out in your /robots.txt file. I know some search engines only pay attention to the first matching entry in robots.txt, so if it's already there, that could be why it still gets searched.

Yahoo once tried to index, in parallel, every recipe in my cookbook search engine, which wrecked my machine for most of a day, and it was because they only honored the first block in robots.txt.

-j

Posted by: J Greely at September 18, 2008 05:36 PM (9Nz6c)

2 Except that I want Google to see "/Chizumatic/cref.shtml" and everything beneath it, at least everything that's really beneath it. That's the index of my reviews. The problem here is that they've somehow gotten onto the idea that "cref.shtml" is a directory name, not a readable file.

Apache on my machine is ignoring everything after the second term of that URL, and returning the cref file every time.

Posted by: Steven Den Beste at September 18, 2008 05:54 PM (+rSRq)
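The behavior described in comment 2 resembles what web servers call trailing path info: the server walks the URL until it reaches a real file and treats the remainder as extra data for that file. The function below is a simplified model of that idea, not Apache's actual code; `split_path_info` and the `real_files` set are hypothetical names invented for this sketch. Whether a given server accepts trailing path info depends on its version and configuration (Apache 2 has an AcceptPathInfo directive for this).

```python
def split_path_info(url_path, real_files):
    """Split a URL path into (file, path_info).

    Walks the path left to right and stops at the first prefix that
    names a real file; everything after it is trailing path info.
    Returns (None, url_path) when no prefix matches a file.
    """
    segments = url_path.strip("/").split("/")
    for end in range(1, len(segments) + 1):
        candidate = "/" + "/".join(segments[:end])
        if candidate in real_files:
            rest = segments[end:]
            path_info = "/" + "/".join(rest) if rest else ""
            return candidate, path_info
    return None, url_path


# Assumed document tree: the only file we model is the review index.
FILES = {"/Chizumatic/cref.shtml"}

print(split_path_info(
    "/Chizumatic/cref.shtml/else/reviews/Vandread.shtml", FILES))
```

Under this model every URL in the log maps to the same file, so the server answers each one with the review index and never returns a 404 that would stop the crawl.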

3 I believe you'll get the right results if you include the trailing slash, e.g. "/Chizumatic/cref.shtml/", but you might want to test it with Google's webmaster tools.

(side note: the "Allow: /" in your robots.txt seems odd; Google reads Allow before Disallow, so it should be a no-op, but the behavior doesn't seem to be defined)

-j

Posted by: J Greely at September 18, 2008 09:24 PM (2XtN5)
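The trailing-slash suggestion in comment 3 can be checked offline before touching the live robots.txt. This is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content and the host name are assumptions, and Python's parser is only a stand-in, since Google's own matching rules may differ in edge cases (which is why the comment recommends Google's webmaster tools for the real test).

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: block only what sits "beneath" cref.shtml.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Chizumatic/cref.shtml/
"""


def allowed(path, agent="Googlebot"):
    """Report whether `agent` may fetch `path` under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host; only the path matters here.
    return parser.can_fetch(agent, "http://example.com" + path)


# The real index should stay fetchable; the bogus tree should not.
print(allowed("/Chizumatic/cref.shtml"))
print(allowed("/Chizumatic/cref.shtml/else/reviews/Vandread.shtml"))
```

With the trailing slash in the rule, the index itself stays crawlable while every URL that treats it as a directory is refused.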
