Again, yesterday and today we got hits in our server’s log from what looks like a robot at IP address 209.237.230.104:
209.237.230.104 - - [03/Jan/2006:18:44:19 +0900] "GET /~lenglet HTTP/1.0" 301 324 -
209.237.230.104 - - [03/Jan/2006:18:44:20 +0900] "GET /~lenglet/ HTTP/1.0" 200 31782 -
209.237.230.104 - - [03/Jan/2006:18:44:20 +0900] "GET /atom.xml HTTP/1.0" 404 282 -
209.237.230.104 - - [03/Jan/2006:18:44:21 +0900] "GET /rss.xml HTTP/1.0" 404 281 -
209.237.230.104 - - [03/Jan/2006:18:44:21 +0900] "GET /index.xml HTTP/1.0" 404 283 -
209.237.230.104 - - [04/Jan/2006:16:47:22 +0900] "GET /~lenglet HTTP/1.0" 301 324 -
209.237.230.104 - - [04/Jan/2006:16:47:22 +0900] "GET /~lenglet/ HTTP/1.0" 200 31782 -
209.237.230.104 - - [04/Jan/2006:16:47:23 +0900] "GET /atom.xml HTTP/1.0" 404 282 -
209.237.230.104 - - [04/Jan/2006:16:47:23 +0900] "GET /rss.xml HTTP/1.0" 404 281 -
209.237.230.104 - - [04/Jan/2006:16:47:23 +0900] "GET /index.xml HTTP/1.0" 404 283 -
This looks exactly like the hits I recently got from Feedster’s crappy robot which was looking for RSS feeds from my web page. I had to send an email to Feedster, which they quickly responded to, and they soon stopped hitting our server.
Are they back with a vengeance? Or have they sold the code of their buggy robot to someone else? Anyway, 209.237.230.104 is not one of their addresses: it belongs to United Layer, an ISP which is probably hosting the robot that generated the hits I observed.
I have added yet another entry to my Apache .htaccess configuration file to deny any access from 209.237.230.104… When will these people learn to respect standards, including the Robots Exclusion Standard?!
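For anyone wanting to do the same, an entry of this kind might look roughly like the sketch below. It is only an illustration, not a copy of my actual file: it assumes the classic `Order` / `Allow` / `Deny` access-control directives of Apache 1.3, 2.0 and 2.2, and that the server configuration lets .htaccess files override access rules (`AllowOverride Limit`).

```apache
# Illustrative .htaccess fragment (assumed layout, not the real file).
# With "Order Allow,Deny", the Allow rules are evaluated first and the
# Deny rules second, so a client matching both is refused.
Order Allow,Deny
Allow from all

# Block the robot's address; add one "Deny from" line per offender.
Deny from 209.237.230.104

# On Apache 2.4 and later, the rough equivalent uses Require instead:
#   <RequireAll>
#     Require all granted
#     Require not ip 209.237.230.104
#   </RequireAll>
```

Blocked requests then get a 403 response instead of being served, which at least keeps the robot away from the page itself.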