<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf:RDF [
<!ENTITY % HTMLlat1 PUBLIC
 "-//W3C//ENTITIES Latin 1 for XHTML//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent">
]>
<rdf:RDF
 xmlns="http://purl.org/rss/1.0/"
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:content="http://purl.org/rss/1.0/modules/content/"
 xmlns:admin="http://webns.net/mvcb/"
>
<channel rdf:about="http://www.berabera.info/oldblog/lenglet">
<title>Web</title>
<link>http://www.berabera.info/oldblog/lenglet/archives/cat_1/index.html</link>
<description>technical articles, scientific publications, personal stuff, etc.</description>
<dc:language>en-us</dc:language>
<dc:creator>Romain Lenglet</dc:creator>
<dc:date>2007-05-09T13:32:14+09:00</dc:date>
<admin:generatorAgent rdf:resource="http://nanoblogger.sourceforge.net" />
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2007/05/index.html#e2007-05-09T11_47_54.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2006/04/index.html#e2006-04-11T18_25_12.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2006/01/index.html#e2006-01-04T17_42_13.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-12T17_10_41.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_57_18.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_18_58.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-02T19_05_29.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/11/index.html#e2005-11-07T12_32_33.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/05/index.html#e2005-05-19T12_37_51.txt" />
<rdf:li rdf:resource="http://www.berabera.info/oldblog/lenglet/archives/2005/05/index.html#e2005-05-17T19_37_36.txt" />
</rdf:Seq>
</items>
</channel>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2007/05/index.html#e2007-05-09T11_47_54.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2007/05/index.html#e2007-05-09T11_47_54.txt</link>
<title>New blog</title>
<dc:date>2007-05-09T11:47:54+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
Since I changed jobs in December 2006, I have finally moved this blog to my new personal server, but only for archival purposes.
I will no longer update this blog.
Please visit my new website and blog at <a href="http://www.berabera.info/">http://www.berabera.info/</a>.
I hope that you will enjoy the new design as much as I do!
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2006/04/index.html#e2006-04-11T18_25_12.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2006/04/index.html#e2006-04-11T18_25_12.txt</link>
<title>Collection of reusable web designs</title>
<dc:date>2006-04-11T18:25:12+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
I just found the <a href="http://www.oswd.org/">Open Source Web Design (OSWD) web site</a>, which offers a collection of reusable web designs contributed by many authors.
They currently have 1620 web designs, and all those I have seen are very nice!
</p>
<p>
My only concern is about licensing.
<a href="http://www.oswd.org/site/usage/">As stated on the web site</a>, the choice of a license is left to each author.
Some designs may therefore not be <a href="http://www.gnu.org/philosophy/philosophy.html">free “as in freedom”</a> or <a href="http://www.opensource.org/docs/definition.php">open source</a>, although all must be free of charge (free “as in beer”).
So, be careful when choosing a design!
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2006/01/index.html#e2006-01-04T17_42_13.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2006/01/index.html#e2006-01-04T17_42_13.txt</link>
<title>Here is a feed crawler again...</title>
<dc:date>2006-01-04T17:42:13+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
Again, yesterday and today we got hits in our server's log from what looks like a robot with IP address <code>209.237.230.104</code>:
</p>
<pre>
209.237.230.104 - - [03/Jan/2006:18:44:19 +0900] "GET /~lenglet HTTP/1.0" 301 324 -
209.237.230.104 - - [03/Jan/2006:18:44:20 +0900] "GET /~lenglet/ HTTP/1.0" 200 31782 -
209.237.230.104 - - [03/Jan/2006:18:44:20 +0900] "GET /atom.xml HTTP/1.0" 404 282 -
209.237.230.104 - - [03/Jan/2006:18:44:21 +0900] "GET /rss.xml HTTP/1.0" 404 281 -
209.237.230.104 - - [03/Jan/2006:18:44:21 +0900] "GET /index.xml HTTP/1.0" 404 283 -
209.237.230.104 - - [04/Jan/2006:16:47:22 +0900] "GET /~lenglet HTTP/1.0" 301 324 -
209.237.230.104 - - [04/Jan/2006:16:47:22 +0900] "GET /~lenglet/ HTTP/1.0" 200 31782 -
209.237.230.104 - - [04/Jan/2006:16:47:23 +0900] "GET /atom.xml HTTP/1.0" 404 282 -
209.237.230.104 - - [04/Jan/2006:16:47:23 +0900] "GET /rss.xml HTTP/1.0" 404 281 -
209.237.230.104 - - [04/Jan/2006:16:47:23 +0900] "GET /index.xml HTTP/1.0" 404 283 -
</pre>
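<p>
Hits like these can be pulled out of an Apache access log with a short shell function.
The sketch below assumes the common log format shown above, in which the client address is the first space-separated field, the requested path the seventh and the HTTP status code the ninth; the function name <code>hits_from</code> and the log file path in the example are only illustrative.
</p>
<pre>
#!/bin/sh
# hits_from IP LOGFILE: print "path status" for every request made by IP.
# Assumes Apache common log format (fields separated by spaces).
hits_from() {
    awk -v ip="$1" '$1 == ip { print $7, $9 }' "$2"
}

# Example (hypothetical log location):
# hits_from 209.237.230.104 /var/log/apache2/access.log
</pre>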
<p>
This looks exactly like <a href="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_18_58.txt">the hits I recently got from Feedster's crappy robot</a>, which was looking for <acronym title="Really Simple Syndication">RSS</acronym> feeds on my web page.
I had to send an email to Feedster, which they quickly responded to, and they soon stopped hitting our server.
</p>
<p>
Are they back with a vengeance?
Or have they sold the code of their buggy robot to someone else?
Anyway, <code>209.237.230.104</code> is not one of their addresses: it belongs to <a href="http://www.unitedlayer.com/">United Layer</a>, an ISP which is probably hosting the robot that generates the hits I observed.
</p>
<p>
I have added yet another entry in my Apache <code>.htaccess</code> configuration file to deny any access to <code>209.237.230.104</code>...
When will these people learn how to respect standards, including the <a href="http://www.robotstxt.org/">Robots Exclusion Standard</a>?!
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-12T17_10_41.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-12T17_10_41.txt</link>
<title>Ooops... typo</title>
<dc:date>2005-12-12T17:10:41+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
I just found out that the description of my web page (“technical articles, scientific publications, personnal stuff, etc.”) has had a typo since last May: “personnal” should be written “personal”.
What a shame... ^_^
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_57_18.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_57_18.txt</link>
<title>Registered to Feedster</title>
<dc:date>2005-12-08T15:57:18+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
Although <a href="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_18_58.txt">I have experienced problems with Feedster's robot</a>, I have decided to give <a href="http://www.feedster.com/">Feedster</a> a chance and have registered on their web site, even though I prefer a standalone news feed aggregator such as <a href="http://akregator.sourceforge.net/">KDE's Akregator</a> to a web-based aggregator such as Feedster.
</p>
<p>
<a href="http://feedster.com/claimfeed.php?key=0f1d23e7ead49b6f03a266d867602548">Here</a> is the special cryptic link that they asked me to put into my blog to be able to “claim” control of my weblog RSS feed in the Feedster search engine (do not click that link: it is not meant to be clicked).
</p>
<p>
By the way, the web interface for user registration is also buggy and unusable because <a href="http://feedster.com/register.php">the registration form</a>, when submitted, points to <a href="http://feedster.net/register.php">a nonexistent page</a>.
I guessed that only the host name of the latter URL was wrong, and should be <code>feedster.com</code> instead of <code>feedster.net</code>.
I have therefore simply configured my system to resolve the <code>feedster.net</code> host name to the same IP address as <code>feedster.com</code>, so that the form effectively points to <code>feedster.com</code>: with that, the registration form worked fine!
To make my web browser access <code>feedster.com</code> instead of <code>feedster.net</code>, I have simply added the following line to my <code>/etc/hosts</code> file in my Linux system (<code>64.95.116.10</code> is the IP address of <code>feedster.com</code>):
</p>
<pre>
64.95.116.10    feedster.net
</pre>
<p>
In addition, the forms appear twice on <a href="http://feedster.com/login.php">the login page</a>, for no apparent reason.
</p>
<p>
Those are strong signs of “web site rot”...
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_18_58.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-08T15_18_58.txt</link>
<title>Problems with Feedster's robot</title>
<dc:date>2005-12-08T15:18:58+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
Since yesterday, I have found many hits in our web server's logs (four hits every 30 minutes) from a robot at IP address <code>64.95.116.1</code>.
According to <code>whois(1)</code>, this address belongs to "Feedster".
This is how I have discovered the existence of the <a href="http://www.feedster.com/">Feedster</a> blog search engine...
</p>
<p>
<strong>Special note to the person who registered my web page with Feedster yesterday:</strong> this is nice of you, but you should rather have registered my real RSS feed URL (at <a href="http://www.berabera.info/oldblog/lenglet/rss.xml">http://www.csg.is.titech.ac.jp/~lenglet/rss.xml</a>) instead of my web page (at <a href="http://www.berabera.info/oldblog/lenglet/">http://www.csg.is.titech.ac.jp/~lenglet/</a>), because the result is a lot of hits to nonexistent URLs from the <em>dumb</em> Feedster robot, cf. this extract of our web server's logs:
</p>
<pre>
...
64.95.116.1 - - [08/Dec/2005:12:49:33 +0900] "GET /~lenglet HTTP/1.1" 301 336 -
64.95.116.1 - - [08/Dec/2005:12:49:33 +0900] "GET /~lenglet/ HTTP/1.1" 200 27650 -
64.95.116.1 - - [08/Dec/2005:12:49:34 +0900] "GET /atom.xml HTTP/1.1" 404 294 -
64.95.116.1 - - [08/Dec/2005:12:49:34 +0900] "GET /index.xml HTTP/1.1" 404 295 -
64.95.116.1 - - [08/Dec/2005:12:49:34 +0900] "GET /rss.xml HTTP/1.1" 404 293 -
64.95.116.1 - - [08/Dec/2005:13:17:48 +0900] "GET /~lenglet HTTP/1.1" 301 336 -
64.95.116.1 - - [08/Dec/2005:13:17:48 +0900] "GET /~lenglet/ HTTP/1.1" 200 27650 -
64.95.116.1 - - [08/Dec/2005:13:17:49 +0900] "GET /atom.xml HTTP/1.1" 403 298 -
64.95.116.1 - - [08/Dec/2005:13:17:49 +0900] "GET /index.xml HTTP/1.1" 403 299 -
64.95.116.1 - - [08/Dec/2005:13:17:49 +0900] "GET /rss.xml HTTP/1.1" 403 297 -
...
</pre>
<p>
So if you could correct the URL of my feed in your Feedster account, or do anything to stop those wrong accesses, it would be very nice, thanks.
</p>
<p>
Here is why I say above that Feedster's robot is <em>dumb</em>:
</p>
<ol>
<li>It does not respect the <a href="http://www.robotstxt.org/">Robots Exclusion Standard</a>, which asks web robots such as Feedster's to fetch a file named <code>robots.txt</code> from every web server they access, to check whether their accesses are welcome. Not only does Feedster's robot ignore this standard, which is <strong>disrespectful</strong>, but it also accesses feeds every 30 minutes, which I consider excessive.</li>
<li>It seems to misinterpret <code>&lt;link rel="alternate".../&gt;</code> elements in HTML page headers. For instance, <a href="http://www.berabera.info/oldblog/lenglet/">my XHTML web page</a>, which Feedster's robot has accessed every 30 minutes, contains such elements in its header, and they seem to be misinterpreted by the robot.
This leads to accesses to nonexistent URLs (as shown in the web logs above by the <code>404</code> and <code>403</code> HTTP status codes): it should have accessed <a href="http://www.berabera.info/oldblog/lenglet/atom.xml"><code>/~lenglet/atom.xml</code></a> instead of <code>/atom.xml</code>, etc.</li>
</ol>
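<p>
For reference, the Robots Exclusion Standard expects a plain-text file at the root of the host, i.e. at <code>/robots.txt</code> and not under <code>/~lenglet/</code>, so on a shared server it is typically maintained by the site administrator.
A robot that honors the standard fetches that file before anything else and skips the paths it disallows.
The fragment below is only an illustration: <code>ExampleBot</code> is a hypothetical robot name, since the name that Feedster's robot would match is not known here.
</p>
<pre>
# Ask one specific robot to stay away from the whole site
User-agent: ExampleBot
Disallow: /

# All other robots may access everything
User-agent: *
Disallow:
</pre>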
<p>
For information, here are the <code>&lt;link rel="alternate".../&gt;</code> elements in my web page headers:
</p>
<pre>
&lt;link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="./atom.xml"&gt;
&lt;link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="./rss.xml"&gt;
&lt;link rel="alternate" type="application/rss+xml" title="RSS 1.0" href="./index.xml"&gt;
</pre>
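<p>
One plausible explanation, judging from the logs above where each series starts with a <code>301</code> redirect from <code>/~lenglet</code> to <code>/~lenglet/</code>, is that the robot resolves these relative <code>href</code> values against the URL it requested before the redirect instead of the final one.
Relative resolution replaces everything after the last <code>/</code> of the base URL, so <code>./atom.xml</code> becomes <code>/~lenglet/atom.xml</code> against <code>/~lenglet/</code>, but <code>/atom.xml</code> against <code>/~lenglet</code>.
The shell function below is a minimal sketch of that rule: it handles only <code>./name</code> references, assumes the base URL has a path component, and is not a full RFC 3986 implementation; <code>resolve_dot_ref</code> is a hypothetical name.
</p>
<pre>
#!/bin/sh
# resolve_dot_ref BASE REF: resolve a "./name" reference against BASE by
# dropping everything after the last "/" of BASE, as relative URL
# resolution does for such simple references.
resolve_dot_ref() {
    base=$1
    ref=${2#./}            # "./atom.xml" -> "atom.xml"
    printf '%s/%s\n' "${base%/*}" "$ref"
}

resolve_dot_ref "http://www.csg.is.titech.ac.jp/~lenglet/" ./atom.xml
# prints http://www.csg.is.titech.ac.jp/~lenglet/atom.xml
resolve_dot_ref "http://www.csg.is.titech.ac.jp/~lenglet" ./atom.xml
# prints http://www.csg.is.titech.ac.jp/atom.xml
</pre>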
<p>
I am certain that the accesses by Feedster's robot to <code>/atom.xml</code>, <code>/index.xml</code> and <code>/rss.xml</code> are due to its interpretation of those <code>&lt;link rel="alternate".../&gt;</code> elements, because since I denied the robot any access to my web page, it no longer tries to access <code>/atom.xml</code>, <code>/index.xml</code> and <code>/rss.xml</code>.
Here are the lines that I have added to my root <code>.htaccess</code> file to specifically deny Feedster's robot access to my web page:
</p>
<pre>
&lt;Limit GET&gt;
order allow,deny
deny from 64.95.116.1
allow from all
&lt;/Limit&gt;
</pre>
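<p>
The <code>Order</code>, <code>Allow</code> and <code>Deny</code> directives above belong to Apache 2.2's access control.
On Apache 2.4 and later, where they are only kept for compatibility through <code>mod_access_compat</code>, an equivalent rule could be written with <code>mod_authz_core</code> as in the following sketch.
It assumes that the server's <code>AllowOverride</code> setting lets <code>.htaccess</code> files use <code>Require</code> (this needs <code>AuthConfig</code>), and unlike the <code>&lt;Limit GET&gt;</code> block it applies to all HTTP methods.
</p>
<pre>
&lt;RequireAll&gt;
    # Let everybody in...
    Require all granted
    # ...except the robot's address
    Require not ip 64.95.116.1
&lt;/RequireAll&gt;
</pre>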
<p>
Now, I still get accesses from Feedster's robot every 30 minutes, but those accesses are denied, and I get lines like these in our web server's logs:
</p>
<pre>
...
64.95.116.1 - - [08/Dec/2005:14:01:02 +0900] "GET /~lenglet HTTP/1.1" 403 298 -
64.95.116.1 - - [08/Dec/2005:14:20:14 +0900] "GET /~lenglet HTTP/1.1" 403 298 -
...
</pre>
<p>
Once they have corrected my feed's URL, I will probably re-enable access for that robot, but they should still fix their robot's implementation...
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-02T19_05_29.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/12/index.html#e2005-12-02T19_05_29.txt</link>
<title>Secrets of good hypertext</title>
<dc:date>2005-12-02T19:05:29+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
The <a href="http://www.artlebedev.com/">Art. Lebedev Studio</a> are the designers of the wonderful <a href="http://www.artlebedev.com/portfolio/optimus/">Optimus keyboard</a> and <a href="http://www.artlebedev.com/portfolio/mus/">Mus computer mouse</a>, among other things.
</p>
<p>
In addition, they publish many interesting in-depth <a href="http://www.artlebedev.com/mandership/">articles about design</a> on their web site, some of them about web design.
Article number 83, titled <a href="http://www.artlebedev.com/mandership/83/">“Secrets of good hypertext”</a>, is about how to put good hyperlinks in web pages.
</p>
<p>
Today, I have tried to polish the hyperlinks in my blog's articles according to this article, but this still requires some work.
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/11/index.html#e2005-11-07T12_32_33.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/11/index.html#e2005-11-07T12_32_33.txt</link>
<title>How to disable a plugin in NanoBlogger in Debian GNU/Linux</title>
<dc:date>2005-11-07T12:32:33+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web, Debian GNU/Linux</dc:subject>
<description><![CDATA[<p>
To disable a plugin in NanoBlogger version 3.2, one must rename the plugin script (e.g. <code>tidy.sh</code>) to change its extension, by convention to <code>.off</code>.
</p>
<p>
In the <code>nanoblogger</code> Debian GNU/Linux package, plugins are installed in the <code>/usr/share/nanoblogger/plugins/</code> directory.
In order to enable/disable plugins, it is therefore necessary to rename files in that directory, e.g. for the <code>tidy.sh</code> plugin:
</p>
<pre>
sudo mv /usr/share/nanoblogger/plugins/makepage/tidy.sh \
        /usr/share/nanoblogger/plugins/makepage/tidy.off
</pre>
<p>
The problem with that method is that when the <code>nanoblogger</code> package is updated, the <code>tidy.sh</code> file is reinstalled in <code>/usr/share/nanoblogger/plugins/makepage/</code> along with the <code>tidy.off</code> file, which re-enables the plugin.
To make the renaming permanent across package updates, one must use Debian's diversion mechanism:
</p>
<pre>
sudo dpkg-divert --divert /usr/share/nanoblogger/plugins/makepage/tidy.off \
    --rename --add /usr/share/nanoblogger/plugins/makepage/tidy.sh
</pre>
<p>
This renames the <code>tidy.sh</code> file to <code>tidy.off</code>, and in all subsequent updates of the package the new <code>tidy.sh</code> file will automatically be installed as <code>tidy.off</code> instead.
</p>
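<p>
When several plugins need to be disabled, the same command can be wrapped in a small shell function.
This is only a sketch: <code>disable_plugin</code> and <code>off_name</code> are hypothetical names, and the function assumes that the plugin script name ends in <code>.sh</code>.
</p>
<pre>
#!/bin/sh
# off_name PATH: "/dir/plugin.sh" -> "/dir/plugin.off"
off_name() {
    printf '%s\n' "${1%.sh}.off"
}

# disable_plugin PATH: permanently rename a packaged plugin through a
# dpkg diversion, with the same dpkg-divert options as above.
disable_plugin() {
    sudo dpkg-divert --divert "$(off_name "$1")" --rename --add "$1"
}

# Example:
# disable_plugin /usr/share/nanoblogger/plugins/makepage/tidy.sh
</pre>
<p>
The diversions currently in place can then be listed with <code>dpkg-divert --list '*nanoblogger*'</code>.
</p>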
<p>
To re-enable the plugin, simply remove the diversion:
</p>
<pre>
sudo dpkg-divert --divert /usr/share/nanoblogger/plugins/makepage/tidy.off \
    --rename --remove /usr/share/nanoblogger/plugins/makepage/tidy.sh
</pre>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/05/index.html#e2005-05-19T12_37_51.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/05/index.html#e2005-05-19T12_37_51.txt</link>
<title>Script for publishing BibTeX bibliographies as NanoBlogger articles</title>
<dc:date>2005-05-19T12:37:51+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
I have written <a href="http://www.berabera.info/oldblog/lenglet/nanoblogger/bibtex2article.sh">a script in Bourne Shell (<code>bibtex2article.sh</code>)</a>, available under the GNU GPL, to publish sets of bibliographic entries from BibTeX <code>.bib</code> files as NanoBlogger articles.
It generates only XHTML 1.1-compliant, UTF-8-encoded bibliography articles.
This script is based on <a href="http://www.lri.fr/~filliatr/bibtex2html/"><code>bibtex2html</code></a>, <a href="http://www.gnu.org/software/libiconv/"><code>iconv</code></a> and <code>sed</code>.
</p>
<p>
I use it for my own bibliography, cf.:
<a href="http://www.berabera.info/oldblog/lenglet/publications/publications.html">the summary</a>,
<a href="http://www.berabera.info/oldblog/lenglet/publications/publications_abstracts.html">the detailed view with abstracts</a>,
and <a href="http://www.berabera.info/oldblog/lenglet/publications/publications_bib.html">the pretty-printed BibTeX file</a>.
</p>]]></description>
</item>
<item rdf:about="http://www.berabera.info/oldblog/lenglet/archives/2005/05/index.html#e2005-05-17T19_37_36.txt">
<link>http://www.berabera.info/oldblog/lenglet/archives/2005/05/index.html#e2005-05-17T19_37_36.txt</link>
<title>Switching to NanoBlogger</title>
<dc:date>2005-05-17T19:37:36+09:00</dc:date>
<dc:creator>Romain Lenglet</dc:creator>
<dc:subject>Web</dc:subject>
<description><![CDATA[<p>
I have switched to <a href="http://nanoblogger.sourceforge.net/">NanoBlogger</a> for my web page.
It is a very lightweight content management system implemented in Bourne Shell.
</p>]]></description>
</item>
</rdf:RDF>

