August 30th, 2007

  • snej

if you want bots to cache, make the resources cacheable!

LiveJournal's bot policy page says:

"You are encouraged to cache the results of your bot's requests, which saves us bandwidth and CPU time. Bots making repeated requests on the same resource (URL) in a short amount of time will be blocked."

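However, the HTTP responses LiveJournal sends for FOAF data are, per the caching rules in RFC 2616, nearly uncacheable. They carry no Expires header and no max-age directive, so there's no explicit freshness lifetime, and no Last-Modified header, so a cache has nothing to base a heuristic lifetime on. There's no ETag either, so a client can't even revalidate cheaply with a conditional GET. There's sort of a cognitive disconnect going on here between the policy and the responses.

$ curl -I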
HTTP/1.0 200 OK
Date: Thu, 30 Aug 2007 22:27:28 GMT
Server: Apache
Cache-Control: private, proxy-revalidate
Vary: Accept-Encoding
Content-length: 45098
Keep-Alive: timeout=30, max=100
Connection: keep-alive
Content-Type: application/rdf+xml; charset=utf-8
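
To make the problem concrete, here is a rough Python sketch of how an HTTP-level cache might decide how long a response stays fresh, following the order RFC 2616 describes: an explicit max-age, then Expires minus Date, then a heuristic based on Last-Modified. The function names and the LJ_HEADERS dictionary are my own for illustration, and the 10% fraction is just the typical value the RFC suggests. Real frameworks differ in the details.

```python
from email.utils import parsedate_to_datetime

# Headers copied from the curl output above (hypothetical variable name).
LJ_HEADERS = {
    "Date": "Thu, 30 Aug 2007 22:27:28 GMT",
    "Server": "Apache",
    "Cache-Control": "private, proxy-revalidate",
    "Vary": "Accept-Encoding",
    "Content-Type": "application/rdf+xml; charset=utf-8",
}


def freshness_lifetime(headers, heuristic_fraction=0.1):
    """Return how many seconds a response may be reused, or None if unknown."""
    h = {name.lower(): value for name, value in headers.items()}

    # 1. An explicit max-age directive in Cache-Control takes priority.
    for directive in h.get("cache-control", "").split(","):
        name, _, value = directive.strip().partition("=")
        value = value.strip('"')
        if name.lower() == "max-age" and value.isdigit():
            return int(value)

    if "date" not in h:
        return None
    date = parsedate_to_datetime(h["date"])

    # 2. Otherwise use Expires minus Date.
    if "expires" in h:
        try:
            expires = parsedate_to_datetime(h["expires"])
        except (TypeError, ValueError):
            return 0  # an unparseable Expires counts as already expired
        return max(0, int((expires - date).total_seconds()))

    # 3. Otherwise guess: a fraction of the time since the last modification.
    if "last-modified" in h:
        modified = parsedate_to_datetime(h["last-modified"])
        return max(0, int((date - modified).total_seconds() * heuristic_fraction))

    # Nothing to go on: the cache cannot assign any lifetime.
    return None


def can_revalidate(headers):
    """True if the response carries a validator usable in a conditional GET."""
    names = {name.lower() for name in headers}
    return "etag" in names or "last-modified" in names


print(freshness_lifetime(LJ_HEADERS))  # None
print(can_revalidate(LJ_HEADERS))      # False
```

With the headers LiveJournal actually sends, every branch falls through: there is no lifetime to compute and no validator to revalidate with. Adding even a single Last-Modified header would give the heuristic branch something to work from.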

Clearly a custom application can cache the data however it wants. But it would be a lot more convenient if we could take advantage of HTTP-level caching support in web client frameworks. I've just spent much of the day struggling with such a framework, trying in vain to convince it to cache FOAF resources so I didn't have to re-invent the wheel. :-/