Being the lazy guy I am, I used the swish-e full-text index engine to do all the hard work. The remaining chore was just a little scripting to get the data into and out of swish-e. My scripts are all in Python, so hopefully you can read that :)
Needless to say, this is only about an hour of work, but you're welcome to it. There's a tarball here. Included are:
ljlib.py # reading data from LJ and formatting it for use elsewhere
ljswish.py # a wrapper script that outputs text formatted for swish-e
search.py # a CGI script that uses swish-e to search the index and return results
swish/lj.config # a swish-e config file that sets things up so you can index using ljswish.py
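For a rough idea of what "formatted for swish-e" means: swish-e's "prog" input method (the -S prog option) reads documents from an external program's output, each one preceded by a few headers and a blank line. The following is a minimal sketch of that format in Python 3, not the code from the tarball; the function name format_for_swish and the example URL are made up for illustration.

```python
def format_for_swish(path, body, mtime=None):
    """Wrap one entry in the headers swish-e's "prog" input method expects.

    Hypothetical helper for illustration; not taken from ljswish.py.
    """
    data = body.encode("utf-8")  # Content-Length counts bytes, not characters
    headers = [
        "Path-Name: %s" % path,
        "Content-Length: %d" % len(data),
    ]
    if mtime is not None:
        # Optional: last-modified time as seconds since the epoch
        headers.append("Last-Mtime: %d" % int(mtime))
    # A blank line separates the headers from the document body
    head = ("\n".join(headers) + "\n\n").encode("utf-8")
    return head + data
```

A wrapper script would write one such record per journal entry to standard output, e.g. sys.stdout.buffer.write(format_for_swish(url, text)), and swish-e would read them when run with -S prog.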
There's no documentation. There might be bugs. This code is totally insecure, and will expose your username and password on the internet (consider modifying it to use SSL or the htpasswd stuff). The search CGI only returns the top 25 matches. No warranty, your mileage may vary, I am not responsible if it kills your family and eats your cat, etc.
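If you want to poke at the 25-match limit, here is a minimal sketch of how a script can call swish-e and read back the results. It is not the code in search.py. It assumes swish-e is on your PATH, that the index lives at swish/lj.index (a made-up path), and that swish-e is using its default result format of rank, path, quoted title, and size. The -w, -f, and -m options are swish-e's query, index file, and max-results switches.

```python
import re
import subprocess

# Default swish-e result line: rank path "title" size
RESULT = re.compile(r'^(\d+) (.+?) "(.*)" (\d+)$')

def parse_results(output):
    """Turn swish-e's text output into (rank, path, title, size) tuples."""
    hits = []
    for line in output.splitlines():
        # Header lines start with "#", a lone "." ends the results, and
        # "err: ..." lines report problems such as no matches; none of
        # these match the result pattern, so they are skipped.
        m = RESULT.match(line)
        if m:
            rank, path, title, size = m.groups()
            hits.append((int(rank), path, title, int(size)))
    return hits

def search(query, index="swish/lj.index", limit=25):
    """Run swish-e and return at most `limit` hits (index path is assumed)."""
    # Passing a list (no shell) keeps the query from being shell-interpreted
    cmd = ["swish-e", "-w", query, "-f", index, "-m", str(limit)]
    out = subprocess.run(cmd, capture_output=True, text=True).stdout
    return parse_results(out)
```

Calling search("cat", limit=100) would then ask for up to 100 matches instead of 25.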
Let me know if you have any questions. (I suggest reading the swish-e docs, of course. They're pretty straightforward.)
The biggest issue with swish-e as a tool is that it doesn't do incremental indexing, so keeping the index totally up-to-date is a challenge. Personally, I don't care, since I'm just searching my own stuff, and if it's more recent than a few days I probably don't need a text search engine to find it. So I just re-index everything periodically and leave it at that.
You might also look at Feedster, which supports searching LJ like so. Of course, it can't see your private entries...