Client server API without a password - LiveJournal Client Discussions
LiveJournal Client Discussions

Client server API without a password [Feb. 21st, 2004|12:29 pm]
lj_clients

[sui66iy]
[mood |quixotic]
[music |Phorever People-The Shamen-Boss Drum]

It occurred to me this morning that it would be amusing to quickly scan all my friends and pull up the music they've been listening to lately. The "obvious" way to do this is to parse the HTML on my friends page and pull the Music field. But that's brittle, and parsing HTML is generally not fun. So I thought I'd just use the client-server API (I've used the flat API in the past to make backups and build a text index of my journal).

This would work fine if all I cared about were the music I've been listening to lately, since I know my own password. But I want to scan other people's journals, and it looks like I can't do that without sending a password. Is there any way to grab this kind of data? It doesn't seem to be in the RSS feed.

Since I'm not interested in accessing private data, I wonder why the API requires a password. Perhaps it's to prevent people from building scanners like this? ;-)
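
For the own-journal case described above, here is a rough sketch of pulling the music field out of a flat-protocol response once you have one. It assumes the response is alternating key and value lines and that music shows up as a prop named current_music under prop_N_* keys; the sample data and the function names are made up for illustration, and no request is actually sent.

```python
def parse_flat_response(text):
    """Turn a flat-style response (alternating key and value lines) into a dict."""
    lines = text.splitlines()
    # Pair each key line with the value line that follows it.
    return dict(zip(lines[0::2], lines[1::2]))


def music_by_item(response):
    """Map itemid -> music string, assuming the prop is named current_music."""
    music = {}
    count = int(response.get("prop_count", 0))
    for n in range(1, count + 1):
        if response.get(f"prop_{n}_name") == "current_music":
            music[response[f"prop_{n}_itemid"]] = response[f"prop_{n}_value"]
    return music


# Made-up sample response, not captured from a real server.
sample = "\n".join([
    "success", "OK",
    "prop_count", "2",
    "prop_1_itemid", "101",
    "prop_1_name", "current_music",
    "prop_1_value", "The Shamen - Phorever People",
    "prop_2_itemid", "101",
    "prop_2_name", "current_mood",
    "prop_2_value", "quixotic",
])

result = music_by_item(parse_flat_response(sample))
print(result)
```

Keying the result by itemid keeps each music string tied to the entry it came from, which is handy if the same response also carries the entry text.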

Comments:
From: sui66iy
2004-03-03 03:10 pm (UTC)

Re: Care To Share?

Being the lazy guy I am, I used the swish-e full-text index engine to do all the hard work. The remaining chore was just a little scripting to get the data into and out of swish-e. My scripts are all in Python, so hopefully you can read that :)

Needless to say, this is only about an hour of work, but you're welcome to it. There's a tarball here. Included are:

ljlib.py # reading data from LJ and formatting it for use elsewhere
ljswish.py # a wrapper script that outputs text formatted for swish-e
search.py # a CGI script that uses swish-e to search the index and return results
swish/lj.config # a swish-e config file that sets things up so you can index using ljswish.py
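
The tarball's contents aren't reproduced here, so the following is only a guess at the kind of thing a wrapper like ljswish.py has to do: emit each entry with the headers that swish-e's "prog" input method reads (Path-Name and Content-Length, optionally Last-Mtime), followed by a blank line and the body. The function name, URL, and entry text are hypothetical.

```python
def swish_document(path_name, text, last_mtime=None):
    """Wrap one entry in the headers swish-e's prog input method reads."""
    body = text.encode("utf-8")
    # Content-Length is the size of the body in bytes, not characters.
    headers = [f"Path-Name: {path_name}", f"Content-Length: {len(body)}"]
    if last_mtime is not None:
        headers.append(f"Last-Mtime: {int(last_mtime)}")
    # A blank line separates the headers from the document body.
    return ("\n".join(headers) + "\n\n").encode("utf-8") + body


# Hypothetical entry; a real script would loop over entries fetched from LJ.
doc = swish_document("http://example.com/journal/1.html", "First entry text\n")
print(doc.decode("utf-8"), end="")
```

A real indexing script would write the bytes straight to standard output so swish-e can read them; the print here is just to show the shape of one document.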

There's no documentation. There might be bugs. This code is totally insecure, and will expose your username and password on the internet (consider modifying it to use SSL or the htpasswd stuff). The search CGI only returns the top 25 matches. No warranty, your mileage may vary, I am not responsible if it kills your family and eats your cat, etc.

Let me know if you have any questions. (I suggest reading the swish-e docs, of course. They're pretty straightforward.)

The biggest issue with swish-e as a tool is that it doesn't do incremental indexing, so keeping the index totally up-to-date is a challenge. Personally, I don't care, since I'm just searching my own stuff, and if it's more recent than a few days I probably don't need a text search engine to find it. So I just re-index everything periodically and leave it at that.

You might also look at Feedster, which supports searching LJ like so. Of course, it can't see your private entries...
From: shmuelisms
2004-03-04 02:08 pm (UTC)

Thank You

I was actually thinking of creating the "indexer" myself, as I want it customized in various ways, some specific to LJ. But I'll give it a look, and they have interesting resources as well. I will update this community if I get something done with this.

I didn't think it was much work, but web programming is not my forte, so any pointers are better than none. I'm a Real-Time Geek.
From: sui66iy
2004-03-04 03:02 pm (UTC)

Re: Thank You

swish-e is open source, so you might consider using it as a starting point. There are bunches of other open source text indexers out there, though. Half an hour with Google is your friend ;-)