Client server API without a password - LiveJournal Client Discussions

Client server API without a password [Feb. 21st, 2004|12:29 pm]
lj_clients

[sui66iy]
[mood |quixotic]
[music |Phorever People-The Shamen-Boss Drum]

It occurred to me this morning that it would be amusing to quickly scan all my friends and pull up the music they've been listening to lately. The "obvious" way to do this is to parse the HTML on my friends page and pull the Music field. But that's brittle, and parsing HTML is generally not fun. So I thought I'd just use the client-server API (I've used the flat API in the past to make backups and build a text index of my journal).

This would work fine, if all I cared about were the music I've been listening to lately, since I know my password. But I want to scan other people's journals. However, it looks like I can't without sending a password. Is there any way to grab this kind of data? It doesn't seem to be in the RSS feed.

Since I'm not interested in accessing private data, I wonder why the API requires a password. Perhaps it's to prevent people from building scanners like this? ;-)
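For anyone following along, here is a minimal sketch of what a flat-interface getevents call and its reply look like when you do have the password. The field names (mode, selecttype, howmany, prop_N_name, current_music) are from my recollection of the flat protocol docs and should be treated as assumptions. The helper names are made up for illustration. Nothing here contacts the server.

```python
from urllib.parse import urlencode


def build_getevents_request(user, password, howmany=5):
    """Form-encode a getevents call (field names are assumptions, see above)."""
    return urlencode({
        "mode": "getevents",
        "user": user,
        "password": password,
        "selecttype": "lastn",
        "howmany": howmany,
    })


def parse_flat_response(text):
    """The flat reply is assumed to alternate key and value lines."""
    lines = text.splitlines()
    return dict(zip(lines[0::2], lines[1::2]))


def recent_music(fields):
    """Collect the values of every property named current_music."""
    count = int(fields.get("prop_count", 0))
    return [
        fields.get(f"prop_{i}_value", "")
        for i in range(1, count + 1)
        if fields.get(f"prop_{i}_name") == "current_music"
    ]
```

The request body would be POSTed to the flat interface URL. The catch described above still applies: the user and password fields have to belong to the journal being read.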

Comments:
From: evan
2004-02-21 02:25 pm (UTC)
you'd have to call getevents on each of your friends. that's expensive.
From: sui66iy
2004-02-21 05:15 pm (UTC)

Re:

True, though simplicity of code is more important to me than speed of execution for this application. However, it seems to be a moot point, because I can't call getevents on a friend of mine unless I know the friend's password. Or am I mistaken?

(Or are you trying to say that it's expensive for the server, and hence not supported unless the password is known?)
From: evan
2004-02-21 05:22 pm (UTC)

Re:

Expensive on the server, yeah. But I don't think that was the reason. (Many users would be upset to discover a client could pull all of their public entries, even though it'd be just as simple to download a bunch of webpages, and clients can do that right now. But many users don't really understand what's going on.)

A simpler way to do it would be to make a custom journal style that puts out your friends page in XML. (You can find them on google somewhere.) Then you can literally just read the metadata off of your friends page.
From: sprote
2004-02-22 06:35 pm (UTC)

Re:

Journalert does it by fetching your friends page using a custom journal style that generates XML. Then you just run it through an off-the-shelf parser. The only drawback is that only paid accounts are allowed to specify custom styles when fetching pages. Oh, also, it was a bit tricky to get and send the right LJ login cookie to be able to access friends-only posts...
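A rough sketch of the "run it through an off-the-shelf parser" step, using Python's standard xml.etree.ElementTree. The element names (friends, entry, user, music) are purely hypothetical: the real ones depend on whatever the custom style emits. Fetching the page and handling the login cookie are left out.

```python
import xml.etree.ElementTree as ET


def music_by_user(xml_text):
    """Return (user, music) pairs for entries that carry a music field.

    Assumes a hypothetical layout:
    <friends><entry><user>..</user><music>..</music></entry>...</friends>
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for entry in root.iter("entry"):
        user = entry.findtext("user", default="")
        music = entry.findtext("music")
        if music:  # skip entries with no music set
            pairs.append((user, music.strip()))
    return pairs
```

Adapting it to a real style should only mean changing the tag names passed to iter and findtext.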
From: shmuelisms
2004-03-03 11:33 am (UTC)

Care To Share?

I'm just getting started on this topic. I want to create a download-and-index tool for an LJ (I am considering making it an online tool, written in PHP). I see you have already done something like this, so would you mind sharing or giving me some pointers?
From: sui66iy
2004-03-03 12:10 pm (UTC)

Re: Care To Share?

Being the lazy guy I am, I used the swish-e full-text index engine to do all the hard work. The remaining chore was just a little scripting to get the data into and out of swish-e. My scripts are all in Python, so hopefully you can read that :)

Needless to say, this is only about an hour of work, but you're welcome to it. There's a tarball here. Included are:

ljlib.py # reading data from LJ and formatting it for use elsewhere
ljswish.py # a wrapper script that outputs text formatted for swish-e
search.py # a CGI script that uses swish-e to search the index and return results
swish/lj.config # a swish-e config file that sets things up so you can index using ljswish.py
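To give a feel for what "text formatted for swish-e" can mean, here is a small sketch. It is not the code from the tarball. It assumes swish-e's external-program input method, where each document is preceded by Path-Name and Content-Length headers and a blank line. Those header names are from my memory of the swish-e docs, so check them before relying on this. The function name and the path scheme are invented for the example.

```python
def swish_prog_record(path, body):
    """Wrap one journal entry as a single record for swish-e's prog input.

    Content-Length is counted in bytes, so the body is encoded first.
    """
    data = body.encode("utf-8")
    header = f"Path-Name: {path}\nContent-Length: {len(data)}\n\n"
    return header.encode("utf-8") + data
```

A wrapper script would write one such record per entry to standard output (for example via sys.stdout.buffer.write) and let swish-e read the stream.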

There's no documentation. There might be bugs. This code is totally insecure, and will expose your username and password on the internet (consider modifying it to use SSL or the htpassword stuff). The search CGI only returns the top 25 matches. No warranty, your mileage may vary, I am not responsible if it kills your family and eats your cat, etc.

Let me know if you have any questions. (I suggest reading the swish-e docs, of course. They're pretty straightforward.)

The biggest issue with swish-e as a tool is that it doesn't do incremental indexing, so keeping the index totally up-to-date is a challenge. Personally, I don't care, since I'm just searching my own stuff, and if it's more recent than a few days I probably don't need a text search engine to find it. So I just re-index everything periodically and leave it at that.

You might also look at Feedster, which supports searching LJ like so. Of course, it can't see your private entries...
From: shmuelisms
2004-03-04 11:08 am (UTC)

Thank You

I was actually thinking of creating the "indexer" myself, as I want it customized in various ways, some specific to LJ. But I'll give it a look, and they have interesting resources as well. I will update this community if I get something done with this.

I didn't think it was much work, but web programming is not my forte, so any pointers are better than none. I'm a Real-Time Geek.
From: sui66iy
2004-03-04 12:02 pm (UTC)

Re: Thank You

swish-e is open source, so you might consider using it as a starting point. There are bunches of other open source text indexers out there, though. Half an hour with Google is your friend ;-)