Best Practices Question - LiveJournal Client Discussions


Best Practices Question [Aug. 26th, 2004|11:05 am]

lj_clients

[royhuggins]
My project isn't actually a client but my question is probably well answered by people who work on clients. So...

I have a collaborative storytelling project that interacts with LJ. It does so at a relatively superficial level (using RSS feeds and screen-scraping) and I'm curious if it would be best to use the LJ client API for the functions it performs.

Function 1) The storytelling system supports linking to stories that are posted in the author's LJ. To avoid dead links, it will check the author's journal (via the RSS feed) to make sure that the story is properly posted in his/her journal.
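
A rough sketch of the Function 1 check, assuming the journal's feed is plain RSS 2.0 with un-namespaced `<item>` and `<link>` elements, and that a story's link in the feed matches its permalink exactly. The name `feed_contains_entry` is made up for illustration, and downloading the feed is left out so the logic can be tried on any XML string.

```python
import xml.etree.ElementTree as ET


def feed_contains_entry(rss_xml: str, permalink: str) -> bool:
    """Return True if any <item><link> in the RSS document equals permalink.

    Assumption: un-namespaced RSS 2.0 elements and an exact URL match.
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        link = item.findtext("link")
        if link is not None and link.strip() == permalink:
            return True
    return False
```

A False result only means the story is not in the feed right now. An older post may simply have dropped out of it, which is the case Function 2 covers.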

Function 2) When there are signs that a reader has had trouble finding an author's story in his/her LJ, the system will perform a follow-up check to make sure that the story is still posted in the author's journal. For this function, it screen-scrapes the page at the LJ entry's permalink URL (e.g. http://www.livejournal.com/~royhuggins/123456789.html). The screen-scrape is necessary because the post may be so old that it is no longer in the RSS feed. The guys on lj_dev stated pretty clearly that this use of screen-scraping is okay, but I'm wondering if there is a better way.
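
A minimal sketch of the permalink request behind Function 2, using only the Python standard library. It only reports the HTTP status. How LJ actually signals a deleted or locked entry is not stated in this thread, so a page that comes back with status 200 may still need its body inspected. The function name `permalink_status` and the `opener` parameter (there so the function can be exercised without touching the network) are illustrative assumptions.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional


def permalink_status(
    url: str,
    opener: Callable = urllib.request.urlopen,
    timeout: float = 10.0,
) -> Optional[int]:
    """Return the HTTP status for url, or None if the server was unreachable."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 503.
        return err.code
    except OSError:
        # URLError and timeouts are OSError subclasses: no answer at all.
        return None
```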

Question: I haven't studied the client API beyond the mentions people have made of it in this community and in lj_dev. Before diving into the docs, I would like to see if people who work with it think that it would be worth my time. Would my bot be able to use it to perform the above functions? Would it be better to use the API than the methods I described above?

Thanks very much!

Comments:
From: marksmith
2004-08-26 07:23 am (UTC)
That should be fine. I would avoid hitting entry pages more than once every day or two. You should cache the results for at least 24 hours, because that saves us having to load old entries repeatedly. :)
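
One way the 24-hour caching could look, as a sketch only. `EntryCheckCache` and its parameters are hypothetical names, the cache lives in memory, and a real bot would presumably need to persist it between runs. The `check` callable stands in for whatever performs the actual page request.

```python
import time
from typing import Callable, Dict, Tuple

CACHE_SECONDS = 24 * 60 * 60  # "at least 24 hours", as suggested above


class EntryCheckCache:
    """Remember the last check result per URL and reuse it for 24 hours."""

    def __init__(
        self,
        check: Callable[[str], object],
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._check = check  # does the real request (assumed to be expensive)
        self._clock = clock  # injectable so the expiry logic can be tested
        self._results: Dict[str, Tuple[float, object]] = {}

    def get(self, url: str) -> object:
        now = self._clock()
        cached = self._results.get(url)
        if cached is not None and now - cached[0] < CACHE_SECONDS:
            return cached[1]  # still fresh: do not hit the entry page again
        result = self._check(url)
        self._results[url] = (now, result)
        return result
```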

There is no way to determine whether an entry exists, though. It might make an interesting protocol mode: one that reports an entry's security, author, and number of comments, and therefore whether it exists or not. Hm.
From: royhuggins
2004-08-26 07:29 am (UTC)
"I would avoid hitting entry pages more than once every day or two. You should cache the results for at least 24 hours, because that saves us having to load old entries repeatedly. :)"

Great feedback, thanks. That shouldn't be a problem. I would hope that the bots, once built to a target level of robustness, would never need to run a follow-up check on a post more than once or twice in its whole life. I just need to build in some way of recognizing when there is an actual problem with an author's post and when LJ is just going through a period of low responsiveness. :)
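
A sketch of one way to tell the two cases apart, under loudly stated assumptions: a missing post shows up as HTTP 404, a healthy one as 200, and timeouts or 5xx responses say nothing about the post itself. None of that is confirmed for LJ in this thread. `classify_checks` and `MISSING_THRESHOLD` are hypothetical names, and the threshold of two fits the "once or twice" budget only if the checks are spaced at least a day apart.

```python
from typing import Iterable, Optional

MISSING_THRESHOLD = 2  # hypothetical: consecutive "not found" results required


def classify_checks(statuses: Iterable[Optional[int]]) -> str:
    """Classify a chronological series of check results for one permalink.

    Each status is an HTTP code, or None for a timeout or connection failure.
    Returns "present", "missing", or "inconclusive".
    """
    missing_streak = 0
    last_seen_present = False
    for status in statuses:
        if status is None or status >= 500:
            continue  # site trouble: ignore, it does not count either way
        if status == 404:
            missing_streak += 1
            last_seen_present = False
        elif status == 200:
            missing_streak = 0
            last_seen_present = True
        # Any other status is treated as inconclusive and skipped.
    if last_seen_present:
        return "present"
    if missing_streak >= MISSING_THRESHOLD:
        return "missing"
    return "inconclusive"
```

An "inconclusive" result would just mean trying again later rather than flagging the author's post.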

I think a protocol for recognizing if a post still exists would be great. :) It could provide a useful heuristic for me, at the very least, so that I don't have to bother with actually requesting the page for a post that is no longer there.