[Sep. 15th, 2004|03:32 pm]
LiveJournal Client Discussions
lj_clients
[djsgc]
Not sure if this is the right place to post this request...

This is what I want to do:

Pull the data that shows how many posts and comments there are, the date the journal started, the date of the last journal entry, etc.

Then be able to pull data for everyone on my friends list.

Insert it into some kind of graph so I can see the highest number of posts and comments, the oldest journal, redundant journals, etc.

Then

Make it usable by some of my LJ friends

The way I think it may be done:

Pull data *somehow* so it can be written into an HTML file *or similar*

Analyse data using some kind of web log analyser *or similar*

Now, I am sure this has already been done, but I want to do this firstly out of idle curiosity and secondly because, if I can do this, then

a> I learn something new

b> I can use my newfound knowledge for other data, etc.

I don't want someone to *tell* me how to do this, just some good advice, starting points, similar code/scripts.

Any advice would be really helpful!

Robbie

Comments:
From: boggyb
2004-09-15 12:39 pm (UTC)
I think the only way to do that is to scrape the userinfo page (which Brad & co. may not like). The FOAF data might contain this stuff, but you'd have to check it.
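
If you do want to poke at FOAF, a rough Java sketch like this would do for a first look. (It assumes the FOAF file is served at /users/<username>/data/foaf and uses the standard foaf namespace; check both before relying on it.)

    import java.net.URL;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;

    public class FoafPeek {
        // Standard FOAF namespace; the URL below is an assumption to verify.
        static final String FOAF_NS = "http://xmlns.com/foaf/0.1/";

        public static void main(String[] args) throws Exception {
            String user = args.length > 0 ? args[0] : "exampleuser";
            URL url = new URL("http://www.livejournal.com/users/" + user + "/data/foaf");

            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);   // FOAF is namespaced RDF/XML
            Document doc = dbf.newDocumentBuilder().parse(url.openStream());

            // Print every foaf:nick in the file; the first is usually the user,
            // the rest are the people they list via foaf:knows.
            NodeList nicks = doc.getElementsByTagNameNS(FOAF_NS, "nick");
            for (int i = 0; i < nicks.getLength(); i++) {
                System.out.println(nicks.item(i).getTextContent());
            }
        }
    }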
From: djsgc
2004-09-15 03:05 pm (UTC)
Thanks, but I'm not sure how this helps...

Maybe I am taking on too big a task...
From: perplexes
2004-09-15 04:29 pm (UTC)
You're talking about a reasonable amount of work.

Let me clarify what data you would like out of LiveJournal:

- Number of comments posted
- Number of comments received
- Date journal started
- Last journal entry date
- Number of journal entries

Then, you would like to pull these data for each of your friends.

Lastly, would you like to generate graphs for each friend? All friends in one graph showing each type of data? Both? Neither?

The data that you want to pull sounds a lot like userinfo.bml?mode=full. You'll have to screen scrape it, since I don't think either of LiveJournal's APIs (XML-RPC or flat interface) will give you that data.

This is a process of getting the HTML and parsing it in one of many, many ways. My favorite is to use Tidy to take the HTML and generate well-formed XHTML or XML, and then use XPath expressions to pull out the data you need.
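
A rough sketch of that pipeline in Java, using JTidy (a Java port of Tidy) and the standard javax.xml.xpath API; the query string and the XPath expression below are only placeholders, and you'd work out the real ones by looking at the page source:

    import java.net.URL;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.tidy.Tidy;

    public class UserInfoScrape {
        public static void main(String[] args) throws Exception {
            String user = args.length > 0 ? args[0] : "exampleuser";
            URL url = new URL("http://www.livejournal.com/userinfo.bml?user=" + user + "&mode=full");

            // Step 1: let Tidy turn the raw HTML into a well-formed DOM tree.
            Tidy tidy = new Tidy();
            tidy.setXHTML(true);
            tidy.setQuiet(true);
            tidy.setShowWarnings(false);
            Document doc = tidy.parseDOM(url.openStream(), null);

            // Step 2: pull out the pieces you care about with XPath.
            // This expression is made up -- inspect the real page and adjust it.
            XPath xpath = XPathFactory.newInstance().newXPath();
            String created = xpath.evaluate(
                    "//td[preceding-sibling::td[contains(., 'Date created')]]", doc);
            System.out.println("Date created: " + created.trim());
        }
    }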

To analyze the data and make graphs, there are many free graphing tools available for whatever language you're working in. One that I was recently introduced to is JFreeChart, which has a pretty simple interface for Java, if you'd like to go that way.
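
For instance, a minimal JFreeChart sketch (the friend names and counts here are made up; in practice they'd come from whatever you scraped above):

    import java.io.File;
    import org.jfree.chart.ChartFactory;
    import org.jfree.chart.ChartUtilities;
    import org.jfree.chart.JFreeChart;
    import org.jfree.chart.plot.PlotOrientation;
    import org.jfree.data.category.DefaultCategoryDataset;

    public class FriendsChart {
        public static void main(String[] args) throws Exception {
            // Made-up numbers standing in for the per-friend data you collected.
            DefaultCategoryDataset dataset = new DefaultCategoryDataset();
            dataset.addValue(1523, "Entries", "friend_a");
            dataset.addValue(847, "Entries", "friend_b");
            dataset.addValue(12, "Entries", "friend_c");

            JFreeChart chart = ChartFactory.createBarChart(
                    "Journal entries per friend",  // chart title
                    "Friend",                      // category axis label
                    "Entries",                     // value axis label
                    dataset,
                    PlotOrientation.VERTICAL,
                    false, false, false);          // no legend, tooltips, or URLs

            // Write a PNG that a web page (or your friends) can look at.
            ChartUtilities.saveChartAsPNG(new File("entries.png"), chart, 600, 400);
        }
    }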

THEN, to have a tool that your friends can use, you'll probably want some sort of interactive website which pulls data off LJ once in a while (or on demand), generates some graphs, and makes the graphs available for viewing. This is another large area: LJ itself uses Perl for dynamic page generation. Some other ways to go might be PHP, JSP, or ASP.
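
If you stayed with Java for this part too, the serving side could be as small as a servlet that hands back the most recently generated image. The file name and refresh strategy here are just placeholders; regeneration would run separately, on a timer or on demand:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Serves a pre-generated chart so that page views never hit LJ directly.
    public class ChartServlet extends HttpServlet {
        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws ServletException, IOException {
            File chart = new File("entries.png"); // written by the graphing step
            resp.setContentType("image/png");
            resp.setContentLength((int) chart.length());

            FileInputStream in = new FileInputStream(chart);
            OutputStream out = resp.getOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            in.close();
        }
    }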

I think that's about it; three distinct steps to having an alternative web interface to LiveJournal data. Hope that helps.
From: perplexes
2004-09-15 04:35 pm (UTC)
Oh, by the way, screen-scraping userinfo.bml is a bad idea, since it's still a database-intensive process. So, if you do decide to do this, do it very slowly, maybe over a period of a few hours, and then cache the data for a day or two.
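
In other words, something with this shape: a long pause between fetches, and a local cache that only gets refreshed every couple of days (all the numbers here are arbitrary):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.net.URL;

    public class SlowFetcher {
        static final long PAUSE_MS = 5 * 60 * 1000;            // five minutes between requests
        static final long CACHE_MS = 2 * 24 * 60 * 60 * 1000L; // reuse cached pages for two days

        public static void main(String[] args) throws Exception {
            for (int i = 0; i < args.length; i++) {
                String user = args[i];
                File cached = new File(user + ".html");
                if (cached.exists()
                        && System.currentTimeMillis() - cached.lastModified() < CACHE_MS) {
                    continue; // still fresh, don't bother LJ again
                }
                URL url = new URL("http://www.livejournal.com/userinfo.bml?user=" + user + "&mode=full");
                InputStream in = url.openStream();
                FileOutputStream out = new FileOutputStream(cached);
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) > 0) {
                    out.write(buf, 0, n);
                }
                in.close();
                out.close();
                Thread.sleep(PAUSE_MS); // spread the load out
            }
        }
    }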
From: marksmith
2004-09-15 04:40 pm (UTC)
      Pull the data that shows how many posts and comments there are, the date the journal started, the date of the last journal entry, etc.

For information like that, see:

http://www.livejournal.com/bots/
http://www.livejournal.com/doc/server/ljp.csp.protocol.html
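
For example, a minimal well-behaved request against the flat interface looks something along these lines; the part the bot policy really cares about is a User-Agent that identifies your client and gives a way to contact you, and the modes themselves are covered in the protocol docs:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FlatProtocolExample {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://www.livejournal.com/interface/flat");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Identify your client and give contact info, per the bot policy.
            // The client name, URL, and address here are placeholders.
            conn.setRequestProperty("User-Agent",
                    "FriendStatsBot/0.1 (http://example.com/friendstats; you@example.com)");
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");

            // getchallenge needs no credentials; other modes are in the protocol docs.
            OutputStream out = conn.getOutputStream();
            out.write("mode=getchallenge".getBytes("UTF-8"));
            out.close();

            // The flat interface replies with alternating lines: variable name, then value.
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"));
            String name;
            while ((name = in.readLine()) != null) {
                String value = in.readLine();
                System.out.println(name + " = " + value);
            }
            in.close();
        }
    }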

      Then be able to pull data for everyone on my friends list.

This you're not really allowed to do. There's no way to download information for someone else's journal without screen scraping, which is not allowed. We tend to ban clients that we notice are screen scraping.

Good luck, and do keep in mind the rules on clients linked above. (The lj_dev post--read it.)
From: mcfnord
2004-09-15 05:22 pm (UTC)
Do you plan to update the bot policy? And do you warn before you ban?
From: marksmith
2004-09-15 05:54 pm (UTC)
Update what how? The policy has not changed.

We warn if the user-agent is properly formed; otherwise, we just ban. There's no way to figure out who's doing it if they're not following the rules. No way that we care to invest time in, that is. If people are considerate to us and our site, we'll be considerate to them and theirs.
From: mcfnord
2004-09-15 06:31 pm (UTC)
The bot policy page says
      You are encouraged to use these resources instead of "screen-scraping" user pages.
You say
      We tend to ban clients that we notice are screen scraping.
and
      You should never have to parse HTML.
Do you believe these are the same policy?
From: marksmith
2004-09-15 06:38 pm (UTC)
Yep.
From: mcfnord
2004-09-15 08:49 pm (UTC)
Someday I would like to speak with you on the phone about this issue. I seem to lack a certain context that would be clearer that way.
From: perplexes
2004-09-17 01:23 am (UTC)
Could you point me to where in the bot policy or protocol documentation it explains how to get userinfo like posts to date, number of comments received/given, and the date the journal started?

Thanks.