LiveJournal Client Discussions

[Sep. 15th, 2004|03:32 pm]
LiveJournal Client Discussions
Not sure if this is the right place to post this request...

This is what I want to do:

Pull the data that shows the number of posts and comments, the date the journal was started, the date of the last journal entry, etc.

Then be able to pull data for everyone on my friends list.

Insert it into some kind of graph so I can see the highest number of posts and comments, the oldest journal, redundant journals, etc.


Make it usable by some of my LJ friends.

The way I think it may be done:

Pull data *somehow* so it can be written into a html file *or similar*

Analyse data using some kind of web log analyser *or similar*

Now, I am sure this has already been done. I want to do this firstly for my idle curiosity and secondly because, if I can do this, then:

a> I learn something new

b> I can use my newfound knowledge for other data etc.

I don't want someone to *tell* me how to do this, just some good advice, starting points, similar code/scripts.

Any advice would be really helpful!


From: perplexes
2004-09-15 04:29 pm (UTC)
You're talking about a reasonable amount of work.

Let me clarify what data you would like out of LiveJournal:

- Number of comments posted
- Number of comments received
- Date journal started
- Last journal entry date
- Number of journal entries

Then, you would like to pull the same data for each of your friends.

Lastly, would you like to generate graphs for each friend? All friends in one graph showing each type of data? Both? Neither?

The data that you want to pull sounds a lot like userinfo.bml?mode=full. You'll have to screen scrape it, since I don't think either of LiveJournal's APIs (XML-RPC or flat interface) will give you that data.

This is a process of getting the HTML and parsing it in one of many, many ways. My favorite is to use Tidy to turn the HTML into well-formed XHTML or XML, and then use XPath expressions to pull out the data you need.
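As a rough sketch of the XPath step, here is one way it might look in Python using only the standard library. The markup in SAMPLE_XHTML, the "stats" table id, and the "label"/"value" class names are all made up for illustration; the real userinfo page is laid out differently, so the expressions would need adjusting after looking at the actual Tidy output.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already-tidied XHTML. This is NOT the real userinfo layout;
# it only stands in for "well-formed output from Tidy".
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="stats">
      <tr><td class="label">Journal entries</td><td class="value">412</td></tr>
      <tr><td class="label">Comments posted</td><td class="value">1,893</td></tr>
      <tr><td class="label">Comments received</td><td class="value">2,040</td></tr>
    </table>
  </body>
</html>"""

# XHTML elements live in this namespace, so queries need a prefix for it.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_stats(xhtml):
    """Return {label: count} for every label/value row in the sample layout."""
    root = ET.fromstring(xhtml)
    stats = {}
    # ElementTree supports a limited XPath subset, which is enough here.
    for row in root.findall(".//x:table[@id='stats']/x:tr", NS):
        label = row.find("x:td[@class='label']", NS)
        value = row.find("x:td[@class='value']", NS)
        if label is None or value is None:
            continue  # skip rows that do not match the expected shape
        # Strip thousands separators before converting to an integer.
        stats[label.text.strip()] = int(value.text.replace(",", ""))
    return stats


if __name__ == "__main__":
    print(extract_stats(SAMPLE_XHTML))
```

ElementTree only handles a subset of XPath, so more involved expressions would need a fuller XPath library.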

To analyze the data and make graphs, there are many free graphing tools available for any language you're working in. One that I was recently introduced to was JFreeChart which has a pretty simple interface for Java, if you'd like to go that way.
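Before reaching for a charting library, a quick text chart can be enough to check that the numbers look sensible. The sketch below is plain Python rather than JFreeChart, and the function name, usernames, and counts are invented for the example.

```python
def bar_chart(counts, width=40):
    """Render {name: count} as text bars, largest count first."""
    if not counts:
        return ""
    largest = max(counts.values())
    name_width = max(len(name) for name in counts)
    lines = []
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    for name, count in ranked:
        # Scale each bar relative to the largest value; avoid dividing by zero.
        bar_len = round(width * count / largest) if largest else 0
        lines.append(f"{name.ljust(name_width)} | {'#' * bar_len} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical entry counts for three made-up friends.
    print(bar_chart({"alice": 10, "bob": 40, "carol": 20}, width=8))
```

Sorting by count first makes the "highest number of posts" question answerable at a glance, whichever graphing tool ends up drawing the final picture.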

THEN, to have a tool that your friends can use, you'll probably want some sort of interactive website which pulls data off LJ once in a while (or on demand), generates some graphs, and makes the graphs available for viewing. This is another large area: LJ itself uses Perl for dynamic page generation. Some other ways to go might be PHP, JSP, or ASP.

I think that's about it; three distinct steps to having an alternative web interface to LiveJournal data. Hope that helps.
From: perplexes
2004-09-15 04:35 pm (UTC)
Oh, by the way, screen-scraping userinfo.bml is a bad idea since it's still a database-intensive process. So, if you do decide to do this, do it very slowly, maybe over a period of a few hours, and then cache the data for a day or two.
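To make "slowly, then cache" concrete, here is a minimal sketch in Python. The class name, the one-request-per-minute default, and the one-day lifetime are my own assumptions, not anything LiveJournal specifies, and the fetch function is passed in, so nothing here talks to the site by itself.

```python
import time


class ThrottledCache:
    """Fetch no faster than one item per `min_interval` seconds and
    reuse each result for `ttl` seconds."""

    def __init__(self, fetch, min_interval=60.0, ttl=24 * 3600,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch              # callable: key -> value
        self._min_interval = min_interval
        self._ttl = ttl
        self._clock = clock              # injectable so it can be tested
        self._sleep = sleep
        self._last_fetch = None          # time of the most recent fetch
        self._cache = {}                 # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._cache.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]                # still fresh, no request made
        if self._last_fetch is not None:
            # Wait out whatever remains of the minimum gap between requests.
            wait = self._min_interval - (now - self._last_fetch)
            if wait > 0:
                self._sleep(wait)
        value = self._fetch(key)
        self._last_fetch = self._clock()
        self._cache[key] = (self._last_fetch, value)
        return value
```

This cache lives in memory, so it only helps within a single run. Keeping data for a day or two across runs would mean writing it to a file or database instead.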