LiveJournal Client Discussions


Database/PHP efficiency? [Aug. 4th, 2006|04:40 pm]


I'm writing what is essentially a PHP-based RSS aggregator. The program downloads the RSS feeds for each of my friends and stores them in a database.

The PHP script then takes each of the RSS feeds, parses them, orders them, and spits them back onto the screen.

Should I just store each entire RSS feed in a database text field and sort the entries in PHP (using PHP's array sorts), or should I have a separate table where each record is an individual post from the feed, and then use MySQL to sort them? The separate table seems like it could get very large (tens of thousands of rows) and add an unnecessary layer of complexity to the database.

Which is better/more efficient?

From: ibneko
2006-08-04 11:46 pm (UTC)
[whispers]You might get a better response if you ask over at php?[/endwhispers]
From: gargoylemusic
2006-08-05 12:51 am (UTC)
I had no idea that was there. Thank you!
From: jwm
2006-08-05 12:49 am (UTC)

This is your classic speed versus size trade-off. Parsing text is generally slow, so if you need to parse the XML of the RSS feed on every page view to pull out the individual entries and sort those into an aggregate, that will be slow, too. If you parse once (per RSS refresh) and place the body and date into indexed fields, then retrieval will be fast, but you'll need the additional overhead of indexes.

In actuality, replacing the XML representation of structure and order with the binary representation in the database might even be cheaper to store.
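
A minimal sketch of the parse-once approach, written in Python with sqlite3 standing in for PHP and MySQL so that it runs without a server. The table name `entries`, its columns, the index name, and the sample feed are all assumptions made for illustration; the same schema and `ORDER BY` query should carry over to MySQL with little change.

```python
import sqlite3
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Hypothetical sample feed; a real aggregator would download one per friend.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>example</title>
<item><title>Older post</title><link>http://example.com/1</link>
<pubDate>Fri, 04 Aug 2006 16:40:00 +0000</pubDate>
<description>first</description></item>
<item><title>Newer post</title><link>http://example.com/2</link>
<pubDate>Sat, 05 Aug 2006 07:06:00 +0000</pubDate>
<description>second</description></item>
</channel></rss>"""


def create_schema(conn):
    # One row per post; the entry URL doubles as the unique key.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS entries (
               link TEXT PRIMARY KEY,
               feed TEXT NOT NULL,
               title TEXT,
               body TEXT,
               published INTEGER NOT NULL)"""
    )
    # The index on the date column is what keeps the sorted read cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_entries_published ON entries (published)"
    )


def store_feed(conn, feed_name, xml_text):
    # Parse the XML once, at refresh time, not on every page view.
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        when = parsedate_to_datetime(item.findtext("pubDate"))
        conn.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?, ?)",
            (
                item.findtext("link"),
                feed_name,
                item.findtext("title"),
                item.findtext("description"),
                int(when.timestamp()),
            ),
        )
    conn.commit()


def latest_entries(conn, limit=20):
    # The database does the sorting; no XML parsing happens here.
    rows = conn.execute(
        "SELECT title, link FROM entries ORDER BY published DESC LIMIT ?",
        (limit,),
    )
    return rows.fetchall()
```

Because the entry URL is the primary key, refreshing the same feed again overwrites existing rows rather than duplicating them, so the table grows only with genuinely new posts.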

From: clayfoot
2006-08-05 07:06 am (UTC)
You don't need the whole entry. All you need is the date and the unique URL of each entry. Sort all of those by date. When you want to look at just the top n entries, fetch the RSS feeds of just the top n entries you want.
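
A rough sketch of this lighter-weight idea in Python: keep only a small index of (date, entry URL, feed URL) tuples, pick the newest n, and work out which feeds actually need fetching. The function names, the tuple layout, and the example URLs are hypothetical, and the feed URL column is an added assumption so that each entry can be traced back to its feed.

```python
import heapq


def top_n(index, n):
    """Return the n newest (timestamp, entry_url, feed_url) rows, newest first."""
    return heapq.nlargest(n, index, key=lambda row: row[0])


def feeds_needed(rows):
    """Distinct feed URLs for the chosen rows, in first-seen order."""
    return list(dict.fromkeys(feed for _, _, feed in rows))


# Hypothetical index: Unix timestamps plus made-up URLs.
index = [
    (1154709600, "http://example.com/a/1", "http://example.com/a/rss"),
    (1154761560, "http://example.com/b/7", "http://example.com/b/rss"),
    (1154738940, "http://example.com/a/2", "http://example.com/a/rss"),
    (1154600000, "http://example.com/c/3", "http://example.com/c/rss"),
]

newest = top_n(index, 2)
to_fetch = feeds_needed(newest)
```

Here only two of the three feeds would need to be downloaded and parsed to display the two newest entries; the trade-off is an extra fetch at display time in exchange for storing almost nothing.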