08-04-2008, 07:41 AM | #1
Pro Starter
Join Date: Aug 2001
Location: Willow Glen, CA
"Basic" Computer Programming Question
So the last time I coded was 9 years ago in my lower division C++ classes. However, I'm trying to run a basic little program that will pull data from a group of HTML pages and consolidate it into workable strings and integers.
An example of the page I am trying to read is here: Wow Web Stats. I'd like to be able to run the program and have it pull out and store an array of strings containing the names on that page, and then an array of integers containing the DPS of those players. I don't need any of the other information on the page. If I wanted to take this to the next level (which I eventually do), I'd want it to look at each individual split (hovering over "v Full Report" pulls up individual battles) and pull data from each.

Where would be a good place to start on a project like this, or a good place I could read up on this sort of algorithm? Also, would a program like this eat up a lot of bandwidth from that site? I don't want to bog down their servers at all.
__________________
Every time a Dodger scores a run, an angel has its wings ripped off by a demon, and is forced to tearfully beg the demon to cauterize the wounds. The demon will refuse, and the sobbing angel will lie in a puddle of angel blood and feathers for eternity, wondering why the Dodgers are allowed to score runs. That’s not me talking: that’s science. McCoveyChronicles.com
08-04-2008, 08:10 AM | #2
Awaiting Further Instructions...
Join Date: Nov 2001
Location: Macungie, PA
I recommend learning PHP, since any server you're likely to rent will run LAMP. They are parsing log files, so you'll most likely want functions like fopen (to open a file) and explode (to break a string apart at the separators - comma, space, etc.), then feed the pieces into an array and foreach through it.
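A rough sketch of that open/split/loop idea, written here in Python rather than PHP just for illustration. The line format (comma-separated name/DPS pairs) is an assumption for the example, not the actual WWS layout:

```python
# Sketch of the fopen/explode/foreach approach described above.
# Assumed input format: one "name,dps" pair per line (made up for the example).

def parse_dps_lines(lines):
    names, dps = [], []
    for line in lines:                       # "foreach" over the rows
        fields = line.strip().split(",")     # "explode" at the separator
        if len(fields) < 2:
            continue                         # skip malformed rows
        names.append(fields[0])
        dps.append(int(fields[1]))
    return names, dps

# Example usage with fake data:
print(parse_dps_lines(["Thrall,950", "Jaina,1120"]))
# → (['Thrall', 'Jaina'], [950, 1120])
```

The same structure carries over almost line-for-line to PHP's fopen/explode/foreach.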
Taking it to the next level, jpgraph is an open-source graphing package which is wonderful and can help you create neat little graphs. For some reason, the site is Temporarily Unavailable. Hrmm. And for DB stuff I highly recommend this book (I still use their 2001 edition, which I believe is the first) and still refer to it.
08-04-2008, 10:01 AM | #3
College Benchwarmer
Join Date: Dec 2003
|
wha?
08-04-2008, 10:21 AM | #4
High School Varsity
Join Date: Oct 2000
Location: Old Forge, PA
If I were going to do something like this, I would build a quick application that performs an HTTP request and parses the data pulled back. Looking at the source of that page, parsing it wouldn't be too difficult - it seems that all of the data on that page is already neatly laid out in a comma-delimited array.

The hard part would be building the program to create the HTTP request. I would Google the words "HTTP request" along with whatever language you happen to be using, and chances are you'd find some sample code to start working with. I've built things like this in VBScript, and I'm pretty sure I got 90% of the code I used from samples on the Net. I wouldn't worry about flooding their servers with a program like this, either - it would just look like another web page hit to them. If there ever was a problem, you could just pull the data from wherever that page is getting the data.
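A minimal sketch of that "HTTP request, then parse" approach, again in Python. The regex pattern is a placeholder - the real WWS page would need its own pattern, found by viewing the page source:

```python
# Sketch: fetch a page over HTTP, then pull name/DPS pairs out of the markup.
# The pattern below assumes the data appears as ['Name',123] entries in the
# source; that is a guess for illustration, not the actual WWS format.
import re
import urllib.request

def parse_pairs(html):
    pairs = re.findall(r"\['(\w+)',(\d+)\]", html)
    return [(name, int(dps)) for name, dps in pairs]

def fetch_dps(url):
    # One plain GET - to the server this looks like a single page view.
    with urllib.request.urlopen(url) as resp:
        return parse_pairs(resp.read().decode("utf-8", errors="replace"))
```

Splitting the fetch from the parse like this lets you test the parsing on saved page source without touching the network.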
__________________
There are three things I have learned never to discuss with people...religion, politics, and the Great Pumpkin. - Linus Van Pelt
08-04-2008, 07:12 PM | #5
Pro Starter
Join Date: Aug 2001
Location: Willow Glen, CA
Thanks for the responses.
Bonegavel - they're doing all the work parsing the log files; I want to manipulate the data they come up with afterwards. While I'd love to gather the data myself and not even involve them, it would take a herculean effort, I think, to come up with accurate data, and since it's already there I'll just try to piggyback off of them. As for the rest, it all sounds good, and I'm going to look up that book, thanks!

Ronnie - that's the basic idea I had, but I had no idea how to go about it. Thanks!

Tredwel - sounds like this might be more efficient than the scraping method. I'll probably give both a try and see what I can come up with. I wasn't really all THAT worried about flooding their server - I wouldn't hit it more than a few times a week. But if I get the program to the point where I can hit multiple pages off the first report, the hits might start increasing.
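If the hits do start increasing, one simple courtesy is to pause between requests so they stay spread out. A hedged sketch (the function names and delay are mine, not from any particular library):

```python
# Sketch: fetch several pages with a pause between requests, so a multi-page
# run doesn't hammer the server. Passing the fetch function in as a parameter
# makes the loop easy to test without a network connection.
import time
import urllib.request

def fetch_url(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")

def fetch_pages(urls, fetch=fetch_url, delay_seconds=2.0):
    pages = []
    for url in urls:
        pages.append(fetch(url))
        time.sleep(delay_seconds)  # space out the hits; be gentle with their server
    return pages
```

With a couple of seconds between requests, even pulling every split off a report is still only a handful of ordinary page views from the server's point of view.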