In search of a scraper

Status
Not open for further replies.

RainbabyRain

New member
May 27, 2008
I recently came across a few websites that scrape questions and answers from Yahoo Answers and turn them into FAQ websites. Does anyone have any idea where I can get this scraper?

I am also looking for an RSS scraper that can scrape more than one video website.

Help appreciated, thank you.
 


I guess I can do that, but do you have any suggestions of "Must reads" before I start?

There is a book called "Webbots, Spiders, and Screen Scrapers" by Michael Schrenk. It teaches you how to write scrapers like these in PHP. Best book on this subject, in my opinion.
 
I've got that book and also recommend it. It was quite easy to understand and I've been able to put it into practice even with my poor PHP.

For those who have the book and are average to decent at PHP...

What do you think of the libraries that come with the book? The guy hardly ever uses regular expressions, so although you can get the job done with the libraries he supplies, I'm wondering if it's a substantially inferior way to get properly acquainted with the subject.
 
I was wondering about that as well.

The idea behind the libraries is OK, but he seems to shy away from regular expressions.
Which, I think, is actually OK.

Regular expressions are a tough beast, so if you don't know how to do them, you might well end up with non-working scripts just because of them.

So either learn them, or do without, as this guy does.
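
To make the trade-off concrete, here is a rough sketch of both approaches side by side. It is written in Python rather than the book's PHP, and the function names `text_between` and `text_between_regex` are made up for this example; they are not the book's library functions. The first version only uses plain string searching between two delimiters, the second does the same job with a regular expression.

```python
import re


def text_between(text, start, stop):
    """Return the text between the first `start` and the next `stop`, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    end = text.find(stop, begin)
    if end == -1:
        return None
    return text[begin:end]


def text_between_regex(text, start, stop):
    """Same extraction, done with a non-greedy regular expression."""
    # re.escape keeps characters like "<" or "?" in the delimiters literal.
    pattern = re.escape(start) + r"(.*?)" + re.escape(stop)
    # re.DOTALL lets "." match newlines, so multi-line content is captured too.
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


# Hypothetical page content, just for illustration.
page = "<html><head><title>Example FAQ</title></head><body></body></html>"

print(text_between(page, "<title>", "</title>"))
print(text_between_regex(page, "<title>", "</title>"))
```

The delimiter version is longer but there is very little that can go wrong in it. The regex version is shorter, but forgetting the escaping, the non-greedy `?`, or the DOTALL flag is exactly the kind of mistake that leaves you with a non-working script.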

::emp::
 
Yeah, I noticed that as well. I used the book as a foundation, just to learn how to write scrapers and bots. Another problem with this author is that almost all of his code is procedural, with very little OOP. When writing a more complex bot or scraper, you're not going to want to do the entire thing procedurally.
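
As a rough illustration of the procedural vs. OOP point, here is a minimal sketch in Python (not PHP, and not code from the book). The class name `Scraper`, its methods, and the `fake_fetch` stand-in are all invented for this example. The idea is only that an object can hold shared state, here a page cache and the download function, instead of passing it through every procedural call.

```python
import re


class Scraper:
    """Keeps the fetcher and a page cache together in one object."""

    def __init__(self, fetch):
        # `fetch` is any callable that takes a URL and returns the page text.
        self.fetch = fetch
        self.pages = {}  # url -> downloaded text

    def page(self, url):
        """Download a page at most once and reuse the cached copy afterwards."""
        if url not in self.pages:
            self.pages[url] = self.fetch(url)
        return self.pages[url]

    def title_of(self, url):
        """Return the stripped <title> text of a page, or None if there is none."""
        match = re.search(r"<title>(.*?)</title>", self.page(url),
                          re.DOTALL | re.IGNORECASE)
        return match.group(1).strip() if match else None


def fake_fetch(url):
    # Stand-in for a real HTTP download so the sketch runs without a network.
    return "<html><title> Page for %s </title></html>" % url


bot = Scraper(fake_fetch)
print(bot.title_of("http://example.com/a"))
```

Swapping `fake_fetch` for a real download function would not change the class, which is the kind of separation that gets awkward once a procedural script grows.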
 
I kind of liked the lack of regex in the book... regex is voodoo. I think if he had used regex too much, it would have taken away from the purpose of the book. I mean, if you know regex then no problem, but if you don't, he would have needed several extra chapters just explaining how the regex works...

Also...

OOP is advanced PHP, in my opinion. I think the book is aimed more at people just beginning to learn the language; if he had used OOP a lot, I think it would have targeted the book toward a smaller niche...

Because I know PHP, I used the book as a foundation to work from and then added my own twists on things. If you are just starting out with PHP, all of the scripts work out of the box and are easy to follow.

I think the book is well thought out and does exactly what it says on the tin...

PS
Sorry to turn this thread into a book review :)
 