Programming Language Used To Scrape?

ttmw

New member
Jul 2, 2009
I'm thinking about getting on it with another programming language, and like the sound of scraping shiz...maybe eventually a price checking bot of some kind?

Just wondered what language is the best for this type of thing? Python? Perl? I've heard it might be possible in PHP too? I just haven't got a clue where to start really.

I've got a bit of programming knowledge with some PHP, ActionScript, MySQL and basic Java etc... just basic stuff really, but i'm looking to step it up.

I'm not set on scraping only, but it'd be cool. Python or Perl seem to be pretty highly rated. Input anyone? cheeeeaarrssss! :)
 


Don't make me copy and paste my question again...

I'm looking for some input on what PROGRAMMING LANGUAGE is preferred in here for scraping and bot type stuff related to internet marketing. I know it's different languages for different functionality, but i'm looking for the IM input.

If that's what i wanted i'd have asked you to link me to uBot or ScrapeBox...but i already have them...i'm looking for a little more advanced stuff. Post your toss in DP, or post me a Dickroll, but not a shitty torrent search.
 
don't get uppity, the link he posted is to a book about how to use PHP to write scrapers and bots. It's literally the exact answer to your question, and is a quality book to get started with botting.

Click through before bitching next time.
 
What are you trying to do that ScrapeBox and uBot can't?

Mainly i'd like to automate things with a cron job or something, and just learn more about the language on the whole...see where it leads :)

I find uBot a little buggy too...i'm not taking anything from the makers but i'd like to know how to do the things myself anyway, doing different things with strings and searching text etc...like i say i'm not taking anything away from uBot, i think it's great for what it does, and will definitely get better with time, but i'd like to learn to script myself.
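
For the "strings and searching text" part, here is a minimal Python sketch of the kind of thing a price checker might do: pull price-like numbers out of a chunk of page text with a regular expression. The pattern, the `find_prices` name and the sample HTML are all made up for illustration, and real pages will usually need something more careful.

```python
import re
from decimal import Decimal

# Assumed pattern: a currency symbol, then digits with optional
# thousands separators and optional two-digit cents.
PRICE_RE = re.compile(r"[$£€]\s*(\d[\d,]*(?:\.\d{2})?)")


def find_prices(text):
    """Return every price-like number found in text as a Decimal."""
    return [Decimal(match.replace(",", "")) for match in PRICE_RE.findall(text)]


# Hypothetical snippet of scraped HTML.
sample = '<span class="price">$1,299.99</span> was <s>$1,499.00</s>'
prices = find_prices(sample)
lowest = min(prices)
```

Saved as a script, something like this could be run on a schedule from cron, for example with a crontab entry along the lines of `0 * * * * python3 /path/to/check_prices.py` (the path is a placeholder).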
 
I clicked through, and i'm not trying to be a dick, but it's a link to a pretty general search, to a few different ebooks and random pages from what i can see. Not too hard to put a "try this ebook" or just "good ebook" on the end of the reply now, is it?
 
Ah yes! I forgot i didn't want an ebook...

Desperate attempt to find something funny to post? could be.

Thanks for the second part, and useful answer...i have googled it but wanted to know if there were any preferred languages in the IM world. There may be reasons i don't know of that make one better than another, and if one is preferred i might be able to get more support around here...(long shot i know ;) )
 
If you want to learn, try .NET. Scraping is pretty easy. Plus you can learn how to fill forms and scrape almost any HTML tag. It's easy as shit.
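
The post above is about .NET, but the idea of picking specific tags and form fields out of a page looks much the same in any language. As a rough illustration only, here is a Python sketch using the standard library's `html.parser`; the `TagCollector` class and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect href values from <a> tags and name attributes from <input> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.form_fields = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and "href" in attrs:
            self.links.append(attrs["href"])
        elif tag == "input" and "name" in attrs:
            self.form_fields.append(attrs["name"])


# Made-up page standing in for a downloaded document.
page = """
<a href="/products/1">Widget</a>
<form action="/login" method="post">
  <input type="text" name="username">
  <input type="password" name="password">
</form>
"""

collector = TagCollector()
collector.feed(page)
```

Knowing the field names a form expects is the first step towards filling it in from a script.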
 
The question should be: what language can you not scrape in? Personally I use C#, but any mainstream language will do.
 
Really? I've been doing a bit of looking around and Python does seem to be a popular one with quite a big following, and as i understand it's quite new so has the benefit of looking back at other languages. I've had a little play with it in Komodo Edit and it seems pretty easy to learn so far...but i still need to look into .NET and Perl maybe.

I'm sure i looked into C# a while back and had a hard time with it, but that might have been because i wasn't too good with programming then in general. Cheers anyway, just need a bit more research i think.
 
C# is most likely not a mainstream choice, but I've found it to be the most powerful for my needs (and yes, I am a Windows-only guy out of habit). For doing raw scraping with C#, look into the HttpWebRequest class. For simple logins + scraping, look into HttpWebRequest + CookieContainer. For doing browser automation, look into WatiN. Doing multithreading in C# is relatively simple as well, if you want to scale things up.
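
Since Python came up earlier in the thread, here is a rough Python counterpart to the "HttpWebRequest + CookieContainer" idea: an opener that keeps cookies between requests, so a login carries over to later page fetches. This is a sketch and not the C# API; the function names, URLs and form fields are placeholders, and a real site may also need hidden form tokens or extra headers.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def make_session():
    """Build an opener that stores cookies between requests."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, fields, page_url):
    """POST the login form, then fetch a page using the same cookies."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data, timeout=30) as resp:
        resp.read()  # the session cookie is stored in the jar
    with opener.open(page_url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


opener, jar = make_session()
# Hypothetical usage (placeholder URLs and field names):
# html = login_and_fetch(opener, "https://example.com/login",
#                        {"username": "me", "password": "secret"},
#                        "https://example.com/account")
```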
 
I write my scrapers in PHP just because I like the language. They're usually okay as long as you're looking for hundreds of thousands of results instead of millions, but then you'd just use proxies etc.

If it's more advanced then you might want something multithreaded.
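
To give a feel for what "multithreaded" means here, this is a minimal Python sketch that fetches several URLs at once with a thread pool. The `fetch` and `fetch_all` names and the worker count are assumptions for illustration; a real scraper would also want rate limiting and retries.

```python
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url):
    """Download one URL and return the raw bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_all(urls, fetcher=fetch, workers=8):
    """Run fetcher on every URL concurrently; map each URL to its result or exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {url: pool.submit(fetcher, url) for url in urls}
        for url, future in futures.items():
            try:
                results[url] = future.result()
            except Exception as exc:  # keep going if one URL fails
                results[url] = exc
    return results
```

Passing the fetch function in as a parameter makes it easy to swap in a proxy-aware version later, or a fake one for testing.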
 
You can write a scraper in pretty much any language out there.

You'll probably find the most documentation out there for PHP, but PHP is also much slower than other languages. If speed isn't an issue, start there; PHP is still plenty fast for most stuff. Stop worrying about being overloaded, because you're going to be nothing but overloaded for a while.
 
Yup, C# is hard to get into quickly. If your sole purpose is to learn scraping and browser automation, i suggest you go for VB.NET. It's child's play. It would take you a week at most to learn the basics of Windows Forms and controls. After that you can learn how to scrape, log in automatically etc. Watch this video on YouTube:

http://www.youtube.com/watch?v=AV2tk2FTM0g