Learning to Scrape - Who wants some?

Status
Not open for further replies.


I like my women alive.

Just like my women, I like my coffee dark, smooth, and sweet.
 
There's a million different ways to 'skin this cat'.

Sure is. The important thing is the end result. If you can pull info from a scrape, who cares how you did it? I use a wrapper exe script that loops the curl.exe command line through a list of URLs. Then I parse the files locally to pull what I need. It's a method I used to pull over 900,000 records (leads actually).
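For anyone who wants to try the same two-step idea (download everything first, parse the saved files afterwards), here is a rough sketch. It is not the poster's wrapper: it swaps curl.exe for Python's urllib, and the function names, the page_00000.html file naming, and the title pattern are all made up for illustration.

```python
import re
import urllib.request
from pathlib import Path


def fetch(url):
    """Download one URL; stands in for a single curl.exe call."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def download_all(urls, out_dir, fetcher=fetch):
    """Step 1: loop over the URL list and save each body to its own file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for i, url in enumerate(urls):
        path = out / f"page_{i:05d}.html"  # hypothetical naming scheme
        try:
            path.write_bytes(fetcher(url))
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            print(f"skipped {url}: {exc}")
            continue
        saved.append(path)
    return saved


# Hypothetical pattern: grabs the <title> text. Swap in whatever you need to pull.
TITLE_RE = re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def parse_saved(paths, pattern=TITLE_RE):
    """Step 2: parse the local files, with no network involved."""
    records = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        records.extend(pattern.findall(text))
    return records
```

Keeping the download and the parse separate means you can change the pattern and re-run the parse step as often as you like without hitting the target site again.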

Thanks for posting your methods
 
This guy volunteers his help and gets totally shut down. Nice WF members. This community could be so much more w/o the egos

This community would be dogshit without the egos.
SHUT THE FUCK UP!

& Mattgatten, PM'd you.
 
The JS method has one major advantage. If you put it on a site, and post the grabbed contents to a server side script, you will scrape sites using your visitors' IPs. I guess that would be worth something in a lot of cases.
 
If you're running it locally you don't chew up bandwidth on the server either. I know space and bandwidth limitations are virtually no problem these days but I do have one community site that can chew up 20gb on a good month and it puts me close to the limit on that hosting account. :)
 
If you're running it locally, the JS layer is just a lot of hassle. Using your visitors is, as far as I can see, the only advantage of running this thing in JS.

I can then cut and paste it to wherever I want or if I need to store it in a DB, I can set up a local web server with php and mysql and write a script to receive and insert my data. I used a local web server/db combo and this same technique to harvest nearly 800k completed ebay auction listings in a 24 hour period.
Why use JS at all to fetch the data, if you're gonna use php to store it? Much easier to just grab the url with php, use php's dom functions, and you're set.
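To make the "grab the url server-side, walk the DOM, store it" approach concrete, here is a minimal sketch. It is written in Python rather than the PHP the reply suggests, it uses the standard library's html.parser and sqlite3 in place of PHP's DOM functions and MySQL, and the table layout and function names are assumptions for illustration only.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) for every <a> tag that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return the collected links."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def store_links(conn, links):
    """Insert the links into a (hypothetical) two-column table."""
    conn.execute("CREATE TABLE IF NOT EXISTS links (href TEXT, label TEXT)")
    conn.executemany("INSERT INTO links VALUES (?, ?)", links)
    conn.commit()


def scrape(url, conn):
    """Fetch one page, pull its links, and store them."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    store_links(conn, extract_links(html))
```

Something like scrape(url, sqlite3.connect("scrape.db")) would run the whole chain for one page; the database filename there is just a placeholder.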
 
Ok, this is my last post on this thread.

Some folks prefer PHP, some don't.
Some prefer hosted servers, some don't.
yadda, yadda, yadda.

I said in an earlier post that I also combine this JS with Adobe Air runtime to create stand alone applications that use the XHR in various libs. Imagine creating an application, much like a Java app, that is platform independent and all that good stuff but you only had to write javascript (that you already know) to create it.

This was just an example to let folks see what can be done without installing the Adobe Air runtime. I also wanted folks to see the source if they wanted to try their own hands at it.

From now on, I'll either put up some Air Apps or just keep my scripts to myself. I've noticed that everyone on WF has their own way and it's also the 'only way' or 'only correct way'. Presenting alternatives, for whatever reason, is simply fuckin' futile. Ok, I'm done.

Later Gators,
 
Matt,

I need to pull a lot of content on a particular product for a domain that isn't live yet.

Should I build a site with wamp and run a scraper from my pc, or is there an easier way to stockpile what I need?
 
... I've noticed that everyone on WF has their own way and it's also the 'only way' or 'only correct way'. Presenting alternatives, for whatever reason, is simply fuckin' futile ...

Yep, you guys remind me of my OCD, it's my way or the highway, wife ... without the cum guzzling.

While I adore the PHP/SQL combo and have almost solely used it for the past decade we just need to face the fact that javascript has its benefits. I, honestly, wish I had been playing with it at least part time as there is much power there (malicious power, but power nonetheless) that PHP just can't touch. There is something to be said about client side scripting.

Thanks for the script Matt, I'm sure it's helped someone who isn't posting the cons here on this thread.
 
My only problem is that using Air or JS or any client-side tech just doesn't scale for dealing with tons of data.
 
learning

Hi good day
Hi everyone, I'm new here. Thanks for the information on learning to scrape. Is it easy and simple to learn? Also, do you have a website? I am very interested in how to scrape. Thank you for the information you have given us.
___________________________________
Anna Ford
Start your internet marketing career at the Profit Work From Home System Profit Work From Home
 