website scrapers?

ubaidabcd · Feb 11, 2008

are there any free web page scrapers? or maybe some shareware with a fully functional free trial.

thanks

audax · Feb 11, 2008

ubaidabcd · Feb 11, 2008

I just installed it, but now there's something wrong with Firefox (I'm on Vista). I get an error when starting Firefox; it comes up, but then it freezes. I had to uninstall it. I'm going to see if I can find some information about the freezing.

audax · Feb 11, 2008

It requires Java to be installed. You can get it from java.com.

xmcp123 · Feb 11, 2008

Code:

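<?php
// Fetch the page named in the ?url= query parameter.
// Careful: fetching arbitrary user-supplied URLs is unsafe on a public server.
$url = isset($_GET['url']) ? $_GET['url'] : '';
if (!filter_var($url, FILTER_VALIDATE_URL)) {
    die('Missing or invalid url parameter');
}
// file_get_contents() on a URL needs allow_url_fopen (doesn't work on DreamHost; use cURL there).
$data = file_get_contents($url);
if ($data === false) {
    die('Could not fetch ' . htmlspecialchars($url));
}
$data = str_replace(">", ">\r\n", $data); // put a line break after every tag
$data = strip_tags($data);                // drop the tags, keep the text
$spl  = explode("\r\n", $data);           // split where the tags were (we need lots of rows)
foreach ($spl as $row) {
    $row = trim($row);
    if ($row === '') {
        continue; // skip rows that held only markup or whitespace
    }
    // your function to insert $row into the MySQL DB goes here
}
?>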

Not tested, not endorsed, not anything. But coding a scraper is not too hard.
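
For hosts where file_get_contents() can't open URLs, the code comment above says to use cURL. A minimal sketch of what that swap might look like follows. The function name fetch_page() and the 10-second default timeout are made up for illustration, and it assumes PHP 7.1+ with the cURL extension loaded.

```php
<?php
// Hypothetical helper (not from the thread): fetch a page with cURL.
// Returns the response body as a string, or null on any failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $ch = curl_init($url);
    if ($ch === false) {
        return null;
    }
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // give up if the connection stalls
        CURLOPT_TIMEOUT        => $timeoutSeconds, // give up if the whole transfer stalls
    ]);
    $body = curl_exec($ch);
    // curl_exec() returns false on failure, so anything that is not a string means "no page".
    return is_string($body) ? $body : null;
}
```

With this in place, the line `$data = file_get_contents($url);` would become `$data = fetch_page($url);`, with a null check in place of the false check.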

ubaidabcd · Feb 11, 2008

I don't know any programming so don't know how I could put that script to use.

As for the Piggy Bank plug-in: I downloaded Java with that link and re-installed. Same problem. I googled it, and it seems it could have something to do with Vista. Somebody else reported the same problem without a resolution.

dubber609 · Feb 11, 2008

The problem with scrapers is that they need to be updated every time the site structure changes. If a good scraper is released publicly, the sites being scraped will quickly change their structure and the scraper becomes obsolete.

If you want a scraper, IMHO there are 2 options:
1 - Hire someone (I'll build you a simple scraper with no cookies or CAPTCHA problems quickly and cheaply)
2 - Learn PHP and build your own

If you choose option 1, PM me or find someone on the various freelance sites on the web.

For option 2, here are two sites that I recommend to get started (I apologize because these links already exist somewhere in the forum, but I can't find the threads).
VERY BASIC info on scraping:
Basic PHP Web Scraping Script Tutorial - Oooff.com

Tutorial on the basic syntax you will need to make a decent scraper:
Tizag Tutorials
Go to the PHP tutorial section.

emp · Feb 12, 2008

Anyone in this business refusing to learn a bit of scripting has a splintery broomstick coming for them.

No, no lube.

::emp::

Stanley · Feb 12, 2008

emp said:
Anyone in this business refusing to learn a bit of scripting has a splintery broomstick coming for them.

No, no lube.

::emp::

That's a common misconception. I've got this program called "Download Shit to your Brain v4.2", it does all the work for me.

nis · Feb 12, 2008

I call that program TV.

thedamian · Feb 12, 2008

ubaidabcd said:
I don't know any programming so don't know how I could put that script to use.

<RANT>

Actually, it looks like you're experiencing the problem MOST non-technical people have when trying to get help from a programmer.

If you have zero programming experience, I don't see how a "screen scraper" will help you.

Usually they're built to get information off a website, but normally so it can be used by a program, or inserted into a database or an Excel spreadsheet or something.

Normally when talking to a programmer you have to present the "problem" not what you think is the "solution" because programmers are used to answering the question asked.

If you present your REAL problem we may be able to give you solutions like

- Print it (no seriously, sometimes this DOES solve a problem like, you want to see what it looked like last week)

- Copy and paste (yes. If you highlight sections of the page they WILL copy to word, or notepad (removing formatting) )

Remember, programmers are just as stupid as everyone else and sometimes have their mind-reading skills turned off.

That's why you have project managers like me. Our mind-reading skills are turned on, and the few of us who don't have them turned on know how to ask questions ;-)

</RANT>

emp · Feb 12, 2008

LOL

::emp:: <-- project manager

spazdr8cr · Feb 12, 2008

i second the recommendation on the oooff tutorials ... learn some scripting and you will become all-powerful

ubaidabcd · Feb 12, 2008

I'm not at the point where I need to learn scripting. I have enough on my plate as it is. I can outsource code dirt cheap. My time is more valuable right now.

I wasn't going to fully explain what I needed the scraper for publicly like this. Anybody who was interested in doing the work has already PM'd me.

nickycakes · Feb 12, 2008

Scraping Websites for Fun and Profit | NickyCakes.com

eliquid · Feb 12, 2008

dubber609 said:
The problem with scrapers is that they need to be updated every time the site structure changes. If a good scraper is released publicly, the sites being scraped will quickly change their structure and the scraper becomes obsolete.

If you want a scraper, IMHO there are 2 options:
1 - Hire someone (I'll build you a simple scraper with no cookies or CAPTCHA problems quickly and cheaply)
2 - Learn PHP and build your own

If you choose option 1, PM me or find someone on the various freelance sites on the web.

For option 2, here are two sites that I recommend to get started (I apologize because these links already exist somewhere in the forum, but I can't find the threads).
VERY BASIC info on scraping:
Basic PHP Web Scraping Script Tutorial - Oooff.com

Tutorial on the basic syntax you will need to make a decent scraper:
Tizag Tutorials
Go to the PHP tutorial section.

Wrong to some degree, dude. I have a scraper, and it doesn't matter what site it is scraping or whether the site changes the "format" of its HTML. It's in PHP too.

Basically, all you need to do is get the URL (or URLs) and cURL the page you want. Then you store all of it in a string and strip out everything that isn't a container: tags like <span>, plus JavaScript, CSS, etc. Then you look at your container tags like <div>, <p>, <ul>, etc., and that's the info you need. Simply replace or remove the containers themselves and you are left with the "root" or "main" content of the page every time.

There are a few more things you can do to clean it up with some homegrown filters, but none of this relies on the HTML architecture of the page itself, even when it changes.
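
A rough sketch of the approach described above, for anyone who wants to see it in code. This is not eliquid's actual script. The function name extract_text_blocks(), the list of container tags, and the sample markup are all assumptions made for illustration, and it works on an HTML string that has already been fetched.

```php
<?php
// Hypothetical helper (not the poster's script): reduce raw HTML to a list of text blocks.
function extract_text_blocks(string $html): array
{
    // 1. Remove script and style elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', '', $html);
    // 2. Turn every opening or closing container tag into a line break.
    $containers = 'div|p|ul|ol|li|table|tr|td|th|h[1-6]|br';
    $html = preg_replace('#</?(?:' . $containers . ')\b[^>]*>#i', "\n", $html);
    // 3. Strip whatever tags remain (span, a, b, ...), keeping their text.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // 4. Collapse whitespace and drop empty blocks.
    $blocks = [];
    foreach (explode("\n", $text) as $line) {
        $line = trim(preg_replace('/\s+/', ' ', $line));
        if ($line !== '') {
            $blocks[] = $line;
        }
    }
    return $blocks;
}

// Made-up sample page.
$sample = '<div><style>p{color:red}</style><p>Hello <span>world</span></p>'
        . '<script>alert(1)</script><ul><li>One</li><li>Two</li></ul></div>';
print_r(extract_text_blocks($sample)); // three blocks: "Hello world", "One", "Two"
```

Nothing here depends on where a given <div> or <ul> sits in the page, which is the point being made: the output is every block of text, not one particular piece of data.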

dubber609 · Feb 12, 2008

Yeah, I guess I'm partially wrong. You could just make a generic scraper to clean a site of tags, but I don't really see any practical use for it. The scripts I build typically do some hardcore scraping and parsing to pull off very specific pieces of data. If I'm looking for specific URLs on a page and not every URL, then I have to look for unique start and end strings.
'/<a href(.+?)\/a>/' just doesn't work for me; I need something else.

So I look for container tags as you stated. However, if the particular URL I'm looking for is now placed in a table cell instead of an ordered list, then I have to update my script accordingly.

If you are scraping indiscriminately then yes, your scraper will work for most sites on the net. However, the more targeted the script is, the more likely a change in site structure will mess it up.
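
To make the "unique start and end strings" idea concrete, here is a minimal sketch. The helper name between(), the class="dl" marker, and both sample pages are invented for illustration. It also shows the failure mode described above: once the link moves from a list item into a table cell, the start string no longer matches.

```php
<?php
// Hypothetical helper: return the text between the first $start marker and the next $end marker.
function between(string $haystack, string $start, string $end): ?string
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return null; // start marker missing: the page layout has probably changed
    }
    $from += strlen($start);
    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return null; // end marker missing
    }
    return substr($haystack, $from, $to - $from);
}

// Made-up page: the wanted link sits in a list item.
$page  = '<ol><li class="dl"><a href="http://example.com/file.zip">Get it</a></li></ol>';
// Same link after a made-up redesign that moved it into a table cell.
$moved = '<table><tr><td class="dl"><a href="http://example.com/file.zip">Get it</a></td></tr></table>';

$start = '<li class="dl"><a href="';
var_dump(between($page, $start, '"'));  // string: http://example.com/file.zip
var_dump(between($moved, $start, '"')); // NULL, so the script needs updating
```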

xmcp123 · Feb 12, 2008

dubber609 said:
Yeah, I guess I'm partially wrong. You could just make a generic scraper to clean a site of tags, but I don't really see any practical use for it. The scripts I build typically do some hardcore scraping and parsing to pull off very specific pieces of data. If I'm looking for specific URLs on a page and not every URL, then I have to look for unique start and end strings.
'/<a href(.+?)\/a>/' just doesn't work for me; I need something else.

So I look for container tags as you stated. However, if the particular URL I'm looking for is now placed in a table cell instead of an ordered list, then I have to update my script accordingly.

If you are scraping indiscriminately then yes, your scraper will work for most sites on the net. However, the more targeted the script is, the more likely a change in site structure will mess it up.

Very true. It is relatively easy, though, to write a regex or something similar that can deal with the occasional list or odd formatting without modification...
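
One way such a regex might look, as a sketch only. The function name find_download_links(), the class="dl" marker, and the sample markup are assumptions for illustration. The pattern accepts the link whether its parent is an <li> or a <td>, so the list-to-table move from the quoted post would not break it.

```php
<?php
// Hypothetical helper: collect hrefs of links that sit directly inside an
// <li class="dl"> or a <td class="dl">, whichever the site currently uses.
function find_download_links(string $html): array
{
    $pattern = '#<(?:li|td)\b[^>]*\bclass="dl"[^>]*>\s*<a\b[^>]*\bhref="([^"]+)"#i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // one entry per captured href, empty array if none
}

// Made-up markup: the same link in a list layout and in a table layout.
$listLayout  = '<ol><li class="dl"><a href="http://example.com/file.zip">Get it</a></li></ol>';
$tableLayout = '<table><tr><td class="dl"><a href="http://example.com/file.zip">Get it</a></td></tr></table>';

print_r(find_download_links($listLayout));
print_r(find_download_links($tableLayout));
```

Both calls return the same single URL. A bigger redesign (a renamed class, a link wrapped in extra tags) would still need a script update, so this only softens the maintenance problem.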

LotsOfZeros · Feb 13, 2008

textpipe works great

thedamian · Feb 17, 2008

xmcp123, you already came up with some fantastic code.
That does some very good work if all you need is to see the site with all its formatting stripped off (which essentially is the only thing a "generic" scraper can do).

But ubaidabcd is of course correct in saying he doesn't have time to learn. If you don't have time to learn, though, just be prepared to trust the people you're paying to do the dirty work, and get people that you can trust (and who know what they're doing).

Good luck.
