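<?php
// Read the target URL from the query string; accept only valid http(s) URLs.
$url = filter_input(INPUT_GET, 'url', FILTER_VALIDATE_URL);
if (!$url || !preg_match('#^https?://#i', $url)) {
    exit('Missing or invalid url parameter');
}
// Get our data (file_get_contents on URLs doesn't work on DreamHost, use cURL there).
$data = file_get_contents($url);
if ($data === false) {
    exit('Could not fetch the page');
}
// Put a line break after every tag so each piece of text lands on its own row.
$data = str_replace(">", ">\r\n", $data);
$data = strip_tags($data);
// Split the data up where there were tags (we need lots of rows of data).
$spl = explode("\r\n", $data);
foreach ($spl as $row) {
    // your function to insert into MySQL DB here
}
?>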
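Since the comment in the script says to use cURL on hosts like DreamHost, here is a rough sketch of what that swap might look like. The function names fetch_page() and html_to_rows() are made up for this example, and the timeout and redirect settings are assumptions, not anything the original script specified. The second helper is the same tag-splitting trick, except it also drops blank rows so they never reach the database.

```php
<?php
// Hypothetical helper: fetch a page with cURL instead of file_get_contents().
// Returns the page body, or null if the request failed.
function fetch_page(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects (assumed to be wanted)
        CURLOPT_TIMEOUT        => 15,   // assumed limit: give up after 15 seconds
    ]);
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Hypothetical helper: break after every tag, strip the tags,
// then keep only the rows that still contain text.
function html_to_rows(string $html): array
{
    $html = str_replace('>', ">\r\n", $html);
    $text = strip_tags($html);
    $rows = array_map('trim', explode("\r\n", $text));
    return array_values(array_filter($rows, 'strlen'));
}
```

Usage would be something like $rows = html_to_rows(fetch_page($url) ?? ''); followed by your own loop that inserts each row into MySQL.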
Anyone in this business refusing to learn a bit of scripting has a splintery broomstick coming for them.
No, no lube.
I don't know any programming, so I don't know how I could put that script to use.
The problem with scrapers is that they need to be updated every time the site structure changes. If a good scraper is released publicly, the sites being scraped will quickly change their structure and the scraper becomes obsolete.
If you want a scraper, IMHO there are 2 options:
1 - Hire someone (I'll build you a simple scraper with no cookies or CAPTCHA problems quickly and cheaply)
2 - Learn PHP and build your own
If you choose to take option 1, PM me or find someone on the various freelance sites on the web.
For option 2, here are two sites that I recommend to get started (I apologize because these links already exist somewhere in the forum, but I can't find the threads).
VERY BASIC info on scraping:
Basic PHP Web Scraping Script Tutorial - Oooff.com
Tutorial on the basic syntax you will need to make a decent scraper:
Tizag Tutorials
Go to the PHP tutorial section.
Very true. It is relatively easy, though, to write a regex or something that can deal with the occasional list or odd formatting without modification...
Yeah, I guess I'm partially wrong. You could just make a generic scraper to clean a site of tags, but I don't really see any practical use for it. The scripts I build typically do some hardcore scraping and parsing to pull off very specific pieces of data. If I'm looking for specific URLs on a page and not every URL, then I have to look for unique start and end strings.
'/<a href(.+?)\/a>/' just doesn't work for me; I need something else.
So I look for container tags, as you stated. However, if the particular URL I'm looking for is now placed in a table cell instead of an ordered list, then I have to update my script accordingly.
If you are scraping indiscriminately, then yes, your scraper will work for most sites on the net. However, the more targeted the script is, the more likely a change in site structure will mess it up.
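To make the start/end string idea concrete, here is a minimal sketch. The helper names extract_between() and links_in_container() and the sample markup are invented for illustration; this is not the poster's actual script. It grabs only the links inside one container, and it shows how the same call comes back empty once the links move from an ordered list into a table cell.

```php
<?php
// Hypothetical helper: return the text between the first $start marker
// and the next $end marker, or null when either marker is missing.
function extract_between(string $html, string $start, string $end): ?string
{
    $from = strpos($html, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($html, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($html, $from, $to - $from);
}

// Hypothetical helper: collect href values, but only inside one container.
function links_in_container(string $html, string $start, string $end): array
{
    $block = extract_between($html, $start, $end);
    if ($block === null) {
        return []; // container not found: the site structure probably changed
    }
    preg_match_all('/<a\s[^>]*href="([^"]*)"/i', $block, $m);
    return $m[1];
}

// Made-up sample pages: the old layout uses an ordered list,
// the new layout moves the same link into a table cell.
$old = '<a href="/home">Home</a><ol id="results"><li><a href="/a">A</a></li>'
     . '<li><a href="/b">B</a></li></ol>';
$new = '<a href="/home">Home</a><table><tr><td><a href="/a">A</a></td></tr></table>';

$oldLinks = links_in_container($old, '<ol id="results">', '</ol>'); // finds /a and /b, skips /home
$newLinks = links_in_container($new, '<ol id="results">', '</ol>'); // empty: markers no longer match
```

An empty result is a cheap signal that the markers need updating, which beats silently inserting garbage into the database.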