As part of one of my personal online ventures, I wrote a business directory scraper to gather initial data for my site. The script is written in PHP (using cURL) and driven by a MySQL database of US zipcodes. Using it, I extracted info for ~36,000 stores across the US in the niche category I was interested in.
I'd rather not reveal the business directory site here. However, feel free to PM me and I will send you the name. Hint: the directory is available in the US and UK.
KEY INFO
- Price: $80
- Written in PHP/MySQL
- Uses cURL
- No resell rights
- Does not include customization support
DETAILS
Here is how the script works...
The script is driven by a companion MySQL database of ~39,000 zipcodes. The table includes the following fields: zipcode, city name, state name, latitude, longitude, list of nearby zipcodes, and a flag indicating whether the zipcode has been scraped.
Given a business keyword (e.g. 'chinese restaurant', 'physicians'), the script will...
1) perform a search for each zipcode.
2) scrape the business info (store name, address, city, state, zipcode, phone #) from the first page of results.
3) set the zipcode's flag in the database to mark it as scraped.
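To make the table layout concrete, here is a minimal sketch of a zipcode table with the fields listed above. It is an illustration, not the script's actual schema: the column names are hypothetical, and sqlite3 (in-memory) stands in for MySQL so the example runs on its own. The `scraped` flag is what lets a run pick up where it left off.

```python
import sqlite3

# Hypothetical schema mirroring the fields described above.
# The real script uses MySQL; sqlite3 stands in here so the sketch is self-contained.
SCHEMA = """
CREATE TABLE zipcodes (
    zipcode     TEXT PRIMARY KEY,
    city        TEXT NOT NULL,
    state       TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL,
    nearby_zips TEXT,                    -- e.g. comma-separated list of zipcodes
    scraped     INTEGER NOT NULL DEFAULT 0  -- 0 = not yet scraped, 1 = done
)
"""

def open_db() -> sqlite3.Connection:
    """Create an in-memory database containing the zipcode table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn

def next_unscraped(conn: sqlite3.Connection):
    """Return the next zipcode whose flag is still 0, or None when all are done."""
    row = conn.execute(
        "SELECT zipcode FROM zipcodes WHERE scraped = 0 ORDER BY zipcode LIMIT 1"
    ).fetchone()
    return row[0] if row else None

conn = open_db()
conn.executemany(
    "INSERT INTO zipcodes (zipcode, city, state) VALUES (?, ?, ?)",
    [("10001", "New York", "NY"), ("94105", "San Francisco", "CA")],
)
print(next_unscraped(conn))  # 10001
```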
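The three steps above form a simple flag-driven loop. The sketch below shows that shape in Python, assuming a hypothetical `search_first_page(keyword, zipcode)` callable that performs steps 1 and 2. The directory isn't named, so the real HTTP request and HTML parsing aren't shown; a fake search function stands in for them. This is an illustration of the pattern, not the author's PHP code.

```python
import sqlite3
from typing import Callable, Dict, List

Listing = Dict[str, str]  # store name, address, city, state, zipcode, phone #

def run(conn: sqlite3.Connection, keyword: str,
        search_first_page: Callable[[str, str], List[Listing]]) -> List[Listing]:
    """Search each unscraped zipcode, collect its listings, then set its flag."""
    results: List[Listing] = []
    # Read the pending zipcodes up front so the UPDATEs below don't disturb the cursor.
    pending = [row[0] for row in conn.execute(
        "SELECT zipcode FROM zipcodes WHERE scraped = 0 ORDER BY zipcode")]
    for zipcode in pending:
        listings = search_first_page(keyword, zipcode)  # steps 1 and 2 (hypothetical)
        results.extend(listings)
        conn.execute("UPDATE zipcodes SET scraped = 1 WHERE zipcode = ?", (zipcode,))  # step 3
        conn.commit()  # commit per zipcode so an interrupted run can resume
    return results

# Usage with a stand-in search function instead of a real directory request.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE zipcodes (zipcode TEXT PRIMARY KEY, scraped INTEGER NOT NULL DEFAULT 0)")
conn.executemany("INSERT INTO zipcodes (zipcode) VALUES (?)", [("10001",), ("94105",)])

def fake_search(keyword: str, zipcode: str) -> List[Listing]:
    return [{"name": f"{keyword} near {zipcode}", "zipcode": zipcode}]

found = run(conn, "physicians", fake_search)
print(len(found))  # 2
```

Because the flag is set only after a zipcode's results are collected, rerunning `run` skips finished zipcodes and retries any that were interrupted mid-search.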
As a precaution, I have built in a 10-second delay between zipcode searches and a 20-minute pause after every 1,000 searches. This is because the business directory will block your IP if you run searches faster than a human plausibly could. I learned this through trial and error and had several IPs blocked. No refund will be issued if your IP is blocked. Unfortunately, you will need to find your own workaround for getting a new IP (e.g. continuing from a Starbucks or a friend's house).
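Those delays add up. The sketch below computes the total waiting time for a full pass over ~39,000 zipcodes. It assumes (the post doesn't say) that the 20-minute pause is added on top of the usual 10-second delay at each 1,000-search boundary, and that no pause follows the final search. In a real loop you would call `time.sleep(delay_before(i))` before each search.

```python
PER_SEARCH_DELAY_S = 10   # delay between consecutive zipcode searches
BATCH_SIZE = 1000         # searches per batch
BATCH_PAUSE_S = 20 * 60   # assumed extra 20-minute pause after each full batch

def delay_before(i: int) -> int:
    """Seconds to wait before search number i (0-based); none before the first."""
    if i == 0:
        return 0
    extra = BATCH_PAUSE_S if i % BATCH_SIZE == 0 else 0
    return PER_SEARCH_DELAY_S + extra

def total_wait_seconds(n_searches: int) -> int:
    """Total time spent waiting (not fetching) across n_searches searches."""
    return sum(delay_before(i) for i in range(n_searches))

secs = total_wait_seconds(39_000)
print(secs, round(secs / 3600, 1))  # 435590 121.0
```

Under these assumptions, the delays alone come to about 121 hours (roughly five days) before counting the time each request takes.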
If you are unfamiliar with how scraping works, this may be a great way for you to understand the concepts by dissecting the code. It's a very simple script, only 65 lines including comments.
Please feel free to reply with any questions, except for the business directory name. For the name, again, PM me and I will send it to you.
Thanks.
Dave