Scraping Businesses and downloading a file

Pb.com

New member
Mar 22, 2011
In broad terms, let's say I want a specific pdf file that is hosted on a company's site and I want to get this file for as many businesses as possible in the U.S.

What is the best way to do this?

My thoughts:
1. Create a master list of businesses (cross reference various directories)
2. Extract Name of company, address, phone, site URL
3. Have Mechanical Turk workers go through the list of URLs and paste the URL of the exact file location (www.joesbiz.com/thefile.pdf) or of the page that holds the information I'm looking for (www.joesbiz.com/thefilepage.html)
4. Download the file, or take a screenshot of the page and convert it into a PDF, stripping out the extra parts.

I apologize ahead of time for speaking in generic terms.
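For steps 1 and 2, cross-referencing directories mostly comes down to picking a key that identifies the same business across sources. A rough sketch, assuming each directory has already been exported as a list of dicts with "name", "address", "phone" and "url" fields (those field names, and both function names, are made up for illustration), could key everything on the site's domain:

```python
from urllib.parse import urlparse


def normalize_domain(url):
    """Reduce a site URL to a bare lowercase host, e.g. 'joesbiz.com'."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to locate the host
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def merge_directories(*sources):
    """Combine records from several directories into one entry per domain.

    Assumes every record is a dict with a "url" key (hypothetical schema).
    """
    master = {}
    for records in sources:
        for rec in records:
            key = normalize_domain(rec["url"])
            entry = master.setdefault(key, {"url": key})
            for field in ("name", "address", "phone"):
                # Keep the first non-empty value seen for each field.
                if rec.get(field) and not entry.get(field):
                    entry[field] = rec[field]
    return master
```

Keying on the domain is only one option. Businesses with no site, or several businesses sharing one site, would need a fallback key such as a normalized phone number.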
 


You'd want to either write a scraper or use a tool that does this.
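To give an idea of what a scraper for steps 3 and 4 might look like, here is a minimal sketch using only the Python standard library. It assumes the file is reachable through an ordinary link ending in .pdf on a page you already have the HTML for; the names LinkCollector, find_pdf_links and download are invented for this example, and sites that build their links with JavaScript would need a different approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def find_pdf_links(html, base_url):
    """Return absolute URLs of links whose path ends in .pdf, in page order."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links
        is_pdf = urlparse(absolute).path.lower().endswith(".pdf")
        if is_pdf and absolute not in links:
            links.append(absolute)
    return links


def download(url, dest_path, timeout=30):
    """Fetch a URL and write the raw bytes to dest_path."""
    with urlopen(url, timeout=timeout) as resp, open(dest_path, "wb") as out:
        out.write(resp.read())
```

In use, you would fetch each business's page, pass the HTML to find_pdf_links, and call download on whichever result matches the file you are after. Pages where nothing turns up are the ones worth handing to a human.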
 
If you need company data from different sources like Yelp, Yellowpages, LinkedIn Companies, BBB, etc., I can help you; don't reinvent the wheel.
I have already coded a tool and am constantly adding more features.