In broad terms: say there is a specific PDF file hosted on company websites, and I want to collect that file for as many businesses in the U.S. as possible.
What is the best way to do this?
My thoughts:
1. Create a master list of businesses by cross-referencing various directories.
2. Extract each company's name, address, phone number, and website URL.
3. Have Mechanical Turk workers go through the list of URLs and paste either the exact file location (www.joesbiz.com/thefile.pdf) or the URL of the page that has the information I'm looking for (www.joesbiz.com/thefilepage.html).
4. Download the file, or take a screenshot of the page and convert it into a PDF, stripping out the extra parts.
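The directory merge (steps 1 and 2) and the download (step 4) are the parts that are easiest to automate. Below is a minimal Python sketch, assuming each directory export is a list of dicts with hypothetical `name` and `url` keys, and that the workers return a direct file URL. It dedupes businesses by normalized domain and saves a download only if the bytes actually look like a PDF. Pages that merely display the information (the `thefilepage.html` case) would still need a separate screenshot or print-to-PDF step, which this does not cover.

```python
import os
import urllib.request
from urllib.parse import urlparse


def normalize_domain(url: str) -> str:
    """Reduce a site URL to a bare lowercase host so directory entries can be matched."""
    if "://" not in url:
        url = "http://" + url  # directories often list sites without a scheme
    host = urlparse(url).netloc.lower()
    host = host.split(":")[0]  # drop any port number
    if host.startswith("www."):
        host = host[4:]
    return host


def merge_directories(*directories):
    """Merge several directory exports, keeping the first record seen for each domain."""
    merged = {}
    for directory in directories:
        for record in directory:
            merged.setdefault(normalize_domain(record["url"]), record)
    return list(merged.values())


def is_pdf(data: bytes) -> bool:
    """Most PDF files begin with the '%PDF-' header; HTML error pages do not."""
    return data.startswith(b"%PDF-")


def download_pdf(url: str, dest_dir: str, timeout: float = 30.0):
    """Fetch a worker-supplied file URL; save it as <domain>.pdf only if it is a real PDF.

    Returns the saved path, or None if the response was not a PDF.
    Note: two files from the same domain would overwrite each other.
    """
    os.makedirs(dest_dir, exist_ok=True)
    if "://" not in url:
        url = "http://" + url
    req = urllib.request.Request(url, headers={"User-Agent": "pdf-collector/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    if not is_pdf(data):
        return None
    path = os.path.join(dest_dir, normalize_domain(url) + ".pdf")
    with open(path, "wb") as f:
        f.write(data)
    return path
```

Keying on the domain rather than the company name avoids most duplicates caused by spelling differences between directories ("Joe's Biz" vs. "Joes Biz LLC"). Checking the file header catches the common case where a link has gone stale and the server returns an HTML page instead of the PDF.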
Apologies in advance for speaking in generic terms.