Hola boys and girls. I'm looking for a seasoned PHP programmer who can write and provide support for a multi-search engine scraper.
These are not major search engines, they are for a specific vertical.
I need a script that will do the following:
1. I enter a search term, e.g. "Tom Cruise".
2. The script searches each search engine for "Tom Cruise" and stores all the information found (link titles, URLs, descriptions, etc.) in a database.
3. The script quarantines new results and gives me a screen where I can log in and review everything new that has been scraped. From this screen I can edit, delete, or publish these new entries. This can be as simple as a binary field in the database that flags whether or not the entry has been reviewed and approved.
4. The script will then automatically revisit the search results weekly to find any new results that may have appeared since the last scrape. It will dupe-check them against what is already in the database, and if they're new, it will put them in the quarantine.
5. Once a week, the script will automatically revisit the URLs from the search results to make sure they aren't returning a 404, a 301/302 redirect, a 500, etc. If any are, it will alert me.
6. The script will simulate the behavior of a real user, meaning spoofing the referer, user agent, random timeouts, etc.
7. I will need ongoing support for the script in case one of the search engines I'll be scraping changes its code and breaks your regex/preg_match.
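For anyone quoting on this, here is a minimal sketch of how items 2 to 4 could fit together: a single table with a UNIQUE constraint on the result URL handles the dupe-check, and an `approved` flag handles the quarantine. It's written in Python with SQLite purely for illustration, and the table, column, and function names are hypothetical. The same SQL carries over to PHP with PDO (on MySQL, `INSERT IGNORE` plays the role of SQLite's `INSERT OR IGNORE`).

```python
import sqlite3

# Hypothetical schema: one row per scraped result; the URL doubles as the dupe-check key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS results (
    id          INTEGER PRIMARY KEY,
    search_term TEXT NOT NULL,
    title       TEXT,
    url         TEXT NOT NULL UNIQUE,
    description TEXT,
    approved    INTEGER NOT NULL DEFAULT 0  -- 0 = quarantined, 1 = published
)
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def store_results(conn, term, results):
    """Insert scraped results; URLs already in the table are skipped. Returns the number of new rows."""
    before = conn.total_changes
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT OR IGNORE INTO results (search_term, title, url, description) "
            "VALUES (?, ?, ?, ?)",
            [(term, r["title"], r["url"], r.get("description")) for r in results],
        )
    return conn.total_changes - before


def quarantined(conn):
    """Everything not yet reviewed, oldest first (what the review screen would list)."""
    return conn.execute(
        "SELECT id, title, url FROM results WHERE approved = 0 ORDER BY id"
    ).fetchall()


def publish(conn, result_id):
    with conn:
        conn.execute("UPDATE results SET approved = 1 WHERE id = ?", (result_id,))


def delete(conn, result_id):
    with conn:
        conn.execute("DELETE FROM results WHERE id = ?", (result_id,))
```

The weekly re-scrape in item 4 would just call `store_results` again; anything it reports as new lands in the quarantine automatically because `approved` defaults to 0.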
Off the top of my head, this is all I'll really need (plus setting up the database). I started writing it myself since it's really quite simple, but with all of my other projects and my full-time job I really don't have time to do this one myself.
We're talking about a scraper, with a review screen, and some cron jobs to check that links are still alive week-to-week.
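The weekly link check (item 5) boils down to requesting each stored URL without following redirects and flagging anything that isn't a 2xx. Below is a rough Python sketch using only the standard library; `fetch_status` and `needs_alert` are hypothetical names, and it assumes HEAD requests are acceptable (some servers answer HEAD with a 405, in which case a GET would be needed). A weekly cron entry would simply loop this over every stored URL.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Refuse to follow redirects so a 301/302 surfaces as its own status code."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def fetch_status(url, timeout=10.0):
    """Hypothetical helper: return the HTTP status for url, or None if the host is unreachable."""
    opener = urllib.request.build_opener(_NoRedirect)
    req = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 3xx (not followed), 4xx, 5xx
        return err.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused connection, timeout
        return None


def needs_alert(status):
    """Alert on anything that isn't a plain 2xx, including unreachable hosts (None)."""
    return status is None or not 200 <= status < 300
```

Usage: `if needs_alert(fetch_status(url)):` then send whatever alert you prefer (email, a flag on the review screen, etc.).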
Looking for quotes and deadlines for this project, as well as credentials. You will be asked to sign a non-compete and NDA. This does involve working with some adult material so if that puts you off your lunch, go hang at disney.com. The rest of you who want to get paid, send me your estimates and your credentials because I'm looking to have this built soon.
Oh, and all the information you need to scrape is on the search results pages... you don't actually need to scrape any of the sites or pages listed in the results. A good PHP programmer should be able to bang this out in a few hours.
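Since everything needed is on the results pages, an HTML parser tends to be less brittle than a raw regex/preg_match when an engine tweaks its markup (the breakage item 7 worries about). A minimal sketch using Python's standard `html.parser`; the `result` class name is a hypothetical placeholder for whatever each engine actually uses, and descriptions would be collected the same way. In PHP, `DOMDocument` with `DOMXPath` would be the equivalent.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect title/URL pairs from <a> tags carrying a given class (hypothetical markup)."""

    def __init__(self, link_class="result"):
        super().__init__()
        self.link_class = link_class
        self.results = []
        self._href = None  # href of the result link currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if self.link_class in classes and a.get("href"):
            self._href = a["href"]
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only keep text inside a result link
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append({"title": "".join(self._text).strip(), "url": self._href})
            self._href = None


def parse_results(html, link_class="result"):
    parser = ResultLinkParser(link_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

If an engine renames its classes, only the `link_class` argument for that engine changes rather than a whole regex.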
Edit: This is also a proprietary script and not something I'm building to sell. This is for my own selfish use.