automated site/content scraping service

Status
Not open for further replies.

feydr

New member
Apr 6, 2009
yo everyone, just a heads up that datathug.com is going live shortly and we are looking for a few beta testers to help us tweak the UI to be more user-friendly

a quick rundown on what datathug is:

basically it's an online pay-by-the-page content scraper -- no downloads, point-n-click idiot-proof usage, fast and secure -- we route everything through proxies and take our time so as not to hit the target sites too hard (50k pages per site per day is a typical job) -- yes, we could go multi-threaded and grab an entire site in a few hours, but we don't, since we are trying to stay under the radar -- it will also log in to protected pages for you, so if you want to dupe some super secrets from your competitors there you go

while this tool was mainly developed to grab content it obviously can do data too

say you want to steal the dictionary from investopedia.com or maybe you want all the recipes from cooks.com -- this tool does this

you add a bookmarklet to your toolbar/favorites/whatever, and on whatever page you are on you can activate it and highlight the sections of the page you want -- after that we'll either crawl or use the indexed pages to find all the corresponding pages you need, and with the click of a button your job is being processed

later on you can go through your scraped content and scrub it (to remove any incriminating information ;) or you can do mass updates

let me give you an example:

say we want to dupe an article directory -- we'd select all the meta information and the article

after scraping we'd scrub out the article directory name from the metas and replace it with our content -- then we'd maybe change the links on the authors page to our own offers

after that is done we'd go ahead and export it

export options (as of now) are sql, wordpress articles, csv, etc...

we scrape images as well right now but do not have plans for video/music anytime soon (maybe in the future)

also, the toolbar can select links from a section of a page and add them to your joblist, in case the site doesn't have rss or your target urls are not easily obtainable

also, we have an affiliate program shaping up to promote the site, and affiliates earn 40% on every link scraped by members they sign up

I might try and post some screenshots/videos of it here soon to give you a better idea
 

