Anyone know where I'd buy a database dump of EzineArticles?



When you get it coded up, make sure it deals with their changing CSS stylesheets and with IP banning.

They make other frequent changes as well to stop scrapers. I've written half a dozen scrapers over the last year, and they change things so often it's just a pain in the ass.
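One way to soften the blow of that markup churn is to try a list of fallback selectors instead of hard-coding one. A minimal Python sketch of the idea, using the stdlib's limited XPath support (the class names here are made up for illustration, not EzineArticles' actual markup):

```python
import xml.etree.ElementTree as ET

# Hypothetical selector fallbacks, newest markup first. When the site
# renames a class, the scraper tries the next pattern instead of
# silently returning nothing.
SELECTORS = [
    './/div[@class="article-body"]',   # assumed current markup
    './/div[@class="content"]',        # assumed older markup
    './/td[@class="articlebody"]',     # assumed oldest markup
]

def extract_body(html):
    """Return the article text from the first selector that matches,
    or None so the caller can flag the page for a selector update."""
    root = ET.fromstring(html)
    for xpath in SELECTORS:
        hits = root.findall(xpath)
        if hits:
            return "".join(hits[0].itertext()).strip()
    return None
```

When `extract_body` starts returning None, you know the markup changed again and only the `SELECTORS` list needs updating, not the whole scraper.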
 
I wrote a tutorial on it a couple weeks ago using EzineArticles as an example: PHP Tutorial 2: Advanced Data Scraping Using cURL And XPATH | Matthew Watts

It's honestly not hard to do yourself. Just add database functionality and make sure to add an option in the cURL call to route requests through a proxy.
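The tutorial is PHP/cURL, but the proxy idea is the same anywhere. A rough Python equivalent of setting CURLOPT_PROXY, picking a random proxy per request (the proxy addresses are placeholders, not real endpoints):

```python
import random
import urllib.request

# Hypothetical proxy list -- substitute your own.
PROXIES = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]

def make_opener(proxy):
    """Build an opener that routes http/https through one proxy,
    roughly what CURLOPT_PROXY does in the PHP cURL call."""
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]  # look like a browser
    return opener

def fetch(url):
    """Fetch a page through a randomly chosen proxy."""
    return make_opener(random.choice(PROXIES)).open(url, timeout=30).read()
```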

Wish I'd skimmed that before I started. I didn't notice they had a next button on their site, so I ended up spamming Google trying to find all the article URLs.

Did you guys have problems initially, or did you get banned later on? I've hit it ~5k times so far and haven't been banned yet (proxies, obviously).

I didn't get banned; a message just comes up: "Hey, I see you're using ___, no problem, enter the captcha and keep reading." :)

Never thought of scraping the cache, excellent idea!
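For anyone trying the cache route: Google's cached copy of a page lives at a predictable URL, so you can hit Google's servers instead of the live site. A small helper that builds that URL (whether the cached copy actually exists for a given page is up to Google):

```python
from urllib.parse import quote

def cache_url(url):
    """Build the Google cache URL for a page, so requests go to
    Google's cached copy instead of the live site."""
    return ("https://webcache.googleusercontent.com/search?q=cache:"
            + quote(url, safe=""))
```

For example, `cache_url("http://example.com/a")` gives `https://webcache.googleusercontent.com/search?q=cache:http%3A%2F%2Fexample.com%2Fa`. Keep in mind Google rate-limits and captchas aggressive cache scraping too.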
 
This is why automating a browser is infinitely better than botting with cURL. Watir/Celerity to the rescue: run 10 concurrent instances with liberal pauses and a dedicated proxy per instance. EzineArticles isn't going anywhere; it's not a race.
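The "N instances, dedicated proxy each, liberal pauses" scheme sketched above looks something like this in Python. The `scrape` function is a stub; in real use it would drive an actual browser (Watir/Celerity in Ruby, or Selenium in Python) configured to use its worker's proxy. Proxy addresses and pause lengths are placeholders:

```python
import queue
import random
import threading
import time

# Hypothetical: one dedicated proxy per worker instance.
PROXIES = [f"http://127.0.0.1:{8000 + i}" for i in range(10)]

def scrape(url, proxy):
    # Stub -- in practice, drive a real browser configured to use `proxy`.
    return f"{url} via {proxy}"

def worker(urls, proxy, results):
    """Pull URLs off the shared queue, always through this worker's
    own proxy, with a random pause between hits."""
    while True:
        try:
            url = urls.get_nowait()
        except queue.Empty:
            return
        results.append(scrape(url, proxy))
        # Liberal pause -- use seconds or minutes in real use; it's not a race.
        time.sleep(random.uniform(0.01, 0.05))

def run(url_list, proxies=PROXIES):
    urls = queue.Queue()
    for u in url_list:
        urls.put(u)
    results = []  # list.append is atomic under the GIL, so this is safe here
    threads = [threading.Thread(target=worker, args=(urls, p, results))
               for p in proxies]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker keeps the same proxy for its whole lifetime, which matches the "dedicated proxy per instance" advice: from the site's point of view, each IP looks like one slow, patient reader.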