Web Scraping New Guy with a Question

Status
Not open for further replies.

mattgatten

New member
May 1, 2008
<Flame-Proof-Undies>

Hi guys, I just found this site today, and I gotta say, it's right up my alley. Wow! So much to see and learn.

Here's my deal: I've been a web programmer for about the past 15 years. (Notice I didn't say 'web designer'.) I have absolutely horrible creative skills, so I stay away from that part of it.

I'm currently a Sr. Software Engineer at a big company and spend most of my days at work writing shell scripts, Java, and JavaScript 'til I'm blue in the face.

On the side, I scrape web pages for fun. Actually, I've applied my scraping skills at work to harvest metrics from elaborate websites for statistical reporting, capturing user clicks, etc.

I've written a few 'private pieces' just for kicks. I have one hunk of code that is a client-based web scraper. That is, it uses NO server-side scripting. No PHP, JSP, ASP, or any of that shit. It's purely client-based. However, it runs into the whole cross-browser compatibility issue, and I didn't want to spend the time to 'browser-sniff/proof it'. I've run across some new technology that should let me code it once and have it perform equally well outside of any browser, on any of the major operating systems. The problem is, I know exactly 'dick' about gauging interest in such things.

In its current form, it's an eBay scraper. I have run modified versions of it that harvested hundreds of thousands of completed auctions in a 24-hour period. I did this merely to figure out the best time to end an auction. As it turns out, it's Mondays between 5 and 7 pm, and the final price can vary by over 30% for the really hot items.
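For anyone curious how that kind of "best time to end" analysis works, here's a minimal Python sketch. It assumes you've already scraped completed auctions for a single kind of item into (end time, final price) pairs; the data shape and function names are hypothetical, not eBay's format and not the original scraper's code. It just buckets auctions by weekday and hour and picks the bucket with the highest average price.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Fixed English names so the result doesn't depend on the system locale.
DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday")

def average_price_by_slot(auctions):
    """Group (end_time, final_price) pairs by the weekday and hour the
    auction ended, and return the mean final price for each slot."""
    buckets = defaultdict(list)
    for end_time, price in auctions:
        slot = (DAY_NAMES[end_time.weekday()], end_time.hour)
        buckets[slot].append(price)
    return {slot: mean(prices) for slot, prices in buckets.items()}

def best_slot(auctions):
    """Return the (weekday, hour) slot with the highest average final price."""
    averages = average_price_by_slot(auctions)
    return max(averages, key=averages.get)

if __name__ == "__main__":
    # Hypothetical sample data, not real eBay results.
    completed = [
        (datetime(2008, 4, 28, 17, 30), 210.0),  # Monday, 5 pm hour
        (datetime(2008, 4, 28, 18, 10), 205.0),  # Monday, 6 pm hour
        (datetime(2008, 5, 2, 4, 15), 150.0),    # Friday, 4 am hour
        (datetime(2008, 4, 30, 12, 0), 180.0),   # Wednesday, noon
    ]
    print(best_slot(completed))  # ('Monday', 17)
```

With hundreds of thousands of auctions you'd want to run this per item (or per category), since mixing cheap and expensive items in one bucket skews the averages.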

Based on your vast experience, could I 'monetize' something like this (or my skills) to some degree? While I normally use this stuff for legitimate reasons, I was recently asked to scrape the web for realtor information, and I sold that to a mortgage broker. I gotta say, I liked it. Evil? Maybe. Fun? Fuck yeah! I want more.

Any advice? Sorry this is so long!

Thanks in advance,
Matt

</Flame-Proof-Undies>
 


There are definitely ways to monetize good scraping skills. The most straightforward way is to find sources of information that are dispersed or poorly presented and pull them together into a clean web site that seamlessly incorporates advertising and affiliate offers. The million dollar question is what types of information will generate income and how you can get traffic to that site (search engine traffic, etc.).

Look at Zillow.com as an example. Most of the information on that site is buried somewhere on county government web sites that people can't be bothered to look up and use. When you bring that data together in one place, sprinkle in a nice GUI and some pricing estimates, you suddenly have a profitable web site.
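The "bring that data together in one place" step is mostly a join on some shared key. Here's a rough Python sketch, assuming each source (say, an assessor's roll and a separate sales-history page) has already been scraped into a list of dicts that share a hypothetical `parcel_id` field. This is only an illustration of the idea, not how Zillow actually does it.

```python
def merge_records(*sources):
    """Combine records about the same property from several sources.

    Each source is a list of dicts sharing a hypothetical 'parcel_id' key.
    Earlier sources win: later ones only fill in fields that are still
    missing, and empty values (None or "") are ignored."""
    merged = {}
    for source in sources:
        for record in source:
            # Normalize the join key so " 12-345 " and "12-345" line up.
            key = str(record["parcel_id"]).strip().upper()
            target = merged.setdefault(key, {"parcel_id": key})
            for field, value in record.items():
                if value in (None, "") or field in target:
                    continue
                target[field] = value
    return merged
```

Letting the first source win is just one policy; if one county site is known to be more current, you'd pass it first or add an explicit priority.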

There's also a good thread in the Affiliate Marketing Forum called "Easy Way To Make 100k/Year" that talks about scraping YellowPages.com to make money.
 
Monetizing the information that you scrape would probably be much better than trying to monetize your skills, since it sounds like you already have a job.

Just get with one of the designers at your place of business and get them to build the front end while you tackle the backend and throw some sites up. Check the log files to see what kind of searches are coming in and then exploit them.
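Pulling the incoming searches out of the log files can be as simple as the Python sketch below. It assumes an Apache-style common/combined access log and a site search form that submits its terms in a query parameter named `q`; both are assumptions, so swap in whatever your own site actually uses.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Matches the quoted request in a common/combined log line,
# e.g. "GET /search?q=used+ipod HTTP/1.1"
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+"')

def top_searches(log_lines, param="q", n=10):
    """Count the values of a search query parameter across log lines
    and return the n most common terms (lowercased, trimmed)."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        # parse_qs decodes '+' and %XX escapes in the query string.
        query = parse_qs(urlsplit(match.group(1)).query)
        for term in query.get(param, []):
            term = term.strip().lower()
            if term:
                counts[term] += 1
    return counts.most_common(n)
```

Feed it `open("access.log")` (or whatever your log is called) and the top of the list tells you which pages to build next.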

No need for flame-proof panties with that as your 1st post.

BTW: is the Monday 5-7 thing when most of the auctions end, or is that when identical items go for the maximum value?
 
Maximum value. The worst time is actually Friday nights between 4 and 5 am. As you can imagine, time of day accounts for the biggest variation: 30-40% in some cases. Day of week contributes 5-6%. On some items, I saw a consistent 45% difference. The number I listed above was an overall average across many items.
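To compare how much each factor matters (time of day vs. day of week), one way is to group by a single factor at a time and measure the gap between the best and worst group. A minimal Python sketch, assuming the same hypothetical (end time, final price) pairs as above; the post doesn't say which baseline its percentages used, so this version measures the gap relative to the lower average.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

def spread_percent(groups):
    """Percent gap between the highest and lowest group averages,
    measured against the lowest average."""
    averages = [mean(prices) for prices in groups.values() if prices]
    low, high = min(averages), max(averages)
    if low <= 0:
        raise ValueError("prices must be positive")
    return (high - low) * 100.0 / low

def factor_spread(auctions, key_func):
    """Group (end_time, price) pairs by key_func(end_time), e.g. hour of
    day or day of week, and return the spread between best and worst."""
    groups = defaultdict(list)
    for end_time, price in auctions:
        groups[key_func(end_time)].append(price)
    return spread_percent(groups)

if __name__ == "__main__":
    # Hypothetical data for one item.
    completed = [
        (datetime(2008, 4, 28, 17), 130.0),  # Monday, 5 pm
        (datetime(2008, 4, 29, 17), 130.0),  # Tuesday, 5 pm
        (datetime(2008, 4, 28, 4), 100.0),   # Monday, 4 am
        (datetime(2008, 4, 29, 4), 100.0),   # Tuesday, 4 am
    ]
    print(factor_spread(completed, lambda t: t.hour))       # 30.0
    print(factor_spread(completed, lambda t: t.weekday()))  # 0.0
```

Running it per item, rather than across everything at once, is what lets you see the item-specific swings (like the consistent 45% ones) separately from the overall average.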

Did you also know that when I scraped searches for the name Apple, I was seeing items expire (auction end) at a rate of 2 or 3 per second? I couldn't understand why I was getting duplicate item numbers in my scrape; that was the reason. :) Wow! They see some 'hella-fied' traffic.
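When the result set changes while you're paging through it, page boundaries shift and the same listing can show up on two pages (or get skipped). Skips need a re-crawl, but duplicates are easy to drop by keying on the item number. A minimal Python sketch, assuming each scraped page is a list of dicts with a hypothetical `item_number` field:

```python
def dedupe_by_item(pages, key="item_number"):
    """Yield each listing once, even if it appears on more than one page.

    'item_number' is a hypothetical field name; use whatever unique ID
    your scraper pulls out of each listing."""
    seen = set()
    for page in pages:
        for listing in page:
            item_id = listing[key]
            if item_id in seen:
                continue  # already captured from an earlier page
            seen.add(item_id)
            yield listing
```

Because it's a generator, it works on pages as they come in, so you don't have to hold the whole crawl in memory before filtering.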

If you guys want to see a video of a basic version of my scraper, just shoot me a PM or e-mail. I posted a reply on this thread with a link to it, but I don't think the mods approved it. At least not yet.

Thanks for NOT killing the 'shithead', guys!
 
PS: I wouldn't mind dabbling in a collaborative project with someone who needs data scraped from somewhere. If not a project, give me some tips on what would be good to scrape. I mean, I've seen the YellowPages scrape script for sale. Are there any other sources where folks would pay for CSV files with massive volumes of data? Just curious.

Thanks guys,
Y'all fuckin' rock!
 
Depending on how good and easy to use your software is, I'm sure there are plenty of folks who would buy it, for that matter.
 
I haven't actually thought about that angle too much, but it could be a possibility. I have written 'fully client-side' applications that output raw HTML to the browser (you can see it in the video) and also to local filesystems and databases.

I've been scraping for about 2 years now for my own amusement, and to be honest, I haven't seen too many sites that I couldn't harvest. I've actually considered writing one that lets the user select what gets scraped, provides a sample data preview, then outputs the entire data set to a CSV or some other 'transportable' format.
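The select/preview/export flow described above maps neatly onto Python's standard `csv` module. A minimal sketch, assuming the scraper has already produced a list of dicts; the function names and field names are hypothetical:

```python
import csv
import io

def preview(records, fields, rows=3):
    """Return the first few records trimmed to the user's chosen fields."""
    return [{f: r.get(f, "") for f in fields} for r in records[:rows]]

def export_csv(records, fields, out):
    """Write only the selected fields to a CSV file-like object.

    Fields the user didn't pick are dropped (extrasaction="ignore");
    fields missing from a record are written as empty strings."""
    writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        writer.writerow(record)

if __name__ == "__main__":
    scraped = [
        {"title": "iPod nano", "price": "120.00", "seller": "abc"},
        {"title": "Zune", "price": "80.00"},
    ]
    chosen = ["title", "price"]
    print(preview(scraped, chosen, rows=1))
    buf = io.StringIO()
    export_csv(scraped, chosen, buf)
    print(buf.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""` as the `csv` docs recommend, otherwise you can get blank lines between rows on Windows.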

The reason I stick with client-side is that any internet connection will work. There's no reason to get your own ISP, hosting provider, etc. involved, or pissed off because you're running some 'shady' operation on a server. My thoughts were, I could pull up outside of a Panera or Starbucks or poorly secured wireless network and let it rip while drinking a cup of coffee. hahaha
 