Export YellowPages Data to CSV, 100% free

Status
Not open for further replies.

HBZSoftware.com

New member
Jul 11, 2008
http://www.wpreviewengine.com/yp.zip

It is built in PHP (run it on your server) and uses XPath to scrape the data, which means it's more reliable than anything that uses regex - but it also means your PHP install needs XPath support (most hosts have it). WF's very own LegitPHP (a kickass programmer) wrote the XPath queries; I wrote the rest of it.

If you're building local directories this is just what you need, and it's hard to beat free.

You can import this data directly into WPReviewEngine using the CSV import feature. I wasn't going to give this away for no reason at all... :D

edit: just a note... you do need to chmod the directory you're putting this in to 0777, otherwise it won't be able to save the CSV files it generates
 


I wrote a YP scraper a long time ago, but I'm looking forward to checking out the code just to see what I can learn from it.
 
I knew this was coming. Good job, Louis. This is an excellent add-on for the review plugin.
 
Can you search by industry, zip, city, etc?

Yes. This can grab anything YellowPages.com's business section will display.

Just go to YellowPages.com, and in the "Find a business" section, do ANY search with ANY parameters. Refine it as you wish on the results page with the distance, category, etc. links. Paste the URL of the final page with the results you want into the exporter.
 
I wrote a YP scraper a long time ago, but I'm looking forward to checking out the code just to see what I can learn from it.

Did you use regex? If so, definitely check out the code. XPath is far superior, and it's finally become standard now that PHP 5 has essentially been adopted everywhere. It's basically SQL for XML documents. If you're writing any scraper, XPath is the way to go.
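
To make the "SQL for XML documents" comparison concrete, here is a minimal sketch. It is written in Python rather than the exporter's PHP, and it is not the exporter's actual code (those queries aren't shown in this thread). The markup, the class names, and the `extract_listings` helper are all made up for illustration and are not YellowPages.com's real page structure. It relies only on the XPath subset in Python's standard `xml.etree.ElementTree`, so it assumes well-formed markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a results page.
SAMPLE = """
<html><body>
  <div class="listing">
    <h2 class="name">Acme Plumbing</h2>
    <span class="phone">(555) 010-0001</span>
  </div>
  <div class="ad">Sponsored</div>
  <div class="listing">
    <h2 class="name">Best Bakery</h2>
    <span class="phone">(555) 010-0002</span>
  </div>
</body></html>
"""


def extract_listings(markup):
    """Return (name, phone) tuples for every <div class="listing">."""
    root = ET.fromstring(markup.strip())
    rows = []
    # Select by structure and attribute instead of matching raw text.
    for div in root.findall(".//div[@class='listing']"):
        name = div.findtext("h2[@class='name']", default="").strip()
        phone = div.findtext("span[@class='phone']", default="").strip()
        rows.append((name, phone))
    return rows


listings = extract_listings(SAMPLE)
for name, phone in listings:
    print(name, phone)
```

The query picks out the two listing blocks and skips the ad block because it selects on element and attribute, not on surrounding text. A regex written against the raw HTML tends to break when whitespace or attribute order changes.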
 
Uploaded the files into a subdomain.

The script loads if the directory permissions are 0755, but when I chmod to 0777, as per the directions, I get:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request.

XPath is enabled on the server. Any ideas?
 
I wrote a YP scraper a long time ago, but I'm looking forward to checking out the code just to see what I can learn from it.

I wrote one a while ago too for dabbling in niche sites, I'll have to see how they compare.

Oh yeah, and XPath? Pussies, j/k
 
You're running the latest cPanel. Check your error_log from within cPanel; it'll be reporting SoftPermissionException errors. This is because, for some godforsaken reason, cPanel thinks it's smarter than web developers and decides not to let scripts run in writable directories (*).

My recommendation would be for either you or Louis to hack the script to output to a subdirectory, so that you can chmod that subdirectory to be writable.

However, try chmodding to u=rwx,g=rw,o=rw instead of rwxrwxrwx. Ignore HBZ's recommendation of 0777; it's probably unnecessary. Or even try just owner writable, no one else.


(*) I understand the security risks, when they exist, and when they don't.
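
The subdirectory idea can be sketched roughly as follows. This is a Python illustration, not a patch to the PHP exporter. The `save_csv` helper, the `output` directory name, and the column names are all hypothetical. It also assumes the script runs as the owner of the directory, so that owner-only write access (0755) is enough; on a host where scripts run as a different user, that assumption will not hold.

```python
import csv
import os
import tempfile


def save_csv(base_dir, rows, subdir="output", filename="listings.csv"):
    """Write rows to base_dir/subdir/filename, creating subdir if needed."""
    out_dir = os.path.join(base_dir, subdir)
    os.makedirs(out_dir, exist_ok=True)
    # Only the output subdirectory needs to be writable; the directory
    # holding the script itself can keep stricter permissions.
    # Assumption: the script runs as the directory's owner.
    os.chmod(out_dir, 0o755)
    path = os.path.join(out_dir, filename)
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["name", "phone"])  # hypothetical columns
        writer.writerows(rows)
    return path


# Demo in a throwaway directory so nothing real is touched.
base = tempfile.mkdtemp()
csv_path = save_csv(base, [("Acme Plumbing", "(555) 010-0001")])
print(csv_path)
```

Keeping the generated files in their own directory means any loosened permissions apply only to the place that holds data, not to the place that holds executable code.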
 
Did you use regex? If so, definitely check out the code. XPath is far superior, and it's finally become standard now that PHP 5 has essentially been adopted everywhere. It's basically SQL for XML documents. If you're writing any scraper, XPath is the way to go.
Yeah, I usually use regex. I tried XPath back when I first wrote it (over a year ago), but the documentation I could find was contradictory and I ended up abandoning XPath out of frustration. I still haven't looked at this code yet, but I will.
 
You're probably wrong (although I haven't looked at the script directly). Typically when you have write issues it's because the folder isn't owned by apache, which forces you to chmod it 777 (a huge error).

You need to chown (change owner) to that of apache, which in itself is risky but much less risky than chmod 777'ing it. If it's just writing to the folder you can safely chmod it 644 and then not worry about arbitrary code being run.

That's my 2 cents, but keep in mind that chown needs root access.
 
Thanks, but no luck:
755 - Generates empty CSVs
766 - Forbidden - You don't have permission to access / on this server.
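
For reference, the octal modes mentioned in this thread can be decoded with a short Python sketch (the helper names are made up for the example). On a directory, the x bit is the search permission needed to enter it or reach files inside it. 766 gives group and others read/write but no search permission, which would be consistent with the Forbidden response if the web server reaches the directory as a user other than its owner. That part is an assumption that can't be confirmed from the thread.

```python
import stat


def describe_dir_mode(mode):
    """Render an octal permission value as ls-style text for a directory."""
    return stat.filemode(stat.S_IFDIR | mode)


def others_can_enter(mode):
    """True if users other than owner/group have the search (x) bit."""
    return bool(mode & stat.S_IXOTH)


for mode in (0o755, 0o766, 0o777, 0o644):
    print(oct(mode), describe_dir_mode(mode), others_can_enter(mode))
```

This prints `drwxr-xr-x` for 755, `drwxrw-rw-` for 766, `drwxrwxrwx` for 777, and `drw-r--r--` for 644. Note that 644 has no search bit for anyone, the owner included.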
 
Argh, don't have root access on the particular host I'm testing this on. Will try contacting support without going into specifics as I'm sure they wouldn't be happy about running the script on their servers...
 
HBZSoftware.com is the shit!

I bought his review plugin and have installed it on over 30 sites (both my SEO clients' sites and my own).

I'll be sure to download this little baby and give it a run :)
 