Trying to write a proxy scraper...

bsmaat

Jun 19, 2007
Hi,
This is my first experience with scrapers, and my knowledge of PHP isn't great.

I'm trying to write a quick proxy scraper that uses curl to get the IP and port, particularly for this website: Proxy Lists. Sorted by type. List #1

My problem: I can get the proxy IP just fine, but I'm struggling to get the port. In Firebug, I see something like

Code:
<script type="text/javascript">
  document.write(":"+i+w+i+w)
</script>
:8080

It appears that the $rawdata returned by curl does not contain the port. When I echo $rawdata; the port is simply not printed on the screen. I guess this is because curl only fetches the HTML and does not run the JavaScript. Is there any way around this?

Thanks
 


They are using JavaScript to obfuscate the port and stop people like you from scraping it. Either scrape from sites that don't use the JavaScript, or find a way to decode it yourself. From the looks of things they have a dictionary that maps letters to digits (here i to 8 and w to 0). Look around in the source code and try to find this dictionary.
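Once the map is known, decoding is just a substitution. Here is a rough sketch using the i and w values from the snippet you posted. The function name decodePort is made up for illustration, and the map is assumed to have been read off the page by hand.

```php
<?php
// Hypothetical helper: turn an expression like "i+w+i+w" into "8080".
// $map is assumed to hold the page's current letter => digit pairs.
function decodePort(string $expr, array $map): string
{
    $port = '';
    foreach (explode('+', $expr) as $name) {
        $name = trim($name);
        if (!array_key_exists($name, $map)) {
            throw new InvalidArgumentException("Unknown variable: $name");
        }
        $port .= $map[$name];
    }
    return $port;
}

echo decodePort('i+w+i+w', ['i' => 8, 'w' => 0]), "\n"; // 8080
```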
 
From memory, this page rotates the variables too: a given letter is not hardcoded to a particular digit. I was fiddling with this idea before but found it wasn't worth my time in PHP. It's easier to take proxies from other pages in the short term.
 
Didn't see you had a link to the site. Have a look in the head and you will find this; you'll need to parse it in order to decode the proxy port.
Code:
<script type="text/javascript">

y=9;v=8;t=2;r=4;l=5;q=6;s=3;w=1;p=0;x=7;</script>
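If you want to try it in PHP, something along these lines might work. It is only a sketch: it assumes every assignment is a single lowercase letter set to a single digit, and that every port is written as document.write(":"+a+b+c+d) with no extra whitespace. The function names parsePortMap and extractPorts and the sample HTML are made up for illustration. Because the map is rebuilt from whatever page you pass in, it should cope with the letters changing between requests.

```php
<?php
// Build a letter => digit map from a script body such as "y=9;v=8;t=2;".
function parsePortMap(string $html): array
{
    preg_match_all('/\b([a-z])=(\d);/', $html, $m);
    return array_combine($m[1], $m[2]);
}

// Find every document.write(":"+a+b+c) call and decode its variables.
function extractPorts(string $html): array
{
    $map = parsePortMap($html);
    $ports = [];
    preg_match_all('/document\.write\(":"((?:\+[a-z])+)\)/', $html, $m);
    foreach ($m[1] as $expr) {
        // $expr looks like "+v+p+v+p"; drop the plus signs, then substitute.
        $letters = str_replace('+', '', $expr);
        $ports[] = strtr($letters, $map);
    }
    return $ports;
}

// Made-up sample page: the dictionary from the head plus one port cell.
$html = '<script type="text/javascript">y=9;v=8;t=2;r=4;l=5;q=6;s=3;w=1;p=0;x=7;</script>'
      . '<td>192.0.2.1<script type="text/javascript">document.write(":"+v+p+v+p)</script></td>';

echo implode(',', extractPorts($html)), "\n"; // 8080
```

In your scraper you would pass $rawdata in place of $html. Any letter that is missing from the map is left as a letter in the result, so a port that isn't all digits is a sign the page layout has changed.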