basic scraping question

Status
Not open for further replies.

Stanley

Banned
Jun 24, 2006
San Diego
I'm using file_get_contents() to scrape a few hundred pages, but the load time for each page is significant due to images and Flash. What would be a more efficient way to get that content without downloading the extraneous junk?
 


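Well, PHP won't download any of that crap, only the file you're telling it to download. file_get_contents() fetches the HTML document itself; the images and Flash it references only get requested when a browser renders the page.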

Unless you mean you want to parse out all the embedded objects and image tags in the file to reduce load time after it's already been scraped and you open it with your browser, I don't think there's much you can do to speed up file_get_contents calls without using a faster server.
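If the goal is the first case (stripping embedded objects and image tags from HTML that has already been scraped), something along these lines might work. It is only a rough sketch using PHP's DOM extension; the function name strip_heavy_tags() and the list of tags are my own assumptions, not something from the thread.

```php
<?php
// Hypothetical helper (name is illustrative): removes <img>, <object> and
// <embed> elements from an already-downloaded HTML string.
function strip_heavy_tags(string $html): string
{
    if ($html === '') {
        return ''; // DOMDocument::loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();

    // Scraped pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['img', 'object', 'embed'] as $tag) {
        // getElementsByTagName() returns a live list; copy it to an array
        // first so that removing nodes does not skip any entries.
        $nodes = iterator_to_array($doc->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    return $doc->saveHTML();
}
```

Note that this only makes the saved copy lighter to open in a browser. It does nothing for the time the scrape itself takes, since those files were never downloaded by PHP in the first place.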
 
I don't know whether any of these are more efficient (although I think I remember reading that just about anything is more efficient than file_get_contents()), but you could always try these instead:
cURL
file()
fopen()
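For the cURL option, a minimal sketch might look like the following. The function name fetch_page() and the timeout values are assumptions picked for illustration; the CURLOPT_* constants are standard options of PHP's cURL extension. Whether this is actually faster than file_get_contents() will depend on the server and the network.

```php
<?php
// Hypothetical helper (name and defaults are illustrative).
// Returns the response body as a string, or false on failure.
function fetch_page(string $url, int $timeoutSeconds = 10)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }

    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,            // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => 5,               // assumed value: give up connecting after 5 s
        CURLOPT_TIMEOUT        => $timeoutSeconds, // cap on the whole transfer
        CURLOPT_ENCODING       => '',              // accept any compression libcurl supports
    ]);

    // curl_exec() returns the body as a string here, or false on error.
    return curl_exec($ch);
}
```

The timeouts are probably the most useful part when looping over a few hundred URLs, since one slow host no longer stalls the whole run.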
 
I tested file_get_contents() vs. cURL:

Test 1: 5.6382260322571
Test 2: 4.8649210929871

Test 1: 18.499569177628
Test 2: 16.000211000443

Test 1: 21.442551136017
Test 2: 15.969302177429

Test 1 is file_get_contents()
Test 2 is cURL
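The test script itself wasn't posted, so the following is only a guess at how a comparison like this could be timed. The helper name time_fetches() is hypothetical, and the URL list in the usage comment is a placeholder. It simply measures the wall-clock time taken to call a given fetch function on every URL.

```php
<?php
// Hypothetical timing helper (not the script used for the numbers above).
// Calls $fetch once per URL and returns the elapsed wall-clock time in seconds.
function time_fetches(callable $fetch, array $urls): float
{
    $start = microtime(true);
    foreach ($urls as $url) {
        $fetch($url);
    }
    return microtime(true) - $start;
}

// Example usage (placeholder URL list, left commented out):
// $urls = ['http://example.com/'];
// printf("Test 1: %f\n", time_fetches('file_get_contents', $urls));
// printf("Test 2: %f\n", time_fetches(function ($url) {
//     $ch = curl_init($url);
//     curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//     return curl_exec($ch);
// }, $urls));
```

Running each test several times and in alternating order would help separate real differences from network noise and server-side caching.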
 