Web Crawler

Status
Not open for further replies.

CShoemaker

not related to shoemoney
Jul 10, 2008
I'm looking for a web crawler that I can use to crawl websites' source code. It has to be able to do the following:

-Search websites source code and be able to detect if a certain piece of code is in the HTML then pull the URL into a TXT file
-If it finds a website with the defined piece of code it then crawls the source for a different piece of code and if that is present it will then not pull the URL.
-Can take a list of URLs as input to crawl.
-Has an automated (non-manual) method of finding websites to crawl.

Anything like this exist or will I have to get a coder to code it for me? What would be the best language to get it coded in?
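
For the last two requirements (feeding in a URL list and finding new sites without manual work), here is a rough Python sketch of one common approach: read seed URLs from a text list, then collect the outbound links from each page you fetch and queue those too. The names `load_seed_urls`, `extract_links` and `LinkCollector` are made up for illustration, the example.com addresses are placeholders, and actually downloading the pages is left out. It only assumes the Python standard library.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def load_seed_urls(text):
    """Parse a URL list (one per line), skipping blank lines and # comments."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def extract_links(base_url, html):
    """Return absolute http(s) URLs linked from html, in order, without duplicates."""
    collector = LinkCollector()
    collector.feed(html)
    seen = set()
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # resolve relative links against the page URL
        if urlparse(absolute).scheme in ("http", "https") and absolute not in seen:
            seen.add(absolute)
            links.append(absolute)
    return links
```

Links returned by `extract_links` can be appended to the same queue the seed list started, which is what lets the crawl find new sites on its own.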
 


Read this.
Blue Hat SEO-Advanced SEO Tactics » Complete Guide To Scraping Pt. 2 - Crawling

Use this:
http://www.bluehatseo.com/wp-content/uploads/2006/11/crawlercgi.txt

-Search websites source code and be able to detect if a certain piece of code is in the HTML then pull the URL into a TXT file
If I understood you correctly, replace this in the code:
Code:
if(($crawldynamic == 0) && ($$link[2] !~ /\?/)){
    &list;
}
elsif(($crawldynamic == 1) && ($$link[2] =~ /\?/)){
    &list;
}


with this

Code:
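# \Q...\E makes the search string match literally, so HTML characters like ? ( ) . are not read as regex syntax
if($page_parser =~ /\Qwhatever im looking for\E/) {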
    if(($crawldynamic == 0) && ($$link[2] !~ /\?/)){
        &list;
    }
    elsif(($crawldynamic == 1) && ($$link[2] =~ /\?/)){
        &list;
    }
}
-If it finds a website with the defined piece of code it then crawls the source for a different piece of code and if that is present it will then not pull the URL.
This makes no sense. How would you know whether or not it has the piece of code in it before you pull it? For this I recommend the Telepathy Plugin.
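
If the requirement is read as "fetch the page first, then check its source for both strings", it comes down to a two-condition filter: keep the URL only when the first string is present and the second is absent. A minimal Python sketch under that assumption; `should_record`, `record_matches` and the marker strings are hypothetical names and not part of crawlercgi.txt, and the page source is assumed to be downloaded already.

```python
def should_record(html, required, excluded):
    """True when `required` appears in the page source and `excluded` does not."""
    return required in html and excluded not in html


def record_matches(pages, required, excluded, out_path):
    """Append the URL of every matching page to out_path, one URL per line.

    `pages` is an iterable of (url, html) pairs that were already fetched.
    Returns the list of URLs that were written.
    """
    matched = []
    with open(out_path, "a", encoding="utf-8") as out:
        for url, html in pages:
            if should_record(html, required, excluded):
                out.write(url + "\n")
                matched.append(url)
    return matched
```

Both checks are plain substring tests, so the strings are matched literally and nothing in them is treated as a pattern.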

-Search websites source code and be able to detect if a certain piece of code is in the HTML then pull the URL into a TXT file
You PUSH TO a text file or PULL FROM a text file, not PULL INTO or PUSH FROM.
 
I figured either you or XMCP would respond. Thanks a lot. +rep. Oh, and Eli, do you mind if I msg you? I've got a question about one of your scripts. Really appreciated, thanks a lot.
 