Scan a list of URLs for text?

Status
Not open for further replies.

mike82

New member
Mar 3, 2007
I have a list of URLs and I want to scan them for certain text, and for each URL count how many times that word appears on the page. Is there software that will let me do this?

Thanks
 


i seriously need something like this, no joke

i'm looking for something free hopefully
 
I'm feeling generous today. Create two files. One with a list of URLs (one per line) and one with a list of keywords to scan for. Have fun.

Code:
import re
import sys
from io import BytesIO

import pycurl


def getpage(url):
    """Fetch a URL with pycurl; return the page body as text, or None on error."""
    c = pycurl.Curl()
    buf = BytesIO()
    c.setopt(pycurl.URL, url)
    c.setopt(pycurl.WRITEDATA, buf)
    c.setopt(pycurl.FOLLOWLOCATION, True)  # follow redirects
    try:
        c.perform()
    except pycurl.error:
        return None
    finally:
        c.close()
    return buf.getvalue().decode('utf-8', errors='replace')


def scanpage(html, keywords):
    """Count case-insensitive occurrences of each keyword in the page text."""
    counts = {}
    for kwd in keywords:
        # re.escape keeps punctuation in a keyword from being read as regex syntax
        counts[kwd] = len(re.findall(re.escape(kwd), html, re.I))
    return counts


if __name__ == '__main__':
    if len(sys.argv) < 3:
        print('Usage: python scanner.py [url_file] [kwd_file]')
        sys.exit(1)

    with open(sys.argv[1]) as f:
        urls = [line.strip() for line in f if line.strip()]

    with open(sys.argv[2]) as f:
        keywords = [line.strip() for line in f if line.strip()]

    res = {}
    for url in urls:
        html = getpage(url)
        if html is None:
            continue  # skip unreachable pages instead of aborting the whole run
        res[url] = scanpage(html, keywords)

    print(res)
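If you can't (or don't want to) install pycurl, here's a stdlib-only sketch of the same idea using `urllib.request` -- the counting part is identical, the fetch is just plain standard library. The URL in the comment at the bottom is only a placeholder, swap in your own list.

```python
import re
import urllib.request


def fetch(url):
    """Plain-stdlib fetch; returns page text or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode('utf-8', errors='replace')
    except OSError:
        return None


def count_keywords(text, keywords):
    """Case-insensitive occurrence count for each keyword.

    re.escape keeps punctuation in a keyword from being read as regex syntax.
    """
    return {k: len(re.findall(re.escape(k), text, re.I)) for k in keywords}


# Counting works on any string, no network needed:
print(count_keywords("Foo bar FOO baz foo", ["foo", "bar"]))
# prints {'foo': 3, 'bar': 1}
```

Loop `fetch()` over each line of your URL file and feed the result to `count_keywords()` and you've got the same scanner with zero dependencies.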
 
Do this for each url in the list:


Code:
$url = 'your url here';
$term = 'what to look for';
$count = substr_count(strip_tags($url), $term);
if ($count > 0) { /* do stuff here */ }
 
@ everyone except davidr:

pretty sure he means get the text from the page AT each url =P

haha, now that's fuckin hilarious!
@mike82 bust open a unix cmd line and enter:


for url in $(cat urls.txt); do echo "$url"; lynx -dump "$url" | grep -o "my_keyword" | wc -l; done



*that'll be $37.00, send it via paypal :rasta:
 