WickedFire - Affiliate Marketing Forum - Internet Marketing Webmaster SEO Forum

07-12-2008, 08:44 PM   #1
mike82
Member
Join Date: Mar 2007
Posts: 81
Scan a list of URLs for text?

I have a list of URLs, and I want to scan them for certain text and find out how many times that word appears on each URL. Is there software that will let me do this?

Thanks
07-12-2008, 08:53 PM   #2
nickycakes
Join Date: Sep 2007
Location: Beantown, MA
Posts: 1,481
quick, log on to your second account and promote your product
__________________
Nickycakes.com: Reformed Blackhat
Nickycakes' Newbie Guide
#cakes irc.freenode.net
07-12-2008, 08:58 PM   #3
argh01
Little Member
Join Date: Oct 2007
Posts: 432
I bet you could whip something up in perl in about 5 minutes and sell it for $37.00
__________________
WordPress and phpBay Pro FTW!
07-12-2008, 09:14 PM   #4
mike82
Member
Join Date: Mar 2007
Posts: 81
i seriously need something like this, no joke

i'm looking for something free hopefully
07-12-2008, 09:57 PM   #5
DavidR
Senior Member
Join Date: Aug 2006
Location: WA
Posts: 494
I'm feeling generous today. Create two files. One with a list of URLs (one per line) and one with a list of keywords to scan for. Have fun.

Code:
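import re
import sys
from io import BytesIO

import pycurl


def getpage(url):
  """Fetch url and return the body as text, or None if the download fails."""
  c = pycurl.Curl()
  c.setopt(pycurl.URL, url)
  c.setopt(pycurl.FOLLOWLOCATION, True)  # follow redirects
  resp = BytesIO()
  c.setopt(pycurl.WRITEFUNCTION, resp.write)
  try:
    c.perform()
  except pycurl.error:
    return None
  finally:
    c.close()
  # Decode leniently so pages with odd encodings still get scanned
  return resp.getvalue().decode('utf-8', errors='replace')


def scanpage(html, keywords):
  """Count case-insensitive occurrences of each keyword in the raw HTML."""
  count = {}
  for kwd in keywords:
    # re.escape makes the keyword match literally, not as a regex pattern
    count[kwd] = len(re.findall(re.escape(kwd), html, re.I))
  return count


if __name__ == '__main__':
  if len(sys.argv) < 3:
    print('Usage: python scanner.py [url_file] [kwd_file]')
    sys.exit(1)

  # One URL / keyword per line; blank lines are skipped
  with open(sys.argv[1]) as f:
    urls = [line.strip() for line in f if line.strip()]

  with open(sys.argv[2]) as f:
    keywords = [line.strip() for line in f if line.strip()]

  res = {}
  for url in urls:
    html = getpage(url)
    if html is None:
      continue  # skip URLs that fail to download and keep going
    res[url] = scanpage(html, keywords)

  print(res)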
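`scanpage` runs its patterns over the raw HTML, so matches inside tag names, attributes, inline scripts and CSS are counted too. If only the page's text should count, a sketch using Python 3's standard `html.parser` could look like this (`TextExtractor`, `visible_text` and `count_in_text` are hypothetical names, not from the thread):

```python
import re
from html.parser import HTMLParser


# Sketch only: hypothetical helpers, Python 3 standard library.
class TextExtractor(HTMLParser):
    """Collect text nodes, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def visible_text(html):
    """Return the page's text nodes joined with spaces (tags removed)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Joining with a space keeps adjacent blocks from fusing into false
    # matches, e.g. "<p>foo</p><p>bar</p>" -> "foo bar".
    return ' '.join(parser.parts)


def count_in_text(html, keywords):
    """Case-insensitive literal keyword counts over the text only."""
    text = visible_text(html)
    return {kwd: len(re.findall(re.escape(kwd), text, re.I)) for kwd in keywords}
```

It takes the same `(html, keywords)` arguments as `scanpage`, with the keywords already stripped of newlines. On a page where "widget" also appears in a class name, a link target and a script, only the occurrences in the visible text are counted.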
07-12-2008, 11:29 PM   #6
Enigmabomb
Senior Member
Join Date: Feb 2007
Posts: 495
"Regular Expressions"
__________________
http://www.MyAuntIsHot.com
Web comedy designed to make you feel more normal.
07-12-2008, 11:30 PM   #7
Enigmabomb
Senior Member
Join Date: Feb 2007
Posts: 495
Also, MySQL full-text search. It'll even rank them for you.
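MySQL's full-text search scores relevance with `MATCH ... AGAINST` over a `FULLTEXT`-indexed column, which means loading the page text into a table first; that setup isn't shown here. As a much simpler stand-in (raw hit totals, not relevance scoring), the `{url: {keyword: count}}` dict that DavidR's script prints can be ranked in plain Python; `rank_pages` is a hypothetical helper:

```python
# Sketch only: hypothetical helper; ranks by total hits, which is far
# cruder than MySQL's full-text relevance score.
def rank_pages(results):
    """Sort URLs by total keyword hits, highest first.

    results is a {url: {keyword: count}} dict.
    """
    totals = {url: sum(counts.values()) for url, counts in results.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Given `{'http://a.example': {'x': 1, 'y': 0}, 'http://b.example': {'x': 3, 'y': 2}}`, it returns `[('http://b.example', 5), ('http://a.example', 1)]`.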
07-13-2008, 03:29 AM   #8
Musashi
Member
Join Date: Sep 2007
Posts: 42
cat urls.txt | grep "my_keyword" -o -n
07-13-2008, 04:11 AM   #9
jryan21
Vista Ready
Join Date: Nov 2007
Location: nashville
Posts: 379
Do this for each url in the list:


Code:
$url = 'your url here';
$term = 'what to look for';
$count = substr_count(strip_tags($url), $term);
if ($count > 0) {
  // do stuff here
}
__________________
http://www.faking.net - social engineering at its finest
07-13-2008, 10:20 AM   #10
nickycakes
Join Date: Sep 2007
Location: Beantown, MA
Posts: 1,481
@ everyone except davidr:

pretty sure he means get the text from the page AT each url =P
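Fetching the page at each URL is the step the one-liners skip; DavidR's `getpage` does it with pycurl. For anyone without pycurl installed, a sketch using only Python 3's standard library might look like this (`fetch_text` is a hypothetical name, not from the thread):

```python
import http.client
import urllib.request


# Sketch only: hypothetical stdlib replacement for a pycurl fetch.
def fetch_text(url, timeout=10):
    """Download url and return the body as text, or None if the fetch fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read()
            # Prefer the charset named in the Content-Type header, if any.
            charset = resp.headers.get_content_charset() or 'utf-8'
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError/HTTPError and timeouts;
        # ValueError covers strings that aren't URLs at all.
        return None
    try:
        return body.decode(charset, errors='replace')
    except LookupError:  # the header named a charset Python doesn't know
        return body.decode('utf-8', errors='replace')
```

It returns `None` on failure, the same contract as `getpage`, so a caller can skip bad URLs and move on to the next one.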
07-13-2008, 03:24 PM   #11
Musashi
Member
Join Date: Sep 2007
Posts: 42
Idiot

Quote:
Originally Posted by nickycakes
@ everyone except davidr:

pretty sure he means get the text from the page AT each url =P
haha, now that's fuckin hilarious!
@mike82 bust open a unix cmd line and enter:


for url in $(cat urls.txt); do echo "$url: $(lynx -dump "$url" | grep -o "my_keyword" | wc -l)"; done



*that'll be $37.00, send it via paypal
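Piping `grep` into `wc -l` counts matching lines, so a keyword that appears several times on one line of the `lynx -dump` output is counted once; `grep -o` prints each match on its own line, so `wc -l` then counts every occurrence. The difference, sketched in Python 3 (both function names are hypothetical, and both are case-sensitive like grep's default):

```python
import re


# Sketch only: hypothetical helpers mirroring the two grep pipelines.
def count_matching_lines(text, word):
    """What `grep word | wc -l` reports: lines containing word at least once."""
    return sum(1 for line in text.splitlines() if word in line)


def count_occurrences(text, word):
    """What `grep -o word | wc -l` reports: every non-overlapping match."""
    return len(re.findall(re.escape(word), text))
```

On the text `"spam spam eggs\nham\nspam"`, the first reports 2 and the second 3.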
All times are GMT -4.


WickedFire.com Copyright © 2008 - WickedFire is an international registered Trademark of Coastal Synergy LLC. You may not use any of our trademarks, copyrights, content, or images without a written approval by members of Coastal Synergy LLC.