Scraping URLs from a text file

mrjuve

New member
Nov 7, 2011
Hi,

I have a text document with 1000+ links like this one: User | okanmokan99 | Submitted | GovMp.com, and I need a scraper or something to extract the actual URLs from the submissions based on keywords.

For example, I import the list, type the keyword "erken rezervasyon", and the tool extracts the URLs associated with that keyword.

Please, can someone help me find good software? I've spent all day trying to find something that works.

Thank you
 


Use a Linux shell.
Let's assume each line of your link file is built like this:
"http://www.cnn.com, erken rezervasyon, 28-11-2011"

$ awk -F, '/erken rezervasyon/ {print $1}' linklist.txt
output: "http://www.cnn.com"

"-F," sets the delimiter to ",".
Without the -F flag, awk uses spaces and tabs as the default delimiters.

$ awk '/erken rezervasyon/ {print $3}' linklist.txt
would give us: "rezervasyon," (with a trailing comma, because awk split on whitespace here, so the comma stays attached to the field)

$ awk -F, '/erken rezervasyon/ {print $3}' linklist.txt
would give us: " 28-11-2011" (the space after the comma is part of the field)

You can redirect the output into a new file with:
$ awk -F, '/erken rezervasyon/ {print $1}' linklist.txt >> newfile.txt
(">>" appends to the file; use ">" to overwrite it.)
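Putting it together, here's a minimal self-contained sketch. The sample lines are hypothetical; substitute your real linklist.txt:

```shell
# Create a hypothetical sample file (stand-in for your real linklist.txt).
printf '%s\n' \
  'http://www.cnn.com, erken rezervasyon, 28-11-2011' \
  'http://www.bbc.com, son dakika, 29-11-2011' > linklist.txt

# Append the URL field of every line matching the keyword to newfile.txt.
awk -F, '/erken rezervasyon/ {print $1}' linklist.txt >> newfile.txt

cat newfile.txt
# -> http://www.cnn.com
```

Only the first sample line contains the keyword, so only its URL ends up in newfile.txt.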

Look up awk and sed, two very powerful tools.
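For example, the same keyword filter can be done with sed alone (a sketch, assuming the comma-separated line format above):

```shell
# -n suppresses sed's default printing; for lines containing the keyword,
# strip everything from the first comma onward (keeping just the URL field)
# and print the result.
echo 'http://www.cnn.com, erken rezervasyon, 28-11-2011' |
  sed -n '/erken rezervasyon/ s/,.*//p'
# -> http://www.cnn.com
```

In practice you would feed linklist.txt to sed instead of the echo line.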