Scraping URLs from a text file

mrjuve

New member
Nov 7, 2011
Hi,

I have a text document with 1000+ links like this one: User | okanmokan99 | Submitted | GovMp.com, and I need a scraper or something to extract the actual URLs from the submissions based on keywords.

For example, I import the list, type the keyword "erken rezervasyon", and the tool extracts the URLs associated with that keyword.

Please, can someone help me find good software? I've spent all day trying to find something that works.

Thank you
 


Use a Linux shell.
Let's assume each line of your link file is built like this:
"http://www.cnn.com, erken rezervasyon, 28-11-2011"

$ awk -F, '/erken rezervasyon/ {print $1}' linklist.txt
output: "http://www.cnn.com"

"-F," sets the delimiter to ",".
Without the -F flag, awk uses spaces and tabs as the default delimiters.

$ awk '/erken rezervasyon/ {print $3}' linklist.txt
would give us: "rezervasyon," (with a trailing comma, because awk split on whitespace here, so the comma stays attached to the field)

$ awk -F, '/erken rezervasyon/ {print $3}' linklist.txt
would give us: " 28-11-2011" (the space after the comma is part of the field)

You can redirect the output into a new file with:
$ awk -F, '/erken rezervasyon/ {print $1}' linklist.txt >> newfile.txt
(">>" appends to the file; use ">" to overwrite it.)
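Putting it together, here's a minimal self-contained sketch. The sample lines are hypothetical; substitute your real linklist.txt:

```shell
# Create a hypothetical sample file (stand-in for your real linklist.txt).
printf '%s\n' \
  'http://www.cnn.com, erken rezervasyon, 28-11-2011' \
  'http://www.bbc.com, son dakika, 29-11-2011' > linklist.txt

# Append the URL field of every line matching the keyword to newfile.txt.
awk -F, '/erken rezervasyon/ {print $1}' linklist.txt >> newfile.txt

cat newfile.txt
# -> http://www.cnn.com
```

Only the first sample line contains the keyword, so only its URL ends up in newfile.txt.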

Look up awk and sed, two very powerful tools.
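For example, the same keyword filter can be done with sed alone (a sketch, assuming the comma-separated line format above):

```shell
# -n suppresses sed's default printing; for lines containing the keyword,
# strip everything from the first comma onward (keeping just the URL field)
# and print the result.
echo 'http://www.cnn.com, erken rezervasyon, 28-11-2011' |
  sed -n '/erken rezervasyon/ s/,.*//p'
# -> http://www.cnn.com
```

In practice you would feed linklist.txt to sed instead of the echo line.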