Image Crawler

Status
Not open for further replies.

emp

New member
Jun 29, 2006
Another one, based on the first script.

Code:
<HTML>
<body>
<h2>Google Image Crawler</h2>

    <FORM action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="get">
        <INPUT Type="TEXT" name="query" size="30"/>
        <INPUT type="SUBMIT" value="Get this!"/>
    </FORM>
    
<?php
    // Setting the variables
    $GooglePrefix = "http://images.google.com/images?q=";
    $query = isset($_GET['query']) ? trim($_GET['query']) : ''; // empty until the form is submitted
    if ($query !== '')
    {
        echo "Looking for ".htmlspecialchars($query)."<br>";
        $CompleteUrl = $GooglePrefix.urlencode($query); // encode the search term so spaces and symbols survive
        $res = webFetcher($CompleteUrl); // we use the function webFetcher to get the page
        echo "<hr>";
        $resultURLs = do_reg($res, "/,.http(.*)\",/U");
        // Displaying the images
        for ($i = 0; $i < count($resultURLs); $i++) // we use the length of the returned array to count.
        {
            $text = $resultURLs[$i]; // $text is set to the item in the result we are at
            // skip anything that points back at Google itself
            if (!preg_match("/google/", $text))
            {
                echo "<img src=\"http".htmlspecialchars($text)."\"><br>";
            }
        }
        echo "done";
    }
    function do_reg($text, $regex) // returns all the found matches in an array
    {
        preg_match_all($regex, $text, $regxresult, PREG_PATTERN_ORDER);
        return $regxresult[1]; // index 1 holds the text captured by the first group
    }
    function webFetcher($url)
    {
        /* This does exactly what it is named after - it fetches a page from the web, just give it the URL */
        $crawl = curl_init(); // the curl library is initiated, the following lines set the curl variables
        curl_setopt($crawl, CURLOPT_URL, $url); // the URL is set
        curl_setopt($crawl, CURLOPT_RETURNTRANSFER, 1); // tells it to return the results in a variable
        $resulting = curl_exec($crawl); // curl is executed and the results stored in $resulting
        curl_close($crawl); // closes the curl procedure.
        return $resulting === false ? '' : $resulting; // curl_exec returns false on failure
    }
?>
</body>
</HTML>
So this is basic screenscraping for n00bs, now go and make me proud, boys.

And share some of your scripts.

::emp::
 


errr..?

This thing gets the first page of Google image results, so 20 images.
It actually grabs the full size images and displays them.

Neat for whenever you need an image.

Really more for learning and adapting / integrating in your projects.

Ideas:
- Have a checkbox next to each image so you are able to save each checked file
- Make that a radio selection and have only ONE file added to a DB
- etc..
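
The checkbox idea could look roughly like the sketch below. It assumes the crawler already holds the image URLs in an array and that the same array is available again when the form comes back (kept in the session, for example). The function names renderImageChoices and pickCheckedUrls and the field name keep are made up for illustration, they are not part of the script above.

```php
<?php
// Hypothetical helper: prints every image with a checkbox next to it.
// The checkbox value is the index into $urls, so no URL travels through the form.
function renderImageChoices(array $urls)
{
    $html = '<form method="post">';
    foreach ($urls as $i => $url) {
        $safe = htmlspecialchars($url, ENT_QUOTES);
        $html .= '<label><input type="checkbox" name="keep[]" value="' . $i . '">';
        $html .= '<img src="' . $safe . '"></label><br>';
    }
    return $html . '<input type="submit" value="Save checked"></form>';
}

// Hypothetical helper: maps the submitted indexes back to URLs,
// ignoring anything that is not a valid index of $urls.
function pickCheckedUrls(array $urls, array $post)
{
    $picked = array();
    $checked = isset($post['keep']) && is_array($post['keep']) ? $post['keep'] : array();
    foreach ($checked as $i) {
        if (is_scalar($i) && ctype_digit((string) $i) && isset($urls[(int) $i])) {
            $picked[] = $urls[(int) $i];
        }
    }
    return $picked;
}
```

Echo renderImageChoices($urls) to show the form, then call pickCheckedUrls($urls, $_POST) on the next request to get the files to save. For the radio variant, the input type would change to radio and only the first picked URL would go into the DB.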

::emp::
 
hey emp,

Where do we put it? (Paste it in an HTML file? Which area?) What is the benefit of adding this?

I ask to learn and to clear up my doubts.

thanks.

mr worm
 
It is PHP code.

If you have PHP installed and curl enabled, you are ready to go.
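
A quick way to check the curl part is sketched below. The function name curlIsAvailable is made up for this example. Save it as a .php file on the server and open it in a browser or run it from the command line.

```php
<?php
// Hypothetical helper: reports whether the curl extension that
// webFetcher() relies on is loaded in this PHP installation.
function curlIsAvailable()
{
    return extension_loaded('curl') && function_exists('curl_init');
}

echo curlIsAvailable()
    ? "curl is enabled, the crawler can fetch pages\n"
    : "curl is missing, enable the curl extension in php.ini first\n";
```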

::emp::
 
How about adding a checkbox, if checked then the Google Images URL has &imgtype=face on the end.
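
A rough sketch of that suggestion, assuming the form gets an extra checkbox such as <INPUT type="checkbox" name="faces" value="1"/>. The function name buildImageSearchUrl and the field name faces are invented for this example, and the imgtype=face parameter is taken from the post above, not checked against Google.

```php
<?php
// Hypothetical helper: builds the search URL and appends imgtype=face on request.
function buildImageSearchUrl($query, $facesOnly)
{
    // urlencode() turns spaces into "+" and escapes other special characters
    $url = 'http://images.google.com/images?q=' . urlencode($query);
    if ($facesOnly) {
        $url .= '&imgtype=face'; // parameter as given in the suggestion above
    }
    return $url;
}
```

In the first script this would replace the line that builds $CompleteUrl, called as buildImageSearchUrl($query, isset($_GET['faces'])).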
 
Uhm, check my code for a more advanced version! :rasta: :rasta:

Demo Here
http://boy.us.com/Code/google/
Code Here:
PHP:
<?php
session_start();
$_SESSION['count']=0;
?>
<link href="client/style.css" rel="stylesheet" type="text/css">
<center>
<h2>Google Image Leecher</h2>
Coder: <b>o0DarkEvil0o</b>
<form action="./" method="POST">
<table width="600">
    <tr>
        <td width="400" align="center">
            Search For
        </td>
        <td width="100" align="center">
            Max width             
        </td>
        <td width="100" align="center">
            Max height
        </td>
        <td width="100" align="center">
            Min width 
        </td>
        <td width="100" align="center">
            Min height
        </td>
    </tr>
    <tr>
        <td width="400">
            <input type="text" maxlength="40" size="40" name="res" value="<?php echo htmlspecialchars(isset($_POST['res']) ? $_POST['res'] : ''); ?>">
        </td>
        <td width="100">
            <input type="text" maxlength="5" size="10" name="mxw" value="<?php echo htmlspecialchars(isset($_POST['mxw']) ? $_POST['mxw'] : ''); ?>">
        </td>
        <td width="100">
            <input type="text" maxlength="5" size="10" name="mxh" value="<?php echo htmlspecialchars(isset($_POST['mxh']) ? $_POST['mxh'] : ''); ?>">
        </td>
        <td width="100">
            <input type="text" maxlength="5" size="10" name="mnw" value="<?php echo htmlspecialchars(isset($_POST['mnw']) ? $_POST['mnw'] : ''); ?>">
        </td>
        <td width="100">
            <input type="text" maxlength="5" size="10" name="mnh" value="<?php echo htmlspecialchars(isset($_POST['mnh']) ? $_POST['mnh'] : ''); ?>">
        </td>
    </tr>
</table>

<input type="submit" value="Leech Now">
</form>
<hr width="600">
<table width="900">
<tr>
    <td width="700" align="center"><b>Image Link</b></td>
    <td width="100" align="left"><b>Dimension</b></td>
    <td width="100" align="left"><b>Image Size</b></td>
</tr>

<?php

// Lift the execution-time and memory limits; a full crawl can run for a long time
set_time_limit(0);
ini_set('memory_limit','-1');

$linkarr=array
(
    'http://images.google.com/images?q=',
    '&imgsz=medium|large|xlarge&ndsp=20&svnum=100&hl=en&start=',
    '&sa=N'
);
$searcharr=array
(
    'dyn.Img(',
    ');dyn.updateStatus();//-->'
);
 $s2=array('<span id=maxLimit>','</span>');

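function getResult($str,$hash)
{
    // Returns the part of $str from the start marker $hash[0] up to the end marker $hash[1]
    $start=strpos($str,$hash[0]);
    if($start===false)return '';
    $end=strpos($str,$hash[1],$start);
    if($end===false)return '';
    return substr($str,$start,$end-$start);
}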

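function CutLink($Link, $Count)
{
    // Builds a link whose visible text is shortened to at most $Count characters
    if($Link=='')return 'Direct access';
    if(substr($Link,0,7)!='http://' && substr($Link,0,8)!='https://')$Link='http://'.$Link;
    $Link=str_replace('index.php','', $Link);
    $Link1=$Link;
    if(strlen($Link)>$Count)$Link1= substr($Link, 0, $Count-3).'...';
    $Href=htmlspecialchars($Link, ENT_QUOTES);
    // the full link goes into a title tooltip; the Tip() script is not loaded on this page
    return '<a href="'.$Href.'" target="_blank" title="'.$Href.'">'.htmlspecialchars($Link1, ENT_QUOTES).'</a>';
}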

function Dr_Row($link, $dimension, $size)
{
    // CutLink() already returns a complete <a> element, so it is not wrapped in a second one
    $str ='<tr>';
    $str.='<td align="left">'.CutLink($link, 100).'</td>';
    $str.='<td>'.htmlspecialchars($dimension).'</td>';
    $str.='<td>'.htmlspecialchars($size).'</td>';
    $str.='</tr>';
    return $str;
}

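$searchstring=isset($_POST['res']) ? $_POST['res'] : '';
if($searchstring=='')die('</table></center>'); // nothing to search for yet; close the open tags and stop
?>
<div align="center" id="status">Leeching <b>[<?php echo htmlspecialchars($searchstring); ?>]</b>...</div>
<?php
$searchstring=urlencode($searchstring);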
// Size filters; an empty or zero maximum means "no upper limit"
$mxw=isset($_POST['mxw']) ? intval($_POST['mxw']) : 0;
$mxh=isset($_POST['mxh']) ? intval($_POST['mxh']) : 0;
if($mxw==0)$mxw=100000;
if($mxh==0)$mxh=100000;
$mnw=isset($_POST['mnw']) ? intval($_POST['mnw']) : 0;
$mnh=isset($_POST['mnh']) ? intval($_POST['mnh']) : 0;

$maxres=0;
for($k=0;$k<10000;$k++)
{
    // Each result page lists 20 images, so page $k starts at offset $k*20
    $link=$linkarr[0].$searchstring.$linkarr[1].($k*20).$linkarr[2];

    $content=file_get_contents($link);
    if($content===false)break; // the page could not be fetched

    // Total number of results as reported on the page
    $maxres=strip_tags(getResult($content,$s2));
    $maxres=str_replace(',','',$maxres);
    $maxres=intval($maxres);
    if($maxres/20<$k)break; // past the last page; the script after the loop reports the count

    // explode() replaces split(), which was removed in PHP 7
    $arr=explode('dyn.Img',getResult($content,$searcharr));

    foreach ($arr as $t)
    {
        $chandoi=explode('","',$t);
        $size=explode(' - ',isset($chandoi[9]) ? $chandoi[9] : '');
        $dimension=explode(' x ',$size[0]);
        $w=intval($dimension[0]);
        $h=isset($dimension[1]) ? intval($dimension[1]) : 0;
        if( ($w <= $mxw) && ($w >= $mnw) && ($h <= $mxh) && ($h >=$mnh) && isset($chandoi[3]) && ($chandoi[3]!='') )
        {
            echo Dr_Row($chandoi[3], $size[0], isset($size[1]) ? $size[1] : '');
            $_SESSION['count']++;
        }
    }
}
?>
<script>
document.getElementById('status').innerHTML='Done, <?php echo $_SESSION['count'].' of '.$maxres; ?> images detected!';
</script>
</table>

</center>
 