email confirmation bot

hehejo

When I scrape the links, I get HTTP errors 400/403/501 depending on the website (for example WordPress, Digg)... But it works for some sites (for example iBlog)...

Probably something is fucked up with my curl code, but I can't see it...

PHP:
<?php
$hostname = '{mail.domain.com:995/pop3/ssl/novalidate-cert}INBOX';
$username = 'confirm+domain.com';
$password = 'mypassword';

// connect...
$inbox = imap_open($hostname,$username,$password) or die(imap_last_error());

// get new emails
$emails = imap_search($inbox, 'NEW');
if($emails) {
    foreach($emails as $email_number) { // for each mail...
        $message = imap_fetchbody($inbox,$email_number,1); // get body from mail...

        // replace line breaks (CR and LF) with spaces
        $data_with_no_breaks = preg_replace('/[\r\n]+/', ' ', $message);
        // match http and https links up to the next whitespace character
        $regex = '/https?:\/\/\S+/i';
        preg_match_all($regex, $data_with_no_breaks, $matches, PREG_PATTERN_ORDER);

        if (empty($matches[0])) {
            continue; // no link found in this mail
        }

        $res = webFetcher($matches[0][0]);
        echo $matches[0][0];
        echo $res;
    }
} 
imap_close($inbox); // close the connection...

function getValue($item, $query, $end){
  $item = stristr($item, $query);
  $item = substr($item, strlen($query));
  $stop = stripos($item, $end);
  $val = substr($item, 0, $stop);
  return $val;
}

function webFetcher($url) {
  $agent = rnduseragent();
  $ch = curl_init();     
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); 
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
  curl_setopt($ch, CURLOPT_TIMEOUT, 10);
  curl_setopt($ch, CURLOPT_REFERER, "http://sn105w.snt105.mail.live.com"); 
  curl_setopt($ch, CURLOPT_USERAGENT, $agent);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);             
  $result = curl_exec($ch);
  curl_close($ch);                            
  return $result;                  
}

function rnduseragent(){
    $arr = array();
    $arr[0] = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.1.6) Gecko/20070725 Firefox/2.0.0.6";
    $arr[1] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)";
    $arr[2] = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)";
    $arr[3] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)";
    $arr[4] = "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 5.1; .NET CLR 1.1.4322)";
    $arr[5] = "Opera/9.20 (Windows NT 6.0; U; en)";
    $arr[6] = "Opera/9.00 (Windows NT 5.1; U; en)";
    $arr[7] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.50";
    $arr[8] = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; en) Opera 8.0";
    $arr[9] = "Mozilla/4.0 (compatible; MSIE 6.0; MSIE 5.5; Windows NT 5.1) Opera 7.02 [en]";
    $nr = rand(0, 9);
    $result = $arr[$nr];
    return $result;
}
?>
 


Well, I would assume WordPress does not offer an IMAP server for you to connect to... To my knowledge they do not host email accounts for you. If they do, they might not support IMAP either. A lot of hosts only allow POP3, and some only webmail.
 

I can read the email and extract the links plus scrape the websites, but it only works on some sites.

The script handles the confirmation emails you get when creating an account, etc.
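
One thing that may be worth ruling out when pulling links from a raw mail body: imap_fetchbody() returns the part still in its transfer encoding, so a quoted-printable mail can contain "=3D" in place of "=" and soft line breaks in the middle of a long link. A minimal sketch, assuming the encoding name is already known; decodeBody() and the sample URL are made up for illustration and are not part of the script above:

```php
<?php
// Hypothetical helper: undo the transfer encoding before looking for links.
// $encoding is assumed to be the lowercase Content-Transfer-Encoding value.
function decodeBody($body, $encoding)
{
    switch ($encoding) {
        case 'quoted-printable':
            // turns "=3D" back into "=" and removes soft line breaks
            return quoted_printable_decode($body);
        case 'base64':
            return base64_decode($body);
        default:
            return $body; // 7bit, 8bit, binary: nothing to undo
    }
}

// Example quoted-printable body with a link wrapped by a soft line break
$raw = "Confirm: http://example.com/confirm?key=3Dabc=\r\n123 thanks";
echo decodeBody($raw, 'quoted-printable'), "\n";
```

If the decoded link differs from the one the regex extracts from the raw body, that difference alone could explain an error response on some sites.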
 
Hehe... you're not really giving enough information here, plus each service will be completely different. Where is your problem? Is it not scraping the URL from WordPress's email? Is it not going to the site correctly? It might be that your box's IP is blacklisted, and that is why it does not work. Without more information there is no way to know what the issue is. You need to break the problem down and see which step it stops at.

Example for a WordPress blog:
- First check to make sure you're receiving the email.
- Once you know it shows up in your IMAP inbox, make sure the code is receiving the mail (print each mail to make sure it's getting through to the PHP script).
- Once you know that happens, make sure the URL is being pulled correctly. I would assume you can do that by echoing $matches[0][0] before you fetch it.
- Finally, make sure that when you go to the page it sends a success message, by checking what HTML is sent back for it.


There is an extremely slim chance that one "generic" script will work for each and every site, especially bigger sites like WordPress and Digg. They likely have multiple URLs in their email, meaning you might be pulling the wrong URL, and they likely have filters in place to prevent the same IP from activating accounts en masse.
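
To illustrate the point about multiple URLs in one email, here is a rough sketch that collects every link and prefers one that looks like a confirmation link. pickConfirmationUrl() and its keyword list are guesses for illustration, not something any particular site documents:

```php
<?php
// Hypothetical helper: pull every link out of a mail body and prefer the
// one that looks like a confirmation link. The keyword list is a guess.
function pickConfirmationUrl($body, array $keywords = array('confirm', 'activate', 'verify'))
{
    // Collect all http/https links, stopping at whitespace, quotes or angle brackets
    if (!preg_match_all('~https?://[^\s"\'<>]+~i', $body, $matches)) {
        return null; // no link at all
    }
    foreach ($matches[0] as $url) {
        foreach ($keywords as $keyword) {
            if (stripos($url, $keyword) !== false) {
                return $url;
            }
        }
    }
    // Fall back to the first link, which is what $matches[0][0] would give
    return $matches[0][0];
}

$body = "Visit http://example.com/ or confirm here:\r\nhttp://example.com/activate?key=abc123\r\nThanks";
echo pickConfirmationUrl($body), "\n";
```

Each site words its links differently, so the keywords would need adjusting per service.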

 
Well, it receives the emails and I already echo the links; they are all correct so far. Then on some sites I get the error messages mentioned above. The URLs I fetch are correct; when I copy and paste them into a browser it works.
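
A link that looks right when echoed can still carry a trailing space or carriage return that the browser drops on paste but cURL sends as-is. A small sketch for making such bytes visible and stripping them; cleanUrl() is a made-up name, and whether this is the cause here is only an assumption:

```php
<?php
// Hypothetical helper: strip invisible characters that echo does not show.
function cleanUrl($raw)
{
    // Drop ASCII control characters (CR, LF, tab, ...) anywhere in the string
    $url = preg_replace('/[\x00-\x1F\x7F]/', '', $raw);
    // Remove leading and trailing spaces
    return trim($url);
}

$raw = "http://example.com/confirm?key=abc \r";
var_dump($raw);            // reports the string length, so stray bytes show up
var_dump(cleanUrl($raw));  // the same link without the hidden characters
```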

Can't be the IP, I'm testing the script from my local machine... Must be something with curl (or cookies?) hmm
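
On the cookie idea: a rough sketch of a fetch that turns on cURL's cookie engine, so cookies set during a redirect are sent on the following request. fetchWithCookies() and the cookie file path are assumptions for illustration; whether any of these sites actually needs cookies is not confirmed:

```php
<?php
// Hypothetical variant of webFetcher() that keeps cookies between requests.
// $cookieFile is an assumed path to a writable file.
function fetchWithCookies($url, $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read stored cookies, enables the cookie engine
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save cookies when the handle is released
    $result = curl_exec($ch); // false on failure
    return $result;
}
```

It could be called as fetchWithCookies($url, sys_get_temp_dir() . '/confirmbot_cookies.txt'), with the user agent and referer options added back as in webFetcher().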
 
Use a packet sniffer (Wireshark) to see what cURL is sending out and what they are sending back...
 
The problem is that it's probably not sending the request if you don't see it in Wireshark. If you want, I can post a more custom HTTP class I wrote for a client in a little bit. I'm on the phone with a client right now so I can't do it this exact second, but it will let you customize the outgoing headers and see everything that comes in more easily, including SSL.
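
Short of a custom class, cURL itself can report what it sent and received. A minimal sketch using CURLINFO_HEADER_OUT and CURLOPT_HEADER; debugFetch() is a made-up name and is not the HTTP class mentioned above:

```php
<?php
// Hypothetical variant of webFetcher() that also reports what was sent and received.
function debugFetch($url)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
        CURLOPT_HEADER         => true, // keep the response headers in the result
        CURLINFO_HEADER_OUT    => true, // record the outgoing request headers
    ));
    $response = curl_exec($ch);
    return array(
        'error'    => curl_error($ch),                         // empty string when nothing went wrong
        'status'   => curl_getinfo($ch, CURLINFO_HTTP_CODE),   // 0 when no response arrived
        'sent'     => curl_getinfo($ch, CURLINFO_HEADER_OUT),  // request line and headers as sent
        'received' => $response,                               // response headers plus body, or false
    );
}
```

Printing the result with print_r(debugFetch($url)) shows the request next to the status code, which makes it easier to see which header a 400 or 403 might be reacting to.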
 

Well, it is sending the packet, but with the header checksum 0x0000 for whatever reason...
 
I don't get what you mean by "header checksum". Can you paste exactly what it is sending (headers and request) here?
 
@wy}E_!@+J
"PFf^P<GET /a/tBLPYFQBZHXPpB7uR0bJlMBN-.BZHXPp5J/update?PROFILE_URL=digg.com/users/username?OTC-em-we1 HTTP/1.1
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30)
Host: e.digg.com
Accept: */*
Referer: Sign In

something is fucked up here...
 
The stuff before it is likely the TCP/IP headers, so that is OK. However, you might be required to have other fields in there. For example, it might be required to be SSL; see if the URL has HTTPS. You also might need cookies, as you said. The key is to see what your browser sends and what your bot sends, and make sure that they're equal.
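
Comparing the two requests by eye gets tedious, so here is a rough sketch that lists the headers the browser sends and the bot does not. parseHeaders() and missingHeaders() are made-up names, and the browser capture in the example is invented for illustration, not taken from a real session:

```php
<?php
// Hypothetical helpers for comparing two raw HTTP request header blocks.
function parseHeaders($raw)
{
    $headers = array();
    foreach (preg_split('/\r\n|\n|\r/', trim($raw)) as $line) {
        // Skip the request line ("GET / HTTP/1.1") and anything that is not "Name: value"
        if (!preg_match('/^([A-Za-z0-9-]+):\s*(.*)$/', $line, $m)) {
            continue;
        }
        $headers[strtolower($m[1])] = $m[2]; // header names are case-insensitive
    }
    return $headers;
}

function missingHeaders($browserRaw, $botRaw)
{
    // Names of headers the browser sends that the bot does not
    return array_keys(array_diff_key(parseHeaders($browserRaw), parseHeaders($botRaw)));
}

$bot = "GET /update HTTP/1.1\r\nUser-Agent: Mozilla/4.0\r\nHost: e.digg.com\r\nAccept: */*\r\n";
// Invented browser capture, for illustration only
$browser = "GET /update HTTP/1.1\r\nHost: e.digg.com\r\nUser-Agent: Mozilla/5.0\r\n"
         . "Accept: text/html\r\nAccept-Language: en-us\r\nCookie: session=example\r\n";
print_r(missingHeaders($browser, $bot));
```

Any header that turns up in the list can then be added to the bot with CURLOPT_HTTPHEADER, one at a time, to see which one the site cares about.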