Need help with regex...

hehejo · Dec 22, 2009

Do you have AIM?

ashbeats · Dec 22, 2009

Ah, the line breaks were Linux-style breaks (\n).

The previous snippet did not work because it was looking for Windows-style breaks (\r\n).

This should work.

Code:

<?php


function kpCriterion2Array($data)
{
    //remove html entities
    $ascii_data = unhtmlentities($data);
    
    // remove line breaks (Changed To Remove Linux Line Breaks)
    $data_with_no_breaks = preg_replace('/\n/si', '', $ascii_data);
    
    
    // Regex ( Extract to array) 
    $regex = '/new kpCriterion\(\'\[([^\]]+)\]\', ([\d.]+),\'([\d.]+)\', \'([\d.]+)\', ([\d.]+), ([\d.]+), ([\d.]+),\'([\d.$]+)\', ([\d.]+),\'(.*?)\', ([\d.]+),([\d.]+),([\d.]+),monthlyVariation,([\d.]+),\'([\d.]+)?\',kpView\.MATCH_EXACT,([\d.]+)\)\);/si';
    preg_match_all($regex, $data_with_no_breaks, $matches, PREG_PATTERN_ORDER);

    return $matches;
        
}


function unhtmlentities($string)
{
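    // replace numeric entities (callbacks instead of the /e modifier, which PHP 7 removed)
    $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) { return chr(hexdec($m[1])); }, $string);
    $string = preg_replace_callback('~&#([0-9]+);~', function ($m) { return chr((int) $m[1]); }, $string);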
    // replace literal entities
    $trans_tbl = get_html_translation_table(HTML_ENTITIES);
    $trans_tbl = array_flip($trans_tbl);
    return strtr($string, $trans_tbl);
}


// Usage
$html = file_get_contents("http://encodable.com/uploaddemo/files/curl.html"); // my local dump of the html output

print_r( kpCriterion2Array($html) );


?>
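
If the same HTML might arrive with either kind of line break, the removal step could be written to cover both at once. This is only a rough sketch: the helper name stripLineBreaks is made up for illustration and is not part of the snippet above.

```php
<?php
// Hypothetical helper: removes Windows (\r\n), old Mac (\r) and Linux (\n) line breaks.
function stripLineBreaks($text)
{
    // \r\n is listed first so a Windows break is consumed as one unit.
    return preg_replace('/\r\n|\r|\n/', '', $text);
}

var_dump(stripLineBreaks("a\r\nb\nc\rd") === 'abcd'); // bool(true)
```

With something like this in place of the '/\n/si' call, the function should not need editing again if the source switches line-ending style.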

hehejo · Dec 22, 2009

Thanks a lot.

It's not scraping the results with a comma in the number yet, but that's an adjustment I should be able to make by now... ;-)
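
For the comma case, one likely adjustment is to widen the character class from [\d.]+ to [\d.,]+ in whichever field carries the comma, then strip the commas before using the value as a number. The sketch below uses a made-up, heavily simplified input line (the real kpCriterion lines have many more fields) and assumes the comma is a thousands separator.

```php
<?php
// Made-up sample for illustration only; not taken from the real page.
$sample = "volume('1,220,000', 0.45)";

// [\d.,]+ accepts digits, dots and commas; [\d.]+ would stop at the first comma
// and the overall match would fail.
preg_match('/volume\(\'([\d.,]+)\', ([\d.]+)\)/', $sample, $m);

// Drop the thousands separators before converting to an integer.
$volume = (int) str_replace(',', '', $m[1]);

var_dump($m[1]);    // string(9) "1,220,000"
var_dump($volume);  // int(1220000)
```

Only the groups that can actually contain a comma need the wider class; the rest can stay as [\d.]+.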
