Remove everything except hyperlinks with a PHP regex

Yurium

New member
Sep 27, 2007
I have a text file which contains several links and HTML-formatted text (all the tags are under my control). I want to strip everything else (text and HTML) from it using preg_replace and keep only the links with their anchor text.

Here is an example of my file.

Code:
1337289282¶sentence, page 1¶<div class="postbody">
   <div class="postdate">20.05</div>
      <h2 class="panel"><a href="http://domain.com/page.html">Anchor</a></h2>
      <div class="anoncetxt"> Text ... text</div>
      <div class="postdate">21.05</div>
      <h2 class="panel"><a href="http://domain.com/page.html">Anchor2</a></h2>
      <div class="anoncetxt"> Text ... text</div>
      </div>

After filtering, only these lines should remain:

Code:
 <a href="http://domain.com/page.html">Anchor</a>
  <a href="http://domain.com/page.html">Anchor2</a>

As I understand it, I need a regex that removes everything except the link tags and their anchor text. Please suggest one.
 


You need a regex to EXTRACT links. There are a lot of examples out there. Just google.
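A minimal sketch of the "extract instead of remove" idea, assuming the links are plain `<a href="...">text</a>` tags with no markup nested inside the anchor text, as in the sample file above. The function name `keepLinksOnly` and the `$sample` string are made up for illustration.

```php
<?php
// Hypothetical helper: returns only the <a ...>...</a> tags found in $html.
function keepLinksOnly($html) {
    // "<a", one whitespace char, any attributes, ">",
    // anchor text without nested tags, then "</a>".
    preg_match_all('/<a\s[^>]*>[^<]*<\/a>/i', $html, $matches);
    // $matches[0] holds the full text of every match.
    return $matches[0];
}

// Sample input modelled on the file shown in the question.
$sample = '<div class="postdate">20.05</div>' . "\n"
        . '<h2 class="panel"><a href="http://domain.com/page.html">Anchor</a></h2>' . "\n"
        . '<div class="anoncetxt"> Text ... text</div>';

$links = keepLinksOnly($sample);
echo implode("\n", $links), "\n";
```

Everything that is not matched is simply never copied to the result, so no preg_replace is needed. If the anchor text can contain other tags (for example `<b>`), this pattern will skip those links and would have to be loosened.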
 
This function extracts the 5 latest links from the cached files for each category and puts them into separate variables.

Code:
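function extractLink($filePath, $linkCount = null) {
    // Read the whole cached file into one string.
    $fileContent = file_get_contents($filePath);
    if ($fileContent === false) {
        return array(); // file missing or unreadable
    }
    // The U modifier makes the quantifiers lazy, so each match covers one <h2>...<a>...</a></h2> block.
    preg_match_all('/<h2.+<a[^<]+<\/a><\/h2>/U', $fileContent, $matchContent, PREG_PATTERN_ORDER);

    // $matchContent[0] holds the full matches; limit them only if a count was given.
    if ($linkCount !== null) {
        return array_slice($matchContent[0], 0, $linkCount);
    }
    return $matchContent[0];
}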

// example of the use of the function
$path1 = './data/cash/file1.txt';
$beginners = extractLink($path1, 5);

$path2 = './data/cash/file2.txt';
$strategies = extractLink($path2, 5);

$path3 = './data/cash/file3.txt';
$brokers = extractLink($path3, 5);


// print the extracted links of each category
foreach ($beginners as $link) {
    echo $link;
}

foreach ($strategies as $link) {
    echo $link;
}

foreach ($brokers as $link) {
    echo $link;
}
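Note that extractLink returns each link still wrapped in its `<h2>` tag. If only the bare link is wanted, one possible variant is to capture the URL and the anchor text separately and rebuild the tag. This is only a sketch: `extractLinkParts` is a hypothetical name, it takes the HTML as a string instead of a file path, and it assumes double-quoted href attributes and plain anchor text as in the sample file.

```php
<?php
// Hypothetical variant: returns array(array('url' => ..., 'text' => ...), ...).
function extractLinkParts($html, $linkCount = null) {
    // Group 1 captures the href value, group 2 the anchor text.
    $pattern = '/<h2[^>]*>\s*<a\s[^>]*href="([^"]*)"[^>]*>([^<]*)<\/a>\s*<\/h2>/i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = array();
    foreach ($matches as $m) {
        $links[] = array('url' => $m[1], 'text' => $m[2]);
    }
    // Same limiting logic as extractLink.
    return $linkCount !== null ? array_slice($links, 0, $linkCount) : $links;
}

// Sample input modelled on the file shown in the question.
$html = '<h2 class="panel"><a href="http://domain.com/page.html">Anchor</a></h2>'
      . '<h2 class="panel"><a href="http://domain.com/page.html">Anchor2</a></h2>';

foreach (extractLinkParts($html, 5) as $link) {
    // The values come straight from existing markup, so they are already HTML-escaped.
    echo '<a href="' . $link['url'] . '">' . $link['text'] . "</a>\n";
}
```

Keeping the URL and text apart also makes it easy to wrap the links in different markup later without touching the regex.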