Extract URL(s) from Link(s)

This script extracts URLs from links. It reads the value of each HREF attribute and ignores non-URL values such as “javascript: openWindow()”.

Using Regular Expressions

<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/

$url = 'http://www.php.net/';

// Fetch page
$string = FetchPage($url);

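// Regex that extracts the URLs from links. The negative lookahead
// (?!javascript:) skips href values that start with "javascript:".
// The URL itself is captured in group 1.

$links_regex = '/<a\s[^>]*'.

'href=["\'](?!javascript:)([^"\']*)["\']/i';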

preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);

echo "<pre>"; print_r($out); echo "</pre>";

function FetchPage($path)
{
    // Opening a URL this way requires allow_url_fopen to be enabled
    $file = fopen($path, "r");

    if (!$file)
    {
        exit("There was a connection error!");
    }

    $data = '';

    while (!feof($file))
    {
        // Read the file / URL one line (up to 1023 bytes) at a time
        $data .= fgets($file, 1024);
    }

    fclose($file);

    return $data;
}
?>
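
To see what the pattern keeps and what it skips without fetching a live page, here is a minimal sketch that runs the same kind of regex over a short HTML string. The helper name extract_link_urls and the sample markup are made up for this illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not part of the original script): returns the href
// values of <a> tags found in $html, skipping "javascript:" pseudo-URLs.
function extract_link_urls($html)
{
    // Group 1 captures the href value; the negative lookahead rejects
    // values that start with "javascript:" (case-insensitive).
    $links_regex = '/<a\s[^>]*href=["\'](?!javascript:)([^"\']*)["\']/i';

    preg_match_all($links_regex, $html, $out, PREG_PATTERN_ORDER);

    // With PREG_PATTERN_ORDER, $out[1] lists every match of group 1.
    return $out[1];
}

// Made-up sample markup, used here instead of a live page.
$sample = '<p><a href="http://www.php.net/">PHP</a> '
        . '<a class="popup" href="javascript: openWindow()">Open</a> '
        . "<a href='/manual/en/'>Manual</a></p>";

$urls = extract_link_urls($sample);

print_r($urls);
```

For this sample the result contains two entries, http://www.php.net/ and /manual/en/, while the "javascript:" link is left out. Relative URLs are returned as they appear in the markup and are not resolved against the page address.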


