Extract URL(s) from Link(s) with PHP

Posted on September 4, 2008, under PHP,  Bookmark it

This is a script which extracts URLs from Links. The function gets the content from the HREF attribute and ignores the non-urls like: “javascript: openWindow()”.

Using Regular Expressions

<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/

$url = 'http://www.php.net/';

// Fetch page
$string = FetchPage($url);

// Regex that extracts the urls from links

$links_regex = '/<a[^/>]*'.

'href=[\"|\']([^javascript:].*)[\"|\']/Ui';

preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);

echo "<pre>"; print_r($out); echo "</pre>";

function FetchPage($path)
{
$file = fopen($path, "r"); 

if (!$file)
{
exit("The was a connection error!");
} 

$data = '';

while (!feof($file))
{
// Extract the data from the file / url

$data .= fgets($file, 1024);
}
return $data;
}
?>

Do you wish to receive the latest updates as soon as they are posted? Get our RSS Feed or Subscribe to the Newsletter!

Get our RSS Feed!

2 Replies to "Extract URL(s) from Link(s) with PHP"

  1. Thank you. Will give it a try.

  2. The second slash on line 14 should be replaced with a backslash.

Leave a Reply


* = required fields

  (will not be published)


XHTML: You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Note: If you want to post CODE Snippets, please make them postable first!
(e.g. <br /> should be converted to &lt;br /&gt;)