Extract URL(s) from Link(s)
by Gabriel on 04/09/08 at 10:54 am
This script extracts the URLs from the links on a page. It reads the value of each HREF attribute and ignores values that are not URLs, such as “javascript: openWindow()”.
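To make the "ignore non-URLs" rule concrete, here is a minimal sketch of the filtering step on its own. The helper name is_script_link() and the sample HREF values are made up for illustration and are not part of the script; the sketch also assumes PHP 5.3 or later for the anonymous function.

```php
<?php
// Hypothetical helper (an assumption, not part of the original script):
// returns true when an HREF value is a javascript: pseudo-link, not a URL.
function is_script_link($href)
{
    // Allow leading whitespace and any letter case in the scheme
    return preg_match('/^\s*javascript:/i', $href) === 1;
}

// Sample HREF values, invented for this example
$hrefs = array(
    'http://www.php.net/',
    '/manual/en/',
    'javascript: openWindow()',
);

// Keep only the values that are not script pseudo-links
$urls = array_values(array_filter($hrefs, function ($href) {
    return !is_script_link($href);
}));

print_r($urls);
```

Only the first two sample values should survive the filter; the javascript: entry is dropped.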
Using Regular Expressions
<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/
$url = 'http://www.php.net/';
// Fetch page
$string = FetchPage($url);
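// Regex that extracts the URLs from links. "#" is used as the delimiter so
// that "/" needs no escaping, and the negative lookahead (?!...) skips
// values that start with "javascript:".
$links_regex = '#<a\s[^>]*href=["\']'.
'(?!\s*javascript:)([^"\']*)["\']#i';
preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);
// $out[0] holds the full matches, $out[1] the captured URLs.
// Escape the output so the matched <a> tags are shown as text.
echo "<pre>" . htmlspecialchars(print_r($out, true)) . "</pre>";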
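// Reads the page (or file) at $path and returns its content as a string.
// Opening a URL this way requires allow_url_fopen to be enabled.
function FetchPage($path)
{
    $file = fopen($path, "r");
    if (!$file)
    {
        exit("There was a connection error!");
    }
    $data = '';
    while (!feof($file))
    {
        // Read the data from the file / URL, at most one line
        // or 1023 bytes per call
        $data .= fgets($file, 1024);
    }
    fclose($file);
    return $data;
}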
?>
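Regular expressions can trip over unusual markup, so as an alternative (not the approach used in the script above) here is a rough sketch that relies on PHP's DOM extension instead. The function name extract_links_dom() and the sample HTML are assumptions made for illustration, and the sketch assumes the DOM extension is available.

```php
<?php
// Hypothetical helper (an assumption): collects the HREF values of all
// <a> tags in $html, skipping empty values and javascript: pseudo-links.
function extract_links_dom($html)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $urls = array();
    foreach ($doc->getElementsByTagName('a') as $anchor) {
        // Named anchors such as <a name="top"> have no HREF at all
        if (!$anchor->hasAttribute('href')) {
            continue;
        }
        $href = trim($anchor->getAttribute('href'));
        if ($href === '' || stripos($href, 'javascript:') === 0) {
            continue;
        }
        $urls[] = $href;
    }
    return $urls;
}

// Sample markup, invented for this example
$html = '<p><a href="http://www.php.net/">PHP</a> '
      . '<a class="x" href=\'/manual/en/\'>Manual</a> '
      . '<a href="javascript: openWindow()">Popup</a> '
      . '<a name="top">Anchor</a></p>';

print_r(extract_links_dom($html));
```

The trade-off is a little more code in exchange for letting the parser deal with quoting, attribute order and letter case.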
One Comment
tre
Sep 5th, 2008
Thank you. Will give it a try.