Saturday, July 4th, 2009

Extract URL(s) from Link(s)

by Gabriel on 04/09/08 at 10:54 am



This script extracts the URLs from the links on a page. It reads the value of each link's HREF attribute and ignores values that are not URLs, such as “javascript: openWindow()”.

Using Regular Expressions

<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/
*/

$url = 'http://www.php.net/';

// Fetch page
$string = FetchPage($url);

// Regex that captures the href value of each <a> tag.
// The negative lookahead (?!...) skips javascript: pseudo-links.

$links_regex = '/<a\b[^>]*?\shref\s*=\s*["\']'.

'(?!\s*javascript:)([^"\']*)["\']/i';

preg_match_all($links_regex, $string, $out, PREG_PATTERN_ORDER);

echo "<pre>"; print_r($out); echo "</pre>";

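function FetchPage($path)
{
    // Opening a URL with fopen() requires allow_url_fopen to be enabled
    $file = fopen($path, "r");

    if (!$file)
    {
        exit("There was a connection error!");
    }

    $data = '';

    while (!feof($file))
    {
        // Read the data from the file / url, one line (up to 1023 bytes) at a time
        $data .= fgets($file, 1024);
    }

    // Release the handle once everything has been read
    fclose($file);

    return $data;
}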
?>
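
To check what gets captured without fetching a remote page, here is a minimal sketch that runs a lookahead-based version of the pattern against a small inline HTML string. The helper name extract_urls() and the sample markup are made up for illustration, and the pattern assumes that every href value is wrapped in single or double quotes.

```php
<?php
// Hypothetical helper (not part of the original script): returns the href
// values found in $html, skipping javascript: pseudo-links.
// Assumption: href values are always quoted.
function extract_urls($html)
{
    $links_regex = '/<a\b[^>]*?\shref\s*=\s*["\'](?!\s*javascript:)([^"\']*)["\']/i';

    preg_match_all($links_regex, $html, $out, PREG_PATTERN_ORDER);

    // With PREG_PATTERN_ORDER, $out[1] holds the first capture group of every match
    return $out[1];
}

// Made-up sample markup: an absolute link, a relative link and a javascript: link
$html = '<a href="http://www.php.net/">PHP</a> '
      . '<a class="nav" href=\'/manual/en/\'>Manual</a> '
      . '<a href="javascript: openWindow()">Popup</a>';

print_r(extract_urls($html));
```

The first two links are returned and the javascript: one is skipped. Relative URLs such as /manual/en/ come back exactly as written in the markup, so they still have to be resolved against the page address before they can be fetched.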


One Comment

tre

Sep 5th, 2008

Thank you. Will give it a try.
