Scraping Data: PHP Simple HTML DOM Parser
Posted on December 12, 2008, Filled under PHP,
Bookmark it
PHP Simple HTML DOM Parser, written in PHP5+, allows you to manipulate HTML in a very easy way. Supporting invalid HTML, this parser is better then other PHP scripts that use complicated regexes to extract information from web pages.
Before getting the necessary info, a DOM should be created from either URL or file. The following script extracts links & images from a website:
// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');
// Extract links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
// Extract images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
The parser can also be used to modify HTML elements:
// Create DOM from string
$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=simple]', 0)->innertext = 'Foo';
// Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
echo $html;
