How to extract content between two delimiters in PHP
Posted on August 29, 2008. Filed under PHP.
Hi,
Here’s a function that is useful when you need to extract some content between two delimiters, for instance when a robot connects to a page and you only need one piece of what it downloads.
<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/web-programming/php/extracting-content-between-two-delimiters.html
*/
function extract_unit($string, $start, $end)
{
    $pos = stripos($string, $start);
    $str = substr($string, $pos);
    $str_two = substr($str, strlen($start));
    $second_pos = stripos($str_two, $end);
    $str_three = substr($str_two, 0, $second_pos);
    $unit = trim($str_three); // remove whitespaces
    return $unit;
}
// Here is a usage example of this function:
$text = 'PHP is an acronym for "PHP: Hypertext Preprocessor".';
$unit = extract_unit($text, 'an', 'for');
// Outputs: acronym
echo $unit;
?>
How does it work?
First, we use stripos() to find the numeric position of the first occurrence of $start (the needle) in $string (the haystack). The search is case-insensitive. In our example, there are 7 characters from the beginning of the string until ‘an’.
$pos = stripos($string, $start);
Now, we will use this information to get the content of $string, from the $pos character until the last one:
an acronym for “PHP: Hypertext Preprocessor”.
$str = substr($string, $pos);
Remove ‘an’ from the recently created string:
acronym for “PHP: Hypertext Preprocessor”.
$str_two = substr($str, strlen($start));
Determine the number of characters from the beginning of $str_two until ‘for’ (9 in this case):
$second_pos = stripos($str_two, $end);
Now use this number to get the content from the beginning of the string until ‘for’:
$str_three = substr($str_two, 0, $second_pos);
The last variable now equals ‘ acronym ’. Finally, let’s strip the whitespace from the beginning and end of the string:
acronym
$unit = trim($str_three); // remove whitespaces
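The walkthrough above assumes that both delimiters are present. When one of them is missing, stripos() returns false, and the substr() calls then work from position 0 instead of failing. Below is a minimal sketch of a variant that checks for this; the name extract_unit_safe() and the choice to return false on failure are assumptions made for illustration, not part of the original function.

```php
<?php
// Hypothetical variant of extract_unit(); the name and the "return false"
// convention are illustrative assumptions, not from the original article.
function extract_unit_safe($string, $start, $end)
{
    $pos = stripos($string, $start);
    if ($pos === false) {
        return false; // start delimiter not found
    }

    // Position just after the start delimiter
    $from = $pos + strlen($start);

    // Search for the end delimiter, starting after the start delimiter
    $to = stripos($string, $end, $from);
    if ($to === false) {
        return false; // end delimiter not found
    }

    // Take only the part between the two delimiters and trim it
    return trim(substr($string, $from, $to - $from));
}

$text = 'PHP is an acronym for "PHP: Hypertext Preprocessor".';
var_dump(extract_unit_safe($text, 'an', 'for'));  // string(7) "acronym"
var_dump(extract_unit_safe($text, 'xyz', 'for')); // bool(false)
```

Using the third (offset) argument of stripos() also avoids building the intermediate strings $str and $str_two.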
If you have any comments or suggestions regarding this snippet, please post them.
- August 29, 2008
- article by Gabriel C.
- 16 comments

16 Replies to "How to extract content between two delimiters in PHP"
September 29, 2008 at 9:18 AM
Hi, nice script!
But I have an additional question… what if I want all instances of $unit in an array, for example when my string contains more than one occurrence of $start and $end?
example:
$string = “This is a string“;
$start = ““;
$end = ““;
desired output:
array (“This”, “string”);
September 29, 2008 at 9:20 AM
my comment stripped the Bold-tags…
i meant:
$string = “[bold]This is a string[/bold]“;
$start = “[bold]“;
$end = “[/bold]“;
September 29, 2008 at 10:29 AM
Here’s the solution:
<?php
$string = "<b>This</b> is a <b>string</b>";
$start = "<b>";
$end = "<\/b>";

// Regexp for the extractor
$regexp = '/'.$start.'(.*)'.$end.'/Ui';
preg_match_all($regexp, $string, $out, PREG_PATTERN_ORDER);
$desired = $out[1];

echo "<pre>";
print_r($desired);
echo "</pre>";

/*
Output:
Array
(
    [0] => This
    [1] => string
)
*/
?>
Notice the backslash before the slash in the ending bold: <\/b>.
Helpful address: http://www.php.net/manual/ro/function.preg-quote.php
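As the preg_quote() link suggests, the manual backslash can be avoided by letting PHP escape the delimiters. Here is a minimal sketch, assuming a helper named extract_all_units() (a hypothetical name, not from this thread) and a lazy (.*?) group in place of the /U flag:

```php
<?php
// Hypothetical helper; extract_all_units() is an illustrative name.
function extract_all_units($string, $start, $end)
{
    // preg_quote() escapes regex special characters; the second argument
    // also escapes the "/" used as the pattern delimiter.
    $regexp = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';

    // "i" = case-insensitive; "s" (an addition in this sketch) lets "." match newlines
    preg_match_all($regexp, $string, $out);

    return $out[1]; // contents of the first capture group, one entry per match
}

print_r(extract_all_units('<b>This</b> is a <b>string</b>', '<b>', '</b>'));
```

With this approach the delimiters can be passed as plain text, for example '</b>' instead of '<\/b>'.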
September 29, 2008 at 11:46 AM
Wow, this is a great solution. I had hoped it would work without a regular expression, but it works!
Now I want to create a sort of database in an associative array. So, for example, data between tags A and B needs to end up in the same record as data extracted from between tags C and D, which should result in something like this:
Array (“data from A and B” => “from C and D”,
“data from A and B” => “from C and D”,
and so on…
)
I now have the following:
<?php
function test($string, $start1, $end1, $start2, $end2)
{
    // Regexp for the extractor
    $regexp1 = '/'.$start1.'(.*)'.$end1.'/Ui';
    $regexp2 = '/'.$start2.'(.*)'.$end2.'/Ui';
    preg_match_all($regexp1, $string, $out1, PREG_PATTERN_ORDER);
    preg_match_all($regexp2, $string, $out2, PREG_PATTERN_ORDER);
    $combined = array_combine($out1[1], $out2[1]);
    echo ""; print_r($combined); echo "";
    //checking one row
    print_r(explode(",", $combined[3]));
}
?>
But it doesn’t work unfortunately…Thanks for your help in advance! It’s appreciated big time!
September 29, 2008 at 11:56 AM
Array (“data from A and B” => “from C and D”,
“data from A and B” => “from C and D”,
Are you sure that your example (with dupes) is a good one?
September 29, 2008 at 12:03 PM
yeah for example:
$string = “<b>Street 23</b> – Paris
<b>Street 43</b> – Berlin
<b>Street 453</b> – London”;
$string_1 = “<b>”;
$string_1 = “”;
$string_3 = “”;
$string_4 = “”;
required result:
Array (“Street 23” => “Paris”,
“Street 43” => “Berlin”,
“Street 453” => “London”)
Thanks
September 29, 2008 at 12:19 PM
This is a way to get the desired result:
<?php
$string = '<b>Street 23</b> - Paris <b>Street 43</b> - Berlin <b>Street 453</b> - London';

$array = explode("<b>", $string);
$desired_array = array();

foreach($array as $value)
{
    $value = trim(strip_tags($value));
    if($value)
    {
        list($street, $city) = explode("-", $value); // hyphen is our delimiter
        $desired_array[trim($street)] = trim($city); // remove whitespaces
    }
}

echo "<pre>";
print_r($desired_array);
echo "</pre>";

/*
Array
(
    [Street 23] => Paris
    [Street 43] => Berlin
    [Street 453] => London
)
*/
?>
PS: Consider using &lt; for < and &gt; for > when you write a new comment.
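For comparison, the same street/city pairs can also be collected with a single regular expression, which is closer to the array_combine() attempt earlier in this thread. This is only a sketch: the function name street_city_pairs() is hypothetical, and it assumes every record has the exact form `<b>street</b> - city`.

```php
<?php
// Hypothetical helper; assumes each record looks like "<b>street</b> - city".
function street_city_pairs($string)
{
    // Group 1: text inside <b>...</b>; group 2: text after the hyphen,
    // up to the next tag or the end of the string.
    preg_match_all('/<b>(.*?)<\/b>\s*-\s*([^<]+)/i', $string, $m);

    // Pair the streets (keys) with the cities (values), trimming whitespace
    return array_combine(array_map('trim', $m[1]), array_map('trim', $m[2]));
}

$string = '<b>Street 23</b> - Paris <b>Street 43</b> - Berlin <b>Street 453</b> - London';
print_r(street_city_pairs($string));
```

As with any associative array, duplicate street names would overwrite each other, so this only suits data whose keys are unique.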
September 29, 2008 at 1:00 PM
Thanks a lot! It brought me on new ideas!
January 17, 2009 at 12:00 PM
Very helpful, thanks!
March 3, 2009 at 5:10 PM
My question, i use the following:
$q = 0;
function extract_unit($string, $start, $end)
{
$pos = stripos($string, $start);
$str = substr($string, $pos);
$str_two = substr($str, strlen($start));
$second_pos = stripos($str_two, $end);
$str_three = substr($str_two, $q, $second_pos);
$unit = trim($str_three); // remove whitespaces
return $unit;
}
$h2 = extract_unit($pagina, ”, ”);
to extract the h2 header from a webpage. $pagina is the string that holds the source code of a webpage. The only problem right now: it only extracts the first h2 header from the page. I tried a for loop, but then I get the same h2 header multiple times.
I have a page with 5 h2 headers, and i want to extract all of them, and put them in an array for example.
Anybody knows how?
Maximus
March 3, 2009 at 5:12 PM
it should be
$h2 = extract_unit($pagina, "<h2>", "</h2>");
forgot the < and >
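One way to collect every match without regular expressions is to keep searching from where the previous match ended, using the offset argument of stripos(). The sketch below is an assumption about how this could be done, not code posted in this thread; the function name extract_units() and the sample $pagina value are made up for illustration.

```php
<?php
// Hypothetical helper: returns every piece of text found between
// $start and $end, in order of appearance.
function extract_units($string, $start, $end)
{
    $units = array();

    // Guard against empty delimiters, which would never advance the search
    if ($start === '' || $end === '') {
        return $units;
    }

    $offset = 0;
    while (($pos = stripos($string, $start, $offset)) !== false) {
        $from = $pos + strlen($start);       // first character after $start
        $to = stripos($string, $end, $from); // next $end after that point
        if ($to === false) {
            break; // unmatched start delimiter: stop searching
        }
        $units[] = trim(substr($string, $from, $to - $from));
        $offset = $to + strlen($end);        // continue after this match
    }

    return $units;
}

// Sample input invented for this example
$pagina = '<h2>First</h2><p>text</p><h2> Second </h2>';
print_r(extract_units($pagina, '<h2>', '</h2>'));
```

Calling the original extract_unit() in a loop returns the same header each time because it always searches from the start of the string; advancing $offset is what moves the search forward.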
June 5, 2009 at 2:45 PM
[...] a practical cURL function & another function that extracts content between 2 delimiters (click here to view details about [...]
August 8, 2009 at 4:21 AM
[...] while ago I have written a short tutorial of how you can write a short PHP function to extract content from specific delimiters. I has come to my attention that many people are looking for a way to replace and even modify [...]
September 7, 2009 at 1:59 AM
[...] 5.2.0) which doesn’t have the json_decode() function included, you can use the alternative extract_unit function to get the data between total_posts”: and “ (will return the actual number). [...]
September 8, 2009 at 8:33 AM
[...] is specified.The extract_unit function gets all the text between 2 specified strings.Thanks to Bit Repositary for this function.Since we need to extract text from javascript so we need to deocde the output [...]
February 25, 2010 at 10:12 PM
A more efficient version of the algorithm (no string copies required), which copes with $start or $end not existing in $string