Hi,
Here's a function that is useful when you need to extract some content between two delimiters. For instance, you may need to extract content from a page using a robot that connects to it.
<?php
/*
Credits: Bit Repository
URL: http://www.bitrepository.com/web-programming/php/extracting-content-between-two-delimiters.html
*/
function extract_unit($string, $start, $end)
{
    $pos = stripos($string, $start);               // position of the first $start
    $str = substr($string, $pos);                  // from $start to the end of $string
    $str_two = substr($str, strlen($start));       // drop $start itself
    $second_pos = stripos($str_two, $end);         // position of $end in what remains
    $str_three = substr($str_two, 0, $second_pos); // text before $end
    $unit = trim($str_three); // remove whitespaces
    return $unit;
}
Here is a usage example of this function:
$text = 'PHP is an acronym for "PHP: Hypertext Preprocessor".';
$unit = extract_unit($text, 'an', 'for');
// Outputs: acronym
echo $unit;
?>
How does it work?
First, we use stripos() to find the numeric position of the first case-insensitive occurrence of the needle ($start) in the haystack ($string). In our example, there are 7 characters from the beginning of the string until 'an', so $pos is 7.
$pos = stripos($string, $start);
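A quick sketch (not part of the original snippet) of what stripos() returns in a few cases. A match at the very start returns 0, which is falsy, so a missing delimiter has to be detected with === false rather than a plain if test:

```php
<?php
$text = 'PHP is an acronym for "PHP: Hypertext Preprocessor".';

var_dump(stripos($text, 'an'));  // int(7): 0-based offset of the first match
var_dump(stripos($text, 'AN'));  // int(7): the search ignores case
var_dump(stripos($text, 'PHP')); // int(0): a match at the start, which is falsy
var_dump(stripos($text, 'xyz')); // bool(false): needle not found

// Correct way to test for "found": strict comparison against false.
if (stripos($text, 'PHP') !== false) {
    echo "found\n";
}
```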
Next, we use $pos to get the part of $string that runs from position $pos to the end:
an acronym for "PHP: Hypertext Preprocessor".
$str = substr($string, $pos);
Remove 'an' from the newly created string by skipping its first strlen($start) characters:
acronym for "PHP: Hypertext Preprocessor".
$str_two = substr($str, strlen($start));
Determine the number of characters from the beginning of $str_two until 'for' (9 in this case, counting the leading space):
$second_pos = stripos($str_two, $end);
Now use this number as the length argument of substr() to get the content from the beginning of $str_two up to 'for':
$str_three = substr($str_two, 0, $second_pos);
$str_three is now equal to ' acronym ' (with a space on each side). Finally, we strip the whitespace from the beginning and end of the string:
acronym
$unit = trim($str_three); // remove whitespaces
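As written, extract_unit() does not check whether the delimiters exist: if $start is missing, stripos() returns false, substr() treats it as 0, and the function can return text from the wrong place. Below is a minimal sketch of a guarded variant; the name extract_unit_safe() and the choice to return null are assumptions for illustration. It also passes an offset to stripos() instead of copying the string twice:

```php
<?php
/**
 * Hypothetical guarded variant of extract_unit(): returns null when
 * $start is missing, or when no $end follows it.
 */
function extract_unit_safe(string $string, string $start, string $end): ?string
{
    $pos = stripos($string, $start);
    if ($pos === false) {
        return null; // $start not found
    }

    $from = $pos + strlen($start);               // first character after $start
    $second_pos = stripos($string, $end, $from); // look for $end only after $start
    if ($second_pos === false) {
        return null; // no $end after $start
    }

    // Cut out the text between the delimiters and trim it, as extract_unit() does.
    return trim(substr($string, $from, $second_pos - $from));
}

$text = 'PHP is an acronym for "PHP: Hypertext Preprocessor".';
var_dump(extract_unit_safe($text, 'an', 'for'));  // string(7) "acronym"
var_dump(extract_unit_safe($text, 'xyz', 'for')); // NULL
```

Returning null keeps "not found" distinguishable from an empty string found between two adjacent delimiters.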
If you have any comments or suggestions regarding this snippet, please post them.
- August 29, 2008
- article by Gabriel C.
- 18 comments

18 Replies to "How to extract content between two delimiters in PHP"
September 29, 2008 at 9:18 AM
Hi, nice script!
But I have an additional question: what if I want all instances of $unit in an array, for example when my string contains more than one pair of $start and $end?
Example:
$string = "This is a string";
$start = "";
$end = "";
Desired output:
array("This", "string");
September 29, 2008 at 9:20 AM
My comment stripped the bold tags...
I meant:
$string = "[bold]This is a string[/bold]";
$start = "[bold]";
$end = "[/bold]";
September 29, 2008 at 10:29 AM
Here’s the solution:
<?php
$string = "<b>This</b> is a <b>string</b>";
$start = "<b>";
$end = "<\/b>";

// Regexp for the extractor
$regexp = '/'.$start.'(.*)'.$end.'/Ui';

preg_match_all($regexp, $string, $out, PREG_PATTERN_ORDER);
$desired = $out[1];

echo "<pre>";
print_r($desired);
echo "</pre>";

/* Output:
Array
(
    [0] => This
    [1] => string
)
*/
?>
Notice the backslash before the slash in the closing bold tag, <\/b>: it escapes the "/" so it is not read as the end of the pattern. The U modifier makes .* ungreedy, so each match stops at the nearest </b>.
Helpful reference: http://www.php.net/manual/ro/function.preg-quote.php
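A sketch of the same idea with preg_quote() doing the escaping, so the delimiters can be passed as plain strings. The function name extract_all() is hypothetical. It swaps the U modifier for the lazy quantifier .*? (same effect here) and adds the s modifier so a match may span line breaks, which is an assumption about the input worth checking:

```php
<?php
// Hypothetical helper: return every substring found between $start and $end.
function extract_all(string $string, string $start, string $end): array
{
    // preg_quote() escapes regex metacharacters; its second argument also
    // escapes the '/' pattern delimiter, so '</b>' can be passed as-is.
    $regexp = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/is';
    preg_match_all($regexp, $string, $out, PREG_PATTERN_ORDER);
    return $out[1]; // group 1 holds the text between the delimiters
}

print_r(extract_all('<b>This</b> is a <b>string</b>', '<b>', '</b>'));
```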
September 29, 2008 at 11:46 AM
Wow, this is a great solution. I had hoped it would work without a regular expression, but it works!
Now I want to build a sort of database in an associative array. For example, the data between tags A and B should end up in the same record as the data extracted from tags C and D, giving something like this:
Array ("data from A and B" => "from C and D",
"data from A and B" => "from C and D",
and so on...
)
I now have the following:
<?php
function test($string, $start1, $end1, $start2, $end2)
{
    // Regexp for the extractor
    $regexp1 = '/'.$start1.'(.*)'.$end1.'/Ui';
    $regexp2 = '/'.$start2.'(.*)'.$end2.'/Ui';
    preg_match_all($regexp1, $string, $out1, PREG_PATTERN_ORDER);
    preg_match_all($regexp2, $string, $out2, PREG_PATTERN_ORDER);
    $combined = array_combine($out1[1], $out2[1]);
    echo "<pre>"; print_r($combined); echo "</pre>";
    //checking one row
    print_r(explode(",", $combined[3]));
}
?>
Unfortunately, it doesn't work. Thanks in advance for your help! It's appreciated big time!
September 29, 2008 at 11:56 AM
Array ("data from A and B" => "from C and D",
"data from A and B" => "from C and D",
Are you sure your example (with duplicate keys) is a good one?
September 29, 2008 at 12:03 PM
Yes, for example:
$string = "<b>Street 23</b> - Paris
<b>Street 43</b> - Berlin
<b>Street 453</b> - London";
$string_1 = "<b>";
$string_2 = "";
$string_3 = "";
$string_4 = "";
Required result:
Array ("Street 23" => "Paris",
"Street 43" => "Berlin",
"Street 453" => "London")
Thanks
September 29, 2008 at 12:19 PM
This is a way to get the desired result:
<?php
$string = '<b>Street 23</b> - Paris <b>Street 43</b> - Berlin <b>Street 453</b> - London';

$array = explode("<b>", $string);
$desired_array = array();

foreach ($array as $value) {
    $value = trim(strip_tags($value));
    if ($value) {
        list($street, $city) = explode("-", $value); // hyphen is our delimiter
        $desired_array[trim($street)] = trim($city); // remove whitespaces
    }
}

echo "<pre>";
print_r($desired_array);
echo "</pre>";

/*
Array
(
    [Street 23] => Paris
    [Street 43] => Berlin
    [Street 453] => London
)
*/
?>
PS: Consider using &lt; for < and &gt; for > when you write a new comment.
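For comparison, a regex-based sketch that builds the same street => city array in one pass with preg_match_all() and array_combine(), close to the approach attempted earlier in the thread. The pattern is an assumption about the input: each city follows its bold street name after a hyphen and runs until the next < or the end of the string:

```php
<?php
$string = '<b>Street 23</b> - Paris <b>Street 43</b> - Berlin <b>Street 453</b> - London';

// Group 1: text inside <b>...</b>; group 2: everything after the hyphen up to the next '<'.
preg_match_all('/<b>(.*?)<\/b>\s*-\s*([^<]*)/i', $string, $m);

// Pair the groups up: keys from group 1, trimmed values from group 2.
$desired_array = array_combine(array_map('trim', $m[1]), array_map('trim', $m[2]));

print_r($desired_array);
```

Unlike the explode() version, this does not depend on strip_tags(), but a city name containing a < would be cut short.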
September 29, 2008 at 1:00 PM
Thanks a lot! It gave me new ideas!
January 17, 2009 at 12:00 PM
Very helpful, thanks!
March 3, 2009 at 5:10 PM
My question: I use the following:
$q = 0;
function extract_unit($string, $start, $end)
{
$pos = stripos($string, $start);
$str = substr($string, $pos);
$str_two = substr($str, strlen($start));
$second_pos = stripos($str_two, $end);
$str_three = substr($str_two, $q, $second_pos);
$unit = trim($str_three); // remove whitespaces
return $unit;
}
$h2 = extract_unit($pagina, "", "");
to extract the h2 header from a webpage. $pagina is the string that holds the source code of a webpage. The only problem right now: it only extracts the first h2 header from the page. I tried a for loop, but then I get the same h2 header multiple times.
I have a page with 5 h2 headers, and I want to extract all of them and put them in an array, for example.
Does anybody know how?
Maximus
September 5, 2010 at 11:46 PM
Hi Maximus,
I have the same problem. Did you solve it?
Best,
Nils
October 12, 2010 at 12:26 AM
Nope, but I don't work on this project anymore. If you find the solution, let me know.
March 3, 2009 at 5:12 PM
It should be:
$h2 = extract_unit($pagina, "<h2>", "</h2>");
I forgot the < and >.
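One way to get every <h2> without regular expressions is to remember where the previous match ended and pass that as stripos()'s third (offset) argument, so each search starts after the last closing tag; without it, a loop likely keeps finding the first header again. A minimal sketch; extract_units() is a hypothetical name, and it assumes the tags appear literally as <h2> and </h2>, so a header such as <h2 class="title"> would not match:

```php
<?php
// Hypothetical helper: collect every substring between $start and $end.
function extract_units(string $string, string $start, string $end): array
{
    if ($start === '' || $end === '') {
        return []; // treat empty delimiters as invalid (two empty ones would loop forever)
    }

    $units = [];
    $offset = 0;
    while (($pos = stripos($string, $start, $offset)) !== false) {
        $from = $pos + strlen($start);
        $second_pos = stripos($string, $end, $from);
        if ($second_pos === false) {
            break; // a $start with no matching $end: stop here
        }
        $units[] = trim(substr($string, $from, $second_pos - $from));
        $offset = $second_pos + strlen($end); // resume after this $end
    }
    return $units;
}

$pagina = '<h2>First</h2><p>Some text</p><h2> Second </h2>';
print_r(extract_units($pagina, '<h2>', '</h2>'));
```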
June 5, 2009 at 2:45 PM
[...] a practical cURL function & another function that extracts content between 2 delimiters (click here to view details about [...]
August 8, 2009 at 4:21 AM
[...] a while ago I wrote a short tutorial on how you can write a short PHP function to extract content from specific delimiters. It has come to my attention that many people are looking for a way to replace and even modify [...]
September 7, 2009 at 1:59 AM
[...] 5.2.0) which doesn’t have the json_decode() function included, you can use the alternative extract_unit function to get the data between total_posts”: and “ (will return the actual number). [...]
September 8, 2009 at 8:33 AM
[...] is specified. The extract_unit function gets all the text between 2 specified strings. Thanks to Bit Repository for this function. Since we need to extract text from JavaScript, we need to decode the output [...]
February 25, 2010 at 10:12 PM
A more efficient version of the algorithm (no string copies required), which copes with $start or $end not existing in $string