How to Create an Advanced PHP (bad, naughty) Words Filter

Posted on October 26, 2008, under PHP 

This PHP class filters (bad, naughty) words out of a string, whether it is plain text or contains HTML tags.

Here’s the complete source code (we will explain it below):

filter.string.class.php

<?php
/*
Credits: Bit Repository
*/

class Filter_String  {

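public $strings; // word, or array of words, to filter
public $text; // input text (plain or HTML)
public $keep_first_last; // true: keep the first and last letter of each match
public $replace_matches_inside_words; // true: also replace matches inside longer words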

function filter()
{
$new_text = '';

$regex = '/<\/?(?:\w+(?:=["\'][^\'"]*["\'])?\s*)*>/'; // Tag Extractor

preg_match_all($regex, $this->text, $out, PREG_OFFSET_CAPTURE);

$array = $out[0];

if(!empty($array))
{
	if($array[0][1] > 0)
	{
	$new_text .= $this->do_filter(substr($this->text, 0, $array[0][1]));
	}

   foreach($array as $index => $value)
   {
   $tag = $value[0];
   $offset = $value[1];

   $strlen = strlen($tag); // characters length of the tag

   $start_str_pos = ($offset + $strlen); // start position for the non-tag element

   // end position for the non-tag element: the offset of the next tag, or,
   // when this is the last tag, the end of the string
   $end_str_pos = isset($array[$index + 1]) ? $array[$index + 1][1] : strlen($this->text);


// Start constructing the new resulted string. We'll add tags now!
   $new_text .= substr($this->text, $offset, $strlen);


   $diff = ($end_str_pos - $start_str_pos);

       // Is this a simple string without any tags? Apply the filter to it
       if($diff > 0)
       { 
       $str = substr($this->text, $start_str_pos, $diff);

       $str = $this->do_filter($str);

       $new_text .= $str; // Continue constructing the text with the (filtered) text
       }
   }
}
else // No tags were found in the string? Just apply the filter
{
$new_text = $this->do_filter($this->text);
}

return $new_text;
}

function do_filter($var)
{
if(is_string($this->strings)) $this->strings = array($this->strings);

   foreach($this->strings as $word)
   {
	  $word = trim($word);

	  if($word === '') continue; // nothing to match

	  $str = strlen($word);

	  $first = ($this->keep_first_last) ? $word[0] : '';
	  $str = ($this->keep_first_last) ? max(0, $str - 2) : $str; // str_repeat() rejects a negative count
	  $last = ($this->keep_first_last) ? $word[strlen($word) - 1] : '';

	  $replacement = str_repeat('*', $str);

	  if($this->replace_matches_inside_words)
	  {
	     // plain substring replacement (case-sensitive)
	     $var = str_replace($word, $first.$replacement.$last, $var);
	  }
	  else
	  {
	     // whole-word, case-insensitive match; preg_quote() escapes regex metacharacters in the word
	     $var = preg_replace('/\b'.preg_quote($word, '/').'\b/i', $first.$replacement.$last, $var);
	  }
   }

return $var;
}

}
?>

How does it work?

First, a regex extracts the HTML tags (if any) using preg_match_all() with the PREG_OFFSET_CAPTURE flag, which returns the position (offset) of each match alongside the matched text. From these offsets the class works out where the non-tag text between the tags starts and ends, and applies the filter only to that text. The tags themselves are copied to the result unchanged.
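To make the offset arithmetic concrete, here is a minimal standalone sketch. It reuses the tag regex from the class, but the function name split_text_segments() and the sample input are assumptions made for illustration; they are not part of Filter_String.

```php
<?php
// Hypothetical helper (not part of Filter_String): returns the non-tag
// pieces of $text, located by the offsets that PREG_OFFSET_CAPTURE reports.
function split_text_segments($text)
{
    $regex = '/<\/?(?:\w+(?:=["\'][^\'"]*["\'])?\s*)*>/'; // same tag pattern as the class

    preg_match_all($regex, $text, $out, PREG_OFFSET_CAPTURE);

    $segments = array();
    $cursor = 0; // position just after the previous tag

    foreach ($out[0] as $match) {
        list($tag, $offset) = $match; // each match is [matched text, byte offset]

        if ($offset > $cursor) {
            $segments[] = substr($text, $cursor, $offset - $cursor); // text before this tag
        }

        $cursor = $offset + strlen($tag);
    }

    if ($cursor < strlen($text)) {
        $segments[] = substr($text, $cursor); // text after the last tag
    }

    return $segments;
}

print_r(split_text_segments('Hi <b>there</b> you'));
// The three segments are 'Hi ', 'there' and ' you'.
```

Only these segments would be passed to the word filter; the tags between them are skipped.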

How do you use it?

In this example the script replaces every word in the array, keeping the first and last letter of each one and replacing the letters in between with asterisks (*). For example, the word ‘turpis’ becomes ‘t****s’ (four asterisks, one per hidden letter). The filter leaves HTML tags alone, so the ‘href’ attribute inside the A tag is not replaced, although the standalone word ‘href’ in the text is.

<?php
error_reporting (E_ALL ^ E_NOTICE);

include 'filter.string.class.php';

$filter = new Filter_String;

$filter->strings = array('consectetuer','consequat','turpis', 'href');

$filter->text = 'Lorem ipsum dolor sit amet, href <a href="http://www.domain.com/">consectetuer</a> adipiscing elit. Nulla mi nunc, consequat vitae, condimentum at, iaculis at, turpis. Praesent suscipit. Maecenas et lectus.';

$filter->keep_first_last = true; // keep the first and last letter of each match
$filter->replace_matches_inside_words = false; // match whole words only

$new_text = $filter->filter();

echo $new_text;
?>
Output (HTML source):

Lorem ipsum dolor sit amet, h**f <a href="http://www.domain.com/">c**********r</a> adipiscing elit. Nulla mi nunc, c*******t vitae, condimentum at, iaculis at, t****s. Praesent suscipit. Maecenas et lectus.
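The masking rule itself can be sketched as a small standalone function. The name mask_word() is hypothetical, and the choice to mask words shorter than three letters completely is an assumption of this sketch, not something the class does.

```php
<?php
// Hypothetical helper (not part of Filter_String): builds the masked form of a word.
function mask_word($word, $keep_first_last = true)
{
    $length = strlen($word);

    // Assumption: with fewer than three letters there is no "middle" to hide,
    // so the whole word is masked.
    if (!$keep_first_last || $length < 3) {
        return str_repeat('*', $length);
    }

    return $word[0] . str_repeat('*', $length - 2) . $word[$length - 1];
}

echo mask_word('turpis'), "\n";        // t****s
echo mask_word('turpis', false), "\n"; // ******
```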

Suppose you need to filter the word ‘eat’ and the text contains the word ‘create’. If $filter->replace_matches_inside_words is set to true, ‘create’ becomes ‘cr***e’ (with $filter->keep_first_last set to false). If it is set to false, ‘create’ is left unchanged and only the standalone word ‘eat’ is filtered.
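The difference between the two modes can be reproduced without the class. This is a rough sketch with an illustrative sentence; it uses str_replace() for matches inside words and a \b-delimited regex for whole words, as the class does, and adds preg_quote() as a precaution for words that contain regex metacharacters.

```php
<?php
$text = 'Do not eat it, just create it.'; // illustrative input
$word = 'eat';
$mask = str_repeat('*', strlen($word));   // '***'

// Matches inside words too: plain substring replacement.
$inside = str_replace($word, $mask, $text);

// Whole words only: \b marks a word boundary on each side.
$whole = preg_replace('/\b' . preg_quote($word, '/') . '\b/i', $mask, $text);

echo $inside, "\n"; // Do not *** it, just cr***e it.
echo $whole, "\n";  // Do not *** it, just create it.
```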
