Websupporter

An Example of How To Remove Empty HTML Tags with PHP

In his latest blog post, Tom McFarlin gives us “An Example of How To Remove Empty HTML Tags”. Empty tags can be a real nightmare, since they can wreck the layout of an article. Take empty <p> tags, for example, which easily produce huge gaps between paragraphs. CSS offers no reliable fix for this, and calling the designer is a wasted call. WYSIWYG editors in particular produce these problems easily, and a lot of WordPress users probably know what I am talking about.

So I was quite curious about this topic and read his blog post. I was a bit disappointed to see that he tackles the problem of empty HTML tags with a JavaScript solution:

( function ( $) {
	'use strict';
	$( '.comment code' ).each(function() {
		if ( '' === $.trim( $( this ).text() ) ) {
			$( this ).remove();
		}
	});
}( jQuery ) );

This solution is neat, no question, at least for JavaScript-enabled browsers. But it certainly has some disadvantages:

  • The browser has to take care of the problem, which costs time: first it retrieves data it doesn’t need, and in a second step it has to remove that data again.
  • Empty tags are something like silent conversations. What does <em></em> mean? “I emphasize nothing”?
  • The browser needs to have JavaScript enabled.

So a server-side solution would be my preferred way. I searched a bit to see whether such a solution exists and found this blog post by CodeSnap:


/**
 * @version    1.0
 * @param    string    $str    String to remove tags.
 * @param    string    $repto    Replace empty string with.
 * @return    string    Cleaned string.
 */
function remove_empty_tags_recursive ($str, $repto = NULL)
{
    //** Return if string not given or empty.
    if (!is_string ($str)
        || trim ($str) == '')
            return $str;

    //** Recursive empty HTML tags.
    return preg_replace (

        //** Pattern written by Junaid Atari.
        '/<([^<\/>]*)>([\s]*?|(?R))<\/\1>/imsU',

        //** Replace with nothing if string empty.
        !is_string ($repto) ? '' : $repto,

        //** Source string
        $str
    );
}
/*
+=====================================
| EXAMPLE
+=====================================
*/
$str = <<<EOF

Hello User,
Welcome to our domain.

EOF;
echo remove_empty_tags_recursive ($str);
/*
+=====================================
| OUTPUT:
+=====================================
*/
/*
Hello User,
Welcome to our domain.
*/
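
To see the recursion at work, here is a minimal, self-contained sketch. Only the pattern is taken from the CodeSnap function above; the helper name `strip_empty_tags_demo`, the constant name, and the input string are illustrative assumptions and not part of the original post.

```php
<?php
// Pattern copied from the CodeSnap function above.
const EMPTY_TAG_PATTERN = '/<([^<\/>]*)>([\s]*?|(?R))<\/\1>/imsU';

// Hypothetical helper name, used only for this demonstration.
function strip_empty_tags_demo(string $html): string
{
    $result = preg_replace(EMPTY_TAG_PATTERN, '', $html);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $result === null ? $html : $result;
}

// Illustrative input: a nested empty pair, a whitespace-only tag,
// and two paragraphs with real content.
$html = '<div><p></p></div><p>Hello User,</p><span>  </span><p>Welcome to our domain.</p>';

echo strip_empty_tags_demo($html), "\n";
// <p>Hello User,</p><p>Welcome to our domain.</p>
```

The nested `<div><p></p></div>` disappears in one go because the `(?R)` alternative lets the pattern match itself between an opening tag and its closing tag.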

CodeSnap uses regular expressions to identify empty HTML tags. The advantage of this solution over DOM-based solutions is obvious: the markup doesn’t have to parse into a valid DOM. But as it stands, the solution only removes tags without attributes, for example <a></a>, but not <a href="" class="external link"></a>.

So it’s time to play with the regular expression. To fix this, I would suggest the following pattern:

'/<([^<\/>]*)([^<\/>]*)>([\s]*?|(?R))<\/\1>/imsU'

This properly removes <p></p> as well as <p class="entry"></p>.
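
The difference between the two patterns can be checked with a short comparison. This is a sketch under assumptions: the variable names and sample strings are made up for illustration, and only the two patterns come from the text above.

```php
<?php
// Both patterns are copied from the article; the variable names are illustrative.
$original = '/<([^<\/>]*)>([\s]*?|(?R))<\/\1>/imsU';
$modified = '/<([^<\/>]*)([^<\/>]*)>([\s]*?|(?R))<\/\1>/imsU';

// Illustrative samples, not taken from the original post.
$samples = [
    '<a></a>',                               // empty, no attributes
    '<a href="" class="external link"></a>', // empty, with attributes
    '<p class="entry">Text</p>',             // not empty, must survive
    '<a href="http://example.com"></a>',     // empty, attribute value contains "/"
];

$results = [];
foreach ($samples as $html) {
    $results[$html] = [
        'original' => preg_replace($original, '', $html),
        'modified' => preg_replace($modified, '', $html),
    ];
}

var_export($results);
// '<a></a>'                               is removed by both patterns.
// '<a href="" class="external link"></a>' is removed only by the modified pattern.
// '<p class="entry">Text</p>'             is left untouched by both.
// '<a href="http://example.com"></a>'     is left untouched by both.
```

The last sample shows a limit of the character class `[^<\/>]`: because it excludes the slash, an empty tag whose attribute value contains a `/` (a URL, for instance) is still not matched by either pattern.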

For me, the proposed JavaScript solution is like first smashing a glass and then presenting the broom to clean it up. A server-side solution doesn’t smash the glass in the first place.
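
In a WordPress site, one way to apply the cleanup on the server could look like the sketch below. It is an assumption about how one might wire things up, not something taken from either blog post: the function name `example_strip_empty_tags` is hypothetical, and the priority of 20 is chosen only so that the callback runs after WordPress's default content filters.

```php
<?php
// Hypothetical callback name; the pattern is the modified one from the article.
function example_strip_empty_tags($content)
{
    // Leave anything that is not a non-empty string untouched.
    if (!is_string($content) || trim($content) === '') {
        return $content;
    }

    $cleaned = preg_replace(
        '/<([^<\/>]*)([^<\/>]*)>([\s]*?|(?R))<\/\1>/imsU',
        '',
        $content
    );

    // preg_replace() returns null on a PCRE error; keep the original content then.
    return $cleaned === null ? $content : $cleaned;
}

// add_filter() only exists inside WordPress, so the guard keeps this file
// runnable on its own. Priority 20 is an assumption (default filters use 10).
if (function_exists('add_filter')) {
    add_filter('the_content', 'example_strip_empty_tags', 20);
}
```

With this in place the empty tags never reach the browser, so no script has to sweep them up afterwards.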

About the author

David Remer wrote his first website in HTML in 1998. Shortly afterwards he became fascinated by DHTML and JavaScript. After years of freelancing he first worked for Inpsyde and is now a developer at Automattic. He has also written the book "WordPress für Entwickler".
