Why escape get_permalink()?
Recently I started to create a new theme and found Underscores (_s) to work with. Underscores is well known for following the WordPress coding standards, so I decided to use it as the starting point for my new theme.
I stumbled over a piece of code which was, in my opinion, a bit awkward, and I started to wonder about it. To sum it up, it was a line that echoed the result of esc_url( get_permalink() ), which combines two functions:
- get_permalink(): This function is supposed to return a string with the URL of the current post. You can also pass the ID of a post or a post object.
- esc_url(): This function allows you to sanitize URLs. It gets rid of invalid characters, checks the protocol, etc.
I started to wonder: since get_permalink() is supposed to return links, why does this function not sanitize the link itself? But this looks to be on purpose, since a look into the function the_permalink(), which echoes the link from get_permalink(), reveals that there the link is sanitized before it is echoed.
When I started to search for a reason, I found a small discussion about “Too Much Escaping”, started by Konstantin Kovshenin on his blog. He started his discussion with a code example, which he calls
“Dirty, difficult to read and understand, and even more difficult to spot an error” (Konstantin Kovshenin)
True. As for my specific problem of why to escape the permalink, he pragmatically suggests using the_permalink(); because it is easier to read. But the_permalink(); escapes as well, so why?
“I call that paranoia. […] The permalink [… is] never going to break out of the attributes syntax.” (Konstantin Kovshenin)
If so, a lot of useless code has been written. The basic question would be whether get_permalink() can ever return a URL that is not sanitized. A quick test:
<?php
$args = array(
    'post_title'  => 'What happens with the Ampersand?',
    'post_type'   => 'post',
    'post_status' => 'publish',
    'post_name'   => 'what-happens-with-the-ampersand-&'
);
$post_id = wp_insert_post( $args );
echo get_permalink( $post_id );
?>
wp_insert_post() sanitizes the post_name before saving it. So in order to get a non-sanitized URL out of get_permalink(), this URL needs to be injected by some route other than wp_insert_post().
If we read the code, we realize that WordPress does not exclude this possibility at all. There might be reasons for a WordPress developer to use ampersands (I will stick with the ampersand for now) in post links, even if this means going around wp_insert_post(). If this is the case:
esc_url() will transform the ampersand into its numeric entity. So the URL http://www.youtube.com/watch?feature=player_detailpage&v=6q7LPD2KjNE#t=110
would become http://www.youtube.com/watch?feature=player_detailpage&#038;v=6q7LPD2KjNE#t=110
And let's say this URL is not just used in HTML documents, but also in text documents. The user can download the text as a dynamically produced text file, and at the end of this file the user finds the URL of its origin. In this case &#038; would not equal & anymore. So, if we used esc_url( get_permalink() ) or the_permalink(), we would get a wrong URL, while a plain echo get_permalink() would output the right address.
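The difference between the two contexts can be tried out without WordPress. The following is only a rough sketch: sketch_escape_ampersands() is a hypothetical helper invented for this illustration, and it imitates nothing but the ampersand replacement described above. It is not the real esc_url(), which does considerably more.

```php
<?php
// Hypothetical stand-in for the ampersand handling of esc_url().
// NOT the WordPress implementation: it only replaces every "&" with the
// numeric entity "&#038;" so the effect is visible in plain PHP.
function sketch_escape_ampersands( string $url ): string {
    return str_replace( '&', '&#038;', $url );
}

$url     = 'http://www.youtube.com/watch?feature=player_detailpage&v=6q7LPD2KjNE#t=110';
$escaped = sketch_escape_ampersands( $url );

// HTML context: the browser decodes the entity inside the attribute,
// so the link still points to the original address.
$html_line = '<a href="' . $escaped . '">Source</a>';

// Plain-text context: nothing decodes the entity, so the escaped
// form ends up in the file literally and is a different address.
$text_line_escaped = 'Source: ' . $escaped;
$text_line_raw     = 'Source: ' . $url;

echo $html_line, PHP_EOL;
echo $text_line_escaped, PHP_EOL;
echo $text_line_raw, PHP_EOL;

// Decoding the entity (as a browser would) recovers the original URL.
var_dump( html_entity_decode( $escaped ) === $url );
```

Because the helper replaces every ampersand blindly, it would double-escape a URL that already contains entities; it is meant to show the HTML versus plain-text mismatch, nothing more.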
Something like a conclusion
So, I am not quite sure. The example above is a bit far-fetched, but there might be reasons to sanitize permalinks, and if you are working on themes and plugins for a wider audience, maybe you should sanitize even get_permalink(), or, whenever you can, use the_permalink(). Or am I paranoid?
Any thoughts on this?