Slicing Strings with PHP: Be Mindful of Output that Contains HTML Tags

When experimenting with strings that contain HTML code, be mindful of what you're getting for output, especially if there is something unexpected about the results. That's what I learned the hard way while extracting an open anchor tag from the source code of a web page. The variables used to locate the anchor tag appeared to be working, but for some reason the extracted code wouldn't display on the screen. Let's take a look at where I went wrong.

The Problem

Let's say we're tasked with getting the open anchor tag for the "About Us" link from the code below. Note that there is more code, but we'll keep things simple.

$code_from_website = '...ul id="Menu1"><li><a href="/about/" onmouseover="ShowMenu(\'Menu1\')" onfocus="ShowMenu(\'Menu1\')" onmouseout="HideMenu()" onblur="HideMenu()">About Us</a></li><li><a href="/about/history.php" onmouseover="ShowMenu(\'Menu1\')" onfocus="ShowMenu(\'Menu1\')" onmouseout="HideMenu()" onblur="HideMenu()">History</a></li><li><a href=...';

To extract the anchor tag, we'll need to figure out where the link text (About Us) starts and where the corresponding open anchor tag is. The print statements are there for us to see that we're getting seemingly valid values.

// Offset of the link text within the full string.
$position_linkText = strpos($code_from_website, 'About Us');
// Offset of the last '<a' that appears before the link text.
$position_anchorStart = strrpos(substr($code_from_website, 0, $position_linkText), '<a');
print "<div>Link Text Position: $position_linkText</div>";
print "<div>Open Anchor Position: $position_anchorStart</div>";
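Both strpos() and strrpos() return false when nothing is found, so a slightly more defensive version of the lookup might look like the sketch below. The function name findAnchorStart and the shortened $sample string are hypothetical, used only for illustration. Note too that searching for '<a' would also match tags such as '<abbr'.

```php
<?php
// Hypothetical helper: returns the offset of the last '<a' before
// $linkText, or null when either piece is missing.
function findAnchorStart(string $html, string $linkText): ?int
{
    $linkPos = strpos($html, $linkText);
    if ($linkPos === false) {
        return null; // link text not present
    }

    // Only search the part of the string that precedes the link text.
    $anchorPos = strrpos(substr($html, 0, $linkPos), '<a');

    return $anchorPos === false ? null : $anchorPos;
}

// Shortened stand-in for $code_from_website (an assumption, not the real page).
$sample = '<ul><li><a href="/about/">About Us</a></li></ul>';

var_dump(findAnchorStart($sample, 'About Us')); // int(8)
var_dump(findAnchorStart($sample, 'Contact'));  // NULL
```

The strict comparison with false matters because a match at offset 0 is a valid result that a loose check would treat as "not found".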

With the new variables, we know where the open anchor tag is in the overall string. So let's grab the code and display the result.

$length_openAnchor = $position_linkText - $position_anchorStart;
$openAnchorCode = substr($code_from_website, $position_anchorStart, $length_openAnchor);
print "<div>Open Anchor Tag Length: $length_openAnchor</div>";
print "<div>Anchor Code: $openAnchorCode</div>";

The first print statement works as expected, but the second one doesn't seem to display anything. So what went wrong? Everything else displays fine…
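Before hunting for a logic error, it can help to confirm that the extracted variable actually holds something. Here is a minimal sketch of that check, using a shortened, hypothetical $sample string in place of the real page source and printing plain values that the browser has no reason to hide.

```php
<?php
// Shortened stand-in for $code_from_website (an assumption, not the real page).
$sample = '<ul><li><a href="/about/">About Us</a></li></ul>';

$linkPos   = strpos($sample, 'About Us');                   // 26
$anchorPos = strrpos(substr($sample, 0, $linkPos), '<a');   // 8
$length    = $linkPos - $anchorPos;                         // 18

$openAnchor = substr($sample, $anchorPos, $length);

// The length is a plain number, so it displays no matter what the string holds.
echo strlen($openAnchor), "\n"; // 18
```

A non-zero length means the slicing worked and the problem lies in how the value is displayed, not in how it was extracted.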

Well, if we think about it, the answer is obvious. The $openAnchorCode variable contains the HTML code that opens an anchor tag. What happens if an anchor tag is displayed without a text label? The browser treats it as markup rather than text, so we get an anchor tag that's only visible by looking at the source code. With that in mind, we just need to make one small change to our code.

print "<div>Anchor Code: " . htmlentities($openAnchorCode) . "</div>";
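To see what that change does, here is a small sketch run against a shortened, hypothetical tag (not the full tag from the page). htmlentities() swaps the characters the browser would interpret for their entity equivalents, so the tag arrives as text. For plain ASCII input like this, htmlspecialchars() gives the same result.

```php
<?php
// Shortened stand-in for $openAnchorCode (an assumption for illustration).
$openAnchorCode = '<a href="/about/">';

// Convert <, > and " into entities so the browser shows them literally.
$escaped = htmlentities($openAnchorCode);

echo $escaped, "\n"; // &lt;a href=&quot;/about/&quot;&gt;
```

The browser renders those entities back as visible characters, so the page shows the tag's source instead of an empty link.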


The moral of the story is to be mindful of what's being displayed. If the output contains HTML tags, or anything else the browser can interpret, the browser will render it instead of showing it as text, and the result may look like nothing at all. We'll end up wasting time trying to determine where the program went wrong, most likely looking in the wrong place for the fix.

