Writing a blog aimed at web developers, such as this one, often requires embedding raw HTML code on the page for review. We could use HTML's pre element, but then we'd miss out on the syntax highlighting that facilitates code review.
Other solutions#
A simple web search for "HTML syntax highlighter" yields many solutions, most of which offload the task to JavaScript. In essence, the web developer is asked to embed a rather large, external JavaScript file which scours the page for blocks of code and prettifies them on the fly.
This would be an acceptable approach if not for the extra bandwidth, the call to an external JavaScript file (XSS, anyone?), and the fact that it requires JavaScript to function at all. What we wanted instead was server-side code that mimics PHP's built-in highlight_string. You can see the result of using this function in the screenshots above.
Our solution#
Rather than requiring a massive library for a simple task, we set out to create our own simple XML (and, by extension, XHTML) syntax highlighter that runs on PHP. Of course, we adhere to our own coding standards, which favor object-oriented programming and small libraries that provide functionality cheaply and on demand.
The result is the XmlHighlighter class, described in more detail below.
How it works#
The XmlHighlighter class accepts a string or a filename, through its highlight and highlightFile methods respectively (similar to PHP's highlight_string and highlight_file).
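For reference, the two PHP built-ins mentioned here can be exercised as in the sketch below. It only shows the standard functions, not XmlHighlighter itself; the sample snippet and the variable names are ours.

```php
<?php
// PHP's built-in highlighters, shown for comparison with XmlHighlighter.
// The sample code and variable names below are illustrative only.
$snippet = '<?php echo "Hello"; ?>';

// With the second argument set to true, the highlighted HTML is
// returned as a string instead of being printed.
$fromString = highlight_string($snippet, true);

// highlight_file() does the same for a file on disk.
$path = tempnam(sys_get_temp_dir(), 'hl');
file_put_contents($path, $snippet);
$fromFile = highlight_file($path, true);
unlink($path);

echo $fromString, "\n";
```

The exact markup these functions return varies between PHP versions, which is one more reason to keep the styling in CSS classes rather than in the markup.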
It then parses the input using PHP's DOM extension. Once the document is loaded, the class calls a recursive method (convert) to walk the XML tree, handling each node according to its type and producing a nested list that mirrors the document's structure.
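The recursive walk can be sketched without any helper library. The function below is a simplified stand-in of our own (sketchNodeToList is a hypothetical name, not part of XmlHighlighter): it handles only elements and text, ignores attributes, and builds the nested list as a plain string.

```php
<?php
// Minimal sketch of a recursive DOM walk that emits a nested HTML list.
// sketchNodeToList is a hypothetical helper, not part of XmlHighlighter;
// it ignores attributes, comments and every other node type.
function sketchNodeToList(DOMNode $node)
{
    switch ($node->nodeType) {
    case XML_ELEMENT_NODE:
        $name = htmlspecialchars($node->nodeName);
        $html = '<li>&lt;' . $name;
        if ($node->hasChildNodes()) {
            // Children go into a nested list, then the closing tag follows.
            $html .= '&gt;<ul>';
            foreach ($node->childNodes as $child) {
                $html .= sketchNodeToList($child);
            }
            $html .= '</ul>&lt;/' . $name . '&gt;';
        } else {
            $html .= ' /&gt;';
        }
        return $html . '</li>';
    case XML_TEXT_NODE:
        // Skip whitespace-only text nodes.
        $text = trim($node->nodeValue);
        return $text === '' ? '' : '<li>' . htmlspecialchars($text) . '</li>';
    default:
        return '';
    }
}

$dom = new DOMDocument();
$dom->loadXML('<p>Hello <br/>world</p>');
$listHtml = '<ul>' . sketchNodeToList($dom->documentElement) . '</ul>';
echo $listHtml, "\n";
```

Each element becomes a list item, and its children become a list nested inside that item, which is what lets the browser indent the output without any extra work.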
XmlHighlighter makes ample use of HTML classes to provide contextual, syntactical meaning to each element, such as:
- tagname
- attrname
- attrvalue
- comment
- commentvalue
- literal
- pi (processing instruction)
- doctype
- unknown-# (# is the PHP value of the nodeType attribute)
This in turn allows the web designer to style (and colorize) the code using CSS.
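As a rough illustration of how these class names map to CSS, the sketch below generates a few rules from a class-to-color table. The helper name sketchHighlighterCss, the colors, and the "xh-" prefix are all assumptions of ours; the prefix stands for whatever value is passed to setClassPrefix.

```php
<?php
// Hypothetical helper: build CSS rules for the highlighter's class names.
// The colors and the "xh-" prefix are arbitrary examples, not defaults.
function sketchHighlighterCss(array $colors, $prefix = '')
{
    $css = '';
    foreach ($colors as $class => $color) {
        // One rule per class, e.g. ".xh-tagname { color: #00008b; }"
        $css .= sprintf(".%s%s { color: %s; }\n", $prefix, $class, $color);
    }
    return $css;
}

$css = sketchHighlighterCss(array(
    'tagname'      => '#00008b',
    'attrname'     => '#b22222',
    'attrvalue'    => '#006400',
    'commentvalue' => '#808080',
), 'xh-');
echo $css;
```

In practice the rules would more likely live in a static stylesheet; generating them is shown here only to make the relationship between prefix, class name, and selector explicit.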
Demonstration#
To see the code in action, consider the HTML structure below, which is the basic skeleton of the file you're reading:
- <!DOCTYPE html>
- <html itemscope="itemscope" itemtype="http://schema.org/Corporation">
- <head prefix="og: http://ogp.me/ns#">
- <title>
- XML/HTML Syntax Highlighter in PHP | OpenWeb Solutions, LLC
- <script async="async" type="text/javascript" src="/auto-toc.js" />
- <meta property="og:site_name" content="OpenWeb Solutions, LLC" />
- <meta property="og:type" content="article" />
- <meta property="og:title" content="XML/HTML Syntax Highlighter in PHP" />
- <meta property="og:description" content="A simple, drop-in solution for fully customizable, highly structured, XML (and XHTML) code highlighter written in PHP." />
- <meta property="og:url" content="http://www.openweb-solutions.net/blog/xml-highlighter/" />
- <meta property="og:article:published_time" content="2014-02-12" />
- <meta property="og:article:section" content="Blog" />
- <script id="twitter-wjs" async="async" defer="defer" type="text/javascript" src="//platform.twitter.com/widgets.js" />
- <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
- <meta name="generator" content="OpenWeb Solutions, LLC" />
- <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1" />
- <meta name="robots" content="index, follow" />
- <meta name="description" content="A simple, drop-in solution for fully customizable, highly structured, XML (and XHTML) code highlighter written in PHP." />
- <meta name="twitter:card" content="summary" />
- <meta name="twitter:title" content="XML/HTML Syntax Highlighter in PHP | OpenWeb Solutions, LLC" />
- <meta name="twitter:description" content="A simple, drop-in solution for fully customizable, highly structured, XML (and XHTML) code highlighter written in PHP." />
- <meta name="twitter:site" content="@OpenWebSolns" />
- <link rel="license" href="#copyright" />
- <link media="screen" type="text/css" href="/style.css" rel="stylesheet" />
- <link media="screen" type="text/css" href="/style-mobile.css" rel="stylesheet" />
- <link media="print" type="text/css" href="/style-print.css" rel="stylesheet" />
- <title>
- <body>
- <!-- [if lt IE 9]> <p style="font-family:sans-serif;background:#FAEB76;padding:1em;border:1px solid orange;text-align:center;margin:1em;font-size:120%;">Oh no! This site is built on ultra-cool technology that your version of IE does not understand. Don't despair: virtually <a href="http://firefox.com">any</a> <a href="http://chrome.com">other</a> <a href="http://opera.com">browser</a> will work. For your own sanity, consider switching for good right now.</p> <![endif] -->
- <nav class="nav" title="Navigate this page">
- <ul>
- <li>
- <a href="#content">
- Skip to body...
- <a href="#content">
- <li>
- <a href="#footerdiv">
- Skip to footer...
- <a href="#footerdiv">
- <li>
- <ul>
- <header id="header">
- <ul id="headerwrap">
- <li id="logodiv">
- <a itemprop="url" href="/">
- <img id="logoimg" itemprop="logo" src="/logo.png" alt="Logo" width="170" height="40" />
- <a itemprop="url" href="/">
- <li>
- <a href="/about/">
- About
- <a href="/about/">
- <li>
- <a href="/services/">
- Services
- <a href="/services/">
- <li>
- <a href="/portfolio/">
- Portfolio
- <a href="/portfolio/">
- <li>
- <a href="/blog/">
- Blog
- <a href="/blog/">
- <li>
- <a href="/games/">
- Games
- <a href="/games/">
- <li id="contactdiv">
- <p>
- Contact us:
- <p>
- <strong>
- Miami
- : (786) 474-6457
- <br />
- <strong>
- Montréal
- : (438) 792-4891
- <strong>
- <p>
- <li id="socialdiv">
- <ul id="social-list">
- <li>
- <a href="https://twitter.com/OpenWebSolns">
- <img src="/tw.png" alt="Twitter" width="20" height="20" />
- <a href="https://twitter.com/OpenWebSolns">
- <li>
- <ul id="social-list">
- <li id="logodiv">
- <ul id="headerwrap">
- <div id="contentwrap">
- <main id="content">
- <nav id="breadcrumbs">
- <ol>
- <li>
- <a href="/">
- Home
- <a href="/">
- <li>
- <a href="/blog/">
- Blog
- <a href="/blog/">
- <li>
- <ol>
- <section id="bodysec">
- <article itemscope="itemscope" itemtype="http://schema.org/BlogPosting">
- <header>
- <h1 itemprop="name">
- XML/HTML Syntax Highlighter in PHP
- <h2 class="published">
- <time itemprop="datePublished" datetime="2014-02-12">
- February 12, 2014
- <time itemprop="datePublished" datetime="2014-02-12">
- <h1 itemprop="name">
- <header>
- <article itemscope="itemscope" itemtype="http://schema.org/BlogPosting">
- <section id="socialsec">
- <p id="twitter-message">
- Continue the discussion on Twitter.
- <a class="twitter-follow-button" data-size="large" data-show-count="false" href="https://twitter.com/OpenWebSolns">
- Follow @OpenWebSolns
- <p id="twitter-message">
- <nav id="breadcrumbs">
- <main id="content">
- <footer id="footerdiv">
- <address id="copyright">
- © Copyright OpenWeb Solutions, LLC, 2012-2014
- <div id="footinfo">
- <p>
- Call or
- <a href="mailto:inquiries@openweb-solutions.net">
- to find out how to make your project a reality!
- <a href="/sitemap/">
- View our sitemap
- .
- <p>
- We support
- <a href="http://www.fsf.org">
- <img src="/fsf.png" alt="FSF" width="206" height="24" />
- <a href="http://validator.w3.org/check?uri=http%3A%2F%2Fwww.openweb-solutions.net%2F;verbose=1">
- <img src="/html5.png" alt="HTML5" width="32" height="32" />
- .
- <a href="/license/">
- License
- .
- <p>
- <address id="copyright">
- <head prefix="og: http://ogp.me/ns#">
Download the source file for comparison.
XmlHighlighter class#
Below is the full source of the class in question, this time colorized by PHP's highlight_string function (as part of our semantic text editor). You can also download the source directly.
<?php
/**
 * Set of tools for syntax highlighting of XML documents.
 *
 * The main class, XmlHighlighter, uses PHP's DOM extension to parse
 * the given input, and then converts that input into a list
 * (XUl). As such, this class requires PHP's DOM extension as well as
 * the OWS's HtmlLib.
 *
 * @author OpenWeb Solutions, LLC
 * @version 2014-02-11
 */
require_once('XML/HtmlLib.php');

/**
 * Exception for highlighters
 *
 * @author OpenWeb Solutions, LLC
 * @version 2014-02-11
 */
class XmlHighlighterException extends Exception {}

/**
 * Code to highlight XML snippets
 *
 * @author OpenWeb Solutions, LLC
 * @created 2014-02-11
 */
class XmlHighlighter {
    protected $dom;
    protected $output;
    protected $classPrefix = '';
    protected $nonEmptyNodes = array();

    /**
     * Creates a new XML Highlighter
     *
     */
    public function __construct() {
        $this->dom = new DOMDocument();
    }

    public function setClassPrefix($pre = "") {
        $this->classPrefix = $pre;
    }

    /**
     * Specify list of nodeNames that should always contain closing tags.
     *
     * This will render as <div></div> instead of <div /> even if there
     * are no children, for all nodeNames specified in list.
     *
     * @param Array $list the list of nodeNames
     */
    public function setNonEmptyNodes(Array $list) {
        $this->nonEmptyNodes = $list;
    }

    /**
     * Produce an XUl element containing the given fragment or document
     *
     */
    public function highlight(&$text) {
        if (!$this->dom->loadXML($text))
            throw new XmlHighlighterException("Unable to parse string.");

        $this->output = new XUl();
        $this->dom->normalize();
        for ($i = 0; $i < $this->dom->childNodes->length; $i++) {
            $this->convert($this->dom->childNodes->item($i), $this->output);
        }
        return $this->output;
    }

    /**
     * Convenience wrapper around highlight
     *
     * @see highlight
     */
    public function highlightFile($filename) {
        $text = file_get_contents($filename);
        return $this->highlight($text);
    }

    /**
     * Recursively parses the given node and fills the given $output
     *
     */
    protected function convert($node, $output) {
        switch ($node->nodeType) {
        case XML_ELEMENT_NODE:
            $li = new XLi(array(new XSpan("<"),
                                new XSpan($node->nodeName, array('class' => $this->classPrefix . 'tagname'))));
            $output->add($li);

            // attributes
            for ($i = 0; $i < $node->attributes->length; $i++) {
                $this->convert($node->attributes->item($i), $li);
            }

            // children
            if ($node->hasChildNodes()) {
                $li->add(new XSpan(">"));
                $li->add($sub = new XUl());
                for ($i = 0; $i < $node->childNodes->length; $i++) {
                    $this->convert($node->childNodes->item($i), $sub);
                }
            }
            else {
                // self-close the tag unless it is listed in nonEmptyNodes
                if (!in_array($node->nodeName, $this->nonEmptyNodes)) {
                    $li->add(new XSpan(" />"));
                    break;
                }
                $li->add(new XSpan(">"));
            }

            // close tag
            $li->add(new XSpan("</"));
            $li->add(new XSpan($node->nodeName, array('class' => $this->classPrefix . 'tagname')));
            $li->add(new XSpan(">"));
            break;

        case XML_ATTRIBUTE_NODE:
            $output->add(" ");
            $output->add(new XSpan($node->nodeName, array('class' => $this->classPrefix . 'attrname')));
            $output->add("=");
            $output->add(new XSpan(sprintf("\"%s\"", $node->nodeValue), array('class' => $this->classPrefix . 'attrvalue')));
            break;

        case XML_TEXT_NODE:
            // skip whitespace-only text nodes
            $val = trim($node->nodeValue);
            if (strlen($val) > 0) {
                $output->add(new XLi($node->nodeValue, array('class' => $this->classPrefix . 'literal')));
            }
            break;

        case XML_COMMENT_NODE:
            $output->add(new XLi(array("<!-- ",
                                       new XSpan($node->nodeValue, array('class' => $this->classPrefix . 'commentvalue')),
                                       " -->"),
                                 array('class' => $this->classPrefix . 'comment')));
            break;

        case XML_PI_NODE:
            $li = new XLi(array("<?",
                                new XSpan($node->nodeName, array('class' => $this->classPrefix . 'tagname')),
                                " ",
                                new XSpan($node->nodeValue, array('class' => $this->classPrefix . 'attrname'))),
                          array('class' => $this->classPrefix . 'pi'));
            $li->add("?>");
            $output->add($li);
            break;

        case XML_DOCUMENT_TYPE_NODE:
            $li = new XLi(new XSpan($node->internalSubset, array('class' => $this->classPrefix . 'doctype')));
            $output->add($li);
            break;

        default:
            $output->add(new XLi($node->nodeValue, array('class' => $this->classPrefix . 'unknown-' . $node->nodeType)));
            break;
        }
    }

    public static function htmlNonEmptyNodes() {
        return array('div', 'td', 'tr', 'table');
    }
}
?>
Possible drawbacks#
As mentioned in the comments of the file and in the section on How it works, this particular solution makes use of PHP's DOM extension. Thus, it requires at least PHP 5, which we hope you're using for your own sake.
The current implementation also requires the HtmlLib library we discussed in our XML Library article. Specifically, it requires the following objects:
- XUl (the UL element)
- XLi (the LI element)
- XSpan
These classes are small enough that they may be embedded directly in the file above, at the developer's discretion.
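To give an idea of how little is needed, here is a rough sketch of stand-ins for the three classes. It is inferred only from how XmlHighlighter calls them (a constructor taking content and an optional attribute array, plus an add method) and is not the actual HtmlLib code; the SketchElement base class and the use of __toString for rendering are our assumptions.

```php
<?php
// Hypothetical stand-ins for HtmlLib's XUl, XLi and XSpan, inferred only
// from how XmlHighlighter calls them. The real library may differ.
class SketchElement
{
    protected $tag;
    protected $attrs;
    protected $children = array();

    public function __construct($tag, $content = array(), array $attrs = array())
    {
        $this->tag = $tag;
        $this->attrs = $attrs;
        // Accept either a single child or an array of children.
        foreach (is_array($content) ? $content : array($content) as $child) {
            $this->add($child);
        }
    }

    public function add($child)
    {
        $this->children[] = $child;
        return $this;
    }

    // Rendering via __toString is an assumption of this sketch.
    public function __toString()
    {
        $html = '<' . $this->tag;
        foreach ($this->attrs as $name => $value) {
            $html .= sprintf(' %s="%s"', $name, htmlspecialchars((string) $value));
        }
        $html .= '>';
        foreach ($this->children as $child) {
            // Nested elements render themselves; plain strings are escaped.
            $html .= ($child instanceof SketchElement)
                ? (string) $child
                : htmlspecialchars((string) $child);
        }
        return $html . '</' . $this->tag . '>';
    }
}

class XUl extends SketchElement
{
    public function __construct($content = array(), array $attrs = array())
    {
        parent::__construct('ul', $content, $attrs);
    }
}

class XLi extends SketchElement
{
    public function __construct($content = array(), array $attrs = array())
    {
        parent::__construct('li', $content, $attrs);
    }
}

class XSpan extends SketchElement
{
    public function __construct($content = array(), array $attrs = array())
    {
        parent::__construct('span', $content, $attrs);
    }
}

// Build the list item for an empty <div /> the way convert() would.
$li = new XLi(array(new XSpan('<'), new XSpan('div', array('class' => 'tagname'))));
$li->add(new XSpan(' />'));
$ul = new XUl();
$ul->add($li);
$rendered = (string) $ul;
echo $rendered, "\n";
```

Escaping happens at render time, so the angle brackets passed in as plain strings show up as text in the browser rather than being interpreted as markup.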
For completeness, we have added the XmlHighlighter class to our PHPLIB repository.
Advantages#
The code works with any well-formed XML document, or any fragment that has a single root element. Because it is a PHP class, it can be extended as needed in order to tweak its functionality. It's very small, and therefore easy to debug and work with. It can also handle any schema.
It is our sincerest hope that this simple solution to a common problem serves the needs of other web developers, and also inspires others to extend it and create highlighters of their own.