Hacking the DITA-OT to Print Multiple Pages from an HTML Help File

Here’s one that took a little poking around to figure out. One of the many (nay, countless) drawbacks of using HTML Help (.CHM files) is that printing from them is awful. Ideally, a user could print from the Help Viewer to get a hard copy (or at least a .PDF copy) of the manual. That would spare writers like me the burden of producing separate .PDF deliverables and keep all of the help in one place for the user, who could print just what they need.[1]

The Problem

So, as a writer, I’d like to be able to style differently for the screen and for print. There are a whole host of reasons why (different readability issues, scale issues, etc.), but as often as not it’s because web browsers, by default, just don’t print everything they render for the screen. Background colors or images, for instance, don’t get sent to the printer. This can quickly go from a style issue to a readability issue.

While CSS gives the writer a fair bit of control over the display medium, good ol’ HTML Help is right there to block you. It all works just fine when you print only the current topic. However, HTML Help offers this wonderful little feature which allows you to print the current topic and all child topics (for example, the chapter heading and all contents of that chapter). Sounds great, right? Except that the way it is implemented breaks all links and references between files.

[Image: Print Topic dialog in HTML Help Viewer]

The Print Topic dialog: The cause of all this trouble.

That’s right. Hyperlinks? Broken. JavaScript? Not only broken, but it will now present big, scary error warnings to your user![2] And CSS? Completely busted.

You see, when you select this option in HTML Help, Windows copies all of your files (conveniently renaming them, thus breaking links between your topics) into some temporary folder and then concatenates them into one long HTML file, which it then prints just as it would a single topic file (minus all of the CSS, scripting, and other things that I, as the writer, spent weeks on).
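To make the failure concrete, here’s a rough Python sketch of what happens to a relative stylesheet reference once a topic is copied somewhere else. The paths and the temporary file name are made up for illustration, and `urljoin` only approximates how the viewer resolves links. The point is simply that the same relative href lands somewhere different once the file containing it moves.

```python
from urllib.parse import urljoin

# All three values are hypothetical, chosen only to illustrate the idea.
ORIGINAL_TOPIC = "file:///help/topics/intro.html"
TEMP_PRINT_COPY = "file:///C:/Users/me/AppData/Local/Temp/~hh1A2B.htm"
RELATIVE_CSS = "../css/print.css"


def resolve(base: str, href: str) -> str:
    """Resolve a relative href against the file that contains it."""
    return urljoin(base, href)


if __name__ == "__main__":
    # Same href, two different base files, two different targets.
    print(resolve(ORIGINAL_TOPIC, RELATIVE_CSS))
    print(resolve(TEMP_PRINT_COPY, RELATIVE_CSS))
```

The second result points into the temporary folder’s neighborhood, where no stylesheet was ever copied, which is why the printed output comes out unstyled.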

The Solution

Fortunately, we can use one of Windows’ other oddities to combat this one: the very strange behavior of .CHM files in the file system. For some insanely odd reason which I cannot fathom, Windows simply doesn’t care about the folder where a .CHM file was placed. Upon opening one, Windows sends it to some other place where directories and folders don’t exist; you need only call for the file by name and Windows will locate it no matter where it is on your machine.[3]

So, where we would have put a relative file path within the .CHM file’s internal folders, we will use the MS-ITS protocol syntax to call it forth! Thanks to Windows’ own absurd behavior, there is no place it can put these temporary print files where that reference won’t find its way back to the .CHM.

Now, obviously, this all applies to any HTML Help file, not just one created using the DITA Open Toolkit. However, I’ll show you some of the extra know-how it takes to pull this off from within your authoring tools if you’re using DITA. Some other help authoring tools have options to correct for this.[4]

The Hack

  1. Create a new style sheet for use with print display, or simply use an @media print { } block to add print-related styles to an existing style sheet. You can reference the same style sheet multiple times in the same HTML file with no ill effects (apparently, in HTML Help Viewer at least).
  2. Locate the XSL file responsible for adding the <head> contents into your HTML files: dita2htmlImpl.xsl

    For example, for XMetaL, this file is located in C:\Users\<user.name>\AppData\Roaming\SoftQuad\XMetaL Shared\DITA_OT\xsl\xslhtml\, where <user.name> is your Windows user name. On older versions of Windows, AppData is Application Data. Either way, this is usually a hidden folder.

  3. Near the end of this long XSL file, you’ll find a number of <xsl:template>s, one of which is used to generate links to CSS files.

    Hint: Just do a search for the string "text/css".

  4. Go to the end of this template and, just before the closing </xsl:template> line, add the following link:

    <link rel="stylesheet" type="text/css" href="MS-ITS:<your.filename>.chm::/<file.path>/<stylesheet.name>.css" media="print" />

    Where: <your.filename>.chm is the name of the HTML Help file you’re generating. Note that you’ll need to update this file for any different output filenames you generate, unless you want Windows opening up some random .CHM file every time the user clicks Print.

    <file.path>/<stylesheet.name>.css is the relative file path and stylesheet name inside the HTML Help file. If you really just aren’t sure, grab a copy of 7-Zip and use it to peek inside your .CHM (it can read them just like a .ZIP file; an awesome thing to have in your toolkit).
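Since the .CHM name is baked into the link, it’s easy to forget to update it for each deliverable. As a rough sketch (not part of the DITA-OT, and certainly not the only way to do it), a few lines of Python can generate the element for each output file. The function name and the example file names below are hypothetical. The URL uses the `::/` separator that MS-ITS URLs place between the .CHM name and the path inside it.

```python
from html import escape


def ms_its_print_link(chm_name: str, internal_path: str) -> str:
    """Build a print-only <link> element pointing at a stylesheet inside a .CHM.

    chm_name      -- file name of the compiled help file (hypothetical examples below)
    internal_path -- path of the stylesheet inside the .CHM
    """
    # Normalize to forward slashes and drop any leading slash,
    # because the "::/" separator already supplies one.
    path = internal_path.replace("\\", "/").lstrip("/")
    href = f"MS-ITS:{chm_name}::/{path}"
    return (
        '<link rel="stylesheet" type="text/css" '
        f'href="{escape(href, quote=True)}" media="print" />'
    )


if __name__ == "__main__":
    # Hypothetical output file names; substitute your own.
    for chm in ("UserGuide.chm", "AdminGuide.chm"):
        print(ms_its_print_link(chm, "css/print.css"))
```

Paste the printed line into the template for whichever .CHM you are about to build.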

I’m fairly certain this concept can be applied to the issue of <script> elements as well (though I haven’t gotten it to work thus far). However, it will never fix the issue of hyperlinks between topics, as this system of concatenating files into a temporary file irrevocably breaks those links. There’s no one-time “read this file back from the source .CHM” trick for that issue.

A huge credit goes to Yuko Ishida who sent the key to this over to Helpware.net. I should also point out that this hack was tested in HTML Help Workshop v 4.74.8702, Windows 7 64-bit, XMetaL Author v5.5, and DITA-OT v1.2.

  1. In engineering, it is occasionally necessary to print off some of the technical reference or methodology sections of design software documentation for clients.
  2. God only knows we’ll get calls about viruses on this one…
  3. Truly, this has the potential to wreak havoc should you have two or more .CHM files of the same name on your local drive. However, for the most part, it is completely invisible. I can assure you, I have oodles of copies of various .CHM files with the same name and I only recently learned about this Windows weirdness.
  4. MadCap Flare, for instance, has an option to correct the appearance of multi-page printing for .CHM files which I’m fairly certain does the same thing as I describe here.