A DITA & DITA Open Toolkit Reading List

I was reorganizing my computer science and technical writing shelf during lunch today when I noticed a pattern: I have quite a few books related to DITA and the underlying technologies of the DITA Open Toolkit. Well, this isn’t a coincidence. It’s a big part of my job and something I’m really interested in. But it occurred to me just how much time I’ve spent poring over these texts on structured authoring and XML-based technology, all in hopes of grokking this for my job.

Some Light Reading on DITA

So, in no particular order, here’s a list of some of my books on the subject:

DITA

XML

A couple of books on Ant & JavaScript that I haven’t even gotten to yet:

And, some wider shots of my (sort of) organized bookshelves:

Non-Fiction Bookshelves

Office Shelves

  1. I have the first edition. I’d recommend getting the later edition.

Regular Expressions versus XSLT

Last week I came across an epic rant within a forum thread[1] about why using regular expressions for parsing XML is a bad idea.

The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty.

At first, I was a little surprised. I love using regular expressions to make bulk changes throughout an XHTML document or even across a project consisting of hundreds of files. But after reading through the post several times and thinking about what I’ve been able to accomplish with some (relatively) simple XSLT files and an XML parser, it occurred to me that it is absolutely correct.

You see, as great as regular expressions are, they are not aware of context. They have no idea whether you’re matching a pattern within a C++ routine or an XHTML file. They can only match characters and short strings as they are, with no understanding of their meaning.

Extensible Stylesheet Language Transformations (XSLT), on the other hand, exist solely to manipulate XML content. By definition, they are aware of XML elements and their attributes. Their entire purpose is high-level modification. In fact, after having used them to successfully convert some XHTML to DITA XML, I have to say the powers feel almost god-like.

Regular expressions still have their uses with XML, particularly with the badly formed SGML/HTML one might have had dumped in one’s lap. But if the need is actually manipulating XML elements or attributes within a file (or even across files), then it’s really foolish to chain together multiple regular expressions when a single XSL template will do (and often without the unintended consequences of a greedy RegEx).
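To make the greedy-RegEx problem concrete, here is a minimal sketch in Python rather than XSLT. The XHTML fragment and the "note" class are made up for illustration, and the standard-library ElementTree parser stands in for a full XSLT processor. It only shows the difference between matching characters and understanding elements.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment: two "note" paragraphs with a plain one between.
XHTML = (
    '<body>'
    '<p class="note">First <b>bold</b> note.</p>'
    '<p>Plain paragraph.</p>'
    '<p class="note">Second note.</p>'
    '</body>'
)

# A greedy pattern knows nothing about element boundaries: ".*" runs from
# the first opening tag all the way to the LAST closing </p> in the string.
greedy = re.findall(r'<p class="note">.*</p>', XHTML)

# A parser sees elements and attributes, so selecting by attribute is exact.
root = ET.fromstring(XHTML)
notes = [
    "".join(p.itertext())          # text of the element and its children
    for p in root.iter("p")
    if p.get("class") == "note"
]

print(len(greedy))  # one over-long match that swallows all three paragraphs
print(notes)        # just the text of the two note paragraphs
```

A non-greedy `.*?` would patch this particular case, but the pattern still has no idea what an element is, which is the point of the rant.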

  1. And when I say epic, I mean it goes from making a case as to why RegEx is simply not high-level enough to deal with HTML parsing to opening the gates of the abyss and letting the deep ones into your mind.

Hacking the DITA-OT to Print Multiple Pages from an HTML Help File

Here’s one that took a little poking around to figure out. One of the many (nay, countless) drawbacks of using HTML Help (.CHM files) is that printing from them is awful. Ideally, a user could print from the Help Viewer to get a hard copy (or at least a .PDF copy) of the manual. This would spare writers like myself the burden of continuing to produce .PDF deliverables and keep all of the help in one place for the user, allowing them to print what they need[1].

The Problem

So, as a writer, I’d like to be able to style differently for the screen and for print. There are a whole host of reasons why (different readability issues, scale issues, etc.), but more often than not it’s because web browsers just don’t print everything they render for the screen. Background colors or images, for instance, don’t get sent to the printer. This can quickly go from a style issue to a readability issue.

While CSS gives the writer a fair bit of control over the display medium, good ol’ HTML Help is right there to block you. It all works just fine when you print only the current topic. However, HTML Help offers this wonderful little feature which allows you to print the current topic and all child topics (for example, the Chapter heading and all contents of that chapter). Sounds great, right? Except that the way it is implemented breaks all links/references between files.

Print Topic dialog in HTML Help Viewer

The Print Topic dialog: The cause of all this trouble.

That’s right. Hyperlinks? Broken. JavaScript? Not only broken, but will now present big, scary error warnings to your user[2]! And CSS? Completely busted.

You see, when you select this option in HTML Help, Windows copies all of your files (conveniently renaming them, thus breaking links between your topics) into some temporary folder and then concatenates them into one long HTML file, which it then prints just as it would have the single topic file (minus all of the CSS, scripting, and other things I, as the writer, spent weeks on).

The Solution

Fortunately, we can use one of Windows’ other oddities to combat this one. That is, the very strange behavior of .CHM files in the file system. For some insanely odd reason which I cannot fathom, Windows simply doesn’t care about the folder where a .CHM file was placed. Upon opening it, Windows sends it to some other place where directories and folders don’t exist. You only need to call for the file by name and Windows will locate it no matter where it is on your machine.[3]

So, where we would have put a relative file path within the .CHM file’s internal folders, we will use the MS-ITS syntax call to bring it forth! There is no place safe for these temporary print files that Windows creates due to its own absurd behavior.

Now, obviously, this all applies to any HTML Help file, not just one created using the DITA Open Toolkit. However, I’ll show you some of the extra know-how it takes to pull this off from within your authoring tools if you’re using DITA. Some other help authoring tools have options to correct for this[4].

The Hack

  1. Create a new style sheet for use with print display or simply use an @media print { } block to add print-related styles to an existing style sheet. You can reference the same style sheet multiple times in the same HTML file with (apparently, in HTML Help Viewer, at least) no ill effects.
  2. Locate the XSL file responsible for adding the <head> contents into your HTML files: dita2htmlImpl.xsl

    For example, for XMetaL, this file is located in C:\Users\<user.name>\AppData\Roaming\SoftQuad\XMetaL Shared\DITA_OT\xsl\xslhtml\, where <user.name> is your Windows user name. On older versions of Windows, AppData is Application Data, and this is usually a hidden system folder.

  3. Near the end of this long XSL file, you’ll find a number of <xsl:template>s, one of which is used to generate links to CSS files.

    Hint: Just do a search for the string "text/css".

  4. Go to the end of this template and, just before the closing </xsl:template> line, add the following link:

    <link rel="stylesheet" type="text/css" href="MS-ITS:<your.filename>.chm::/<file.path>/<stylesheet.name>.css" media="print" />

    Where: <your.filename>.chm is the name of the HTML Help file you’re generating. Note that you’ll need to update this file for any different output filenames you generate, unless you want Windows opening up some random .CHM file every time the user clicks Print.

    <file.path>/<stylesheet.name>.css is the relative file path and stylesheet name inside the HTML Help file. If you really just aren’t sure, grab a copy of 7-Zip and use it to peek inside your .CHM (it can read them just like a .ZIP file; awesome thing to have in your toolkit).

I’m fairly certain this concept can be applied to the issue of <script> elements as well (though I haven’t gotten it to work thus far). However, it will never fix the issue of hyperlinks between topics, as this system of concatenating files into a temporary file irrevocably breaks those links. There is no equivalent one-time reference back into the source .CHM that fixes that issue.
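Since the link from step 4 has to change with every output filename, it can help to generate it rather than type it. The following is only a sketch: the function name, the .CHM names, and the style sheet path are all hypothetical, and it assumes the `::/` separator that the MS-ITS protocol conventionally places between the .CHM name and the internal path.

```python
def print_css_link(chm_name, css_path):
    """Build the print style sheet <link> that points back into a .CHM file.

    chm_name: output file name only (no folder), e.g. "UserGuide.chm"
    css_path: style sheet path inside the .CHM, e.g. "css/print.css"
    """
    # Assumption: "::/" separates the .CHM name from the path inside it.
    href = "MS-ITS:{0}::/{1}".format(chm_name, css_path.lstrip("/"))
    return (
        '<link rel="stylesheet" type="text/css" '
        'href="{0}" media="print" />'.format(href)
    )


# Hypothetical deliverables that share one print style sheet.
for chm in ("UserGuide.chm", "AdminGuide.chm"):
    print(print_css_link(chm, "css/print.css"))
```

Each printed line is what would be pasted into the XSL template before building the matching .CHM.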

A huge credit goes to Yuko Ishida who sent the key to this over to Helpware.net. I should also point out that this hack was tested in HTML Help Workshop v 4.74.8702, Windows 7 64-bit, XMetaL Author v5.5, and DITA-OT v1.2.

  1. In engineering, it is occasionally necessary to print off some of the technical reference or methodology sections of design software documentation for clients.
  2. God only knows we’ll get calls about viruses on this one…
  3. Truly, this has the potential to wreak havoc should you have two or more .CHM files of the same name on your local drive. However, for the most part, it is completely invisible. I can assure you, I have oodles of copies of various .CHM files with the same name and I only recently learned about this Windows weirdness.
  4. MadCap Flare, for instance, has an option to correct the appearance of multi-page printing for .CHM files which I’m fairly certain does the same thing as I describe here.

Designing User-Focused Context Sensitive Help

This presentation by Matthew Ellison [Goog docs], given at last year’s Australasian Online Documentation and Content Conference (AODC 2009), makes some excellent points on how to craft online help for context-sensitive calls. This is something Bentley uses (a lot) and I’m trying to catch up on. I believe that even if you aren’t employing context-sensitive help, structuring your help as though you were is just as likely to get your users to their answers faster.

Also, the slide in this photo (from the same conference) made me laugh out loud (literally, not in a LOL sort of way).

Clarity Trumps Brevity

Dan Silverman doesn’t like his Avaya desktop phone[1] very much. He explains how its cryptic buttons don’t really provide enough information to make sense of their function. He also includes this gem on what happens when industrial design fails (which is almost always, to some extent):

Yes, in the case of electronic devices, the design should intuitively convey how it works without the need for a manual. But if the design is bad, a manual is the next best thing.

Writing the manual or the help should be integral to the process of design and not left until the end (or worse, after the product ships). Good manuals and help can indeed be the next best thing to an inspired design and make products far more usable.

  1. See how I invented a new phrase to describe an old thing based on the way we do things now?

Open Source Documentation

This is very humbling to me. Last week, at the DocTrain West conference, 25 writers produced a manual for Firefox in just two days as part of the FLOSS Manuals project. The manual is freely available online and is distributed under a Creative Commons CC-BY-SA license. You can purchase a print-on-demand copy of the manual from Lulu as well, which helps to support the FLOSS project. So a special thanks to all those folks who spent some time indoors (when they could have been enjoying Palm Springs) to help the open source community. I’ve already sent a link to the manual to my mom, who uses Firefox on her Mac!

Using Variables with Find/Replace in Flare

This one is a pretty simple trick and, to be honest, one that a lot of folks probably figured out sooner than I did. With just a little bit of work, you can easily replace oft-used words or phrases in your Flare project with a variable. This is especially useful if you find yourself writing early in the development process, when the terminology for features or a product interface is still subject to change.

Or, if you’re like me and you just don’t know what the hell such-and-such thing is called and the development team has yet to answer your e-mail asking because they’re too busy forwarding it to everyone else in the company who’ll get a good laugh out of the ridiculously silly question. Okay, that hasn’t actually happened (except for the part about me not knowing what something is actually called). At least not that I’m aware of.

So, here are the steps for finding all the instances of a term and replacing it with a variable:

  1. Create a Variable in the MyVariables set.

    Note: It’s good practice to use camel notation when naming your variable. Keep it short, but make it something you can easily identify (variables don’t have dental records and teeth and such in the event of a serious accident). And be consistent in how you name things!

  2. Launch the Find and Replace panel by selecting Edit > Find and Replace > Find and Replace from the menu bar, or just press Ctrl + F.
  3. Enter the text you wish to substitute with your new variable in the Find What field. Under the options section, select (whole project) for the Find In: field, Topics for File Types, and make sure the option for Find in source code is cleared (though we’ll use that option in a moment).
  4. Click the Start button to locate the first instance of the text.
  5. The searched-for text will be selected for you in the XML editor within a topic file. Click the topic’s tab along the editor window just to make that part of the program window active. Now, select Insert > Variable… from the menu bar to open the Variables dialog.
  6. Select the MyVariables set and then the variable you’ll be using to replace this particular text with. Click the OK button.
  7. Now, you need to get the actual markup for this variable. The fastest way I know to do so is to click the Locate in Content Explorer button in the Standard toolbar. Then, with the topic file now selected, right click and select Open With > Internal Text Editor. Now, hunt around until you locate the variable tag. It looks like this:

    <MadCap:variable name="MyVariables.SuchAndSuch" />

    Select and copy this entire tag.

    Note: You can also use the Send To menu button, also located on the standard toolbar. It’s the one that looks like an envelope and that you probably thought was just for e-mailing a file. However, it will actually open up the current file in an external program, including your handy text editor (I use TextPad).

  8. Back in the Find and Replace panel, paste the copied tag into the Replace with: field. This time, make sure that the Find in source code option is selected.
  9. Click the Start button again (you changed the options since you last did so). Use the Replace and Find Next buttons to swap out the text with the variable markup one by one.

A Note of Caution

You’re going to be replacing text in the source markup here, so be careful. I strongly urge you not to use the Replace In All Files button. It’s fast but it’s also risky. You’ll replace every instance of the text, anywhere it appears: keywords, etc. You might find yourself putting a variable tag where it really doesn’t belong. Fortunately, Flare will likely just give you a gentle scolding and ignore your silly little nonsense. But you might just find a loophole you wish you hadn’t. It’s best to do this one at a time, even if that takes a while.

Ideally, MadCap would add an option in the Replace With field to just select one of your variables from there. This way, you don’t have to Find/Replace in source code and run the risk of doing something unintended (hopefully they’d handle all that under the hood). But until then, only replace what you are sure is content material and not anything else.
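To illustrate the risk, here is a rough sketch (in Python, outside of Flare entirely) of what a content-only replacement might look like. The sample topic markup and the helper name are made up. The variable tag is the one from step 7. It splits the markup on tags with a simple pattern, which is only safe for well-formed markup with no stray ">" inside attribute values, so treat it as a demonstration and not a tool.

```python
import re

# The variable markup copied in step 7.
VARIABLE_TAG = '<MadCap:variable name="MyVariables.SuchAndSuch" />'


def replace_in_text_only(markup, term, replacement=VARIABLE_TAG):
    """Replace term in the text between tags, never inside a tag."""
    # The capturing group keeps the tags in the result, so the list
    # alternates: text, tag, text, tag, ... (text at the even indexes).
    parts = re.split(r'(<[^>]*>)', markup)
    for i in range(0, len(parts), 2):
        parts[i] = parts[i].replace(term, replacement)
    return "".join(parts)


# Hypothetical topic markup: the term appears in an attribute AND in content.
sample = '<p title="Such And Such">Open Such And Such to begin.</p>'
result = replace_in_text_only(sample, "Such And Such")
print(result)
```

The title attribute is left alone and only the paragraph content gets the variable tag, which is the distinction a blind Replace In All Files cannot make.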

Extend This Trick

Now, you can also use this little trick for finding and replacing other code, so you could add a particular style to any instance of a phrase. Ex: replace "OK button" with "<strong>OK</strong> button". I’ve yet to find a limit to the number of characters available in the Find and Replace field, but I suspect it’s probably around 256 or so. I don’t think you’ll be replacing A Tale of Two Cities with War and Peace using this.

Further, you can use regular expressions for, well, just about anything you can think of, I suppose. You can also use wildcards, which, though not as sexy as RegEx, are still quite useful for a plain text search. If you’re just looking for any instance of a noun, plural or singular, RegEx might be swatting flies with tanks.
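For comparison, here is roughly what that singular-or-plural search looks like as a regular expression. This is a sketch in Python's regex syntax with a made-up noun ("widget") and sample sentence. Flare's own dialect may differ in the details, so check before pasting a pattern into the Find What field.

```python
import re

# Hypothetical sample text; "widgetry" should NOT count as a match.
text = "Select a widget. Widgets can be grouped. The widgetry is unaffected."

# \b marks a word boundary; "s?" makes the plural ending optional.
pattern = re.compile(r"\bwidgets?\b", re.IGNORECASE)
matches = pattern.findall(text)

print(matches)
```

The word boundaries are what keep "widgetry" out of the results, something a simple trailing wildcard would not do.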