Needle in a Haystack

I just finished what was one of the longest days of my career so far at Bentley. And everything that was bad about today was entirely my own damn fault and could have been easily avoided if I’d just been a bit more careful.

In addition to pushing some publishing updates to our documentation CCMS last night, I also decided to roll out my new Troubleshooting DITA specialization. It’s based on the specialization that is expected to ship with the DITA 1.3 specification sometime next year, but uses our specialized domains and works with DITA 1.2. That’s mostly tech comm nerd talk for “I decided to give the writers a new template geared toward writing troubleshooting tips.”

Unfortunately, even after thoroughly testing it on our development server, I managed to mess things up by adding a comment to a couple of DTD catalog files after all of my testing but before making a backup of the production environment. That is, I didn’t really have a backup of the functioning production server. Rather, I had a copy of some files I had just made a minor edit to, one of which included a critical error. An error I ended up spending all day today trying to locate and correct.

Eventually, I realized I had left a single “>” character in an XML comment copied over to a text catalog file (the text catalogs aren’t XML and have the angle brackets stripped out, something I will now do with XSL instead of manually!). This one particular catalog file is used to locate the DTDs for our desktop DITA editor, and no one was able to check out or create new content in our CCMS as a result of that one errant character. It took me about ten hours to figure this out (well, maybe nine hours of panic attack and one hour of actual clear-headed work). That left half a dozen writers with no good way to edit some of their files today, and left me feeling like a jerk for not being more cautious.

I just wrapped up my fixes, both tested thoroughly in their final form and put into place after backups of the working production environment were made. I’m going to check in with my colleagues in India shortly to ensure they can now edit and create content once again.

Lesson learned, and I am humbled.

Make a Folder Manifest for XML Files

One task that has come up quite a lot as I’m working with a lot of XML files (mostly DITA content) is creating a list of all the XML files within a folder. More often than not, I want this list to be an XML file, too. There are really no folder- (or even file-) level operations in XSLT to do this. It’s simply not what that language is used for. To do this, I had to create a simple script. Scripts like this are very easy to integrate into the DITA-OT (though that is not where I use this particular script).

If you’re a web developer, there are probably many better ways to go about doing this than using a Windows Batch file. You probably already know many of them. This isn’t intended to be used in a web data scenario, but more for local XML data management tasks.

The Windows Batch File

I personally really like the Windows batch file command language. It’s pretty simple, even though it does lack a lot of nice features[1]. When you want to do folder or file operations in Windows, I think it’s the easiest thing to use even when you’re a really poor programmer like I am.

This batch file writes three pieces of information to an external XML file:

  1. It writes a root node to an XML file. It also adds the folder path into an attribute of the root node, which can be useful for post-processing.
  2. For every XML file in the folder, it adds a child node after the root node’s open tag. These child nodes will contain a link to these XML files in the folder.
  3. It writes a close tag for the root node.

I refer to this new XML file as a manifest, as it lists all of the contents (well, XML files in this case, anyway) in the folder. Once an XML file is created with this information, XSLT can then be used to read or change the information in those files by running against this manifest file.

So, MakeManifest.bat looks like this:

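@ECHO OFF
REM Name of the manifest file to write
SET output=manifest.xml
REM Root node; %~dp0 expands to the folder containing this batch file
ECHO ^<manifest sourcepath="%~dp0"^> > %output%
REM One child node per XML file in the current folder
FOR %%f in ("*.xml") DO (
    ECHO      ^<file href="%%~nf.xml"/^> >> %output%
)
ECHO ^</manifest^> >> %output%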

Copy those lines into a plain text editor, save the file with the extension .bat, and give it a try! That’s all there is to it. If none of that makes any sense to you, I’ll refer you to SS64’s CMD reference page.

It is worth noting (and the sharp reader might have figured this out already) that this list will include a reference to itself, the manifest being another XML file in the folder. You could simply change the output file extension to something else (.txt, .manifest, etc.), which is a good reason I put in a variable to make that easy to do. It doesn’t affect what’s in the file.
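For comparison, here is a minimal Python sketch of the same manifest idea. It is not the batch file above, just a rough equivalent: the function name make_manifest is my own invention, and unlike the batch version it skips the manifest file itself so the list never references its own name.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def make_manifest(folder, output_name="manifest.xml"):
    """Write an XML manifest listing every .xml file in `folder`.

    `make_manifest` and `output_name` are illustrative names, not part
    of the original batch file.
    """
    folder = Path(folder).resolve()
    # The root node carries the folder path, like sourcepath="%~dp0".
    root = ET.Element("manifest", sourcepath=str(folder))
    for xml_file in sorted(folder.glob("*.xml")):
        # Skip the manifest itself so it never lists its own name.
        if xml_file.name == output_name:
            continue
        ET.SubElement(root, "file", href=xml_file.name)
    ET.indent(root)  # pretty-print the tree; needs Python 3.9 or later
    out_path = folder / output_name
    ET.ElementTree(root).write(out_path, encoding="utf-8", xml_declaration=True)
    return out_path
```

Calling make_manifest(".") from the folder that holds the topics writes manifest.xml next to them and returns its path.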

Post-Processing the Manifest File

In my case, these XML files tend to be DITA topics. What I’m really after here is to create a DITA map. With a little XSLT file to process this manifest (which can be run from the same Windows batch file), it’s easy to create a DITA map for all of the DITA topics the script finds in the folder.
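To make the idea concrete, here is a rough Python sketch of the manifest-to-map step. It is not the XSLT approach described here, only an illustration of the transformation; the function name manifest_to_ditamap and the default map title are assumptions of mine, and it copies every href as-is (including the manifest’s own entry, if one is present).

```python
import xml.etree.ElementTree as ET

# Public identifier for a standard (unspecialized) DITA map.
MAP_DOCTYPE = '<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">'


def manifest_to_ditamap(manifest_xml, title="Folder Contents"):
    """Return DITA map markup with one <topicref> per <file> in the manifest.

    `manifest_to_ditamap` and the default title are hypothetical names
    chosen for this sketch.
    """
    manifest = ET.fromstring(manifest_xml)
    ditamap = ET.Element("map")
    ET.SubElement(ditamap, "title").text = title
    for entry in manifest.iter("file"):
        href = entry.get("href")
        if href:
            ET.SubElement(ditamap, "topicref", href=href)
    ET.indent(ditamap)  # needs Python 3.9 or later
    body = ET.tostring(ditamap, encoding="unicode")
    return f'<?xml version="1.0" encoding="UTF-8"?>\n{MAP_DOCTYPE}\n{body}\n'
```

The returned string can be saved with a .ditamap extension alongside the topics.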

Now, to do this, I use Saxon9HE, which is the open source version of Saxonica’s (Michael Kay’s) XSLT processor. It’s easy to use, very fast, supports the latest versions of everything, and free.

I’ll follow up this post with another soon about how to do just that. I wanted to post this step first so as to not overwhelm someone who is learning (nor give me an excuse to put off posting anything).

  1. Most notable, to me, is the lack of regular expressions. However, the RxFind utility is a great way to add regular expression search and replace functionality to your Windows batch files and I use it a lot.

Batch File Output in MadCap Flare

I have a couple of products which I document using MadCap Flare to generate about two dozen help files and another half-dozen PDFs. These outputs are spread across multiple Flare projects which I inherited. Producing a full set of output for a release can prove to be nearly a full day’s worth of effort, so I finally got around to creating a single Windows batch file to make use of the command line interface for Flare. Flare has had the command line feature for a few years now, but regrettably, I just never took the time to learn it. It’s actually very simple to implement, even if you’re not that familiar with writing batch files or the idea of the command line scares you off a bit.

Tools Used

First, I should point out that to further streamline my work, I’ve implemented a couple of other tools besides just Flare. These are all free, open-source tools which I highly recommend having in your tech-writer toolkit[1].

  • 7-Zip – The best compression utility out there. Its command line interface makes it easy to wrap a lot of files into a compressed archive (in a variety of formats, including .zip).
  • NcFTP – A very easy-to-use FTP client which has some command line utilities capable of transfer in passive mode (required for our FTP behind a firewall).
  • Notepad++ – A great text editor which has syntax highlighting for batch files.

And of course Flare. However, you could also easily integrate much of the same workflow into using the DITA Open Toolkit as well as any other help authoring tool with a command line interface.

Set Up

I prefer to use dates in my archive file names just to make it clear to the teams downloading them what ‘version’ each one is. Sure, we could just check timestamps, but this makes it more obvious. I use the international date format (YYYY-MM-DD) as the prefix for my titles, and I wanted this automated in my batch file. However, as the region on my Windows machine is US, I first need to change the short date format in the Control Panel to this format. That way, I can use the %date% environment variable to always input the current date when the archive is created.

Aside from that, installing the above tools is all that is required.

Creating the Batch File

Notepad++ can be used to create and edit the Batch file. Simply create a new document and save it (somewhere convenient) with the .bat file extension. This also indicates the file type to Notepad++ so the syntax is highlighted appropriately (simply makes editing easier).

I want to place my outputs in a Zip archive for the convenience of labeling them all with the current date and placing them on an FTP server for other teams to download. So I set a variable to include the current date:

set ZipOut=C:\Documentation\Output\
echo %ZipOut%

(The second line just outputs the same back to me so I can verify the date string was as intended)
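If changing the Windows short date format isn’t an option, the date prefix can also be built explicitly. The following is a minimal Python sketch of that idea rather than part of my batch file; the function name dated_zip_path and the base name Product_Help are hypothetical and only there for illustration.

```python
from datetime import date
from pathlib import PureWindowsPath


def dated_zip_path(output_dir, base_name, today=None):
    """Return a Windows path such as <output_dir>\\YYYY-MM-DD_<base_name>.zip.

    `dated_zip_path` and `base_name` are illustrative; the real archive
    name is whatever your team expects.
    """
    today = today or date.today()
    # date.isoformat() always yields YYYY-MM-DD, whatever the regional settings.
    return PureWindowsPath(output_dir) / f"{today.isoformat()}_{base_name}.zip"
```

For example, dated_zip_path(r"C:\Documentation\Output", "Product_Help") gives a path prefixed with today’s date without touching the Control Panel.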

Next, I change the directory to the MadCap Flare installation:

cd\Program Files (x86)\MadCap Software\MadCap Flare V8\

Then I can use the command line entry — madbuild — to initiate builds of any number of Flare projects and targets (which are individual outputs from a single-source Flare project).

madbuild -project "C:\Documentation\Product\ProductHelp_A\Product_A.flprj" -log true -target "Product_A HTML Help"
madbuild -project "C:\Documentation\Product\ProductHelp_B\Product_B.flprj" -log true -target "Product_B HTML Help"
madbuild -project "C:\Documentation\Product\ProductHelp_C\Product_C.flprj" -log true -target "Product_C HTML Help"
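The same sequence of builds can be driven from a loop instead of repeated lines. Below is a Python sketch under a few assumptions: the executable is named madbuild.exe and lives in the folder you pass as flare_dir, and only the three switches shown above (-project, -log, -target) are used. The helper names are mine.

```python
import subprocess
from pathlib import PureWindowsPath


def madbuild_command(flare_dir, project, target):
    """Build the argument list for one madbuild run.

    Assumes madbuild.exe sits directly in `flare_dir`; adjust to match
    your Flare installation.
    """
    exe = str(PureWindowsPath(flare_dir) / "madbuild.exe")
    return [exe, "-project", project, "-log", "true", "-target", target]


def build_all(jobs, flare_dir, run=subprocess.run):
    """Run madbuild once per (project, target) pair, stopping on the first failure."""
    for project, target in jobs:
        # check=True raises CalledProcessError if madbuild exits with an error code.
        run(madbuild_command(flare_dir, project, target), check=True)
```

Passing a list of (project, target) pairs keeps the project paths in one place, which is handy when a release adds or drops a help file.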

Next, I want these three compiled HTML Help files to get placed into the ZIP file I named in my variable. This uses the command line interface for 7-Zip:

cd\Program Files\7-Zip
7z a -tzip %ZipOut% @C:\Documentation\Output\Product_file_list.txt

Where Product_file_list.txt is just a plain text file containing the absolute file path and file name of each of the compiled HTML Help files. It’s described in detail in the 7-Zip help, but essentially the entire file path for each file to be included is on a line in the text file. No special syntax or separators required.
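For anyone without 7-Zip on hand, the list-file approach can be approximated with Python’s standard zipfile module. This is a swapped-in technique, not the 7-Zip command above, and zip_from_list is a hypothetical name. It reads one file path per line and stores each file under its bare file name.

```python
import zipfile
from pathlib import Path


def zip_from_list(list_file, zip_path):
    """Create a ZIP archive from a text file with one file path per line.

    A stand-in for the 7-Zip list-file call; blank lines are ignored.
    """
    lines = Path(list_file).read_text(encoding="utf-8").splitlines()
    files = [Path(line.strip()) for line in lines if line.strip()]
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for source in files:
            # Store only the file name so the archive has no folder structure.
            archive.write(source, arcname=source.name)
    return [source.name for source in files]
```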

Lastly, I want to transfer the ZIP file over FTP to a convenient place for the rest of the team. The default Windows FTP program cannot run in passive mode, which is required to navigate a firewall. However, the Linux FTP client NcFTP has been ported to Windows and has a command line interface which is more flexible.

ncftpput -F -u username -p password /Product/ %ZipOut%
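As an alternative sketch, Python’s standard ftplib can also do a passive-mode upload. This is not NcFTP, just the same step expressed another way; upload_passive is a hypothetical helper, and the host, user name, and password in the usage note are placeholders you would supply.

```python
from ftplib import FTP
from pathlib import Path


def upload_passive(ftp, local_file, remote_dir):
    """Upload one file in passive mode over an already logged-in FTP connection."""
    local_file = Path(local_file)
    ftp.set_pasv(True)  # passive mode, needed behind most firewalls
    ftp.cwd(remote_dir)
    with local_file.open("rb") as handle:
        ftp.storbinary(f"STOR {local_file.name}", handle)
```

A typical call would open the connection with FTP(host), log in with ftp.login(user, password), and then pass that object to upload_passive along with the ZIP path and the remote folder.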

Running the Batch File

Just save the file in your text editor. All that is needed to run it is to simply double-click the .bat file in Windows Explorer. The command line window will open, execute each line in order, and close upon completion.

It would be easy to also use Windows to schedule running the same thing nightly or weekly if you need to regularly post updates of your work.

  1. There are OS X and Linux equivalents to these, but not to Flare, which is why I’ve limited this to Windows.

Think Inside the Box

I saw this video today demoing a very interesting user manual concept. Essentially, the manual wraps around a device with cues to manipulate the actual device, rather than some screenshots or photos. Basically the manual is more of a physical template (or jig, since I’m using template in the craftsman sense).

Out of the box from Vitamins on Vimeo.

However, I can’t think of a worse device to apply this idea to than a touchscreen smartphone.

Let me explain: I’ve been using an Apple iPhone for about the past four years now[1]. As much as I initially opposed the idea, Apple was correct in taking things like the SIM card and phone battery out of the hands of the user[2]. It’s a far superior user experience to design those out of the experience altogether, in my opinion. That being said, if you’re going to force your user into awkward set-up necessities, this is about as painless a way to do it as possible. I can imagine some layered gadget packaging where, with each section the user opens, they are presented with the next step in setup or assembly (would work great for Ikea products, too!).

Now, as for instructing the user how to do anything on the phone: with a generously sized touch screen, there’s simply no reason why all of these instructions can’t just present themselves on the screen. My favorite apps on the iPhone are those where the instructions appear as modal dialogs pointing to the most-used features. Additional help can be included, too, but the top two or three tools are called out as soon as the app launches, making any user almost instantly proficient.

So, as much as I like this concept, I’d much rather see all of this inside the box—er, phone—than in some bulky, physical thing that isn’t going to be with you at all times.

In short: I think the manual for a smartphone should simply be one short sentence: Push the power button.

Via Johne Cook, by way of Bill Swallow & Ray Gallon

  1. Yes, this is the part where I start coming off as an Apple fan boy, but bear with me… it applies to any smartphone or other touch-screen device.
  2. Sure, you can still get to the SIM card on an iPhone, but compared to any other phone, it holds virtually no data beyond the user’s account credentials or phone number.

A DITA & DITA Open Toolkit Reading List

I was in the process of reorganizing my computer science and technical writing shelf today during lunch when I began to notice a pattern: I have quite a few books related to DITA and the underlying technologies of the DITA Open Toolkit. Well, this isn’t a coincidence. It’s a big part of my job and something I’m really interested in. But it occurred to me just how much time I’ve spent poring over these texts on structured authoring and XML-based technology, all in hopes of grokking this for my job.

Some Light Reading on DITA

So, in no particular order, here’s a list of some of my books on the subject:



A couple of books on Ant & JavaScript that I haven’t even gotten to yet:

And, some wider shots of my (sort of) organized bookshelves:

Non-Fiction Bookshelves

Office Shelves

  1. I have the first edition. I’d recommend getting the later edition.

Regular Expressions versus XSLT

Last week I came across an epic rant within a forum thread[1] about why using regular expressions for parsing XML is a bad idea.

The <center> cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty.

At first, I was a little surprised. I love using regular expressions to make bulk changes throughout an XHTML document or even across a project consisting of hundreds of files. But, after reading through the post several times and thinking about what I’ve been able to accomplish with some (relatively) simple XSLT files and an XML parser, it occurred to me that it is absolutely correct.

You see, as great as regular expressions are, they are not aware of the context. They have no idea if you’re matching a pattern within a C++ routine or an XHTML file. They can only parse characters and short strings as they are, with no understanding of their meaning.

Extensible Stylesheet Language Transformations (XSLT), on the other hand, exist solely for the purpose of manipulating XML content. By definition, they are aware of XML elements and their attributes. Their entire purpose is high-level modification. In fact, after having used them now to successfully convert some XHTML to DITA XML, I have to say the powers feel almost god-like.

RegEx still have their use with XML—particularly with badly formed SGML/HTML one might have had dumped in their lap. But if the need is actually manipulating XML elements or attributes within a file (or even across files), then it’s really foolish to try to accomplish something with multiple regular expressions when a single XSL template will do (and often without the unintended consequences of a greedy RegEx).
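Here is a small Python sketch of what “not aware of the context” means in practice. The sample markup and function names are made up for illustration. The regular expression is fooled by markup quoted inside a CDATA section and misses an element whose attributes appear in a different order, while an XML parser gets both cases right.

```python
import re
import xml.etree.ElementTree as ET

# Made-up sample: one real note, one note quoted as literal text inside
# a CDATA section, and one note with an extra attribute in front.
SAMPLE = """<body>
  <p class="note">Real note</p>
  <pre><![CDATA[<p class="note">Just sample markup</p>]]></pre>
  <p id="x" class="note">Attribute order differs</p>
</body>"""


def notes_by_regex(text):
    """Find note paragraphs by matching characters only."""
    return re.findall(r'<p class="note">(.*?)</p>', text)


def notes_by_parser(text):
    """Find note paragraphs by walking the parsed element tree."""
    root = ET.fromstring(text)
    return [p.text for p in root.iter("p") if p.get("class") == "note"]
```

With this sample, notes_by_regex returns the real note plus the CDATA text, and notes_by_parser returns the two actual note elements.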

  1. And when I say epic, I mean it goes from making a case as to why RegEx is simply not high-level enough to deal with HTML parsing to opening the gates of the abyss and letting the deep ones into your mind.

Hacking the DITA-OT to Print Multiple Pages from a HTML Help File

Here’s one that took a little poking around to figure out. One of the many (nay, countless) drawbacks of using HTML Help (.CHM files) is that printing from them is awful. Ideally, a user could print from the Help Viewer to get a hard copy (or at least a .PDF copy) of the manual. This would reduce the burden on writers like myself of continuing to produce .PDF deliverables, and keep all of the help in one place for the user, allowing them to print what they need[1].

The Problem

So, as a writer, I’d like to be able to style differently for the screen and for print. There are a whole host of reasons why (different readability issues, scale issues, etc.), but as often as not it is because web browsers just don’t print everything they render for the screen. Background colors or images, for instance, don’t get sent to the printer. This can quickly go from a style issue to a readability issue.

While CSS gives the writer a fair bit of control over the display medium, good ol’ HTML Help is right there to block you. It all works just fine when you print only the current topic. However, HTML Help offers this wonderful little feature which allows you to print the current topic and all child topics (for example, the Chapter heading and all contents of that chapter). Sounds great, right? Except that the way it is implemented breaks all links/references between files.

Print Topic dialog in HTML Help Viewer

The Print Topic dialog: The cause of all this trouble.

That’s right. Hyperlinks? Broken. JavaScript? Not only broken, but will now present big, scary error warnings to your user[2]! And CSS? Completely busted.

You see, when you select this option in HTML Help, Windows copies all of your files (conveniently renaming them, thus breaking links between your topics) into some temporary folder and then concatenates them into one long HTML file, which it then prints just as it would have the single topic file (minus all of the CSS, scripting, and other things I, as the writer, spent weeks on).

The Solution

Fortunately, we can use one of Windows’ other oddities to combat this one. That is, the very strange behavior of .CHM files in the file system. For some insanely odd reason which I cannot fathom, Windows simply doesn’t care about the folder where a .CHM file was placed. Upon opening it, it sends it to some other place where directories and folders don’t exist, and you only need to call for the name of the file and it will locate it no matter where it is on your machine.[3]

So, where we would have put a relative file path within the .CHM file’s internal folders, we will use the MS-ITS syntax call to bring it forth! There is no place safe for these temporary print files that Windows creates due to its own absurd behavior.

Now, obviously, this all applies to any HTML Help file, not just one created using the DITA Open Toolkit. However, I’ll show you some of the extra know-how it takes to pull this off from within your authoring tools if you’re using DITA. Some other help authoring tools have options to correct for this[4].

The Hack

  1. Create a new style sheet for use with print display or simply use an @media print { } block to add print-related styles to an existing style sheet. You can reference the same style sheet multiple times in the same HTML file with (apparently, in HTML Help Viewer, at least) no ill effects.
  2. Locate the XSL file responsible for adding the <head> contents into your HTML files: dita2htmlImpl.xsl

    For example, for XMetaL, this file is located in C:\Users\<>\AppData\Roaming\SoftQuad\XMetaL Shared\DITA_OT\xsl\xslhtml\, where <> is your Windows user name. On older versions of Windows, AppData is Application Data and this is usually a hidden system folder.

  3. Near the end of this long XSL file, you’ll find a number of <xsl:template>s, one of which is used to generate links to CSS files.

    Hint: Just do a search for the string "text/css".

  4. Go to the end of this Template (the </xsl:template> line) and add the following link:

    <link rel="stylesheet" type="text/css" href="MS-ITS:<your.filename>.chm::/<file.path>/<>.css" media="print" />

    Where: <your.filename>.chm is the name of the HTML Help file you’re generating. Note that you’ll need to update this file for any different output filenames you generate, unless you want Windows opening up some random .CHM file every time the user clicks Print.

    <file.path>/<>.css is the relative file path and stylesheet name inside the HTML Help file. If you really just aren’t sure, grab a copy of 7-Zip and use it to peek inside your .CHM (it can read them just like a .ZIP file; awesome thing to have in your toolkit).
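Since the link has to change for every output file name, it can help to generate it rather than edit it by hand. Here is a tiny Python sketch, assuming the file.chm::/path form of the MS-ITS syntax; print_css_link and the example names in the usage note are hypothetical.

```python
from xml.sax.saxutils import quoteattr


def print_css_link(chm_name, css_path):
    """Return a print-media <link> element pointing inside a compiled .CHM.

    `chm_name` is the output file name (for example Product_A.chm) and
    `css_path` is the stylesheet path inside the .CHM.
    """
    href = f"MS-ITS:{chm_name}::/{css_path.lstrip('/')}"
    # quoteattr adds the surrounding quotes and escapes any special characters.
    return f'<link rel="stylesheet" type="text/css" href={quoteattr(href)} media="print" />'
```

For example, print_css_link("Product_A.chm", "css/print.css") produces the line to paste into the template for that particular help file.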

I’m fairly certain this concept can be applied to the issue of <script> elements as well (though I haven’t gotten it to work thus far). However, it will never fix the issue of hyperlinks between topics, as this system of concatenating files into a temporary file irrevocably breaks those links. There’s no one-time “read this file back from the source .CHM” trick for that issue.

A huge credit goes to Yuko Ishida, who sent over the key to this. I should also point out that this hack was tested in HTML Help Workshop v 4.74.8702, Windows 7 64-bit, XMetaL Author v5.5, and DITA-OT v1.2.

  1. In engineering, it is occasionally necessary to print off some of the technical reference or methodology sections of design software documentation for clients.
  2. God only knows we’ll get calls about viruses on this one…
  3. Truly, this has the potential to wreak havoc should you have two or more .CHM files of the same name on your local drive. However, for the most part, it is completely invisible. I can assure you, I have oodles of copies of various .CHM files with the same name and I only recently learned about this Windows weirdness.
  4. MadCap Flare, for instance, has an option to correct the appearance of multi-page printing for .CHM files which I’m fairly certain does the same thing as I describe here.

Designing User-Focused Context Sensitive Help

This presentation by Matthew Ellison [Goog docs] given at last year’s Australasian Online Documentation and Content Conference (AODC 2009) has some excellent points on how to craft online help for context sensitive calls. This is something Bentley uses (a lot) and I’m trying to catch up on. There are really a lot of excellent points in these slides. I believe that even if you aren’t employing context-sensitive help, structuring your help as though you were is just as likely to get your users to their answers faster.

Also, the slide in this photo (from the same conference) made me laugh out loud (literally, not in a LOL sort of way).

Clarity Trumps Brevity

Dan Silverman doesn’t like his Avaya desktop phone[1] very much. He explains how its cryptic buttons don’t really provide enough information to make sense of their function. He also includes this gem on what happens when industrial design fails (which is almost always, to some extent):

Yes, in the case of electronic devices, the design should intuitively convey how it works without the need for a manual. But if the design is bad, a manual is the next best thing.

Writing the manual or the help should be integral to the process of design and not left until the end (or worse, after the product ships). Good manuals and help can indeed be the next best thing to an inspired design and make products far more usable.

  1. See how I invented a new phrase to describe an old thing based on the way we do things now?

Open Source Documentation

This is very humbling to me. Last week, at the DocTrain West conference, 25 writers produced a manual for Firefox in just two days as part of the FLOSS Manuals project. The manual is freely available online and is distributed under a Creative Commons CC-BY-SA license. You can purchase a print-on-demand copy of the manual from Lulu as well, which helps to support the FLOSS project. So a special thanks to all those folks who spent some time indoors (when they could have been enjoying Palm Springs) to help the open source community. I’ve already sent a link to the manual to my mom, who uses Firefox on her Mac!