Building a Better CHM

June 10, 2011

Michael Snoyman

The Problem

CHMs (Microsoft Compiled HTML Help files) are a commonly used help format in the Windows world. They are based on HTML, so the core content is portable and easy to write. But they have a number of problems:

  • They're Windows-only. Some of us prefer Linux and Mac.
  • You can't easily take your CHM content and deploy it on a web server.
  • They have other problems as well, particularly with regard to internationalization (i18n).

Nonetheless, CHMs provide some features we don't really have in any other system: single-file deployment with search and an index (a.k.a. tri-pane). Other systems, such as Eclipse Help, try to do the same thing, but they impose requirements on the client side. Another possibility is simply giving users the raw HTML, but that means sending multiple files, and it precludes the use of Ajax and other features that need a server.

The Solution

Our solution is a single executable that embeds a web server (Warp) along with the static files, and launches a browser to view them. This is a natural extension of the general Yesod approach, where Hamlet and its sibling template languages are compiled directly into our executables. The approach works fairly well, but it required some extra work.


My original idea was to embed WebKit directly in the executable as well. That would have been a very cool solution: you would never have to worry about Internet Explorer again. However, distributing WebKit (or QtWebKit) was simply too heavy a dependency, and it made compiling a nightmare. Paulo Tanimoto and I also took a crack at using MSHTML on Windows, but our win32-fu wasn't up to the task.

So instead, we have wai-handler-launch. This package lets you run any WAI application using Warp, and it automatically launches the default web browser. On Windows it uses the ShellExecute API call, on Mac it calls out to the open program, and on Linux it uses xdg-open. (Note that relying on xdg-open may cause trouble in some desktop environments.)
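The platform dispatch described above can be sketched roughly as follows. This is not wai-handler-launch's actual code: the names browserCommand and launchBrowser are hypothetical, and on Windows the sketch shells out to cmd's built-in start command as a stand-in for the ShellExecute call the package really uses. It relies only on the process package and System.Info.os, which reports "mingw32" on Windows and "darwin" on Mac.

```haskell
import System.Exit (ExitCode)
import System.Info (os)
import System.Process (rawSystem)

-- | Hypothetical helper: the command and arguments used to open a URL
-- on a given platform, as reported by System.Info.os.
browserCommand :: String -> String -> (FilePath, [String])
browserCommand platform url = case platform of
  -- Stand-in for ShellExecute: cmd's built-in "start" command.
  -- The empty string is the window title that "start" expects first.
  "mingw32" -> ("cmd", ["/c", "start", "", url])
  "darwin"  -> ("open", [url])
  _         -> ("xdg-open", [url])

-- | Open the URL in the default browser of the current platform.
launchBrowser :: String -> IO ExitCode
launchBrowser url = uncurry rawSystem (browserCommand os url)
```

Calling launchBrowser "http://localhost:3000/" would then open that page, assuming a server is already listening on that (hypothetical) port. Keeping the command selection pure makes the per-platform choice easy to check without actually spawning a browser.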

The other interesting feature of this package is automatic shutdown: wai-handler-launch inserts a piece of JavaScript into every HTML page it serves, and that script pings the server every minute. If the server goes two minutes without a ping, it shuts down.
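A minimal sketch of the server-side half of that shutdown logic follows, assuming a server action that receives the last-ping timestamp so that its (hypothetical) ping handler can call recordPing. The names withWatchdog, recordPing and shouldShutdown are illustrative, not wai-handler-launch's API, and the ten-second check interval is an arbitrary choice.

```haskell
import Control.Concurrent (forkIO, killThread, threadDelay)
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Data.Time.Clock (NominalDiffTime, UTCTime, addUTCTime, getCurrentTime)

-- | True once more than `timeout` has passed since the last ping.
shouldShutdown :: NominalDiffTime -> UTCTime -> UTCTime -> Bool
shouldShutdown timeout lastPing now = now > addUTCTime timeout lastPing

-- | Hypothetical: the handler for the ping URL would call this.
recordPing :: IORef UTCTime -> IO ()
recordPing ref = getCurrentTime >>= writeIORef ref

-- | Run `server` in a background thread (it receives the timestamp
-- reference so its ping handler can call 'recordPing'), check the
-- timestamp every ten seconds, and kill the server and return once
-- no ping has arrived for `timeout`.
withWatchdog :: NominalDiffTime -> (IORef UTCTime -> IO ()) -> IO ()
withWatchdog timeout server = do
  lastPingRef <- newIORef =<< getCurrentTime
  serverThread <- forkIO (server lastPingRef)
  let loop = do
        threadDelay (10 * 1000000) -- ten seconds, in microseconds
        lastPing <- readIORef lastPingRef
        now <- getCurrentTime
        if shouldShutdown timeout lastPing now
          then killThread serverThread
          else loop
  loop
```

With the two-minute rule from the text, the wrapper would be used as withWatchdog 120 server, where server is a placeholder for whatever actually runs Warp. NominalDiffTime literals are in seconds, so 120 means two minutes.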

Template Haskell

The next trick is embedding static files into an executable. For this, we can use Template Haskell (or, more specifically, the file-embed package). A Template Haskell splice runs in the Q monad, which can perform arbitrary IO actions via qRunIO, and the StringL literal constructor lets us embed arbitrary text in the generated code. Therefore, with just a little work, we can embed a whole file at compile time:

{-# LANGUAGE TemplateHaskell #-}
import Language.Haskell.TH.Syntax
embeddedFile :: String
embeddedFile = $(fmap (LitE . StringL) $ qRunIO $ readFile "myfile.txt")

One downside of this approach (well, there are a few, but this is the one I want to mention now) is that it isn't very good for binary data. Sure, we can use Data.ByteString.Char8.pack, but we'll soon see why that's not exactly what we want. GHC also has trouble dealing with very large string literals. So instead of StringL, we'll use StringPrimL, which produces a primitive Addr# literal instead of a String. Combined with Data.ByteString.Unsafe.unsafePackAddressLen, we're in good shape.

Well, good shape except for the fact that it doesn't work for non-ASCII data. It turns out that GHC encodes the contents of both StringL and StringPrimL literals, but only decodes StringL results automatically; with StringPrimL, we would need to decode the data ourselves. You can see this with a small sample program:

{-# LANGUAGE TemplateHaskell #-}
import Data.ByteString.Unsafe (unsafePackAddressLen)
import Language.Haskell.TH.Syntax
import qualified Data.ByteString as S

main = do
    fromAddr <- unsafePackAddressLen 7 $(return $ LitE $ StringPrimL "123\0\&456")
    print fromAddr
    let fromStr = S.pack $ map (toEnum . fromEnum) $(return $ LitE $ StringL "123\0\&456")
    print fromStr

But considering the next stunt we're about to pull, that's not really an issue.
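In newer template-haskell releases, StringPrimL takes a list of Word8 rather than a String, so the bytes go into the literal without any character encoding. Under that assumption, the StringPrimL plus unsafePackAddressLen combination described above might be packaged as the following helper, which is roughly the job file-embed does internally. The name bytesExp is hypothetical.

```haskell
{-# LANGUAGE TemplateHaskell #-}
import qualified Data.ByteString as S
import Data.ByteString.Unsafe (unsafePackAddressLen)
import Language.Haskell.TH
import System.IO.Unsafe (unsafePerformIO)

-- | Hypothetical helper: an expression that evaluates to exactly these
-- bytes, stored in the binary as a primitive string literal (Addr#).
-- Because of the Template Haskell stage restriction, it must be
-- spliced from a different module than the one that defines it.
bytesExp :: S.ByteString -> Q Exp
bytesExp bs =
  [| unsafePerformIO
       (unsafePackAddressLen len $(litE (stringPrimL (S.unpack bs)))) |]
  where
    -- Lifted into the generated code as an Int literal.
    len = S.length bs
```

A module that imports this one could then write $(runIO (S.readFile "myfile.txt") >>= bytesExp) to embed a file as a strict ByteString, with no String round trip and no decoding at runtime.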

Modifying an Executable

Here's the last curveball in this project. We're going to need to generate these all-in-one executables on a number of different systems, many of which won't have Haskell compilers set up. Also, it would be very convenient to be able to produce executables for Windows, Linux and Mac on a single system, instead of needing a compiler farm for each new web help we want to deploy.

Yitz Gale (my coworker at Suite Solutions) came up with an idea: let's embed some dummy content with a recognizable pattern in the executable. Then a second program comes along and swaps out the dummy data for the real content. The downside is that we need to fix the maximum size of the web help in advance. But we can mitigate this by generating executables of various sizes, and then using the smallest one that will hold our data.
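The swap itself is simple byte surgery. Here is a sketch under a few stated assumptions: the dummy region begins with a hypothetical marker string, the injector knows the region's size, and the available templates come as hypothetical (capacity, path) pairs. file-embed's dummySpace and injectFile use their own layout, which this sketch does not reproduce.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import Data.List (sortOn)

-- | Hypothetical marker; the dummy region is assumed to start with it.
marker :: S.ByteString
marker = S8.pack "##DUMMY-SPACE##"

-- | Overwrite the `size`-byte region that starts at the marker with
-- `payload`, zero-padded so the executable keeps its exact length.
inject :: Int -> S.ByteString -> S.ByteString -> Either String S.ByteString
inject size payload exe
  | S.length payload > size = Left "payload too large for this template"
  | S.null rest             = Left "marker not found"
  | S.length rest < size    = Left "dummy region is truncated"
  | otherwise               = Right (S.concat [before, padded, S.drop size rest])
  where
    (before, rest) = S.breakSubstring marker exe
    padded = payload `S.append` S.replicate (size - S.length payload) 0

-- | Pick the smallest template whose dummy region can hold `needed` bytes.
pickTemplate :: Int -> [(Int, FilePath)] -> Maybe FilePath
pickTemplate needed templates =
  case sortOn fst [t | t@(capacity, _) <- templates, capacity >= needed] of
    []            -> Nothing
    (_, path) : _ -> Just path
```

Zero-padding keeps every offset in the executable unchanged, which is why the payload must fit the pre-sized region. It also means the pickled data itself has to tell the reader where it ends, since the padding is otherwise indistinguishable from content.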

And this is why we need to go with StringPrimL. Besides the much quicker compile time, if we used StringL, the generated code would decode the data at runtime, mangling whatever binary content the injector had put there. With StringPrimL, we can embed any binary content we want in our executable: GHC gives us an Addr# pointing to the raw bytes, and we can access them directly. Apply some binary pickling scheme, such as the cereal package, and we're in business.
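For the pickling step, a round trip through cereal's encode and decode might look like this, assuming the folder is represented as a list of path and contents pairs (the shape returned by file-embed's getDir, which the injector uses). The Folder, pickle and unpickle names are illustrative only.

```haskell
import qualified Data.ByteString.Char8 as S8
import Data.Serialize (decode, encode)

-- | Assumed shape of the embedded folder: relative paths paired with
-- raw file contents.
type Folder = [(FilePath, S8.ByteString)]

-- | Pickle the folder to raw bytes, as the injector would.
pickle :: Folder -> S8.ByteString
pickle = encode

-- | Read the folder back, as the template executable would at startup.
unpickle :: S8.ByteString -> Either String Folder
unpickle = decode
```

Binary file contents survive the round trip unchanged, since cereal works on bytes rather than characters; the template's either error id $ decode bs line is the same unpickling step.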

The final result, using some not-yet-released versions of the packages described in this post, is two files: the template code and the injector.

-- Template
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE OverloadedStrings #-}
import Data.FileEmbed
import Data.Serialize (decode)
import Network.Wai.Handler.Launch (run)
import Network.Wai.Application.Static

main = do
    let bs = $(dummySpace 1000000)
    let files = either error id $ decode bs
    run $ staticApp (defaultStaticSettings NoCache)
        { ssFolder = embeddedLookup $ toEmbedded files
        , ssDirListing = StaticDirListing (Just defaultListing) ["index.html", "index.htm"]
        }

-- Injector
import Data.FileEmbed
import Data.Serialize
import qualified Data.ByteString.Char8 as S
import qualified Data.ByteString.Lazy as L
import System.Environment (getArgs)
import Codec.Compression.Zlib (compress)

main = do
    args <- getArgs
    (webhelpTemp, srcDir, dstExe) <-
        case args of
            [a, b, c] -> return (a, b, c)
            _ -> error "Usage: webhelp-inject <webhelp-template.exe> <source dir> <output exe>"
    folder <- getDir srcDir
    injectFile (encode folder) webhelpTemp dstExe

