Efficient YAML Parsing: Follow-up

January 21, 2010

By Michael Snoyman

Just wanted to give an update on my quest for efficient YAML file parsing. You can read the previous post for more information. The code has not yet been released, but you can download it from GitHub (the repos are linked in the previous article).

The use case I'm working on is generating the Atom feed for the photo blog. I have benchmarked three versions of this generation:

  1. The version I've been using up until now, based on data-object-yaml 0.0.0. This version loads the entire YAML data file into an in-memory representation and then converts that to the Atom feed data.
  2. The same as the previous one, but built on the most recent version of data-object-yaml.
  3. A version using the new low-level monadic API in the Text.Libyaml module. This code lives in its own module, and is rather ugly (more on this below).
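To make the difference between the tree-based versions (1 and 2) and the event-based version (3) concrete, here is a minimal sketch of the two styles. Everything in it is an assumption made for illustration: the `Node`, `Event` and `Entry` types, the `title` and `date` keys, and both functions are hypothetical stand-ins, not the actual data-object-yaml or Text.Libyaml API.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical tree representation: the whole document is in memory at once.
data Node
  = Scalar String
  | Sequence [Node]
  | Mapping [(String, Node)]
  deriving (Show, Eq)

-- Hypothetical event representation: the parser hands over one event at a time.
data Event
  = SequenceStart
  | SequenceEnd
  | MappingStart
  | MappingEnd
  | ScalarEvent String
  deriving (Show, Eq)

-- Hypothetical feed entry with just two fields.
data Entry = Entry { entryTitle :: String, entryDate :: String }
  deriving (Show, Eq)

-- Tree style: short and declarative, but needs the full tree first.
entriesFromTree :: Node -> [Entry]
entriesFromTree (Sequence items) = mapMaybe toEntry items
  where
    toEntry (Mapping fields) = do
      Scalar title <- lookup "title" fields
      Scalar date  <- lookup "date" fields
      Just (Entry title date)
    toEntry _ = Nothing
entriesFromTree _ = []

-- Event style: a hand-written state machine over the event stream.
-- Entries are produced lazily, so a consumer that only takes a few
-- entries never looks at the events that come after them.
entriesFromEvents :: [Event] -> [Entry]
entriesFromEvents (SequenceStart : rest) = entries rest
  where
    entries (MappingStart : es) =
      let (fields, es') = readFields [] es
      in case (lookup "title" fields, lookup "date" fields) of
           (Just title, Just date) -> Entry title date : entries es'
           _                       -> entries es'
    entries _ = []  -- SequenceEnd, end of input, or anything unexpected

    readFields acc (ScalarEvent key : ScalarEvent value : es) =
      readFields ((key, value) : acc) es
    readFields acc (MappingEnd : es) = (acc, es)
    readFields acc es = (acc, es)  -- malformed mapping: give up on it
entriesFromEvents _ = []

sampleTree :: Node
sampleTree = Sequence
  [ Mapping [("title", Scalar "Sunset"), ("date", Scalar "2010-01-20")]
  , Mapping [("title", Scalar "Harbor"), ("date", Scalar "2010-01-19")]
  ]

sampleEvents :: [Event]
sampleEvents =
  [ SequenceStart
  , MappingStart
  , ScalarEvent "title", ScalarEvent "Sunset"
  , ScalarEvent "date", ScalarEvent "2010-01-20"
  , MappingEnd
  , MappingStart
  , ScalarEvent "title", ScalarEvent "Harbor"
  , ScalarEvent "date", ScalarEvent "2010-01-19"
  , MappingEnd
  , SequenceEnd
  ]
```

Both `entriesFromTree sampleTree` and `entriesFromEvents sampleEvents` yield the same two entries. The event version is already noticeably longer for a two-field record, and a real feed has more fields and nesting than this toy.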

Without further ado: the benchmark results.

Metric                               Version 1     Version 2     Version 3
Maximum memory residency (bytes)     5,474,840     3,987,280       186,504
Heap allocation (bytes)            136,454,032   148,156,096    13,856,304
Total execution time                     0.28s         0.28s         0.03s

Unsurprisingly, there are huge gains in memory usage from the third version. There have been minor optimizations made to the library overall (not allocating space for missing tags, for example), which give version 2 a lower maximum residency than version 1, even though its total heap allocation is slightly higher. But clearly version 3 takes the cake here.
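To put numbers on those gains, the ratios can be worked out directly from the table. This is just arithmetic on the published figures; the `Result` record and the `ratio` helper are names made up for this sketch.

```haskell
-- Figures copied from the results table; the names are illustrative only.
data Result = Result
  { maxResidency :: Double  -- bytes
  , heapAlloc    :: Double  -- bytes
  , runTime      :: Double  -- seconds
  }

version1, version2, version3 :: Result
version1 = Result 5474840 136454032 0.28
version2 = Result 3987280 148156096 0.28
version3 = Result 186504 13856304 0.03

-- How many times larger a metric is for the baseline than for the candidate.
ratio :: (Result -> Double) -> Result -> Result -> Double
ratio metric baseline candidate = metric baseline / metric candidate
```

For example, `ratio maxResidency version1 version3` comes out at roughly 29, `ratio heapAlloc version1 version3` at just under 10, and `ratio runTime version1 version3` at a little over 9.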

However, why the drastic change in execution time? The efficient version aborts the YAML parsing after finding ten entries, whereas the other two versions read the whole file. This is actually a significant change in behavior: the third version will completely ignore YAML parse errors, or any other kind of failure, beyond those first ten entries.
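The behavioral difference can be shown with a toy model. In this sketch the parser output is assumed to be a lazy list in which each element is either an error message or a parsed entry; that model, the `ParseResult` type, and the function names are simplifications for illustration and not how Text.Libyaml actually reports results.

```haskell
-- Hypothetical model: Left is a parse failure, Right is one parsed entry.
type ParseResult = Either String String

-- Whole-file behavior: an error anywhere in the stream fails the parse.
parseAll :: [ParseResult] -> Either String [String]
parseAll = sequence

-- Early-abort behavior: only the first n results are ever inspected,
-- so errors after them go unnoticed.
firstN :: Int -> [ParseResult] -> Either String [String]
firstN n = sequence . take n

-- Ten good entries followed by a broken eleventh one.
sampleStream :: [ParseResult]
sampleStream =
  [Right ("entry " ++ show i) | i <- [1 .. 10 :: Int]]
    ++ [Left "parse error in entry 11"]
```

Here `firstN 10 sampleStream` succeeds with ten entries, while `parseAll sampleStream` returns the `Left` from entry 11. That is the trade-off: less work, but a corrupt tail of the file is never detected.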

That code is ugly

For those of you who looked at the EfficientFeed module, you'll have noticed how long and tedious it is. Compare it to the beauty and conciseness of the inefficient version. Two things follow from this:

  • Don't bother using the low-level library for small YAML files. If you use YAML for settings, data-object-yaml is, in my opinion, a good compromise between ease of use and complexity.
  • The EfficientFeed module looks like mostly boilerplate to me. So much so, in fact, that I think I can write some kind of declarative framework for generating that code. I have to play around with the ideas a bit, but I expect it to make the cut for the 0.2.0 release. I'm just not sure whether it belongs in yaml or data-object-yaml (I'm leaning toward the former).

