Web Application Interface 0.2 Ideas

July 9, 2010

GravatarMichael Snoyman

There's a number of ideas I've had for fixing up the Web Application Interface, and finally decided to put them all together. I've created new branches for wai, wai-extra and yesod.

In no particular order, here are the items I've changed. I'm very happy to hear any critiques of these, so don't hold back.

Removed the buffer function

I wrote a buffer function which performs badly, and I want to remove it. Nuf said.

Less data types, more type synonyms

An obvious definition for HTTP request methods is:

data Method = GET | POST | PUT | DELETE

Obvious... except I forgot OPTIONS. No problem, I'll add it. And I forgot TRACE. OK, add that too. Oh wait: you can have arbitrary HTTP methods if you like. Doh!

WAI 0.0 and 0.1 therefore had an extra constructor Method B.ByteString, which allowed you to specify any request method you wanted. All seemed well and good to me, until Jeremy Shaw commented on an entirely different topic: "The use of a type ... implies that you want to be pattern matching on the constructors in your code".

And I realized he was absolutely right. It's a bad idea to create a data type with constructors that can't be pattern matched on, and that's exactly what I'd created. After all, Method "GET" == GET, so pattern matching is out of the question. Sure enough, I've received a few patches for WAI code that mistakenly attempts to pattern match. I'd created a horrible monster which thwarted all that is well and good in type-safe programming! </drama>

So instead, the definition is now simply type Method = B.ByteString. And all is good and right in the world.

This same change was applied to the other relevant datatypes in Network.Wai: HttpVersion, RequestHeader, ResponseHeader and Status.

Functions are the new constructors

I'm on the fence for this one, so please let me know what you think. I've replaced the GET constructor with a methodGET function. This is simply defined as methodGET = B8.pack "GET"; and the same is done for all other constructors.

But frankly, I think this just litters the namespace. I have not implemented these functions for RequestHeader and ResponseHeader, and I don't think I will. I'm fairly certain I dislike the functions for Method, but will probably keep the definitions for HttpVersion and Status.

Added CIByteString

Gregory Collins pointed out that request and response headers should be case-insensitive. As a result, WAI 0.1 made sure their Eq instance handled this properly. Now, due to the switch to type synonyms, we define a new datatype named CIByteString (which looks strangely familiar) and use that for RequestHeader and ResponseHeader.

RequestBody datatype

WAI 0.0/0.1 define a response body as Either FilePath Enumerator. We all know how wonderful enumerators are versus lazy I/O... but how good are they versus lazy evaluation. In other words, which is faster: an enumerator, or a purely created lazy bytestring? I don't have specific benchmarks, but my work on Hamlet indicated to me that lazy bytestrings win.

The vast majority of web applications will not often need an enumerator interface. Mostly, static files and lazy bytestrings will be sufficient. Hack has done a very good job with only providing a lazy bytestring interface. So I would like to add a lazy bytestring response body to WAI; not to replace enumerators, but to augment them. So I've created a new datatype:

data ResponseBody = ResponseFile FilePath
                  | ResponseEnumerator Enumerator
                  | ResponseLBS L.ByteString

This adds a bit of complexity to handler code, but I think it's worth it. I've also replaced the fromEitherFile function with fromResponseBody. As a bonus, I think it's nicer using a specific datatype than an Either.

What hasn't changed

There are three decisions in particular that I've decided to stick with, even though there's been some pressure to change this. I'd like to state these out loud to re-open the debate, because I want to make sure this is the right decision.

The first is fairly trivial: whether the Request datatype has a record for script name. Hack and the Hyena WAI both have it, but I've removed it for two reasons:

  1. It has no meaning for many backends, particularly any stand-alone server.
  2. Backends where is does have meaning (CGI, FastCGI) do not provide reliable data on it. Whenever you use URL rewriting (which I hope you're using), it's nearly impossible to get a meaningful script name.

The second issue is the Source datatype as a request body. People have said it should simply be an Enumerator. I have this to say:

  1. Any Source can be converted to an Enumerator without issue, but the reverse is not true.
  2. A Source is easier to parse than an Enumerator.
  3. A Source allows you to implement backtracking parsing of request bodies without reading the entire request body into memory.
  4. I don't think there is significant difficulty placed on backend implementations to provide the more powerful Source type versus an Enumerator.

The final issue is the definition of Enumerator. There's lots of work in the iteratee package, and it seems to be blazing the trail for this domain. However:

  1. WAI should be low-dependency, and introducing a dependency on the iteratee package would be too high a burden. (I realize this isn't a full argument, since I could just copy-paste the definitions from iteratee.)
  2. The iteratee approach is still evolving, now migrating to a continuations-based approach. It seems too early to try and jump on that band-wagon; let's let the dust settle first.
  3. Much of the complexity introduced by iteratee is totally unnecessary in WAI. For example, we can't have an arbitrary monad, we need IO. We can't have arbitrary chunk data types, we use strict bytestrings.

    However, at a more fundamental level, iteratee is trying to solve problems WAI doesn't have. WAI just needs a way to allow efficient outputting of data with interleaved IO actions. It will always just send one chunk at a time, and either the entire chunk will succeed, or there will be a failure. iteratee deals with complex problems like partial input consumption.

    In other words: I have not seen a single argument made why switching to iteratee is actually a win for WAI.

  4. And let's face it: enumerators are complicated beasts. People are afraid of them. If the Enumerator type in WAI is as scary as someone holding a knife to you, iteratee is a B2 Bomber.

That said, I'm happy to hear any explanations and see some examples of why I'm making The Wrong Decision here. For that matter, please tell me if you think any of the changes I'm proposing for WAI are a bad idea.


comments powered by Disqus