Exciting changes coming to conduit 0.2
January 29, 2012
tl;dr: The new Haddocks are available at: http://www.snoyman.com/haddocks/conduit-0.2.0/index.html
Even though it's relatively young, conduit has gotten a lot of real-world usage, and a fair bit of scrutiny. I think we achieved all of our main objectives with the first release, but that doesn't mean we're going to avoid improvements. I asked the community to give their feedback, and here were the main criticisms I've heard:
- BufferedSource doesn't feel quite right. One complaint was the name bsourceUnpull, but overall people thought it didn't fit in well with the rest of the package.
- Usage of mutable variables for storing state is suboptimal.
- The split between Source and PreparedSource isn't very nice.
While I won't call the first issue fully resolved, I would say that conduit 0.1 was a big step in the right direction. Instead of exposing all the internals of BufferedSource, it's now an abstract type. (This does solve the bsourceUnpull dislike, though that's obviously a minor point.) Overall, dependent packages have moved away from using BufferedSource in any external APIs. In other words, BufferedSource is intended purely as an internal tool. For example, in Warp, we use a BufferedSource to parse the request headers, but then convert it back to a Source to pass to the application for request body reading.
I've been opposed to making any changes for the second issue (mutable variables). My belief was that one of the sources of conduits' simplicity relative to enumerators was its usage of mutable state. And in general, I don't believe in changing something until there's hard evidence that it's actually causing problems.
Last week, however, Felipe Lessa found one such concrete problem: using SequencedSink was very slow. Upon investigation, I determined that the problem was in Sink's monadic bind implementation. The issue is that for each bind, a new mutable variable was being allocated, and it needed to be checked to determine its state. Unfortunately, a long chain of binds meant checking N variables for each action, so the total cost grew quadratically with the length of the chain. This clearly needed to be fixed, but there was no way to do so (that I could see) with the previous types.
So I was presented with a dilemma: either continue down the mutable variable path and try to solve the problem, or go in the pure/CPS direction, where I knew a simpler solution existed. The choice was actually pretty easy: go for the pure approach. I had the following reasons:
- The main motivation to avoid the change to CPS was to keep the simplicity of the current approach. However, I was about to lose that simplicity anyway.
- Like most Haskellers, I do have an innate dislike for mutable variables.
- After more work comparing conduits to enumerators, I've come to believe that the main source of confusion in enumerators is that the data producer (Enumerator) is just a consumer-transformer. Since the essence of Source would stay the same in CPS, I think that this change does not hinder our simplicity.
- There was strong reason to believe that GHC would be able to optimize CPS code better than mutable variable code.
So I took the plunge and tried out CPS... and I really like the result! The first change is to the Open constructor: instead of just returning a new value, it returns a new value and a new Source. This allows us to pass our state along in that new Source. There are similar changes to ConduitResult. After this, I benchmarked the old and new versions, comparing both a monadic-bind-intensive Sink and a Sink without any binds. The former had a ten-fold speedup (not surprising due to the decrease in algorithmic complexity), and the latter had a 20% speedup.
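To make the shape of that change concrete, here is a minimal sketch. The types are deliberately simplified (no resource handling, no close action) and should be read as an illustration rather than the actual conduit 0.2 definitions; countFrom and consume are hypothetical helpers invented for this example.

```haskell
-- Simplified, illustrative types; not the real conduit 0.2 definitions.
data SourceResult m a
    = Open (Source m a) a   -- the Source to pull from next, plus a value
    | Closed                -- nothing left to pull

newtype Source m a = Source { sourcePull :: m (SourceResult m a) }

-- A source that counts from lo up to hi. The "state" (the current number)
-- lives in the next Source's closure instead of in a mutable variable.
countFrom :: Monad m => Int -> Int -> Source m Int
countFrom lo hi
    | lo > hi   = Source (return Closed)
    | otherwise = Source (return (Open (countFrom (lo + 1) hi) lo))

-- Drain a source into a list, always pulling from the newest Source.
consume :: Monad m => Source m a -> m [a]
consume src = do
    res <- sourcePull src
    case res of
        Closed      -> return []
        Open next x -> do
            xs <- consume next
            return (x : xs)
```

Running consume (countFrom 1 5) in IO yields [1,2,3,4,5]; no mutable variable is allocated or checked along the way.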
But that wasn't the end of it. This new approach allows us to get rid of the Prepared family of types. Let's take as an example a source that opens a Handle and reads data from a file. In the old approach, we needed to provide the PreparedSource with the Handle in order for the PreparedSource to read from it. Therefore, we had a Source which opened the Handle and passed it to the PreparedSource. In the new approach, we have a Source that opens a handle, reads some data, and returns a new Source that reads from that handle.
So contrary to my original belief, I think this CPS move actually simplifies conduit greatly.
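The file-reading case can be sketched roughly as follows. This is an assumption-laden illustration: it runs directly in IO, reads lines as String, and skips the exception safety that the real package provides, and the names sourceLines, fromHandle and consume are made up for this example.

```haskell
import System.IO (Handle, IOMode (ReadMode), hClose, hGetLine, hIsEOF, openFile)

-- Simplified, illustrative types specialised to IO.
data SourceResult a = Open (Source a) a | Closed

data Source a = Source
    { sourcePull  :: IO (SourceResult a)
    , sourceClose :: IO ()
    }

-- Before the first pull no Handle exists. The first pull opens the file
-- and delegates to a Source that holds the Handle in its closure.
sourceLines :: FilePath -> Source String
sourceLines path = Source
    { sourcePull = do
        h <- openFile path ReadMode
        sourcePull (fromHandle h)
    , sourceClose = return ()  -- nothing has been opened yet
    }

-- Every later Source already knows the Handle, so no separate
-- "prepared" type is needed.
fromHandle :: Handle -> Source String
fromHandle h = Source
    { sourcePull = do
        eof <- hIsEOF h
        if eof
            then hClose h >> return Closed
            else do
                line <- hGetLine h
                return (Open (fromHandle h) line)
    , sourceClose = hClose h
    }

-- Drain a source into a list, always pulling from the newest Source.
consume :: Source a -> IO [a]
consume src = do
    res <- sourcePull src
    case res of
        Closed      -> return []
        Open next x -> fmap (x :) (consume next)
```

The point of the sketch is that the unopened and opened states share one type: the Handle simply appears in the closure of the Source returned by the first pull.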
Another, orthogonal change that I put in was better data types in a few places. Previously, if you wanted to use the sourceState function, and had a pull function that returned Closed, you needed to provide a dummy state value. (If you look through conduit code bases, you'll see a lot of these dummy values.) Instead, we now have a specialized data type (name suggestions welcome) that avoids this need. Internally, I also cleaned up a number of the types to enforce invariants at the type level.
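As a rough sketch of the idea, a dedicated result type lets the closed case carry no state at all. Everything here is hypothetical: StateResult, StateOpen, StateClosed and sourceStateSketch are names invented for this example and are not claimed to match the package, and the Source type is again simplified.

```haskell
-- Simplified, illustrative Source types.
data SourceResult m a = Open (Source m a) a | Closed

newtype Source m a = Source { sourcePull :: m (SourceResult m a) }

-- Hypothetical result type for a state-passing pull function.
-- The closed case has no state field, so no dummy value is needed.
data StateResult state a
    = StateOpen state a
    | StateClosed

-- Build a Source from an initial state and a pull function.
sourceStateSketch
    :: Monad m
    => state
    -> (state -> m (StateResult state a))
    -> Source m a
sourceStateSketch st pull = Source $ do
    res <- pull st
    return $ case res of
        StateOpen st' x -> Open (sourceStateSketch st' pull) x
        StateClosed     -> Closed

-- Example: count down from n to 1, then close.
countdown :: Monad m => Int -> Source m Int
countdown n = sourceStateSketch n step
  where
    step 0 = return StateClosed
    step k = return (StateOpen (k - 1) k)

-- Drain a source into a list, always pulling from the newest Source.
consume :: Monad m => Source m a -> m [a]
consume src = do
    res <- sourcePull src
    case res of
        Closed      -> return []
        Open next x -> fmap (x :) (consume next)
```

In countdown, the closing branch returns StateClosed on its own; there is no leftover state argument to fill with a placeholder.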
Speaking of invariants, the final simplification is that we now have just one invariant ruling over the whole package: never reuse a Conduit. After you pull from a Source, it will give you a new Source. Do not reuse the original Source. If you get a Closed result, there is no new Source, and therefore you cannot pull again or close the Source.
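One way to picture this invariant is a helper that hands back whatever Source should be used next. This is a sketch over simplified types; fromList and pullN are hypothetical helpers written for this example, not functions from the package.

```haskell
-- Simplified, illustrative Source types.
data SourceResult m a = Open (Source m a) a | Closed

newtype Source m a = Source { sourcePull :: m (SourceResult m a) }

-- A source that yields the elements of a list in order.
fromList :: Monad m => [a] -> Source m a
fromList []       = Source (return Closed)
fromList (x : xs) = Source (return (Open (fromList xs) x))

-- Pull up to n values. Returns the values together with the Source to
-- continue from, or Nothing once a Closed result has been seen.
pullN :: Monad m => Int -> Source m a -> m ([a], Maybe (Source m a))
pullN n src
    | n <= 0    = return ([], Just src)
    | otherwise = do
        res <- sourcePull src
        case res of
            Closed      -> return ([], Nothing)
            Open next x -> do
                (xs, rest) <- pullN (n - 1) next
                return (x : xs, rest)
```

After pullN 2 (fromList "abc"), the caller holds "ab" and a Just with the continuation; pulling from that continuation gives "c" and then Nothing. With this pure list source, reusing the original would merely replay values, but with an effectful source (a file handle, a socket) the old Source may refer to state that has already moved on, which is why only the newest Source should be used.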
I encourage everyone to have a look at the Haddocks and give me your feedback.
When will this be released?
Likely some time this week. I don't have any specific changes in mind right now, outside of name adjustments that are suggested by the community.
How this affects users
Anyone programming against the high-level conduit API exclusively will have no breakage. If
you're using functions like
sinkState, you'll have
minimal changes to use the modified datatypes (essentially changing a few constructors and
reordering your arguments). If you're coding directly against the low-level types, you'll need
to restructure things a bit to pass around continuations.
Please email me (or preferably the Haskell cafe) if you want some help on converting old conduit code to this new set of types. For the most part, it's a mechanical process, and I can give lots of examples from the code I've already migrated.
How this affects Yesod
Yesod 0.10 will be built off of this new-and-improved conduit. In fact, the code is already updated for it. This likely means that the Yesod release will be about a week later than originally anticipated, maybe in the second week of February.