My last blog post on the proposed changes to conduit got quite a discussion on Reddit. I really appreciate everyone's input, thank you! In the process of the discussion, a number of questions came up, and I'd like to summarize them here.
By the way, if anyone thinks I'm spending an inordinate amount of time on this stuff... it's because I am :). This is the last remaining blocking issue for the Yesod 1.0 release, so I'm trying to make a decision on this stuff quickly. I don't want to make a bad decision under pressure of course, but if we can come to some conclusions in the next week, it would be very nice to be able to use the shiny new conduit 0.4 in Yesod 1.0.
Most importantly: is this a good change?
The first question is the most important. Is the move towards a single datatype
overall a good thing or a bad thing? Obviously the advantages are strong: a
single set of instances to maintain, a single fusion operator, fewer
constructors to deal with. However, we need to accept that there are also
downsides: error messages are more confusing and code needs to deal with
meaningless constructors (e.g.,
After playing around with this quite a bit, I'm strongly leaning towards saying
that the benefits outweigh the costs. The clincher for me was when I was able
sequenceSink, and the two functions basically
disappeared. Compare the
I'm still hoping to hear from some
conduit users on this to make sure the
changes won't be off-putting, but I think it's almost certain that the code
will be merged into master.
Side point: newtypes?
The idea came up of using
newtypes for the
types to try and get the best of both worlds. I still think it would be
beneficial (better error messages, and a nicer
Functor instance for
Conduit), but at least for now, I think it's too much overhead to have to
wrap/unwrap everywhere. There's also an argument to be made that a newtype
would hide away the "true nature" of our types, though I'm still on the fence
as to whether users should be confronted with the fact that the three types are
Type for second record in NeedInput
The NeedInput constructor has two records. The first takes some input and
returns a new
Pipe. The second is for indicating that no input is available.
Unlike the early termination records for
PipeM, this record
Pipe, since it's feasible that we may want to output more values
after we've run out of input (the typical example here is a decompression
Said another way: the early termination for
PipeM can only
ever be called when the upstream
Pipe closes, not when the downstream
Anyway, the idea of this record is that it can't receive any input, since once
it's been called, we know that the upstream pipe has closed and won't be
providing any more input. There are two ways we could model this: set the
(), or set it to
However, there's also a third approach: keeping
i as it was before. Since
we'll never be providing any more input to this
Pipe, it's completely
irrelevant what the
i parameter is. If the
Pipe ever requests more input,
we'll just call the early termination
Pipe again anyway. The advantage to
this third approach is that it simplifies some of the internal code, since we
don't need to juggle different parameters.
I'm leaning towards the third approach, but all three seem equally feasible.
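To make the third approach concrete, here is a minimal sketch. It assumes a simplified, pure stand-in for the real type: the monad parameter, PipeM, and the early termination records are dropped, and `feed` and `pairSums` are hypothetical names invented for this illustration. The second record of NeedInput keeps the same `i` parameter, and the runner simply calls it again if the pipe asks for input after upstream has closed.

```haskell
-- Simplified, hypothetical stand-in for the Pipe type discussed above.
-- The real type also has a monad parameter and finalizers; both are omitted.
data Pipe i o r
    = HaveOutput (Pipe i o r) o
    | NeedInput
        (i -> Pipe i o r) -- first record: upstream provided a value
        (Pipe i o r)      -- second record: upstream closed (third approach: i is unchanged)
    | Done r

-- Feed a finite list of inputs to a pipe, collecting its outputs.
-- Once the list is exhausted, any further NeedInput is answered with
-- its no-more-input record.
feed :: [i] -> Pipe i o r -> ([o], r)
feed is         (HaveOutput next o)  = let (os, r) = feed is next in (o : os, r)
feed (i : rest) (NeedInput more _)   = feed rest (more i)
feed []         (NeedInput _ noMore) = feed [] noMore
feed _          (Done r)             = ([], r)

-- Sums inputs in pairs. If upstream closes in the middle of a pair, the
-- pending value is still emitted, which is why the second record has to
-- be a Pipe rather than just a result.
pairSums :: Pipe Int Int ()
pairSums = NeedInput first (Done ())
  where
    first x =
        NeedInput
            (\y -> HaveOutput pairSums (x + y))
            (HaveOutput (Done ()) x)
```

Loaded into GHCi, `feed [1, 2, 3] pairSums` evaluates to `([3, 3], ())`: the trailing 3 is flushed after the input has run out.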
There's a bit of an inconsistency, in that the
Done constructor performs two
actions: it returns a result, and gives back any leftover input. Also confusing
is that we can only have 0 or 1 leftover values.
We could address the second issue by changing the
Maybe to a list.
Alternatively, we could solve both issues by introducing a new Leftover
constructor, and modifying the
Done constructor, like so:
| Done r | Leftover (Pipe i o m r) i
I've put this in a branch on GitHub, and it certainly works. However, I think I'm most comfortable leaving the code as-is:
- We don't have the concept of chunking (i.e., dealing with more than one value at a time) anywhere else in the type, so why should the leftovers be different?
- Right now, we have a nice invariant expressed in the types that you can only return leftovers when computation is complete. I think I like that setup.
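For comparison, here is a rough sketch of the two encodings side by side. Both types are simplified assumptions (no output, no monad parameter), and the names `PipeA`, `PipeB`, `runA`, `runB`, `peekA`, and `peekB` are hypothetical, made up only for this illustration.

```haskell
-- Encoding A: Done returns a result together with at most one leftover.
data PipeA i r
    = NeedInputA (i -> PipeA i r) (PipeA i r)
    | DoneA (Maybe i) r

-- Encoding B: Done only returns a result; Leftover pushes one value
-- back onto the input and then continues with another pipe.
data PipeB i r
    = NeedInputB (i -> PipeB i r) (PipeB i r)
    | DoneB r
    | LeftoverB (PipeB i r) i

-- Run against a list of inputs, returning the result and the unconsumed input.
runA :: [i] -> PipeA i r -> (r, [i])
runA (i : is) (NeedInputA more _) = runA is (more i)
runA []       (NeedInputA _ end)  = runA [] end
runA is       (DoneA l r)         = (r, maybe is (: is) l)

runB :: [i] -> PipeB i r -> (r, [i])
runB (i : is) (NeedInputB more _) = runB is (more i)
runB []       (NeedInputB _ end)  = runB [] end
runB is       (DoneB r)           = (r, is)
runB is       (LeftoverB next i)  = runB (i : is) next

-- Look at the next value without consuming it, in both encodings.
peekA :: PipeA i (Maybe i)
peekA = NeedInputA (\i -> DoneA (Just i) (Just i)) (DoneA Nothing Nothing)

peekB :: PipeB i (Maybe i)
peekB = NeedInputB (\i -> LeftoverB (DoneB (Just i)) i) (DoneB Nothing)
```

In this sketch both `runA [1, 2, 3] peekA` and `runB [1, 2, 3] peekB` give `(Just 1, [1, 2, 3])`. The difference is in what the types permit: encoding B can nest `LeftoverB` to push back several values and can do so before the computation is complete, while encoding A keeps the invariant that leftovers only appear together with the final result.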
Void or not to
In order to ensure that a
Sink never yields output, we set the
Void. Initially, we set the
i parameter on a
(), so that
runPipe can just provide an infinite stream of unit values.
However, we can just as well set the
i parameter to
Void, and then call the
no-more-input record of
NeedInput. I'm not going to try and summarize the
arguments back and forth on this one, because there are a lot of them. I will
say that I'm leaning towards
Void, just because it gives a very nice parallel
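The two options can be sketched as two runners over a simplified, pure stand-in for the type (no monad parameter, no finalizers). The names `runUnit`, `runVoid`, and `answer` are hypothetical and exist only for this illustration; `absurd` comes from `Data.Void` in base.

```haskell
import Data.Void (Void, absurd)

-- Simplified, hypothetical stand-in for the Pipe type.
data Pipe i o r
    = HaveOutput (Pipe i o r) o
    | NeedInput (i -> Pipe i o r) (Pipe i o r)
    | Done r

-- Option 1: the input type is (), so the runner can always hand over
-- another unit value when input is requested.
runUnit :: Pipe () Void r -> r
runUnit (HaveOutput _ o)   = absurd o -- unreachable: there are no Void values
runUnit (NeedInput more _) = runUnit (more ())
runUnit (Done r)           = r

-- Option 2: the input type is Void, so no input can ever be supplied.
-- The runner can only call the no-more-input record.
runVoid :: Pipe Void Void r -> r
runVoid (HaveOutput _ o)  = absurd o
runVoid (NeedInput _ end) = runVoid end
runVoid (Done r)          = r

-- A pipe that reports which record the runner used.
answer :: Pipe i Void Int
answer = NeedInput (const (Done 1)) (Done 2)
```

Here `runUnit answer` is 1 and `runVoid answer` is 2, which makes the behavioral difference between the two choices visible: with `()` the first record is called, with `Void` only the second record can be.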
Fuse operators: unify?
There's now a fusion function (
pipe) which can fuse together
Conduits. All three fusion operators (
simply type-constrained wrappers around it. (
$$ also utilizes
pipe, but it
runPipe on the generated
Pipe.) The question is: do we need all
three operators, or should we have just one?
The advantage of separate operators is clearer error messages, and more explicit code. However, it hides the fact that all three types are really one and the same. (Again, I'm ambivalent as to whether that hiding is a feature or a bug.) It also means that people have to learn more names.
So should we have a unified fusion operator? And what would it be called?
Note: either way, this next release will still contain the other three, type-constrained fusion operators, if only to ease migration. If we add a unified operator, it would be in addition to those three for now, and likely after a few point releases we would deprecate the three operators.
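To show what "type-constrained wrappers" means in practice, here is a minimal sketch over a simplified, pure Pipe type. This is not the library's actual implementation: the monad parameter, finalizers, and leftovers are all omitted, and the type synonyms and helper functions (`sourceList`, `mapC`, `sumSink`) are assumptions written for this example. The operator names follow conduit's `$=`, `=$`, and `=$=`.

```haskell
import Data.Void (Void, absurd)

-- Simplified, hypothetical stand-in for the Pipe type.
data Pipe i o r
    = HaveOutput (Pipe i o r) o
    | NeedInput (i -> Pipe i o r) (Pipe i o r)
    | Done r

type Source o    = Pipe Void o ()
type Conduit i o = Pipe i o ()
type Sink i r    = Pipe i Void r

-- One general fusion function, driven by the downstream pipe.
pipe :: Pipe a b () -> Pipe b c r -> Pipe a c r
pipe up (HaveOutput next o) = HaveOutput (pipe up next) o
pipe _  (Done r)            = Done r
pipe up down@(NeedInput more end) =
    case up of
        HaveOutput up' b       -> pipe up' (more b)
        NeedInput moreUp endUp ->
            NeedInput (\a -> pipe (moreUp a) down) (pipe endUp down)
        Done ()                -> pipe (Done ()) end

-- The three operators only restrict the types; the code is the same.
($=) :: Source a -> Conduit a b -> Source b
($=) = pipe

(=$) :: Conduit a b -> Sink b r -> Sink a r
(=$) = pipe

(=$=) :: Conduit a b -> Conduit b c -> Conduit a c
(=$=) = pipe

-- Run a fully connected pipe; with Void input, only the
-- no-more-input record can be used.
runPipe :: Pipe Void Void r -> r
runPipe (HaveOutput _ o)  = absurd o
runPipe (NeedInput _ end) = runPipe end
runPipe (Done r)          = r

sourceList :: [o] -> Source o
sourceList = foldr (\o next -> HaveOutput next o) (Done ())

mapC :: (a -> b) -> Conduit a b
mapC f = NeedInput (\a -> HaveOutput (mapC f) (f a)) (Done ())

sumSink :: Sink Int Int
sumSink = go 0
  where
    go acc = NeedInput (\x -> go (acc + x)) (Done acc)
```

With these definitions, `runPipe ((sourceList [1, 2, 3] $= mapC (* 2)) `pipe` sumSink)` evaluates to 12. Replacing `$=` with `pipe` type-checks just as well, which is the whole argument for a unified operator; what the wrappers add is a narrower type for error messages to complain about.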
Bikeshedding: rename the $$& operator
This is likely the easiest. I call
$$& the connect-and-resume operator, and
it connects a
Source to a
Sink, gets a result, and also gives back the most
recent state of the
Source. This allows us to continue computation.
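As a rough illustration of connect-and-resume, here is a sketch over a simplified, pure Pipe type. It ignores the monad parameter, finalizers, and leftovers, so it is only a model of the idea and not the library's code. `connectResume` is a hypothetical function name standing in for the operator, and `sourceList` and `takeN` are helpers written for this example.

```haskell
import Data.Void (Void, absurd)

-- Simplified, hypothetical stand-in for the Pipe type.
data Pipe i o r
    = HaveOutput (Pipe i o r) o
    | NeedInput (i -> Pipe i o r) (Pipe i o r)
    | Done r

-- Connect a source to a sink, returning the sink's result together
-- with whatever is left of the source at that point.
connectResume :: Pipe Void a () -> Pipe a Void r -> (Pipe Void a (), r)
connectResume _   (HaveOutput _ o) = absurd o
connectResume src (Done r)         = (src, r)
connectResume src sink@(NeedInput more end) =
    case src of
        HaveOutput src' a  -> connectResume src' (more a)
        NeedInput _ srcEnd -> connectResume srcEnd sink
        Done ()            -> connectResume (Done ()) end

sourceList :: [o] -> Pipe Void o ()
sourceList = foldr (\o next -> HaveOutput next o) (Done ())

-- Collect up to n values, then stop.
takeN :: Int -> Pipe a Void [a]
takeN n = go n []
  where
    go 0 acc = Done (reverse acc)
    go k acc = NeedInput (\a -> go (k - 1) (a : acc)) (Done (reverse acc))
```

Running `connectResume (sourceList [1 .. 5]) (takeN 2)` yields the result `[1, 2]` plus a resumable source; connecting that source to `takeN 2` again yields `[3, 4]`.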
Frankly, I chose a pretty bad name for the operator. (In my defense, I did that
on purpose to make sure I didn't become attached to it.) Some other ideas that
have been floated are
$$+. I feel no particular drive one way or