A History of Haskell persistence

March 6, 2012

By Greg Weber

We have a Google Summer of Code proposal to improve Persistent, the preferred database abstraction layer for the Yesod web framework.

While we were discussing the proposal on haskell-cafe, a new, similar package called structured-mongoDB was brought to my attention.

Now that we have yet another library accomplishing the same goal, Haskellers will wonder what the difference between them is. Let me give a history of persistence in Haskell as I know it.

There have always been raw database drivers. With these you need to do extra work to serialize your data, and your queries can contain typos that are not detected until runtime. In the case of SQL at least, raw queries are also difficult to compose.

One solution given for better persistence was acid-state. This potentially solves every problem, but limits you to an experimental in-process memory store.

Another solution was HaskellDB. HaskellDB helps you know your queries are correct, and it composes well. Its weaknesses are that you have to learn new terms for standard SQL terms, that it may not generate optimized queries, and that it lacks automatic serialization into normal Haskell data structures. HaskellDB probably never got great uptake simply because it didn't have a great maintainer pushing it forward, although there are users who enjoy using it today.

Michael Snoyman more recently created Persistent. Persistent showed that, by relying on Template Haskell, one could create type-safe queries with automatic serialization, and that this could be done across different databases: SQL (Sqlite, PostgreSQL, or Felipe Lessa's later-contributed MySQL backend) or MongoDB (which I contributed). There is now also a pull request for a CouchDB backend. Persistent's main weakness was that it could only satisfy the common 80% usage pattern and didn't offer much help when you wanted to write a raw SQL query. In addition, its API did not allow for great composition.

Boris Lykah released the Groundhog library. It showed that instead of all the Template Haskell generation of Persistent you could use easy-to-compose combinators. This was a point of collaboration between the two libraries, and Persistent absorbed Groundhog's approach. Ultimately Persistent did not merge with Groundhog because Groundhog has some more advanced features that we thought would complicate the internals and the types that users have to deal with. Groundhog continues under Boris's stewardship, mostly with the goal of being an experiment for advanced features (for example, support for sum types).
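To give a feel for what "easy-to-compose combinators" means here, below is a minimal sketch using only base. It is not the actual API of Persistent or Groundhog: the `Field` and `Filter` types, the `==.` and `>=.` operators as defined here, and the `renderWhere` function are all assumptions made up for illustration. The idea is that each filter is an ordinary value whose field and argument types must agree, so filters can be built separately and combined with plain list operations.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

import Data.List (intercalate)

-- Hypothetical typed column descriptors: the type parameter records
-- the Haskell type stored in that column.
data Field a where
  PersonName :: Field String
  PersonAge  :: Field Int

columnName :: Field a -> String
columnName PersonName = "name"
columnName PersonAge  = "age"

-- A filter pairs a field with an operator and a value of the matching type.
-- Writing `PersonAge ==. "ten"` is rejected by the type checker.
data Filter where
  Filter :: Show a => Field a -> String -> a -> Filter

(==.) :: Show a => Field a -> a -> Filter
f ==. v = Filter f "=" v

(>=.) :: Show a => Field a -> a -> Filter
f >=. v = Filter f ">=" v

-- Rendering uses `show` for display only; a real backend would bind
-- parameters instead of splicing values into the query text.
renderFilter :: Filter -> String
renderFilter (Filter f op v) = columnName f ++ " " ++ op ++ " " ++ show v

renderWhere :: [Filter] -> String
renderWhere [] = ""
renderWhere fs = " WHERE " ++ intercalate " AND " (map renderFilter fs)

-- Filters are plain values, so they can be named and reused.
adults :: [Filter]
adults = [PersonAge >=. 18]

main :: IO ()
main = putStrLn ("SELECT * FROM person" ++ renderWhere (adults ++ [PersonName ==. "Greg"]))
```

Because `adults` is just a list, composing it with further conditions is `++`; that is the kind of composition that is awkward with raw query strings.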

Persistent recently underwent changes to keep its serialization layer separate from its query layer. We would really like to share Persistent's serialization layer with anyone else who wants to support multiple database backends. I have floated the idea of Groundhog relying on Persistent's serialization layer now that it has been separated from the query layer, and Boris plans to look into it. Persistent is also better at raw queries now: it can give you back nicely serialized data in many cases, although you still risk typos in your queries.

Blake Rain created a Persistent-like library just for MongoDB, mt-mongoDB. Blake has not been advertising it, and since Persistent already supports MongoDB, I don't think many people outside his company have used it. Blake's implementation is simpler than Persistent's because it only targets one backend.

structured-mongoDB was released recently by Deian Stefan. Right now it occupies the same niche as mt-mongoDB because it targets just one backend. However, Deian stated that they are considering supporting multiple backends, which would put structured-mongoDB in the same space as Persistent (and Groundhog, which does not have MongoDB support).

I am sure I am missing some other attempts at type-safe querying and automatic data-marshaling: please let me know of them. I do know that there are still a lot of open or at least untackled problems for a persistence layer.

I believe the biggest limitation that Haskell puts on us in coming up with better persistence layers is that records are not namespaced. Over the last few months I have been involved in a discussion to push a solution forward, and we may finally be close to one.

I also believe there is a fundamental problem with the query combinator approach taken by Persistent, Groundhog, and structured-mongoDB. It does let you compose SQL (and perhaps we will always find it invaluable for that), but it does not allow for precise (raw) queries. We are creating better capabilities in Persistent (a new rawSQL function that still does serialization), but we aren't fully there yet. mt-mongoDB is working towards a solution by using a quasi-quoted query. Quasi-quoting allows the query to be parsed and verified at compile time. With MongoDB it would also be possible to try a non-quasi-quoted approach. persistent-hssqlppp also attempts to solve this problem, using hssqlppp to validate a quasi-quoted SQL query.

I wrote this message as a starting point to try to increase collaboration in the realm of Haskell persistence. I would certainly welcome more libraries to the scene if they tackled these new problems in a way that is difficult to achieve within an existing library. I believe the greatest weakness of the Haskell ecosystem is a lack of quality libraries, and that we have few resources at our disposal to improve the situation. I have also seen that every time I have collaborated with others in the Haskell community, we end up with something much better.

