Persistent 0.6.0

August 26, 2011

GravatarBy Michael Snoyman

Persistent

Forms deal with the boundary between the user and the application. Another boundary we need to deal with is between the application and the storage layer. Whether it be a SQL database, a YAML file, or a binary blob, odds are you have to work to get your storage layer to accept your application datatypes. Persistent is Yesod's answer to data storage- a type-safe universal data store interface for Haskell.

Haskell has many different database bindings available. However, most of these have little knowledge of a schema and therefore do not provide useful static guarantees and force database-dependent interfaces and data structures on the programmer. Haskellers have attempted a more revolutionary route of creating Haskell specific data stores to get around these flaws that allow one to easily store any Haskell type. These options are great for certain use cases, but they constrain one to the storage techniques provided by the library, do not interface well with other languages, and the flexibility can also mean one must write reams of code for querying data. In contrast, Persistent allows us to choose among existing databases that are highly tuned for different data storage use cases, interoperate with other programming languages, and to use a safe and productive query interface.

Persistent follows the guiding principles of type safety and concise, declarative syntax. Some other nice features are:

  • Database-agnostic. There is first class support for PostgreSQL, SQLite and MongoDB.
  • By being non-relational in nature, we simultaneously are able to support a wider number of storage layers and are not constrained by some of the performance bottlenecks incurred through joins.
  • A major source of frustration in dealing with SQL databases is changes to the schema. Persistent can automatically perform database migrations.

Synopsis

{-# LANGUAGE QuasiQuotes, TemplateHaskell, TypeFamilies, OverloadedStrings #-}
{-# LANGUAGE GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Control.Monad.IO.Class (liftIO)

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
    age Int Maybe
BlogPost
    title String
    authorId PersonId
|]

main :: IO ()
main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll

    johnId <- insert $ Person "John Doe" $ Just 35
    janeId <- insert $ Person "Jane Doe" Nothing

    insert $ BlogPost "My fr1st p0st" johnId
    insert $ BlogPost "One more for good measure" johnId

    oneJohnPost <- selectList [BlogPostAuthorId ==. johnId] [LimitTo 1]
    liftIO $ print (oneJohnPost :: [(BlogPostId, BlogPost)])

    john <- get johnId
    liftIO $ print (john :: Maybe Person)

    delete janeId
    deleteWhere [BlogPostAuthorId ==. johnId]

Solving the boundary issue

Let's say you are storing information on people in a SQL database. Your table might look something like:

CREATE TABLE Person(id SERIAL PRIMARY KEY, name VARCHAR NOT NULL, age INTEGER)

And if you are using a database like PostgreSQL, you can be guaranteed that the database will never store some arbitrary text in your age field. (The same cannot be said of SQLite, but let's forget about that for now.) To mirror this database table, you would likely create a Haskell datatype that looks something like:

data Person = Person
    { personName :: String
    , personAge :: Int
    }

It looks like everything is type safe: the database schema matches our Haskell datatypes, the database ensures that invalid data can never make it into our data store, and everything is generally awesome. Well, until:

  • You want to pull data from the database, and the database layer gives you the data in an untyped format.
  • You want to find everyone older than 32, and you accidently write "thirtytwo" in your SQL statement. Guess what: that will compile just fine, and you won't find out you have a problem until runtime.
  • You decide you want to do something as simple as find the first 10 people alphabetically. No problem... until you make a typo in your SQL. Once again, you don't find out until runtime.

In dynamic languages, the answers to these issues is unit testing. For everything that can go wrong, make sure you write a test case. But as I am sure you are aware by now, that doesn't jive well with the Yesod approach to things. We like to take advantage of Haskell's strong typing to save us wherever possible, and data storage is no exception.

So the question remains: how can we use Haskell's type system to save the day?

Types

Like routing, there is nothing intrinsically difficult about type-safe data access. It just requires a lot of monotonous, error prone, boiler plate code. As usual, this means we can use the type system to keep us honest. And to avoid some of the drudgery, we'll use a sprinkling of Template Haskell.

PersistValue is the basic building block of Persistent. It is a very simple datatype that can represent data that gets sent to and from a database. Its definition is:

data PersistValue = PersistText Text
                  | PersistByteString ByteString
                  | PersistInt64 Int64
                  | PersistDouble Double
                  | PersistBool Bool
                  | PersistDay Day
                  | PersistTimeOfDay TimeOfDay
                  | PersistUTCTime UTCTime
                  | PersistNull
                  | PersistList [PersistValue]
                  | PersistMap [(T.Text, PersistValue)]
                  | PersistForeignKey ByteString -- ^ intended especially for MongoDB backend

Each Persistent backend needs to know how to translate the relevant values into something the database can understand. However, it would be awkward do have to express all of our data simply in terms of these basic types. The next layer is the PersistField typeclass, which defines how an arbitrary Haskell datatype can be marshaled to and from a PersistValue. A PersistField correlates to a column in a SQL database. In our person example above, name and age would be our PersistFields.

To tie up the user side of the code, our last typeclass is PersistEntity. An instance of PersistEntity correlates with a table in a SQL database. This typeclass defines a number of functions and some associated types. To review, we have the following correspondence between Persistent and SQL:

SQL Persistent
Datatypes (VARCHAR, INTEGER, etc) PersistValue
Table PersistEntity
Column PersistField

Code Generation

In order to ensure that the PersistEntity instances match up properly with your Haskell datatypes, Persistent takes responsibility for both. This is also good from a DRY (Don't Repeat Yourslef) perspective: you only need to define your entities once. Let's see a quick example:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.TH
import Database.Persist.Sqlite

mkPersist sqlSettings [persist|
Person
    name String
    age Int
|]

We use a combination of Template Haskell and Quasi-Quotation (like when defining routes): persistent-template:Database.Persist.TH:persist is a quasi-quoter which converts a whitespace-sensitive syntax into a list of entity definitions. (You can also declare your entities in a separate file using persistent-template:Database.Persist.TH:persistFile.) persistent-template:Database.Persist.TH:mkPersist takes that list of entities and declares:

  • One Haskell datatype for each entity.
  • A PersistEntity instance for each datatype defined.

Of course, the interesting part is how to use this datatype once it is defined.

main = withSqliteConn ":memory:" $ runSqlConn $ do
    michaelId <- insert $ Person "Michael" 26
    michael <- get michaelId
    liftIO $ print michael

We start off with some standard database connection code. In this case, we used the single-connection functions. Persistent also comes built in with connection pool functions, which we will generally want to use in production.

In this example, we have seen two functions: insert creates a new record in the database and returns its ID. Like everything else in Persistent, IDs are type safe. We'll get into more details of how these IDs work later. So when you call insert $ Person "Michael" 25, it gives you a value back of type PersonId.

The next function we see is get, which attempts to load a value from the database using an Id. In Persistent, you never need to worry that you are using the key from the wrong table: trying to load up a different entity (like House) using a PersonId will never compile.

PersistBackend

One last detail is left unexplained from the previous example: what are those withSqliteConn and runSqlConn functions doing, and what is that monad that our database actions are running in?

All database actions need to occur within an instance of PersistBackend. As its name implies, every backend (PostgreSQL, SQLite, MongoDB) has an instance of PersistBackend. This is where all the translations from PersistValue to database-specific values occur, where SQL query generation happens, and so on.

withSqliteConn creates a single connection to a database using its supplied connection string. For our test cases, we will use ":memory:", which simply uses an in-memory database. runSqlConn uses that connection to run the inner action, in this case, SqlPersist. Both SQLite and PostgreSQL share the same instance of PersistBackend.

One important thing to note is that everything which occurs inside a single call to runSqlConn runs in a single transaction. This has two important implications:

  • For many databases, committing a transaction can be a costly activity. By putting multiple steps into a single transaction, you can speed up code dramatically.
  • If an exception is thrown anywhere inside a single call to runSqlConn, all actions will be rolled back.

Migrations

I'm sorry to tell you, but so far I have lied to you a bit: the example from the previous section does not actually work. If you try to run it, you will get an error message about a missing table.

For SQL databases, one of the major pains can be managing schema changes. Instead of leaving this to the user, Persistent steps in to help, but you have to ask it to help. Let's see what this looks like:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.TH
import Database.Persist.Sqlite
import Control.Monad.IO.Class (liftIO)

mkPersist sqlSettings [persist|
Person
    name String
    age Int
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration $ migrate (undefined :: Person) -- this line added: that's it!
    michaelId <- insert $ Person "Michael" 26
    michael <- get michaelId
    liftIO $ print michael

With this one little code change, Persistent will automatically create your Person table for you. This split between runMigration and migrate allows you to migrate multiple tables simultaneously.

This works when dealing with just a few entities, but can quickly get tiresome once we are dealing with a dozen entities. Instead of repeating yourself, Persistent provides a helper function:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
    age Int
Car
    color String
    make String
    model String
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll

mkMigrate is a Template Haskell function which creates a new function that will automatically call migrate on all entities defined in the persist block. The share function is just a little helper that passes the information from the persist block to each Template Haskell function and concatenates the results.

Persistent has very conservative rules about what it will do during a migration. It starts by loading up table information from the database, complete with all defined SQL datatypes. It then compares that against the entity definition given in the code. For simple cases, it will automatically alter the schema:

  • The datatype of a field changed. However, the database may object to this modification if the data cannot be translated.
  • A field was added. However, if the field is not null, no default value is supplied (we'll discuss defaults later) and there is already data in the database, the database will not allow this to happen.
  • A field is converted from not null to null. In the opposite case, Persistent will attempt the conversion, contingent upon the database's approval.
  • A brand new entity is added.

However, there are a number of cases that Persistent will not handle:

  • Field or entity renames: Persistent has no way of knowing that "name" has now been renamed to "fullName": all it sees is an old field called name and a new field called fullName.
  • Field removals: since this can result in data loss, Persistent by default will refuse to perform the action (you can force the issue by using runMigrationUnsafe instead of runMigration, though it is not recommended).

runMigration will print out the migrations it is running on stderr (you can bypass this by using runMigrationSilent). Whenever possible, it uses ALTER TABLE calls. However, in SQLite, ALTER TABLE has very limited abilities, and therefore Persistent must resort to copying the data from one table to another.

Finally, if instead of performing a migration, you just want Persistent to give you hints about what migrations are necessary, use the printMigration function. This function will print out the migrations which runMigration would perform for you. This may be useful for performing migrations that Persistent is not capable of, for adding arbitrary SQL to a migration, or just to log what migrations occurred.

Uniqueness

In addition to declaring fields within an entity, you can also declare uniqueness constraints. A typical example would be requiring that a username be unique.

Unique Username
User
    username Text
    UniqueUsername username

While each field name must begin with a lowercase letter, the uniqueness constraints must begin with an uppercase letter.

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Data.Time
import Control.Monad.IO.Class (liftIO)

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    firstName String
    lastName String
    age Int
    PersonName firstName lastName
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll
    insert $ Person "Michael" "Snoyman" 26
    michael <- getBy $ PersonName "Michael" "Snoyman"
    liftIO $ print michael

To declare a unique combination of fields, we add an extra line to our declaration. Persistent knows that it is defining a unique constructor, since the line begins with a capital letter. Each following word must be a field in this entity.

The main restriction on uniqueness is that it can only be applied non-null fields. The reason for this is that the SQL standard is ambiguous on how uniqueness should be applied to NULL (eg, is NULL=NULL true or false?). Besides that ambiguity, most SQL engines in fact implement rules which would be contrary to what the Haskell datatypes anticipate (eg, PostgreSQL says that NULL=NULL is false, whereas Haskell says Nothing == Nothing is True).

In addition to providing nice guarantees at the database level about consistency of your data, uniqueness constraints can also be used to perfect some specific queries within your Haskell code, like the getBy demonstrated above. This happens via the Unique associated type. In the example above, we end up with a new constructor:

PersonName :: String -> String -> Unique Person

Queries

Fetching by ID

The simplest query you can perform in Persistent is getting based on an ID. Since this value may or may not exist, its return type is wrapped in a Maybe.

Using get
personId <- insert $ Person "Michael" "Snoyman" 26
    maybePerson <- get personId
    case maybePerson of
        Nothing -> liftIO $ putStrLn "Just kidding, not really there"
        Just person -> liftIO $ print person

This can be very useful for sites that provide URLs like /person/5. However, in such a case, we don't usually care about the Maybe wrapper, and just want the value, returning a 404 message if it is not found. Fortunately, the get404 function helps us out here. We'll go into more details when we see integration with Yesod.

Fetching by unique constraint

getBy is almost identical to get, except it takes a uniqueness constraint instead of an ID it takes a Unique value.

Using getBy
personId <- insert $ Person "Michael" "Snoyman" 26
    maybePerson <- getBy $ UniqueName "Michael" "Snoyman"
    case maybePerson of
        Nothing -> liftIO $ putStrLn "Just kidding, not really there"
        Just person -> liftIO $ print person

Select functions

But likely, you're going to want more powerful queries. You'll want to find everyone over a certain age; all cars available in blue; all users without a registered email address. For this, you need one of the select functions.

All the select functions use a similar interface, with slightly different outputs:

Function Returns
selectEnum An Enumerator containing all the IDs and values from the database.
selectList A list containing all the IDs and values from the database.
selectFirst Takes just the first ID and value from the database, if available
selectKeys Returns only the keys, without the values, as an Enumerator

selectList is the most commonly used, so we will cover it specifically. But understanding the others should be trivial after that.

selectList takes two arguments: a list of Filters, and a list of SelectOpts. The former is what limits your results based on characteristics; it allows for equals, less than, is member of, and such. SelectOpts provides for three different features: sorting, limiting output to a certain number of rows, and offsetting results by a certain number of rows.

Let's jump straight into an example of filtering, and then analyze it.

people <- selectList [PersonAge >. 25, PersonAge <=. 30] []
    liftIO $ print people

As simple as that example is, we really need to cover three points:

  1. PersonAge is a constructor for an associated phantom type. That might sound scary, but what's important is that it uniquely identifies the "age" column of the "person" table, and that it knows that the age field is an Int. (That's the phantom part.)
  2. We have a bunch of Persistent filtering operators. They're all pretty straight-forward: just tack a period to the end of what you'd expect. There are three gotchas here, I'll explain below.
  3. The list of filters is ANDed together, so that our constraint means "age is greater than 25 AND age is less than or equal to 30". We'll describe ORing later.

The one operator that's surprisingly named is the not equals one. We use !=., since /=. is used for updates (described later). Don't worry: if you use the wrong one, the compiler will catch you. The other two surprising operators are the "is member" and "is not member". They are, respectively, <-. and /<-. (both end with a period).

And regarding ORs, we use the ||. operator. Let's see an example:

people <- selectList
        (       [PersonAge >. 25, PersonAge <=. 30]
            ||. [PersonFirstName /<-. ["Adam", "Bonny"]]
            ||. ([PersonAge ==. 50] ||. [PersonAge ==. 60])
        )
        []
    liftIO $ print people

This (completely nonsensical) example means: find people who are 26-30, inclusive, OR whose names are neither Adam or Bonny, OR whose age is either 50 or 60.

SelectOpt

All of our selectList calls have included an empty list as the second parameter. That specifies no options, meaning: sort however the database wants, return all results, and don't skip any results. A SelectOpt has four constructors that can be used to change all that.

Asc
Sort by the given column in ascending order. This uses the same phantom type as filtering, such as PersonAge.
Desc
Same as Asc, in descending order.
LimitTo
Takes an Int argument. Only return up to the specified number of results.
OffsetBy
Takes an Int argument. Skip the specified number of results.

The following code defines a function that will break down results into pages. It returns all people aged 18 and over, and then sorts them by age (oldest person first). For people with the same age, they are sorted alphabetically by last name, then first name.

resultsForPage pageNumber = do
    let resultsPerPage = 10
    selectList
        [ PersonAge >=. 18
        ]
        [ Desc PersonAge
        , Asc PersonLastName
        , Asc PersonFirstName
        , LimitTo resultsPerPage
        , OffsetBy $ (pageNumber - 1) * resultsPerPage
        ]

Manipulation

Insert

It's all well and good to be able to play with data in the database, but how does it get there in the first place? The answer is the insert function. Its usage is incredibly simple, as you've seen already. You just give it a value, and it gives back an ID.

At this point, it makes sense to explain a it of the philosophy behind Persistent. In many other ORM solutions, the datatypes used to hold data is opaque: you need to go through their defined interfaces to get at and modify the data. That's not the case with Persistent: we're using plain old Algebraic Data Types for the whole thing. This means you still get all the great benefits of pattern matching, currying and everything else you're used to.

However, there are a few things we can't do. For one, there's no way to automatically update values in the database every time the record is updated in Haskell. Of course, with Haskell's normal stance of purity and immutability, this wouldn't make much sense anyway, so I don't shed any tears over it.

However, there is one issue that newcomers are often bothered by: why are IDs and values completely separate? It seems like it would be very logical to embed the ID inside the value. In other words, instead of having:

data Person = Person { name :: String }
have
data Person = Person { personId :: PersonId, name :: String }

Well, there's one problem with this right off the bat: how do we do an insert? If a Person needs to have an ID, and we get the ID by inserting, and an insert needs a Person, we have an impossible loop. Now, we could solve this with undefined, but that's just asking for trouble.

OK, you say, let's try something a bit safer:

data Person = Person { personId :: Maybe PersonId, name :: String }
I definitely prefer insert $ Person Nothing "Michael" to insert $ Person undefined "Michael". And now our types will be much simpler, right? For example, selectList could return a simple [Person] instead of that ugly [(PersonId, Person)].

The problem is that the "ugliness" is incredibly useful. Having (PersonId, Person) makes it obvious, at the type level, that we're dealing with a value that exists in the database. Let's say we want to create a link to another page that requires the PersonId (not an uncommon occurrence as we'll discuss later). The (PersonId, Person) form gives us unambiguous access to that information; embedding PersonId within Person with a Maybe wrapper means an extra runtime check for Just, instead of a more error-proof compile time check.

Finally, there's a semantic mismatch with embedding the ID within the value. The Person is the value. Two people are identical (in the context of a database) if all their fields are the same. By embedding the ID in the value, we're no longer talking about a person, but about a row in the database. Equality is no longer really equality, it's identity: is this the same person. as opposed to equivalent person.

In other words, there are some annoyances with having the ID separated out, but overall, it's the right approach, which in the grand scheme of things leads to better, less buggy code.

Update

Now, in the context of that discussion, let's think about updating. The simplest way to update is:

let michael = Person "Michael" 26
    michaelAfterBirthday = michael { personAge = 27 }
But that's not actually updating anything, it's just creating a new person value based on the old one. When we say update, we're not talking about modifications to the values in Haskell. (We better not be of course, since Haskell data types are immutable.)

Instead, we're looking at ways of modifying rows in a table. And the simplest way to do that is with the update function.

personId <- insert $ Person "Michael" "Snoyman" 26
    update personId [PersonAge =. 27]

update simply takes two arguments: an ID, and a list of Updates. The simplest update is assignment, but it's not always the best. What if you want to increase someones age by 1, but you don't have their current age? Persistent has you covered:

haveBirthday personId = update personId [PersonAge +=. 1]

And as you might expect, we have all the basic mathematical operators: +=., -=., *=., and /=. (full stop). These can be convenient for updating a single record, but they are also essential for proper ACID guarantees. Imagine the alternative: pull out a Person, increment the age, and update the new value. If you have two threads/processes working on this database at the same time, you're in for a world of hurt (hint: race conditions).

Sometimes you'll want to update many fields at once (give all your employees a 5% pay increase, for example). updateWhere takes two parameters: a list of filters, and a list of updates to apply.

updateWhere [PersonFirstName ==. "Michael"] [PersonAge *=. 2] -- it's been a long day

Occasionally, you'll just want to completely replace the value in a database with a different value. For that, you use (surprise) the replace function.

personId <- insert $ Person "Michael" "Snoyman" 26
    replace personId $ Person "John" "Doe" 20

Delete

As much as it pains us, sometimes we must part with our data. To do so, we have three functions:

delete
Delete based on an ID
deleteBy
Delete based on a unique constraint
deleteWhere
Delete based on a set of filters
personId <- insert $ Person "Michael" "Snoyman" 26
    delete personId
    deleteBy $ UniqueName "Michael" "Snoyman"
    deleteWhere [PersonFirstName ==. "Michael"]

We can even use deleteWhere to wipe out all the records in a table, we just need to give some hints to GHC as to what table we're interested in:

deleteWhere ([] :: [Filter Person])

Attributes

So far, we have seen a very simple syntax for our persist blocks: a line for the name of our entities, and then an indented line for each field with two words: the name of the field and the datatype of the field. Persistent handles more than this: you can assign an arbitrary list of attributes after the first two words on a line.

Let's say that we want to have a Person entity with an (optional) age and the timestamp of when he/she was added to the system. For entities already in the database, we want to just use the current date-time for that timestamp.

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Data.Time
import Control.Monad.IO.Class

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
    age Int Maybe
    created UTCTime default=now()
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    time <- liftIO getCurrentTime
    runMigration migrateAll
    insert $ Person "Michael" (Just 26) time
    insert $ Person "Greg" Nothing time

Maybe is a built in, single word attribute. It makes the field optional. In Haskell, this means it is wrapped in a Maybe. In SQL, it makes the column nullable.

The default attribute is backend specific, and uses whatever syntax is understood by the database. In this case, it uses the database's built-in now() function. Let's say we now want to add a field for favorite programming language:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Data.Time

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
    age Int Maybe
    created UTCTime default=now()
    language String default='Haskell'
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll

We need to surround the string with single quotes so that the database can properly interpret it. Finally, Persistent can use double quotes for containing white space, so let's say we want to set someone's default home country to the El Salvador:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Data.Time

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
    age Int Maybe
    created UTCTime default=now()
    language String default='Haskell'
    country String "default='El Salvador'"
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll

One last trick you can do with attributes is to specify the names to be used for the SQL tables and columns. This can be very convenient when interacting with existing databases.

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person sql=the-person-table
    firstName String sql=first_name
    lastName String sql=fldLastName
    age Int Gt Desc "sql=The Age of the Person"
    UniqueName firstName lastName
|]

Relations

Persistent allows references between your data types in a manner that is consistent with supporting non-SQL databases. We do this by embedding an ID in the related entity. So if a person has many cars:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Control.Monad.IO.Class (liftIO)
import Data.Time

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
Car
    ownerId PersonId Eq
    name String
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll
    bruce <- insert $ Person "Bruce Wayne"
    insert $ Car bruce "Bat Mobile"
    insert $ Car bruce "Porsche"
    -- this could go on a while
    cars <- selectList [CarOwnerId ==. bruce] []
    liftIO $ print cars

Using this technique, it's very easy to define one-to-many relationships. To define many-to-many relationships, we need a join entity, which has a one-to-many relationship with each of the original tables. It is also a good idea to use uniqueness constraints on these. For example, to model a situation where we want to track which people have shopped in which stores:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH
import Data.Time

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
Store
    name String
PersonStore
    personId PersonId
    storeId StoreId
    UniquePersonStore personId storeId
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll

    bruce <- insert $ Person "Bruce Wayne"
    michael <- insert $ Person "Michael"

    target <- insert $ Store "Target"
    gucci <- insert $ Store "Gucci"
    sevenEleven <- insert $ Store "7-11"

    insert $ PersonStore bruce gucci
    insert $ PersonStore bruce sevenEleven

    insert $ PersonStore michael target
    insert $ PersonStore michael sevenEleven

Closer look at types

So far, we've spoken about Person and PersonId without really explaining what they are. In the simplest sense, for a SQL-only system, the PersonId could just be type PersonId = Int64. However, that means there is nothing binding a PersonId at the type level to the Person entity. Instead, it would be very easy to accidently use a PersonId and get a Car. In order to model this relationship, we use phantom types. So, our next naive step would be:

newtype Key entity = Key Int64
type PersonId = Key Person

And that works out really well, until you get to a backend that doesn't use Int64 for its IDs. And that's not just a theoretical question; MongoDB uses ByteStrings instead. So what we need is a key value that can contain an Int and a ByteString. Seems like a great time for a sum type:

data Key entity = KeyInt Int64 | KeyByteString ByteString

But that's just asking for trouble. Next we'll have a backend that uses timestamps, so we'll need to add something to Key. This could go on for a while. Fortunately, we already have a sum type intended for representing arbitrary data: PersistValue. So now, we can use:

newtype Key entity = Key PersistValue

But this has another problem. Let's say we have a web application that takes an ID as a parameter from the user. It will need to receive that parameter as Text and then try to convert it to a Key. Well, that's simple: write a function to convert a Text to a PersistValue, and then wrap the result in the Key constructor, right?

Wrong. We tried this, and there's a big problem with it. We end up getting Keys that could never be. For example, if we're dealing with SQL, a key must be an integer. But the approach described above would allow arbitrary textual data in. The result was a bunch of 500 server errors as the database choked on comparing an integer column to text.

So what we need is a way to convert text to a Key, but have it dependent on the rules of the backend in question. And once phrased that way, the answer is simple: just add another phantom. The real, actual definition of Key in Persistent is:

newtype Key backend entity = Key { unKey :: PersistValue }

This works great: we can have a Text -> Key MongoDB entity function and a Text -> Key SqlPersist entity function, and everything runs smoothly. But now we have a new problem: relations. Let's say we want to represent blogs and blog posts. We would use the entity definition:

Blog
    title Text
Post
    title Text
    blogId BlogId

But what would that look like in terms of our Key datatype?

data Blog = Blog { blogTitle :: Text }
data Post = Post { postTitle :: Text, postBlogId :: Key <what goes here?> Blog }

We need something to fill in as the backend. In theory, we could hardcode this to SqlPersist, or Mongo, but then our datatypes will only work for a single backend. For an individual application, that might be acceptable, but what about libraries defining datatypes to be used by multiple applications, using multiple backends?

So thinks got a little more complicated. Our types are actually:

data BlogGeneric backend = Blog { blogTitle :: Text }
data PostGeneric backend = Post { postTitle :: Text, postBlogId :: Key backend (BlogGeneric backend) }

Notice that we still keep the short names for the constructors and the records. Finally, to give a simple interface for normal code, we define some type synonyms:

type Blog = BlogGeneric SqlPersist
type BlogId = Key SqlPersist Blog
type Post = PostGeneric SqlPersist
type PostId = Key SqlPersist Post

And no, SqlPersist isn't hard-coded into Persistent anywhere. That sqlSettings parameter you've been passing to mkPersist is what tells us to use SqlPersist. Mongo code will use mongoSettings instead.

This might be quite complicated under the surface, but user code hardly ever touches this. Look back through this whole chapter: not once did we need to deal with the Key or Generic stuff directly. The most common place for it to pop up is in compiler error messages. So it's important to be aware that this exists, but it shouldn't affect you on a day-to-day basis.

Custom Fields

Occasionally, you will want to define a custom field to be used in your datastore. The most common case is an enumeration, such as employment status. For this, Persistent provides a helper Template Haskell function:

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving, TemplateHaskell, OverloadedStrings, GADTs #-}
import Database.Persist
import Database.Persist.Sqlite
import Database.Persist.TH

data Employment = Employed | Unemployed | Retired
    deriving (Show, Read, Eq)
derivePersistField "Employment"

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name String
    employment Employment
|]

main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll

    insert $ Person "Bruce Wayne" Retired
    insert $ Person "Peter Parker" Unemployed
    insert $ Person "Michael" Employed

derivePersistField stores the data in the database using a string field, and performs marshaling using the Show and Read instances of the datatype. This may not be as efficient as storing via an integer, but it is much more future proof: even if you add extra constructors in the future, your data will still be valid.

Persistent: Raw SQL

The Persistent package provides a type safe interface to data stores. It tries to be backend-agnostic, such as not relying on relational features of SQL. My experience has been you can easily perform 95% of what you need to do with the high-level interface. (In fact, most of my web apps use the high level interface exclusively.)

But occasionally you'll want to use a feature that's specific to a backend. One feature I've used in the past is full text search. In this case, we'll use the SQL "LIKE" operator, which is not modeled in Persistent. We'll get all people with the last name "Snoyman" and print the records out.

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TemplateHaskell #-}
{-# LANGUAGE QuasiQuotes #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE GADTs #-}
import Database.Persist.Sqlite (withSqliteConn)
import Database.Persist.TH (mkPersist, persist, share, mkMigrate, sqlSettings)
import Database.Persist.GenericSql (runSqlConn, runMigration, SqlPersist)
import Database.Persist.GenericSql.Raw (withStmt)
import Database.Persist.GenericSql.Internal (RowPopper)
import Data.Text (Text)
import Database.Persist
import Control.Monad.IO.Class (liftIO)

share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    name Text
|]

main :: IO ()
main = withSqliteConn ":memory:" $ runSqlConn $ do
    runMigration migrateAll
    insert $ Person "Michael Snoyman"
    insert $ Person "Miriam Snoyman"
    insert $ Person "Eliezer Snoyman"
    insert $ Person "Gavriella Snoyman"
    insert $ Person "Greg Weber"
    insert $ Person "Rick Richardson"

    -- Persistent does not provide the LIKE keyword, but we'd like to get the
    -- whole Snoyman family...
    let sql = "SELECT name FROM Person WHERE name LIKE '%Snoyman'"
    withStmt sql [] withPopper

-- A popper returns one row at a time. We loop over it until it returns Nothing.
withPopper :: RowPopper (SqlPersist IO) -> SqlPersist IO ()
withPopper popper =
    loop
  where
    loop = do
        mrow <- popper
        case mrow of
            Nothing -> return ()
            Just row -> liftIO (print row) >> loop

Integration with Yesod

So you've been convinced of the power of Persistent. How do you integrate it with your Yesod application? Well, if you use the scaffolding, most of the work is done for you already. But as we normally do, we'll build up everything manually here to point out how it works under the surface.

The yesod-persistent package provides the meeting point between Persistent and Yesod. It provides the YesodPersist typeclass, to standardize access to the database via the runDB method. Let's see this in action.

{-# LANGUAGE QuasiQuotes, TypeFamilies, GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TemplateHaskell, OverloadedStrings, GADTs, MultiParamTypeClasses #-}
import Yesod
import Database.Persist.Sqlite

-- Define our entities as usual
share [mkPersist sqlSettings, mkMigrate "migrateAll"] [persist|
Person
    firstName String
    lastName String
    age Int Gt Desc
|]

-- We keep our connection pool in the foundation. At program initialization, we
-- create our initial pool, and each time we need to perform an action we check
-- out a single connection from the pool.
data PersistTest = PersistTest ConnectionPool

-- We'll create a single route, to access a person. It's a very common
-- occurrence to use an Id type in routes.
mkYesod "PersistTest" [parseRoutes|
/person/#PersonId PersonR GET
|]

-- Nothing special here
instance Yesod PersistTest where
    approot _ = ""

-- Now we need to define a YesodPersist instance, which will keep track of
-- which backend we're using and how to run an action.
instance YesodPersist PersistTest where
    type YesodPersistBackend PersistTest = SqlPersist

    -- This is a bit complicated, but liftIOHandler is necessary for proper
    -- handling of exceptions.
    runDB action = liftIOHandler $ do
        PersistTest pool <- getYesod
        runSqlPool action pool

-- We'll just return the show value of a person, or a 404 if the Person doesn't
-- exist.
getPersonR :: PersonId -> Handler RepPlain
getPersonR personId = do
    person <- runDB $ get404 personId
    return $ RepPlain $ toContent $ show person

openConnectionCount :: Int
openConnectionCount = 10

main :: IO ()
main = withSqlitePool "test.db3" openConnectionCount $ \pool -> do
    runSqlPool (runMigration migrateAll) pool
    runSqlPool (insert $ Person "Michael" "Snoyman" 26) pool
    warpDebug 3000 $ PersistTest pool

There are two important pieces here for general use. runDB is used to run a DB action from within a Handler. That liftIOHandler is a little scary, but it can be safely ignored. It is necessary in order to properly catch exceptions and rollback changes. Within the runDB, you can use any of the functions we've spoken about so far, such as insert and selectList.

The other new feature is get404. It works just like get, but instead of returning a Nothing when a result can't be found, it returns a 404 message page. The getPersonR function is a very common approach used in real-world Yesod applications: get404 a value and then return a response based on it.

Comments

comments powered by Disqus

Archives