Table Of Contents

Previous Topic

Django Project Structure

PostgreSQL

One of Heroku’s oldest and most popular addons is their PostgreSQL addon. Did you know that Heroku’s hosted PostgreSQL service is “the largest and most reliable Postgres service in the world”?

Overview

The Heroku PostgreSQL service allows you to easily create, destroy, resize, backup, restore, and scale your database infrastructure both from the command line (via the Heroku CLI app) as well as from the web (using Heroku’s standalone site).

Heroku’s PostgreSQL service is an excellent choice for you if:

  • Your application requires a relational database for storing information,
  • You don’t want to worry about downtime or data loss (since Heroku fully manages your PostgreSQL instances),
  • You want the ability to scale your PostgreSQL database by:
    • Creating read slaves on demand,
    • Creating database snapshots for experimental testing (eg: duplicate your production database and run tests against the clone),
    • Easily upgrading (or downgrading) the size of your database,
    • Paying only for what you use (by the second) so you can instantly scale up to support burst traffic–and down again to prevent bankruptcy :),
  • You want the ability to instantly backup and restore your data to any PostgreSQL database (no vendor lock-in).
  • Your application is running anywhere on the internet; Heroku’s PostgreSQL service is a standalone service and works just like any normal PostgreSQL server, so you don’t need to make any application changes.

Setup

Through the rest of this article, I’ll assume you’ve already got a working Heroku application setup, and that you’re using the Heroku CLI tool.

Bootstrapping a Database

To get started, let’s bootstrap a new PostgreSQL server instance. In the example below, we’ll create a free (shared) PostgreSQL database:

$ heroku addons:add shared-database
----> Adding shared-database to deploydjango... done, v2 (free)

If we wanted to create a larger (paid) database, we could run the following command:

$ heroku addons:add heroku-postgresql:ronin
----> Adding heroku-postgresql:ronin to deploydjango... done, v3 ($200/mo)
      Attached as HEROKU_POSTGRESQL_IVORY
      The database should be available in 3-5 minutes
      Use `heroku pg:wait` to track status

To verify that our database(s) exist, we can now run the pg:info command:

$ heroku pg:info
=== SHARED_DATABASE (DATABASE_URL)
Data Size    (empty)
=== HEROKU_POSTGRESQL_GREEN_URL
Plan         Ronin
Status       preparing
Data Size    -1 B
Tables       -1
PG Version   ?
Created      2012-04-20 07:26 UTC
Conn Info    "host=ec2-yy-yy-yy-yy.compute-1.amazonaws.com
             port=5432 dbname=yyyyy
             user=yyyyy sslmode=require
             password=yyyyy"
Maintenance  not required

As you can see from the output above, our Heroku application now has two databases defined, a free (shared) database, and a paid database.

Note

You can find a complete list of available Heroku databases plans on their pricing page.

As you can also see–there are obviously some differences between the shared database we created, and the paid database:

  • The shared database is created insantly (there is no delay), while the paid database is provisioned on the fly, and takes a minute or so to become available.
  • The paid database lists connection info while the shared database does not. This is because paid databases allow you to directly connect to them using the psql tool like you would with any normal PostgreSQL database.

Configuring Django to Use PostgreSQL

Now that we’ve got a database running, let’s configure our Django site to use it!

Like all other Heroku addons–when we created our database in the previous section, Heroku added a couple environment variables to our application, which specify our newly created database information. Let’s quickly take a look:

$ heroku config
DATABASE_URL                => postgres://xxxxx:xxxxx@ec2-xx-xx-xx-xx.compute-1.amazonaws.com/xxxxx
HEROKU_POSTGRESQL_GREEN_URL => postgres://yyyyy:yyyyy@ec2-yy-yy-yy-yy.compute-1.amazonaws.com:5432/yyyyy
SHARED_DATABASE_URL         => postgres://xxxxx:xxxxx@ec2-xx-xx-xx-xx.compute-1.amazonaws.com/xxxxx

As you can see, we’ve now got 3 environment variables defined. One for each our our databases, and one extra variable, DATABASE_URL. The DATABASE_URL variable is a special variable, in that its only purpose is to provide a standardized ‘default’ database for your application.

At the moment, it looks like our shared database (SHARED_DATABASE_URL) is set as the default database (DATABASE_URL). If we wanted to set our larger (and more performant) database (HEROKU_POSTGRESQL_GREEN_URL) as the default, we could do so by running the pg:promote command:

$ heroku pg:promote HEROKU_POSTGRESQL_GREEN
-----> Promoting HEROKU_POSTGRESQL_GREEN to DATABASE_URL... done

$ heroku config
DATABASE_URL                => postgres://yyyyy:yyyyy@ec2-yy-yy-yy-yy.compute-1.amazonaws.com:5432/yyyyy
HEROKU_POSTGRESQL_GREEN_URL => postgres://yyyyy:yyyyy@ec2-yy-yy-yy-yy.compute-1.amazonaws.com:5432/yyyyy
SHARED_DATABASE_URL         => postgres://xxxxx:xxxxx@ec2-xx-xx-xx-xx.compute-1.amazonaws.com/xxxxx

Now our ronin database is the default!

The next thing we need to do is tell Django to use our new Heroku database. What we’re going to do is tell Django to use our DATABASE_URL database just like we would any other database, with one exception: instead of hard-coding in our database credentials–we’ll simply grab them from the environment!

# settings.py
from os import environ
from urlparse import urlparse

if environ.has_key('DATABASE_URL'):
    url = urlparse(environ['DATABASE_URL'])
    DATABASES['default'] = {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': url.path[1:],
        'USER': url.username,
        'PASSWORD': url.password,
        'HOST': url.hostname,
        'PORT': url.port,
    }

Also: don’t forget to add psycopg2 to your requirements.txt file, since the psycopg2 library is required for Django to interface with PostgreSQL:

# requirements.txt
psycopg2==2.4.5
...

Now that we’ve got Django configured to use whichever database is currently set to DATABASE_URL, we can easily switch our primary database without changing a single line of code!

Destroying a Database

If you’d like to remove a database that you’ve already provisioned, you can do so via the addons:remove command:

$ heroku addons:remove HEROKU_POSTGRESQL_GREEN
 !    WARNING: Potentially Destructive Action
 !    This command will affect the app: deploydjango
 !    To proceed, type "deploydjango" or re-run this command with --confirm deploydjango

> deploydjango
----> Removing HEROKU_POSTGRESQL_GREEN from deploydjango... done, v9 ($200/mo)

Heroku bills for database usage by the second, so as soon as your database has been removed, you’ll stop being charged.

Advanced

This section covers more advanced PostgreSQL topics. Feel free to skip around to whatever sections are of interest.

Creating Read Slaves

Creating read slaves is a popular way to help scale read requests across a cluster of database servers. Luckily, Heroku makes this process extremely simple with their follow feature.

Let’s assume your application currently has a single database, HEROKU_POSTGRESQL_GREEN. In order to create a new read slave database, you can use the --follow option when creating your new slave databse:

$ heroku addons:add heroku-postgresql:ronin --follow HEROKU_POSTGRESQL_GREEN
----> Adding heroku-postgresql:ronin to deploydjango... done, v7 ($200/mo)
      Attached as HEROKU_POSTGRESQL_AMBER
      Follower will become available for read-only queries when up-to-date
      Use `heroku pg:wait` to track status

The newly created database, HEROKU_POSTGRESQL_AMBER, will now automatically stay up-to-date with its master database, HEROKU_POSTGRESQL_GREEN.

Using Heroku’s follow feature, you can create as many read slaves as you like.

Note

While Heroku ensures your new slave database will follow its master, there will be some replication delay. This means that newly written data to the master database may take several seconds to reach your slave database.

Creating a Duplicate Database

In many situations, the ability to create a duplicate database can be extremely useful:

  • You want to run tests against your production data, but you don’t want to expose your production database to your testing code.
  • You want to test database migrations before applying them to production.
  • You want a copy of a production database just incase something bad happens.

Duplicating a database using Heroku is really simple thanks to Heroku’s fork feature. To duplicate (fork) a copy of your production database, HEROKU_POSTGRESQL_GREEN, you can create a new database using the --fork option:

$ heroku addons:add heroku-postgresql:ronin --fork HEROKU_POSTGRESQL_GREEN
----> Adding heroku-postgresql:ronin to deploydjango... done, v7 ($200/mo)
      Attached as HEROKU_POSTGRESQL_AMBER
      Database will become available after it completes forking
      Use `heroku pg:wait` to track status

Once the new database is up and running, you’ll be able to use it like any other master database–you can perform write queries, give it a read slave, whatever you want.

Note

Forked databases do NOT stay up-to-date with the database they were forked from.

Promoting a Slave Database to a Master Database

Let’s say you’re in a situation where you need to make one of your read slaves writable (as a master). Heroku makes this process extremely simple using their unfollow command.

Let’s assume you’ve got a read slave named HEROKU_POSTGRESQL_AMBER. To make it writable, all we do is run the pg:unfollow command, like so:

$ heroku pg:unfollow HEROKU_POSTGRESQL_AMBER
 !    HEROKU_POSTGRESQL_AMBER will become writable and no longer
 !    follow HEROKU_POSTGRESQL_BRONZE. This cannot be undone.

 !    WARNING: Potentially Destructive Action
 !    This command will affect the app: deploydjango
 !    To proceed, type "deploydjango" or re-run this command with --confirm deploydjango

> deploydjango
Unfollowing... done

View Slow Queries

A big part of writing good code is knowing when you do things wrong. Slow database queries, in particular, are probably the greatest cause of poor site performance.

Luckily for us, Heroku’s logging system allows you to easily view a stream of slow query logs directly from your console. Any query that takes longer than 50ms to execute will be dumped into the log output.

To view the streaming logs, you can simply run: heroku logs --tail --ps postgres. If you’d like to just view the most recent logs, you can run the same command without the optional --tail argument: heroku logs --ps postgres.

Backing Up Your Database

Backing up your database is incredibly important, but not always easy to do. To combat the complexity of backing up your database, Heroku created their extremely useful pgbackups addon.

Incredibly, all of the pgbackups addon plans are completely free!

To get started, we’ll use the largest available backup plan: auto-month. This plan will:

  • Automatically backup your DATABASE_URL database every night.
  • Retain 7 daily backups.
  • Retain 5 weekly backups.
  • Retain 10 manual backups.

To get it going, install the addon:

$ heroku addons:add pgbackups:auto-month
----> Adding pgbackups:auto-month to deploydjango... done, v14 (free)
      You can now use "pgbackups" to backup your databases or import an external backup.

Once you’ve got the addon installed, Heroku will start automatically backing up your primary database each day.

To view a list of your available backups, you can run the pgbackups command. Since we just installed the auto-month backup plan, however, we’ve got no existing backups:

$ heroku pgbackups
 !    No backups. Capture one with `heroku pgbackups:capture`.

Let’s fix that right now by forcing a manual backup:

$ heroku pgbackups:capture

SHARED_DATABASE (DATABASE_URL)  ----backup--->  b001

Capturing... done
Storing... done

Now, if we run the pgbackups command again, we should see:

$ heroku pgbackups
ID   | Backup Time         | Size       | Database
-----+---------------------+------------+----------------
b001 | 2012/04/20 23:09.50 | 918.0bytes | SHARED_DATABASE

As time progresses, and we gradually get more backups, they’ll show up in the pgbackups listing.

Note

In the backup example above, we backed up our default database (DATABASE_URL). If you’d like to backup another database, you can do so by specifying its name, for example: heroku pgbackups:capture HEROKU_POSTGRESQL_GREEN.

As a quick note: all manually captured backups display with a b prefix in the heroku pgbackups listing, while all automatically captured backups display with an a prefix.

Downloading Your Backups

So you’ve got a few database backups, and you’d like to download them–no problem! Assuming you’ve got the following backup available:

$ heroku pgbackups
ID   | Backup Time         | Size       | Database
-----+---------------------+------------+----------------
b001 | 2012/04/20 23:09.50 | 918.0bytes | SHARED_DATABASE

You can create a publicly available download URL from Amazon S3 (which is good for 10 minutes) by running:

$ heroku pgbackups:url b001
"https://s3.amazonaws.com/hkpgbackups/app4444444@heroku.com/b001.dump?AWSAccessKeyId=test&Expires=1334989747&Signature=Juy%2FKGQvJ5zPLknUUOFSEAoEGyc%3D"

Which you can then download directly to your computer using wget, curl, or any other standard tool.

Restoring From a Backup

While backing up your database is always a good idea, your backups will do you no good unless you know how to restore from the backup when things go wrong.

Assuming you have the following backup available:

$ heroku pgbackups
ID   | Backup Time         | Size       | Database
-----+---------------------+------------+----------------
b001 | 2012/04/20 23:09.50 | 918.0bytes | SHARED_DATABASE

You can restore your backup (b001 in this case) to any of your available databases using the pgbackups:restore command:

$ heroku pgbackups:restore SHARED_DATABASE b001

SHARED_DATABASE (DATABASE_URL)  <---restore---  b001
                                                SHARED_DATABASE
                                                2012/04/20 23:09.50
                                                918.0bytes

 !    WARNING: Potentially Destructive Action
 !    This command will affect the app: deploydjango
 !    To proceed, type "deploydjango" or re-run this command with --confirm deploydjango

> deploydjango

Retrieving... done
Restoring... done

Note

The restoration process make take a few minutes if your backups are large.

Resources

PostgreSQL is a complex database system, and no matter how much experience you have with it, there will always be a lot left to learn. Below are some resources that I’ve found helpful, and I hope you do too.

If you were to read your way through the resources below from start to finish you should have not only a solid grasp on PostgreSQL itself, but also a firm understanding of all the provided features, how PostgreSQL works behind the scenes, and how to manage and administrate your own PostgreSQL databases in high performance environments.

Books

The following books are absolutely excellent, and highly recommended to anyone who works with PostgreSQL on a daily basis.

Tutorials

The only tutorial worth reading is the official PostgreSQL documentation. While it is verbose and a bit difficult to read through sequentially, it is very well written and is the first place you should look for answers to your questions.

If you’ve never worked with PostgreSQL before, or lack a basic understanding of relational databases, you should probably read through one of the books linked above instead, as you’ll have a much easier time learning the system and getting started.