One of Heroku’s oldest and most popular addons is their PostgreSQL addon. Did you know that Heroku’s hosted PostgreSQL service is “the largest and most reliable Postgres service in the world”?
The Heroku PostgreSQL service allows you to easily create, destroy, resize, backup, restore, and scale your database infrastructure both from the command line (via the Heroku CLI app) as well as from the web (using Heroku’s standalone site).
Heroku’s PostgreSQL service is an excellent choice for you if:
Through the rest of this article, I’ll assume you’ve already got a working Heroku application setup, and that you’re using the Heroku CLI tool.
To get started, let’s bootstrap a new PostgreSQL server instance. In the example below, we’ll create a free (shared) PostgreSQL database:
$ heroku addons:add shared-database
----> Adding shared-database to deploydjango... done, v2 (free)
If we wanted to create a larger (paid) database, we could run the following command:
$ heroku addons:add heroku-postgresql:ronin
----> Adding heroku-postgresql:ronin to deploydjango... done, v3 ($200/mo)
Attached as HEROKU_POSTGRESQL_IVORY
The database should be available in 3-5 minutes
Use `heroku pg:wait` to track status
To verify that our database(s) exist, we can now run the pg:info command:
$ heroku pg:info
=== SHARED_DATABASE (DATABASE_URL)
Data Size (empty)
=== HEROKU_POSTGRESQL_GREEN_URL
Plan Ronin
Status preparing
Data Size -1 B
Tables -1
PG Version ?
Created 2012-04-20 07:26 UTC
Conn Info "host=ec2-yy-yy-yy-yy.compute-1.amazonaws.com
port=5432 dbname=yyyyy
user=yyyyy sslmode=require
password=yyyyy"
Maintenance not required
As you can see from the output above, our Heroku application now has two databases defined, a free (shared) database, and a paid database.
Note
You can find a complete list of available Heroku databases plans on their pricing page.
As you can also see–there are obviously some differences between the shared database we created, and the paid database:
Now that we’ve got a database running, let’s configure our Django site to use it!
Like all other Heroku addons–when we created our database in the previous section, Heroku added a couple environment variables to our application, which specify our newly created database information. Let’s quickly take a look:
$ heroku config
DATABASE_URL => postgres://xxxxx:xxxxx@ec2-xx-xx-xx-xx.compute-1.amazonaws.com/xxxxx
HEROKU_POSTGRESQL_GREEN_URL => postgres://yyyyy:yyyyy@ec2-yy-yy-yy-yy.compute-1.amazonaws.com:5432/yyyyy
SHARED_DATABASE_URL => postgres://xxxxx:xxxxx@ec2-xx-xx-xx-xx.compute-1.amazonaws.com/xxxxx
As you can see, we’ve now got 3 environment variables defined. One for each our our databases, and one extra variable, DATABASE_URL. The DATABASE_URL variable is a special variable, in that its only purpose is to provide a standardized ‘default’ database for your application.
At the moment, it looks like our shared database (SHARED_DATABASE_URL) is set as the default database (DATABASE_URL). If we wanted to set our larger (and more performant) database (HEROKU_POSTGRESQL_GREEN_URL) as the default, we could do so by running the pg:promote command:
$ heroku pg:promote HEROKU_POSTGRESQL_GREEN
-----> Promoting HEROKU_POSTGRESQL_GREEN to DATABASE_URL... done
$ heroku config
DATABASE_URL => postgres://yyyyy:yyyyy@ec2-yy-yy-yy-yy.compute-1.amazonaws.com:5432/yyyyy
HEROKU_POSTGRESQL_GREEN_URL => postgres://yyyyy:yyyyy@ec2-yy-yy-yy-yy.compute-1.amazonaws.com:5432/yyyyy
SHARED_DATABASE_URL => postgres://xxxxx:xxxxx@ec2-xx-xx-xx-xx.compute-1.amazonaws.com/xxxxx
Now our ronin database is the default!
The next thing we need to do is tell Django to use our new Heroku database. What we’re going to do is tell Django to use our DATABASE_URL database just like we would any other database, with one exception: instead of hard-coding in our database credentials–we’ll simply grab them from the environment!
# settings.py
from os import environ
from urlparse import urlparse
if environ.has_key('DATABASE_URL'):
url = urlparse(environ['DATABASE_URL'])
DATABASES['default'] = {
'ENGINE': 'django.db.backends.postgresql_psycopg2',
'NAME': url.path[1:],
'USER': url.username,
'PASSWORD': url.password,
'HOST': url.hostname,
'PORT': url.port,
}
Also: don’t forget to add psycopg2 to your requirements.txt file, since the psycopg2 library is required for Django to interface with PostgreSQL:
# requirements.txt
psycopg2==2.4.5
...
Now that we’ve got Django configured to use whichever database is currently set to DATABASE_URL, we can easily switch our primary database without changing a single line of code!
If you’d like to remove a database that you’ve already provisioned, you can do so via the addons:remove command:
$ heroku addons:remove HEROKU_POSTGRESQL_GREEN
! WARNING: Potentially Destructive Action
! This command will affect the app: deploydjango
! To proceed, type "deploydjango" or re-run this command with --confirm deploydjango
> deploydjango
----> Removing HEROKU_POSTGRESQL_GREEN from deploydjango... done, v9 ($200/mo)
Heroku bills for database usage by the second, so as soon as your database has been removed, you’ll stop being charged.
This section covers more advanced PostgreSQL topics. Feel free to skip around to whatever sections are of interest.
Creating read slaves is a popular way to help scale read requests across a cluster of database servers. Luckily, Heroku makes this process extremely simple with their follow feature.
Let’s assume your application currently has a single database, HEROKU_POSTGRESQL_GREEN. In order to create a new read slave database, you can use the --follow option when creating your new slave databse:
$ heroku addons:add heroku-postgresql:ronin --follow HEROKU_POSTGRESQL_GREEN
----> Adding heroku-postgresql:ronin to deploydjango... done, v7 ($200/mo)
Attached as HEROKU_POSTGRESQL_AMBER
Follower will become available for read-only queries when up-to-date
Use `heroku pg:wait` to track status
The newly created database, HEROKU_POSTGRESQL_AMBER, will now automatically stay up-to-date with its master database, HEROKU_POSTGRESQL_GREEN.
Using Heroku’s follow feature, you can create as many read slaves as you like.
Note
While Heroku ensures your new slave database will follow its master, there will be some replication delay. This means that newly written data to the master database may take several seconds to reach your slave database.
In many situations, the ability to create a duplicate database can be extremely useful:
Duplicating a database using Heroku is really simple thanks to Heroku’s fork feature. To duplicate (fork) a copy of your production database, HEROKU_POSTGRESQL_GREEN, you can create a new database using the --fork option:
$ heroku addons:add heroku-postgresql:ronin --fork HEROKU_POSTGRESQL_GREEN
----> Adding heroku-postgresql:ronin to deploydjango... done, v7 ($200/mo)
Attached as HEROKU_POSTGRESQL_AMBER
Database will become available after it completes forking
Use `heroku pg:wait` to track status
Once the new database is up and running, you’ll be able to use it like any other master database–you can perform write queries, give it a read slave, whatever you want.
Note
Forked databases do NOT stay up-to-date with the database they were forked from.
Let’s say you’re in a situation where you need to make one of your read slaves writable (as a master). Heroku makes this process extremely simple using their unfollow command.
Let’s assume you’ve got a read slave named HEROKU_POSTGRESQL_AMBER. To make it writable, all we do is run the pg:unfollow command, like so:
$ heroku pg:unfollow HEROKU_POSTGRESQL_AMBER
! HEROKU_POSTGRESQL_AMBER will become writable and no longer
! follow HEROKU_POSTGRESQL_BRONZE. This cannot be undone.
! WARNING: Potentially Destructive Action
! This command will affect the app: deploydjango
! To proceed, type "deploydjango" or re-run this command with --confirm deploydjango
> deploydjango
Unfollowing... done
A big part of writing good code is knowing when you do things wrong. Slow database queries, in particular, are probably the greatest cause of poor site performance.
Luckily for us, Heroku’s logging system allows you to easily view a stream of slow query logs directly from your console. Any query that takes longer than 50ms to execute will be dumped into the log output.
To view the streaming logs, you can simply run: heroku logs --tail --ps postgres. If you’d like to just view the most recent logs, you can run the same command without the optional --tail argument: heroku logs --ps postgres.
Backing up your database is incredibly important, but not always easy to do. To combat the complexity of backing up your database, Heroku created their extremely useful pgbackups addon.
Incredibly, all of the pgbackups addon plans are completely free!
To get started, we’ll use the largest available backup plan: auto-month. This plan will:
To get it going, install the addon:
$ heroku addons:add pgbackups:auto-month
----> Adding pgbackups:auto-month to deploydjango... done, v14 (free)
You can now use "pgbackups" to backup your databases or import an external backup.
Once you’ve got the addon installed, Heroku will start automatically backing up your primary database each day.
To view a list of your available backups, you can run the pgbackups command. Since we just installed the auto-month backup plan, however, we’ve got no existing backups:
$ heroku pgbackups
! No backups. Capture one with `heroku pgbackups:capture`.
Let’s fix that right now by forcing a manual backup:
$ heroku pgbackups:capture
SHARED_DATABASE (DATABASE_URL) ----backup---> b001
Capturing... done
Storing... done
Now, if we run the pgbackups command again, we should see:
$ heroku pgbackups
ID | Backup Time | Size | Database
-----+---------------------+------------+----------------
b001 | 2012/04/20 23:09.50 | 918.0bytes | SHARED_DATABASE
As time progresses, and we gradually get more backups, they’ll show up in the pgbackups listing.
Note
In the backup example above, we backed up our default database (DATABASE_URL). If you’d like to backup another database, you can do so by specifying its name, for example: heroku pgbackups:capture HEROKU_POSTGRESQL_GREEN.
As a quick note: all manually captured backups display with a b prefix in the heroku pgbackups listing, while all automatically captured backups display with an a prefix.
So you’ve got a few database backups, and you’d like to download them–no problem! Assuming you’ve got the following backup available:
$ heroku pgbackups
ID | Backup Time | Size | Database
-----+---------------------+------------+----------------
b001 | 2012/04/20 23:09.50 | 918.0bytes | SHARED_DATABASE
You can create a publicly available download URL from Amazon S3 (which is good for 10 minutes) by running:
$ heroku pgbackups:url b001
"https://s3.amazonaws.com/hkpgbackups/app4444444@heroku.com/b001.dump?AWSAccessKeyId=test&Expires=1334989747&Signature=Juy%2FKGQvJ5zPLknUUOFSEAoEGyc%3D"
Which you can then download directly to your computer using wget, curl, or any other standard tool.
While backing up your database is always a good idea, your backups will do you no good unless you know how to restore from the backup when things go wrong.
Assuming you have the following backup available:
$ heroku pgbackups
ID | Backup Time | Size | Database
-----+---------------------+------------+----------------
b001 | 2012/04/20 23:09.50 | 918.0bytes | SHARED_DATABASE
You can restore your backup (b001 in this case) to any of your available databases using the pgbackups:restore command:
$ heroku pgbackups:restore SHARED_DATABASE b001
SHARED_DATABASE (DATABASE_URL) <---restore--- b001
SHARED_DATABASE
2012/04/20 23:09.50
918.0bytes
! WARNING: Potentially Destructive Action
! This command will affect the app: deploydjango
! To proceed, type "deploydjango" or re-run this command with --confirm deploydjango
> deploydjango
Retrieving... done
Restoring... done
Note
The restoration process make take a few minutes if your backups are large.
PostgreSQL is a complex database system, and no matter how much experience you have with it, there will always be a lot left to learn. Below are some resources that I’ve found helpful, and I hope you do too.
If you were to read your way through the resources below from start to finish you should have not only a solid grasp on PostgreSQL itself, but also a firm understanding of all the provided features, how PostgreSQL works behind the scenes, and how to manage and administrate your own PostgreSQL databases in high performance environments.
The following books are absolutely excellent, and highly recommended to anyone who works with PostgreSQL on a daily basis.
The only tutorial worth reading is the official PostgreSQL documentation. While it is verbose and a bit difficult to read through sequentially, it is very well written and is the first place you should look for answers to your questions.
If you’ve never worked with PostgreSQL before, or lack a basic understanding of relational databases, you should probably read through one of the books linked above instead, as you’ll have a much easier time learning the system and getting started.