Rails models with teeth and database constraints

by ewout

One of the things I do not like about ruby on rails is the arrogance it came with. Creating a blog application from scratch in 15 minutes was cool at the time, but that does not mean the author of the framework holds all truth. As a computer scientist I appreciate the ruby language and the elegant rails API, and writing tests for your code is a best practice that rails probably introduced to a lot of people. What I do not understand is why rails rejected a best practice that is old and proven: database integrity.

There has been quite some discussion in the rails community about enforcing integrity at the database level. To me, the discussion is simple: anything that helps me find or prevent bugs is gold. And even if your code is free of bugs, a ruby on rails database in production will become inconsistent given enough time. This is not rocket science: when tens or even hundreds of application servers use the database concurrently without knowing about each other, something will go wrong sooner or later. At that point I’d rather have an ActiveRecord::StatementInvalid exception raised than let the problem silently corrupt our customers’ data.

The rails argument against enforcing database integrity is that business logic should be DRY and live in the ruby class only, not in both the ruby class and the database. Yet with migrations, rails violates its own fundamentals:

  • attributes are defined outside of the ruby class
  • associations are defined twice, once with belongs_to & co macros and once with a foreign key in the database

The attributes of a model and their data types are a core part of the model logic. One can also see these as constraints: you cannot store a string in a date field. Some constraints make sense at the database level, some make more sense at the application level. This article explains my as-dry-and-consistent-as-possible use of database constraints.

:null => false

One of the simplest constraints at the database level is requiring a value to be present. In a migration, this is specified with the :null => false option. One might think it replaces validates_presence_of, but it does not: NOT NULL rejects only NULL, not blank strings, and a mysql database that is not running in strict mode may silently coerce an invalid value instead of rejecting it. In general we want the user-facing validation in ActiveRecord, because then we get the nice validation errors. However, there are cases where the database constraint is more appropriate.
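A minimal sketch of how the two layers could work together (the invoices table and its number column are made up for illustration):

```ruby
# Hypothetical migration: a NOT NULL constraint at the database level.
class CreateInvoices < ActiveRecord::Migration
  def self.up
    create_table :invoices do |t|
      t.string :number, :null => false  # the database refuses NULL here
      t.timestamps
    end
  end

  def self.down
    drop_table :invoices
  end
end

# The user-facing validation still lives in the model, so the user
# gets a friendly validation error instead of a database exception.
class Invoice < ActiveRecord::Base
  validates_presence_of :number
end
```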

Required (parent) associations

Suppose we are building an invoicing application and we want to make sure a line item cannot exist without an invoice. LineItem.validates_presence_of :invoice will not do the trick: when saving a new invoice with a few new line items in it, the validation of these new line items fails because the invoice is not present yet. Its invoice_id cannot be set because the invoice has not been saved and has no id.

Solution: t.belongs_to :invoice, :null => false
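In context, the solution could look like this; the extra columns are assumptions for the sake of a complete migration:

```ruby
# Sketch of a line_items migration: the NOT NULL constraint on
# invoice_id guarantees a line item never exists without an invoice,
# even when the ActiveRecord validation cannot run.
class CreateLineItems < ActiveRecord::Migration
  def self.up
    create_table :line_items do |t|
      t.belongs_to :invoice, :null => false  # required parent association
      t.string  :description
      t.decimal :amount, :precision => 10, :scale => 2
    end
  end

  def self.down
    drop_table :line_items
  end
end
```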

Boolean attributes

To stick with the invoice application, an invoice could have a boolean attribute “paid”. The boolean attribute can be 0, 1 or NULL. If NULL means unspecified that is fine, but in this case we want the invoice to be either paid or not paid. When creating an invoice it will generally not be paid, so we could interpret NULL as not paid. However, Invoice.all(:conditions => {:paid => false}) will not return the invoices that have a NULL value for paid, which can lead to subtle bugs.
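The subtle bug can be illustrated without a database: SQL’s NULL behaves much like ruby’s nil in that a condition on :paid => false matches neither true nor NULL. A plain-ruby sketch of the same three-valued logic:

```ruby
# Plain ruby, no database: three invoices, one with an unspecified
# (NULL-like) paid flag.
invoices = [
  { :id => 1, :paid => true  },
  { :id => 2, :paid => false },
  { :id => 3, :paid => nil   },  # NULL in the database
]

# The equivalent of Invoice.all(:conditions => {:paid => false}):
# NULL is neither true nor false, so invoice 3 is silently missed.
unpaid = invoices.select { |i| i[:paid] == false }

unpaid.map { |i| i[:id] }  # => [2], invoice 3 is not included
```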

Solution: t.boolean :paid, :null => false, :default => false

Uniqueness

validates_uniqueness_of is not safe under concurrent updates. This is well documented in the rails API, where one of the suggestions is to use a unique index. No need to be redundant here. There is, however, a hidden use case for unique indexes: one-to-one associations.
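A sketch of the belt-and-braces approach (model and column names are assumptions): the validation gives friendly errors, and the unique index catches the race the validation cannot see.

```ruby
# In a migration: the database-level guarantee.
add_index :users, :email, :unique => true

# In the model: the user-facing check, which is racy on its own.
class User < ActiveRecord::Base
  validates_uniqueness_of :email
end

# When two requests race past the validation, the insert that loses
# raises ActiveRecord::StatementInvalid instead of creating a duplicate.
begin
  user.save!
rescue ActiveRecord::StatementInvalid
  # handle the duplicate, e.g. re-render the form
end
```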

has_one

Suppose we have a Person model to store personal information and contact details, and a User model for authenticating users of the application. As good citizens of DRY-land, we use the Person model to keep the personal information about our users, where User.belongs_to :person and Person.has_one :user. Nothing stops the application from creating two user accounts for the same person. One-to-one associations should be backed by unique indexes.

Solution: t.index :person_id, :unique => true
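Put together, the schema and models could look like this sketch (the login column is invented):

```ruby
# Migration: the unique index turns has_one from a convention
# into a guarantee.
create_table :users do |t|
  t.belongs_to :person, :null => false  # every user needs a person
  t.string :login
end
add_index :users, :person_id, :unique => true  # at most one user per person

class User < ActiveRecord::Base
  belongs_to :person
end

class Person < ActiveRecord::Base
  has_one :user
end
```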

Referential integrity

While the data type of each attribute is defined in the migrations, foreign keys just get the integer data type. Rails made it look prettier: t.belongs_to :person, but in reality this is just syntactic sugar for t.integer :person_id. There is nothing that stops this integer from pointing to a row (object) that does not exist. Referential integrity makes the database aware of the associations and is just a natural extension of the data types. (As a bonus, you get these nice arrows next to a foreign key in sequel pro, which really ease table navigation while debugging.)

There are a number of rails plugins that help define foreign key constraints in migrations. Foreigner and migration helpers both require you to state each foreign key constraint explicitly. Sexy migrations looks nice but is a bit dated, and I could not check out the repository. We use a patched version of the redhill foreign key migrations plugin. Its advantage is that foreign key constraints are added automatically when you follow the rails naming conventions. However, we encountered some issues when implementing foreign key constraints in our application.
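To make the difference concrete, here is how the explicit style looks with Foreigner (the redhill plugin would instead infer the same constraint from the invoice_id naming convention):

```ruby
# In a migration, with the Foreigner plugin installed:
add_foreign_key :line_items, :invoices

# which is roughly equivalent to the SQL:
#   ALTER TABLE line_items
#     ADD CONSTRAINT line_items_invoice_id_fk
#     FOREIGN KEY (invoice_id) REFERENCES invoices (id)
```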

:dependent => :destroy

Foreign key constraints will by default prevent a record from being removed when there are references to it. Deleting a record will then raise an error. It is therefore important to specify what should happen with the associated records when a record is destroyed. This can be done with the :dependent option for associations in ActiveRecord or in the database itself. We prefer to keep this logic in rails and use the database as a safety net: when rails fails to destroy a child record when destroying a parent record, the referential integrity will be violated and the database will complain.
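A sketch of that division of labor, using the invoicing example:

```ruby
# The cascade logic lives in rails, where it runs callbacks and
# stays portable across databases.
class Invoice < ActiveRecord::Base
  has_many :line_items, :dependent => :destroy
end

# The foreign key constraint in the database is the safety net: if any
# code path destroys an invoice without its line items, the database
# raises ActiveRecord::StatementInvalid instead of leaving orphaned rows.
```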

parent_id

Suppose we are storing a tree of categories in a table. When removing a number of categories in a single query, mysql determines the order in which they are removed. This may violate a foreign key constraint when a parent is deleted before one of its children. Postgresql can defer foreign key checks to the end of a transaction (deferrable constraints), which prevents this kind of problem. In mysql, the solution is to nullify the child foreign keys when deleting a parent.

Solution: t.belongs_to :parent, :on_delete => :set_null
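In a self-referencing table, that could look like the sketch below. The :on_delete option is the syntax of our patched redhill plugin, so treat it as an assumption if you use another plugin; the association options are standard ActiveRecord.

```ruby
# Migration: parent_id points back at categories.id; when a parent row
# is deleted, the children's parent_id is set to NULL instead of
# blocking the delete.
create_table :categories do |t|
  t.string :name, :null => false
  t.belongs_to :parent, :on_delete => :set_null
end

class Category < ActiveRecord::Base
  belongs_to :parent,   :class_name => 'Category'
  has_many   :children, :class_name => 'Category', :foreign_key => 'parent_id'
end
```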

Performance

There is overhead in checking the referential integrity of a database, but we did not notice any performance decline in our application. One thing to watch, though: mysql automatically creates an index for every foreign key. We use userstamp, which adds two foreign keys, creator_id and updater_id, to every record. Since we never delete users from our system (we merely deactivate them), it seemed overkill to add two indexes to every table just for the userstamps.

Conclusion

Data tends to outlive its applications. A tight data model is a good foundation for an application and can save you a lot of trouble when migrating the data to a different system (years later). Database constraints can make your models even tighter, and enforce integrity rules that are hard to enforce in a multi-process application environment. Don’t be afraid, use them!
