Primary Key Best Practices

I follow a few rules:

1. Primary keys should be as small as necessary. Prefer a numeric type because numeric types are stored in a much more compact format than character formats. This is because most primary keys will be foreign keys in another table as well as used in multiple indexes. The smaller your key, the smaller the index, the less pages in the cache you will use.

2. Primary keys should never change. Updating a primary key should always be out of the question. This is because it is most likely to be used in multiple indexes and used as a foreign key. Updating a single primary key could cause of ripple effect of changes.

3. Do NOT use “your problem primary key” as your logic model primary key. For example passport number, social security number, or employee contract number as these “primary key” can change for real world situations.

On surrogate vs natural key, I refer to the rules above. If the natural key is small and will never change it can be used as a primary key. If the natural key is large or likely to change I use surrogate keys. If there is no primary key I still make a surrogate key because experience shows you will always add tables to your schema and wish you’d put a primary key in place.

** Definitions **
Natural Key: using various data fields to construct the primary key, using 1 or more columns.

Surrogate Key: Using an identity column, or a home-grown version where only 1 column is used and is not based off of any user-related data. It is important though, in using surrogate keys to have the appriopriate non-clustered (i.e. covering) index or performance will suffer, especially on larger systems.

** Guidelines **
The primary key must uniquely identify each record.
A record’s primary-key value can’t be null.
The primary key-value must exist when the record is created.
The primary key must remain stable—you can’t change the primary-key field(s).
The primary key must be compact and contain the fewest possible attributes.
The primary-key value can’t be changed.

For lookup tables with a low volume of rows, you can use a 3 CHAR code as the primary key as this takes less room than an INT, but the performance difference is negligible. Other than that, I would always use an INT unless you have a reference table that perhaps has a composite primary key made up from foreign keys from associated tables.

** Related Articles **
The Cost of GUIDs as Primary Keys
http://www.informit.com/articles/article.aspx?p=25862&seqNum=7

The great primary-key debate
http://www.techrepublic.com/article/the-great-primary-key-debate/

Bad habits to kick : putting an IDENTITY column on every table
http://sqlblog.com/blogs/aaron_bertrand/archive/2010/02/08/bad-habits-to-kick-putting-an-identity-column-on-every-table.aspx

Views – 2358

Leave a Reply