Why using GUIDs as Primary Keys on a Clustered Index (Table) is not a good idea and some alternative approaches

Here is a great blog post by Kimberly Tripp on why using GUIDs as a PRIMARY KEY on a CLUSTERED index is a bad idea: GUIDs as PRIMARY KEYs and/or the clustering key

Summarized, the 2 big issues with using GUIDs as PK’s is:

1. Fragmentation of the clustered index is costly
2. The data type used is unnecessarily wide. (4x to be exact)

If you *have* to use a GUID as the PK, use NEWSEQUENTUALID() as opposed to NEWID(). It is not infallible in it’s ability to generate *sequential* keys, but it will reduce the fragmentation some. However, there is yet an even better way of designing a table in the case where you HAVE to use GUIDs as a PK. The idea is to place the PK on a NON-CLUSTERED index instead of the default CLUSTERED index. It’s called using a surrogate identity row identifier. This approach is far more performant as each new row is still INSERTed at the end of the data file, since the clustered index (which determines the physical ordering of the table) is always incremental. This greatly reduces fragmentation and queries are more performant as the result.

Here is an example of how you can use a surrogate identity INT for the CLUSTERED index and use the GUID as the PRIMARY KEY on a NONCLUSTERED index and save most of the performance hit:

First let’s create the table:

Then let’s insert a 1000 rows, generating a GUID for each row as the PK:

GUIDasPKonNCindexRows

Next let’s confirm that the indexes are correct:

GUIDasPKonNCindex

Views – 1061