Why using GUIDs as Primary Keys on a Clustered Index (Table) is not a good idea and some alternative approaches

Here is a great blog post by Kimberly Tripp on why using GUIDs as a PRIMARY KEY on a CLUSTERED index is a bad idea: GUIDs as PRIMARY KEYs and/or the clustering key

Summarized, the 2 big issues with using GUIDs as PK’s is:

1. Fragmentation of the clustered index is costly
2. The data type used is unnecessarily wide. (4x to be exact)

If you *have* to use a GUID as the PK, use NEWSEQUENTUALID() as opposed to NEWID(). It is not infallible in it’s ability to generate *sequential* keys, but it will reduce the fragmentation some. However, there is yet an even better way of designing a table in the case where you HAVE to use GUIDs as a PK. The idea is to place the PK on a NON-CLUSTERED index instead of the default CLUSTERED index. It’s called using a surrogate identity row identifier. This approach is far more performant as each new row is still INSERTed at the end of the data file, since the clustered index (which determines the physical ordering of the table) is always incremental. This greatly reduces fragmentation and queries are more performant as the result.

Here is an example of how you can use a surrogate identity INT for the CLUSTERED index and use the GUID as the PRIMARY KEY on a NONCLUSTERED index and save most of the performance hit:

First let’s create the table:

-- Example how to use a surrogate ID for the clustered index and a GUID as a PK on a non-clustered index
CREATE TABLE dbo.myTable
(
  rowID INT IDENTITY (1, 1) UNIQUE CLUSTERED,
  SomeImportantID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,
  col1 INT NOT NULL,
  col2 NVARCHAR(100),
  col3 BIT
)

GO

-- Example how to use a surrogate ID for the clustered index and a GUID as a PK on a non-clustered index

CREATE TABLE dbo.myTable

(

rowID INT IDENTITY (1, 1) UNIQUE CLUSTERED,

SomeImportantID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY,

col1 INT NOT NULL,

col2 NVARCHAR(100),

col3 BIT

)

Then let’s insert a 1000 rows, generating a GUID for each row as the PK:

-- INSERT 1000 rows, generating new a new GUID for each row
INSERT INTO myTable (anotherID, col1, col2, col3)
  VALUES ((SELECT NEWID()), 0, N'Some Data string', 0);

GO 1000;

-- INSERT 1000 rows, generating new a new GUID for each row

INSERT INTO myTable (anotherID, col1, col2, col3)

VALUES ((SELECT NEWID()), 0, N'Some Data string', 0);

GO 1000;

Next let’s confirm that the indexes are correct:

-- Shows the indexes types for the myTable
SELECT OBJECT_NAME(OBJECT_ID) TableObject,
  [name] IndexName,
  [Type_Desc]
FROM sys.indexes
WHERE OBJECT_NAME(OBJECT_ID) = 'myTable';

GO

-- Shows the indexes types for the myTable

SELECT OBJECT_NAME(OBJECT_ID) TableObject,

[name] IndexName,

[Type_Desc]

FROM sys.indexes

WHERE OBJECT_NAME(OBJECT_ID) = 'myTable';

Views – 3087

LincolnBlog.net

…my observations in the world of SQL Server, PowerShell, Linux and PostgreSQL etc…

Why using GUIDs as Primary Keys on a Clustered Index (Table) is not a good idea and some alternative approaches