By ElGrig


2019-03-14 11:22:10 8 Comments

In a production system on SQL Server all ids (mostly PKs) in all tables are generated automatically and I am informed that they are unique globally. I mean no 2 ids are the same in the database, even if tables are different. I want to know how this can be done? If there are multiple ways please list them all. Thanks.

3 comments

@Michael Green 2019-03-14 11:34:52

Back in the day we had an ID table. Single column, single row with an int value. Every transaction first updated that table to get a new value, which was then used wherever it was needed. This was, of course, a great source of concurrency errors.

Later, sequences were introduced. A single sequence used across the whole database would show the behaviour you describe. There's an example in the documentation that illustrates this:

CREATE TABLE Audit.ProcessEvents
(
    EventID int DEFAULT (NEXT VALUE FOR Audit.EventCounter), -- same sequence, different table
    <other columns>
);

CREATE TABLE Audit.ErrorEvents
(
    EventID int DEFAULT (NEXT VALUE FOR Audit.EventCounter), -- same sequence, different tables
    <other columns>
);

I've edited the example to highlight this usage.

An identical outcome could be achieved by generating the globally unique numbers in the application code, before they are passed to the database. Were I to implement this I imagine it would be as a static method of some utility class compiled into the executable (though other implementations would be possible). Say the application needs to write a customer's details to the database. As it is marshaling the customer name, address, telephone number etc. it also generates a new global ID. The ID is passed to the INSERT statement (or stored procedure) as just another parameter value.

Whether the ID values are produced by the application architectural tier or the database tier would depend on the specific design considerations. If the app can scale out coordination between instances becomes problematic. After an application re-start the code must figure out the next value to use. The DB server has these features, and others, written into it already.

What I would definitely not do is have the application call the database just for the next ID, then marshal that with the business data into an INSERT. That's too many round-trips to the database when only one is required.

@David Spillett 2019-03-14 12:15:11

For unique ID values in the same table, I presume you are aware of the commonly used IDENTITY option, usually using a 32-bit value starting from 1 (so for defining a PK this way something like ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY). You can of course use a larger (BIGINT) if the table might need more than 2,147,483,647 rows.

SQL Server has the option of defining your own sequence which can be shared between multiple tables, potentially all of them. See https://docs.microsoft.com/en-us/sql/t-sql/statements/create-sequence-transact-sql for details. You then define each ID column as ID INTEGER DEFAULT NEXT VALUE FOR The_sequence_You_Defined PRIMARY KEY. There are some things to be aware of here though. Unlike with IDENTITY you are not blocked from dropping in any old value (that isn't already present) as the sequence value is applied by the default only if one is not explicitly given, which could be problematical. Also, using a sequence performs a little more slowly and can become a bottleneck as all tables rely on the same object, though both of these issues are only a concern if your database sees a lot of insert activity in a short space of time. NEXT VALUE FOR The_sequence_You_Defined can be used elsewhere too (i.e. SET @someVariable = NEXT VALUE FOR The_sequence_You_Defined;) which means that if you need IDs to be generated elsewhere in your application logic you can have it done this way (in fact I've seen this used even for a single identity, not just sharing a sequence between multiple objects).

A more hacky approach could be to use a BIGINT for each identity column and start each at a different multiple of (for example) 4,000,000,000. This will work in other DBs and avoids the bottleneck issue, but does double the size of your key and could give you a maintenance nightmare if you accidentally define two tables with IDs starting at the same point. You may wish to add check constraints to make sure an identity value defined this way can't overflow into another value's number space, which adds back in some performance concern.

If you don't mind the larger key, then UUIDs are useful and have the added advantage of being unique between databases (all databases, as the name suggests) not just between tables in one database. As with a sequence these are applied with a default constraint, i.e. ID UNIQUEIDENTIFIER NOT NULL PRIMARY KEY DEFAULT NEWID(). These are 128-bit values though, twice the size of BITINT and four times the size of a "standard" 32-bit INTEGER. If you are concerned about the potential for extra fragmentation caused by the randomness of v4 UUIDs you can use NEWSEQUENTIALID() instead of NEWID() which should still be unique enough (the chance of a collision in the lifetime of this galaxy is vanishingly small).

@Tomáš Na'vi Koválik 2019-03-14 11:54:43

First of all, I have to mention that I did not work with SQL Server, so I cannot point out some specific features.

I have two concepts of how this can be done on my mind:

  1. One sequence to rule them all: This concept is easy as it sounds. You have one sequence which is responsible for generating IDs for every row inserted to any table. In my last job, we were using this concept. Implementation depends on a lot of circumstances, so I will let you decide. But one way is to have some stored procedure which will retrieve the next value of the sequence before any insert.
  2. Timestamp: You can somehow incorporate timestamps into your IDs

In the SQL Server world you can refer to this: NEWID() documentation - newid is compliant with RFC4122

@ddunn801 2019-03-14 15:04:00

How would a timestamp prevent duplicate ids? Events can happen simultaneously.

@dvlsg 2019-03-14 23:11:01

Timestamps don't prevent them, but timestamp + other uniqueness is a fairly common tactic for building unique identifiers. NEWSEQUENTIALID() in sql server utilizes this, for example.

@Tomáš Na'vi Koválik 2019-03-15 07:51:04

Yeah... that's what I meant. :-)

Related Questions

Sponsored Content

1 Answered Questions

1 Answered Questions

1 Answered Questions

3 Answered Questions

1 Answered Questions

1 Answered Questions

2 Answered Questions

2 Answered Questions

[SOLVED] Deadlocks From Locks on Same Temporary Tables In Different Processes

1 Answered Questions

Sponsored Content