I see heated discussion all over the internet about whether a GUID is a poor choice for a primary key. Usually, these conversations all revolve around SQL Server. A lot of people say that the size difference is an issue. (GUID = 16bytes, BIGINT = 8bytes, INT = 4bytes). Is this due to physical HDD space, or page space (fragmentation?) For sake of conversation, lets say we have a 'PEOPLE' table:
How bad of a design is that over using a BIGINT or an INT? We have a lot of new development going on in a legacy application, and we are tossing GUIDs on every table. We do have to take into account replication (which is why we went with GUIDs as opposed to INTs and 'Hold Keys', as we call them) Thanks! |
I would generally prefer GLOBAL AUTOINCREMENT over GUIDs as the latter should perform much worse - they are stored as BINARY(16) which won't perform as fast as 32 or 64 bit numbers on common platforms. So, I personally agree with the reasoning in the docs. I would add that for many applications, even in a replication setup, INT may still be enough for a PK type if there are "only" some hundred remotes and you expect, say only a few ten thousand rows per table. That will do in many applications where data is added manually. E.g., a few years ago, we have ported a client-server app with one central database into a SQL Remote setup by simply redefining the PK type (int) to a DEFAULT GLOBAL AUTOINCREMENT with a 200000 partition size. That gives use the opportunity to support some thousand remotes - much more than we have now. And the apps (and all the FKs) hadn't to be changed at all as they still work with INT fields. Needless to say, I was really fond of that easy migration:) Of course YMMV. In addition to Breck's response, I even prefer comparing INT/BIGINT values starting with 1 to those starting with higher values. At the moment, I have to check the migration of a system with miminal 8digit PK values, and that is really harder than with smaller values. But it's definitely much easier than comparing GUIDs. How does the partition size work if you have two tables defined with DEFAULT GLOBAL AUTOINCREMENT? Does the database manage two different ranges, one for each table or does it use the same partition size(range) across all tables? @Calvin: You can define the partition size for each column independently, s. the DEFAULT GLOBAL AUTOINCREMENT [(partition-size)] syntax. However, as with DEFAULT AUTOINCREMENT, different columns don't get distinct values, i.e. each DEFAULT AUTOINCREMENT column will start with 1, 2, 3. With GLOBAL AUTOINCREMENT, they would start with (<global_database_id> * partition_size) + 1. So GLOBAL AUTOINCREMENT is aimed to generate distinct values for different databases, but not for different tables/columns - as that is not necessary for a RDBMS. However Forget my last "However"... This is fantastic! I can't believe we never researched this stuff! Lets say I have a parent/child table relationship. If the user inserts the parent, then multiple children at once and hits Save, how would I be able to assign the parent id to the children using this? Is there a method to obtain the next number BEFORE the data actually hits the database? Thanks alot for your help! Nevermind, figured it out - get_identity('table') |
GUIDs may be attractive if you don't want to use DEFAULT GLOBAL AUTOINCREMENT for avoiding primary key collisions in a replicating/synchronizing setup. IMO the big problem with GUIDs will be administrative... every time anyone has to track down a problem involving data they'll be faced with this giant un-readable un-typable and generally incomprehensible string of crap. Or dozens, hundreds, thousands of strings of crap. Think Windows registry. Pity the maintenance programmer. GLOBAL AUTOINCREMENT values also have their readability problems, but 100000001234 versus 200000000789 is not nearly as bad as {1ec17912-d2b4-48f9-9a47-5da6bbe695fb} versus {8dcb5d9f-5a52-43ef-82c3-2254e4d0c163}. And that's if you're lucky enough to see the dashes, which isn't always true when you're in a hurry. GLOBAL AUTOINCREMENT values are also somewhat more difficult to implement than GUIDs which, as you say, can be tossed around like road salt. The phrase "tossing GUIDs on every table" does make me somewhat nervous. But then again, I always get nervous when I hear "we are unconditionally [making some dramatic change] to every single [thing] in our application". Gives me the screaming abdabs :) Everything has unintended consequences. My advice, get promoted to management before the GUIDs go into production :) @Breck: By the way, are GUID values always distinct when created on the same computer (i.e. is it theoretically impossible to generate the same value twice or is this only "very improbable")? Comment Text Removed
Great answer, thanks! 2
@Volker: Pick the answer YOU like from http://stackoverflow.com/questions/39771/is-a-guid-unique-100-of-the-time @Breck: Thanks for the pointer - so the answer may be dependent on one own's sense of paranoia:) 1
Here is a trick I use all the time when I have to type out those big INt values... remember that 200000000789 = 2e11+789 and later is WAY easier to type correctly the first time. SELECT * FROM MYTABLE WHERE RECORD_ID = 2E11+789 Although a big int is 2e17, not 2e11 (I just copied one of breck's numbers) |
I have written about this subject in the past: here and here. There are no right answers with this discussion, only tradeoffs. In my opinion, the convenience of GUIDs are trumped by usability considerations. GUIDs as a key may make sense if the table has another candidate key, but all too often this other candidate key is itself a surrogate identifier and there doesn't, at least to me, seem to be much point in having a table with two surrogate identifiers. One will do. |