Short story: storing identities in an integer field instead of a bigint. Now wou...

nathanaldensr · on Nov 12, 2018

Consider using GUIDs/UUIDs instead of integers. GUIDs are friendlier for replication, fake/mock data, and other reasons. The only situations where I'd recommend against using GUIDs is if total row size is a concern (GUIDs are usually larger on disk than integers) or if there is some legacy requirement to use them (e.g., an integration with a third-party).

Another thing I've done is use GUIDs for the primary key, then have an identity/sequence column called something like Sequence, which retains the sequencing capabilities of a regular integer identity column.

otterley · on Nov 12, 2018

The flip side is that GUIDs take up a ton of space and are full of entropy, which makes them very inefficient to work with, are not cache-friendly, and not sortable. Certain database engines (e.g., InnoDB on MySQL) also limit the length of primary keys for performance reasons, which disqualifies GUIDs from being used. With clustering, short keys are better as they reduce IOPS required to fetch data.

See https://www.percona.com/blog/2006/10/03/long-primary-key-for... for an interesting discussion of the topic.

In general, it's unwise to make recommendations about what data types to use, especially for primary keys, without thorough research on their implications.

snowwolf · on Nov 12, 2018

I found https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-ca... has a good rundown of the pros and cons.

spullara · on Nov 12, 2018

Use twitter snowflake ids — they pulled their implementation as it was out of date but Sony has one:

https://github.com/sony/sonyflake

overcast · on Nov 12, 2018

Vast majority of projects will not reach two billion row tables(ID>0), but it should be something considered when planning, and regularly revisited in the future. BIGINT from the start uses twice as much space, if it's not needed, it's just a waste. You have to think about memory, indexing, scans in the future. Hundreds of millions of rows x 4 more bytes, is A LOT.