The Problem With UUIDs
Summary
TLDRThe video script discusses the complexities and considerations of using Universally Unique Identifiers (UUIDs) as primary keys in MySQL databases. It highlights that while UUIDs ensure uniqueness across systems, they can negatively impact database performance due to increased storage requirements and the challenges of maintaining a balanced B+ tree structure with random UUIDs. The script also explores various UUID versions, with a focus on time-based versions like V1 and V6, and contrasts them with random variants like V4. It further delves into alternative ID types, such as Snowflake IDs and Nano IDs, which offer more structured approaches to generating unique identifiers. The video emphasizes the importance of choosing the right ID system for database architecture, considering factors like security, performance, and storage efficiency.
Takeaways
- 📈 **Sponsor Acknowledgment**: The video is sponsored by Planet Scale, which also created a blog post that serves as a starting point for the discussion on UUIDs.
- 🔄 **Multiple Attempts**: This is the third time the video has been filmed over two years, indicating the complexity and importance of accurately covering the topic.
- 🤔 **Performance Concerns**: Using UUIDs as primary keys in MySQL can hurt database performance due to the way B+ tree indexes need to be updated and balanced with random values.
- 📅 **UUID Versions**: There are five official and three proposed versions of UUIDs, each with different properties and use cases, highlighting the evolution and issues in the standard.
- 🚫 **UUID v2 Issues**: UUID version 2 is rarely used because replacing the low time segment with a POSIX user ID increases the chances of collisions.
- 🔢 **Randomness in UUID v4**: Version 4 UUIDs are almost entirely random, which can lead to storage and performance issues due to the randomness affecting the B+ tree balance.
- 🔬 **Technical Details**: The video dives into the technical aspects of how UUIDs are structured and the implications of using different versions as primary keys.
- 💾 **Storage Utilization**: Storing UUIDs requires significantly more storage space compared to auto-incrementing integers, which can impact database storage requirements.
- 🚀 **Best Practices**: The video suggests best practices for using UUIDs, such as using the binary data type and considering ordered UUID variants to mitigate performance and storage issues.
- 🛡️ **Security Considerations**: There's a discussion on the security implications of using sequential keys versus random UUIDs, including the risk of guessable IDs leading to unauthorized access.
- 🌟 **Alternative ID Types**: The video mentions alternative ID types like Snowflake IDs, ULIDs, and Nano IDs, which can be considered depending on the specific requirements and constraints of a system.
Q & A
What is the main topic of the video?
-The main topic of the video is the discussion of using UUIDs (Universally Unique Identifiers) as primary keys in MySQL databases and the associated performance implications.
Why did the video creator film the video multiple times?
-The video creator filmed the video multiple times because they wanted to ensure they got all the details about UUIDs correct and felt that previous attempts did not meet their standards for accuracy and comprehensiveness.
What is the significance of October 10th, 1568, in the context of UUIDs?
-October 10th, 1568, is significant because it marks the start of the Gregorian calendar, which is used as a reference point for the embedded timestamp within a UUID, specifically in version 1.
What are the potential issues with using UUIDs as primary keys in a MySQL database?
-Using UUIDs as primary keys can lead to performance issues due to the randomness of the values, which can cause problems with index rebalancing in B+ trees, increased storage utilization, and potential page splitting that is less efficient than with sequential keys.
What is the difference between UUID version 1 and version 6?
-UUID version 6 is nearly identical to version 1, but the key difference is that the bits used to capture the timestamps are flipped, meaning the most significant portions of the timestamp are stored first, which can make the UUIDs more sortable.
Why might someone choose to use a different identifier format other than UUIDs in a distributed system?
-Alternative identifier formats like Snowflake IDs, UL IDs, or Nano IDs might be chosen over UUIDs due to their ability to provide a more efficient and sortable unique identifier that can mitigate some of the performance and storage issues associated with UUIDs.
What is the recommendation for storing UUIDs to reduce storage requirements?
-Storing UUIDs in their native binary format as a binary(16) column can reduce the storage requirement down to 16 bytes, which is more efficient than storing them as a 36-character string.
null
-null
Why might the use of UUID version 4 lead to excessive storage usage in database indexing?
-UUID version 4 is randomly generated, which means that the values are not sequential. This randomness can lead to inefficient page utilization, with pages being only around 50% full, thus using significantly more storage space for the index.
What is the primary advantage of using time-based UUIDs like version 6 or 7?
-Time-based UUIDs like version 6 or 7 can guarantee uniqueness while keeping the generated values as close to sequential as possible, which can help avoid some of the page splitting issues and improve performance in database operations.
What is the concern regarding the use of sequentially ordered UUIDs from a security perspective?
-The concern is that sequentially ordered UUIDs can be guessed, which might lead to security vulnerabilities such as unauthorized password resets or unauthorized access if an attacker can predict the ID sequence.
What is the significance of the B+ tree data structure in the context of MySQL databases?
-The B+ tree data structure is significant in MySQL databases because it is used to create indexes that allow for efficient querying of data. The structure keeps data organized in a way that enables quick searches, insertions, and updates, which is crucial for database performance.
What is the role of Planet Scale in this video?
-Planet Scale is a sponsor of the video and has created a blog post that the video creator uses as a starting point for the discussion on UUIDs. The video creator emphasizes that Planet Scale has not influenced the content of the video but provided a resource that helped shape the discussion.
Outlines
🎬 Introduction to UUIDs and Sponsorship Acknowledgement
The video's introduction discusses the topic of UUIDs (Universally Unique Identifiers) and the challenges in creating a video about them. The speaker mentions filming the video multiple times due to the complexity of the subject. The video is sponsored by Planet Scale, who also authored a blog post on the problems with using a UID primary key in SQL, which serves as the basis for the discussion. The speaker clarifies that Planet Scale has not influenced the content and that their sponsorship is appreciated for its educational value.
🔑 UUIDs as Primary Keys and Their Impact on Database Performance
This paragraph delves into the potential issues with using UUIDs as primary keys in MySQL databases. It explains how UUIDs are designed for uniqueness across systems and the problems that can arise when they are used incorrectly, such as performance degradation. The speaker shares their preference for ordered IDs, like time-based UUIDs, for their usefulness in tracking creation dates. The paragraph also touches on the various versions of UUIDs, highlighting the differences and issues with each, including the proposed versions 6, 7, and 8.
📈 UUIDs vs. Auto Incrementing Integers in MySQL
The speaker contrasts UUIDs with auto incrementing integers in terms of their use as primary keys in MySQL. They discuss the performance implications of inserting new records and how MySQL's B+ tree indexing structure is affected by the randomness of UUIDs, which can lead to longer rebalancing times and impact user experience. The storage utilization is also compared, noting that UUIDs require significantly more storage space than integers, which can have a substantial impact on database size and performance.
🚀 Best Practices for Using UUIDs as Primary Keys
The paragraph outlines best practices for using UUIDs as primary keys to minimize negative side effects. It suggests using the binary data type to reduce storage requirements and considering ordered UUID variants to maintain a more sequential order, thus avoiding page splitting issues. The speaker also recommends considering other identifier types like Snowflake IDs, UL IDs, and Nano IDs, which can offer similar benefits with fewer drawbacks.
🛠️ UUIDs, Security, and Alternatives in Database Design
The speaker discusses the security implications of using sequential IDs versus random UUIDs, mentioning that guessable IDs can lead to unauthorized access but also stressing that system security should not rely solely on ID complexity. They highlight the importance of choosing the right ID type for database design, considering factors like performance, storage, and uniqueness. The paragraph concludes with a recommendation to explore alternatives to UUIDs, such as cuid V2, and a mention of a useful tool for assessing ID collision probabilities.
📝 Conclusion and Final Thoughts on UUIDs in Databases
In the concluding paragraph, the speaker summarizes the key points discussed in the video about the trade-offs of using UUIDs as primary keys in MySQL. They emphasize the availability of multiple UUID versions and alternative ID types that can address these trade-offs. The speaker expresses relief at having completed the video and thanks Planet Scale for their sponsorship, which provided valuable educational content on the topic.
Mindmap
Keywords
💡UUID (Universally Unique Identifier)
💡Planet Scale
💡Primary Key
💡B+ Tree
💡Storage Utilization
💡CUID (Content-Addressable Unique Identifier)
💡Versioning of UUIDs
💡Auto-Incrementing Integer
💡Indexing
💡Snowflake ID
💡Nano ID
Highlights
The video discusses the complexities and considerations of using UUIDs (Universally Unique Identifiers) as primary keys in MySQL databases.
Sponsor Planet Scale has created a blog post that the video uses as a starting point for the discussion on UUIDs.
The video corrects the misconception that UUIDs are absolutely unique, noting that collisions can occur.
Version 8 of UUIDs allows vendor-specific implementations, which the video humorously refers to as 'barely a standard'.
The video explains the structure of different versions of UUIDs, such as V1 being time-based and V4 being random.
Using UUIDs as primary keys can hurt database performance due to the way MySQL B+ tree indexes need to be updated and balanced.
Storing UUIDs requires significantly more space compared to traditional 32-bit integers, impacting storage efficiency.
The video debates the pros and cons of using ordered UUID variants like version 6 or 7 for better performance.
The video discusses the security implications of using sequentially ordered IDs, such as the potential for unauthorized access.
Alternative ID types like Snowflake IDs, UL IDs, and Nano IDs are presented as options to UUIDs.
The video provides a detailed comparison between auto-incrementing integers and UUIDs in terms of performance and storage.
The use of UUIDs can lead to inefficient storage utilization due to the randomness of the IDs.
The video highlights the importance of choosing the right type of ID for database architecture to balance uniqueness, performance, and storage.
The video mentions the use of MySQL's built-in UUID functions and their potential benefits.
The video concludes by emphasizing the importance of making an informed decision when choosing an ID system for database design.
The video humorously refers to the complexity of UUID versions as a 'Kingdom Hearts playthrough chart', indicating the difficulty in keeping track of them.
The video provides a critical perspective on the use of UUIDs, suggesting that while they offer uniqueness, they may not always be the best choice for primary keys in every scenario.
Transcripts
this video is sponsored by Planet scale
I've wanted to do a video about uyu IDs
for a while so much so that this is my
third time filming it over 2 years last
time I filmed it was on my couch in my
old living room and the time before that
was a random stream filming like kind of
what I'm doing here neither of these
were good enough for me to release
because there's just so many details
about uyu IDs I wanted to make sure I
got right and even though one of those
videos was edited I never ended up
posting it and I wanted to take the time
to get it exactly right thankfully I
don't have to because one of my sponsors
Planet scale actually went and did it
for me they made a phenomenal blog post
the problem with using a uid primary key
in my SQL that I want to use as a
starting point to have this discussion
Planet skill does sponsor the channel
and they are sponsoring this video but
they have given me no input whatsoever I
don't even think they know what I'm
covering here they just wrote this blog
post knowing I wanted it as something to
cover and now I get to cover it so yeah
I know Planet scale is a bit of a spicy
topic right now but they're still a
phenomenal resource for a lot of
Education stuff and this is a good blog
post to start with so let's go in the
problem with using a uid primary key in
MySQL understand the different versions
of uu IDs and why using them as a
primary key in MySQL can hurt DB
performance interesting is going to be a
performance thing I'm even more Curious
now I'll lay out my biases ahead of time
I think having the ability to order your
IDs is particularly nice when you can
and there's a lot of cases where it's
nice to know when something was created
and using the time as a way to make your
unique values more uniques Not a Bad
Thing cuid V2 in particular is something
that I'm a bit of a shill of that I've
been pushing pretty hard for a while now
and I'll probably be coming back to cuid
2 during this because it is my goto my
preferred standard for IDs although I
have rolled my own both with my own
crazy requirements as well as following
along with the things that planet scale
recommends but I'm curious what they're
recommending here universally unique
identifiers also known as uu IDs are
designed to allow developers to generate
unique IDs in a way that guarantees
uniqueness without knowledge of other
systems this is the key piece here where
uu IDs are meant to be so unique that if
you generate them in multiple places at
the same time the likelihood of a
collision is effectively zero that
across the whole universe you'll never
generate the same ID twice there have
now been multiple times that uu IDs
collided so this is just sadly not true
when you also add in the fact that
random number generators aren't always
particularly random and those are being
used to then generate these uu IDs
things can get messy fast yeah sadly the
universal promise of uu IDs for many
reasons has not been true for a bit
thankfully these things are changing I
already see people in chat dropping that
the version 8 fixes some of the issues I
was talking about around time I'll open
this up so we have it to reference in a
bit but as someone else already pointed
out as well if you need eight versions
of the standard to make it right it's
probably the standard that's awful to be
determined let's keep reading these are
especially useful in distributed
architecture where you have a number of
systems and databases responsible for
creating records again very handy when
you have lots of things writing to your
database at once if they're all writing
with the same ID generator
those can never Collide if two things
are trying to write to the database at
the same time with the same ID you're
you might think that using uyu
IDs as a primary key in a database is a
great idea when used incorrectly they
can drastically hurt DB performance and
here we go let's see how this can affect
things in this article you learn about
the downsides of uids as primary keys in
your mySQL database I am almost positive
this isn't just for MySQL either way I
promise this video will be valuable even
if you're not a MySQL Dev and you're
just using other SQL Solutions or maybe
not even SQL at all I want to talk about
these idas they're very interesting and
contrasting the different solutions is
something we can all learn from the many
versions of uu IDs at the time of this
writing there are five official versions
of uu IDs and three proposed versions so
that V8 by the way that's not official
yet six seven and eight are all proposed
still yes there's three future versions
proposed that we haven't accepted yet V1
is known as a time-based uid and can be
broken down as follows we have the Time
Low Time mid Time Low inversion clock
sequence inversion so this is where some
of the random comes from and then the
general random node while much of modern
Computing uses the Unix Epoch time which
is January 1st 1970 God imagine how
convenient it would be if you were
actually born on January ver 1970 like
every system would be very trivial it
would almost be like free to
mathematically compute your birthday
anyways since that's the base uids
actually use a different date of October
10th 15 why why would you use your own
thing I didn't even know this what which
is the date that the Gregorian calendar
started to be more widely what did
anybody hear know already what October
10th 1568 was the start of the calendar
okay somebody said it in chat that's the
start of the calendar one person knew it
this feels like it's meant to be a like
a a pop quiz type thing not a technical
implementation detail what the what
the is that like that's ice spice's
original
birthday like this feels like somebody
putting in like the well actually the
calendar started in 1568 so that that
should be our starting point you never
know and somebody might need to generate
an ID in the 1600s even though we're
inventing this in the '90s yeah anyways
I don't want to talk about goryan
calendars I want to talk about uu ID
which we'll do here the embedded time
stamp within a uuu ID grows in 100 nond
increments from this date which is then
used to set the Time Low Time mid and
time high segments of the uid the third
segment of the uuid contains the version
as well as the time high and it occupies
the first character of that segment this
is true for all versions of uu IDs as
shown in subsequent examples the reserve
portion is also known as the variant of
the uid which determines how the bits
within the uuid are meant to be used
finally the last segment of The UU ID is
the node which is the unique address for
the system generating that uu ID most of
the little bits here like the reserved
and the node not super important but the
node is how it knows which of your
systems was the one doing the generation
so about V2 next version two of the uid
implemented a change compared to version
one yeah no obviously version two
is changing things from version one duh
That's How versions work anyways imagine
if it didn't change anything if version
two implemented no changes compared to
version one that would be more
interesting that's something I'd put a
sentence in here for but since it
changed something let's see what it
changed the low time segment of the
structure was replaced with a posix user
ID the theory was that these uu IDs
could be traced back to the user account
that generated them this sounds like a
terrible idea this sounds like an
atrocious idea I don't know why they
would do this since the low time segment
is where much of the variability of uu
IDs reside replacing the segment
increases the chances of collision as a
result this version of uu ID is rarely
used yeah I would hope so this is a
disaster uuid V3 and V5 versions three
and five of uids are very similar I
dislike that three and five are similar
and four isn't this kind of feels like
what happened with like the Final
Fantasy versions where like most Final
Fantasy games have nothing in common but
then like seven and 10 have two
characters that are overlapping even
though different universes this is chaos
versions three and five of uu IDs are
very similar the goal of these versions
is to allow uu IDs to be generated in a
deterministic way so that given the same
information the same uu IDs can be
generated I'm getting more scared as
this goes these implementations use two
pieces of information a namespace which
is itself a Hu ID oh boy nested U IDs
yeah it's going to go well so we have
the name space as well as a name these
values are run through a hashing
algorithm to generate 128bit value that
can be represented as a uu ID the key
differences between these versions is
that version 3 uses an md5 hashing
algorithm and version five uses an S1 oh
boy so how is four different cuz four is
what I see basically everyone using
version four is known as the random
variant because as the name implies the
value of the uid is almost entirely
random how how did we let this happen by
the way how are V3 and V5 including
meaningful information and we just have
this one version between the two though
it's like yeah what if we just made it
all random that will solve the problem
and there's since been a six and a seven
proposed and I hope we get to those in
here so version 4 is the one that almost
everyone uses because it's entirely
random the exception to this is the
first position in the third segment of
the uuid which will always be a four to
signify the version used so if you see a
four in this third section and it's
eight digits four digits four digits
this is a four it's probably a uid
before version six and here's where we
get into those ones that are proposed
version six is nearly identical to
version why we have to diagram this so
v1's become V6 V3 has for the most part
become V5 and then V4 is what we
actually use does this correctly explain
the history of uuid versions up to V6 so
far because it only gets worse from here
this is our starting point back to this
article version six is nearly identical
to version one the only difference is
that the bits used to capture the time
stamps are flipped meaning the most
significant portions of the time stamp
are stored first so time high is now in
the front revolutionary huge the graphic
below demonstrates you yeah yeah cool
makes sense the main reason for this is
to create a value that's compatible with
version one while allowing these values
to be more sortable since the most
significant portion of the time stamp is
up front this was an attempt to make uid
sortable and it failed if you want a
sortable uuid you probably want a CU uid
we go there in a bit though now for
version 7 version 7's also a time-based
uid variant but it integrates the more
commonly used Unix Epoch timestamp
instead of the Gregorian calendar used
by version one progress be cool if they
use like primagen birthday or something
that's pretty close to the like Epoch
time for uh Unix right those would be
pretty close the other key difference is
that the node which is the value based
on the systems generating the uid is
replaced with Randomness making these
uids less trackable back to their Source
also cool including the system that's
generating the uid is disgusting very
happy that they have ditched that but
now we have to update our diagram to
include V7 which um is I guess most
accurately represented by breaking off
this fixes
cool keep going uid V8 version 8 is the
latest version that permits vendor
specific implementations while adhering
to RFC standards oh boy let's solve the
problem by letting everyone introduce
their own problems that sounds great the
only requirement for uid V8 is that the
versions be specified in the first
portion of the third segment as all
other
versions so uui v8's barely even a
standard if I'm understanding this
correctly apparently I broke where V3 is
pointing yeah V3 should oops oh star to
do there fix that so V3 is still
pointing at V5 and I have to add V8
which when I add it's going to move all
the arrows we love that so fix that so
it's pointing there I'll fix this again
I want to properly emphasize that this
one's like
imaginary this is barely a
standard more A vibe cool I think that
accurately communicates how I'm feeling
thus far I wanted to maybe change the
font family of V8 but I there's nothing
here that is what I I want for it this
looks like a Kingdom Hearts playthrough
chart you're not wrong yeah uh I'm not
going to go any further if you know you
know anyways air from V4 to V7 does does
V4 go to V7 it seemed like it's based on
version one and V4 is its own thing that
doesn't have any of the date time
yeah it's almost entirely random the
only thing that in V4 that isn't random
is this digit everything else is random
and all of the others time is faked in
which is why V4 is so different oh so so
V7 is version one plus version 4 with
the random so I guess yeah the other key
difference is that the node the values
based on is replaced with Randomness so
there a little more random there's just
the node is random which the node isn't
a big part okay I guess the no's a big
part fine the no's a big part I
guess by technicality that V4 does kind
of also go into V7 cool looking better
by the minute guys sure that yeah this
will defin defitely not confuse anybody
anyways let's keep reading this post
we're going to be here forever yeah the
version 8 is a Vibe it's not a standard
let's see how we're actually using this
uu IDs and MySQL using U IDs mostly
guarantees uniqueness across all systems
in your architecture so you might be
inclined to use them as primary keys for
your records be aware that there are
several trade-offs to doing so when
compared to an auto incrementing integer
interesting that they're comparing to
Auto incrementing integers primarily we
we'll see where this goes I'm not
necessarily the biggest fan of just
using ins as our database IDs but we'll
see what they have to say insert
performance whenever a new record is
inserted into a table in MySQL the index
associated with the primary Keys needs
to be updated so querying the table is
performant indexes in MySQL take the
form of a B+ tree which is a multi-layer
data structure that allows queries to
quickly find the data that they need
there's a bunch of words for their going
through the IDS in fancy data structure
formats to actually find the things
you're looking for the following diagram
shows what this looks like with six
entries with values from 1 to six a
query comes asking for five my equal
will start at the root and know from
there if it has to Traverse to the right
for what it's looking for cool this is a
standard binary tree instead of a B+
tree the key difference is a B plus tree
the leaf nodes contain a reference to
the data while on a b tree the leaf
nodes do not cool be balanced not binary
sorry brain fart it happens make fun of
me in the comments anyways B tree we
have four is where we're at and it
points in the two different directions
one is to where 1 2 and three are and
the other is to where five and six are
so we're looking for five and we're
starting here we know we need to go to
the right and we'll find it pretty quick
cool if values 7 through 9 are added my
SQL will split the right node and
rebalance accordingly so now once we
have these new nodes we split the tree
so we have four and seven and we know
four points to 1 2 3 on the left and
five and six on the right and seven
points 5 six on the left and 8 n on the
right process is known as page splitting
and the goal is to keep the b+3
structure balanced so that MySQL can
quickly find the data it's looking for
with sequential values this process is
relatively straightforward however when
Randomness is introduced into the
algorithm it can take significantly
longer for my to rebalance the tree on a
high volume database this can hurt users
experience as MySQL tries to keep the
tree in Balance learning things already
oh boy yeah things for us to think about
for our databases cool for more
information about how B+ trees work we
have a dedicated video in our MySQL for
developers course good to know yeah RP
Aon who created this wonderful course
that is fully freely available highly
recommend checking it out if you haven't
it's one of the best ways to learn SQL
and it's dope that they made this free I
have actually considered doing this
whole thing myself as a video just to
show you guys how little I actually know
about SQL and how much I can do with it
but uh someday anyways oh we even have
people in chat saying that that video by
Aaron is dope and worth watching so very
worth checking out higher storage
utilization this I'll be honest is
something I care about a lot less
overall but I'm curious how important
they think it is all primary keys in my
SQL are indexed by default an auto
incrementing integer will consume 32
bits of storage per value compare this
with uu IDs if store in a compact binary
format a single uu ID would cons consume
128 bits on dis already that's four
times the consumption of a 32-bit
integer If instead you chose to use a
more human readable string based
representation each uid could be stored
as a care 36 consuming a whopping 688
bits per uuid I can't remember last time
I broke a sweat over 688 bits anyways
this means that each record would store
over 20 times more data than the 32-bit
integer it's a record it has other
things on it yes it adds up but if I
have like a URL or a username or a name
on these things it can get big this does
get bad for indexing though which is I'm
sure what we're about to get to in
addition to the default index created on
the primary key secondary indexes will
also consume more space this is because
secondary indexes use the primary key as
a pointer to the actual row meaning they
need to be stored with the index this is
a good point in order for a an index
that will help you find data in your
database faster to work it needs to have
all of the IDS as well so if you now for
all of your indexes have to store 20
times more data your database indes are
suddenly massive and also have to be
held in memory so that they can be
performed and if these are moving to
disc cuz they're too big now you're
getting yes I understand this can
lead to significant increases in storage
requirements for your database depending
on how many indexes are created on
tables using uuids as the primary key
this point alone is big enough that I'm
I'm swaying my my feelings here you're
making good points finally page
splitting as described in the previous
section can also negatively impact
storage utilization as well as
performance INB assumes that the primary
key will increment predictably either
numerically or lexicography
lexicographically I was going to assume
this is lexor graphically but the lexico
well I have to worry about prime reading
this article if true INB will fill the
pages to about 94% of the page size
before creating a new page when the
primary key is random the amount of
space utilized for each page can be as
low as 50% also huge due to this using
EU IDs that incorporate Randomness can
lead to excessive usage of pages to
store the index okay I've seen this this
is funny because this is a case against
the one good type of ID which was V4 and
now it's saying that the randomness
means that you have to store
significantly more data Jesus yeah good
to
know best ways to use a uid primary key
with SQL you absolutely need to use uids
as the unique identifier for records in
your table there's a few best practices
you can follow to minimize the negative
side effects of doing so first is you
can use the binary data type well uids
are often sometimes often sometimes I
just thought I was having a stroke for a
second
uh this is not a sentence I feel yeah no
I don't think the author had a stroke I
think the author made a typo or deleted
the wrong word and it made me feel like
I was having a stroke but yeah while
uids are often represented as 36
character strings they can also be
represented in their native binary
format as well if converted to a binary
value you can store it in a binary 16
column which reduces the storage
requirement down to 16 bytes this is
still quite large compared to a 32-bit
integer but certainly better than
storing the uid is a care
36 bits versus bites by the way I'm not
telling you what the difference is there
that's your problem hopefully you know
that if you made it this far in anyways
here we have a table of uu IDs U ID is
character 36 binary binary 16 inserted
with both and we can see that as binary
it's able to represent it much smaller
and take a Plus data good to know next
option use an ordered uid variant using
a uid version that supports ordering can
mitigate some of the performance and
storage impacts of using uids by making
the generated values more sequential and
by doing such you are able to avoid some
of the page splitting issues described
earlier I don't really recommend using
uu IDs as a primary ID unless you're
doing something like this I'm going to
use this opportunity to plug CU ID which
has been my go-to for this type of thing
for a while if I want like a big unique
ID that can be written from any of many
different places without having to worry
about collisions Co ID is proven to be a
very very good and secure solution which
allows me to sort properly so it handles
a lot of these problems but it's also
handling a ton of uniqueness which
significantly lowers the chance that
you'll have cion you'd have to reach
some crazy multi- trillion trillion
number to reach 50% chance of having a
collision insane oh I forgot uh it's not
good for sequential because it's not
technically sortable as a V2 apparently
they disagree about the
sortabad at Fields instead according to
this author since we have created at
fields we don't have to worry about
sorting this is something that is a big
disagreement they even call out here
that one of the reasons to use
sequential Keys is to avoid ID
fragmentation which can require a large
amount of dis space for databases with
billions of Records however at such a
large scale modern systems often use
cloud native databases that are designed
to handle terabytes of data efficiently
and at a low cost Additionally the
entire database may be stored in memory
providing fast random access Lookout
performance not if you can't fit it in
memory because you're using a giant ID
standard and have to index it all over
the place yeah this is where these
disagree if we go back to this diagram
if you can't sort the ID
you can't split the table up logically
based on an ID so if somebody wanted to
look up ID 4 and they have to find that
through your crazy tree and you can't
use the Sorting of the IDS to do that
now you have to look through everything
to find your way there utter chaos as
such I yeah I don't like that these
disagree because no offense to the cuid
guys CU as you all know I massively
respect y'all and I use cuid for a bunch
of the fact that they have backed
out of having sort ability is scary
there is a fair point that they can
cause hot spots in the database like if
you have a system that generates a large
number of IDs in a short period the IDS
will be generated in sequential order
causing the tree to become unbalanced
which will force frequent rebalancing
the counter point there is that the
rebalancing is much more expensive if
the IDS can't be like quickly identified
as which way they go if the value is
going in a random you have to rebalance
much more often because as they said
when a primary key is random the amount
of space utilized for each page can be
as low as 50% so you're rebalance Bing
almost twice as much off of this alone
so like yeah it's cheaper to do the
rebalance with sequential IDs than not
I'm I'm not vibing with the cuid post
anymore and it's crazy how quickly my
own stance on these things is changing
as I read this so yeah I happy that I'm
reading this and that we're learning so
back to this bit of using a ordered uid
variant because apparently my cuid V2 is
not something I can rely on anymore
using a uid version that supports
ordering can mitigate some of the
performance and storage impacts of using
uids by making the generated values more
sequential which avoids some of the page
splitting issues described earlier even
when they are being generated on
multiple systems time based uid such as
version 6 or 7 can guarantee uniqueness
while keeping values as close to
sequential as possible the exception to
this is uid V1 which has the least
significant portion of the time stamp
first uid V1 was such a fumble in so
many ways I can't believe they recovered
from that funny enough the main reason
people are recommending cid2
specifically the original maintainers of
cv1 is that by having the IDS be
sortable and auto incremented they
believe this introduces security issues
like unauthorized password resets via
guessable IDs unauthorized access to
gitlab Via guessable IDs unauthorized
password resets via gu ID all things
that our friend Eva absolutely loves I
haven't seen her say anything in the
last few seconds so I'm going to go
through her chat history where she said
some stuff about this earlier love uu
IDs from a Dev perspective I like
incremental IDs as a security pentester
not because they're good just because
they're easy to pentest yeah it's a lot
easier to test things when you can guess
IDs just by generating them yourself
that said if your service can be
compromised through an ID being known
you need to rethink the way that service
works because it should take a lot more
data than that IDs are still ID even if
random yes if your IDs are all it takes
to compromise your system then you need
to fix that not the ID system itself I
get it now we want super small
performance sortable unique access
across the world support a fast access
insert removal Etc okay simple yeah
super easy go build it shouldn't be that
hard at all anyways other options that
are provided here are using the built-in
MySQL uid functions probably an option
worth considering I'm sure that their
implementation is better never mind how
does anyone take the standard seriously
I yeah while it is not a great practice
to use them by themselves there's a
helper function in my SQL called uid to
bin not only does this function convert
the string value to Binary but you can
use the option swap flag which will
reorder the timestamp portion to make
the resulting binary more sequential
[Music]
I'm going
insane and the other option use an
alternative ID Type U IDs are not the
only type of identifier that provides
uniqueness with a distributed
architecture consider that they were
first created in 1987 there has been
plenty of time for other professionals
to propose different formats such as
snowflake IDs UL IDs and even Nano IDs
which is what they use at Planet scale I
actually really like this post I don't
know if I've covered this before but I
use Nano IDs for a bunch of random stuff
because of this post they even have a
little Link in here the Nano ID
Collision tool that is super useful as
you restrict things because something I
did with RS is I didn't want to include
lowercase letters because I didn't want
users to have to like if they're typing
in a a page based on the ID which was a
real use case for what we were building
I didn't want them to have to worry
about those characters so I deleted all
of that so it's just letters and numbers
and I also think I got rid of zero and O
because they look like numbers I think I
got rid of I too for the same reason to
make it a little easier to to read this
but now that we have that we can pick
different lengths and see how long it
takes for a 1% probability of a
collision and you can even change the
speed at which you're generating over
that time too super super handy tool if
you want to generate your own Nano IDs
and like know how realistic it is to hit
a collision I liked this a lot so check
out that post if you're curious about
more there so yeah they recommend all of
those options all of which make sense
did not know snowflake IDs were just
numbers and like really big ones
interesting
conclusion using a uid primary key in
MySQL can nearly guarantee uniqueness in
a distributed system however it comes
with several trade-offs luckily with the
many versions available as well as
several Alternatives you have options
that can better address some of these
trade-offs after reading this article
you should be in a better position to
make an informed decision about the ID
that you choose to use while
architecting your next database yeah
this is pretty solid let me go post the
uid chaos on Twitter quick
thank you to the Kingdom Hearts
referencer earlier that will live in my
head rentree forever that's all I to say
about yuu IDs I'm so happy this video is
finally done and people can stop asking
for it let me know in the comments how
you feel and thank you again to Planet
scale for letting me finally put this
one to rest once and for all and more
importantly showing me that cuid has the
same problems as well and until next
time peace NS
Browse More Related Video
![](https://i.ytimg.com/vi/eQ3eNd5WbH8/hq720.jpg)
How indexes work in Distributed Databases, their trade-offs, and challenges
![](https://i.ytimg.com/vi/3zw3PWP46Yc/hq720.jpg?sqp=-oaymwEmCIAKENAF8quKqQMa8AEB-AH-CYAC0AWKAgwIABABGH8gEyhKMA8=&rs=AOn4CLAQlzr_POj2CJYmsLqGgq0ZM_RxYg)
How to Build a Streaming Database in Three Challenging Steps | Materialize
![](https://i.ytimg.com/vi/9mdadNspP_M/hq720.jpg)
Which Database Model to Choose?
![](https://i.ytimg.com/vi/F8k_XIEhKWo/hq720.jpg)
Explaining SSDs: The Price/Performance Trade-off
![](https://i.ytimg.com/vi/vi-qMzxxtYs/hq720.jpg)
98 Percent Cloud Cost Saved By Writing Our Own Database
![](https://i.ytimg.com/vi/xYwPOKJnqC0/hq720.jpg)
Computer Network Devices in Hindi
5.0 / 5 (0 votes)