I Suck At SQL, Now My DB Tells Me How To Fix It
Summary
TL;DR: This script captures a developer's enthusiastic reaction to PlanetScale's newly introduced 'Schema Recommendations' feature. The developer shows how the tool automatically suggests ways to improve database performance, reduce memory and storage consumption, and optimize schema design based on analysis of real production traffic. They test the feature on their own database, explain how it works under the hood, and convey genuine excitement about PlanetScale's approach of integrating database tuning into the development workflow and giving non-experts an expert-level database experience.
Takeaways
- ✨ PlanetScale introduced a new feature called 'Schema Recommendations' that automatically suggests schema improvements based on production database traffic to optimize performance, reduce memory/storage, and enhance the schema.
- 🔑 Schema Recommendations can suggest adding indexes for inefficient queries, removing redundant indexes, preventing primary key ID exhaustion, and dropping unused tables.
- 🌐 PlanetScale uses Kafka to process schema changes and trigger background jobs to examine the schema and query performance for potential recommendations.
- 🔍 The recommendations are based on query-level telemetry and analysis of column cardinalities, leveraging tools like Vitess's query parser and MySQL's histogram analysis.
- 🛠️ Recommendations can be applied directly to a database branch for testing and safe migration, following PlanetScale's Git-like branching model for schema changes.
- 📈 An example showcased how adding an index based on a recommendation significantly improved query performance from nearly a second to instantaneous.
- 💰 PlanetScale's pricing model no longer charges based on rows read/written, addressing a previous issue where inefficient queries led to high costs.
- 🌐 The hobby tier of PlanetScale is no longer globally available, prompting the need for alternative free options in certain regions for future tutorials.
- 🤖 While not technically AI, Schema Recommendations acts as a co-pilot for databases, guiding users towards optimized schemas and performance.
- 🎯 PlanetScale aims to provide an expert-level database experience for non-experts through features like Schema Recommendations.
Q & A
What is the new feature introduced by PlanetScale that the video is discussing?
-The new feature introduced by PlanetScale is called 'Schema Recommendations'. It automatically provides recommendations to improve database performance, reduce memory and storage usage, and optimize the schema based on the production database traffic.
How does PlanetScale generate schema recommendations?
-PlanetScale uses a system called the 'Schema Adviser', which analyzes the schema and recent query performance statistics to generate tailored recommendations. It employs techniques like query parsing, semantic analysis, and column cardinality extraction to identify inefficient queries and redundant indexes.
What are the different types of schema recommendations supported by PlanetScale?
-The four types of schema recommendations supported initially are: 1) Adding indexes for inefficient queries, 2) Removing redundant indexes, 3) Preventing primary key ID exhaustion, and 4) Dropping unused tables.
Why are indexes crucial for relational database performance?
-Indexes are crucial for relational database performance because without optimal indexes, the database may need to scan a large number of rows to satisfy queries that only match a few records, leading to performance issues.
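To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module (a stand-in for PlanetScale's MySQL — the query-plan behavior is analogous, not identical). The table and index names are hypothetical. Before the index, the engine scans the whole table for a `vendor_id` lookup; after it, it seeks directly through the index.

```python
import sqlite3

# Hypothetical table: without an index on vendor_id, every lookup scans all rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices (id INTEGER PRIMARY KEY, vendor_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO invoices (vendor_id, amount) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(10_000)],
)

def plan_for(query: str) -> str:
    """Return SQLite's query-plan description for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    return " ".join(str(r[-1]) for r in rows)

before = plan_for("SELECT * FROM invoices WHERE vendor_id = 42")
conn.execute("CREATE INDEX idx_vendor ON invoices (vendor_id)")
after = plan_for("SELECT * FROM invoices WHERE vendor_id = 42")

print(before)  # a full-table SCAN
print(after)   # a SEARCH using idx_vendor
```

The same shift — from scanning every row to seeking a handful — is what PlanetScale's index recommendations are trying to produce automatically.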
What is the significance of the example discussed in the video where a lack of indexing led to high costs?
-The example illustrates the importance of proper indexing. In the example, a missing index on a 'vendor ID' column caused the database to read millions of rows for each query, leading to high costs of around $1,000 per day, even though the queries were fast. Adding the appropriate index drastically reduced the costs.
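The cost blow-up in that story is simple arithmetic. Using the approximate figures from the article ($1.50 per 10 million rows read, queries returning ~100 rows but scanning ~1 million), the gap between expected and actual cost per query is:

```python
# Back-of-the-envelope reproduction of the story's figures (approximate, not exact billing).
PRICE_PER_10M_ROWS = 1.50           # dollars per 10 million rows read
rows_returned = 100                 # what the author expected to pay for
rows_read_per_query = 1_000_000     # full scan caused by the missing vendor_id index

expected_cost = rows_returned / 10_000_000 * PRICE_PER_10M_ROWS
actual_cost = rows_read_per_query / 10_000_000 * PRICE_PER_10M_ROWS

print(f"expected per query: ${expected_cost:.6f}")
print(f"actual per query:   ${actual_cost:.2f}")  # 15 cents per request adds up fast
```

At fifteen cents per request, a few thousand requests a day lands right around the $1,000/day figure from the story — and one `CREATE INDEX` eliminates almost all of it.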
How does PlanetScale handle redundant indexes?
-PlanetScale scans the schema for redundant indexes every time it is changed. It suggests removing two types of redundant indexes: 1) Exact duplicate indexes, and 2) Left prefix duplicate indexes, where one index contains the same columns as the prefix of another index.
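The two redundancy rules can be sketched in a few lines. This is a simplified illustration of the idea, not PlanetScale's actual detection code; the index names are hypothetical (modeled on the video's `app_id` example):

```python
def redundant_indexes(indexes: dict[str, list[str]]) -> set[str]:
    """Flag indexes whose column list exactly equals another index's,
    or is a left prefix of it -- the two cases described above.
    For exact duplicates, the lexically-first name is kept."""
    redundant = set()
    for a, cols_a in indexes.items():
        for b, cols_b in indexes.items():
            if a == b:
                continue
            # a is redundant if b's columns start with exactly a's columns, in order
            if len(cols_a) <= len(cols_b) and cols_b[: len(cols_a)] == cols_a:
                if cols_a == cols_b and a < b:
                    continue  # exact duplicate: keep one of the pair
                redundant.add(a)
    return redundant

# Mirrors the video's case: an index on (app_id) is a left prefix
# of an index on (app_id, deleted), so it can be dropped.
print(redundant_indexes({
    "app_id_idx": ["app_id"],
    "deletions_idx": ["app_id", "deleted"],
}))
```

Because MySQL can use any left prefix of a composite index, the `(app_id, deleted)` index already serves every query the plain `(app_id)` index could.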
What is the purpose of the 'Preventing primary key ID exhaustion' recommendation?
-This recommendation aims to prevent auto-incremented primary keys from exceeding the maximum allowable value for the underlying column type. If a column's highest value is above 60% of the maximum allowable for its type, PlanetScale recommends changing the column to a larger type.
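The 60% check itself is trivial once you know the column type's ceiling. A sketch using MySQL's standard integer ranges (the threshold and the rule follow the description above; the function is illustrative, not PlanetScale's code):

```python
# Maximum values for common MySQL integer column types.
INT_MAX = {
    "tinyint": 2**7 - 1,        # 127
    "int": 2**31 - 1,           # 2,147,483,647
    "int unsigned": 2**32 - 1,  # 4,294,967,295
    "bigint": 2**63 - 1,
}

def near_exhaustion(current_max_id: int, column_type: str,
                    threshold: float = 0.60) -> bool:
    """True when the highest auto-increment value has crossed the
    60% threshold for its column type, per the rule described above."""
    return current_max_id / INT_MAX[column_type] > threshold

print(near_exhaustion(1_500_000_000, "int"))     # past 60% of a signed INT
print(near_exhaustion(1_500_000_000, "bigint"))  # nowhere near a BIGINT's ceiling
```

The usual fix is widening the column (e.g. `INT` → `BIGINT`), which PlanetScale can apply through its branch-and-deploy flow.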
How does PlanetScale handle unused tables?
-If a table has not been queried for more than 4 weeks, PlanetScale will recommend dropping that unused table.
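The unused-table rule is a straightforward time-window check. A minimal sketch of the described behavior (illustrative only):

```python
from datetime import datetime, timedelta

def is_unused(last_queried: datetime, now: datetime,
              window_weeks: int = 4) -> bool:
    """A table qualifies for the drop recommendation when it hasn't
    been queried within the window (4 weeks, per the rule above)."""
    return now - last_queried > timedelta(weeks=window_weeks)

now = datetime(2024, 3, 1)
print(is_unused(datetime(2024, 1, 1), now))   # idle for ~8.5 weeks
print(is_unused(datetime(2024, 2, 20), now))  # queried 10 days ago
```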
What is the significance of the 'p50' metric mentioned in the video?
-'p50' is the 50th percentile of a set of query latencies: the value at or below which 50% of the queries completed. It serves as a median baseline for measuring query performance.
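Percentiles are easy to compute directly. Here is a nearest-rank implementation using the video's own worked example (100 queries: 10 instant, 80 medium, 10 very slow):

```python
import math

def percentile(latencies_ms: list[float], p: float) -> float:
    """Nearest-rank percentile: the value at or below which
    p% of the samples fall."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# The video's example: 10 queries at ~3 ms, 10 at ~3 s, the rest in between.
samples = [3.0] * 10 + [50.0] * 80 + [3000.0] * 10
print(percentile(samples, 50))  # the median: half the queries were this fast or faster
print(percentile(samples, 99))  # near worst-case: almost all queries fall within this
```

This is why p50 reads as "typical" performance while p99 exposes the slow tail — the ten 3-second outliers barely move the p50 but dominate the p99.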
What is the relationship between PlanetScale and Vitess?
-PlanetScale is the lead maintainer and effective owner of Vitess, a system built to scale MySQL databases more efficiently. PlanetScale also maintains a fork of MySQL that works seamlessly with Vitess and provides improved scalability.
Outlines
🚀 PlanetScale's Exciting New Feature: Schema Recommendations
The video discusses a new feature introduced by PlanetScale called Schema Recommendations, which automatically provides suggestions to improve database performance, reduce memory and storage usage, and optimize the schema based on production database traffic. It explains how Schema Recommendations work, utilizing query-level telemetry and insights from PlanetScale's monitoring tool to generate tailored recommendations in the form of DDL statements that can be directly applied to a database branch and deployed to production. The video showcases a real-life example of using Schema Recommendations on a production database, highlighting redundant indexes that can be removed.
🧩 Understanding Schema Recommendations in Depth
The video dives deeper into the schema recommendations feature, explaining how PlanetScale detects and generates recommendations. It covers various aspects, including adding indexes for inefficient queries, removing redundant indexes (exact duplicates and left prefix duplicates), preventing primary key ID exhaustion, and suggesting the removal of unused tables. The video also mentions PlanetScale's integration with Kafka and its fork of MySQL, which powers the schema analysis and recommendations. Additionally, it discusses an infamous case study involving database indexing issues that led to significant financial consequences, emphasizing the importance of proper indexing.
📊 Practical Examples and Implementation Details
The video provides a practical example of applying a new index recommendation, demonstrating how it can significantly improve query performance. It also touches upon PlanetScale's earlier pricing model, which charged per rows read and written and so amplified the cost of unindexed queries, an issue that has since been resolved. The video further discusses the percentile metrics (such as p50 and p99) used to measure query performance and latency. The creator expresses excitement about the co-pilot-like experience that Schema Recommendations offers, enabling non-database experts to achieve expert-level database performance. The video concludes by addressing the availability of PlanetScale's hobby tier and plans for future tutorials.
🌟 Closing Thoughts and Reflections
In the final paragraph, the video creator shares their thoughts on the Schema Recommendations feature and PlanetScale in general. They express excitement about the project and appreciate PlanetScale's ability to identify and suggest improvements, particularly for those who are not SQL experts. The creator acknowledges the limitations of the hobby tier's regional availability and plans to provide alternative options for future tutorials. Overall, the video concludes on a positive note, highlighting the creator's satisfaction with PlanetScale's offerings and their intention to continue using and recommending the service.
Keywords
💡PlanetScale
💡Schema recommendations
💡Indexes
💡Branching
💡Insights
💡VitessIO
💡Primary key exhaustion
💡Unused tables
💡Query performance
💡Percentiles
Highlights
PlanetScale introduced a new feature called 'Schema Recommendations' that automatically provides recommendations to improve database performance, reduce memory and storage usage, and optimize the schema based on production database traffic.
Schema Recommendations use query-level telemetry to generate tailored recommendations in the form of DDL statements that can be applied directly to a database branch and then deployed to production.
PlanetScale's model allows creating a branch (an identical clone of an existing database schema), making changes to the schema, and then deploying it using a pull-request-like process. They also keep the old database around for 30 minutes after deployment, writing to both databases, enabling easy rollback if any issues arise.
The current open recommendations for a database can be viewed in the 'Insights' tab, and weekly reports are also sent via email.
The first recommendation shown for the demo database is to remove redundant indexes, which can slow down writes and consume additional storage and memory.
PlanetScale has built a system called 'Schema Adviser' that uses Kafka to trigger background jobs to examine the schema and make recommendations based on query performance and statistics.
PlanetScale is the lead maintainer of Vitess, a system built to scale MySQL databases, and they have patched their fork of MySQL to enable better index recommendations.
The four types of schema recommendations currently supported are: adding indexes for inefficient queries, removing redundant indexes, preventing primary key ID exhaustion, and dropping unused tables.
The importance of database indexes is highlighted through an example where the lack of an index on a vendor ID column resulted in reading millions of unnecessary rows and incurring significant costs.
33% of PlanetScale databases have been found to have redundant indexes that could benefit from removal.
When a column's auto-increment primary key approaches 60% of the maximum allowable type value, a recommendation is given to change the underlying column to a larger type.
A walkthrough is provided demonstrating how to apply a recommendation to add a new index, resulting in significantly improved query performance.
The 'p50' metric refers to the 50th percentile, where 50% of requests were faster than the given value, providing a measure of average performance.
While not technically AI, the Schema Recommendations feature is described as a 'co-pilot for your database,' assisting users in optimizing their databases without requiring expert knowledge.
PlanetScale's goal is to enable users who are not database experts to have an expert-quality database experience through features like Schema Recommendations.
Transcripts
Planet scale just introduced a really
exciting new feature but before we go
any further I do want to say they pay me
sometimes they're not paying me for this
video I was not asked to make this video
but it does fall under our existing
contract and I'm sure they're going to
be pretty hyped about it they had no say
in anything I'm discussing in this video
I just wanted to react to this cuz I'm
actually genuinely excited so knowing
that let's take a look at schema
recommendations which is an actually
genuinely new idea I haven't seen others
do before automatically receive
recommendations to improve database
performance reduce memory and storage
and improve your schema based on
production database traffic also shout
out to Taylor and rer for writing this
Taylor in particular I've worked with
forever she's really good at what she
does for the last 2 years we've been
working on making PlanetScale Insights
the best built-in MySQL database
monitoring tool today we're releasing a
significant upgrade schema
recommendations with schema
recommendations you will automatically
receive recommendations to improve
database performance reduce memory and
storage and improve your schema based on
production database traffic schema
recommendations use Query level
Telemetry to generate tailored
recommendations in the form of ddl
statements that can be applied directly
to to a database branch and then
deployed to production this fits really
well within the existing Planet scale
model which if you're not familiar their
whole thing is to do stuff kind of the
same way that we do it in like GitHub
where you create a branch which is an
identical clone of an existing database
schema doesn't have the data inside of
but it's just the the shape of the
models then you make changes to the
schema and if all goes well you can then
put it up for review people can approve
it and then create a deploy request
similar to a pull request and merge that in and
now you have your new database schema
and initially this was cool by itself
but where they've pushed it even further
that's probably my favorite thing is
once you've made that deploy they keep
the old database around and they write
to both databases for 30 minutes so if
it turned out you made a mistake you can
revert without losing any data even the
data that was written in that time
mind-blowing stuff so let's read more
about this because I'm very curious how
to use schema recommendations to find
the schema recommendations for your
database go to the insights tab of your
PlanetScale database and click view
recommendations you'll see the current
open recommendations for your database
also if you're subscribed to your
database's weekly DB report you'll get
an email with your first recommendations
the CEO of Planet scale is actually in
chat unplanned let's go give it a shot
in the upload thing production database
Planet scale here we have the databases
for all of our core T3 stuff which is
Ping stuff names are confusing don't
worry about it here we have the upload
thing production database we go to the
insights tab we have recommendations and
we have three redundant indexes we have
an index for the key for API key on user
ID on the app and the app ID on file so
our key for managing deletions also has
the app ID key within it which isn't
something I'd really thought about
before for context on why we made this
decision we had made a separate key for
files that were deleted so it was easier
for us to only select files that were or
weren't marked through deletion when we
did size calculations but since this
index includes app ID the previous index
that is just app ID is no longer as
valuable as it used to be and this
recommends that we drop that index that
we no longer need the one slightly
annoying part of doing this this way is
that I have to go make a code change in
our code base to match the change that's
occurring here here's our actual
database schema written of course in
drizzle for this project and we can see
in here those indexes so we have app ID
idx file key idx external ID and deleted
since we have this deleted one we no
longer need the app ID one so if I was
to merge this change and then somebody
was to do another push this would break
and I would have to make sure that I've
also removed this from my code it's a
small thing and honestly the way I will
probably use this is rather than
applying the exact recommendation they
tell me to I'm going to use this as a
way to realize oh these are changes I
should make in my code base and then I
can go to my code I can delete this line
and then do a traditional deployer
request the way I normally do as an
Insight this is incredibly informative
and weirdly well written here yeah New
Branch I don't know if somebody on the
team already created this or if it was
created for us but yeah just like we
have branching in our code bases we have
branching here too very very good and
useful information I'm assuming the
other ones here user ID we also have
user ID plus tier as an index we no
longer need the one that's just the user
ID this all makes sense let's read a bit
more about what else this can do because
our our database is nice and hilariously
simple because that's how we like to
build but I'm curious how other people
are using this thing and what other
stuff it can recommend as here each
recommendation comes with an explanation
of the recommended changes the schema or
query that it will affect the exact ddl
that will apply the recommendations as
well as the option to apply the
recommended changes to a branch for
testing and a safe migration you should
evaluate each recommendation based on
your specific use case read the schema
recommendations documentation for more
information on each recommendation
that's cool there's a whole
documentation page that describes in
detail all of the things that it can
make recommendations for and what you
should do and what you should know about
it adding indexes for inefficient
queries removing redundant indexes
preventing primary key ID exhaustion and
dropping unused tables really good stuff
A lot of people are missing these types
of things you live it's just key
deletions if we had more pressing things
like if we needed to add a key I would
absolutely do that but wasting a little
bit of data and paying you guys a little
bit more is the least of my concerns
honestly my immediate takeaway when I
saw this is I'm proud we're not missing
any indexes anymore because we were
missing indexes for a while so knowing
we're not is cool once you better
understand the recommendation you can
apply the recommendation by either
applying it directly with a database
Branch with a few clicks or making the
schema change directly in your
application orm code look they called it
out I can just make it in my own code
how Planet scale detects schema
recommendations in your database we've
built a system that we internally refer
to as the schema adviser which can make schema
recommendations and understand when a
schema change closes an existing open
recommendation each time a production
branch of schema changes within Planet
scale an event is emitted to Kafka this
triggers a background job to examine the
schema for potential
recommendations interesting more and
more people doing Kafka stuff recently
which is cool to see if any viewers
aren't familiar with Kafka already it is
an ancient Apache technology for
managing events and getting messages to
and from things 80% of all Fortune 100
companies are using it so uh does that
mean PlanetScale is on the way to be
determined we can determine the schema
alone for some recommendations such as
finding duplicate indexes we also use
the database's recent query performance
and statistics for other recommendations
such as index recommendations this we've
already been relying on quite a bit not
necessarily the specific recommendations
but the feedback on the insights tab
where you have do we not have any
anomalies right now that's a nice change
usually we have some types of crazy anomalies
that have big enough spikes in
performance that we go and investigate
and figure out what's causing them so we
can look back to February 23rd and see
we have this anomaly here which is from
people uploading a bunch of files in a
burst and our calculation for storage
being used was not particularly great at
the time so we can see all of this
breakdown of what queries were taking
how much time we had 22 queries per
second seven rows are being written
every second and this caused an anomaly
which is really useful for us to dig
into and see the specific queries that
are causing these specific problems this
has been a lifesaver for us as we try to
debug more and more complex performance
related issues with our databases we
first identify potentially slow query
candidates for index suggestions using
the Insights query data we then use
Vitess's query parser and semantic
analysis utilities to extract potential
indexable columns for the query when
adding indexes column order is
critically important to get that right
we patched our Fork of MySQL to create
another variant of the analyze table
update histogram command that allows us
to extract the cardinalities of each
column without impacting the databases
statistic table yes I went this far
without saying MySQL and I'm proud of
myself but it is important to know that
not only is PlanetScale using MySQL
they are the lead maintainers and
effectively owners now of Vitess which is
a system built to scale your MySQL
databases much better big companies like
uber and slack and even GitHub and
YouTube itself have been using Vitess for
a long time now to allow their MySQL
databases to scale to insane numbers of
users data consumers and all the other
things your database needs to serve but
that doesn't mean MySQL moves
particularly fast I think it's fair to
say anything in Oracle world is not
particularly fast moving so Planet scale
continuing to maintain their Fork that
works perfectly with Vitess is fully
MySQL compatible and is MySQL to have
these types of features that they need
in order to give us a good experience
that's dope it's a really cool balance
they found of existing standards modern
open source tooling and a groundbreaking
service and experience for users it is
actually really cool with all this
information combined we can make
recommendations on how to improve a
database's
schema supported schema recommendations
today we are launching with four
different schema Recs but we will add
more over time the first is adding
indexes for inefficient queries which
apparently we don't need we're on top of
our indexes now so cool point two is
that you can remove redundant indexes
which we saw we have a bunch of probably
go clean this up later another fun one
they've added is the ability to prevent
primary key ID exhaustion what does this
mean let's say you're using integer IDs
and you're possibly going to run out of
integers soon this will warn you and say
hey you probably shouldn't be using
integers for that ID field anymore
now we have the fourth thing it can do
which is telling you to drop unused
tables good old Bobby tables are going
to love that one I'm sure adding indexes
for inefficient queries indexes are
crucial for relational database
performance with no indexes or
suboptimal indexes MySQL may have to
scan a large number of rows to satisfy
queries that only match a few records oh
here it is spend 5K to learn how
database indexes work this is an article
I very very fondly remember I will say
this problem has long since been solved
as Planet scale has fundamentally
changed the pricing model this is
impossible to do at this point in time
but at the time pricing was based on how
many rows you read and wrote and they
didn't have indexes in their database
since Planet scales performance is nuts
it's able to read millions of rows
really quickly and still get you a
response but this comes with the problem
that now you're doing a ton of work that
they're billing you for it's just
because it happens fast doesn't mean you
meant to run a ton of stuff that you
didn't want to in this example they had
a pretty basic schema here the catch is
that vendor ID was not indexed it's just
a value they used to link things together
and since it wasn't an index and since
there's no foreign keys in Vitess there
kind of is now separate long
story you'll see this example where
you're selecting with vendor ID that
this thing has to read way more rows
than it's actually supposed to since
he's getting back only 100 rows he
assumed that it was going to be a $1.50
per 10 million rows read so reading 100
rows is fine but you also were
inspecting all of those rows to do the
lookup so every request that made this
query actually cost them 15 cents
because it was 1 million rows every time
you did it it was still fast but the
fact that you had to check a million
rows on every request uses a lot of
compute ended up costing them a lot of
money and every request ended up being
pretty expensive they ended up spending
about $1,000 a day they added this one
index which knocked it down a ton you
can see here the amount of row reads
they were getting plummeted immediately
thankfully Planet scale as mentioned in
chat immediately wrote off the expense
here didn't charge them anything and
they got down to $150 a month which is a
much more reasonable price than 5 grand
over a few days and since then the
author is still a very happy Planet
scale customer I think this was a great
story both showcased the flaws in the
existing pricing model as well as how
database indexes are important it was a
great article went viral this was
actually one of the first times I heard
about planet scale I had just started
playing with it at the time but seeing
this and the response to it really got
me to consider it more seriously so yeah
adding indexes for inefficient queries
is important and so much so that this
might have saved that person a very very
scary moment removing redundant indexes
while indexes can drastically improve
query performance having unnecessary
indexes slows down writes and consumes
additional storage and memory insights
scans your schema every time it is
changed to find redundant indexes we
suggest removing two types one is an
exact duplicate index where the index
has the exact same columns in the same
order and the second is a left prefix
duplicate index an index that has the
same columns in the same order as the
prefix of another index since you can
just use chunks of the index as you go
through it if two indexes have the same
left side one of them stops and the
other one goes further it matters a
lot less that you have that first one
you can use the second index and just
use the first two prefixes and read
things super quick redundant indexes are
remarkably common our initial set of
recommendations found that 33% of PlanetScale
databases have redundant indexes that
they may benefit from removing yeah we
had three of them preventing primary key
ID exhaustion as new rows are inserted
it's possible for auto incremented
primary keys to exceed the maximum
allowable value for the underlying
column type as I mentioned before if
you're using IDs that are like an
integer and you have too many users or
too many things in that column you'll
run out of IDs now you're screwed if
insights detects that one column is
above 60% of the maximum allowable type
it'll recommend changing the underlying
column to a larger type and then
dropping unused tables pretty simple if
a table's not being used over a large
amount of time it will tell you to get
rid of it yeah if there's any tables
that are more than four weeks old and
haven't been queried in the last 4 weeks
good to know here's an example adding a
new index so walk through an example
applying a new recommendation will
create a simple post table sure we've
all seen basically this exact table
example running selects as we add more
rows to the post table a pattern
emerges the p50 time for a post title
increases linearly our queries are
taking nearly a second which is not
good since we're querying for title A
lot it can recognize that maybe we need
an index on title and make that
recommendation add new index ID post on
title on table posts exactly what we
were showing before just adds this index
to the table you click create and apply
and now instantaneously the amount of
latency and the amount of of effort it
takes to do each of these queries goes
down this is really really cool stuff I
know it's not technically AI but it's
the thing I'm excited about in that
direction this almost like co-pilot for
your database where once it's running
it's telling you hey maybe you should do
this hey maybe you should do this and as
Planet scale continues in its goal of
making it so people who aren't database
experts can have expert quality database
experiences this makes a ton of sense
and I am genuinely really hyped about
what they're shipping here quick ask to
the planet scalers in the chat is there
anything important that I missed before
I wrap up what is p50 percentile is
what the P stands for
thank you as well in a set of queries in
this example where you have 100 queries
maybe 10 of them were instant like 3
milliseconds and 10 of them were really
slow like 3 seconds p50 would be the 50th
percentile mark so what was the speed at
that point so 50% or more requests were
faster than this so p95 is 95% of
requests were this fast or faster p99 is
the same at the 99% point it's
a measurement for like the worst case of
things so p50 is a pretty baseline
average it should be really fast the
much higher up ones like the 99
percentile is like this is all of our
queries are falling within this range
yeah I'm also sad the hobby tier of
Planet scale isn't as globally available
anymore that was sad news I understand
why but I was not happy to see it and I
definitely am planning around that for
future tutorials and things I've already
gotten permission from planet scale for
all of my future tutorials that use
Planet scale to also have a path for
people that want to use something that's
free in their region either through
another service or through just locally
hosting sqlite or something so I'm
accounting for that we're working on it
getting rid of Scaler yes and no the
thing with Scaler is Scaler was the same
metal as the cheapest Scaler Pro plan
and when I was on scaler I was hitting
CPU limitations more than I was hitting
number of read limitations so yeah as
bad as I am at SQL Planet scale is
making me feel much better at it at the
very least they're telling me when I'm
doing things egregiously wrong and I
certainly need that so I can focus on
the things I love which are UI
JavaScript full stack and making YouTube
videos let me know what you guys think
though cuz this is a really exciting
project thank you as always see you guys
in the next one peace nerds