I Suck At SQL, Now My DB Tells Me How To Fix It

Theo - t3․gg
5 Mar 2024 15:40

Summary

TLDR: This video captures a developer's reaction to PlanetScale's newly introduced 'Schema Recommendations' feature. The tool automatically suggests ways to improve database performance, reduce memory and storage consumption, and improve schema design, based on analysis of production database traffic. The developer tries the feature on their own production database, walks through how it works internally, and argues that it fits PlanetScale's existing branch-and-deploy workflow while giving non-experts an expert-level database experience.

Takeaways

  • ✨ PlanetScale introduced a new feature called 'Schema Recommendations' that automatically suggests schema improvements based on production database traffic to optimize performance, reduce memory/storage, and enhance the schema.
  • 🔑 Schema Recommendations can suggest adding indexes for inefficient queries, removing redundant indexes, preventing primary key ID exhaustion, and dropping unused tables.
  • 🌐 PlanetScale uses Kafka to process schema changes and trigger background jobs to examine the schema and query performance for potential recommendations.
  • 🔍 The recommendations are based on query-level telemetry and analysis of column cardinalities, leveraging tools like Vitess's query parser and a patched variant of MySQL's histogram analysis.
  • 🛠️ Recommendations can be applied directly to a database branch for testing and safe migration, following PlanetScale's Git-like branching model for schema changes.
  • 📈 An example showcased how adding an index based on a recommendation significantly improved query performance from nearly a second to instantaneous.
  • 💰 PlanetScale's pricing model no longer charges based on rows read/written, addressing a previous issue where inefficient queries led to high costs.
  • 🌐 The hobby tier of PlanetScale is no longer globally available, prompting the need for alternative free options in certain regions for future tutorials.
  • 🤖 While not technically AI, Schema Recommendations acts as a co-pilot for databases, guiding users towards optimized schemas and performance.
  • 🎯 PlanetScale aims to provide an expert-level database experience for non-experts through features like Schema Recommendations.

Q & A

  • What is the new feature introduced by PlanetScale that the video is discussing?

    -The new feature introduced by PlanetScale is called 'Schema Recommendations'. It automatically provides recommendations to improve database performance, reduce memory and storage usage, and optimize the schema based on the production database traffic.

  • How does PlanetScale generate schema recommendations?

    -PlanetScale uses a system called the 'Schema Adviser' which analyzes the schema and recent query performance statistics to generate tailored recommendations. It employs techniques like Query parsing, semantic analysis, and column cardinality extraction to identify inefficient queries and redundant indexes.

  • What are the different types of schema recommendations supported by PlanetScale?

    -The four types of schema recommendations supported initially are: 1) Adding indexes for inefficient queries, 2) Removing redundant indexes, 3) Preventing primary key ID exhaustion, and 4) Dropping unused tables.

  • Why are indexes crucial for relational database performance?

    -Indexes are crucial for relational database performance because without optimal indexes, the database may need to scan a large number of rows to satisfy queries that only match a few records, leading to performance issues.

  • What is the significance of the example discussed in the video where a lack of indexing led to high costs?

    -The example illustrates the importance of proper indexing. In the example, a missing index on a 'vendor ID' column caused the database to read millions of rows for each query, leading to high costs of around $1,000 per day, even though the queries were fast. Adding the appropriate index drastically reduced the costs.

  • How does PlanetScale handle redundant indexes?

    -PlanetScale scans the schema for redundant indexes every time it is changed. It suggests removing two types of redundant indexes: 1) Exact duplicate indexes, and 2) Left prefix duplicate indexes, where one index contains the same columns as the prefix of another index.

  • What is the purpose of the 'Preventing primary key ID exhaustion' recommendation?

    -This recommendation aims to prevent auto-incremented primary keys from exceeding the maximum allowable value for the underlying column type. If a column's auto-increment value is above 60% of the maximum value its type allows, PlanetScale recommends changing the column to a larger type.

  • How does PlanetScale handle unused tables?

    -If a table has not been queried for more than 4 weeks, PlanetScale will recommend dropping that unused table.

  • What is the significance of the 'p50' metric mentioned in the video?

    -The 'p50' is the 50th percentile of query latency: 50% of the queries completed faster than this value. It is the median, not the mean, and is used as a baseline for typical query performance.

  • What is the relationship between PlanetScale and Vitess?

    -PlanetScale is the lead maintainer and effective owner of Vitess, a system built to scale MySQL databases. PlanetScale also maintains a MySQL-compatible fork of MySQL that works with Vitess and carries the patches its features need.
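The two redundancy rules described in the Q&A above (exact duplicates and left-prefix duplicates) can be sketched in a few lines of Python. This is a minimal illustration, not PlanetScale's implementation: the function name `find_redundant_indexes`, the index names, and the columns are all hypothetical. The sketch also ignores UNIQUE indexes, which enforce a constraint and so are not safe to drop just because another index shares their prefix.

```python
from typing import Dict, List, Tuple

Index = Tuple[str, ...]  # ordered column names of one index


def find_redundant_indexes(indexes: Dict[str, Index]) -> List[Tuple[str, str, str]]:
    """Return (redundant_index, kept_index, reason) for each redundant pair."""
    findings = []
    names = list(indexes)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            a, b = indexes[first], indexes[second]
            if a == b:
                # Same columns in the same order: keep the first one seen.
                findings.append((second, first, "exact duplicate"))
            elif b[:len(a)] == a:
                # a's columns are the leading columns of b, so b can serve a's lookups.
                findings.append((first, second, "left-prefix duplicate"))
            elif a[:len(b)] == b:
                findings.append((second, first, "left-prefix duplicate"))
    return findings


# Hypothetical indexes, loosely modelled on the case described in the video.
indexes = {
    "app_id_idx": ("app_id",),
    "app_id_deleted_idx": ("app_id", "deleted"),
    "file_key_idx": ("file_key",),
    "file_key_copy_idx": ("file_key",),
}

findings = find_redundant_indexes(indexes)
for redundant, kept, reason in findings:
    print(f"drop {redundant}: {reason} of {kept}")
```

Column order matters here: an index on (deleted, app_id) would not make an index on (app_id) redundant, which is why the check compares leading columns rather than column sets.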

Outlines

00:00

🚀 PlanetScale's Exciting New Feature: Schema Recommendations

The video discusses a new feature introduced by PlanetScale called Schema Recommendations, which automatically provides suggestions to improve database performance, reduce memory and storage usage, and optimize the schema based on production database traffic. It explains how Schema Recommendations work, using query-level telemetry from Insights, PlanetScale's monitoring tool, to generate tailored recommendations in the form of DDL statements that can be applied directly to a database branch and then deployed to production. The video showcases a real-life example of using Schema Recommendations on a production database, highlighting redundant indexes that can be removed.

05:01

🧩 Understanding Schema Recommendations in Depth

The video dives deeper into the schema recommendations feature, explaining how PlanetScale detects and generates recommendations. It covers adding indexes for inefficient queries, removing redundant indexes (exact duplicates and left-prefix duplicates), preventing primary key ID exhaustion, and suggesting the removal of unused tables. The video also mentions PlanetScale's use of Kafka to trigger schema analysis and its patched fork of MySQL, which supports index recommendations. Additionally, it discusses a well-known case in which a missing index led to a large bill, emphasizing the importance of proper indexing.

10:02

📊 Practical Examples and Implementation Details

The video provides a practical example of applying a new index recommendation, demonstrating how it can significantly improve query performance. It also touches upon PlanetScale's pricing model, which used to bill by rows read and written, so a missing index could become expensive; that model has since changed. The video further discusses the percentile metrics (such as p50 and p99) used to measure query latency. The creator expresses excitement about the co-pilot-like experience that Schema Recommendations offers, enabling non-database experts to achieve expert-level database performance. The video concludes by addressing the availability of PlanetScale's hobby tier and plans for future tutorials.

15:02

🌟 Closing Thoughts and Reflections

In the closing segment, the video creator shares their thoughts on the Schema Recommendations feature and PlanetScale in general. They express excitement about the project and appreciate PlanetScale's ability to identify and suggest improvements, particularly for those who are not SQL experts. The creator acknowledges the limitations of the hobby tier's regional availability and plans to provide alternative options for future tutorials. Overall, the video concludes on a positive note, highlighting the creator's satisfaction with PlanetScale's offerings and their intention to continue using and recommending the service.

Keywords

💡PlanetScale

PlanetScale is a database hosting service that provides a managed MySQL database solution. It is a key topic throughout the video as the speaker discusses a new feature called 'schema recommendations' introduced by PlanetScale. PlanetScale is presented as a service that aims to make database management easier for developers, even those without extensive database expertise.

💡Schema recommendations

Schema recommendations is a new feature introduced by PlanetScale that automatically provides recommendations to improve database performance, reduce memory and storage usage, and optimize the database schema based on the production database traffic. This feature analyzes the database schema and query patterns to suggest changes like adding indexes, removing redundant indexes, preventing primary key exhaustion, and dropping unused tables. The speaker expresses genuine excitement about this feature and its potential benefits.

💡Indexes

Indexes are data structures in databases that are used to improve the speed of data retrieval operations. The video discusses how PlanetScale's schema recommendations can suggest adding indexes for inefficient queries or removing redundant indexes. The speaker shares an example where the lack of an index on a frequently queried column led to a significant increase in costs due to the need to scan a large number of rows for each query. Adding the recommended index dramatically reduced the query latency and costs.
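The cost side of that example is simple arithmetic. The sketch below uses the figures quoted in the video ($1.50 per 10 million rows read under the old pricing model, and about 1 million rows scanned per query to return roughly 100). The function name is hypothetical, the indexed case assumes the lookup reads only the matching rows, and the daily query count is derived here rather than taken from the source.

```python
import math

# Figure quoted in the video for PlanetScale's old, since-changed pricing model.
USD_PER_10M_ROWS_READ = 1.50


def cost_per_query(rows_examined: int) -> float:
    """Billing followed rows examined, not rows returned."""
    return rows_examined / 10_000_000 * USD_PER_10M_ROWS_READ


without_index = cost_per_query(1_000_000)  # full scan to find about 100 rows
with_index = cost_per_query(100)           # assumes the index reads only the matches

print(f"without index: ${without_index:.2f} per query")
print(f"with index:    ${with_index:.6f} per query")

# Derived: how many unindexed queries per day add up to about $1,000.
queries_for_1000_usd = math.ceil(1000 / without_index)
print(f"{queries_for_1000_usd} queries/day is about $1,000/day")
```

The query stays fast in both cases, which is why the problem went unnoticed: latency did not reveal that each request was examining about ten thousand times more rows than it returned.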

💡Branching

Branching is a concept borrowed from version control systems like Git, where PlanetScale allows users to create a separate branch of their database schema. This branch serves as an identical clone of the existing database schema, without the data. Users can make changes to the schema in the branch, review them, and then deploy the changes to the production database, similar to merging a pull request in Git. The video highlights this feature as part of PlanetScale's workflow for managing schema changes.

💡Insights

Insights is a built-in monitoring tool provided by PlanetScale that helps users understand and analyze the performance of their databases. The speaker mentions that they have been relying on Insights to identify and investigate performance anomalies caused by inefficient queries or other issues. Insights provides detailed information about query performance, anomalies, and other metrics, which can be used in conjunction with the schema recommendations to optimize the database.

💡Vitess

Vitess (vitess.io) is an open-source system for scaling MySQL databases, and PlanetScale is its lead maintainer. PlanetScale uses Vitess's query parser and semantic analysis utilities to extract potentially indexable columns from slow queries. Separately, PlanetScale patched its fork of MySQL (not Vitess) to add a variant of the 'ANALYZE TABLE UPDATE HISTOGRAM' command, which lets it extract column cardinalities without impacting the database's statistics table. Both are used in generating schema recommendations, particularly index suggestions.

💡Primary key exhaustion

Primary key exhaustion is a situation where a database table runs out of available unique values for its primary key column, typically when using auto-incrementing integer values as primary keys. The video mentions that PlanetScale's schema recommendations can detect when a primary key column is approaching its maximum allowable value and suggest changing the column type to a larger type to prevent exhaustion. This helps avoid potential issues and downtime caused by running out of primary key values.
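As a rough illustration of that check, the sketch below compares an auto-increment value against the maximum for each MySQL integer type and flags anything past the 60% threshold mentioned in the video. The type limits are standard MySQL values; the function names and the example IDs are hypothetical, and this is not PlanetScale's code.

```python
# Maximum values for MySQL integer column types: (signed, unsigned).
MYSQL_INT_MAX = {
    "tinyint": (2**7 - 1, 2**8 - 1),
    "smallint": (2**15 - 1, 2**16 - 1),
    "mediumint": (2**23 - 1, 2**24 - 1),
    "int": (2**31 - 1, 2**32 - 1),
    "bigint": (2**63 - 1, 2**64 - 1),
}

THRESHOLD = 0.60  # the 60% figure quoted in the video


def key_space_used(column_type: str, next_id: int, unsigned: bool = False) -> float:
    """Fraction of the column type's range already consumed."""
    signed_max, unsigned_max = MYSQL_INT_MAX[column_type]
    return next_id / (unsigned_max if unsigned else signed_max)


def should_widen(column_type: str, next_id: int, unsigned: bool = False) -> bool:
    """True when the key has passed the threshold and a larger type is advisable."""
    return key_space_used(column_type, next_id, unsigned) > THRESHOLD


# A signed INT at 1.4 billion rows has used about 65% of its range.
print(should_widen("int", 1_400_000_000))                 # signed INT
print(should_widen("int", 1_400_000_000, unsigned=True))  # unsigned INT has twice the room
print(should_widen("bigint", 1_400_000_000))              # BIGINT is nowhere near its limit
```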

💡Unused tables

Unused tables refer to database tables that are not being actively queried or accessed for an extended period of time. The schema recommendations feature can identify such tables and suggest dropping them from the database schema. Removing unused tables can help reduce storage consumption and potentially improve overall database performance by eliminating unnecessary overhead.
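A minimal sketch of that rule, assuming a per-table "last queried" timestamp is available (the video does not say how PlanetScale tracks this). The 4-week window comes from the summary above; the function name, table names, and dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List

UNUSED_AFTER = timedelta(weeks=4)  # window stated in the summary


def unused_tables(last_queried: Dict[str, datetime], now: datetime) -> List[str]:
    """Tables whose most recent query is more than four weeks old."""
    return sorted(
        table for table, seen in last_queried.items() if now - seen > UNUSED_AFTER
    )


# Hypothetical telemetry: table name -> time of the most recent query.
last_queried = {
    "files": datetime(2024, 3, 4),
    "api_keys": datetime(2024, 2, 20),
    "legacy_exports": datetime(2024, 1, 15),
}

candidates = unused_tables(last_queried, now=datetime(2024, 3, 5))
print(candidates)
```

A table flagged this way is only a candidate: something queried quarterly or yearly would look unused in a four-week window, which is one reason each recommendation should be evaluated against the specific use case.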

💡Query performance

Query performance refers to the efficiency and speed at which database queries are executed. Throughout the video, the speaker discusses how PlanetScale's schema recommendations aim to improve query performance by suggesting optimizations like adding indexes or removing redundant structures. The example provided illustrates how adding an index on a frequently queried column dramatically improved the query performance, reducing latency from nearly a second to near-instantaneous.
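The effect of adding an index can be seen in a query plan. The sketch below uses Python's built-in `sqlite3` module as a stand-in, since it needs no server; PlanetScale runs MySQL, whose `EXPLAIN` output looks different, so treat this only as an illustration of the scan-versus-index-lookup idea. The table, column, and index names are hypothetical.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Return SQLite's plan description for a query (last column of each row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, vendor_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (vendor_id, amount) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

sql = "SELECT * FROM orders WHERE vendor_id = ?"

# Without an index, every row is examined to find the few that match.
plan_before = query_plan(conn, sql, (42,))

# The kind of DDL a recommendation would contain.
conn.execute("CREATE INDEX idx_orders_vendor_id ON orders (vendor_id)")

# With the index, only the matching rows are visited.
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The first plan reports a scan of the whole table and the second a search using the new index. The exact wording of the plan text varies between SQLite versions.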

💡Percentiles

Percentiles, such as P50 and P99, are used to measure and report on query performance in databases. The P50 value is the latency below which 50% of queries complete, so it is the median rather than the mean. Higher percentiles like P99 give the latency below which 99% of queries complete, which helps expose outliers and worst-case behavior. The speaker mentions these metrics in the context of analyzing query performance data used by PlanetScale to generate schema recommendations.
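To make the difference between the median and the mean concrete, here is a small sketch using the nearest-rank definition of a percentile. The latency values are made up, and monitoring tools may use a different interpolation method, so the exact numbers are illustrative only.

```python
import math
from typing import List


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical query latencies in milliseconds, with one slow outlier.
latencies_ms = [12, 15, 11, 14, 13, 950, 16, 12, 13, 14]

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50} ms, p99={p99} ms, mean={mean} ms")
```

One slow query drags the mean to 107 ms while the p50 stays at 13 ms, which is why percentile metrics are preferred for describing typical and worst-case latency.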

Highlights

PlanetScale introduced a new feature called 'Schema Recommendations' that automatically provides recommendations to improve database performance, reduce memory and storage usage, and optimize the schema based on production database traffic.

Schema Recommendations use query-level telemetry to generate tailored recommendations in the form of DDL statements that can be applied directly to a database branch and then deployed to production.

PlanetScale's model allows creating a branch (an identical clone of an existing database schema), making changes to the schema, and then deploying it using a pull request-like process. They also keep the old database around for 30 minutes after deployment, writing to both databases, enabling easy rollback if any issues arise.

The current open recommendations for a database can be viewed in the 'Insights' tab, and weekly reports are also sent via email.

The first recommendation shown for the demo database is to remove redundant indexes, which can slow down writes and consume additional storage and memory.

PlanetScale has built a system called 'Schema Adviser' that uses Kafka to trigger background jobs to examine the schema and make recommendations based on query performance and statistics.

PlanetScale is the lead maintainer of Vitess, a system built to scale MySQL databases, and they have patched their fork of MySQL to enable better index recommendations.

The four types of schema recommendations currently supported are: adding indexes for inefficient queries, removing redundant indexes, preventing primary key ID exhaustion, and dropping unused tables.

The importance of database indexes is highlighted through an example where the lack of an index on a vendor ID column resulted in reading millions of unnecessary rows and incurring significant costs.

33% of PlanetScale databases have been found to have redundant indexes that could benefit from removal.

When an auto-incremented primary key passes 60% of the maximum value its column type allows, a recommendation is given to change the underlying column to a larger type.

A walkthrough is provided demonstrating how to apply a recommendation to add a new index, resulting in significantly improved query performance.

The 'p50' metric refers to the 50th percentile: 50% of requests were faster than the given value, which makes it the median rather than the mean.

While not technically AI, the Schema Recommendations feature is described as a 'co-pilot for your database,' assisting users in optimizing their databases without requiring expert knowledge.

PlanetScale's goal is to enable users who are not database experts to have an expert-quality database experience through features like Schema Recommendations.

Transcripts

play00:00

PlanetScale just introduced a really

play00:01

exciting new feature but before we go

play00:02

any further I do want to say they pay me

play00:05

sometimes they're not paying me for this

play00:06

video I was not asked to make this video

play00:08

but it does fall under our existing

play00:09

contract and I'm sure they're going to

play00:10

be pretty hyped about it they had no say

play00:12

in anything I'm discussing in this video

play00:14

I just wanted to react to this cuz I'm

play00:15

actually genuinely excited so knowing

play00:17

that let's take a look at schema

play00:19

recommendations which is an actually

play00:21

genuinely new idea I haven't seen others

play00:23

do before automatically receive

play00:25

recommendations to improve database

play00:26

performance reduce memory and storage

play00:28

and improve your schema based on

play00:30

production database traffic also shout

play00:31

out to Taylor and Rafer for writing this

play00:34

Taylor in particular I've worked with

play00:35

forever she's really good at what she

play00:36

does for the last 2 years we've been

play00:38

working on making PlanetScale Insights

play00:39

the best built-in MySQL database

play00:41

monitoring tool today we're releasing a

play00:43

significant upgrade schema

play00:44

recommendations with schema

play00:46

recommendations you will automatically

play00:47

receive recommendations to improve

play00:49

database performance reduce memory and

play00:50

storage and improve your schema based on

play00:52

production database traffic schema

play00:53

recommendations use query-level

play00:55

telemetry to generate tailored

play00:56

recommendations in the form of DDL

play00:58

statements that can be applied directly

play00:59

to a database branch and then

play01:01

deployed to production this fits really

play01:03

well within the existing PlanetScale

play01:04

model which if you're not familiar their

play01:06

whole thing is to do stuff kind of the

play01:07

same way that we do it in like GitHub

play01:10

where you create a branch which is an

play01:12

identical clone of an existing database

play01:14

schema doesn't have the data inside of

play01:15

but it's just the the shape of the

play01:17

models then you make changes to the

play01:18

schema and if all goes well you can then

play01:21

put it up for review people can approve

play01:23

it and then you make a deploy request similar

play01:25

to a pull request and merge that in and

play01:27

now you have your new database schema

play01:29

and initially this was cool by itself

play01:31

but where they've pushed it even further

play01:32

that's probably my favorite thing is

play01:34

once you've made that deploy they keep

play01:36

the old database around and they write

play01:38

to both databases for 30 minutes so if

play01:40

it turned out you made a mistake you can

play01:41

revert without losing any data even the

play01:44

data that was written in that time

play01:46

mind-blowing stuff so let's read more

play01:48

about this because I'm very curious how

play01:50

to use schema recommendations to find

play01:52

the schema recommendations for your

play01:53

database go to the Insights tab of your

play01:54

PlanetScale database and click view

play01:56

recommendations you'll see the current

play01:57

open recommendations for your database

play01:59

also if you're subscribed to your

play02:00

database's weekly DB report you'll get

play02:02

an email with your first recommendations

play02:04

the CEO of PlanetScale is actually in

play02:06

chat unplanned let's go give it a shot

play02:08

in the UploadThing production database

play02:10

PlanetScale here we have the databases

play02:12

for all of our core T3 stuff which is

play02:15

Ping stuff names are confusing don't

play02:17

worry about it here we have the UploadThing

play02:18

production database we go to the

play02:20

insights tab we have recommendations and

play02:22

we have three redundant indexes we have

play02:24

an index for the key for API key on user

play02:27

ID on the app and the app ID on file so

play02:31

our key for managing deletions also has

play02:34

the app ID key within it which isn't

play02:36

something I'd really thought about

play02:37

before for context on why we made this

play02:38

decision we had made a separate key for

play02:41

files that were deleted so it was easier

play02:42

for us to only select files that were or

play02:45

weren't marked for deletion when we

play02:46

did size calculations but since this

play02:48

index includes app ID the previous index

play02:51

that is just app ID is no longer as

play02:54

valuable as it used to be and this

play02:55

recommends that we drop that index that

play02:58

we no longer need the one slightly

play03:00

annoying part of doing this this way is

play03:02

that I have to go make a code change in

play03:04

our code base to match the change that's

play03:06

occurring here here's our actual

play03:07

database schema written of course in

play03:09

Drizzle for this project and we can see

play03:12

in here those indexes so we have app ID

play03:15

idx file key idx external ID and deleted

play03:19

since we have this deleted one we no

play03:21

longer need the app ID one so if I was

play03:23

to merge this change and then somebody

play03:25

was to do another push this would break

play03:28

and I would have to make sure that I've

play03:30

also removed this from my code it's a

play03:32

small thing and honestly the way I will

play03:34

probably use this is rather than

play03:35

applying the exact recommendation they

play03:37

tell me to I'm going to use this as a

play03:39

way to realize oh these are changes I

play03:41

should make in my code base and then I

play03:43

can go to my code I can delete this line

play03:45

and then do a traditional deploy

play03:46

request the way I normally do as an

play03:48

Insight this is incredibly informative

play03:50

and weirdly well written here yeah New

play03:53

Branch I don't know if somebody on the

play03:54

team already created this or if it was

play03:55

created for us but yeah just like we

play03:57

have branching in our code bases we have

play03:59

branching here too very very good and

play04:02

useful information I'm assuming the

play04:04

other ones here user ID we also have

play04:07

user ID plus tier as an index we no

play04:09

longer need the one that's just the user

play04:10

ID this all makes sense let's read a bit

play04:12

more about what else this can do because

play04:14

our our database is nice and hilariously

play04:16

simple because that's how we like to

play04:18

build but I'm curious how other people

play04:19

are using this thing and what other

play04:21

stuff it can recommend as here each

play04:23

recommendation comes with an explanation

play04:25

of the recommended changes the schema or

play04:26

query that it will affect the exact DDL

play04:28

that will apply the recommendations as

play04:30

well as the option to apply the

play04:31

recommended changes to a branch for

play04:33

testing and a safe migration you should

play04:34

evaluate each recommendation based on

play04:36

your specific use case read the schema

play04:38

recommendations documentation for more

play04:39

information on each recommendation

play04:41

that's cool there's a whole

play04:42

documentation page that describes in

play04:44

detail all of the things that it can

play04:46

make recommendations for and what you

play04:49

should do and what you should know about

play04:50

it adding indexes for inefficient

play04:52

queries removing redundant indexes

play04:54

preventing primary key ID exhaustion and

play04:55

dropping unused tables really good stuff

play04:57

A lot of people are missing these types

play04:59

of things you live it's just key

play05:01

deletions if we had more pressing things

play05:03

like if we needed to add a key I would

play05:05

absolutely do that but wasting a little

play05:07

bit of data and paying you guys a little

play05:08

bit more is the least of my concerns

play05:10

honestly my immediate takeaway when I

play05:12

saw this is I'm proud we're not missing

play05:13

any indexes anymore because we were

play05:15

missing indexes for a while so knowing

play05:18

we're not is cool once you better

play05:20

understand the recommendation you can

play05:21

apply the recommendation by either

play05:23

applying it directly with a database

play05:24

Branch with a few clicks or making the

play05:26

schema change directly in your

play05:27

application ORM code look they called it

play05:28

out I can just make it in my own code

play05:30

how PlanetScale detects schema

play05:32

recommendations in your database we've

play05:34

built a system that we internally refer

play05:36

to as the schema adviser that can make schema

play05:38

recommendations and understand when a

play05:39

schema change closes an existing open

play05:41

recommendation each time a production

play05:42

branch's schema changes within

play05:44

PlanetScale an event is emitted to Kafka this

play05:46

triggers a background job to examine the

play05:47

schema for potential

play05:49

recommendations interesting more and

play05:51

more people doing Kafka stuff recently

play05:53

which is cool to see if any viewers

play05:55

aren't familiar with Kafka already it is

play05:57

an ancient Apache technology for

play05:59

managing events and getting messages to

play06:02

and from things 80% of all Fortune 100

play06:04

companies are using it so uh does that

play06:06

mean PlanetScale's on the way to be

play06:07

determined we can determine the schema

play06:09

alone for some recommendations such as

play06:11

finding duplicate indexes we also use

play06:13

the database's recent query performance

play06:14

and statistics for other recommendations

play06:16

such as index recommendations this we've

play06:18

already been relying on quite a bit not

play06:20

necessarily the specific recommendations

play06:22

but the feedback on the insights tab

play06:24

where you have do we not have any

play06:25

anomalies right now that's a nice change

play06:27

usually we have some types of crazy anomalies

play06:29

that have big enough spikes in

play06:31

performance that we go and investigate

play06:32

and figure out what's causing them so we

play06:33

can look back to February 23rd and see

play06:35

we have this anomaly here which is from

play06:39

people uploading a bunch of files in a

play06:41

burst and our calculation for storage

play06:43

being used was not particularly great at

play06:45

the time so we can see all of this

play06:48

breakdown of what queries were taking

play06:50

how much time we had 22 queries per

play06:52

second seven rows are being written

play06:54

every second and this caused an anomaly

play06:56

which is really useful for us to dig

play06:58

into and see the specific queries that

play07:00

are causing these specific problems this

play07:02

has been a lifesaver for us as we try to

play07:04

debug more and more complex performance

play07:06

related issues with our databases we

play07:07

first identify potentially slow query

play07:09

candidates for index suggestions using

play07:10

the Insights query data we then use

play07:12

Vitess's query parser and semantic

play07:14

analysis utilities to extract potential

play07:16

indexable columns for the query when

play07:18

adding indexes column order is

play07:19

critically important to get that right

play07:21

we patched our Fork of MySQL to create

play07:23

another variant of the analyze table

play07:25

update histogram command that allows us

play07:27

to extract the cardinalities of each

play07:28

column without impacting the database's

play07:30

statistics table yes I went this far

play07:32

without saying MySQL and I'm proud of

play07:34

myself but it is important to know that

play07:35

not only is PlanetScale using MySQL

play07:37

they are the lead maintainers and

play07:39

effectively owners now of Vitess which is

play07:41

a system built to scale your MySQL

play07:43

databases much better big companies like

play07:46

Uber and Slack and even GitHub and

play07:48

YouTube itself have been using Vitess for

play07:50

a long time now to allow their MySQL

play07:52

databases to scale to insane numbers of

play07:55

users data consumers and all the other

play07:57

things your database needs to serve but

play07:58

that doesn't mean MySQL moves

play07:59

particularly fast I think it's fair to

play08:01

say anything in Oracle world is not

play08:04

particularly fast moving so PlanetScale

play08:06

continuing to maintain their fork that

play08:08

works perfectly with Vitess is fully

play08:10

MySQL compatible and is MySQL to have

play08:12

these types of features that they need

play08:14

in order to give us a good experience

play08:15

that's dope it's a really cool balance

play08:17

they found of existing standards modern

play08:19

open source tooling and a groundbreaking

play08:22

service and experience for users it is

play08:24

actually really cool with all this

play08:26

information combined we can make

play08:27

recommendations on how to improve a

play08:29

database's

play08:30

schema supported schema recommendations

play08:33

today we are launching with four

play08:34

different schema Recs but we will add

play08:36

more over time the first is adding

play08:38

indexes for inefficient queries which

play08:39

apparently we don't need we're on top of

play08:41

our indexes now so cool point two is

play08:43

that you can remove redundant indexes

play08:44

which we saw we have a bunch of probably

play08:46

go clean this up later another fun one

play08:47

they've added is the ability to prevent

play08:49

primary key ID exhaustion what does this

play08:51

mean let's say you're using integer IDs

play08:53

and you're possibly going to run out of

play08:55

integers soon this will warn you and say

play08:57

hey you probably shouldn't be using

play08:57

integers for that ID field anymore

play08:59

now we have the fourth thing it can do

play09:00

which is telling you to drop unused

play09:02

tables good old Bobby tables are going

play09:03

to love that one I'm sure adding indexes

play09:05

for inefficient queries indexes are

play09:07

crucial for relational database

play09:08

performance with no indexes or

play09:10

suboptimal indexes MySQL may have to

play09:12

scan a large number of rows to satisfy

play09:14

queries that only match a few records oh

play09:16

here it is spend 5K to learn how

play09:17

database indexes work this is an article

play09:19

I very very fondly remember I will say

play09:22

this problem has long since been solved

play09:23

as PlanetScale has fundamentally

play09:25

changed the pricing model this is

play09:26

impossible to do at this point in time

play09:28

but at the time pricing was based on how

play09:30

many rows you read and wrote and they

play09:32

didn't have indexes in their database

play09:34

since PlanetScale's performance is nuts

play09:36

it's able to read millions of rows

play09:38

really quickly and still get you a

play09:40

response but this comes with the problem

play09:42

that now you're doing a ton of work that

play09:44

they're billing you for just

play09:45

because it happens fast doesn't mean you

play09:48

meant to run a ton of stuff that you

play09:50

didn't want to in this example they had

play09:52

a pretty basic schema here the catch is

play09:54

that vendor ID was not indexed it's just

play09:57

a value they used to link things together

play09:59

and since it wasn't indexed and since

play10:01

there are no foreign keys in Vitess there

play10:04

kind of is now separate long

play10:06

story you'll see this example where

play10:08

you're selecting with vendor ID that

play10:10

this thing has to read way more rows

play10:13

than it's actually supposed to since

play10:14

he's getting back only 100 rows he

play10:16

assumed that it was going to be a $1.50

play10:19

per 10 million rows read so reading 100

play10:21

rows is fine but you also were

play10:23

inspecting all of those rows to do the

play10:25

lookup so every request that made this

play10:26

query actually cost them 15 cents

play10:28

because it was 1 million rows every time

play10:30

you did it it was still fast but the

play10:32

fact that you had to check a million

play10:33

rows on every request uses a lot of

play10:35

compute ended up costing them a lot of

play10:36

money and every request ended up being

play10:38

pretty expensive they ended up spending

play10:39

about $1,000 a day they added this one

play10:42

index which knocked it down a ton you

play10:45

can see here the amount of row reads

play10:47

they were getting plummeted immediately
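The billing arithmetic in this story can be checked with a few lines of Python. This is only a sketch using the figures quoted in the video ($1.50 per 10 million rows read, about 1 million rows scanned per query, 100 rows returned); the function and constant names are made up for illustration and are not part of any PlanetScale API.

```python
# Figures as quoted in the video; treat them as assumptions, not official pricing.
PRICE_PER_10M_ROWS_READ = 1.50       # dollars, under the old usage-based pricing
ROWS_SCANNED_PER_QUERY = 1_000_000   # rows inspected by the unindexed lookup
ROWS_RETURNED_PER_QUERY = 100        # rows the author expected to be billed for


def cost_per_query(rows_read: int) -> float:
    """Dollar cost of one query when billing is based on rows read."""
    return rows_read / 10_000_000 * PRICE_PER_10M_ROWS_READ


expected_cost = cost_per_query(ROWS_RETURNED_PER_QUERY)
actual_cost = cost_per_query(ROWS_SCANNED_PER_QUERY)

print(f"expected per query: ${expected_cost:.6f}")
print(f"actual per query:   ${actual_cost:.2f}")
print(f"overcharge factor:  {actual_cost / expected_cost:,.0f}x")
```

The gap between rows returned and rows scanned is the whole story: the query result looked small, but the billed work was the full scan.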

play10:50
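To see why one index changes the amount of work so much, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL. The `orders` table, its columns other than `vendor_id`, and the index name are hypothetical; the point is only that the query plan switches from a full table scan to an index lookup once `vendor_id` is indexed.

```python
import sqlite3

# In-memory SQLite database as a stand-in; the real story involved MySQL on PlanetScale.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, vendor_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (vendor_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE vendor_id = ?"


def query_plan(connection: sqlite3.Connection) -> str:
    """Return SQLite's human-readable plan for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY, (42,)).fetchall()
    # The last column of each plan row holds the textual description.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # no index yet: every row is scanned
conn.execute("CREATE INDEX idx_orders_vendor_id ON orders (vendor_id)")
plan_after = query_plan(conn)   # index available: only matching rows are visited

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording differs between SQLite versions and from MySQL's `EXPLAIN`, so read the output as an illustration of the idea rather than what PlanetScale would show.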

thankfully PlanetScale as mentioned in

play10:51

chat immediately wrote off the expense

play10:54

here didn't charge them anything and

play10:56

they got down to $150 a month which is a

play10:58

much more reasonable price than 5 grand

play11:00

over a few days and since then the

play11:02

author is still a very happy PlanetScale

play11:03

customer I think this was a great

play11:05

story that showcased both the flaws in the

play11:07

existing pricing model as well as how

play11:09

database indexes are important it was a

play11:11

great article went viral this was

play11:12

actually one of the first times I heard

play11:13

about PlanetScale I had just started

play11:15

playing with it at the time but seeing

play11:16

this and the response to it really got

play11:18

me to consider it more seriously so yeah

play11:20

adding indexes for inefficient queries

play11:22

is important and so much so that this

play11:24

might have saved that person a very very

play11:26

scary moment removing redundant indexes

play11:29

while indexes can drastically improve

play11:31

query performance having unnecessary

play11:32

indexes slows down writes and consumes

play11:34

additional storage and memory Insights

play11:36

scans your schema every time it is

play11:38

changed to find redundant indexes we

play11:39

suggest removing two types one is an

play11:41

exact duplicate index where the index

play11:43

has the exact same columns in the same

play11:44

order and the second is a left prefix

play11:46

duplicate index an index that has the

play11:47

same columns in the same order as the

play11:49

prefix of another index since you can

play11:51

just use chunks of the index as you go

play11:53

through it if two indexes have the same

play11:56

left side one of them stops and the

play11:58

other one goes further it matters a

play12:00

lot less that you have that first one

play12:01

you can use the second index and just

play12:03

use its leading columns as the prefix and read

play12:04

things super quick redundant indexes are

play12:07

remarkably common our initial set of

play12:08

recommendations found that 33% of PlanetScale

play12:11

databases have redundant indexes that

play12:13

they may benefit from removing yeah we

play12:15

had three of them preventing primary key

play12:17

ID exhaustion as new rows are inserted

play12:19

it's possible for auto incremented

play12:21

primary keys to exceed the maximum

play12:22

allowable value for the underlying

play12:24

column type as I mentioned before if

play12:26

you're using IDs that are like an

play12:27

integer and you have too many users or

play12:29

too many things in that column you'll

play12:30

run out of IDs now you're screwed if

play12:32

Insights detects that one column is

play12:34

above 60% of the maximum allowable value

play12:37

it'll recommend changing the underlying

play12:39

column to a larger type and then

play12:41

dropping unused tables pretty simple if

play12:42

a table's not being used over a large

play12:44

amount of time it will tell you to get

play12:46

rid of it yeah if there's any tables

play12:47

that are more than four weeks old and

play12:48

haven't been queried in the last 4 weeks
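The two kinds of redundant index described above (exact duplicates and left-prefix duplicates) can be sketched as a simple comparison of column lists. This is a hypothetical illustration, not PlanetScale's implementation: the function name, the index names, and the dictionary format are all assumptions, and it ignores details a real checker would need, such as unique indexes that enforce a constraint.

```python
from typing import Dict, List, Tuple


def find_redundant_indexes(
    indexes: Dict[str, Tuple[str, ...]]
) -> List[Tuple[str, str, str]]:
    """Return (redundant_index, covering_index, reason) triples.

    `indexes` maps an index name to its ordered tuple of column names.
    """
    findings = []
    names = sorted(indexes)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            cols_first, cols_second = indexes[first], indexes[second]
            if cols_first == cols_second:
                # Same columns in the same order.
                findings.append((second, first, "exact duplicate"))
            elif cols_second[:len(cols_first)] == cols_first:
                # `first` is a left prefix of the longer `second`.
                findings.append((first, second, "left-prefix duplicate"))
            elif cols_first[:len(cols_second)] == cols_second:
                # `second` is a left prefix of the longer `first`.
                findings.append((second, first, "left-prefix duplicate"))
    return findings


# Hypothetical indexes on a single table.
example_indexes = {
    "idx_a": ("user_id",),
    "idx_b": ("user_id", "created_at"),
    "idx_c": ("user_id", "created_at"),
    "idx_d": ("created_at",),
}
redundant = find_redundant_indexes(example_indexes)
for name, covered_by, reason in redundant:
    print(f"{name} is a {reason} of {covered_by}")
```

Note that `idx_d` is not flagged: `created_at` is the second column of the longer indexes, not their left prefix, so those indexes cannot serve lookups on `created_at` alone.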

play12:50
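The 60% threshold for primary key exhaustion is easy to reason about with the maximum values of MySQL's integer types. The sketch below is an assumption about how such a check could look, not PlanetScale's code; the function names and the example ID values are invented for illustration.

```python
# Maximum values of MySQL integer column types, keyed by (type, unsigned).
MAX_VALUE = {
    ("tinyint", False): 2**7 - 1,
    ("tinyint", True): 2**8 - 1,
    ("smallint", False): 2**15 - 1,
    ("smallint", True): 2**16 - 1,
    ("mediumint", False): 2**23 - 1,
    ("mediumint", True): 2**24 - 1,
    ("int", False): 2**31 - 1,
    ("int", True): 2**32 - 1,
    ("bigint", False): 2**63 - 1,
    ("bigint", True): 2**64 - 1,
}

WARN_THRESHOLD = 0.60  # the 60% figure mentioned in the video


def usage_fraction(current_max_id: int, column_type: str, unsigned: bool = False) -> float:
    """Fraction of the column type's ID space that is already used."""
    return current_max_id / MAX_VALUE[(column_type, unsigned)]


def should_warn(current_max_id: int, column_type: str, unsigned: bool = False) -> bool:
    """True when the largest ID is above the warning threshold."""
    return usage_fraction(current_max_id, column_type, unsigned) > WARN_THRESHOLD


# Hypothetical table whose largest auto-increment ID is 1.5 billion.
for col_type, unsigned in [("int", False), ("int", True), ("bigint", False)]:
    frac = usage_fraction(1_500_000_000, col_type, unsigned)
    label = f"{col_type}{' unsigned' if unsigned else ''}"
    print(f"{label:>13}: {frac:.2%} used, warn={should_warn(1_500_000_000, col_type, unsigned)}")
```

With a signed `INT` the table is well past the threshold, while the same ID fits comfortably in an unsigned `INT` or a `BIGINT`, which is why the suggested fix is a larger column type.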

good to know here's an example adding a

play12:52

new index to walk through an example

play12:54

applying a new recommendation we'll

play12:55

create a simple posts table sure we've

play12:57

all seen basically this exact table

play12:59

example projects selects so we have more

play13:01

rows in the posts table a pattern

play13:03

emerges the p50 time for a post title

play13:06

increases linearly our queries are

play13:08

taking nearly a second which is not

play13:10

good since we're querying for title a

play13:13

lot it can recognize that maybe we need

play13:15

an index on title and make that

play13:16

recommendation add new index ID post on

play13:19

title on table posts exactly what we

play13:21

were showing before just adds this index

play13:22

to the table you click create and apply

play13:25

and now instantaneously the amount of

play13:27

latency and the amount of effort it

play13:29

takes to do each of these queries goes

play13:30

down this is really really cool stuff I

play13:34

know it's not technically AI but it's

play13:36

the thing I'm excited about in that

play13:37

direction this is almost like Copilot for

play13:39

your database where once it's running

play13:40

it's telling you hey maybe you should do

play13:42

this hey maybe you should do this and as

play13:44

PlanetScale continues in its goal of

play13:45

making it so people who aren't database

play13:47

experts can have expert quality database

play13:50

experiences this makes a ton of sense

play13:52

and I am genuinely really hyped about

play13:54

what they're shipping here quick ask to

play13:55

the PlanetScalers in the chat is there

play13:57

anything important that I missed before

play13:59

I wrap up what is p50 the p50 is

play14:02

percentile is what the P stands for

play14:03

thank you as well in a set of queries in

play14:06

this example where you have 100 queries

play14:08

maybe 10 of them were instant like 3

play14:10

milliseconds and 10 of them were really

play14:12

slow like 3 seconds p50 would be the 50th

play14:16

percentile mark so what was the speed at

play14:19

that point so 50% or more requests were

play14:21

faster than this so p95 is 95% of

play14:24

requests were this fast or faster p99 is

play14:27

the point where 99% were this fast or faster it's

play14:29

a measurement for like the worst case of

play14:32

things so p50 is a pretty baseline

play14:34

average it should be really fast the

play14:36

much higher up ones like the 99th

play14:39

percentile is like this is all of our

play14:41

queries are falling within this range
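The percentile explanation can be made concrete with a small calculation. This sketch uses the nearest-rank definition of a percentile and the example from the video (100 queries, 10 at about 3 ms, 10 at about 3 seconds); the 50 ms latency for the remaining 80 queries is an assumption added for illustration. It also shows that p50 is the median rather than the mean.

```python
import math
import statistics
from typing import List


def percentile(values: List[float], p: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("percentile of an empty list is undefined")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# 100 query latencies in milliseconds; the middle 80 values are assumed.
latencies_ms = [3] * 10 + [50] * 80 + [3000] * 10

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
mean = statistics.mean(latencies_ms)

print(f"p50={p50} ms  p95={p95} ms  p99={p99} ms  mean={mean} ms")
```

The slow tail pulls the mean far above the typical request while leaving p50 untouched, which is why dashboards usually report p50 alongside p95 and p99 instead of a single average.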

play14:42

yeah I'm also sad the hobby tier of

play14:44

PlanetScale isn't as globally available

play14:46

anymore that was sad news I understand

play14:48

why but I was not happy to see it and I

play14:50

definitely am planning around that for

play14:52

future tutorials and things I've already

play14:53

gotten permission from PlanetScale for

play14:55

all of my future tutorials that use

play14:56

PlanetScale to also have a path for

play14:59

people that want to use something that's

play15:00

free in their region either through

play15:02

another service or through just locally

play15:03

hosting SQLite or something so I'm

play15:05

accounting for that we're working on it

play15:07

getting rid of Scaler yes and no the

play15:09

thing with Scaler is Scaler was the same

play15:12

metal as the cheapest Scaler Pro plan

play15:15

and when I was on Scaler I was hitting

play15:17

CPU limitations more than I was hitting

play15:19

number of read limitations so yeah as

play15:22

bad as I am at SQL PlanetScale is

play15:24

making me feel much better at it at the

play15:26

very least they're telling me when I'm

play15:27

doing things egregiously wrong and I

play15:29

certainly need that so I can focus on

play15:30

the things I love which are UI

play15:32

JavaScript full stack and making YouTube

play15:34

videos let me know what you guys think

play15:35

though cuz this is a really exciting

play15:36

project thank you as always see you guys

play15:37

in the next one peace nerds

Related Tags
Database Optimization, PlanetScale, MySQL, Performance Tuning, Schema Recommendations, Query Analysis, Developer Tools, Technical Tutorial, Software Engineering, Informational Video