Things every developer absolutely, positively needs to know about database indexing - Kai Sassnowski

Laracon EU
25 Jan 2019 · 41:49

Summary

TL;DR: In this talk, Kai Sassnowski gives developers a grounding in database indexing, emphasizing its importance for application performance. He clears up misconceptions, digs into the underlying B-tree data structure, and shows how functions and inequality operators affect index usability. Through a live coding session, he illustrates common pitfalls, such as the cost of full table scans and the importance of column order in multi-column indexes. The talk underscores that indexing is a nuanced, developer-centric task, crucial for optimizing query performance.

Takeaways

  • 🗣️ The speaker emphasizes the importance of database indexing for developers, highlighting that it's a crucial aspect of ensuring application performance.
  • 🔍 Indexes are primarily used to improve read performance in databases, which can significantly enhance the speed of queries and overall application responsiveness.
  • 📚 The analogy of a phone book is used to explain what an index is, illustrating how an ordered representation of data can expedite searches.
  • 🌳 The B-tree is introduced as the underlying data structure for database indexes, with a focus on its balanced nature to ensure efficient searching.
  • 🔑 The index only contains the values of the columns it's created on, and uses a row ID to reference the original table for data not stored in the index.
  • ⚡ The benefits of indexing include fast search capabilities due to binary search-like operations, with logarithmic scalability.
  • 🛑 However, indexing is not without trade-offs; it can slow down write operations such as inserts, updates, and deletes due to the need to update the index as well.
  • 👀 Understanding execution plans is vital for diagnosing how a database will use an index to execute a query, with different 'access types' indicating different strategies.
  • 💡 The talk demonstrates practical scenarios where mismanagement of indexes can lead to suboptimal performance, emphasizing the need for developers to have a deep understanding of how to design effective indexes.
  • 🚫 Common pitfalls include the misuse of functions in WHERE clauses which can invalidate the use of indexes, and the importance of column order in multi-column indexes.
  • 🔄 The order of columns in an index matters significantly, as does the presence of inequality operators which can limit the effectiveness of an index.
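The core read-performance point above can be seen in a small runnable sketch. The talk uses MySQL, but SQLite is self-contained enough for a demo; the `orders` table and index name here are hypothetical, not from the talk:

```python
import sqlite3

# Sketch using SQLite (the talk uses MySQL); the orders schema is hypothetical.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, total REAL, created_at TEXT)")

def plan(sql):
    # Each row of EXPLAIN QUERY PLAN describes one step; the last column is the detail text.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

before = plan("SELECT * FROM orders WHERE user_id = 42")   # no index yet
con.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = plan("SELECT * FROM orders WHERE user_id = 42")    # same query, with index

print(before)  # a full table scan ("SCAN ...")
print(after)   # an index search ("SEARCH ... USING INDEX idx_orders_user_id ...")
```

In MySQL, the equivalent check is `EXPLAIN SELECT ...` and reading the `type` column, as the talk demonstrates.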

Q & A

  • What is the main topic of the speaker's presentation?

    -The main topic of the speaker's presentation is database indexing and why every developer should understand its importance and nuances.

  • Why did the speaker change the title of Joel Spolsky's blog post in their talk?

    -The speaker borrowed the title format of Joel Spolsky's well-known post, swapping "Unicode" for "database indexing", because he liked the format and it suited his topic.

  • What is the speaker's profession and where is their workplace located?

    -The speaker is a software developer who works for a software development agency located in Berlin.

  • What is the common mistake the speaker sees in developers' understanding of database indexing?

    -The common mistake is that developers often think that adding an index to every column in the 'where' clause of a query will improve performance, without understanding the nuances and potential downsides of indexing.

  • What is the B-tree and why is it significant in the context of database indexing?

    -The B-tree is a balanced tree data structure used in databases to store indexes. It is significant because it allows for efficient searching, insertion, deletion, and access to data in a sorted manner.

  • Why is the order of data important when creating an index?

    -Order is important because it allows for more efficient searching, such as binary search, which is much faster on ordered data compared to unordered data.

  • What is the purpose of the doubly linked list in the leaf nodes of a B-tree?

    -The doubly linked list in the leaf nodes allows for efficient sequential scanning of data without having to go back up the tree each time, thus improving performance during searches.
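The answer above can be sketched with a toy structure (plain Python, not a real B-tree): once the first leaf of a range has been found by descending the tree, the sibling pointers let the scan proceed sideways without re-descending.

```python
from dataclasses import dataclass

# Toy leaf pages (not a real B-tree): each leaf points to its neighbours,
# so a range scan can walk sideways instead of climbing back up the tree.
@dataclass
class Leaf:
    keys: list
    prev: "Leaf | None" = None
    next: "Leaf | None" = None

def link(pages):
    # Wire up the doubly linked list; the outermost pointers stay None (null).
    for a, b in zip(pages, pages[1:]):
        a.next, b.prev = b, a
    return pages

def range_scan(start_leaf, lo, hi):
    out, leaf = [], start_leaf
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:          # past the upper bound: done, no tree traversal needed
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next        # follow the sibling pointer to the next leaf
    return out

leaves = link([Leaf([2, 5]), Leaf([8, 11]), Leaf([13, 21])])
print(range_scan(leaves[0], 5, 13))  # [5, 8, 11, 13]
```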

  • What additional piece of information does the database store along with the indexed values?

    -The database stores a row ID, which is a database-internal identifier that points to a specific row in a table, allowing the database to retrieve the full row if needed.

  • What is the difference between a 'range scan' and a 'full index scan' according to the speaker?

    -A 'range scan' uses the index to find the starting point of a range and then scans through the leaf nodes within that range. A 'full index scan' scans through every value in the index without using it for limiting or filtering the rows.
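Roughly the same distinction can be observed in a SQLite sketch (in the talk's MySQL terms these are the `range` and `index` access types; the schema here is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT)")
con.execute("CREATE INDEX idx_created ON orders (created_at)")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Range scan: seek to the start of the range, then walk the leaf nodes.
rng = plan("SELECT created_at FROM orders "
           "WHERE created_at >= '2013-01-01' AND created_at < '2014-01-01'")

# Full index scan: every index entry is read (here, to produce sorted output).
full = plan("SELECT created_at FROM orders ORDER BY created_at")

print(rng)   # "SEARCH ... USING COVERING INDEX idx_created (created_at>? AND created_at<?)"
print(full)  # "SCAN ... USING COVERING INDEX idx_created"
```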

  • Why did the speaker's initial attempt to optimize a query with an index not improve performance?

    -The initial attempt did not improve performance because the query involved a function on the indexed column (`YEAR` function on the `created_at` column), which prevented the database from using the index effectively.

  • What is the consequence of using functions on indexed columns in a query's WHERE clause?

    -Using functions on indexed columns in a WHERE clause can prevent the database from using the index, as the function's result may not correlate with the index values, thus negating the benefits of indexing.

  • What is the 'force index' and why should it be used with caution?

    -The 'force index' is a way to force the database to use a specific index for a query. It should be used with caution because it can lead to suboptimal query performance if the chosen index is not the most efficient for the query's needs.

  • What is the term used to describe an index that can be used to satisfy a query entirely from memory?

    -An 'index-only scan' describes a query that can be satisfied entirely from the index, because the index contains all the data the query needs, eliminating the table lookups and additional disk reads that would otherwise be required.

  • Why did adding the 'total' column to the index improve the performance of the query?

    -Adding the 'total' column to the index improved performance because it allowed the query to perform an 'index only scan', meaning all necessary data was available in the index, eliminating the need for additional disk reads.
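A sketch of the same effect in SQLite, which reports an index-only scan as a "covering index"; the report query from the talk is only approximated and the schema is hypothetical:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL, created_at TEXT)")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

query = ("SELECT SUM(total) FROM orders "
         "WHERE created_at >= '2013-01-01' AND created_at < '2014-01-01'")

# Index on created_at alone: the rows still have to be fetched to read total.
con.execute("CREATE INDEX idx_created ON orders (created_at)")
partial = plan(query)

# Adding total lets the index answer the query by itself (an index-only scan).
con.execute("CREATE INDEX idx_created_total ON orders (created_at, total)")
covering = plan(query)

print(partial)   # "SEARCH ... USING INDEX idx_created ..."
print(covering)  # "SEARCH ... USING COVERING INDEX idx_created_total ..."
```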

  • What is the significance of the order of columns in a multi-column index?

    -The order of columns in a multi-column index is significant because it determines which parts of the index can be used for filtering data in a query. The database can only use the index from left to right and cannot skip columns.

  • What is the impact of using inequality operators on the usage of a multi-column index?

    -Using inequality operators on any of the columns in a multi-column index can limit the effectiveness of the index. The index can only be used up to the point where the inequality operation is applied, as if the index stops at that column.
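A SQLite sketch of the inequality effect (hypothetical schema); the plan's detail text shows which index columns are actually used for seeking:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INT, created_at TEXT)")
con.execute("CREATE INDEX idx_uc ON orders (user_id, created_at)")

def plan(sql):
    return " ".join(r[-1] for r in con.execute("EXPLAIN QUERY PLAN " + sql))

# Equality on user_id: both index columns can be used for seeking.
eq = plan("SELECT * FROM orders WHERE user_id = 1 AND created_at > '2013-01-01'")

# Inequality on user_id: the index is only usable up to that column.
neq = plan("SELECT * FROM orders WHERE user_id > 1 AND created_at > '2013-01-01'")

print(eq)   # "... USING ... INDEX idx_uc (user_id=? AND created_at>?)"
print(neq)  # "... USING ... INDEX idx_uc (user_id>?)" -- created_at no longer seekable
```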

  • What is the final recommendation the speaker gives regarding indexing and query optimization?

    -The speaker recommends that developers should consider indexing as their concern, design indexes specifically for the queries they write, and understand that the context of data access is crucial for creating efficient indexes.

Outlines

00:00

🗣️ Database Indexing Introduction

The speaker, Kai Sassnowski, opens the session with an energetic introduction, expressing excitement about the event and the topic of database indexing. He references Joel Spolsky's blog post on Unicode, whose title he adapted for this talk, and shares his contact information. Sassnowski works for a software development agency in Berlin and mentions that it is hiring. He discusses the importance of understanding what a database index is and its role in improving query performance, which is crucial for application performance. He emphasizes that while developers generally understand the purpose of indexes, many lack deep knowledge about how to use them effectively, which is the gap he aims to address in his talk.

05:03

📚 The Theory of Indexing

Sassnowski dives into the theory behind database indexing, starting with the analogy of a phone book to explain how indexes work. He introduces the B-tree, the balanced tree data structure databases use for indexing. The speaker explains the properties of B-trees, such as all leaf nodes being at the same depth, which guarantees a consistent number of steps to reach any value. He also discusses the doubly linked list formed by the leaf nodes, which enables efficient sequential scanning. The section clarifies that an index only contains the columns it is created on, plus a row ID the database uses to retrieve the full row from the table when needed.

10:03

🔎 Understanding Execution Plans

The speaker discusses the importance of understanding execution plans for optimizing database queries. He explains that execution plans show the steps a database takes to execute a query, and the 'type' column in the plan indicates how the database accesses the data. Sassnowski outlines different access types like 'const', 'ref', 'range', 'index', and 'all', each with its implications for performance. He emphasizes that developers should aim for 'const' or 'ref' in their execution plans for efficient querying and avoid 'all', which indicates a full table scan.

15:03

🛠️ Live Coding and Indexing Pitfalls

Sassnowski presents a live coding session to demonstrate the practical application of indexing. He creates a hypothetical scenario where a report is requested for all orders placed in 2013. Initially, he makes a common mistake by adding an index on the 'created_at' column without considering the query's full requirements. This leads to a full table scan despite the index, highlighting the importance of aligning indexes with query needs. The speaker also points out the pitfalls of using functions in the WHERE clause, which can prevent the use of indexes.

20:06

🚀 Optimizing Queries with Indexes

The speaker revisits the previous query and explains how to optimize it by adjusting the index to include necessary columns for the query's SELECT and WHERE clauses. By creating a composite index that includes 'created_at', 'total', and 'user_id', Sassnowski demonstrates how to perform an index-only scan, which can dramatically improve query performance by eliminating disk reads. He emphasizes the trade-off between creating specific indexes for particular queries and maintaining a balance to avoid excessive indexing.

25:07

🔄 Adjusting Indexes for Changing Requirements

Sassnowski addresses a new requirement to modify the report for a single user's orders. He shows how adding a 'user_id' to the query without adjusting the index leads to a full table scan. The solution is to include 'user_id' in the index to avoid unnecessary disk reads. However, he also explains the limitations of multi-column indexes, noting that the order of columns in an index is crucial and that skipping columns or using inequality operators can limit the index's effectiveness.

30:07

🔍 The Nuances of Multi-Column Indexes

The speaker further explores the complexities of multi-column indexes, explaining that they can only be used from left to right and that inequality operators can restrict their use. He demonstrates how changing the order of columns in an index can affect its usability with different queries. Sassnowski emphasizes the importance of designing indexes specifically for the queries they will support, rather than creating generic indexes.

35:09

🏁 Conclusion: Indexing as a Developer Responsibility

In the conclusion, Sassnowski reiterates that indexing is a critical concern for developers. He stresses that indexes should be designed in the context of specific queries and that developers, who understand their data access patterns, are best positioned to create effective indexes. The speaker encourages developers to take ownership of indexing and to make informed decisions based on the needs of their applications and queries.

Keywords

💡Database Indexing

Database indexing is the process of creating a data structure that improves the speed of data retrieval operations at the cost of additional writes and storage space to maintain the index. In the context of the video, indexing is crucial for performance optimization as it can drastically reduce the time taken to execute queries. The speaker emphasizes the importance of understanding indexing for developers, as it directly impacts application performance.

💡Query Performance

Query performance refers to how quickly a database can retrieve data in response to a query. The video discusses the impact of slow queries on application performance and how proper indexing can mitigate this by making reads faster. The speaker uses the analogy of a site being 'super snappy' or 'literally unusable' to illustrate the difference that query performance can make.

💡B-Tree

A B-Tree is a balanced tree data structure that maintains sorted data and allows searches, insertions, and deletions in logarithmic time. In the video, the B-Tree is introduced as the underlying data structure for database indexes, which ensures that the index remains balanced and efficient for retrieval operations, regardless of the size of the data set.

💡Index Design

Index design involves creating indexes that are tailored to the specific needs of the queries being executed. The video script highlights that a poorly designed index can actually degrade performance. The speaker illustrates this with examples, such as creating an index that includes necessary columns for a query to run faster.

💡Execution Plan

An execution plan is a detailed report that shows the steps a database engine takes to execute a query. In the video, the speaker explains how to read and interpret execution plans to understand how a query is being processed and whether the database is effectively using indexes to improve performance.

💡Index Only Scan

An index only scan occurs when a query can be satisfied entirely using the index without having to access the underlying table. The video explains that this can significantly improve query performance since all necessary data is in the index, reducing the need for disk reads. The speaker demonstrates this concept by creating an index that includes all columns needed for a specific query.

💡Full Table Scan

A full table scan happens when the database engine reads every row in a table to satisfy a query, which is often an inefficient operation. In the video, the speaker discusses how a full table scan can sometimes be faster than using an index if the query involves aggregation and the index does not include all necessary columns.

💡Multi-Column Index

A multi-column index is an index that includes more than one column, allowing for more complex query patterns. The video script explains that the order of columns in a multi-column index is significant and that the index can only be used from left to right, which affects how effectively it can be used for a given query.

💡Function-Based Index

A function-based index is an index created on the result of a function applied to a column. The video mentions that while some databases support function-based indexes, others do not, and the speaker suggests alternative strategies for dealing with functions in queries to maintain index usability.

💡Index Pitfalls

Index pitfalls refer to common mistakes or misunderstandings when designing and using indexes. The video script outlines several pitfalls, such as the misuse of functions in where clauses, the importance of column order in multi-column indexes, and the limitations of using indexes with inequality operators.

Highlights

The importance of understanding database indexing for developers to improve application performance.

Indexing is not just about faster reads; it's a nuanced aspect of database management.

The analogy of a phone book to explain the concept of an index in a database.

Introduction of the B-tree as the fundamental data structure behind database indexes.

The significance of order in an index for efficient data retrieval.

How the B-tree maintains balance to ensure consistent search performance.

The role of leaf nodes and doubly linked lists in efficient index traversal.

Clarification that an index does not contain all table data, only the indexed columns and a row ID.

The impact of using functions on columns within a WHERE clause, negating the use of indexes.

The concept of an 'index-only scan' and its potential for maximizing query performance.

The pitfalls of designing indexes without considering the specific queries they need to serve.

The demonstration of how adding unnecessary columns to an index can degrade performance.

The importance of column order in multi-column indexes and how it affects query performance.

The issue with inequality operators in multi-column indexes and their effect on index utilization.

The practical example of optimizing a query for orders placed in 2013, illustrating the complexities of indexing.

The conclusion emphasizing that indexing is a critical concern for developers, not just database administrators.

Transcripts

play00:00

[Music]

play00:02

[Applause]

play00:05

[Music]

play00:09

[Applause]

play00:12

welcome everyone let me start my timer

play00:14

really quickly so welcome everyone I am

play00:18

super excited to be here it's been an

play00:20

awesome event so far I've really been

play00:21

enjoying myself greatly talking to all

play00:23

sorts of cool people here I'm slightly

play00:25

mad at risky for hanging the bar so high

play00:27

but anyways I am here to talk to you

play00:30

about things that I believe every

play00:32

developer absolutely positively needs to

play00:35

know about database indexing which by

play00:38

the way is a the title of an old blog

play00:40

post by Joel Spolsky which was called

play00:42

things I believe every developer

play00:44

absolutely positively needs to know

play00:45

about Unicode so I just changed the last

play00:48

words because I thought that was cool my

play00:51

name is Keselowski you can find me at my

play00:54

blog where I post once every two years

play00:56

on github or on Twitter where I post

play00:59

once every three to four months I work

play01:02

for this awesome group of people we are

play01:03

a software development agency in Berlin

play01:06

we are hiring so if you're interested

play01:08

come talk to me after the talk now when

play01:13

I interview people I like to ask them

play01:16

the question what is a database index

play01:18

and what do we use it for and most

play01:21

people are able to answer the question

play01:24

something like this well an index

play01:27

makes reading faster improves query

play01:30

performance and yes that is that is

play01:34

actually correct that is the answer of

play01:35

why we index our database now the

play01:37

question is like why do we want fast

play01:39

reads then well turns out slow queries

play01:42

are one of the most common if not the

play01:44

most common causes of poor application

play01:46

performance I believe we have all

play01:48

experienced at least once in our careers

play01:50

where we just have that one query that

play01:52

when it runs it just grinds everything

play01:54

to a halt a slow query can be the

play01:57

difference between a site being super

play01:58

snappy and being literally unusable and

play02:01

sometimes the difference between these

play02:02

two extremes is just one good index away

play02:05

so if most developers can already answer

play02:09

the question of why do we use an index

play02:11

to improve read performance and I

play02:14

believe we can all agree on this that

play02:15

slow queries are in fact

play02:18

what am I doing here why am i talking to

play02:21

you right now

play02:22

the reason is that developers do not

play02:25

know enough about indexing indexing is a

play02:27

lot more nuanced than just throwing an

play02:29

index at every column in your where

play02:31

clause I'm hoping that one of them

play02:32

sticks if you don't know what you're

play02:34

doing a bad index can actually make

play02:36

performance worse excuse me I got a

play02:39

little emotional there so here's what's

play02:42

gonna happen I won't be able to make you

play02:44

into an expert there's simply not enough

play02:45

time for that

play02:46

my main goal for this talk is to show

play02:49

you how much there actually is to this

play02:51

topic if you leave this talk thinking

play02:54

man I should really read up on this

play02:55

indexing stuff then I'm happy to have

play02:58

done my job nevertheless we are gonna

play03:00

learn a few practical things but first

play03:02

there's gonna be some theory we are

play03:05

gonna talk about what an index actually

play03:07

is and I'm talking data structures this

play03:10

is to me the most important part of the

play03:11

talk because everything else everything

play03:14

about how an index behaves in certain

play03:16

situation why it sometimes works and

play03:17

sometimes doesn't can be deduced from

play03:20

understanding how the underlying data

play03:21

structure works you don't have to

play03:23

memorize everything that's coming up in

play03:25

the rest of the talk if you understand

play03:27

this part of it you're in a very good

play03:29

shape then points two and three are

play03:32

understanding the execution plan and

play03:34

common pitfalls when designing indices

play03:37

and these are gonna happen roughly at

play03:39

the same time we're gonna explore both

play03:41

as we go through the example okay

play03:45

so what's an index when I ask people

play03:47

what an index is most of them like to

play03:51

use the sort of mental image of a phone

play03:54

book I don't know how much longer I will

play03:56

be able to make that example until

play03:59

people go a phone what case in point I

play04:02

could not find a picture of a phone book

play04:05

on the stock site I was using so I

play04:08

improvised here's a phone in a book okay

play04:12

if you think about a phone book for

play04:14

simplicity's sake let's say it has three

play04:16

pieces of information it has the first

play04:18

name the last name and the phone number

play04:20

and a phone book is almost always going

play04:23

to be ordered by last name so if I give

play04:27

you someone's last name you can find

play04:28

the person in the phone book pretty

play04:30

quickly because the

play04:31

book is ordered for exactly that type of

play04:33

search query if I only give you

play04:36

someone's first name you're gonna have

play04:38

to look through every single entry and

play04:39

you probably end up with a whole bunch

play04:41

of results so you could say in a phone

play04:45

book is like having an index on last

play04:47

name more generally speaking an index is

play04:51

an ordered representation of the indexed

play04:53

data now why is order so important turns

play04:58

out what we're doing when we're querying

play04:59

our data is we are searching our data

play05:02

for a subset of that data and it turns

play05:06

out that searching an ordered input is a

play05:08

lot more efficient than searching in an

play05:10

unordered input think binary search

play05:12

right very very fast assumes your input

play05:15

is sorted okay enough with all the

play05:18

mental models and helper images the

play05:20

training wheels are about to come off we

play05:22

are going to look at what an index

play05:23

actually is introducing the b-tree what

play05:26

does the B in B tree stand for it is not

play05:30

binary thank you for saying binary it is

play05:33

in fact balanced it is a balanced tree

play05:36

we're gonna look at what exactly that

play05:39

means and how it is balanced but it is a

play05:42

balanced tree I was hoping I wouldn't I

play05:44

wasn't sure how I was gonna react with

play05:45

someone set balanced so thank you

play05:48

whoever said binary okay so what is the

play05:50

B tree looked like surprisingly it looks

play05:53

a lot like a tree now if you look at the

play05:57

very top node at 23 this is called the

play06:00

root node you can see that it has two

play06:02

sub trees it has the left sub tree and

play06:04

has the right sub tree all the values in

play06:06

the left sub tree are less than 23 and

play06:08

all the values on the right sub tree are

play06:10

greater than or equal to 23 and this is

play06:13

true for every node in the tree the left

play06:15

sub tree is always less than the right

play06:17

sub tree is always greater than or equal

play06:18

to this is how the tree is ordered what

play06:24

we can also see is that the nodes at the

play06:27

very bottom these are called the leaf

play06:29

nodes they don't have any further sub

play06:31

trees the leaf nodes are all at the same

play06:34

level of the tree or the same depth of

play06:37

the tree this is important because this

play06:41

way we can guarantee that it takes the

play06:42

same

play06:43

number of steps to find every node or

play06:46

any value in the tree if we had one side

play06:49

of the tree there was just way deeper

play06:50

right and it was unbalanced then

play06:52

searching for a value in that part of

play06:54

the tree would take longer which is not

play06:55

what we want okay so this is why it's a

play06:59

balanced tree the leaf nodes will always

play07:01

be at the same level the leaf nodes also

play07:05

form what's called a doubly linked list

play07:08

which you can see by the tiny arrows I

play07:10

added pointing back and forth so each

play07:13

set of leaf nodes has a pointer to the

play07:15

next set of leaf nodes and I pointed to

play07:17

the previous set of leaf nodes so there

play07:19

should actually be two more arrows on

play07:21

either side pointing to null now why is

play07:24

this important

play07:25

turns out that when we're searching for

play07:26

data in an index we are gonna spend a

play07:29

lot of our time down in those leaf

play07:31

nodes scanning through them if we didn't

play07:34

have a doubly linked list we would have

play07:36

to as soon as we hit the end of one of

play07:38

those leaf nodes we'd have to go back up

play07:39

the tree find the next leaf node start

play07:42

scanning go up again and so on that's

play07:44

not very efficient by having the doubly

play07:47

linked list we can just scan through

play07:48

them sequentially without having to go

play07:50

back up the tree every time so an

play07:55

interesting question now is what data is

play07:57

actually being stored on the index if we

play08:01

have a very simple table like this we

play08:04

just have employee ID and first names

play08:07

and say we put an index on the name

play08:10

column the index you would end up with

play08:13

looks like this the employee ID employee

play08:17

ID is nowhere to be found on this index

play08:20

it only contains the name in other words

play08:23

an index only contains the values of the

play08:26

columns you actually put the index on

play08:28

now this sounds very dull but I've heard

play08:32

people describe an index as well it's

play08:34

like a copy of your table that is

play08:36

ordered by a different column that is

play08:37

incorrect that actually has consequences

play08:40

and we're gonna see why it's important

play08:42

to know what data is actually being

play08:43

stored on the index this kind of begs

play08:46

the question now what if I need data

play08:50

that is not on the index how do I get back

play08:53

to the table if I actually want to find

play08:55

out

play08:56

what is the employee ID of Taylor so

play09:00

this image is actually incomplete the

play09:03

database stores one additional piece of

play09:04

information on the index for every value

play09:07

the so-called row ID it's not literally

play09:10

a string row ID it is a database

play09:13

internal identifier that identifies a

play09:17

particular row in a table it is not your

play09:19

primary key it's a database internal

play09:21

value and so that the database stores

play09:24

that bit of information for every value

play09:26

in the index and that way we can go back

play09:28

to the table and find the corresponding

play09:31

row to which this value in the index

play09:35

belongs to okay so much for the data

play09:39

structure now what are the consequences

play09:41

of choosing that data structure

play09:43

first of all searching for a value is

play09:46

very very fast because essentially what

play09:49

you're doing is binary search right with

play09:51

every step you go either left or right

play09:53

and you eliminate a very large portion

play09:55

of your remaining data and so you can

play09:57

narrow down your search very quickly and

play10:00

either find the value or find out that

play10:03

there is no result and it scales really

play10:06

really well because it has logarithmic

play10:09

scalability which is a fancy way of

play10:10

saying it pretty much doesn't matter how

play10:12

large your input gets this will still

play10:14

keep performing okay so if indexing

play10:19

makes our queries faster then we should

play10:22

just put an index on every single column

play10:25

right chance to redeem yourself whoever

play10:26

said binary right no okay of course not

play10:33

there are no silver bullets we should

play10:35

notice by now there's always only

play10:36

trade-offs using an index improves

play10:39

read time yes but it makes writing slower

play10:43

with every insert update and delete you

play10:46

perform you also have to insert update

play10:48

and delete into the index which would

play10:50

mean the index will have to be

play10:52

potentially rebalanced which can take a

play10:54

while and if you have a whole bunch of

play10:56

indices your writes are gonna be very

play10:58

very slow so as a rule of thumb you want

play11:01

to have as many indices as necessary but

play11:04

as few as possible okay moving on

play11:09

understanding the execution plan and in

play11:12

tiny font it says MySQL version this

play11:15

is because the output of the execution

play11:18

plan looks different for each database

play11:20

vendor and I've just made a gamble that

play11:23

most of us here will be using MySQL

play11:26

who here is not using MySQL or Maria

play11:28

DB okay two people like okay so sorry

play11:33

but you're gonna have to do a bit of

play11:34

extra work the concepts I'm going to

play11:38

talk about are gonna be the same across

play11:39

database vendors it's just that they use

play11:41

different terminology to speak about

play11:43

them now what the execution plan

play11:46

actually is it's these steps that the

play11:48

database needs to perform to execute

play11:51

your query so here's how you get the

play11:55

execution plan that's pretty uniform

play11:57

across all database vendors you just

play11:58

prepend an explain to your query so

play12:01

instead of saying select star you say

play12:03

explain select star and then on my

play12:07

sequel you will get a result that looks

play12:09

something like this do note that this is

play12:12

actually one row and I have just

play12:14

formatted it that way so it fits better

play12:16

on a slide so it is one row and the

play12:19

columns are ID select type table

play12:21

partitions and so on now that's a lot of

play12:24

rows we're not gonna look I'm sorry

play12:26

columns we're not going to look at all

play12:29

of them but we're going to look at the

play12:30

most important one right now which is

play12:32

the type column right here it says type

play12:35

Const if you look up the column in the

play12:38

documentation it'll call it the join

play12:40

type which I think is not the best name

play12:43

for it because like in this example

play12:45

there is no joins I'm not really sure

play12:47

what it's referring to right it's a bit

play12:49

confusing to me I think a better name

play12:52

for this would be access type because

play12:55

what this really tells us is how the

play12:57

database is going to access our data how

play13:00

exactly it is going to use an index or

play13:03

not use an index to execute our query so

play13:07

we're gonna go through the possible

play13:09

values you might encounter in the

play13:10

explain output in that column and talk

play13:13

about the characteristics of each one of

play13:14

them really quickly don't worry if this

play13:16

is all a bit much right now we will come

play13:18

back to them as we go through the

play13:20

examples is just so that you've seen

play13:22

them and have a rough on

play13:23

standing in how they differ so starting

play13:27

with cons or a craft now these two

play13:32

values are actually distinct values they

play13:34

might appear on their own in the

play13:36

explained output I have grouped them

play13:38

together here because for our purposes

play13:40

they behave the same way they access the

play13:42

data in the same way so I'll treat them

play13:44

as one

play13:45

so what constant EKF do is the database

play13:51

is gonna perform a bee tree traversal to

play13:54

find a single value in the index tree so

play13:58

basically you're doing binary search

play14:01

right this can only get used if you can

play14:05

somehow guarantee uniqueness of your

play14:09

result set that means you can somehow

play14:10

guarantee that that can be at most one

play14:12

result there's two ways you can do this

play14:14

one would be to have a primary key on

play14:16

the column which is what I had in the

play14:18

example I had a where clause on the ID

play14:20

the other one is you have a unique

play14:23

constraint on that column fun fact limit

play14:26

does not guarantee uniqueness if you say

play14:28

limit 1 it does not guarantee uniqueness

play14:31

because you might still fetch more than

play14:33

one row you're just discarding them

play14:34

afterwards okay so in limit does not

play14:36

guarantee uniqueness ok so basically

play14:41

you're starting at the top of the tree

play14:42

and then you go left or right depending

play14:45

on if the value is less than or greater

play14:46

then until you either find a value or

play14:48

you find out that there is no value and

play14:51

as you can imagine that is super fast if

play14:54

you seek on story craft in your explain

play14:56

output stop optimizing it's not gonna

play14:58

get any faster moving on ref and range

play15:01

again these are two distinct values but

play15:03

we're grouping them together because for

play15:04

our purposes they are the same they

play15:06

behave the same way they are known what

play15:10

they are known as an index range scan

play15:13

what they do is they perform a b-tree

play15:16

traversal but instead of finding a

play15:18

single value they find the starting

play15:20

point of a range and then they scan from

play15:24

that point on so let's say we had a

play15:26

query where we say where ID greater than

play15:28

15 and less than 30 what this would do

play15:32

is it would do a bee tree traversal to

play15:34

find the first value that is greater

play15:35

than 50

play15:36

and from that point on it'll start

play15:39

scanning through the leaf nodes remember

play15:41

this is where the double linked list

play15:42

comes in until it hits the first value

play15:44

that is less than or greater than or

play15:47

equal to 30 and these rows are the only

play15:50

rules that database has eaten has to

play15:52

look at okay so what a range or a ref

play15:55

does is it limits the total number of

play15:58

rows the database has to inspect to

play16:00

perform your query which is a good thing

play16:02

right less work for the database next

play16:07

one index this is also known as a full

play16:10

index scan we are still using the index

play16:14

but we are not using it to limit the

play16:16

number of rows like we just did we are

play16:18

literally starting at the very first

play16:20

leaf node and we're gonna scan through

play16:22

all of them until we hit the very last

play16:24

leaf node okay so there's no filtering

play16:26

of anything going on you simply traverse

play16:29

through the entire index but you're

play16:31

still using the index and lastly there

play16:34

is all which is a so-called full table

play16:37

scan which is everything as bad as it

play16:39

sounds you do not want this think of all

play16:42

as avoid at all costs okay a full table

play16:45

scan is pretty much never what you want

play16:47

you do not use an index at all you are

play16:51

gonna load every row and every column of

play16:53

the table into memory go through them

play16:55

one by one and then omit or discard them

play16:58

depending on whether or not they fulfill

play17:00

the filter criteria okay so to recap we

play17:04

have cons or a craft basically binary

play17:07

search to find a single value super fast

play17:09

stop if you got that it's perfect next

play17:13

we have referent we are using the index

play17:15

to limit the number of rows to a certain

play17:17

subset of rows that we even have to look

play17:19

at next we have index we still use the

play17:23

index but we're not using it for

play17:24

limiting or filtering we're just

play17:25

scanning through every value in the

play17:27

index and lastly we have all avoided all

play17:29

costs the full table scan where we just

play17:31

load up everything and go through it
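As a sketch of how these access types show up in practice (using the orders table from the upcoming demo; the exact type MySQL reports depends on your schema and data, so treat the annotations as typical outcomes rather than guarantees):

```sql
-- Primary-key lookup: at most one row, so EXPLAIN typically reports type: const
EXPLAIN SELECT * FROM orders WHERE id = 42;

-- Range predicate on an indexed column: typically type: range
EXPLAIN SELECT * FROM orders WHERE id > 15 AND id < 30;

-- No usable index on the filtered column: typically type: ALL (full table scan)
EXPLAIN SELECT * FROM orders WHERE total = 9900;
```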

play17:34

Okay, so this is where the live coding part comes in; this is where it gets scary. Let me see: you don't see anything, so I'm going to have to mirror my displays so I don't break my neck. There we go. Oh no, I'm giving everything away.

play18:02

So let's see. I have prepared an example database that, unfortunately, contains just a single table, called orders. It has roughly 2.5 million rows in it, and it's just garbage data, randomly generated. It has four columns: the id; the total, which is the total amount of the order in cents; the user_id (there's not even a users table); and the created_at date, when that order was placed.

play18:33

Now here's what we need to do. Management comes to us and says: we want a report, we want to know the total sum of all orders that were placed in 2013. Let's see, sounds easy enough. So we're going to do something like this: we say SELECT the SUM of the total column FROM the orders table, WHERE... okay, where what? Orders placed in 2013. So you might do something like this; I have seen this. There is a YEAR function in MySQL that extracts the year part from a date and returns it as either a number or a string, I'm not quite sure. So you can say WHERE YEAR(created_at) = 2013. You run that, it returns a value, looks plausible enough; you commit, you force-push, you go home.

play19:35

Next day your boss comes to you and says: yes, we got the report, the numbers look good, but oh my god is it slow, can you please optimize? And so you say: okay, I need to improve read performance, that means I need an index. I only have one value in my WHERE clause, just the created_at, so I'm going to put an index on that. Let's do that; I'm going to use the GUI for this because it's faster. I add an index (I don't care what it's called) on the created_at column of the table. I save that, and it actually takes a few seconds, as you can see, because it has to build up the index for the first time. And now we rerun the same query, and it's still around 600 milliseconds. It did not change at all.
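In SQL terms, the query and the index from this step look roughly like this (the index name is my own choice; the GUI picks its own):

```sql
-- The report query, filtering with a function on the column
SELECT SUM(total)
FROM orders
WHERE YEAR(created_at) = 2013;

-- The first attempt at an index: created_at only
ALTER TABLE orders ADD INDEX orders_created_at_index (created_at);
```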

play20:28

Okay. At this point you're thinking back to Laracon EU 2018: there was this guy talking about database indexing, and there was something about EXPLAIN statements, and maybe that can help. So let's just look at the query execution plan. I prepend EXPLAIN to this query, and I get back this output, this time as a row, not as a bunch of columns like on the slide. Right away we can see it says type: ALL. We are doing a full table scan; we are loading all 2.5 million rows into memory and then going through them one by one. That sounds terrible.

play20:55

What's weird is the two columns next to the type. I don't think you can read this, but they're called possible_keys and key. possible_keys contains the names of the indices the database says could potentially be used for this query, and key contains the name of the index it actually used. Now the interesting thing is, and this is kind of hard to see because for some reason it sort of cuts off half of the text: possible_keys is NULL. So for some reason, even though we have an index on that column, it is not even being considered. The database thinks this index cannot work at all. So what is going on?

play21:53

Which brings me to the first pitfall: functions. If you have a query where your WHERE clause says WHERE YEAR(created_at) = something, what the database essentially sees is this: it doesn't matter what goes into the function, because the column will not be seen by the database at all. This is because you cannot guarantee that the output of the function has anything to do with the indexed values. Let's assume that instead of YEAR you have a function that calculates the number of characters in a string, so the output would be a number, but your index is on just the string field. That can't possibly work, because now you're trying to use an index that's on a string column with the result of a function call that returns a number; there's no connection between those two values. So: if you use a function on a column in a WHERE clause, you cannot use an index on that column anymore. By saying YEAR(created_at), we have lost the ability to use the index on the created_at column. So this sucks.

play23:00

What can we do? There are a couple of things. Postgres has something called function-based indices: instead of putting the index on the created_at column, you put the index directly on YEAR(created_at). It's like a computed index, and then this behaves correctly. MySQL doesn't have that; even in MySQL 8, I checked this morning, it doesn't have function-based indices. It has something a bit similar called generated columns: you add a new column to your table, and in its declaration you say the value of this field will be the result of calling YEAR(created_at) on that row. It's a generated, computed field, if you want, and since it's just a regular column in your table, you can put an index on that column and it would work.

play23:55

But in our case, please do not do this. This is not how you work with dates in a database. Do not use a year column, a month column, a day column, an hour column. There are ways to deal with dates in the database that are much better suited to this; there's a reason there's a datetime field in a database.
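For completeness, here is roughly what the two alternatives just mentioned look like (sketches only; the column and index names are my own, and, as just said, neither is the right fix for this date query):

```sql
-- PostgreSQL: an expression (function-based) index
CREATE INDEX orders_created_year_idx ON orders ((EXTRACT(YEAR FROM created_at)));

-- MySQL: a stored generated column, plus an ordinary index on it
ALTER TABLE orders
  ADD COLUMN created_year SMALLINT GENERATED ALWAYS AS (YEAR(created_at)) STORED,
  ADD INDEX orders_created_year_idx (created_year);
```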

play24:13

So what we're going to do is get rid of this, and think about what we want: all orders that were placed in 2013. So why not just define the explicit range we are looking for? Why not say WHERE created_at BETWEEN the first possible value in 2013, which is the first of January at midnight, and the last possible value in 2013, which is the 31st of December at one second before midnight? This is how you deal with dates, because now you can do much more interesting things: you can say give me all rows between the 3rd of April at 7:00 in the morning and the 28th of August at 3:00 in the afternoon. You can't do that with a year column.

play25:08

So it looks like we just solved our problem. We removed the function, it should be blazingly fast, the place is going to go wild. Let's just look at the EXPLAIN statement for now. Oh boy, still a full table scan. But something changed: the possible_keys column. Now at least it sees our index as something that could potentially be used, but for some reason it is deciding not to use it. And this is the point where, if you don't know what's going on, you might be tempted to make a mistake like this.
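The rewritten query, and the "mistake" that comes next, look roughly like this (index name assumed; FORCE INDEX is MySQL-specific):

```sql
-- A sargable date filter: an explicit range instead of YEAR(created_at)
SELECT SUM(total)
FROM orders
WHERE created_at BETWEEN '2013-01-01 00:00:00' AND '2013-12-31 23:59:59';

-- Do-not-try-this-at-home territory: overriding the optimizer's choice
SELECT SUM(total)
FROM orders FORCE INDEX (orders_created_at_index)
WHERE created_at BETWEEN '2013-01-01 00:00:00' AND '2013-12-31 23:59:59';
```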

play25:44

You say: piece-of-crap database, I know better than you, I just designed this handcrafted index for exactly this query, you should really use it. So in MySQL, and this is do-not-try-at-home territory, you can do FORCE INDEX and give it the name of the index. If it begins with FORCE, it's probably a bad idea. So let's run the EXPLAIN statement with the FORCE INDEX. Okay, take a look at that: we changed from a full table scan to a range scan. Thinking back to the previous slides, a range scan means we are now using the index to find the beginning of a range, then scanning until we hit the end of the range, and not looking at anything else.

play26:32

We can actually double-check that. MySQL's EXPLAIN output has this column back here called rows, which is terribly named, like so many things in this output, because rows is not the number of rows in your result set; it is the number of rows the database estimates it has to look at to execute your query. In this case it's around 466 thousand, so that's around a fifth of our rows. If I just remove the FORCE INDEX again to see the full table scan, rows says 2.4 million. So with the full table scan we have to look at everything, and with the range scan we only have to look at 466 thousand. We've just proven the database wrong, everything is better now. We remove the EXPLAIN from the query, run it, it's going to be amazing, we're going to get promoted... and take a look at this. Oh my god, it is still running. It just took 4 seconds.

play27:39

So we have clearly just discovered a bug in MySQL; we should go and open a ticket and flame the maintainers about this piece of software. Don't do it. (There are some people pulling out their phones. Don't, this is a joke.) But this is confusing if you don't know what's going on; this is super weird. And this is what I meant: if you don't know what you're doing, you can make performance worse. It just went from 600 milliseconds to 4 seconds. I'm bad at math, but that's more than twice as slow, I think.

play28:12

So what's going on? This comes back to what data is actually stored in the index. Think back to the index: we have an index on only the created_at column. The query, however, says SELECT SUM(total): we are also referencing the total column, and the total column is not in the index. So what the database needs to do, for each of the 466 thousand rows it estimates it has to look at, is take the row ID, go back to the table, fetch the corresponding row (that is a read from disk), take the total column, and add it to the sum, 466 thousand times. That is 466 thousand disk reads. That's bad.

play29:04

You might say: well, isn't a full table scan even worse? It has to do that 2.4 million times. (Dramatic pause while I drink.) No, it isn't. The database is actually smart enough to know that if it has to do a full table scan anyway, it knows from the get-go it has to read everything anyway, so it's not going to read the rows one by one; it will batch-read them, a couple thousand at a time, and the amount of disk I/O is going to be way less. And it turns out that in this example that is actually way faster, even though you still have to look at five times as many rows, because you cut down on the number of disk reads you have to perform, and in total that is much faster than using the index and having to read from disk. So the database rightly decided that in this case a full table scan is actually faster than using the index.

play30:00

So now we know what's happening, but it still sucks. What can we do? If we look at our WHERE clause, we've kind of exhausted our options there: we already have an index on the created_at column. But if the issue is that not all the data is present in the index, why not put the data in the index? Why not put the total column in the index as well? So this is what I'm going to do: I'm going to take our existing index and change it to also include the total column. I save that; it'll take a second or two or three. Okay, there we go. Now let's rerun this EXPLAIN statement. There we go: now it is voluntarily choosing our index, because it can see that it doesn't have to read from disk anymore.

play30:50

The interesting thing to note is in this Extra column, which I think is the worst column in this entire output, because it's not quite clear what "extra" is, and the values that appear in it are usually really weird. Like "Using index", because what this really means is "using index only": we have just put all the data this query needs into the index. That means this operation can now be performed entirely in memory, because MySQL keeps its indices in memory; since all the data the query needs is in the index, there are no reads from disk at all. This is what's called an index-only scan. It's one of the most powerful tools you can use when you're designing an index, and it's also one of the most aggressive ones. What do I mean by aggressive? We have just created a very specific index, an index that probably only works for this one query, because not only have we indexed for the WHERE clause, we have indexed for the SELECT clause as well. And thinking about the whole "keep the number of indices to the minimum that you need": introducing an index that can probably only be used by a single query is a trade-off that you have to consider. But it can dramatically improve performance: if we now run this query without the EXPLAIN, it just took 92 milliseconds. So from 600 milliseconds to below 100 milliseconds. Okay, great.
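A sketch of the covering index for this step (again with assumed index names):

```sql
-- Widen the index so the query can be answered from the index alone;
-- EXPLAIN should now pick it voluntarily and report "Using index" in Extra.
ALTER TABLE orders
  DROP INDEX orders_created_at_index,
  ADD INDEX orders_created_at_total_index (created_at, total);
```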

play32:19

You go home; don't forget the force push. Next day, management comes to you and says: great job on the report, we have a feature request. And the hair on the back of your neck starts standing up. They say: we want the same report, but now we want to be able to run it for only a single user. You heave a sigh of relief: that's easy. Okay, just for illustration purposes, let's take a random user ID... no, that's a total... this one: 136. And you say, well, that is easy, we just add AND user_id = 136. You run it... it's still running... it took two seconds. Okay, let's look at the EXPLAIN, what's going on here. Oh my god, we're doing a full table scan again.

play33:10

Well, luckily the solution to this is pretty simple, because we basically just reintroduced the same problem we just solved: we added a column to the query that is not in the index, and so we have to do all that reading from disk again. If the problem is the same, why not go with the same solution: just add the user_id column to the index. I mean, this indexing stuff is easy, really. So I'm just going to put the user_id in the index as well, save it, wait for it to finish, and then rerun the EXPLAIN statement. There we go, we are back to using the index, we're doing a range scan. Something is still not right, though: the rows column says it still thinks it has to look at around half a million rows. That doesn't seem right; that's the same number it thought it had to look at when we didn't have the user_id constraint, and that should be a much more limiting constraint. A user should maybe have a couple dozen orders. So why does it think it has to look at the same number of rows as before? This is telling me that yes, it's using the index, but it's not using it to its full extent. Which brings me to the next pitfall.

play34:29

This is what our index looks like right now, as a table: we have created_at, we have total, we have user_id. This means the index is sorted first by created_at; then, if two entries have the same created_at value, they're sorted by total; and if they have the same created_at and total values, they are then sorted by user_id.

play34:47

Here's what you need to know about multi-column indices: you can use a multi-column index from left to right. So you can use this index for a query that filters on created_at; on created_at and total; or on created_at, total, and user_id. You cannot skip columns. What we're doing is a WHERE clause on created_at and user_id: we're skipping total. That doesn't work. Why not? Because user_id by itself is not sorted; it is only sorted with respect to total and created_at. If we just leave out the middle column, the user_id column is essentially unsorted. So the database is still using the index, but only up to created_at. Point being: the column order in an index matters. An index on (a, b) is not the same as an index on (b, a).

play35:46

So if this is the problem, and we have a WHERE clause on created_at and user_id, the solution seems to be to put user_id into second place, so we can use the first two columns of the index. Let's do that: I'm going to rearrange the columns in the index and put user_id into the second position, save it, and then when we go back and rerun it, it should now have to look at way fewer rows than before. It doesn't. It actually has to look at more rows, but that's just because this is an estimate, so it fluctuates a bit. It still says it has to look at four or five hundred thousand rows. Nothing changed.
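The left-to-right rule can be sketched against the demo index (column order as in the talk; which prefixes are usable is the general leftmost-prefix rule, not anything MySQL-specific):

```sql
-- With an index on (created_at, total, user_id), the usable prefixes are:
--   (created_at), (created_at, total), (created_at, total, user_id).

-- This query skips total, so only the created_at part of the index can filter:
EXPLAIN SELECT SUM(total) FROM orders
WHERE created_at BETWEEN '2013-01-01 00:00:00' AND '2013-12-31 23:59:59'
  AND user_id = 136;
```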

play36:28

So everything I just said is correct: the column order matters. But there's actually another problem with this query, which brings me, funnily enough, to my last pitfall: inequality operators. Yes, you can use an index from left to right, but as soon as you have an inequality operation on any of the columns in the index, it's as if the index just stops there. And we actually have an inequality operation: we have a BETWEEN on the created_at column, and the created_at column is the first column in our index, so it's as if our index just stops there. This is why the estimate is basically unchanged between having a user_id in the query and not having it: the database doesn't even get to that point. The index can only be used up to created_at, because we have an inequality operation on that column.

play37:26

So again, if this is the problem, then we should just put the user_id into first position, because we have an equality operation on user_id, and then put created_at into second position, because we have an inequality operation on that. So let's try.
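The final index and query for this step, as a sketch (index names assumed; keeping total in third position so the query can still be answered from the index alone is my reading of the demo):

```sql
-- Equality column first, inequality column second
ALTER TABLE orders
  DROP INDEX orders_created_at_user_id_index,
  ADD INDEX orders_user_id_created_at_index (user_id, created_at, total);

-- Now both leading index columns can narrow the scan:
SELECT SUM(total)
FROM orders
WHERE user_id = 136
  AND created_at BETWEEN '2013-01-01 00:00:00' AND '2013-12-31 23:59:59';
```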

play37:44

I'm going to change the index one last time: user_id into first position, created_at into second. This should mean we can use the index to narrow down to just the rows for that user, and then narrow them further to only the orders that were placed in 2013. So, if everything went according to plan, and we rerun this query now... there we go: 886 rows. Now it's using the index to its full extent; it uses both columns in the index to cut down the number of rows we have to look at. If we remove the EXPLAIN statement, it just ran in under a millisecond. We went from four seconds to under a millisecond; that is really, really fast. So you commit, you force-push, you go home.

play38:46

Next day, management says: cool, the per-user report works really well, but we noticed that the report on all orders seems to have gotten slower again. And you're like: no, this isn't happening. Let's have a look: remove the user_id, run the query... oh god, we're back to 600 milliseconds. What is going on? Let's take a look at the EXPLAIN. Oh, this is a new one: index. You've seen that one before: now we're doing a full index scan. We are still using the index, but we're not using it to limit the number of rows we have to look at; we're basically starting at the very first leaf node and then traversing through all 2.5 million entries, which we can see back here in the rows column, where it says: yes, I do have to look at every single one. So what just happened? Well, we changed the order of the columns. The user_id is now in first place, and that query doesn't have a user_id. Since we can only use an index from left to right, and we cannot skip columns, we basically can't use the index anymore, because created_at is now only the second column.
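The regression can be seen in an EXPLAIN sketch (same assumed index; the reported type depends on the optimizer, but a full index scan is what the demo showed):

```sql
-- Index is (user_id, created_at, total), but there is no user_id predicate,
-- so the leftmost column can't filter: the demo showed type: index.
EXPLAIN SELECT SUM(total)
FROM orders
WHERE created_at BETWEEN '2013-01-01 00:00:00' AND '2013-12-31 23:59:59';
```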

play40:01

Okay, so the point is: there is no single index that can satisfy both of these queries. And this is sort of the moral of this entire talk: indexing is a developer concern. It is not the concern of your database person, or the guy sitting in the corner over there who seems to have MySQL Workbench open all the time. It is a developer concern, because an index and a query always go together. You do not design an index in a vacuum; you always design an index for a query, and only we as developers know what our queries actually look like and how we are accessing the data. So only we are in a position to write a good index. In this case, you would actually have to decide: do I introduce a second index? Is the report for all users maybe only run once a year, so it really doesn't matter if it takes 600 milliseconds? It's something you can only decide if you know the context and you know how your data is being accessed.

play41:14

So, to close this talk, everyone repeat after me: I am a developer. Indexing is my concern. Indexing is my concern. Thank you all for listening.

play41:29

[Applause]

play41:30

[Music]
