Learn Hash Tables in 13 minutes #️⃣

Bro Code
20 Oct 202113:26

Summary

TLDRThis script explains hash tables as a data structure for storing key-value pairs efficiently. It covers how keys are hashed to find their index in the table, using modulus to handle large numbers. Collisions, where different keys hash to the same index, are resolved using linked lists. The script also discusses the impact of table size on collisions and efficiency, with a focus on Java implementation and practical examples.

Takeaways

  • 🔑 **Key-Value Pairs**: A hash table is a collection of key-value pairs, where each pair is called an entry.
  • 📚 **Data Types**: The key and value can be of any data type, such as integers and strings.
  • 🔢 **Hash Function**: The hash function takes a key and converts it into an integer called a hash.
  • 🔄 **Collision Handling**: If two keys have the same hash, it results in a collision, which is resolved by chaining.
  • 🔄 **Chaining**: Collisions are handled by turning each bucket into a linked list if more than one entry is at the same index.
  • 📏 **Index Calculation**: The index for placing an entry is calculated by taking the hash modulo the capacity of the hash table.
  • 🔍 **Lookup Efficiency**: Hash tables are efficient for lookups, insertions, and deletions, especially with a low number of collisions.
  • 📈 **Dynamic Resizing**: Hash tables can dynamically resize to accommodate more elements, maintaining efficiency.
  • 💾 **Memory Usage**: Increasing the size of the hash table reduces collisions but increases memory usage.
  • 📊 **Runtime Complexity**: The runtime complexity of a hash table is between O(1) for best case (no collisions) and O(n) for worst case (all entries in one bucket).

Q & A

  • What is a hash table?

    -A hash table is a data structure that stores unique keys mapped to values, where each key-value pair is known as an entry.

  • What are the two pieces of data in each entry of a hash table?

    -Each entry in a hash table consists of a key and a value, where the key is used to access the corresponding value.

  • How does a hash table determine the index for each entry?

    -A hash table uses a hashcode method to convert the key into an integer called a hash. Then, it calculates the index by taking the hash modulo the capacity of the hash table.

  • What happens when two keys have the same hash value?

    -When two keys have the same hash value, it's called a collision. To resolve this, each bucket in the hash table can be turned into a linked list to store multiple entries.

  • How can collisions in a hash table be reduced?

    -Collisions can be reduced by increasing the size of the hash table, which decreases the likelihood of multiple keys having the same hash value.

  • What is the most common resolution for collisions in a hash table?

    -The most common resolution for collisions is chaining, where each bucket is turned into a linked list to store multiple entries that hash to the same index.

  • How does the size of the hash table affect its efficiency?

    -A larger hash table size reduces the number of collisions, increasing lookup efficiency. However, it also increases memory usage.

  • What is the runtime complexity of a hash table in the best and worst cases?

    -The best-case runtime complexity of a hash table is O(1), when there are no collisions. The worst-case complexity is O(n), when all entries collide and are stored in a single bucket as a linked list.

  • How does the data type of the key affect the hash code in a hash table?

    -Different data types have different hash code formulas. For example, the hash code of an integer key is the number itself, while the hash code of a string key is calculated by summing the ASCII values of its characters.

  • How can you access and display all key-value pairs in a hash table?

    -You can access values using the get method with a key, and display all pairs by iterating over the keys using an enhanced for loop and printing both the key and the value retrieved by the get method.

  • What is the purpose of the modulus operator in the context of hash tables?

    -The modulus operator is used to find the remainder when the hash value is divided by the hash table's capacity, which determines the index or bucket where the entry will be placed.

Outlines

00:00

🔑 Understanding Hash Tables

The first paragraph introduces the concept of hash tables, which are collections of key-value pairs, also known as entries. It uses the analogy of a teacher creating a hash table for student names and unique IDs. The key is an integer, and the value is a string. The process of hashing involves converting keys into a hashcode using a formula, which for integers is simply the number itself. The hashcode is then used to determine the index in the hash table by dividing it by the table's capacity and using the remainder as the index. The concept of collisions, where different keys produce the same hash, is introduced, and the solution of using linked lists within each bucket is explained.

05:01

💻 Implementing a Hash Table in Java

The second paragraph focuses on implementing a hash table in Java. It discusses the declaration of a hash table with specific data types for keys and values, in this case, integers and strings. The paragraph explains how to add key-value pairs using the 'put' method and access values using the 'get' method. It also demonstrates how to iterate over the keys of a hash table using an enhanced for loop and how to remove entries using the 'remove' method. The importance of displaying hash codes for each element is emphasized to understand how keys are placed within the hash table. The paragraph concludes with a discussion on changing the data type to strings and the need for a different hashing formula for strings.

10:03

🔍 Collisions and Hash Table Efficiency

The third paragraph delves into the issue of collisions in hash tables and how they can be mitigated by increasing the size of the hash table. It explains that different data types require different hash code formulas and shows how to calculate the index for each entry by using the modulus operator with the size of the hash table. The paragraph illustrates how increasing the table size can reduce collisions but also increases memory usage. It concludes by discussing the efficiency of hash tables, noting that fewer collisions lead to better performance. The runtime complexity of hash tables is also explained, ranging from O(1) in the best case to O(n) in the worst case, depending on the number of collisions.

Mindmap

Keywords

💡Hash Table

A hash table is a data structure that implements an associative array abstract data type, a structure that can map keys to values. In the video, hash tables are used to store student names and IDs, where each student's ID is a key and their name is the corresponding value. The hash table allows for efficient storage and retrieval of data, making it easy to look up a student's name by their unique ID.

💡Key-Value Pairs

Key-value pairs are the fundamental units of data in a hash table. Each key is unique and maps to a specific value. The video uses the example of students where the key is the student's ID (an integer) and the value is the student's name (a string). This concept is central to understanding how data is organized within a hash table.

💡Hashcode Method

The hashcode method is a function that takes a key as input and returns a hash, which is an integer. The video explains that for integers in Java, the hashcode is simply the number itself. This method is crucial for determining where in the hash table a key-value pair will be stored.

💡Collision

A collision occurs when two keys have the same hash value, meaning they would理论上 be placed at the same index in the hash table. The video demonstrates this with string keys, where 'spongebob' and 'patrick' both end up with a hash that maps to index 5, causing a collision. Collisions are a challenge in hash table implementation because they can affect the efficiency of data retrieval.

💡Bucket

A bucket is an indexed storage location within a hash table. Each index in the hash table is a bucket that can hold one or more entries. When collisions occur, buckets can store multiple entries, which are then managed as a linked list. The video mentions that in the case of a collision, each bucket can be treated as a linked list to handle multiple entries.

💡Modulus Operator

The modulus operator is used to find the remainder of a division. In the context of hash tables, it helps determine the index at which a key-value pair should be stored. The video explains that after calculating the hash of a key, you divide it by the capacity of the hash table and then take the modulus to find the index, as shown with the calculation 'hash % capacity'.

💡Linked List

A linked list is a linear data structure where each element is a separate object, and each object points to the next. In hash tables, when multiple key-value pairs hash to the same index (bucket), they are stored in a linked list to handle collisions. The video illustrates how entries in the same bucket are linked together to allow for efficient searching.

💡Load Factor

The load factor is a measure of how full the hash table is, defined as the ratio of the number of entries to the number of buckets. The video mentions that Java's default initial load factor is 0.75, meaning the hash table will expand when 75% of its buckets are filled. Adjusting the load factor can help manage memory usage and performance.

💡Chaining

Chaining is a collision resolution technique where each bucket in a hash table contains a linked list of all the entries that have the same hash index. The video explains chaining as a way to handle collisions by turning each bucket into a linked list, allowing for multiple entries at the same index.

💡Efficiency

Efficiency in the context of hash tables refers to how quickly and with what resources operations such as insertion, lookup, and deletion can be performed. The video emphasizes that fewer collisions lead to greater efficiency, as each entry ideally has its own bucket. The efficiency is also tied to the hash function and the size of the hash table.

💡Runtime Complexity

Runtime complexity is a measure of the time it takes for an algorithm to run as a function of the length of the input. The video discusses how the runtime complexity of a hash table can vary from O(1) in the best case (no collisions) to O(n) in the worst case (all entries collide and are in a single bucket). This concept is crucial for understanding the performance characteristics of hash tables.

Highlights

A hash table is a collection of key-value pairs.

Each key-value pair is known as an entry.

Keys can be of any data type.

Hashcode method converts keys into integers.

In Java, the hash of an integer is the number itself.

Hashes are too large, so they are divided by the hash table's capacity.

The remainder from the hash divided by capacity is used as an index.

The modulus operator is used to find the remainder.

Collisions occur when two hashes have the same value.

A collision resolution technique is turning each bucket into a linked list.

Lookup involves going to the index and searching linearly if necessary.

Hash tables are efficient for large data sets but have overhead for small ones.

The hash function's formula varies based on the key's data type.

An index in a hash table is also known as a bucket.

Chaining is the process of resolving collisions by linking entries.

To reduce collisions, increase the size of the hash table.

The runtime complexity of a hash table ranges from O(1) to O(n).

Hash tables allow for fast insertion, lookup, and deletion of key-value pairs.

Transcripts

play00:02

all right what's going on everybody hash

play00:04

tables a hash table is a collection of

play00:07

key value pairs each key value pair is

play00:10

known as an entry we have two pieces of

play00:13

data the first is the key and the second

play00:16

is the value in this example let's

play00:18

pretend that we're a teacher and we need

play00:20

to create a hash table of all of our

play00:22

students each student has a name and a

play00:25

unique student id number but these can

play00:28

be of any data type that you would like

play00:30

in this example the key will be an

play00:32

integer and the value will be a string

play00:34

so how do we know at which index to

play00:36

place each of these entries well what we

play00:39

could do is take each key and insert it

play00:42

into a hashcode method the hashcode

play00:44

method will take a key as input plug it

play00:47

into a formula and spit out an integer

play00:50

this integer is known as a hash now if

play00:52

we're finding the hashcode of an integer

play00:55

in java that's actually really easy the

play00:57

formula is the number itself so the hash

play01:00

of 100 is 100 123 is 123. so on and so

play01:05

forth with the other keys so after

play01:08

finding the hash of your key now what

play01:10

can we do these numbers are way too

play01:12

large and the size of our hash table is

play01:15

only 10 elements what we'll do next is

play01:17

take each of these hashes and divide

play01:20

each of them by the capacity the size of

play01:23

our hash table we have 10 elements so

play01:26

take each hash divided by the capacity

play01:29

of our hash table whatever the remainder

play01:31

is we will use the remainder as an index

play01:35

and to find that we will use the modulus

play01:37

operator so 100 divides by 10 evenly the

play01:41

remainder is zero so 100 modulus 10

play01:45

gives us a remainder of zero so

play01:47

spongebob's entry we will place at index

play01:51

0 within our hash table so the hash of

play01:54

patrick's key is 123. 123 modulus 10

play01:58

gives us a remainder of 3. patrick's

play02:01

entry will be inserted at index 3 of our

play02:04

hash table so here's a little shortcut

play02:06

if you have a number modulus 10 you

play02:09

could find the remainder and that is the

play02:10

last digit 321 modulus 10 will give us a

play02:14

remainder of 1. sandy's entry will be

play02:17

placed at index 1. then squidward's

play02:19

entry will be placed at index 5.

play02:23

555 modulus 10 is 5 and gary's will be

play02:27

at index 7 following the same pattern

play02:29

now here's one situation what if two

play02:32

hashes are calculated to be the same

play02:34

value that is known as a collision and i

play02:37

can best demonstrate that with a

play02:39

separate example

play02:40

in this next example let's say that each

play02:43

key is now a string each entry is a pair

play02:46

of strings we will first need to find

play02:49

the hash of each of these keys

play02:51

so the hash code of a string uses a

play02:54

different formula basically speaking

play02:56

we're going to take the ascii value of

play02:59

each character within the string and

play03:01

plug it into this formula i went ahead

play03:03

and calculated the hash of each of these

play03:06

strings using this formula of the string

play03:09

hashcode method and the next steps are

play03:11

the same as before take each hash

play03:13

divided by the capacity of our hash

play03:16

table and find the remainder so

play03:18

beginning with the first hash

play03:21

48625 modulus 10 gives us a remainder of

play03:25

5 spongebob's entry is now at index 5

play03:29

within our hash table

play03:31

now patrick's will be at index zero now

play03:34

here's sandy's sandy's will also be zero

play03:38

we have a collision both of these

play03:40

entries will be located at the same

play03:42

index so what do we do each of these

play03:45

storage locations is also known as a

play03:47

bucket and the most common resolution

play03:50

for a collision in a hash table is that

play03:53

we will turn each bucket into a linked

play03:55

list if this bucket already has an entry

play03:58

within the first entry we will also add

play04:01

an address to the location of the next

play04:04

entry and keep on adding more if there's

play04:07

more entries within this bucket so in

play04:10

this way this becomes a linked list if

play04:12

we're looking up a key we first go to

play04:14

the index in which it's located if

play04:17

there's more than one entry we will

play04:19

search linearly through this bucket as

play04:21

if it were a linked list until we find

play04:24

the key that we're looking for so that's

play04:26

the most common resolution when there is

play04:28

a collision but ideally you would want

play04:31

each of these entries to be within their

play04:32

own bucket based on the hash of

play04:35

squidward's key squidward's entry has an

play04:38

index of nine and gary gary has an index

play04:41

of five and there's another collision we

play04:44

will add the address of gary's location

play04:47

to the first entry and this bucket

play04:49

becomes a separate linked list this

play04:51

process is known as chaining the less

play04:54

collisions that there are the more

play04:56

efficient that this hash table is going

play04:58

to look up a value ideally you would

play05:00

want each entry to have their own bucket

play05:03

but collisions are possible to reduce

play05:05

the number of collisions you can

play05:07

increase the size of the hash table but

play05:09

then again the hash table is going to

play05:11

use up more memory then so people

play05:12

usually find a balance between the two

play05:15

so yeah those are hash tables in theory

play05:17

let's create our own now all right

play05:19

everybody so let's implement a hash

play05:21

table in java so we will need to declare

play05:24

this hash table and list the data types

play05:27

of our key value pairs if we need to

play05:29

store primitive data types we can use

play05:31

the appropriate wrapper class so let's

play05:33

store integers and strings

play05:36

integer and string the integers will be

play05:39

the key is the strings will be the

play05:41

values we'll map student id numbers and

play05:44

student names and i'll name this hash

play05:46

table just simply table

play05:48

equals new

play05:50

hash table

play05:53

there we go so in java when we create a

play05:56

hash table these have an initial

play05:58

capacity of 11 elements and a load

play06:00

factor of 0.75 so once 75 of our

play06:04

elements are filled this hash table will

play06:07

dynamically expand to accommodate more

play06:09

elements now you can set a different

play06:11

capacity for your hash table instead of

play06:13

11 let's say 10 to be consistent with

play06:16

our example in the previous part of this

play06:17

lesson and if you would like to change

play06:19

the load factor you can add that as well

play06:21

instead of 75 let's say 50

play06:24

so we would pass in a floating point

play06:26

number

play06:27

so 0.5 then add an f at the end for

play06:30

floating point numbers but in this

play06:32

example let's just keep the load factor

play06:34

consistent so let's start adding some

play06:36

key value pairs to add an element to

play06:38

your hash table use the put method so

play06:41

table dot put and then we will pass in

play06:45

an integer as the key and a string as

play06:48

the value so our first student is

play06:50

spongebob he has a student id of let's

play06:53

say 100

play06:54

and let's pass in a string for the value

play06:57

spongebob

play06:58

okay that is our first student so let's

play07:00

add a couple more from the previous

play07:02

example

play07:03

so we have spongebob

play07:05

patrick with an id of one two three

play07:08

sandy with an id of three two one

play07:11

squidward with an id of five five five

play07:16

and gary with an id of 777

play07:20

now to access one of these values you

play07:22

can use the get method of tables

play07:25

so i'll display this within a print line

play07:26

statement

play07:28

table dot get and i will pass in a key

play07:31

let's get the value at key number 100 so

play07:34

this student is spongebob how can we

play07:37

display all of the key value pairs of a

play07:39

table well we could use a for loop

play07:42

so i'm going to create a for loop

play07:45

and place this within it

play07:48

so to iterate over the keys of our table

play07:51

this is what we can write we can use an

play07:53

enhanced for loop so we are iterating

play07:55

over integers

play07:57

so the data type is integer

play07:59

key

play08:01

colon so to make our hash table iterable

play08:03

we can get all of the keys from our

play08:05

table and put them within a set a set is

play08:08

iterable

play08:09

so we will iterate over table dot key

play08:13

set method

play08:14

this will take all of our keys and

play08:16

return a set and a set is something we

play08:18

can iterate over and within our print

play08:20

line statement let's print each key

play08:23

key

play08:24

plus

play08:25

then maybe i'll add a tab to separate

play08:27

these

play08:28

plus

play08:29

table

play08:30

dot get

play08:32

then pass in whatever our key is okay so

play08:35

after running this

play08:37

this will display all of our key value

play08:39

pairs and if you need to remove an entry

play08:42

well there's a remove method table dot

play08:45

remove then pass in a key let's remove

play08:48

gary so remove the entry with this key

play08:51

777

play08:52

and gary is no longer within our table

play08:55

but we'll keep them in i'll turn this

play08:56

line into a comment now just to get a

play08:58

better understanding of where these key

play09:00

value pairs are being placed let's also

play09:03

display each hash code for each of these

play09:05

elements

play09:06

so preceding our key let's display each

play09:08

hash code i'll precede our key with a

play09:11

tab

play09:15

and let's display each key's hash code

play09:18

key dot hashcode method if we're using

play09:22

the hash code of integers this will

play09:24

return the primitive integer value

play09:27

represented by the key that we're

play09:29

passing in if we're using the hashcode

play09:31

method of integers well the hash is

play09:34

going to end up being the same integer

play09:35

so you can see that these numbers are

play09:37

the same to calculate an index we can

play09:39

follow the hash with modulus operator

play09:42

then the size of our table

play09:44

we set this to originally be 10.

play09:48

so we have gary at index seven squidward

play09:51

at index five patrick at three sandy at

play09:53

one spongebob at zero now if our data

play09:56

type was strings we would use a

play09:58

different hashing formula so let's

play10:00

change the data type to string and all

play10:02

of these keys are now strings

play10:07

then let's remove modulus 10

play10:10

and change the data type of our for loop

play10:12

to strings

play10:13

string key

play10:16

okay these are the new hashes for each

play10:18

of our keys

play10:19

this key has this hash number this key

play10:22

has this hash number so on and so forth

play10:25

so different data types will have

play10:26

different hash code formulas now let's

play10:29

calculate the element in which each of

play10:31

these entries is going to be placed by

play10:33

adding modulus the size of our hash

play10:35

table 10.

play10:39

so here are the elements and we actually

play10:40

have two collisions we have a collision

play10:43

with these two keys they're both within

play10:45

bucket five as well as these two entries

play10:48

so both of these will be placed into

play10:49

bucket zero since there's more than one

play10:52

entry within the same element we will

play10:54

treat this bucket as a linked list and

play10:56

just iterate over it linearly until we

play10:58

reach the key that we're looking for

play11:00

now one way in which we can avoid

play11:01

collisions is to increase the size of

play11:04

our hash table if we set this to the

play11:06

default of 11 and change this to modulus

play11:09

11 well then these will be placed within

play11:11

different buckets and you can see that

play11:13

they changed however we still have a

play11:15

collision with spongebob and squidward

play11:18

so what if we increase this to 21.

play11:20

do we have any collisions then

play11:23

nope we do not these keys are within

play11:25

their own buckets

play11:27

all right everybody so in conclusion a

play11:29

hash table is a data structure that

play11:32

stores unique keys to values when you

play11:35

declare a hash table you state the data

play11:37

types of what you're storing and these

play11:40

are reference data types each key value

play11:42

pair is known as an entry and a benefit

play11:45

of hash tables is that they have a fast

play11:48

insertion lookup and deletion of key

play11:51

value pairs but they're not ideal for

play11:53

small data sets since there's a lot of

play11:55

overhead but they're great with large

play11:57

data sets hashing in the context of a

play12:00

hash table takes a key and computes an

play12:03

integer it utilizes a formula which will

play12:06

vary based on the key as input and its

play12:08

data type then to calculate an index we

play12:11

follow this formula we take a hash

play12:13

modulus the capacity of our table to

play12:17

calculate an index each index is also

play12:20

known as a bucket it's an indexed

play12:22

storage location for one or more entries

play12:25

they can store more than one entry in

play12:27

the case of a collision and in case that

play12:29

happens we treat each bucket as a linked

play12:31

list each entry knows where the next

play12:34

entry is located and as we discussed a

play12:36

collision is when a hash function

play12:38

generates the same index for more than

play12:40

one key less collisions equals more

play12:43

efficiency and the runtime complexity of

play12:45

a hash table varies if there are no

play12:48

collisions the best case would be a

play12:50

runtime complexity of big o of one it

play12:53

runs in constant time in case there are

play12:55

exclusively collisions as in we place

play12:58

all of our entries within the same

play13:00

bucket it's going to be one giant linked

play13:02

list and the runtime complexity of a

play13:04

linked list is big o of n it runs in

play13:07

linear time on average the runtime

play13:09

complexity of a hash table will be

play13:12

somewhere within this range so yeah

play13:14

everybody those are hash tables if you

play13:16

would like a copy of all my notes here

play13:18

i'll post them in the comments section

play13:19

down below and well yeah those are hash

play13:22

tables in computer science

Rate This

5.0 / 5 (0 votes)

相关标签
Hash TablesJava ProgrammingData StructuresKey-Value PairsCollision HandlingComputer ScienceCoding TutorialData StorageAlgorithmsEducational
您是否需要英文摘要?