Learn Hash Tables in 13 minutes #️⃣
Summary
TLDRThis script explains hash tables as a data structure for storing key-value pairs efficiently. It covers how keys are hashed to find their index in the table, using modulus to handle large numbers. Collisions, where different keys hash to the same index, are resolved using linked lists. The script also discusses the impact of table size on collisions and efficiency, with a focus on Java implementation and practical examples.
Takeaways
- 🔑 **Key-Value Pairs**: A hash table is a collection of key-value pairs, where each pair is called an entry.
- 📚 **Data Types**: The key and value can be of any data type, such as integers and strings.
- 🔢 **Hash Function**: The hash function takes a key and converts it into an integer called a hash.
- 🔄 **Collision Handling**: If two keys have the same hash, it results in a collision, which is resolved by chaining.
- 🔄 **Chaining**: Collisions are handled by turning each bucket into a linked list if more than one entry is at the same index.
- 📏 **Index Calculation**: The index for placing an entry is calculated by taking the hash modulo the capacity of the hash table.
- 🔍 **Lookup Efficiency**: Hash tables are efficient for lookups, insertions, and deletions, especially with a low number of collisions.
- 📈 **Dynamic Resizing**: Hash tables can dynamically resize to accommodate more elements, maintaining efficiency.
- 💾 **Memory Usage**: Increasing the size of the hash table reduces collisions but increases memory usage.
- 📊 **Runtime Complexity**: The runtime complexity of a hash table is between O(1) for best case (no collisions) and O(n) for worst case (all entries in one bucket).
Q & A
What is a hash table?
-A hash table is a data structure that stores unique keys mapped to values, where each key-value pair is known as an entry.
What are the two pieces of data in each entry of a hash table?
-Each entry in a hash table consists of a key and a value, where the key is used to access the corresponding value.
How does a hash table determine the index for each entry?
-A hash table uses a hashcode method to convert the key into an integer called a hash. Then, it calculates the index by taking the hash modulo the capacity of the hash table.
What happens when two keys have the same hash value?
-When two keys have the same hash value, it's called a collision. To resolve this, each bucket in the hash table can be turned into a linked list to store multiple entries.
How can collisions in a hash table be reduced?
-Collisions can be reduced by increasing the size of the hash table, which decreases the likelihood of multiple keys having the same hash value.
What is the most common resolution for collisions in a hash table?
-The most common resolution for collisions is chaining, where each bucket is turned into a linked list to store multiple entries that hash to the same index.
How does the size of the hash table affect its efficiency?
-A larger hash table size reduces the number of collisions, increasing lookup efficiency. However, it also increases memory usage.
What is the runtime complexity of a hash table in the best and worst cases?
-The best-case runtime complexity of a hash table is O(1), when there are no collisions. The worst-case complexity is O(n), when all entries collide and are stored in a single bucket as a linked list.
How does the data type of the key affect the hash code in a hash table?
-Different data types have different hash code formulas. For example, the hash code of an integer key is the number itself, while the hash code of a string key is calculated by summing the ASCII values of its characters.
How can you access and display all key-value pairs in a hash table?
-You can access values using the get method with a key, and display all pairs by iterating over the keys using an enhanced for loop and printing both the key and the value retrieved by the get method.
What is the purpose of the modulus operator in the context of hash tables?
-The modulus operator is used to find the remainder when the hash value is divided by the hash table's capacity, which determines the index or bucket where the entry will be placed.
Outlines
🔑 Understanding Hash Tables
The first paragraph introduces the concept of hash tables, which are collections of key-value pairs, also known as entries. It uses the analogy of a teacher creating a hash table for student names and unique IDs. The key is an integer, and the value is a string. The process of hashing involves converting keys into a hashcode using a formula, which for integers is simply the number itself. The hashcode is then used to determine the index in the hash table by dividing it by the table's capacity and using the remainder as the index. The concept of collisions, where different keys produce the same hash, is introduced, and the solution of using linked lists within each bucket is explained.
💻 Implementing a Hash Table in Java
The second paragraph focuses on implementing a hash table in Java. It discusses the declaration of a hash table with specific data types for keys and values, in this case, integers and strings. The paragraph explains how to add key-value pairs using the 'put' method and access values using the 'get' method. It also demonstrates how to iterate over the keys of a hash table using an enhanced for loop and how to remove entries using the 'remove' method. The importance of displaying hash codes for each element is emphasized to understand how keys are placed within the hash table. The paragraph concludes with a discussion on changing the data type to strings and the need for a different hashing formula for strings.
🔍 Collisions and Hash Table Efficiency
The third paragraph delves into the issue of collisions in hash tables and how they can be mitigated by increasing the size of the hash table. It explains that different data types require different hash code formulas and shows how to calculate the index for each entry by using the modulus operator with the size of the hash table. The paragraph illustrates how increasing the table size can reduce collisions but also increases memory usage. It concludes by discussing the efficiency of hash tables, noting that fewer collisions lead to better performance. The runtime complexity of hash tables is also explained, ranging from O(1) in the best case to O(n) in the worst case, depending on the number of collisions.
Mindmap
Keywords
💡Hash Table
💡Key-Value Pairs
💡Hashcode Method
💡Collision
💡Bucket
💡Modulus Operator
💡Linked List
💡Load Factor
💡Chaining
💡Efficiency
💡Runtime Complexity
Highlights
A hash table is a collection of key-value pairs.
Each key-value pair is known as an entry.
Keys can be of any data type.
Hashcode method converts keys into integers.
In Java, the hash of an integer is the number itself.
Hashes are too large, so they are divided by the hash table's capacity.
The remainder from the hash divided by capacity is used as an index.
The modulus operator is used to find the remainder.
Collisions occur when two hashes have the same value.
A collision resolution technique is turning each bucket into a linked list.
Lookup involves going to the index and searching linearly if necessary.
Hash tables are efficient for large data sets but have overhead for small ones.
The hash function's formula varies based on the key's data type.
An index in a hash table is also known as a bucket.
Chaining is the process of resolving collisions by linking entries.
To reduce collisions, increase the size of the hash table.
The runtime complexity of a hash table ranges from O(1) to O(n).
Hash tables allow for fast insertion, lookup, and deletion of key-value pairs.
Transcripts
all right what's going on everybody hash
tables a hash table is a collection of
key value pairs each key value pair is
known as an entry we have two pieces of
data the first is the key and the second
is the value in this example let's
pretend that we're a teacher and we need
to create a hash table of all of our
students each student has a name and a
unique student id number but these can
be of any data type that you would like
in this example the key will be an
integer and the value will be a string
so how do we know at which index to
place each of these entries well what we
could do is take each key and insert it
into a hashcode method the hashcode
method will take a key as input plug it
into a formula and spit out an integer
this integer is known as a hash now if
we're finding the hashcode of an integer
in java that's actually really easy the
formula is the number itself so the hash
of 100 is 100 123 is 123. so on and so
forth with the other keys so after
finding the hash of your key now what
can we do these numbers are way too
large and the size of our hash table is
only 10 elements what we'll do next is
take each of these hashes and divide
each of them by the capacity the size of
our hash table we have 10 elements so
take each hash divided by the capacity
of our hash table whatever the remainder
is we will use the remainder as an index
and to find that we will use the modulus
operator so 100 divides by 10 evenly the
remainder is zero so 100 modulus 10
gives us a remainder of zero so
spongebob's entry we will place at index
0 within our hash table so the hash of
patrick's key is 123. 123 modulus 10
gives us a remainder of 3. patrick's
entry will be inserted at index 3 of our
hash table so here's a little shortcut
if you have a number modulus 10 you
could find the remainder and that is the
last digit 321 modulus 10 will give us a
remainder of 1. sandy's entry will be
placed at index 1. then squidward's
entry will be placed at index 5.
555 modulus 10 is 5 and gary's will be
at index 7 following the same pattern
now here's one situation what if two
hashes are calculated to be the same
value that is known as a collision and i
can best demonstrate that with a
separate example
in this next example let's say that each
key is now a string each entry is a pair
of strings we will first need to find
the hash of each of these keys
so the hash code of a string uses a
different formula basically speaking
we're going to take the ascii value of
each character within the string and
plug it into this formula i went ahead
and calculated the hash of each of these
strings using this formula of the string
hashcode method and the next steps are
the same as before take each hash
divided by the capacity of our hash
table and find the remainder so
beginning with the first hash
48625 modulus 10 gives us a remainder of
5 spongebob's entry is now at index 5
within our hash table
now patrick's will be at index zero now
here's sandy's sandy's will also be zero
we have a collision both of these
entries will be located at the same
index so what do we do each of these
storage locations is also known as a
bucket and the most common resolution
for a collision in a hash table is that
we will turn each bucket into a linked
list if this bucket already has an entry
within the first entry we will also add
an address to the location of the next
entry and keep on adding more if there's
more entries within this bucket so in
this way this becomes a linked list if
we're looking up a key we first go to
the index in which it's located if
there's more than one entry we will
search linearly through this bucket as
if it were a linked list until we find
the key that we're looking for so that's
the most common resolution when there is
a collision but ideally you would want
each of these entries to be within their
own bucket based on the hash of
squidward's key squidward's entry has an
index of nine and gary gary has an index
of five and there's another collision we
will add the address of gary's location
to the first entry and this bucket
becomes a separate linked list this
process is known as chaining the less
collisions that there are the more
efficient that this hash table is going
to look up a value ideally you would
want each entry to have their own bucket
but collisions are possible to reduce
the number of collisions you can
increase the size of the hash table but
then again the hash table is going to
use up more memory then so people
usually find a balance between the two
so yeah those are hash tables in theory
let's create our own now all right
everybody so let's implement a hash
table in java so we will need to declare
this hash table and list the data types
of our key value pairs if we need to
store primitive data types we can use
the appropriate wrapper class so let's
store integers and strings
integer and string the integers will be
the key is the strings will be the
values we'll map student id numbers and
student names and i'll name this hash
table just simply table
equals new
hash table
there we go so in java when we create a
hash table these have an initial
capacity of 11 elements and a load
factor of 0.75 so once 75 of our
elements are filled this hash table will
dynamically expand to accommodate more
elements now you can set a different
capacity for your hash table instead of
11 let's say 10 to be consistent with
our example in the previous part of this
lesson and if you would like to change
the load factor you can add that as well
instead of 75 let's say 50
so we would pass in a floating point
number
so 0.5 then add an f at the end for
floating point numbers but in this
example let's just keep the load factor
consistent so let's start adding some
key value pairs to add an element to
your hash table use the put method so
table dot put and then we will pass in
an integer as the key and a string as
the value so our first student is
spongebob he has a student id of let's
say 100
and let's pass in a string for the value
spongebob
okay that is our first student so let's
add a couple more from the previous
example
so we have spongebob
patrick with an id of one two three
sandy with an id of three two one
squidward with an id of five five five
and gary with an id of 777
now to access one of these values you
can use the get method of tables
so i'll display this within a print line
statement
table dot get and i will pass in a key
let's get the value at key number 100 so
this student is spongebob how can we
display all of the key value pairs of a
table well we could use a for loop
so i'm going to create a for loop
and place this within it
so to iterate over the keys of our table
this is what we can write we can use an
enhanced for loop so we are iterating
over integers
so the data type is integer
key
colon so to make our hash table iterable
we can get all of the keys from our
table and put them within a set a set is
iterable
so we will iterate over table dot key
set method
this will take all of our keys and
return a set and a set is something we
can iterate over and within our print
line statement let's print each key
key
plus
then maybe i'll add a tab to separate
these
plus
table
dot get
then pass in whatever our key is okay so
after running this
this will display all of our key value
pairs and if you need to remove an entry
well there's a remove method table dot
remove then pass in a key let's remove
gary so remove the entry with this key
777
and gary is no longer within our table
but we'll keep them in i'll turn this
line into a comment now just to get a
better understanding of where these key
value pairs are being placed let's also
display each hash code for each of these
elements
so preceding our key let's display each
hash code i'll precede our key with a
tab
and let's display each key's hash code
key dot hashcode method if we're using
the hash code of integers this will
return the primitive integer value
represented by the key that we're
passing in if we're using the hashcode
method of integers well the hash is
going to end up being the same integer
so you can see that these numbers are
the same to calculate an index we can
follow the hash with modulus operator
then the size of our table
we set this to originally be 10.
so we have gary at index seven squidward
at index five patrick at three sandy at
one spongebob at zero now if our data
type was strings we would use a
different hashing formula so let's
change the data type to string and all
of these keys are now strings
then let's remove modulus 10
and change the data type of our for loop
to strings
string key
okay these are the new hashes for each
of our keys
this key has this hash number this key
has this hash number so on and so forth
so different data types will have
different hash code formulas now let's
calculate the element in which each of
these entries is going to be placed by
adding modulus the size of our hash
table 10.
so here are the elements and we actually
have two collisions we have a collision
with these two keys they're both within
bucket five as well as these two entries
so both of these will be placed into
bucket zero since there's more than one
entry within the same element we will
treat this bucket as a linked list and
just iterate over it linearly until we
reach the key that we're looking for
now one way in which we can avoid
collisions is to increase the size of
our hash table if we set this to the
default of 11 and change this to modulus
11 well then these will be placed within
different buckets and you can see that
they changed however we still have a
collision with spongebob and squidward
so what if we increase this to 21.
do we have any collisions then
nope we do not these keys are within
their own buckets
all right everybody so in conclusion a
hash table is a data structure that
stores unique keys to values when you
declare a hash table you state the data
types of what you're storing and these
are reference data types each key value
pair is known as an entry and a benefit
of hash tables is that they have a fast
insertion lookup and deletion of key
value pairs but they're not ideal for
small data sets since there's a lot of
overhead but they're great with large
data sets hashing in the context of a
hash table takes a key and computes an
integer it utilizes a formula which will
vary based on the key as input and its
data type then to calculate an index we
follow this formula we take a hash
modulus the capacity of our table to
calculate an index each index is also
known as a bucket it's an indexed
storage location for one or more entries
they can store more than one entry in
the case of a collision and in case that
happens we treat each bucket as a linked
list each entry knows where the next
entry is located and as we discussed a
collision is when a hash function
generates the same index for more than
one key less collisions equals more
efficiency and the runtime complexity of
a hash table varies if there are no
collisions the best case would be a
runtime complexity of big o of one it
runs in constant time in case there are
exclusively collisions as in we place
all of our entries within the same
bucket it's going to be one giant linked
list and the runtime complexity of a
linked list is big o of n it runs in
linear time on average the runtime
complexity of a hash table will be
somewhere within this range so yeah
everybody those are hash tables if you
would like a copy of all my notes here
i'll post them in the comments section
down below and well yeah those are hash
tables in computer science
5.0 / 5 (0 votes)