Learn Hash Tables in 13 minutes #️⃣
Summary
TLDRCe script vidéo explique les hash tables comme une collection de paires clé-valeur. Il montre comment les clés sont transformées en codes de hachage grâce à une formule, et comment ces codes sont utilisés pour stocker les données dans un tableau de hachage de taille limitée. Le script aborde également les collisions et leur résolution par chaînage, ainsi que l'importance de la taille du tableau de hachage pour la performance. Enfin, il illustre la mise en œuvre d'un tableau de hachage en Java avec des exemples concrets.
Takeaways
- 🔑 Un tableau de hachage (hash table) est une collection de paires clé-valeur.
- 👨🏫 Dans l'exemple donné, les clés sont des identifiants uniques des étudiants et les valeurs sont leurs noms.
- 🔢 La méthode `hashCode` prend une clé en entrée et renvoie un entier appelé hach (hash).
- 📏 Pour un entier, le hach est simplement le nombre lui-même.
- 🗄️ Le hach est ensuite utilisé pour déterminer l'index où placer la paire clé-valeur dans le tableau de hachage.
- 🔄 En cas de collision (deux clés ayant le même hach), on utilise une liste chaînée pour gérer les entrées dans le même compartiment (bucket).
- 🔄 La résolution la plus courante d'une collision est de transformer chaque compartiment en liste chaînée.
- 📚 Le script illustre comment gérer les collisions avec des exemples concrets.
- 💾 Le tableau de hachage peut être dynamiquement agrandi pour accueillir plus d'éléments.
- 🔍 La recherche dans un tableau de hachage est rapide, mais les collisions peuvent ralentir le processus.
- 📉 L'efficacité d'un tableau de hachage dépend de la taille du tableau et du nombre de collisions.
Q & A
Qu'est-ce qu'un tableau de hachage (hash table) ?
-Un tableau de hachage est une collection d'appaires clé-valeur. Chaque appairage est appelé une entrée, avec une clé et une valeur associée.
Comment se fait-il le hachage d'un entier en Java ?
-En Java, le hachage d'un entier est très simple, car la formule est le nombre lui-même. Par exemple, le hach de 100 est 100 et le hach de 123 est 123.
Que signifie 'collision' dans un tableau de hachage ?
-Une 'collision' se produit lorsqu'un hachage calculé donne la même valeur pour deux clés différentes.
Comment est résolue une collision dans un tableau de hachage ?
-La résolution la plus courante d'une collision est de transformer chaque compartiment, ou 'bucket', en une liste chaînée. Si un compartiment contient déjà une entrée, on y ajoute l'adresse de la prochaine entrée.
Quel est le but de la méthode 'put' dans un tableau de hachage ?
-La méthode 'put' sert à ajouter une nouvelle entrée, ou à mettre à jour une entrée existante, dans le tableau de hachage en utilisant une clé spécifique.
Comment se fait-il le hachage d'une chaîne de caractères ?
-Le hachage d'une chaîne de caractères utilise une formule différente qui prend la valeur ASCII de chaque caractère dans la chaîne et l'intègre dans la formule.
Quelle est la méthode utilisée pour accéder aux valeurs dans un tableau de hachage ?
-La méthode 'get' est utilisée pour accéder aux valeurs dans un tableau de hachage en passant la clé correspondante.
Comment est-il possible de supprimer une entrée dans un tableau de hachage ?
-La méthode 'remove' permet de supprimer une entrée en utilisant la clé associée à cette entrée.
Quelle est la complexité temporelle d'un tableau de hachage en cas d'absence de collision ?
-En l'absence de collision, la meilleure complexité temporelle d'un tableau de hachage est O(1), ce qui signifie qu'il fonctionne en temps constant.
Comment peut-on réduire le nombre de collisions dans un tableau de hachage ?
-On peut réduire le nombre de collisions en augmentant la taille du tableau de hachage, ce qui permet de répartir les entrées de manière plus efficace.
Outlines
📚 Introduction aux tableaux de hachage
Le premier paragraphe introduit les tableaux de hachage comme une collection de paires clé-valeur, où chaque clé est unique et associée à une valeur. On explique comment les enseignants peuvent utiliser ces tableaux pour associer des identifiants d'étudiants à leurs noms. Le processus de hachage est décrit, où chaque clé est transformée en un entier appelé 'hach' grâce à une formule. Dans le cas des entiers, le hach est simplement l'entier lui-même. Ensuite, on explique comment les hachs sont répartis dans le tableau de hachage en utilisant le modulo de la taille du tableau pour obtenir l'index. On mentionne également les collisions qui se produisent lorsqu'au moins deux clés donnent le même hach et on explique comment gérer ces collisions en utilisant des listes chaînées.
💻 Implémentation d'un tableau de hachage en Java
Le deuxième paragraphe décrit comment implémenter un tableau de hachage en Java. On commence par déclarer le tableau de hachage et on définit les types de données des paires clé-valeur. On utilise les classes wrapper pour les types primitifs, comme Integer et String. On crée un tableau de hachage avec une capacité initiale et un facteur de charge. On utilise la méthode 'put' pour ajouter des éléments au tableau de hachage et la méthode 'get' pour y accéder. On explique également comment itérer sur les clés d'un tableau de hachage à l'aide d'une boucle for et comment utiliser la méthode 'remove' pour en supprimer. On illustre comment afficher les codes de hachage des clés et comment calculer les index correspondants dans le tableau de hachage.
🔍 Gestion des collisions et optimisation des tableaux de hachage
Le troisième paragraphe traite de la gestion des collisions et des stratégies pour optimiser les tableaux de hachage. On explique que les collisions peuvent être réduites en augmentant la taille du tableau de hachage, ce qui consomme plus de mémoire. On montre comment le hachage fonctionne pour les chaînes de caractères en utilisant la valeur ASCII de chaque caractère. On calcule les nouveaux hachs pour chaque clé et on détermine l'index où chaque entrée sera placée en utilisant le modulo de la taille du tableau de hachage. On illustre comment les collisions peuvent être évitées en augmentant la taille du tableau et on explique comment le temps d'exécution d'un tableau de hachage varie en fonction du nombre de collisions. On conclut en soulignant que les tableaux de hachage sont un excellent moyen de stocker des paires clé-valeur avec une insertion, une recherche et une suppression rapides, surtout pour les grands ensembles de données.
Mindmap
Keywords
💡Table de hachage
💡Paire clé-valeur
💡Hachage
💡Collision
💡Chaînage
💡Modulo
💡Chargement
💡Complexité en temps d'exécution
💡Balise
💡Formule d'hachage
Highlights
Hash table is a collection of key-value pairs.
Each key-value pair is known as an entry.
Keys can be of any data type.
Hashcode method converts a key into an integer called a hash.
For integers, the hash is the number itself.
Hashes are reduced to fit the hash table's size using modulus.
The remainder from modulus operation is used as an index.
Collision occurs when two keys have the same hash.
Collisions are resolved by turning each bucket into a linked list.
Hash table lookup involves finding the index and then searching the bucket.
Efficiency of hash table is inversely proportional to the number of collisions.
The size of the hash table can be increased to reduce collisions.
Hash table has fast insertion, lookup, and deletion.
Hash table is not ideal for small data sets due to overhead.
Hash table runtime complexity varies from O(1) to O(n).
Hash table's efficiency is maximized with a balance between size and collisions.
Transcripts
all right what's going on everybody hash
tables a hash table is a collection of
key value pairs each key value pair is
known as an entry we have two pieces of
data the first is the key and the second
is the value in this example let's
pretend that we're a teacher and we need
to create a hash table of all of our
students each student has a name and a
unique student id number but these can
be of any data type that you would like
in this example the key will be an
integer and the value will be a string
so how do we know at which index to
place each of these entries well what we
could do is take each key and insert it
into a hashcode method the hashcode
method will take a key as input plug it
into a formula and spit out an integer
this integer is known as a hash now if
we're finding the hashcode of an integer
in java that's actually really easy the
formula is the number itself so the hash
of 100 is 100 123 is 123. so on and so
forth with the other keys so after
finding the hash of your key now what
can we do these numbers are way too
large and the size of our hash table is
only 10 elements what we'll do next is
take each of these hashes and divide
each of them by the capacity the size of
our hash table we have 10 elements so
take each hash divided by the capacity
of our hash table whatever the remainder
is we will use the remainder as an index
and to find that we will use the modulus
operator so 100 divides by 10 evenly the
remainder is zero so 100 modulus 10
gives us a remainder of zero so
spongebob's entry we will place at index
0 within our hash table so the hash of
patrick's key is 123. 123 modulus 10
gives us a remainder of 3. patrick's
entry will be inserted at index 3 of our
hash table so here's a little shortcut
if you have a number modulus 10 you
could find the remainder and that is the
last digit 321 modulus 10 will give us a
remainder of 1. sandy's entry will be
placed at index 1. then squidward's
entry will be placed at index 5.
555 modulus 10 is 5 and gary's will be
at index 7 following the same pattern
now here's one situation what if two
hashes are calculated to be the same
value that is known as a collision and i
can best demonstrate that with a
separate example
in this next example let's say that each
key is now a string each entry is a pair
of strings we will first need to find
the hash of each of these keys
so the hash code of a string uses a
different formula basically speaking
we're going to take the ascii value of
each character within the string and
plug it into this formula i went ahead
and calculated the hash of each of these
strings using this formula of the string
hashcode method and the next steps are
the same as before take each hash
divided by the capacity of our hash
table and find the remainder so
beginning with the first hash
48625 modulus 10 gives us a remainder of
5 spongebob's entry is now at index 5
within our hash table
now patrick's will be at index zero now
here's sandy's sandy's will also be zero
we have a collision both of these
entries will be located at the same
index so what do we do each of these
storage locations is also known as a
bucket and the most common resolution
for a collision in a hash table is that
we will turn each bucket into a linked
list if this bucket already has an entry
within the first entry we will also add
an address to the location of the next
entry and keep on adding more if there's
more entries within this bucket so in
this way this becomes a linked list if
we're looking up a key we first go to
the index in which it's located if
there's more than one entry we will
search linearly through this bucket as
if it were a linked list until we find
the key that we're looking for so that's
the most common resolution when there is
a collision but ideally you would want
each of these entries to be within their
own bucket based on the hash of
squidward's key squidward's entry has an
index of nine and gary gary has an index
of five and there's another collision we
will add the address of gary's location
to the first entry and this bucket
becomes a separate linked list this
process is known as chaining the less
collisions that there are the more
efficient that this hash table is going
to look up a value ideally you would
want each entry to have their own bucket
but collisions are possible to reduce
the number of collisions you can
increase the size of the hash table but
then again the hash table is going to
use up more memory then so people
usually find a balance between the two
so yeah those are hash tables in theory
let's create our own now all right
everybody so let's implement a hash
table in java so we will need to declare
this hash table and list the data types
of our key value pairs if we need to
store primitive data types we can use
the appropriate wrapper class so let's
store integers and strings
integer and string the integers will be
the key is the strings will be the
values we'll map student id numbers and
student names and i'll name this hash
table just simply table
equals new
hash table
there we go so in java when we create a
hash table these have an initial
capacity of 11 elements and a load
factor of 0.75 so once 75 of our
elements are filled this hash table will
dynamically expand to accommodate more
elements now you can set a different
capacity for your hash table instead of
11 let's say 10 to be consistent with
our example in the previous part of this
lesson and if you would like to change
the load factor you can add that as well
instead of 75 let's say 50
so we would pass in a floating point
number
so 0.5 then add an f at the end for
floating point numbers but in this
example let's just keep the load factor
consistent so let's start adding some
key value pairs to add an element to
your hash table use the put method so
table dot put and then we will pass in
an integer as the key and a string as
the value so our first student is
spongebob he has a student id of let's
say 100
and let's pass in a string for the value
spongebob
okay that is our first student so let's
add a couple more from the previous
example
so we have spongebob
patrick with an id of one two three
sandy with an id of three two one
squidward with an id of five five five
and gary with an id of 777
now to access one of these values you
can use the get method of tables
so i'll display this within a print line
statement
table dot get and i will pass in a key
let's get the value at key number 100 so
this student is spongebob how can we
display all of the key value pairs of a
table well we could use a for loop
so i'm going to create a for loop
and place this within it
so to iterate over the keys of our table
this is what we can write we can use an
enhanced for loop so we are iterating
over integers
so the data type is integer
key
colon so to make our hash table iterable
we can get all of the keys from our
table and put them within a set a set is
iterable
so we will iterate over table dot key
set method
this will take all of our keys and
return a set and a set is something we
can iterate over and within our print
line statement let's print each key
key
plus
then maybe i'll add a tab to separate
these
plus
table
dot get
then pass in whatever our key is okay so
after running this
this will display all of our key value
pairs and if you need to remove an entry
well there's a remove method table dot
remove then pass in a key let's remove
gary so remove the entry with this key
777
and gary is no longer within our table
but we'll keep them in i'll turn this
line into a comment now just to get a
better understanding of where these key
value pairs are being placed let's also
display each hash code for each of these
elements
so preceding our key let's display each
hash code i'll precede our key with a
tab
and let's display each key's hash code
key dot hashcode method if we're using
the hash code of integers this will
return the primitive integer value
represented by the key that we're
passing in if we're using the hashcode
method of integers well the hash is
going to end up being the same integer
so you can see that these numbers are
the same to calculate an index we can
follow the hash with modulus operator
then the size of our table
we set this to originally be 10.
so we have gary at index seven squidward
at index five patrick at three sandy at
one spongebob at zero now if our data
type was strings we would use a
different hashing formula so let's
change the data type to string and all
of these keys are now strings
then let's remove modulus 10
and change the data type of our for loop
to strings
string key
okay these are the new hashes for each
of our keys
this key has this hash number this key
has this hash number so on and so forth
so different data types will have
different hash code formulas now let's
calculate the element in which each of
these entries is going to be placed by
adding modulus the size of our hash
table 10.
so here are the elements and we actually
have two collisions we have a collision
with these two keys they're both within
bucket five as well as these two entries
so both of these will be placed into
bucket zero since there's more than one
entry within the same element we will
treat this bucket as a linked list and
just iterate over it linearly until we
reach the key that we're looking for
now one way in which we can avoid
collisions is to increase the size of
our hash table if we set this to the
default of 11 and change this to modulus
11 well then these will be placed within
different buckets and you can see that
they changed however we still have a
collision with spongebob and squidward
so what if we increase this to 21.
do we have any collisions then
nope we do not these keys are within
their own buckets
all right everybody so in conclusion a
hash table is a data structure that
stores unique keys to values when you
declare a hash table you state the data
types of what you're storing and these
are reference data types each key value
pair is known as an entry and a benefit
of hash tables is that they have a fast
insertion lookup and deletion of key
value pairs but they're not ideal for
small data sets since there's a lot of
overhead but they're great with large
data sets hashing in the context of a
hash table takes a key and computes an
integer it utilizes a formula which will
vary based on the key as input and its
data type then to calculate an index we
follow this formula we take a hash
modulus the capacity of our table to
calculate an index each index is also
known as a bucket it's an indexed
storage location for one or more entries
they can store more than one entry in
the case of a collision and in case that
happens we treat each bucket as a linked
list each entry knows where the next
entry is located and as we discussed a
collision is when a hash function
generates the same index for more than
one key less collisions equals more
efficiency and the runtime complexity of
a hash table varies if there are no
collisions the best case would be a
runtime complexity of big o of one it
runs in constant time in case there are
exclusively collisions as in we place
all of our entries within the same
bucket it's going to be one giant linked
list and the runtime complexity of a
linked list is big o of n it runs in
linear time on average the runtime
complexity of a hash table will be
somewhere within this range so yeah
everybody those are hash tables if you
would like a copy of all my notes here
i'll post them in the comments section
down below and well yeah those are hash
tables in computer science
Voir Plus de Vidéos Connexes
Complément : Fonctions de hachage
Excel #31: Tableau de bord pour visualiser les indicateurs de performance du service commercial.
APPRENDRE LE PYTHON #2 ? LES VARIABLES
Représenter un partage à l'aide d'une fraction - Sixième
Comment sommes nous connectés ? | Feat. E-penser, Manon Bril & bien d'autres | EPISODE #9
Utiliser la notation des puissances - Quatrième
5.0 / 5 (0 votes)