AQA A’Level Hash tables - Part 1

Craig'n'Dave
3 Feb 201809:20

Summary

TLDRThis video script introduces hash tables as a data structure that combines the benefits of arrays and linked lists for efficient data retrieval. It explains the concept of hashing, hash functions, and how they map keys to fixed indices in a hash table. The script also addresses collisions and their resolution through rehashing and linked lists, preventing clustering and ensuring fast access to data.

Takeaways

  • 📚 **Hash Tables Overview**: The script introduces hash tables as a data structure that combines the best features of arrays and linked lists for efficient data retrieval.
  • 🔑 **Key Concept**: A hash table uses a hash function to map a key to a fixed index in an array, allowing for quick data access.
  • 🔄 **Collision Handling**: Collisions occur when different keys map to the same index; they are handled by techniques like rehashing.
  • 📈 **Advantages of Arrays**: Arrays provide quick and easy random access to elements due to their fixed nature and direct indexing.
  • 🔗 **Dynamic Nature of Linked Lists**: Linked lists can grow dynamically as they store data with pointers to the next element, but lack random access efficiency.
  • 🚫 **Limitations of Arrays**: Arrays are static data structures with a fixed size, making them unsuitable for situations where data size is unpredictable.
  • 🔍 **Inefficiency in Linked Lists**: Searching for an item in a linked list requires traversing from the start, which can be slow for large lists.
  • 🔐 **Hash Function Role**: The hash function is crucial as it consistently generates the same hash value for a given key, facilitating data retrieval.
  • 🔄 **Simple Hash Function Example**: The script demonstrates a simple hash function using modulo operation to convert keys into hash values.
  • 🔄 **Collision Resolution**: When a collision occurs, the next available space in the hash table is used, which can lead to clustering if not managed properly.
  • 🔗 **Use of Linked Lists in Hash Tables**: Linked lists are used in hash tables to resolve collisions, allowing for efficient storage even when multiple keys hash to the same index.

Q & A

  • What are the main topics covered in the upcoming videos on data structures?

    -The upcoming videos will cover hash tables, including their concept, uses, simple hashing algorithms, collision handling through rehashing, and a comparison with arrays and linked lists.

  • What is the primary advantage of using arrays as a data structure?

    -Arrays allow for quick and easy random access to all elements due to each element being associated with its own index.

  • What is a limitation of the array data structure mentioned in the script?

    -A limitation of arrays is that they are fixed in size, making them static data structures that cannot easily grow.

  • How does a linked list differ from an array in terms of memory storage?

    -In a linked list, each element is not stored continuously in memory; instead, each element contains data and a pointer to the next element's memory address.

  • What is the trade-off when using a linked list compared to an array?

    -While linked lists are dynamic and can grow during program use, they do not allow for random access to elements and require sequential searching, which can be inefficient for large lists.

  • How does a hash table combine the best features of arrays and linked lists?

    -A hash table combines the fast access of arrays with the dynamic nature of linked lists, making it ideal for scenarios where quick lookup, deletion, or insertion is a priority.

  • What is a hash function and how does it relate to a hash table?

    -A hash function is a piece of code that takes a key and produces a hash value, which maps the key to a fixed index in the hash table. It's used to determine the storage location for data associated with a given key.

  • What is a collision in the context of hash tables?

    -A collision occurs when two keys produce the same hash value, leading to a conflict over the same index in the hash table.

  • How is the collision handled in the example provided in the script?

    -In the example, when a collision occurs, the next available space in the hash table is checked, and if that is also occupied, the data is stored in a linked list at the overflow for that index.

  • What is clustering, and how can it be a problem in hash tables?

    -Clustering is the problem that arises when multiple items are stored in the same overflow list, increasing the likelihood of further collisions and slowing down the search process.

  • What is the solution proposed in the script to avoid clustering?

    -The script suggests using linked lists to handle overflow, which helps maintain fast access times and avoids clustering by distributing colliding items across different overflow lists.

Outlines

00:00

📚 Introduction to Data Structures: Hash Tables

The video script introduces the concept of hash tables, emphasizing their importance in data structure studies. It mentions a comparison with arrays and linked lists to highlight the unique advantages of hash tables. Arrays are praised for their ability to provide quick and easy random access to elements due to their fixed nature and continuous memory allocation. However, their limitation is the inability to grow dynamically. Linked lists, on the other hand, are dynamic structures that can grow during program execution but lack the ability for random access, requiring a sequential search which can be inefficient for large lists. The script then sets the stage for a detailed exploration of hash tables, which are described as combining the best features of both arrays and linked lists, prioritizing speed in lookup, deletion, and insertion of items. A hash table is depicted as an array paired with a hash function that maps data (keys) to a fixed index, ensuring quick data retrieval.

05:03

🔑 Understanding Hash Functions and Collisions

This section delves into the mechanics of hash functions and how they translate data into hash values that correspond to indices in a hash table. It uses a simplistic hash function to demonstrate the process of mapping keys to table indices. The script also addresses the issue of collisions, which occur when different keys generate the same hash value, leading to the same index. It explains how collisions are handled by rehashing, using the next available space in the hash table. The concept of clustering, where frequent collisions in the same index can slow down data retrieval, is introduced. The paragraph concludes by discussing the use of linked lists to manage collisions efficiently, allowing for fast access while avoiding clustering issues. The video script promises to cover basic pseudocode for hash table operations in the subsequent video, including adding, searching, and removing data.

Mindmap

Keywords

💡Hash Tables

Hash tables are a data structure that implements an associative array abstract data type, a structure that can map keys to values. In the video, hash tables are introduced as a way to combine the best features of arrays and linked lists, offering fast access times for lookups, insertions, and deletions. The concept is central to the video's theme as it sets the stage for discussing how data can be efficiently managed and accessed.

💡Hashing Algorithms

Hashing algorithms are functions that take an input (or 'key') and return a fixed-size string of bytes. In the context of the video, simple hashing algorithms are mentioned as a method to apply to keys to determine their position in the hash table. The script uses a basic example where the hash function is the key value modulo 50, illustrating how keys are mapped to indices in the hash table.

💡Collision

A collision in hash tables occurs when two keys hash to the same index in the table. The video explains collisions as a common issue that arises when using hash tables, particularly with simplistic hash functions. An example given is when 'Dave' and 'Amy' both hash to index 1, causing a collision.

💡Rehashing

Rehashing is the process of resolving collisions in a hash table. The video mentions rehashing as a concept used to handle collisions. When a collision occurs, the script suggests checking the next available space in the hash table, which is a form of rehashing to find an alternative location for the colliding key.

💡Arrays

Arrays are a data structure that stores elements of the same type in contiguous memory locations, each indexed by integers. The video discusses arrays as a precursor to hash tables, highlighting their advantage of providing quick and easy random access to elements, which is beneficial for binary trees that require such access.

💡Linked Lists

Linked lists are a dynamic data structure consisting of a sequence of nodes, where each node contains data and a reference to the next node in the sequence. The video contrasts linked lists with arrays, noting that while they can grow dynamically, they do not allow for random access to elements, which can lead to slow search times.

💡Random Access

Random access in the context of data structures refers to the ability to access any element directly in constant time. The video emphasizes the random access feature of arrays, which allows for efficient retrieval of elements using their indices, as opposed to linked lists which require sequential access.

💡Static Data Structures

Static data structures, like arrays, have a fixed size and cannot grow beyond their initial allocation. The video points out that arrays are static, meaning they cannot easily expand once declared, which is a limitation when more data needs to be stored.

💡Dynamic Data Structures

Dynamic data structures, such as linked lists, can grow or shrink in size during the runtime of a program. The video contrasts dynamic structures with static ones, highlighting how linked lists can adapt to changing data requirements, but at the cost of not supporting random access.

💡Hash Function

A hash function is a critical component of a hash table that takes an input (or 'key') and computes a value (or 'hash value') which determines the index at which the data will be stored in the hash table. The video introduces the concept of a hash function and demonstrates how it is used to map keys to indices, using a simple hash function for illustrative purposes.

💡ASCII Character Codes

ASCII character codes are numerical representations of characters in the ASCII standard. The video script uses ASCII codes as an example to demonstrate how string data values can be converted into numerical values for the purpose of hashing, showing a practical application of character encoding in the context of hash table operations.

Highlights

Introduction to hash tables and their uses in data structures.

Explanation of simple hashing algorithms.

Definition and handling of collisions in hash tables.

Concept of rehashing to manage collisions.

Advantages and limitations of arrays as data structures.

Advantages and limitations of linked lists.

Arrays allow quick random access but are static.

Linked lists are dynamic but have slow search times.

Hash tables combine the best features of arrays and linked lists.

Hash tables are used for quick lookup, deletion, or insertion.

Hash table is an array coupled with a hash function.

Hash function maps a key to a fixed index in the hash table.

Example of a simple hash function using mod 50.

How hash functions convert string data into ASCII codes.

Collision occurs when two keys map to the same index.

Solution to collisions by checking the next available space.

Problem of clustering when using next available space.

Use of linked lists to avoid clustering in hash tables.

Benefits of fast random access and avoiding clustering.

Overview of the process for adding, searching, and removing data from a hash table.

Transcripts

play00:05

over the next two videos we'll be taking

play00:08

a look at the data structure hash tables

play00:09

you'll become familiar with the concept

play00:11

of a hash table and its uses we have to

play00:13

apply simple hashing algorithms and know

play00:16

what is meant by collision and how

play00:18

collisions are handled by using a

play00:19

concept known as rehashing before we

play00:24

dive into looking at hash tables we're

play00:26

first going to look at the advantages

play00:27

and limitations of a couple of other

play00:30

data structures these were arrays and

play00:32

linked lists now array already be

play00:35

familiar with and as you know we use

play00:39

them to store a single data type

play00:42

continuously in memory this is because

play00:46

every element of an array is associated

play00:49

with its own index this is a very

play00:53

powerful feature of an array and it

play00:55

means we have quick and easy random

play00:58

access to all the elements of an array

play01:00

with a single command here you can see

play01:05

that we directly access a single element

play01:07

of the array in this case element four

play01:10

with just a single line of code this is

play01:13

important you've already seen the binary

play01:16

tree store their information in array

play01:19

structures and require random access

play01:22

directly into any element of an array a

play01:25

limitation though of the array data

play01:28

structure is that it's fixed inside this

play01:32

is because the array stores items

play01:34

continuously in memory so when you

play01:37

declare it you have to fix it sighs your

play01:40

on effect telling the operating system

play01:42

when you declare the array to set aside

play01:44

as much memory for use as you think

play01:47

you're going to need there is no way to

play01:49

guarantee that more memory adjacent to

play01:52

your array will be available for use

play01:54

later on this is why arrays can't easily

play01:57

grow and they're known as static data

play01:59

structures

play02:01

now this next data structure is not

play02:04

technically in the specification for a

play02:06

QA level this called a linked list but

play02:09

it's very useful data structure to know

play02:10

about especially when trying to

play02:12

understand hash tables a bit later a

play02:14

linked list is a dynamic data structure

play02:17

and as

play02:18

such it could grow while a programs

play02:20

being used this is because each item or

play02:23

element in a linked list is not stored

play02:26

continuously in memory each element in a

play02:30

linked list contains the data we want

play02:34

plus a pointer effectively a memory

play02:37

address to the location of the next item

play02:41

in the linked list in memory of course

play02:44

there's price for this we now have a

play02:46

dynamic data structure but we can't now

play02:48

access is elements randomly if we want

play02:52

to find an item in a linked list we have

play02:55

to start at the beginning and we have to

play02:57

follow the linked list checking each

play02:59

element as we go until we find the item

play03:02

we want this could make for a very

play03:04

inefficient search especially if the

play03:06

list is large and the item we're looking

play03:08

for stored to the end so just a recap

play03:11

quickly arrays allow random access to

play03:14

any element but as static whereas lists

play03:18

links are dynamic but clambake can be

play03:20

slow to search hash tables combine the

play03:24

best features of both data types and

play03:27

they're most commonly used when speed of

play03:30

lookup or deletion or insertion of items

play03:33

is a priority so what exactly is a hash

play03:37

table in effect it is simply an array

play03:41

which we call a hash table which is

play03:44

coupled together with a hash function

play03:48

the hash function is a piece of code

play03:51

which takes in some kind of data which

play03:55

we call a key and then Chuck's out a

play03:58

value which we call a hash value the

play04:03

hash value Maps our initial key to a

play04:08

fixed index in our hash table in the

play04:12

first instance you use the hash function

play04:14

to work out where in the hash table to

play04:17

store the data for a given key the hash

play04:21

function will always chuck out the same

play04:23

value for any given key so later you can

play04:26

use it to determine where in the hash

play04:28

table a given item will be found

play04:31

based on its key here we're using an

play04:36

incredibly simple hash function for

play04:38

demonstration purposes so the address is

play04:43

going to equal value and then mod 50 in

play04:46

other words it's going to take in a

play04:48

numerical value or a key it's going to

play04:53

perform mod 50 on it and then it's going

play04:57

to spit out our key our value in this

play05:02

example we have simply converted our

play05:05

string data values into their

play05:08

corresponding ASCII character codes and

play05:11

code those values or hard-coded those

play05:13

values here for simplicity of course

play05:15

this would actually happen

play05:16

programmatically so the numbers of our

play05:21

keys here are passed one at a time into

play05:26

a hash function Dave when thrown in

play05:31

generates the hash value one Craig

play05:36

generates the hash value three Sam nine

play05:40

kerala eight and finally mark seven this

play05:44

allows us to map these initial keys to

play05:48

these locations in our hash table should

play05:50

we ever wish to retrieve Dave from the

play05:53

hash table we simply feed the key Dave

play05:55

into the hash function and it will

play05:58

return the position of where to find

play06:00

Dave position 1 so far so good so let's

play06:07

try adding another item to our hash

play06:09

table this time Amy the key amy gets fed

play06:15

into our hash function and it spits out

play06:19

the value 1 again so what happens now

play06:24

well if we go to index 1 in our hash

play06:28

table we find that dave is already here

play06:32

this is known as a collision one

play06:38

solution is to check the next available

play06:40

space index 2 we discover it's empty so

play06:45

we could store Amy here instead simple

play06:50

enough

play06:51

okay so let's now try adding Jessie

play06:55

well once again Jessie generates a hash

play07:01

value of 1 and of course dave is already

play07:05

in this position if we check the next

play07:08

one

play07:09

Amy's there Kraig them full we could now

play07:12

put Jess in location for now in fact any

play07:19

name which ends in letter e will

play07:22

generate a hash value of 1 and this of

play07:23

course is because we're using an

play07:24

incredibly simplistic and inefficient

play07:26

hash function a good hash function

play07:28

should be designed so it tries to avoid

play07:31

creating duplicate hash values however

play07:34

they are unavoidable and here we have a

play07:37

problem when a collision occurs we're

play07:41

using the next available space and a

play07:43

hash table however every time we do this

play07:45

we increase the likelihood of further

play07:48

collisions and this problem is known as

play07:51

clustering if you've been paying

play07:56

attention to the start video you

play07:58

probably won't even know solution and

play08:00

one of those is to use linked lists so

play08:03

if we go back to our Amy example we feed

play08:06

the key into the hash function and the

play08:08

hash value 1 is generated we discover

play08:11

Dave is already in index 1 so now we go

play08:14

to our overflow for index 1 and restore

play08:17

Amy here jess would end up in the linked

play08:21

list

play08:21

directly after amy and so on we've

play08:24

gained all the benefits of fast random

play08:27

access to an array type data structure

play08:29

while avoiding clustering if the hat and

play08:32

hash function is good we're not going to

play08:34

use this over the loaf a much so it's

play08:36

going to be short

play08:37

we've massively sped up finding data we

play08:41

simply feed a key into the hash function

play08:44

go to the hash table in the location

play08:46

given and if the item we want is not

play08:48

there we search the overflow

play08:51

sequentially okay that's a pretty

play08:53

comprehensive overview of hash tables in

play08:55

the next video we're going to look at

play08:57

some basic pseudocode

play08:58

for adding searching and removing data

play09:00

from a hash table

play09:09

you

Rate This

5.0 / 5 (0 votes)

Étiquettes Connexes
Data StructuresHash TablesHashingCollisionRehashingArraysLinked ListsRandom AccessDynamic DataEfficiencyAlgorithms
Besoin d'un résumé en anglais ?