ARRAYLIST VS LINKEDLIST
Summary
TLDRThis video script explores the intricacies of arrays in programming, contrasting their behavior across different languages. It explains that systems programming languages require explicit type and size specification for arrays, whereas scripting languages like JavaScript and Python offer more flexibility, with JavaScript arrays being essentially hashmaps. The script delves into memory allocation, indexing, and the trade-offs between dynamic sizing and cache efficiency, highlighting data structures like linked lists and array lists to overcome array limitations.
Takeaways
- π Arrays are fundamental in programming, but their functionality varies by language, with systems programming languages requiring explicit type and size specification.
- π Scripting languages offer more flexibility with arrays, allowing dynamic resizing and the inclusion of diverse data types.
- π οΈ In systems programming, the inability to expand arrays and store different data types is due to the need for contiguous memory allocation and the risk of overwriting other values.
- π Arrays use indexing, starting with zero, to access elements, which is a concept inherited from C and widely used due to its pointer arithmetic simplicity.
- π§ The base address of an array is crucial for indexing, as it helps calculate the memory address of any element by adding the product of the element's size and its index to this base address.
- π« Accessing elements outside the array's bounds can lead to segmentation faults or undefined behavior if the accessed memory is within the process's boundaries.
- π To overcome array limitations, data structures like linked lists and array lists are used, providing dynamic sizing but with trade-offs in caching efficiency and performance.
- π Linked lists allow dynamic sizing but suffer from poor cache performance due to scattered elements in memory.
- π Array lists combine the benefits of arrays with dynamic sizing, using a capacity that can grow as needed, though this can incur performance costs during resizing.
- π JavaScript arrays are implemented as hashmaps rather than traditional arrays, allowing for flexible indexing and efficient memory use.
- π The choice of data structure depends on the specific requirements of the application, including performance, memory usage, and the need for dynamic resizing.
Q & A
What is the primary difference between arrays in systems programming languages and scripting languages?
-In systems programming languages, arrays require the specification of both the type and size of elements they will store, while scripting languages offer more flexibility, allowing for the addition of an unlimited number of elements and the inclusion of diverse data types within the same array.
Why can't arrays in systems programming languages inherently expand to accommodate more elements?
-Arrays in systems programming languages can't inherently expand because adding elements without considering the memory could potentially overwrite other necessary values. This is why such languages often lack methods like append, insert, or push, which are common in scripting languages.
How does the memory address play a role in accessing elements in an array?
-Memory addresses are crucial for accessing array elements because the computer accesses values in terms of memory addresses rather than variables. The base address of the array is used to calculate the memory address of a specific element by adding the product of the element's size and its index to the base address.
What is the main advantage of using linked lists over arrays in terms of size flexibility?
-Linked lists provide a dynamically sized list of elements by storing them wherever there is available memory space. They use nodes that contain an item and a pointer to the next node, which allows for the addition of new elements without the need to know the total number of elements in advance.
Why are arrays preferred over linked lists in terms of cache performance?
-Arrays are preferred for cache performance because they keep elements compacted in memory, which increases the chances of cache hits. Linked lists, with their elements scattered throughout memory, are less likely to be cached efficiently due to their non-contiguous nature.
What is the main idea behind an ArrayList in contrast to a simple array?
-An ArrayList is a data type that includes a reference to an array along with additional information about the array, such as capacity and length. It handles dynamic resizing, allowing the array to grow as elements are added, while also providing constant time access to elements by index.
How does an ArrayList manage to grow its size dynamically while maintaining performance?
-When an ArrayList's array is full, it allocates a new array with a larger capacity, typically twice the size of the full array. It then copies all the values from the old array to the new one, updates the reference and capacity, and finally adds the new element, ensuring dynamic growth without significant performance degradation.
What is the trade-off when considering the use of linked lists for removing elements?
-While removing an element from a linked list is faster because it only requires updating pointers and does not involve shifting subsequent elements, finding the element to remove in the first place can be slow due to the need to traverse the list from the head to the desired position.
How do scripting languages like Python handle arrays differently from systems programming languages?
-Scripting languages like Python use a specialized data structure, such as a PyListObject, which is an array of object pointers. This allows for the addition of elements of different types and simulates the behavior of an array without the strict type and size limitations of systems programming languages.
What is the underlying structure of a JavaScript array, and how does it differ from traditional arrays?
-JavaScript arrays are actually implemented as hashmaps, where keys correspond to indexes. This allows for the addition of elements at any index without the need to allocate memory for all the positions in between, providing a flexible and efficient way to handle arrays.
Outlines
π Fundamentals of Arrays and Their Variability Across Languages
This paragraph introduces the concept of arrays as a basic tool for developers, emphasizing how their behavior can differ significantly across programming languages. It explains the necessity in systems programming languages to define both the type and size of array elements, contrasting this with the flexibility offered by scripting languages, which allow dynamic resizing and the mixing of data types within a single array. The paragraph sets the stage for a discussion on why scripting languages can handle arrays differently and hints at the topic of 'real arrays' in JavaScript, which will be debunked later in the video.
π Deep Dive into Array Indexing and Memory Management
The second paragraph delves into the technical aspects of array indexing and memory management. It describes how arrays are stored contiguously in memory and the potential issues that can arise from dynamic resizing, such as overwriting other memory values. The explanation covers the low-level operations of how CPUs access memory addresses and the implications of incorrect memory access due to array resizing. It also touches on the lack of inherent size information in arrays at a low level of abstraction, leading to potential out-of-bounds access errors, and contrasts this with the higher-level abstractions provided by scripting languages.
π οΈ Addressing Array Limitations with Data Structures
This paragraph discusses the limitations of arrays and how they are addressed through the use of data structures. It introduces linked lists as a simple method for creating a dynamically sized list, explaining the concept of nodes and how they are used to store elements in available memory spaces. The paragraph also covers the drawbacks of linked lists, particularly their inefficiency for caching due to scattered memory allocation. It then introduces array lists (or dynamic arrays) as a compromise between the flexibility of linked lists and the compact memory usage of arrays, explaining how they manage memory and grow in size to accommodate more elements.
π Performance Implications of Array and List Operations
The fourth paragraph examines the performance implications of operations on arrays and lists. It discusses the time complexity of adding and removing elements from both data structures, highlighting the constant time complexity for accessing elements in an array list due to its internal array storage. The paragraph also addresses the inefficiency of accessing elements in a linked list due to its linear time complexity and the caching issues associated with non-contiguous memory allocation. It further explores the impact of shifting elements in an array when removing them and the advantages of knowing the approximate number of elements in advance to optimize performance.
π Generics and Scripting Languages' Unique Approach to Arrays
The final paragraph explores how scripting languages, such as Python and JavaScript, handle arrays differently from systems programming languages. It explains that scripting languages use interpreters or virtual machines to emulate the behavior of arrays, which are implemented as specialized data structures like Python's 'py list object' or JavaScript's hashmap approach. The paragraph highlights the flexibility of these implementations, which allow for arrays to hold elements of different types and to dynamically resize without the performance penalties associated with systems programming languages. It also touches on the performance considerations and the clever solutions employed by these languages to overcome the limitations of traditional arrays.
Mindmap
Keywords
π‘Arrays
π‘Indexing
π‘Memory Address
π‘Segmentation Fault
π‘Data Structures
π‘Caching
π‘Generics
π‘Interpreters
π‘HashMaps
π‘Dynamic Resizing
Highlights
Arrays in different programming languages exhibit varying behaviors, with systems programming languages requiring specification of element type and size, while scripting languages offer flexibility in element addition and data type diversity within the same array.
In systems programming, arrays are allocated with fixed size and type to prevent overwriting of adjacent memory values, explaining the lack of dynamic methods like append or push.
Scripting languages handle arrays differently, allowing dynamic resizing and mixed data types, which is contrasted with systems programming languages.
The video explores why JavaScript arrays are not 'real' arrays, but rather hashmaps, a unique approach to array handling.
Arrays are simple data structures with elements indexed from zero, allowing for retrieval and modification based on position.
Memory management is crucial as adding elements to an array can overwrite other values if not carefully managed, a common issue in low-level programming.
The distinction between array elements in memory and their representation in hardware is highlighted, emphasizing the importance of base address for indexing.
C programming language introduced syntax to simplify pointer arithmetic, influencing most major programming languages to adopt zero-based indexing.
Arrays in low-level languages do not contain inherent size information, leading to potential risks of accessing out-of-bound elements.
Linked lists offer a dynamically sized alternative to arrays, using nodes that contain elements and pointers to the next node.
Linked lists can suffer from caching inefficiencies due to scattered elements in memory, contrasting with the compact nature of arrays.
ArrayLists combine the benefits of arrays with dynamic resizing, using additional information about capacity and length to manage elements.
The process of resizing an ArrayList involves heap allocation and element copying, which can have performance implications.
ArrayLists provide constant time access for elements, unlike linked lists, which require linear time complexity for indexing.
Removal operations in arrays require shifting elements, which can be time-consuming, while linked lists can remove elements with constant time complexity by adjusting pointers.
Generic types in programming languages like Rust are handled differently, with the compiler generating specific code for each instance of a generic type at compile time.
Java handles custom types differently from primitive types in arrays, using an array of pointers to manage non-primitive elements.
Python's implementation uses a py list object, an array of object pointers, allowing for diverse data types within the same list.
The video concludes by revealing that JavaScript arrays are implemented as hashmaps rather than traditional arrays, a clever solution with unique memory management.
Transcripts
arrays are a fundamental tool for
developers but their behavior can vary
significantly depending on the
programming language being used in
systems programming languages it's
necessary to specify both the type and
size of elements the arrays will store
scripting languages offer more
flexibility allowing for the addition of
an unlimited number of elements while
permitting the inclusion of diverse data
types within the same array so our
scripting languag is just better in
today's video we'll look at how
scripting languages handle this issue
why it exists and how lower level
programming languages overcome it we'll
also discuss why JavaScript arrays
aren't even real
arrays hi friends my name is George and
this is core
dumped before starting I want to send
special thanks to the first members of
this Channel and to today's sponsor
codec Crafters more about them later in
the
video arrays are quite simple they
contain elements each occupying a
position in programming the first
position is usually zero the second is
one and so for these positions are
called indexes which we use to retrieve
and change the element at that position
in the array at a low level the idea is
to maintain each array element adjacent
to one another in memory but there's a
problem here since memory also holds
other values arrays should not
inherently expand disregarding this fact
and adding elements could potentially
overwrite other values that might be
required later in this example
immediately after adding elements to the
array a numeric Rec addition operation
follows the problem here is that
computers don't access values in terms
of variables but in terms of memory
addresses when the operation is executed
the machine code produced by the
compiler instructs the CPU to add the
numbers located in the specified memory
addresses which is right because it
refers to where both values were
initially stored however since both
values were overwritten by the array the
CPU will execute the addition between
incorrect
values as you can see see this presents
a significant issue one that is commonly
addressed by simply not permitting
programmers to add elements to arrays
this explains why arrays and systems
programming languages lack methods such
as append insert or push as seen in
Python JavaScript or other scripting
languages apart from the fact that
arrays are not objects at low level
let's see how systems programming
languages enforce this basic
rule when the array is declared we
specify both the type of elements it
will contain and its length indicating
how many of those specific elements the
array will contain the reason behind the
inability to have different types within
the same array lies in the use of this
information to determine the required
memory space for the array the type
specification informs the compiler about
the number of bytes each element
occupies in memory by multiplying this
value by the array's length the precise
amount of memory required to store the
entire array can be
determined whether the array is stored
in the stack or the the Heap this
information is essential to know how
much to move the stack pointer and
allocating a sufficiently large memory
space respectively you can check out my
two previous videos for more details
about
this okay here you see a clear
distinction between elements but
remember this is only an illustration in
the bare Hardware things look more like
this and the only information available
to us is the memory address where the
array begins referred to as the Base
address
this is how indexing Works since we know
the number of bytes each element
occupies within the array when we want
to access a specific element we can
calculate the memory address where that
element starts to achieve this we add
the product of the element's size and
its position referred to as the index to
the memory Base
address this explains why we utilize
zero instead of one for the first
element when attempting to access the
initial element moving the pointer is
unnecessary
as far as I know the C programming
language introduced this sugar syntax to
simplify a process known as pointer
arithmetic and the reason why almost any
major programming language out there
uses indexing starting with zero is
because their syntax is inspired by
C pointer arithmetic is a concept very
important in C I want to make a video
about it in the future but for now
here's a very good explanation about it
from one of my favorite
channels to conclude this section of the
video it's important to note that the
array itself does not contain
information about its size at this level
of abstraction an array is simply a
block of memory lacking an inherent
property such as length to verify the
validity of an index being passed
therefore without caution there's a high
likelihood of attempting to access an
element outside the actual bounds of the
array when this happens there are two
possible scenarios if the array is
allocated at or near the end of the
memory Gra granted to our process the
operating system will detect attempts to
access memory locations beyond our
allotted
space in such instances the operating
system terminates the execution of our
process resulting in a segmentation
fault
error if the array is allocated further
away from the memory limit the operating
system will not terminate the execution
since we'll be accessing memory within
our process's
boundaries in this example even though
we're accessing values out of the array
it doesn't matter if that region is
holding other value as everything is
ones and zeros at low level the accessed
region will be interpreted as if it were
of the same type as the elements within
the array allowing the program to
continue
execution you might think the second
situation is the best case scenario
because our program won't crash but stop
the video for a second and think about
it is it preferable for our program to
crash or to continue execution under the
assumption that the array actually had a
Fifth Element I mean just imagine if
this array represents purchases made by
an
individual okay now that we understand
why arrays can't grow and why they are
limited to storing only one kind of
element at a time let's address the
non-dynamic size Behavior by utilizing
data structures but before that a quick
message from today's sponsor if you
follow this channel you're probably the
kind of person who loves to understand
how things work if you've ever wondered
how your favorite developer tools like
git redus or Docker work under the hood
the best way to gain that knowledge is
by recreating them yourself
codec Crafters offers challenge-based
courses that guide you through crafting
these tools step by step if there is a
concept that you don't know their
platform will provide you with the
necessary information to help you
continue progressing the best part is
you can start the courses by choosing a
programming language of your choice so
you're not only recreating these tools
you're also improving your programming
skills with the language you prefer if
you're interested and want to support me
you can sign up for code crafters and
choose the plan that best suits you
using the referral Link in the
description below I've been thinking
about solving these challenges on stream
but I don't know if that's something you
guys are interested in I'm looking
forward to hearing your thoughts in the
comments you can follow me on Twitter
for updates on this matter and now back
to the video data structures in case you
haven't watched my previous video let me
quickly explain linked lists to
you the simplest method to obtain a
dynamically sized list of elements is by
storing them wherever there is available
memory space
to do this we create a custom data type
called a node a node contains one item
and one pointer the item represents the
element of the list and the pointer
directs to the address where the next
node containing the next element resides
in
memory we only need to maintain a
reference to the first node typically in
the stack when we wish to add a new
element to the list we search for an
available space big enough for that node
then we update the last node of the list
to point to the new node which becomes
the last one this approach eliminates
the concern about the list of elements
growing however linked lists suffer from
a significant drawback they are not
efficient for caching
data to give you some context cache is a
small extremely fast memory embedded
directly in the CPU its sole purpose is
to store a copy of a small region of
memory when the CPU requires data
fetching it from memory can take a
considerable amount of time relative to
the CPU speed of course but if the
required value is cached the process of
loading data into registers can be
orders of magnitude faster when the data
required for a CPU operation is found in
Cache we refer to it as a cach hit
conversely if the data is not in Cache
we refer to it as a cach
Miss due to its limited size the cache
cannot store a copy of the entire memory
for Optimal Performance our data should
ideally be as compact as possible
allowing it to fit into the cach and
increasing the chances of cash
hits this may sound like a trivial
concept but when our programs are
optimized to take advantage of caching
their performance can increase
tremendously furthermore in modern
Computing cash isn't just a feature it's
the norm therefore cash misses will be
our primary concern for the remainder of
this
video the
problem as linked lists elements are
scattered throughout memory they are
very unlikely to be
cached here arrays are a clear winner as
they keep elements compacted in
memory but now we are back where we
started arrays are too Limited in
functionalities here's where array lists
come into play as the name suggests we
still use arrays but in a smarter way
instead of just holding a reference to
an array we create a data type that
includes that reference plus additional
information about the
array when created this data type
allocates memory for an empty
array the capacity attribute indicates
how many elements the array can hold
which in this example is
five the length attribute indicates how
many elements the array contains at any
given time initially since the array is
allocated memory with no elements its
length is
zero the idea behind this data type is
that it'll handle the array for us so it
comes with some Associated
methods as the name suggests the push
method accepts a value and adds it to
the array every time we add an element
to the array the length attribute
increases but note that eventually the
array will be filled with
elements when this happens if we try to
add more elements we'll exceed the
arrays bounds which is exactly what
we're trying to prevent by using this
data
structure to prevent adding values
beyond the array's capacity we need to
check whether the array is full before
adding a new
element this is as simple as verifying
if the length is equal to the
capacity if the array is not full we can
safely add the value of the new
element however if the array is full
we need to allocate a larger array to
accommodate more elements typically we
allocate a new array twice as long as
the one that is
full during this process we update both
the capacity and the reference to the
new
array this involves copying all the
values from the previous array to the
new one once the elements are copied to
the new array the previous array is no
longer needed so we free its
memory finally with free space available
in the new array we can add the new
element and update the length
accordingly and that's it now we have a
list of elements whose size can grow
dynamically and please don't tell me in
the comments that this code can be
written in better ways it was written by
my cousin who knows more than you now
it's important to mention that this
process of creating a new larger array
when the current one is full involves a
heap allocation which can have
performance implications as we discussed
in a previous
episode additionally copying elements
around in memory also incurs additional
time if you know the approximate or
exact number of elements the array list
will contain and you're concerned about
performance it might be wise to create
the array list with a specific initial
capacity this can reduce the need for
frequent resizing operations which means
fewer allocations which in turn means
better
performance this is why in languages
like rust you usually create a vector
the equivalent of an array list using
the new method but there's also the WID
capacity method
available another characteristic that
array lists inherit from arrays is
constant time access since elements are
internally stored in an array accessing
an element at a specific position simply
involves retrieving the element from the
array by indexing it this operation has
a constant time
complexity to prevent accessing elements
beyond the array boundaries we perform a
validation before indexing the array
this way if the provided index is
greater than or equal to the array
length we can return null none or a
proper
error this effectively solves the
problem of accessing outof bound
elements causing undefined behaviors
which arrays by themselves cannot avoid
though it adds an extra step the time
complexity remains
constant the reason I'm bringing this up
is because in linked lists accessing
elements is not that easy unlike arrays
linked lists don't provide direct access
to Elements by index as elements are
scattered in memory there is no formula
to determine the addresses where
elements
reside accessing an element at a
specific position in a linked list
requires traversing the list from the
head until reaching the node in the
desired position since each node must be
traversed individually the time
complexity for indexing in quotes a
linked list is
linear the time complexity is not the
only performance challenge here even if
the desired element is in the cache the
other nodes along the path to find that
element are often not cached which can
significantly slow down the traversing
process when is up to remove operations
after deleting an element from the array
all the subsequent elements must be
shifted back one position in the array
so keep in mind that copying data in
memory can be timec
consuming if the removed element is
closer to the end of the array the
shifting process is faster since there
are fewer elements to shift
if the removed element is closer to the
beginning of the array there are more
elements that need to be shifted which
can significantly slow down the
process of course this also depends on
the number of elements within the array
list now linked lists Fanatics would
argue that remove operations are faster
because everything we must do to remove
an element is to update its previous
node to point to the following node and
then free The Unwanted nodes
memory and yes it is true as no shifting
operations are needed the time
complexity of this operation is constant
but we also should consider that in
order to update this node we first need
to find its value in memory and as I've
mentioned like 20,000 times accessing an
element in a linked list is not
efficient due to caching but before
putting shame on linked lists as we're
going to do with JavaScript in a moment
consider that contrary to what is
usually said linked list could still be
a better choice in certain
situations and listen correctly please
I said it could not all architectures
base their efficiency on caches nor in
data shifting instructions I'll read
your comments about your opinions on
this matter at the end of the day
picking the right data structure depends
on several factors when you get into the
low-level World you'll usually find
yourself benchmarking multiple Solutions
attempting to ring out every drop of
efficiency from the
hardware now before getting into the
final part you might have noticed that
in both cases we created the data
structure for a specific specific kind
of element here the array list Can Only
Hold 16bit signed integers while the
linked list Can Only Hold 8 bit unsigned
integers so how can we solve this
without having to create a different
version for each type we
desire well if you are thinking about
generic types yes you are right but just
to clarify generics don't work the same
way in all programming languages in Rust
for example when you declare a generic
type the compiler identifies each
instance where you use that generic
generates source code specific to that
type and substitutes the generic type
with the generated
type this process is invisible to us as
it's accomplished at compile time as a
result in memory we get data structured
exactly as we
expected another approach is employed by
Java for custom types if the type
contained by the array is not primitive
in instead of creating an array of
values the VM is actually creating an
array of pointers pointing to the memory
addresses where the elements resides in
memory here the language is designed to
hide all of those pointers from us so we
think we are dealing with arrays of
values when in reality we are
not if you are thinking this is bad for
caching yes this is one of the reasons
why Java is considered a pretty fast
language but still slower than systems
programming
languages with this approach linked
lists can get even worse because nodes
won't contain the actual element and the
pointer to the next node but a pointer
to the element and a pointer to the next
node a cache nightmare one that
hilariously is very common to see in
languages like C where beginners are
tempted to create something like this
due to the lack of
generics and this leads us to the final
part of this video how do scripting
languages handle
arrays all right let's quickly examine
both the JavaScript and python
approaches the first thing to remember
is that scripting languages are
typically not compiled into machine code
instead they rely on a special program
to interpret the script allowing it to
mimic or emulate the desired
behavior of course to execute that
emulation process the real Hardware is
used for this reason such programs are
often referred to as interpreters or
virtual machines when we declare an
array behind the scenes The Interpreter
generates a specialized data structure
to simulate an array of elements ments
to understand what python does we can
explore the implementation of the
official interpreter which is open
source here we encounter a type called
py list object this is what The
Interpreter generates when we create a
python list the documentation comment
here states that this type is nothing
but an array of object
pointers so it's similar to what Java
does for non-primitive types an array of
memory addresses pointing to the values
but there's a key difference here in
Java all pointers with within the array
point to values of the same type in
Python these pointers point to objects
implying they can point to values of
different types this explains us being
able to create an array and still being
able to add multiple kind of elements to
it because all of these pointers are of
the same size when we index a python
list The Interpreter simulates indexing
the list by first indexing the array of
pointers and then reading the value that
the pointer is pointing
to and finally let's talk about about
JavaScript in simple terms JavaScript
arrays aren't actually arrays they're
hashmaps JavaScript doesn't concern
itself much with arrays in memory
instead when a JavaScript array is
declared the engine creates a hashmap
where the keys correspond to the
indexes you can verify this yourself
create a Javascript file declare an
empty array and then insert some
elements at the position hello as you'll
observe the indexing process is
essentially syntax sugar for retrieving
an element from the hashmap by its key
now even though a lot of people claim
that I don't like JavaScript I have to
admit that this is a very bizarre but
clever solution what's particularly
fascinating about this is that if we add
an element at position 1 million the
engine won't need to allocate memory for
an array that large instead it simply
inserts another element into the table
accessible with the key 1 million very
cool huh and that about wraps for now if
you notice that the anim quality
decreases in the JavaScript part it's
not a coincidence this video was
particularly hard to animate so hit the
like button if you learned something and
if you want to learn more consider
subscribing and following me in other
platforms the subscriber count has
double since the last video so again
thanks a lot for your support
Browse More Related Video
5.0 / 5 (0 votes)