COS 333: Chapter 6, Part 3

Willem van Heerden
26 Aug 2021 · 75:01

Summary

TLDR: This lecture delves into advanced data types, focusing on unions, pointers, and references, and their implications for memory management and type safety. It distinguishes between free and discriminated unions, highlighting the dangers of type-unsafe operations in C and C++. The lecture also explores strong versus weak typing, type equivalence, and the role of type checking in preventing errors, concluding with a brief introduction to data type theory and its significance in programming language design.

Takeaways

  • 📚 The lecture concludes the discussion on Chapter Six, covering advanced topics like union types, pointer and reference types, type checking, strong versus weak typing, type equivalence, and data type theory.
  • 🔄 Union types allow a variable to store different data types at different times but only one value can be set at a time, sharing the same memory space for all fields.
  • 🛠️ Unions are useful for conserving memory space when only one of multiple variables is in use at a time, but they are less useful in modern programming due to the abundance and affordability of memory.
  • ⚠️ Free unions do not support type checking and can lead to unsafe programming practices, while discriminated unions, supported by languages like Ada, include type checking for safety.
  • 🔬 Discriminated unions use a 'discriminant' or 'tag' to track the type of data stored, preventing errors associated with accessing the wrong fields.
  • 👀 Pointers store memory addresses and can be used for indirect addressing and dynamic memory management, but they can introduce complexity and potential errors like dangling pointers and memory leaks.
  • ➡️ Pointer arithmetic allows for the manipulation of memory addresses to access different parts of memory but requires careful handling to avoid out-of-bounds access.
  • 🔐 References in languages like C++ and Java provide a safer alternative to pointers for indirect addressing, as they cannot be used for pointer arithmetic and, in C++, must always be initialized to an existing object.
  • 🔄 Coercion can weaken typing in programming languages by automatically converting types to match operators, potentially masking type mismatches.
  • ⚖️ Strong typing ensures that type errors are always detected, leading to more reliable code, while weak typing may allow certain type errors to go undetected.
  • 📘 The lecture touches on type theory, distinguishing between practical type theory relevant to commercial programming languages and abstract type theory, which is of more interest to theoretical computer scientists.

Q & A

  • What is a union type in programming?

    -A union type is a type that can store values of different types at different times during execution. It is characterized by the fact that only one of its fields may contain a value at a time, and all fields within a union share the same storage space.
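
    As a rough, hedged illustration (not taken from the lecture slides), a minimal C++ sketch of how a union's fields share one storage area:

        #include <iostream>

        union Flex {            // both members occupy the same bytes
            int   i;
            float f;
        };

        int main() {
            Flex u;
            u.i = 12;           // u currently holds an int
            u.f = 22.8f;        // overwrites the same storage; u.i is no longer meaningful
            std::cout << sizeof(Flex) << '\n';   // size of the largest member, not the sum
        }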

  • Why are unions considered less useful in modern programming languages?

    -Unions are considered less useful in modern programming languages because they were particularly useful in the early days of computing when memory was scarce and expensive. Today, with large and comparatively cheap memory, the need for conserving memory space through unions has diminished.

  • What are the two design issues that arise when implementing unions in a programming language?

    -The two design issues are whether type checking should be required for unions and whether unions should be embedded within records or be a separate type on their own.

  • What is the difference between free unions and discriminated unions?

    -Free unions do not support type checking, leaving it to the programmer to track which type is stored. Discriminated unions, on the other hand, support type checking through the use of a discriminant or tag that helps check the type of the union and detect errors when invalid fields are accessed.
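
    C and C++ only provide free unions, but the discriminant idea can be approximated by hand; a hedged C++ sketch with illustrative names follows (modern C++ also offers std::variant, which performs this check automatically):

        #include <iostream>

        enum class Tag { IntVal, FloatVal };    // the 'discriminant' or tag

        struct Tagged {                         // a hand-rolled discriminated union
            Tag tag;
            union { int i; float f; };          // anonymous free union holds the value
        };

        int main() {
            Tagged t;
            t.tag = Tag::FloatVal;
            t.f   = 22.8f;
            if (t.tag == Tag::FloatVal)         // programmer-enforced check; Ada enforces it for you
                std::cout << t.f << '\n';
        }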

  • Why are pointers considered dangerous in programming?

    -Pointers are considered dangerous because they can lead to issues like dangling pointers and lost heap-dynamic variables. They also widen the range of memory cells a program can reference, much as the 'goto' statement widens the range of statements that can execute next, which complicates program flow and makes it harder to understand.

  • What are the two fundamental operations that must be supported for pointers in a programming language?

    -The two fundamental operations for pointers are assignment, which allows setting a pointer's value to a memory address, and dereferencing, which retrieves the value stored at the location a pointer refers to.
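
    A small C++ sketch of the two operations, using the explicit address-of (&) and dereferencing (*) operators described in the lecture:

        #include <iostream>

        int main() {
            int  val = 206;
            int *p   = &val;    // assignment: p stores the address of val
            int  j   = *p;      // dereferencing: fetch the value stored at that address
            *p = 207;           // writing through the pointer updates val itself
            std::cout << j << ' ' << val << '\n';   // prints 206 207
        }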

  • What is pointer arithmetic, and how is it used?

    -Pointer arithmetic allows the addition of values to pointers to compute offsets from a specific memory location. It is used to access elements in arrays or to navigate through memory addresses.
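
    A brief C++ sketch (illustrative, not from the lecture) of pointer arithmetic over an array; the offset is scaled by the element size, and stepping outside the array is undefined behaviour:

        #include <iostream>

        int main() {
            float stuff[5] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f};
            float *p = stuff;                  // p points at stuff[0]
            std::cout << *(p + 3) << '\n';     // same element as stuff[3]
            std::cout << p[3]     << '\n';     // equivalent subscript notation
            // *(p + 100) would compute an address outside the array: undefined behaviour
        }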

  • What is the difference between a pointer and a reference in programming?

    -A pointer stores the memory address of a value it refers to, while a reference is an alias for another value. Pointers can be manipulated as addresses, whereas references are treated as the values they refer to and cannot perform pointer arithmetic.

  • Why are references considered safer than pointers?

    -References are considered safer than pointers because they cannot be used for pointer arithmetic, which prevents errors like accessing memory beyond the bounds of an array. Additionally, references must always be initialized to an existing object, avoiding the issue of uninitialized or dangling references.
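
    A C++ sketch contrasting the two mechanisms; variable names are illustrative:

        #include <iostream>

        int main() {
            int  val = 10;
            int &r   = val;     // reference: must be bound to an existing object here
            int *p   = &val;    // pointer: may be null, may be reseated, allows arithmetic

            r  = 20;            // implicit dereference: assigns to val itself
            *p = 30;            // explicit dereference required for the pointer
            // r++ would increment val; p++ would move the pointer to the next int address
            std::cout << val << '\n';   // prints 30
        }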

  • What is type checking, and why is it important?

    -Type checking is the process of verifying that the operands of an operator are of compatible types. It is important because it helps ensure that operations are performed on the correct types of data, preventing type errors that could lead to unexpected behavior or program crashes.

  • What is the difference between static and dynamic type checking?

    -Static type checking is performed at compile-time and can catch type errors before the program runs. Dynamic type checking occurs at runtime and may result in runtime type errors if the types are not compatible.
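
    As an illustration (not from the lecture), C++ checks most types statically but defers some checks to run time; a sketch using std::any, whose cast is verified only when the program runs:

        #include <any>
        #include <iostream>
        #include <string>

        int main() {
            // int x = "hello";                 // static check: rejected at compile time

            std::any a = std::string("hi");     // the type is recorded alongside the value
            try {
                int n = std::any_cast<int>(a);  // dynamic check: mismatch detected at run time
                std::cout << n << '\n';
            } catch (const std::bad_any_cast &e) {
                std::cout << "run-time type error: " << e.what() << '\n';
            }
        }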

  • What does it mean for a programming language to be strongly typed?

    -A strongly typed programming language is one that always detects type errors, ensuring that every misuse of a variable that causes a type error is caught, whether at compile time or at run time.

  • What is type equivalence in the context of programming languages?

    -Type equivalence defines when values of two types are substitutable for one another with no coercion taking place. It sets the rules for which operand types are acceptable to each operator and can be based on either name type equivalence or structure type equivalence.
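
    A short C++ sketch of name type equivalence, which C++ applies to struct and class types; the type names are illustrative:

        struct Celsius    { double value; };
        struct Fahrenheit { double value; };    // same structure, different name

        void record(Celsius c) { /* use c */ }

        int main() {
            Fahrenheit f{98.6};
            // record(f);               // rejected: under name equivalence these types differ
            record(Celsius{37.0});      // accepted: the type names match
        }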

  • What are the potential drawbacks of implementing structure type equivalence in a programming language?

    -Implementing structure type equivalence can be complex and may lead to potential problems such as difficulty in differentiating between types with the same structure but different meanings, and the need to answer a multitude of questions regarding how types should be considered equivalent.

  • What is data type theory, and what are its branches?

    -Data type theory is a broad area of study concerned with data types and their properties. It has two main branches: practical type theory, which deals with data types in commercial and real-world programming languages, and abstract type theory, which often involves typed lambda calculus and is of interest mainly to theoretical computer scientists.

Outlines

00:00

📚 Lecture Overview and Union Types

This paragraph introduces the final lecture of Chapter 6, summarizing topics such as union types, pointer and reference types, type checking, strong versus weak typing, type equivalence, and data type theory. The focus then shifts to union types, explaining them as a storage mechanism for values of different types at different times, with the distinguishing characteristic that all fields share the same storage space. The paragraph highlights the use of unions for memory conservation and their reduced utility in modern computing due to abundant and affordable memory.

05:03

🔍 Design Issues and Examples of Unions

The paragraph delves into the design considerations for implementing unions in programming languages, such as type checking requirements and whether unions should be distinct or part of records. It contrasts free unions, which lack type checking and can lead to unsafe memory access, with discriminated unions, which include a type indicator for safer usage. Examples in Fortran, C, and C++ are provided to illustrate the concepts, emphasizing the importance of programmer awareness in tracking the current value in a free union.

10:04

📘 Discriminated Unions and Memory Efficiency

This section provides a detailed explanation of discriminated unions, which include a discriminant or tag for type checking. An example using a 'figure' union in Ada demonstrates how this type can store data for different shapes with shared memory space, making storage more efficient. The memory layout for a 'figure' union is described, showing how the discriminant guides the storage and retrieval of data for shapes like circles, triangles, and rectangles.

15:05

🚫 Drawbacks of Unsafe Unions and Pointers

The paragraph evaluates the safety of unions, particularly unsafe free unions, and compares them to discriminated unions supported by languages like Ada. It suggests avoiding free unions for reliability and points out that modern languages like Java and C# do not support unions due to safety concerns. The discussion then transitions to pointers and references, explaining their roles in indirect addressing and dynamic memory management, and the associated design issues that programming languages must address.

20:05

🔄 Pointer Operations and Dynamic Memory Management

The focus shifts to pointer operations, specifically assignment and dereferencing, which are fundamental to working with pointers in languages like C and C++. The paragraph illustrates how pointers can be assigned memory addresses and how dereferencing retrieves values from those addresses. It also touches on the dangers of pointers, such as dangling pointers and memory leaks, and the importance of careful management of dynamic memory allocation and deallocation.

25:07

🛠️ Practical Use of Pointers in C and C++

This section discusses the practical implementation of pointers in C and C++, highlighting their flexibility and potential dangers. Pointer arithmetic is introduced, along with the explicit use of dereferencing and address-of operators. The paragraph also explains the concept of void pointers, which can point to any data type after being cast, and how this affects type checking in these languages.

30:08

🔬 Deeper Look at Pointers and References

The paragraph provides a deeper analysis of pointers and references, emphasizing the safety of references over pointers due to the inability to perform pointer arithmetic on references. It also discusses the memory conservation benefits of references, as they do not require copying values. However, it notes that references can still lead to dangling references and memory leaks, which are common issues with pointers.

35:08

🏭 Evaluation of Pointers in Programming Languages

The paragraph evaluates pointers in the context of programming language design, noting problems such as manual heap management and the potential for type errors during casting. It compares how languages such as Java and C# handle pointers and notes the observation that pointers are akin to goto statements in how they increase the complexity of program flow.

40:10

🔍 Type Checking and Language Typing Strength

This section introduces type checking, explaining how it ensures operand compatibility for operators and the difference between static and dynamic type checking. It discusses the concept of strong typing, where type errors are always detected, and provides examples of strongly typed languages like ML and F#. The paragraph also covers the impact of type coercion on typing strength and the reliability of programming languages.

45:11

🔄 Coercions and Type Equivalence in Programming

The paragraph explores type coercion and its effects on programming language typing, illustrating how coercions can lead to undetected type mismatches. It compares languages like C++, Java, and Ada in terms of their coercion rules and typing strength. The concept of type equivalence is introduced, explaining name type equivalence and structure type equivalence, and the challenges associated with implementing each approach.

50:12

📚 Conclusion and Introduction to Data Type Theory

The final paragraph concludes the lecture by briefly introducing data type theory, which encompasses practical and abstract aspects of type systems in programming languages. It outlines the broad scope of type theory, touching on its relevance to fields like mathematics, logic, computer science, and philosophy, and sets the stage for the next lecture on expressions and assignment statements.


Keywords

💡Union Type

A union type is a data structure that can store values of different types at different times during execution. It is characterized by the feature that only one value can be set at a time, and all fields within a union share the same storage space. In the context of the video, unions are discussed as a way to conserve memory space by allowing multiple variables to be represented within the same memory location, which was particularly useful in the early days of computing when memory was a premium resource.

💡Pointer

A pointer is a variable that stores the memory address of another variable. It is used for indirect addressing and dynamic memory management. Pointers can point to any variable regardless of its allocation time or location, and they can be used to access dynamically created storage on the heap. The video script explains the dangers associated with pointers, such as dangling pointers and memory leaks, and how they require careful management due to their flexibility and potential for misuse.

💡Reference Type

A reference type in programming is an alias for another value. Unlike pointers, references are not manipulated as explicit addresses; they provide a means to access the value indirectly without an explicit dereferencing operation. The script mentions that references are safer than pointers because they cannot be used for pointer arithmetic, which prevents errors such as accessing memory beyond the bounds of an array.

💡Type Checking

Type checking is the process of verifying that the operands of an operation are of compatible types. This can be done statically at compile-time or dynamically at runtime. The video emphasizes the importance of type checking in ensuring the reliability of a program, as it can detect type errors that might otherwise lead to unexpected behavior or runtime crashes.

💡Strong Typing

Strong typing refers to a programming language's ability to always detect type errors, whether at compile time or at run time, thus preventing type-related misuse from going unnoticed. The video script discusses the concept of strong versus weak typing, noting that strongly typed languages like ML and F# are more reliable as they minimize the risk of type-related errors.

💡Weak Typing

Weak typing, in contrast to strong typing, allows for more flexibility in how types can be used and converted. This can lead to type errors only being detected at runtime, if at all. The script points out that languages like C and C++ are weakly typed due to their extensive use of coercions and lack of strict type checking, which can lead to less reliable code.

💡Type Coercion

Type coercion is the automatic conversion of one data type to another. For example, an integer can be coerced into a floating-point number. The video script explains how coercion can weaken the typing strength of a language, as it may allow operations between incompatible types to proceed without errors, potentially leading to unexpected results.
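
A small C++ illustration (not taken from the video) of coercion in a mixed-mode expression and in a narrowing assignment:

    #include <iostream>

    int main() {
        int    count = 3;
        double scale = 2.5;
        double d = count * scale;   // count is coerced to double: d is 7.5
        int    n = count * scale;   // result coerced back to int: n is 7, no error reported
        std::cout << d << ' ' << n << '\n';
    }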

💡Type Equivalence

Type equivalence defines when two types can be considered the same for the purpose of an operation, without needing coercion. The script discusses two approaches to type equivalence: name type equivalence, which requires the types to have the same name, and structure type equivalence, which considers types equivalent if their structures are the same, regardless of their names.

💡Dangling Pointer

A dangling pointer is a pointer that refers to memory that has been deallocated or is no longer in use. The video script warns about the dangers of dangling pointers, as they can lead to undefined behavior and program crashes when dereferenced, because the memory they point to may no longer be valid.
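
A hedged C++ sketch of how a dangling pointer arises; dereferencing p after the delete would be undefined behaviour, so the offending line is left commented out:

    int main() {
        int *p = new int(42);   // p refers to heap-dynamic storage
        int *q = p;             // a second pointer (alias) to the same storage
        delete q;               // the storage is returned to the heap
        // p still holds the old address: it is now a dangling pointer
        // int bad = *p;        // undefined behaviour if uncommented
        p = nullptr;            // defensive habit: null out pointers after deallocation
    }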

💡Memory Leak

A memory leak occurs when a program allocates memory but fails to release it back to the system after it is no longer needed. The script discusses how memory leaks can occur with both pointers and references if the allocated memory is lost and cannot be deallocated, eventually leading to a depletion of available memory resources.
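
A short C++ sketch of the 'lost heap-dynamic variable' pattern the script describes; the sizes are illustrative:

    int main() {
        int *p1 = new int[100];   // first heap allocation, reachable only through p1
        p1 = new int[100];        // p1 is reseated: the first block becomes unreachable
        // the first block can never be deleted, so that memory has leaked
        delete[] p1;              // only the second block is reclaimed
    }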

💡Dynamic Memory Management

Dynamic memory management is the process of allocating and deallocating memory during the runtime of a program. The video script explains how pointers are used for this purpose, allowing programs to have a flexible memory usage that can grow and shrink as needed. However, it also highlights the responsibility placed on the programmer to manage this memory correctly to avoid leaks and other issues.
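
A minimal C++ sketch of the allocate/use/deallocate cycle described here; the array size is illustrative:

    #include <iostream>

    int main() {
        int *p = new int[10];       // allocate space for 10 ints on the heap
        for (int i = 0; i < 10; ++i)
            p[i] = i * i;           // use the heap storage through the pointer
        std::cout << p[3] << '\n';
        delete[] p;                 // the programmer must release it; C uses malloc/free instead
    }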

Highlights

Lecture concludes the discussion on chapter six, covering advanced data types and concepts.

Introduction to union types, which can store values of different types at different times.

Explanation of how unions differ from records or tuples, specifically that only one value can be set at a time in a union.

The shared storage space characteristic of union fields, leading to memory space efficiency.

Historical context of unions' utility in early computing due to limited and expensive memory.

Modern computing context where unions are less useful due to abundant and cheaper memory.

Design issues in implementing unions, including type checking and embedding within records.

Discriminated unions supported by Ada, providing type safety through a type indicator or tag.

Free unions versus discriminated unions, with the latter being safer due to built-in type checking.

Pointers and references as fundamental building blocks for managing memory and indirect addressing.

Pointer operations, including assignment and dereferencing, and their implications for memory management.

The dangers of pointers, such as dangling pointers and memory leaks, and the need for careful usage.

C and C++ support for pointers, including flexibility and the associated risks.

Pointer arithmetic in C and C++, allowing for offsets computation from specific memory locations.

Introduction to references in C++, providing an alternative to pointers with automatic dereferencing.

Java's support for references but not pointers, impacting the language's safety and reliability.

C#'s approach to supporting both pointers and references, with pointers requiring 'unsafe' program context.

Type checking mechanisms, including static and dynamic type checking, and their role in detecting type errors.

Strong versus weak typing, with languages like Java and C# being more strongly typed than C and C++.

Type coercion and its impact on the strength of a programming language's type system.

Type equivalence, defining when operands of different types can be substitutable without coercion.

Name type equivalence and structure type equivalence, their implications for type matching and program reliability.

Data type theory, encompassing practical and abstract type theories, and their relevance to programming language design.

Transcripts

play00:01

this is the third and final lecture on

play00:04

chapter six of the textbook

play00:07

in the previous lecture we finished off

play00:09

our discussion on arrays and also looked

play00:12

at associative arrays and then we also

play00:15

covered records tuples and lists

play00:19

we will now be finishing the remaining

play00:21

work in chapter six

play00:25

this is an overview of the topics that

play00:28

we'll be discussing in today's lecture

play00:31

we'll start off with union types then

play00:34

we'll move on to pointer and reference

play00:37

types

play00:38

then we'll look at how type checking

play00:41

works

play00:42

and then related to type checking we

play00:44

will discuss strong versus weak typing

play00:48

we'll then also look at type equivalence

play00:51

which is also related to type checking

play00:55

and then we'll finish off with a

play00:57

discussion on theory and data types

play01:04

the first of the data types that we'll

play01:05

look at in this lecture is the union

play01:08

type

play01:09

so what is a union well a union is a

play01:13

type that can store values of different

play01:16

types at different times during

play01:18

execution

play01:20

now if we consider only this part of the

play01:21

definition then a union sounds as though

play01:24

it's the same thing as a record or

play01:27

possibly a tuple in the sense that it

play01:30

consists of multiple values and these

play01:34

values can have different types

play01:37

what sets a union apart however is that

play01:40

only one of these values may be set at a

play01:44

time

play01:45

so let's assume that we have a union and

play01:47

this union has three fields within it

play01:51

only one of these three fields may

play01:54

contain a value at a time

play01:57

what's also very important to understand

play01:59

is that the storage space for all of the

play02:02

fields within a union is shared

play02:06

so

play02:06

for example if we have a union with

play02:09

three fields

play02:11

then let's assume that the first field

play02:14

has a value

play02:16

now assume that the second field is set

play02:19

so what this will result in is that the

play02:21

memory space that was occupied by the

play02:24

first fields value will now be

play02:27

overwritten with the second fields value

play02:30

and this is why only one of the fields

play02:33

at a time may have a value

play02:36

so what are unions used for

play02:39

well what i would like you to do at this

play02:41

point is to pause the video and try to

play02:43

think of a scenario in which a union

play02:46

would be useful

play02:50

so in general unions are used in

play02:54

situations where memory space is at a

play02:58

premium and the programmer wishes to

play03:00

conserve memory space

play03:03

and the programmer may for example

play03:05

identify that there are a number of

play03:08

variables that exist within a program

play03:12

but only one of these variables at a

play03:14

time is ever in use

play03:16

in this case the programmer can then

play03:18

create a union that packages these

play03:21

variables together

play03:22

and this will then ensure that all of

play03:26

these variables can be represented

play03:28

however they share the same memory space

play03:32

this will then result in a lot less

play03:34

memory being used than if these separate

play03:38

variables were each separately allocated

play03:41

in memory so this is generally the

play03:43

scenario in which a union would be used

play03:46

as a result of this unions were very

play03:49

useful in the early days of computing

play03:52

when memories were very small and memory

play03:55

was also very expensive but in a modern

play03:58

context where computers typically have

play04:00

very large memories and memory is

play04:03

comparatively cheap unions have become

play04:06

less and less useful and as a result a

play04:09

lot of modern programming languages

play04:12

don't support unions anymore

play04:15

now the two design issues that arise

play04:18

when we decide to implement unions in a

play04:21

programming language

play04:23

firstly should type checking be required

play04:26

and we'll look in a moment at how type

play04:28

checking can be implemented for unions

play04:32

secondly should unions be embedded

play04:34

within records or should they be a

play04:37

separate type unto themselves

play04:43

now broadly speaking two different

play04:45

categories of unions exist discriminated

play04:48

unions and free unions we'll start off

play04:52

by looking at free unions which are

play04:54

supported in fortran c and c plus plus

play04:59

so free unions are unions that do not

play05:02

support type checking and it is

play05:05

therefore up to the programmer to keep

play05:07

track of which of the types is stored in

play05:10

other words which of the fields is the

play05:13

field that is currently storing the

play05:15

value

play05:17

right so here we have an example then

play05:19

here a union is defined and we can see

play05:22

that it has very similar structure to a

play05:25

record so our union is called flex type

play05:29

and we can see that there are two fields

play05:31

within our union

play05:32

firstly intel which is of type int and

play05:36

then float l which is of type float

play05:40

now it's important to understand that

play05:42

only one of these two fields intel or

play05:46

float_el can be occupied at a time and

play05:50

that they share memory space

play05:53

so here we have then a

play05:56

union um that has been defined we're

play05:59

defining a variable

play06:02

that is of type flex type again we need

play06:06

to specify that it's a union and we are

play06:09

calling this variable u1

play06:12

so now we can set intel by using this

play06:15

dot notation

play06:17

on u1 and we can set that value to 12.

play06:22

so at this point intel then contains the

play06:25

value 12 float l has no value associated

play06:30

with it

play06:31

then we can update float_el and we do

play06:35

that by assigning 22.8

play06:38

to float l

play06:40

in union u1

play06:43

so what this now means is at this point

play06:46

float l now has a value of 22.8 and

play06:51

intel is no longer defined

play06:55

so then we can also retrieve a value

play06:57

from a union and that's exactly what

play06:59

we're doing here we're accessing union

play07:02

u1 and we are accessing the field float

play07:06

l within that and then assigning this to

play07:08

a float variable x

play07:11

so what this will result in is that x

play07:13

will receive a value of 22.8

play07:18

now let's look at the last line of our

play07:21

example code over here

play07:23

here we are now accessing

play07:25

u1

play07:28

and the intel field within union u1

play07:33

now notice that we haven't set intel's

play07:36

value

play07:37

now because this is a free union there's

play07:40

no type checking that will raise an error

play07:43

at this point that will tell us

play07:45

no we've actually only set a value for

play07:48

float l there isn't a valid value for

play07:51

intel

play07:52

what will happen then in this case is

play07:55

that the runtime system

play07:57

will look at the memory space that is

play08:00

occupied by both of these fields

play08:04

and this memory space contains on a bit

play08:07

level the floating point value

play08:11

22.8

play08:13

what will then happen is that it will

play08:15

access the int_el value and it will simply

play08:19

assume that the binary data that is

play08:22

stored within the union is for this

play08:26

integer field

play08:28

so what will then happen is it will read

play08:30

the required number of bits from memory

play08:33

and it will then assign that as an

play08:36

integer value to the variable y

play08:39

however because this value isn't an

play08:42

integer value what will be retrieved

play08:44

will then just be garbage it will be the

play08:48

beginning of the floating point value

play08:50

interpreted as though it was an integer

play08:54

so it's important to understand that we

play08:56

won't get the value 22 back we'll get

play08:59

some other integer value that is

play09:02

equivalent to the first couple of bits

play09:04

within the floating point value so this

play09:09

thing means that no type checking can be

play09:11

performed we can't determine whether in

play09:13

this example intel or float l is

play09:17

occupied it's up to the programmer to

play09:20

know which one of the fields has a value

play09:23

stored for it

play09:25

what this means is that free unions are

play09:27

relatively unsafe to use because errors

play09:31

are not raised when we access the

play09:33

incorrect fields instead we'll just get

play09:36

invalid data back
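
For reference, here is a sketch of the free-union example just described, written out in C++ (field names follow the audio and may differ slightly from the actual slide); reading the inactive member is exactly the unsafe access the lecturer warns about:

    union FlexType {        // free union: both fields share one storage area
        int   int_el;
        float float_el;
    };

    int main() {
        FlexType u1;
        float x;
        int   y;

        u1.int_el   = 12;    // int_el holds 12
        u1.float_el = 22.8f; // overwrites the same bytes; int_el is no longer meaningful
        x = u1.float_el;     // fine: x becomes 22.8
        y = u1.int_el;       // no type check: y just gets the float's bit pattern read as an int
    }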

play09:40

next we have discriminated unions so

play09:43

these are unions that support type

play09:45

checking and as a result are much safer

play09:48

to use than free unions

play09:51

so discriminated unions are supported by

play09:54

ada

play09:55

now and please note that discriminated

play09:58

unions are only very briefly mentioned

play10:00

in the textbook so please refer to this

play10:04

part of the discussion

play10:07

and the slides that i will post

play10:10

on the course website

play10:12

for further detail on how discriminated

play10:15

unions work

play10:17

so discriminated unions then have an

play10:21

additional type indicator which is

play10:23

called a discriminant or a tag

play10:26

the discriminant is used to check the

play10:29

type of the union

play10:30

and to detect errors if invalid fields

play10:34

are accessed

play10:37

so in order to make the concept of a

play10:40

discriminated union concrete let's look

play10:42

at an example

play10:44

here we will define a union called

play10:47

figure

play10:48

and figure can store data related to

play10:52

either a circle a triangle or a

play10:55

rectangle however it's important to note

play10:58

that it can only store data related to

play11:00

one of these things at a time

play11:03

so in other words the memory storage

play11:06

space within the union is shared

play11:09

for the data related to a circle the

play11:12

data related to a triangle and the data

play11:14

related to a rectangle

play11:17

so we will then create a discriminant in

play11:20

our union and we'll call this

play11:22

discriminant form

play11:24

and the value of form can be either

play11:28

circle triangle or rectangle

play11:31

so depending then on the value of the

play11:35

form discriminant

play11:38

only the appropriate data then for

play11:40

either a circle a triangle or a

play11:42

rectangle can be stored

play11:45

right so let's look at the code

play11:48

here to begin with we have an

play11:50

enumeration called shape

play11:53

and shape can take on one of three

play11:55

values

play11:56

circle triangle or rectangle

play12:00

then we have another enumeration called

play12:03

colors and colors has three values that

play12:06

are legal for it red green and blue

play12:10

so now we can get on to actually

play12:12

defining our

play12:14

union

play12:15

so here we are defining figure and then

play12:18

we specify that we have a discriminant

play12:22

and this discriminant then has a type of

play12:26

shape so in other words it can be either

play12:29

circle triangle or rectangle and the

play12:32

discriminant is referred to by the name

play12:35

form

play12:37

so we can see also that this is

play12:39

specified as a record

play12:41

and so it uses the same form as a

play12:44

standard record the only difference is

play12:48

that we are

play12:50

storing a discriminant

play12:53

right so then we have fields that we can

play12:56

specify for our figures so we have two

play12:59

regular fields and for these fields we

play13:04

don't have any shared memory space it's

play13:07

already presented in the standard way

play13:11

um with separate memory spaces allocated

play13:14

for each one exactly the way that fields

play13:17

would be allocated within a normal

play13:20

record

play13:22

so we have then filled which has the type

play13:25

of boolean and we have color which has

play13:28

the type of colors so color can either

play13:31

be red green or blue

play13:34

now we have a case which begins up here

play13:38

and ends at the bottom where we see the

play13:40

end case

play13:42

and we then base this on form so this is

play13:46

like a switch statement effectively and

play13:49

depending on what value form has where

play13:52

the form is a circle triangle or

play13:56

rectangle

play13:57

we will then

play13:59

execute one of the when clauses here so

play14:02

we have a when clause for each of the

play14:04

values that form can take on

play14:06

for circle

play14:08

for triangle and for rectangle

play14:12

so if form is a circle then we have

play14:15

defined for our figure

play14:18

a diameter and the diameter is a

play14:21

floating point value

play14:23

if the form is triangle then we have a

play14:27

left side and a right side both of which

play14:31

are integer values and we also have an

play14:34

angle which is specified to be a

play14:37

floating point value and finally if form

play14:40

is rectangle then we have two sides side

play14:44

one and side two and these are both of

play14:47

type

play14:48

integer

play14:49

our case then ends and then we end our

play14:53

record off

play14:57

so to complete our example let's now

play14:59

look at what our discriminated union

play15:02

will look like in memory

play15:04

so this is then the memory that is

play15:08

occupied by a full figure

play15:12

and we can see that our first two fields

play15:15

are represented over here so we have

play15:18

filled first and then we have color notice

play15:22

that these memory spaces are entirely

play15:25

separate from each other they don't

play15:27

overlap because they are just regular

play15:30

fields within our union

play15:33

then we have our discriminant over here

play15:35

and this discriminant is named form as

play15:38

we saw on the previous slide

play15:41

and this can then have a value of either

play15:44

rectangle circle or triangle

play15:47

now we can see that the memory space

play15:50

that is reserved for the data related to

play15:55

a circle rectangle or triangle is shared

play16:00

so if the discriminant has a value of

play16:03

circle then we store only the diameter

play16:07

if the discriminant has a value of

play16:10

rectangle then we store our two sides so

play16:14

that will then be this memory space

play16:17

notice that the first part of the memory

play16:19

space overlaps the memory space that

play16:23

would be used to store the diameter of a

play16:26

circle

play16:27

and then finally if the discriminant is

play16:30

set to triangle then we store the left

play16:33

side right side and the angle so that is

play16:37

then stored in this memory space over

play16:40

here and notice that this overlaps with

play16:42

the memory space that would be used for

play16:44

the two sides of our rectangle or

play16:48

alternatively also then the diameter of

play16:52

our circle so the memory space then is

play16:55

shared for the data related specifically

play16:58

to the shapes and this then makes the

play17:02

storage space that is taken up by our

play17:06

union more efficient

play17:10

so here we have an example of how an

play17:14

instance of figure could be declared

play17:17

so here we are defining a variable

play17:19

called if underscore one its type is

play17:23

figure so we know that this is in a

play17:26

union and then we can assign to f1 and

play17:29

then we specify what each of the fields

play17:34

values should be

play17:35

so here we specify that fold must be

play17:38

true that will set then this value in

play17:42

memory over here to true

play17:45

color has been set to blue

play17:48

so that means that this memory space

play17:50

over here will contain the value blue

play17:54

then we set our discriminant so we're

play17:57

setting form to circle so that means

play18:01

that circle will then occupy this space

play18:04

in memory

play18:06

and then finally we need to set the data

play18:08

related to a circle so we set then the

play18:11

diameter to a value of

play18:14

1.3

play18:15

that means that this memory space for

play18:17

the diameter over here will be occupied

play18:20

by the value 1.3 the remaining memory

play18:23

space is unused

play18:27

so um also then note that because this

play18:31

is a discriminated union this is type

play18:35

safe so if we were to attempt

play18:38

um in this declaration that we looked at

play18:41

before here

play18:42

to set for instance side 1 or side 2 or

play18:47

in fact left side or right side or angle

play18:52

then there would be a compilation error

play18:55

that would be raised and the reason for

play18:57

that is that those fields are not

play19:00

defined

play19:01

for

play19:02

the discriminant being circle only a

play19:05

diameter is defined so that is all that

play19:08

we can set so this then means that

play19:11

discriminated unions as they are

play19:13

supported in ada are much safer than the

play19:16

free unions that we discussed before

play19:18

because we cannot accidentally access

play19:21

fields that we shouldn't

play19:23

based on the value of the discriminant

play19:28

so let's finish off our discussion on

play19:30

unions with an overall evaluation of the

play19:33

data type

play19:35

now as we've seen free unions are unsafe

play19:37

and this is because they don't allow any

play19:39

kind of type checking

play19:41

as a result if we want a programming

play19:44

language to be in any way reliable then

play19:48

we should avoid

play19:50

providing support for free unions

play19:53

now some modern programming languages

play19:55

such as java and c-sharp don't support

play19:58

unions at all and this is because they

play20:01

consider unions to be unsafe and also

play20:05

there isn't much of a use for unions in

play20:08

a modern context anymore as i previously

play20:11

pointed out

play20:12

so this lack of support for unions

play20:15

reflects growing concern for safety

play20:18

within programming languages and as a

play20:21

consequent

play20:23

result increased reliability

play20:26

now as we've seen ada supports

play20:28

discriminated unions and these are type

play20:31

safe

play20:32

what i would like you to do at this

play20:33

point is to pause the video

play20:37

and see if you can think of a

play20:39

disadvantage associated with ada's

play20:42

discriminated unions

play20:47

the next two data types that we'll look

play20:49

at in this lecture are related to one

play20:52

another and they are pointers and

play20:55

references

play20:56

so we'll begin with pointers

play20:58

a pointer variable is a variable that

play21:01

stores values which are memory addresses

play21:04

and these memory addresses reference a

play21:07

particular memory cell

play21:10

there's also generally a special value

play21:13

referred to as a null or a null value

play21:16

and this indicates that the pointer

play21:19

doesn't refer to a memory cell it is

play21:22

essentially an uninitialized pointer

play21:26

now there are two possible uses for

play21:28

pointers firstly they provide for

play21:31

indirect addressing

play21:33

so this means that we are providing a

play21:36

mechanism for referring to a variable

play21:40

using a different name so the variable

play21:43

itself has a name but then we have a

play21:46

pointer which also has a name and this

play21:48

then refers to the variable hence

play21:51

creating a second name that can be used

play21:55

to get to the variable's value

play21:58

the second use of pointers is that they

play22:01

provide a mechanism to manage dynamic

play22:04

memory so pointer can be used to access

play22:07

a location in memory

play22:10

where storage is dynamically created

play22:13

during runtime and this area in memory

play22:16

is referred to as the heap

play22:21

there are a number of design issues

play22:23

related to pointers which must be

play22:25

decided on if a programming language is

play22:28

going to provide support for pointers

play22:31

so firstly we need to decide on the

play22:34

scope and lifetime that a pointer

play22:37

variable will have

play22:39

now notice that this is the scope and

play22:41

lifetime of the pointer itself and not

play22:45

the scope and lifetime of the value that

play22:48

is being pointed to by the pointer

play22:52

then secondly

play22:54

if pointers are to be used for dynamic

play22:57

memory management then what is the

play23:00

lifetime of a heap dynamic variable

play23:05

then in the third place

play23:07

is there some kind of restriction that

play23:10

is applied to the type of the value to

play23:14

which a pointer can refer

play23:18

then in the fourth place are pointers

play23:21

used only for dynamic storage management

play23:24

or only for indirect addressing or

play23:27

pointers used for both purposes

play23:30

and then finally should the programming

play23:33

language support only pointer types or

play23:36

only reference types or should both

play23:39

pointers and references be supported

play23:44

next let's look at operations that are

play23:47

defined for pointers so there are two

play23:49

fundamental operations that must be

play23:52

supported if pointers are represented in

play23:55

a programming language the first is

play23:57

assignments and the second is

play24:00

dereferencing

play24:02

now assignment allows a programmer to

play24:04

set a pointer variables value to a

play24:07

memory address and by doing that set the

play24:10

pointer up to refer to another memory

play24:13

location

play24:15

now if the variable isn't on the heap

play24:17

then there needs to be some sort of

play24:19

mechanism to get the address of the

play24:22

variable

play24:23

the reason that heap variables are

play24:26

excluded is that any value that is

play24:29

allocated on the heap is typically

play24:31

referred to by a memory address in any

play24:34

case

play24:36

so here we have an example of what this

play24:39

kind of assignment operation would look

play24:41

like in c plus plus

play24:44

what we are doing here is we are

play24:45

defining a variable called p and the

play24:48

type of p is an integer pointer or we

play24:52

can alternatively understand it as the

play24:55

memory address of an integer value

play24:58

so now we assume that we have an integer

play25:01

variable called val and then we use this

play25:04

ampersand operator which is the address

play25:07

of operator and this will then retrieve

play25:10

the memory address for the variable val

play25:13

that memory address is then assigned and

play25:17

stored in the variable p

play25:19

which as we've seen is a pointer

play25:22

variable

play25:24

next let's move on to dereferencing so

play25:27

dereferencing yields the values stored

play25:30

at a location that is represented by a

play25:33

pointers value now dereferencing can

play25:36

either be explicit or implicit

play25:39

so in c plus explicit dereferencing is

play25:42

used and this uses the dereferencing

play25:46

operator which is an asterisk so over

play25:49

here we have an example of this in

play25:52

practice we will assume

play25:54

that

play25:55

j is a variable of an appropriate type

play25:59

for

play26:00

the

play26:01

pointer that we have and we assume then

play26:04

that we have a pointer

play26:06

named ptr so let's assume that the

play26:09

pointer's type is an integer pointer or

play26:12

an integer memory address in that case j

play26:16

would then have to be an integer and we

play26:18

want to retrieve the value that the

play26:21

pointer refers to so in this case we

play26:24

then apply

play26:25

this asterisk operator the dereferencing

play26:28

operator and what this does is it then

play26:31

retrieves the value that the memory

play26:34

address stored in pointer refers to

play26:39

and that value will then be assigned to

play26:41

the variable j

play26:44

so that is then an example of an

play26:47

explicit dereferencing it's explicit

play26:50

because we are actually using a

play26:52

dereferencing operator

play26:54

implicit pointers are automatically

play26:57

dereferenced whenever they are used

play26:59

so for example in this assignment if ptr

play27:04

was

play27:05

not an

play27:06

explicit pointer we would then not have

play27:09

to provide some sort of operator in

play27:12

order to

play27:14

dereference that pointer

play27:16

the runtime system or the compiler would

play27:19

automatically determine that you are

play27:22

attempting to assign to an integer value

play27:25

and therefore the pointer needs to be

play27:27

dereferenced first in order to retrieve

play27:30

the value it is referring to

play27:35

so let's look at our previous example in

play27:38

terms of a memory diagram to more

play27:41

clearly illustrate this concept

play27:44

so we will assume that our pointer which

play27:47

is named ptr

play27:49

stores memory address which is 7080

play27:54

we then assume that we have an integer

play27:56

value of 206 that is stored at memory

play28:00

address

play28:02

7080

play28:03

so over here we can then see that

play28:06

arrangement in memory we have our

play28:09

integer value which is 206 and that is

play28:12

stored at memory location 7080

play28:16

our pointer is over here and this stores

play28:19

then the memory address value 7080

play28:23

which then means we have a reference

play28:27

that points from our pointer to memory

play28:30

location 7080

play28:32

and the value stored at that location is

play28:35

206.

play28:37

so now we are performing this assignment

play28:39

over here we're assuming that j

play28:42

is an integer variable and we are now

play28:45

dereferencing our pointer

play28:48

so what this then means is we look at

play28:51

the memory address that is stored

play28:53

in our pointer variable ptr and we

play28:57

de-reference that meaning that we follow

play28:59

our pointer to memory location 7080

play29:04

and then we de-reference that by

play29:06

retrieving the value

play29:08

206. this value is then assigned to the

play29:12

variable j so here we have our variable

play29:15

j and the value 206 is then stored in

play29:19

this location in memory

play29:21

so essentially what we have done is we

play29:24

have performed a copy from the original

play29:28

memory location into a new memory

play29:31

location

play29:34

now in general pointers are considered

play29:36

to be relatively dangerous

play29:39

and unreliable and the use therefore

play29:42

should be limited as far as possible

play29:45

now why is this well one reason that

play29:49

the textbook doesn't go into too much detail

play29:50

on is that pointers provide support for

play29:54

aliases so in other words we can use

play29:57

pointers to create a mechanism whereby

play30:00

multiple names can refer to the same

play30:02

value in memory what this means then is

play30:05

that we have multiple avenues through

play30:07

which this value in memory can be

play30:10

modified and therefore we have potential

play30:13

routes open through which the value can

play30:15

be corrupted

play30:16

unexpectedly if we use only one name to

play30:20

refer to a value in memory then if we

play30:23

have control over that name there's no

play30:25

way that the value that stored in memory

play30:29

can be accidentally corrupted or

play30:31

modified in some way

play30:34

now

play30:35

much more dangerous than this drawback

play30:38

is the problem of dangling pointers and

play30:41

also the problem of lost heap dynamic

play30:44

variables

play30:46

so dangling pointers

play30:48

are

play30:49

pointers

play30:50

that refer to de-allocated heap dynamic

play30:54

memory variables

play30:57

so in other words we would have a

play30:59

situation where we have a pointer

play31:02

referring to some space in memory that

play31:05

has been allocated dynamically at

play31:07

runtime and this will have been

play31:09

allocated from the heap

play31:12

we then potentially through another

play31:14

pointer de-allocate this memory space

play31:16

but our first pointer is still pointing

play31:19

to the location where the allocated

play31:22

memory was

play31:23

so why is this dangerous

play31:27

well i'll give you a moment to pause the

play31:29

video and to answer this question for

play31:31

yourself

play31:35

so essentially this is dangerous because

play31:39

we might then use the pointer to refer

play31:42

to what we think is memory that is still

play31:45

allocated

play31:46

but has in fact been deallocated

play31:49

this referencing then potentially could

play31:52

cause an error depending on the

play31:54

programming language or it may result in

play31:57

us referring to memory that we are not

play31:59

supposed to refer to

play32:01

which may not then cause a runtime error

play32:03

but will cause us to be working with

play32:06

data that we're not supposed to be

play32:07

working with

play32:10

now the next major drawback is lost heap

play32:13

dynamic variables

play32:15

and here we have an allocated heap

play32:18

dynamic variable that is no longer

play32:20

accessible to the user program and this

play32:23

is often referred to as garbage

play32:27

so let's look at an example of how this

play32:31

could occur in practice

play32:34

here we have a pointer this point is

play32:37

referred to as p1 and this pointer

play32:41

refers to some allocated space that has

play32:44

been dynamically allocated from the heap

play32:48

so now what we do is later on in the

play32:50

program we allocate more space on the

play32:54

heap which is represented at the bottom

play32:56

over here

play32:58

we then set p1 to refer to this

play33:02

allocated space in memory

play33:05

and at this point then of course

play33:08

the linkage between p1 and the

play33:12

originally allocated heap space has been

play33:15

broken

play33:16

now if we have no other pointer that

play33:18

refers to this initially allocated heap

play33:21

space then what we have is a lost heap

play33:24

dynamic variable there's no way to refer

play33:27

to that heap dynamic variable to that

play33:30

space that has been allocated on the

play33:33

heap

play33:34

what this means then is that space is

play33:35

effectively lost because it can never be

play33:38

de-allocated and reclaimed and over time

play33:42

if we keep doing this we may fill up the

play33:44

whole of our allocated heap memory space

play33:47

that is available for our program

play33:51

so this process then is sometimes

play33:54

referred to as memory leakage and it can

play33:57

result in very

play34:00

difficult to debug errors

play34:03

because in general this memory space is

play34:05

just slowly disappearing and at some

play34:07

points the program will then terminate

play34:09

with some sort of fatal error because it

play34:11

has run out of memory

play34:15

so let's now look at pointers from a

play34:17

practical perspective in terms of how

play34:20

they are implemented in the c

play34:22

and c plus plus programming languages

play34:25

so in both of these languages pointers

play34:28

are incredibly flexible but they're also

play34:30

quite dangerous to work with for the

play34:33

reasons that i've previously discussed

play34:35

and therefore they should be used with

play34:38

care

play34:39

so pointers can point to any variable

play34:42

regardless of when or where the variable

play34:44

was allocated you can use a pointer to

play34:47

point to a local variable inside a

play34:49

function

play34:50

or some sort of global variable you can

play34:53

use it to point to stack allocated

play34:55

variables or heap allocated variables

play34:58

there's no limitation at all that c and

play35:01

c++

play35:02

impose on the use of pointers in this

play35:04

regard

play35:05

now pointers are also used for both

play35:08

dynamic storage management and indirect

play35:10

addressing

play35:12

so this again points to the fact that

play35:14

there are relatively dangerous to work

play35:17

with particularly because

play35:19

pointers are used for dynamic storage

play35:21

management

play35:22

and we saw the drawbacks to that on the

play35:25

previous slide when we were talking

play35:28

about

play35:30

lost heap dynamic variables and dangling

play35:34

pointers

play35:35

now pointer arithmetic is also possible

play35:38

we'll look at what pointer arithmetic is

play35:40

in a moment

play35:42

but essentially this allows one to add

play35:44

values to pointers and you can use this

play35:47

to compute offsets um from a specific

play35:50

memory location

play35:52

and c and c++ also both use explicit

play35:55

dereferencing and address of operators

play35:58

so you explicitly need to indicate when

play36:01

you are declaring a pointer and if

play36:03

you're getting the address of a variable

play36:05

you need to use

play36:07

an operator

play36:08

in order to do that there is no implicit

play36:12

dereferencing

play36:14

so also what's quite interesting in c and

play36:17

c++ is that the domain type for a

play36:20

pointer doesn't need to be fixed so what

play36:22

this means is we don't necessarily need

play36:25

to declare that a pointer points to an

play36:28

integer address or a floating point

play36:31

address or a boolean address or whatever

play36:33

the case may be we can instead use a

play36:36

void pointer and this thing indicates

play36:39

that we are declaring a pointer that can

play36:41

point to a value of any particular type

play36:44

that we want so a void pointer can then

play36:47

point to an integer or a float or a

play36:49

boolean or any other type for that

play36:52

matter

play36:53

however there are some limitations on

play36:55

the use of void pointers so you can't

play36:57

de-reference a void pointer as is

play37:00

you first need to perform a cast which

play37:04

converts the void pointer to a pointer

play37:06

for a specific type so for example you

play37:09

might cast a void pointer to an integer

play37:12

pointer for example

play37:15

so what i would like you to do at this

play37:16

point is pause the video and try to

play37:19

explain why it makes sense that pointers

play37:22

need to be cast

play37:24

to a specific type before they can be

play37:27

dereferenced if they are void pointers

play37:30

to begin with

play37:33

now what i'd like you to also do at this

play37:35

point is again pause the video

play37:37

and try to explain how support for void

play37:41

pointers affects the type checking in c

play37:44

and c plus plus

play37:45

so in other words is type checking

play37:48

easier or is it more difficult because

play37:50

of the inclusion of void pointers

play37:57

so let's look at how pointers in c and c

play38:01

plus plus can be used for dynamic storage

play38:04

management let's first of all look at c

play38:07

plus plus which is the more modern of the two

play38:09

programming languages

play38:11

and in order to dynamically allocate

play38:13

storage from the heap we use the new

play38:17

special word

play38:18

so this is an example of some code that

play38:21

would allocate memory space from the

play38:24

heap at runtime

play38:25

and then assign the memory address of

play38:29

the first

play38:30

value that has been allocated to a

play38:33

pointer p

play38:34

so we must use pointers in order to work

play38:37

with memory that has been allocated on

play38:39

the heap

play38:40

so here we are declaring a variable p p

play38:44

is a pointer it's a pointer to an

play38:47

integer value

play38:49

then on the right hand side of our

play38:51

assignment we perform the actual dynamic

play38:53

storage allocation using the new special

play38:56

word and then we indicate the type of

play39:00

value that we want to allocate on the

play39:02

heap which in this case

play39:04

is an integer type

play39:06

now if we had just left at that we would

play39:09

have allocated space for a single

play39:11

integer but in this instance we don't

play39:13

want to do that we want to allocate

play39:14

space for 10 integers consecutively in

play39:18

memory so we then specify in square

play39:20

brackets the number of integer values

play39:23

that we want to allocate from the heap

play39:25

so this will then allocate space for 10

play39:27

integer values on the heap the memory

play39:30

address of the first integer value in

play39:32

that sequence of 10 will then be

play39:34

assigned to the pointer p

play39:37

so p then points to newly allocated heap

play39:41

dynamic memory

play39:44

now i've mentioned previously that with

play39:48

dynamic storage management it is the

play39:50

programmer's responsibility

play39:52

to the allocate

play39:54

memory that has been allocated from the

play39:56

heap or at least this is the case in cnc

play39:59

plus plus some programming languages do

play40:02

use garbage collectors so java for

play40:05

instance is an example of this but in

play40:07

the case of cnc plus plus because

play40:09

they're relatively low level you

play40:11

actually have to then handle the

play40:12

allocation manually yourself so this is

play40:15

done fairly simply using the delete

play40:18

special word so here we are deleting our

play40:21

pointer p and this then instructs our

play40:24

runtime system to find the memory that p

play40:27

points to and de-allocate that thus

play40:30

returning that memory space for the ten

play40:33

integers that we've allocated back to

play40:35

the heap and then they can be

play40:36

reallocated at some later stage
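A minimal sketch of what the slide's allocation and deallocation likely look like (note that an array allocated with new[] is idiomatically released with delete[] rather than plain delete):

```cpp
int main() {
    int *p = new int[10];   // allocate space for 10 consecutive ints on the heap;
                            // p holds the address of the first one
    p[0] = 42;              // use the block via the pointer: p[0] .. p[9]

    delete[] p;             // hand the ten ints back to the heap
                            // (delete[] pairs with new[]; plain delete pairs with new)
    p = nullptr;            // optional: avoid leaving a dangling pointer behind
    return 0;
}
```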

C also has the concept of allocating and deallocating heap memory; however, C doesn't use the new and delete special words. Instead, C uses functions provided by a library: the malloc function to allocate heap memory and the free function to deallocate it. Again, both malloc and free work with pointer values: malloc returns a pointer to the dynamic memory that has been allocated, and free receives that pointer in order to deallocate it.
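A sketch of the equivalent C-style allocation, written here in a form that also compiles as C++ (which is why the cast from void* appears):

```cpp
#include <cstdlib>   // malloc, free

int main() {
    // malloc returns a void*, so in C++ it must be cast to the target pointer type
    int *p = static_cast<int*>(std::malloc(10 * sizeof(int)));
    if (p == nullptr) return 1;   // allocation can fail

    p[0] = 42;                    // use the block: p[0] .. p[9]

    std::free(p);                 // deallocate; the programmer must do this manually
    return 0;
}
```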

As I've previously mentioned, pointers in C and C++ support pointer arithmetic, so we'll now illustrate how pointer arithmetic works with a few examples.

Assume we start off with the program code on the slide. First we declare a variable named stuff, an array that contains floating-point values, and we've indicated that we want to store 100 separate floating-point values in the stuff array. Next we define a pointer named p whose type is a float pointer; in other words, p points to a floating-point value, or alternatively we could say that p holds the memory address of a floating-point value. Notice that p hasn't been initialized to any value yet, so it is essentially an uninitialized pointer.

Then we perform an assignment, setting p's value, which means we need to assign a memory address to p. On the right-hand side of the assignment we have the name of the array, stuff. When we simply use the array's name on its own, without any index or subscript, the name of the array is an alias for the memory address of the first element contained in that array. In other words, stuff on the right-hand side is the memory address of the first floating-point value in the stuff array. We assign that memory address to p, so p now holds the memory address of the first element of stuff. Essentially we've created an alias: p is a pointer and can be used as a substitute for the name of the array, and both p and stuff now refer to the memory address of the first element in the stuff array.

Now we can move on to some actual pointer arithmetic, which is exactly what we are doing here: we are adding 5 to our pointer p. What this means is that we are adding 5 floating-point memory offsets to p. A memory offset is the size in memory of a particular type, so we are adding five spaces the size of a floating-point value to p. This expression therefore accesses the sixth element in the stuff array, because, as we've seen, p is an alias for stuff, and the first element in the stuff array is at offset zero; if we add five offsets we get the memory address of the sixth element in the stuff array. We place that in parentheses and then use the asterisk operator to dereference the memory address we have computed, which gives us the sixth value stored in the stuff array.

There are alternative notations that we can use. For example, we can access our pointer p at subscript 5, which performs exactly the same pointer arithmetic as before; this is just a shorthand notation. Alternatively we can use the stuff array and access it at index or subscript 5. All three of those approaches are equivalent to one another.

We can also use a variable: here we have exactly the same pointer arithmetic, except that instead of the literal value 5 we now use a variable's value i, which allows us to specify exactly which element of the stuff array we want to access by means of i's value. Again, this is equivalent to accessing the stuff array at index or subscript i, or using the pointer in the same way.
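A sketch of the pointer-arithmetic example as described above (the exact slide code isn't shown in the transcript, but it almost certainly resembles this):

```cpp
int main() {
    float stuff[100] = {0};  // 100 consecutive floats (zero-initialized here for safety)
    float *p;                // an as-yet uninitialized pointer to float

    p = stuff;               // the bare array name is the address of stuff[0]

    float a = *(p + 5);      // add 5 float-sized offsets, then dereference: the 6th element
    float b = p[5];          // equivalent subscript notation on the pointer
    float c = stuff[5];      // equivalent subscript notation on the array itself

    int i = 7;
    float d = *(p + i);      // the same arithmetic with a variable offset, i.e. stuff[i]
    return 0;
}
```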

I previously mentioned that C++ supports a reference type in addition to a pointer type, so we will introduce and discuss references using C++ as an example.

References are very similar to pointers. The difference is that while a pointer stores the memory address of the value it refers to, a reference is simply an alias for another value. Both are used for indirect addressing; however, a pointer is a memory address, whereas a reference is a value.

Consider the example program code on the slide. We declare a variable named v of integer type and set its initial value to 10. Then we have a declaration for a variable ref, and the ampersand character indicates that ref is a reference variable, specifically a reference to an integer value. We then bind v to this reference, which means that after this initialization ref is an alias — another name — for v. In other words, it directly references the value 10 that is stored in memory. Notice that we are not working with a memory address directly here.
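A minimal sketch of the reference declaration being described (variable names follow the lecture's description):

```cpp
int main() {
    int v = 10;
    int &ref = v;   // ref is an alias (another name) for v, not a stored address

    ref = 20;       // assigns through the alias: v itself is now 20
    return 0;
}
```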

What are references used for in practice? They are used primarily for formal parameters, and they give the benefits of both pass-by-reference and pass-by-value. What does this mean? Let's look at it in terms of the code example on the slide.

Here we declare a function called f. Its return type is void, meaning it doesn't return a value. It receives a single parameter named p, and we can see from the ampersand that p is a reference, specifically a reference to an integer value. Then follows the body of the function, which has been omitted for the purpose of this example.

Effectively, this means that p is another name, an alias, for whatever parameter value has been passed through in a call to f, and therefore any modification, any change performed on p in the body of the function, will also modify the actual parameter's value.

Now assume we have the code shown here, which appears somewhere else in our program, possibly in the main function. We declare a variable called val; its type is int and its initial value is set to 12. We then call f and pass val as a parameter. What this means is that p is an alias, another name, for val. So if we perform some modification to p in the body of the function f — say we increment p's value — then we are actually incrementing val's value, which will be incremented from 12 to 13.

Effectively, what we've done here is create a reference p to a parameter, and it works like an ordinary integer variable. That gives us the convenience of pass-by-value: we don't need to dereference p at all in the body of the function; we simply use it directly as though it were a value. However, it is a reference, even though we aren't directly working with a memory address and dereferencing it, and that gives us the advantage of pass-by-reference: we're not actually copying val into the parameter, and in the body of the function we are just working with a reference. This means that memory is conserved, because we're not creating unnecessary copies of values.
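A short sketch of the reference-parameter example as described (the function body is my own illustration, since the slide omits it):

```cpp
#include <iostream>

void f(int &p) {     // p is a reference: an alias for the caller's argument
    p++;             // modifies the caller's variable directly, no dereference needed
}

int main() {
    int val = 12;
    f(val);                       // no copy of val is made; p aliases val
    std::cout << val << '\n';     // prints 13
    return 0;
}
```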

Another big advantage associated with reference types is that you can't perform pointer arithmetic on them, and therefore they are much safer than pointers. For example, in the pointer-arithmetic example we worked through on the previous slide, where we used pointer arithmetic to access a particular value in an array of floating-point values, we could accidentally add an offset that is too large, which would cause us to access a value beyond the bounds of the array. This kind of thing isn't possible with reference types, precisely because we can't perform pointer arithmetic on references.

It is important to note, however, that we can still have dangling references, and we can also have memory leaks caused through the use of references, so those two disadvantages associated with pointers are not eliminated by the use of references.

Java also supports references, which are an extended form of C++ references; in fact only references are supported, not pointers. Java's references can refer only to objects, not to primitives such as integers, floats, or boolean values. So what I would like you to do at this point is pause the video and think about how these two factors relating to Java's references affect the programming language evaluation criteria we've been using throughout this course.

C# includes support for both pointers and references, so again we see C# trying to be as general as possible and supporting as many features from both Java and C++ as possible. The references that C# supports are very similar to those we see in Java, and the pointers are quite similar to C++ pointers, but one can only use them in code that the programmer has explicitly marked as unsafe. So what I would like you to do again at this point is pause the video and try to answer how these details of C#'s pointers and references affect the programming language evaluation criteria we've been using; specifically, focus on the fact that subprograms must be marked as unsafe if pointers are used within them.

Let's look at a general evaluation of pointers from an overall perspective. There are a couple of problems associated with pointers and with references; however, these problems affect pointers more detrimentally than references. We've discussed the problems related to pointers, and most importantly we saw the problems of dangling pointers and lost heap-dynamic variables — major drawbacks to the use of both pointers and references.

If pointers are used for dynamic memory management — in other words, they are used to refer to heap memory that has been dynamically allocated by the programmer — then manual heap management is necessary: one needs to allocate and deallocate the memory that one is working with on the heap. So what I would like you to do at this point is pause the video and try to answer how this manual heap management affects the programming language evaluation criteria we've been using throughout this course.

The textbook makes quite an interesting observation: it states that pointers are like the goto statement in a lot of ways. Why does the textbook say this? Using a goto statement increases the number of routes that one can use to reach a specific line or statement within a program, because we can jump to a goto label from effectively any line within the program. In a similar fashion, pointers widen the range of memory cells that are accessible through a particular variable.

Unfortunately, dynamic data structures — things like linked lists, trees, graphs and so on — require dynamic memory management, because these are data structures that can grow and shrink over time as nodes or elements are added and removed. So support for dynamic data structures does require pointers or references, and we can't completely avoid them. In general, the majority of modern high-level programming languages must support at least references in order to allow rich data structures to be supported.
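As a small illustration of why dynamic data structures need pointers or references (a sketch of my own, not from the slides), a singly linked list node in C++ is typically built out of heap allocation plus a pointer to the next node:

```cpp
struct Node {
    int   value;
    Node *next;     // each node points to the next; the list can grow and shrink
};

int main() {
    Node *head = new Node{1, nullptr};     // allocate nodes on the heap at runtime
    head->next = new Node{2, nullptr};     // grow the structure by linking new nodes

    delete head->next;                     // shrink it again: manual deallocation
    delete head;
    return 0;
}
```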

We'll now move on to the concept of type checking. Type checking involves operators and operands. We of course consider operators and operands in their normal sense: for example, if we are adding two values together, then we have an addition operator, and the values that appear on the left- and right-hand sides of the addition operator are the operands. However, type checking also applies to both subprograms and assignments, so we generalize the idea of operators and operands to work in these contexts as well. If we have a subprogram call such as the one shown here, where we are calling a subprogram f and passing through a parameter of 1.2, then we consider the operator to be f, because this is the thing doing the actual computation, and the operand — the data that the operator works with — is the parameter value 1.2. In a similar fashion, for an assignment such as the example shown here, the operator is the assignment operator (the symbol used for the assignment), and the operands are the values that appear on the left- and right-hand sides: on the left-hand side a variable a, and on the right-hand side a literal integer representing the value 43.

If we now consider operators and operands as I've just defined them — and it is only for this section that we use this wider definition — then type checking is the activity that ensures that the operands of an operator are of compatible types; if they are, the operation can proceed. A compatible type in this context is a type that is either legal for the operator, or is allowed under the language rules to be implicitly converted — in other words, coerced — by the compiler to a legal type for the operation. For example, compatible types exist if we are adding two integer values together: that is a legal use of types in relation to the addition operator. Alternatively, we can add an integer and a floating-point value, in which case the integer is automatically converted, by means of coercion, to a floating-point value so that the addition can proceed. In both of these cases we have compatible types for our operands.

A type error arises if we apply an operator to an operand of an inappropriate type. This will generally cause some kind of error — it might be a compilation error or it might be a runtime error, depending on the specific programming language.
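A brief sketch of these three situations in C++ (my own illustration of the idea, not slide code):

```cpp
struct Point { int x, y; };

int main() {
    int    i = 2, j = 3;
    double d = 1.5;
    Point  pt{0, 0};

    int    a = i + j;    // legal as-is: both operands are ints
    double b = i + d;    // compatible via coercion: i is converted to double
    // auto c = i + pt;  // type error: no version of + accepts an int and a Point
    return 0;
}
```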

Type checking can be either static or dynamic. With static type checking, type checking can be done before runtime and errors will be picked up by a compiler. With dynamic type checking, type checking has to occur at runtime, and there may therefore be runtime type errors. If all of our type bindings are static, then almost all of our type checking can also be done statically; conversely, if our type bindings are dynamic in nature, then our type checking must be dynamic as well. So what I would like you to do at this point is pause the video and try to explain why this is the case, both for static type bindings and for dynamic type bindings.

We can refer to a programming language as strongly typed if it will always detect type errors. The advantage of this is that all variable misuses that cause type errors can then be detected. Usually we can't describe a programming language as purely strongly typed or purely weakly typed; instead we describe the strength of its typing in a relative fashion, comparing one programming language's type system to another's. So we might say something like: language A is more strongly typed than language B.

Let's now look at strong versus weak typing in the context of concrete programming language examples. C and C++ are not very strongly typed at all, despite the fact that types are statically bound in these languages. There are a variety of reasons for this. One is that in early versions of C — versions that come before the C99 standard — parameter type checking could be completely avoided. This would allow one to write, for example, a function that receives an integer value as a parameter and then pass some other value to the function, say a floating-point value. What would happen in this case is that the value would not be cast to an integer, as one would expect in a modern programming language; instead the value passed as the parameter — for instance a floating-point value — would simply be interpreted as an integer at the bit level. This would not raise a type error; we would simply get some kind of unexpected value back, instead of something resembling the float value we had actually passed to the function.

Also, both C and C++ include support for unions, and as we've seen before, these unions are free unions and can therefore not be type checked.
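A small sketch of how a free union defeats type checking in C and C++ (illustrative only; reading a union member other than the one most recently written is not well-defined behaviour in C++):

```cpp
#include <iostream>

union Free {        // a free union: no tag, no check on which field is currently live
    int   i;
    float f;
};

int main() {
    Free u;
    u.f = 3.14f;                  // store a float...
    std::cout << u.i << '\n';     // ...then read the same bits back as an int:
                                  // no type error is reported, just a strange value
    return 0;
}
```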

So what I would like you to do at this point is pause the video once more and think of other reasons that C or C++ are not strongly typed. Think back to some of the previous discussions we've had through the course of our treatment of this chapter.

If we look at Java and C#, we see that implicit type errors will always be detected; however, explicit casting can result in type errors. For instance, we can have an object reference and cast it to an invalid class instance. This will generate a type error, but it will only be detected at runtime, and an exception will then be thrown, which one has to catch and potentially handle in some way — although, generally speaking, these kinds of casting errors are errors one cannot recover from. As a result, Java and C# are much more strongly typed than C and C++, but they are still not perfectly strongly typed. ML and F# are very strongly typed — much more strongly typed than Java and C# are in practice.

Type coercion can be considered to weaken typing in a programming language. We've already touched on coercion a moment ago, but to recap: coercion is an automatic conversion of operands to the same type so that they work for a particular operator. Coercion can cause accidental type mismatches to go undetected. Let's look at this in terms of an example. Assume we have two variables, a and b, both of type int, and a third variable d of type float. Now assume the programmer intends to add a and b together — both integer variables — but they are typing quickly and mistype b as d. The compiler (or the interpreter, depending on the programming language), if it supports coercions, will see this mismatch in type and will convert the integer operand, a, to a floating-point value so that it matches the type of d. The mistyping error therefore won't be detected: the compiler won't raise an error indicating that you've perhaps made a mistake because one of the operands is a floating-point value rather than an integer value, or vice versa.
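A sketch of the mistyping scenario just described, using the variable names from the lecture:

```cpp
int   a = 1, b = 2;
float d = 9.5f;

// The programmer meant to write a + b but mistyped b as d.
// Instead of reporting a type mismatch, the compiler silently coerces
// a to float and the expression quietly evaluates to 10.5f.
float result = a + d;
```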

Let's look at how coercions affect typing in some concrete language examples. C++ has a very large set of coercions that can take place, which means that it is fairly weakly typed and, as a result, a lot less reliable. Ada has very few coercions — in fact almost no coercions at all — so Ada is much more strongly typed than C++ and therefore much more reliable. Java is somewhere in between: it has about half of C++'s assignment coercions, so it is more strongly typed than C++ but still less strongly typed than Ada, which has almost no coercions at all.

This brings us to a related concept: type equivalence. What is type equivalence? It defines when operands of two types are substitutable for one another with no coercion taking place. In other words, type equivalence defines the types of operands that are acceptable for each operator. For the predefined scalar types — things like integers, for instance — these rules are very simple and rigid: usually a type is only equivalent to itself. For instance, we can only add one integer to another integer if no coercion is to take place; if a value of any other type is used in the addition, then a coercion must take place. However, for struct types and some user-defined types the type equivalence rules are more complex, and we're going to look at two ways that type equivalence can be handled.

The easiest and most straightforward approach to dealing with type equivalence is referred to as name type equivalence. With name type equivalence, two variables are considered to have equivalent types if they have exactly the same type name; we do not look at the structure of the two variables at all. This is the case if either both variables appear in exactly the same declaration, in which case they both have the same type, or they appear in separate declarations but both declarations use exactly the same type name.

Name type equivalence is very easy to implement. For instance, if we have two record variables, the two variables are only equivalent if they are of exactly the same record type. We don't look at the structure of the records at all, so the check is very efficient and very straightforward to implement. However, name type equivalence is very restrictive, because it requires an exact match between the types. For example, consider user-defined ordinal types — in other words, enumerations. Under name type equivalence an enumeration type is not equivalent to an integer type, and vice versa. We saw when discussing enumerations that their values are interpretable as integer values; however, under name type equivalence, because an integer type is not the same named type as an enumeration type, the two types are not equivalent to each other. This would, for example, prevent us from assigning an integer value to an enumeration-typed variable, or vice versa. Also, if we are looking at subprograms such as functions, the formal parameters must be of exactly the same type as their corresponding actual parameters.
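C++ behaves in this spirit for enumerations: the enumeration is a distinct named type, so an int cannot be assigned to it without an explicit cast (a small sketch of my own):

```cpp
enum Colour { RED, GREEN, BLUE };   // a user-defined ordinal type

int main() {
    Colour c = GREEN;
    int    n = c;                        // enum -> int is allowed (values act as integers)
    // Colour d = 1;                     // error: int is not equivalent to Colour
    Colour d = static_cast<Colour>(1);   // an explicit cast is required instead
    return 0;
}
```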

The second and more complex approach to type equivalence is referred to as structure type equivalence. Here, two variables are considered to have equivalent types if their types have identical structures. This is much more flexible: for example, we can assign a variable of one record type to a variable of another record type, and these can be different record types, as long as the fields contained within the records have exactly the same types and typically appear in exactly the same order. However, structure type equivalence is much harder to implement, because we are no longer simply looking for a match between two type names; we now actually have to traverse the internal structure of each of the types in order to determine whether they are structurally equivalent to one another.

The complexity of structure type equivalence leads to potential problems, and to a large number of questions that need to be answered by the programming language designers. Here are a few examples of these questions and potential drawbacks.

Firstly, if we are working with record types, are two records considered equivalent if their fields have exactly the same types and appear in the same order, but the names of the fields are different? This is a question that needs to be answered at language design time.

Secondly, if we are working with array types, are two array types considered equivalent if they are exactly the same — in other words, they store exactly the same element type and hold the same number of values — but their base subscripts differ? For example, would array 1 and array 2 in the slide's example be considered equivalent? Both of them contain 10 integer values; however, array 1 is indexed from index 1 to index 10, whereas array 2 is indexed from index 0 to index 9. Should these two arrays be considered structurally type equivalent to one another?

Then, if we are working with enumeration types, should two enumeration types be considered equivalent if they differ only in the names of their constants — in other words, they have exactly the same number of constants, but the constants are named differently? Should they be considered equivalent to each other or not?

With structure type equivalence, as we've seen, the compiler or interpreter cannot differentiate between types with exactly the same structure — this is so by definition. For example, if we have a floating-point type that represents miles per hour and a separate floating-point type that represents kilometres per hour, the compiler or interpreter can't tell these apart from each other; only the programmer can actually tell them apart. These are all potential questions that need to be answered, and possible drawbacks associated with implementing structure type equivalence in a programming language.
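A small sketch of the miles-per-hour versus kilometres-per-hour point in C++: type aliases have identical structure (both are just double), so the compiler happily mixes them even though the programmer knows they mean different things:

```cpp
using MilesPerHour      = double;   // structurally identical types:
using KilometresPerHour = double;   // both are simply double underneath

int main() {
    MilesPerHour      mph = 60.0;
    KilometresPerHour kph = mph;    // accepted: the compiler cannot tell them apart
    double nonsense = mph + kph;    // also accepted, even though the units differ
    (void)nonsense;
    return 0;
}
```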

Finally, we'll conclude this lecture by very briefly looking at data type theory. Type theory is a very broad area of study, and it exists within multiple fields, including mathematics, logic, computer science, and even philosophy. There are two branches of type theory. Firstly, there is practical type theory, which is concerned with data types in commercial, real-world programming languages. Then there is abstract type theory, which often concerns itself with typed lambda calculus; in general, abstract type theory is only really of interest to theoretical computer scientists. Broadly speaking, a type system, in the context of data type theory, is considered to be a set of types together with the rules that govern those types as they are used in programs. Essentially, what we have been discussing throughout this chapter has been type systems.

All right, that concludes our discussion of Chapter Six. In the next lecture we'll start with the next chapter, where we will look at expressions and assignment statements.
