Lecture-8 Introduction to Databases: Relational Algebra - Select, project, join
Summary
TLDR: This video introduces relational algebra, a formal language foundational to implemented languages like SQL. It covers basic relational algebra concepts and popular operators, including the select, project, and cross-product operators. The video explains how to filter, slice, and combine relations using these operators. It also discusses the natural join operator, which simplifies combining relations by enforcing equality on shared attributes. The script sets the stage for a deeper exploration of additional operators and alternative notations in a subsequent video.
Takeaways
- 📚 Relational algebra is a formal language used to perform queries on relational databases, forming the basis for languages like SQL.
- 🔍 Queries in relational algebra operate on relations and produce new relations as results, allowing for complex data manipulation.
- 📈 The Select operator (σ) is used to filter rows in a relation based on a given condition, denoted with a subscript for the condition.
- 📊 The Project operator (π) allows for the selection of specific columns from a relation, specified by listing the column names in the subscript.
- 🔗 The Cross-Product operator combines every possible pair of tuples from two relations, resulting in a new relation with a schema that is the union of the original schemas.
- 🔑 The Natural Join operator is a convenient way to combine relations by enforcing equality on shared attributes and removing duplicate columns.
- 🔄 Duplicate values are eliminated in relational algebra results, differing from SQL which is based on multisets and retains duplicates.
- 🎯 The Theta Join operator is an abbreviation in relational algebra that combines relations based on a specific condition, akin to the basic join operation in many database systems.
- 📝 The script introduces the basics of relational algebra and sets the stage for learning additional operators and notations in a subsequent video.
- 📌 A key is an attribute or set of attributes whose values are guaranteed to be unique; in the example, college names and student IDs are unique, and a student applies to a given college for a given major only once.
Q & A
What is relational algebra?
-Relational algebra is a formal language used for expressing queries on relational databases. It forms the underpinnings of implemented languages like SQL.
What are the basic operations relational algebra performs on relations?
-Relational algebra operates on relations and produces relations as a result. It can filter, slice, and combine relations using various operators.
What is the simplest query in relational algebra?
-The simplest query in relational algebra is the name of a relation itself, which returns a copy of the entire relation.
What does the Select operator in relational algebra do?
-The Select operator is used to filter certain rows out of a relation based on a specified condition, denoted by a Sigma with a subscript for the condition.
How can you filter students with a GPA greater than 3.7 using relational algebra?
-You can apply the Select operator with the condition GPA > 3.7 to the student relation, written as σ with the subscript GPA > 3.7 over student. The result keeps only the students whose GPA is greater than 3.7.
What is the purpose of the Project operator in relational algebra?
-The Project operator is used to select specific columns from a relation, denoted by the Greek PI symbol with a subscript listing the desired column names.
How does the cross-product operator work in relational algebra?
-The cross-product operator combines every tuple from one relation with every tuple from another relation, resulting in a new relation with a schema that is the union of the two input schemas.
What is the difference between a cross-product and a natural join in relational algebra?
-A natural join is a specialized form of a cross-product that automatically enforces equality on shared attributes and removes duplicate columns, whereas a cross-product simply combines tuples without any conditions.
How does the theta join operator differ from the natural join?
-The theta join operator is equivalent to applying a specific condition (theta) to the cross product of two relations, whereas the natural join automatically enforces equality on all attributes with the same name.
Why is the natural join considered convenient in relational algebra?
-The natural join is convenient because it simplifies the process of combining relations with shared attributes by automatically enforcing equality and removing duplicate columns without the need for explicit conditions.
What is the significance of relational algebra in database management systems?
-Relational algebra is significant in database management systems because it provides a formal foundation for query languages like SQL and helps in understanding the underlying operations for combining and manipulating relations.
Outlines
📚 Introduction to Relational Algebra
This paragraph introduces relational algebra, a formal language that forms the basis of implemented languages like SQL. The video will cover the basics of relational algebra and its popular operators. It reviews that queries on relational databases operate on and produce relations. The example of a college admissions database with three relations is introduced: 'college', 'student', and 'apply'. The concept of keys as unique attributes is explained. The simplest query in relational algebra is presented as the name of a relation, such as 'student', which returns a copy of the relation. The paragraph sets the stage for exploring relational algebra operators in detail.
🔍 Select and Project Operators in Relational Algebra
The paragraph explains the Select and Project operators in relational algebra. The Select operator, denoted by a lowercase sigma (σ), is used to filter rows based on a condition. Examples are given for selecting students with a GPA greater than 3.7 and filtering applications to Stanford for a CS major. The Project operator, denoted by a lowercase pi (π), is used to select specific columns from a relation. It is demonstrated how to compose operators, such as applying a Project operator to the result of a Select operator. The paragraph also discusses the elimination of duplicate values in relational algebra results, contrasting it with SQL's handling of duplicates.
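As a rough illustration of the two operators described above, here is a minimal Python sketch that models a relation as a set of tuples. The helper names (`relation`, `select`, `project`), the attribute names (`sID`, `sName`, `GPA`, `sizeHS`, `cName`, `major`, `decision`), and the sample rows are assumptions made for this example; they do not come from the video.

```python
def relation(rows):
    """Build a relation as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(condition, rel):
    """Select (sigma): keep the tuples for which condition(tuple_as_dict) is true."""
    return {t for t in rel if condition(dict(t))}

def project(attrs, rel):
    """Project (pi): keep only the listed attributes; the set collapses duplicate results."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

# Hypothetical sample data (not from the video).
student = relation([
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "sizeHS": 1000},
    {"sID": 2, "sName": "Bob", "GPA": 3.6, "sizeHS": 1500},
    {"sID": 3, "sName": "Craig", "GPA": 3.8, "sizeHS": 500},
])
apply_ = relation([
    {"sID": 1, "cName": "Stanford", "major": "CS", "decision": "Y"},
    {"sID": 1, "cName": "Berkeley", "major": "CS", "decision": "Y"},
    {"sID": 2, "cName": "Stanford", "major": "EE", "decision": "N"},
])

# Select: students whose GPA is greater than 3.7.
high_gpa = select(lambda t: t["GPA"] > 3.7, student)

# Composition: project the ID and name out of the selection result.
names = project(["sID", "sName"], high_gpa)

# Duplicate elimination: three applications collapse to two (major, decision) pairs.
major_decision = project(["major", "decision"], apply_)
```

Because a relation is stored as a Python set, duplicate elimination falls out of the representation, which mirrors the set semantics of relational algebra rather than SQL's multiset behavior.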
🔗 Cross-Product and Natural Join Operators
This paragraph delves into the Cross-Product operator, which combines every tuple from one relation with every tuple from another, creating a new relation with a schema that is the union of the two original schemas. The Natural Join operator is introduced as a convenient way to perform a cross-product followed by a selection based on equality of common attributes. The paragraph illustrates how these operators can be used to answer complex queries, such as finding the names and GPAs of students from large high schools who applied to CS and were rejected, and extends the query to include applications to colleges with enrollment over 20,000.
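The query described above can be sketched in Python to show that the cross-product version and the natural-join version give the same answer. This is only an illustrative sketch: the helper functions, the attribute names (`sID`, `sName`, `GPA`, `sizeHS`, `cName`, `enrollment`, `major`, `decision`), the `"R"` code for a rejection, and the sample rows are all assumptions for the example.

```python
def relation(rows):
    """A relation is a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(condition, rel):
    return {t for t in rel if condition(dict(t))}

def project(attrs, rel):
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def cross(r_name, r, s_name, s):
    """Cross product; attributes that appear in both inputs are prefixed with the relation name."""
    out = set()
    for t in map(dict, r):
        for u in map(dict, s):
            shared = t.keys() & u.keys()
            row = {(f"{r_name}.{a}" if a in shared else a): v for a, v in t.items()}
            row.update({(f"{s_name}.{a}" if a in shared else a): v for a, v in u.items()})
            out.add(frozenset(row.items()))
    return out

def natural_join(r, s):
    """Combine tuples that agree on every shared attribute, keeping one copy of each shared column."""
    out = set()
    for t in map(dict, r):
        for u in map(dict, s):
            if all(t[a] == u[a] for a in t.keys() & u.keys()):
                out.add(frozenset({**t, **u}.items()))
    return out

# Hypothetical sample data (not from the video).
student = relation([
    {"sID": 1, "sName": "Amy", "GPA": 3.9, "sizeHS": 1000},
    {"sID": 2, "sName": "Bob", "GPA": 3.6, "sizeHS": 1500},
    {"sID": 3, "sName": "Craig", "GPA": 3.8, "sizeHS": 2000},
])
apply_ = relation([
    {"sID": 1, "cName": "Stanford", "major": "CS", "decision": "R"},
    {"sID": 2, "cName": "Stanford", "major": "CS", "decision": "R"},
    {"sID": 2, "cName": "Berkeley", "major": "CS", "decision": "R"},
    {"sID": 3, "cName": "MIT", "major": "EE", "decision": "R"},
])
college = relation([
    {"cName": "Stanford", "state": "CA", "enrollment": 15000},
    {"cName": "Berkeley", "state": "CA", "enrollment": 36000},
    {"cName": "MIT", "state": "MA", "enrollment": 10000},
])

def rest(t):
    """The non-join conditions: large high school, applied to CS, rejected."""
    return t["sizeHS"] > 1000 and t["major"] == "CS" and t["decision"] == "R"

# Cross product: the equality on sID must be written explicitly.
product = cross("Student", student, "Apply", apply_)
via_cross = project(["sName", "GPA"], select(
    lambda t: t["Student.sID"] == t["Apply.sID"] and rest(t), product))

# Natural join: the equality on sID is implied.
via_join = project(["sName", "GPA"], select(rest, natural_join(student, apply_)))

# Extension: also join College and require enrollment over 20,000.
all_three = natural_join(natural_join(student, apply_), college)
big_colleges = project(["sName", "GPA"], select(
    lambda t: rest(t) and t["enrollment"] > 20000, all_three))
```

The nested-loop implementations are chosen for readability only; they follow the definitions literally and say nothing about how a real database system evaluates joins.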
🔄 Theta Join and Conclusion of Relational Algebra Basics
The final paragraph introduces the Theta Join operator, which is equivalent to applying a selection condition to the cross-product of two relations. It explains that while the Natural Join simplifies notation, it does not add expressive power to relational algebra. The paragraph concludes by summarizing that relational algebra operates on sets of relations and produces relations as a result, using various operators to filter, slice, and combine relations. It sets the stage for the next video, which will cover additional operators and alternative notations for relational algebra expressions.
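The statement that a theta join is a selection over a cross product can be written down almost literally. The following Python sketch assumes a set-of-tuples representation, hypothetical helper names (`relation`, `cross`, `theta_join`), and made-up attribute names and rows; none of these come from the video.

```python
def relation(rows):
    """A relation is a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def cross(r_name, r, s_name, s):
    """Cross product; attributes that appear in both inputs are prefixed with the relation name."""
    out = set()
    for t in map(dict, r):
        for u in map(dict, s):
            shared = t.keys() & u.keys()
            row = {(f"{r_name}.{a}" if a in shared else a): v for a, v in t.items()}
            row.update({(f"{s_name}.{a}" if a in shared else a): v for a, v in u.items()})
            out.add(frozenset(row.items()))
    return out

def theta_join(theta, r_name, r, s_name, s):
    """Theta join: apply the condition theta to the cross product of the two inputs."""
    return {t for t in cross(r_name, r, s_name, s) if theta(dict(t))}

# Hypothetical sample data (not from the video).
student = relation([
    {"sID": 1, "GPA": 3.9},
    {"sID": 2, "GPA": 3.6},
])
apply_ = relation([
    {"sID": 1, "cName": "Stanford"},
    {"sID": 2, "cName": "Stanford"},
    {"sID": 2, "cName": "Berkeley"},
])

# Theta is an equality here, but unlike a natural join both sID columns are kept.
same_student = theta_join(
    lambda t: t["Student.sID"] == t["Apply.sID"], "Student", student, "Apply", apply_)

# Theta can be any selection-style condition, for example an inequality.
smaller_id = theta_join(
    lambda t: t["Student.sID"] < t["Apply.sID"], "Student", student, "Apply", apply_)
```

Unlike the natural join, the theta join neither infers its condition from attribute names nor drops columns, so the result of `same_student` still carries both `Student.sID` and `Apply.sID`.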
Keywords
💡Relational Algebra
💡Relation
💡Select Operator
💡Project Operator
💡Cross-Product Operator
💡Natural Join
💡Theta Join
💡Duplicate Values
💡Keys
💡Schema
Highlights
Relational algebra is a formal language that underpins implemented languages like SQL.
Queries in relational algebra operate on and produce relations.
The simplest query in relational algebra is the name of a relation, producing a copy of that relation.
The Select operator (σ) is used to filter rows based on a condition.
Multiple conditions in the Select operator are combined using the logical AND.
The Project operator (π) is used to select specific columns from a relation.
Operators can be composed to filter rows and select columns simultaneously.
Relational algebra eliminates duplicates in query results, unlike SQL which is based on multisets.
The cross-product operator combines every possible pair of tuples from two relations.
The natural join operator combines relations by enforcing equality on shared attributes and removes duplicate columns.
Natural join is a convenient notation but does not add expressive power to relational algebra.
The theta join operator is equivalent to applying a selection condition to the cross product of two relations.
Theta join is the basic operation for combining relations in most database management systems.
Relational algebra is used to express queries in a formal way, using a set of operators to manipulate relations.
The video series will cover additional operators and alternative notations for relational algebra in the next part.
Transcripts
this is the first of two videos where we
learn about relational algebra
relational algebra is a formal language
it's an algebra that forms the
underpinnings of implemented languages
like SQL in this video we're going to
learn the basics of the relational
algebra query language and a few of the
most popular operators in the second
video we'll learn some additional
operators and some alternate
notations for relational algebra now
let's just review first from our
previous video on relational querying
that queries over relational databases
operate on relations and they also
produce relations as a result so if we
write a query that operates say on the
three relations depicted here the result
of that query is going to be a new
relation and in fact we can pose queries
on that new relation or combine that new
relation with our previous relations so
let's start out with relational algebra
for the examples in this video we're
going to be using a simple college
admissions database with three relations
the first relation the college relation
contains information about the college
name state and enrollment of the college
the second relation the student relation
contains an ID for each student
the students name GPA and the size of
the high school they attended and
finally the third relation contains
information about students applying to
colleges specifically the student ID
the college name where they're
applying the major they're applying for
and the decision of that application
I've underlined the keys for these three
relations as a reminder a key is an
attribute or a set of attributes whose
value is guaranteed to be
unique so for our examples we're going
to assume that college names are unique
student IDs are unique and that students
will only apply to each college for a
particular major one time so we're going
to have a picture of these three
relations at the bottom of the slides
throughout the video the simplest query
in relational algebra is a query that is
simply the name of a relation so for
example we can write a query student and
that's a valid expression in relational
algebra if we run that query on our
database we'll get as a result a copy of
the student relation pretty
straightforward
now what happens next
is that we're going to use operators of
the relational algebra to filter
relations slice relations and combine
relations so let's go through those
operators the first operator is the
Select operator so the Select operator
is used to pick certain rows out of a
relation the Select operator is denoted
by a Sigma with a subscript that's the
condition that's used to filter the rows
that we extract from the relations so
we're just going to go through three
examples here the first example says
that we want to find the students whose
GPA is greater than 3.7 so to write that
expression in relational algebra we
write the Sigma which is the selection
operator as a subscript the condition
that we're filtering for GPA greater
than 3.7 and the relation over which
we're applying that selection predicate
so this expression will return a subset
of the student table containing those
rows where the GPA is greater than 3.7
if we want to filter for two conditions
we just do an and of the conditions in
the subscript of the Sigma so if we want
say students whose GPA is greater than
3.7 and whose high school size is less
than a thousand we'll write select GPA
greater than 3.7 we'll use the logical
and operator a caret high school size is
less than a thousand and again we'll
apply that to the student relation and
once again the result of that will be a
subset of the student relation
containing the rows that satisfy the
condition if we want to find the
applications to Stanford for a CS major
then we'll be applying a selection
condition to the apply relation again we
write the Sigma and now the subscript is
going to say that the college name is
Stanford and the major is CS again the
and operator and that will be applied to
the apply relation and that will return
as a result a subset of the apply
relation
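Written out in symbols, the three selections just described might look as follows. The attribute spellings (GPA, sizeHS, cName, major) and the relation names are assumptions for this sketch, since the transcript only describes them verbally.

```latex
\sigma_{\mathit{GPA} > 3.7}(\mathit{Student})

\sigma_{\mathit{GPA} > 3.7 \,\wedge\, \mathit{sizeHS} < 1000}(\mathit{Student})

\sigma_{\mathit{cName} = \text{'Stanford'} \,\wedge\, \mathit{major} = \text{'CS'}}(\mathit{Apply})
```

Each expression returns a subset of the relation it is applied to, with the schema unchanged.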
so the general case of the select
operator is that we have this Sigma we
have a condition as a subscript and then
we have a relation name and we return as
a result the subset of the relation our
next operator is the project operator so
the select operator picks certain rows
and the project operator picks
certain columns so let's say we're
interested in the applications but all
we wanted to know is the list of ID's
and the decisions for those applications
the project operator is written using
the Greek PI symbol and now the
subscript is a list of the column names
that we would like to extract so we
write ID
sorry student ID and decision and we
apply that to the apply relation again
and now what we get back is a relation
that has just two columns it's going to
have all the tuples of apply but it's
only going to have the student ID and
the decision columns so the general case
of a project operator is the projection
and then a list of attributes can be any
number and then a relation name now what
if we're interested in picking both rows
and columns at the same time so you want
only some of the rows and we want only
some of the columns now we're going to
compose operators remember that
relational queries produce relations so
we can write a query say with a select
operator of the students whose GPA is
greater than three point seven this is
how we do that and now we can take that
whole expression which produces a
relation and we can apply the project
operator to that and we can get out the
student ID and the student name okay so
what we actually see now is that the
general case of the selection and
projection operators weren't quite what
I told you at first I was deceiving you
slightly when we write the Select
operator it's a select with a condition
on any expression of the relational
algebra and if it's a big one we might
want to put parens on it and similarly
the project operator is a list of
attributes from any expression of the
relational algebra and we can compose
these as much as we want we can have
select over project over select select
project and so on now let's talk about
duplicate values in the results of
relational algebra queries let's suppose
we ask for a list of the
majors that people have applied for and
the decision for those majors so we
write that as the project
the major and the decision on the apply
relation you may think that when we get
the results of this query we're going to
have a lot of duplicate values so we'll
have CS yes CS yes CS no EE yes EE no and
so on you can imagine in a large
realistic database of applications
there's going to be hundreds of people
applying for majors and having a yes or
a no decision the semantics of
relational algebra says that the
duplicates are always eliminated so if
you run a query that would logically
have a lot of duplicate values you just
get one value for each result that's
actually a bit of a difference with the
SQL language so SQL is based on
what's known as multi sets or bags and
that means that we don't eliminate
duplicates whereas relational algebra is
based on sets themselves and duplicates
are eliminated
there is a multiset or bag relational
algebra defined as well but we'll be fine
by just considering the set relational
algebra in these videos
our first operator that combines two
relations is the cross-product operator
also known as the Cartesian product what
this operator does is it takes two
relations it kind of glues them together
so that the schema of the result is
the union of the schemas of the two
relations and the contents of the result
are every combination of tuples from
those relations this is in fact the
normal set cross-product that you might
have learned way back in elementary
school so let's talk about say doing the
cross-product of student and apply so if
we do this cross-product just to save
drawing I'm gonna just kind of glue
these two relations together here so if
we do the cross-product we'll get as a
result a big relation here which is
going to have eight attributes the eight
attributes across the student and apply
now the only small little trick is that
when we glue two relations together
sometimes they'll have the same
attribute name we can see we have sID
on both sides so just as a notational
convention when cross-product is done
and there's two attributes that are
named the same they're prefaced with the name of
the relation they came from so this one
would be referred to in the
cross-product as the student dot sID
where this one over here would be
referred to as the apply dot sID so
again we glue together in the Cartesian
product the two relations with
four attributes each we get a result
with eight attributes now let's talk
about the contents of these so let's
suppose that the student relation had
s tuples in it and that's how many
tuples while the apply had a tuples in
it the result of the Cartesian product
is going to have s times a tuples it's
going to have one tuple for every
combination of tuples from the student
relation and the apply relation now the
cross product seems like it might not be
that helpful but what is interesting is
when we use the cross product together
with other operators and let's see a big
example of that let's suppose that we
want to get the names and GPAs of
students with a high school size greater
than a thousand who applied to CS
and were rejected okay so let's take a
look we're going to have to access the
students and the apply records in order
to run this query so what we'll do is
we'll take student cross apply as our
starting point so now we have a big
relation that contains eight attributes
and all of those tuples that we
described previously but now we're going
to start making things more interesting
because what we're going to do is a big
selection over this relation and that
selection is first of all going to make
sure that it only combines student and
apply tuples that are referring to the
same student so to do that we write
student dot sID equals apply dot sID
so now we've filtered the result of that
cross-product to only include
combinations of student and apply tuples
that make sense now we have to do a
little bit of additional filtering we
said that we want the high school size
to be greater than a thousand so we do a
little and operator and the high school size
we want them to have applied to CS so
that's an major equals CS we're getting
a nice big query here and finally we
want them to have been rejected so and
decision equals we'll just use an R for
reject so now we've got that gigantic
query but that gives us exactly what we
want except for one more thing which as
I said all we want is their names and
GPAs so finally we take a big
parentheses around here and we apply to
that the projection operator getting at
the student name
the GPA and that is the relational
algebra expression that produces the
query that we've written in English now
we've seen how the cross product allows
us to combine tuples and then apply
selection conditions to get meaningful
combinations of tuples it turns out that
relational algebra includes an operator
called the natural join that is used
pretty much for this exact purpose what
the natural join does is it performs a
cross-product but then it enforces
equality on all of the attributes with
the same name so if we set up our schema
properly for example we have student ID
and student ID here meaning the same
thing then when the cross product is
created it's only going to combine
tuples where the student ID is the same
and furthermore if we add in college we
can see that we have the college name
here and the college name here if we
combine college and apply tuples we'll
only combine tuples that are talking
about the same College
now in addition one more thing that it
does is it gets rid of these pesky
attributes that have the same names so
since when we combine for example
student and apply with the natural join
we're only combining tuples where
the student sID is the same as the
apply sID then we don't need to keep
two copies of that
column because the values are always
going to be equal so the natural join
operator is written using a bowtie
that's just the convention you will find
that in your text editing programs if
you look carefully so let's do some
examples now let's go back to our same
query where we were finding the names
and GPAs of students from large high
schools who applied to CS and were
rejected so now instead of using the
cross-product we're going to use the
natural join which as I said was written
with a bowtie
what that allows us to do once we do
that natural join is we don't have to
write that condition that enforced
equality on those two attributes because
it's going to do it itself and once
we've done that then all we need to do
is apply the rest of our conditions
which were that the high school is
greater than a thousand and the major is
CS and the decision is reject again
we'll call that R and then since we're
only getting the names and
GPAs we write the student name and the
GPA okay and that's the result of the
query using a natural join so as you can
see that's a little bit simpler than the
original with the cross-product and by
setting up schemas correctly natural
join can be very useful now let's add
one more complication to our query let's
suppose that we're only interested in
applications to colleges where the
enrollment is greater than 20,000 so so
far in our expression we've referred to
the student relation and the apply
relation but we haven't used the college
relation but if we want to have a filter
on enrollment we're gonna have to bring
the college relation into
the picture this turns out to perhaps be
easier than you think
let's just erase a couple of our
parentheses here and what we're going to
do is we're going to join in the college
relation with the two relations we have
already now technically the natural join
is a binary operator people often use it
without parentheses because it's
associative but if we get pedantic about
it we could add that and then we're in
good shape now we've joined all three
relations together and remember
automatically the natural join enforces
equality on the shared attributes very
specifically the college name here is
going to be set equal to the apply
College name as well now once we've done
that we've got all the information we
need we just need to add one more
filtering condition which is that the
college enrollment is greater than
20,000 and with that we've solved our
query so to summarize
the natural join combines relations
it automatically sets values equal when
attribute names are the same and then it
removes the duplicate columns the
natural join actually does not add any
expressive power to relational
algebra we can rewrite the
natural join without it using the
cross-product so let me just show that
rewrite here if we have and now I'm
going to use the general case of two
expressions one expression natural join
with another expression that is actually
equivalent to doing a projection on the
schema of the first expression I'll just
call it e 1 now Union the schema of the
second X
and that's a real union so that means if
we have two copies we just keep one of
them over the selection now we're going
to set all the shared attributes of the
first expression to be equal to the
shared attributes of the second so I'll
just write e1 dot a1 equals e2 dot a1 and e1 dot a2
equals e2 dot a2 now these are the cases
where again the attributes have the same
names and so on so we're setting all
those equal and that is applied over
expression one cross-product expression
2 so again the natural join is not
giving us additional expressive power
but it is very convenient notationally
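The rewrite spoken above can be set down as a single equivalence. In this sketch, schema(E) stands for the set of attribute names of an expression and A_1, ..., A_k for the attributes that E_1 and E_2 share; both are notational assumptions introduced here, following the description in the transcript.

```latex
E_1 \bowtie E_2 \;\equiv\;
\pi_{\,\mathrm{schema}(E_1)\,\cup\,\mathrm{schema}(E_2)}
\Bigl(
  \sigma_{\,E_1.A_1 = E_2.A_1 \,\wedge\, E_1.A_2 = E_2.A_2 \,\wedge\, \cdots \,\wedge\, E_1.A_k = E_2.A_k}
  \bigl( E_1 \times E_2 \bigr)
\Bigr)
```

Because the projection list is a true set union, each shared attribute appears only once in the result, which is how the duplicate columns disappear.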
the last operator that I'm going to
cover in this video is the theta join
operator like natural join theta join is
actually an abbreviation that doesn't
add expressive power to the language let
me just write it the theta join
operator takes two expressions and
combines them with the bow tie looking
operator but with a subscript theta that
theta is a condition it's a condition in
the style of the condition in the
selection operator and what this
actually says it's pretty simple is
it's equivalent to applying the theta
condition to the cross product of the
two expressions so you might wonder why
even mention the theta join operator and
the reason I mention it is that most
database management systems implement
the theta join as their basic operation
for combining relations
so the basic operation is take two
relations combine all tuples but then
only keep the combinations that pass the
theta condition often when you talk to
people who build database systems or use
databases when they use the word join
they really mean the theta join so in
conclusion
relational algebra is a formal language
it operates on sets of relations and
produces relations as a result the
simplest query is just the name of a
relation and then operators are used to
filter relations slice them and combine
them so far we've learned the Select
operator for selecting rows the
project operator for selecting columns
the cross-product operator for combining
every possible pair of tuples from two
relations and then two abbreviations the
natural join which is a very useful way
to combine relations by enforcing
equality on certain columns and the
theta join operator in the next video
we'll learn some additional operators of
relational algebra and also some
alternative notations for relational
algebra expressions