Different Phases of Compiler
Summary
TL;DR: This lecture covers the various phases of a compiler, including lexical analysis, syntax analysis, semantic analysis, intermediate code generation, code optimization, and target code generation. It illustrates how an arithmetic expression is processed through these phases, resulting in assembly language code. The lecture also introduces tools like Lex and Yacc for implementing compiler phases and mentions the LANCE C compiler platform for embedded processors.
Takeaways
- The lecture provides an overview of the different phases of a compiler.
- The phases include the preprocessor, compiler, assembler, and linker/loader.
- The lecture focuses on converting high-level language code into machine code.
- The preprocessor removes preprocessor directives and embeds header files.
- The compiler converts high-level language code into assembly language code.
- The lexical analysis phase involves tokenizing the input code.
- The syntax analysis phase constructs a parse tree using context-free grammars.
- The semantic analysis phase checks for type correctness and scope resolution.
- The intermediate code generator produces three-address code from the parse tree.
- The code optimizer reduces the length of code and improves efficiency.
- The target code generator produces assembly code from the optimized intermediate code.
- Tools like Lex and Yacc are used to implement the lexical and syntax analysis phases.
- The LANCE C compiler platform is mentioned for implementing the front end of a C language compiler.
Q & A
What are the four main phases of a language translator in a compiler?
-The four main phases of a language translator in a compiler are the preprocessor, the compiler, the assembler, and the linker and loader.
What is the primary function of a preprocessor in a compiler?
-The primary function of a preprocessor is to convert high-level language code into pure high-level language code by embedding required header files and omitting preprocessor directives.
What does the lexical analysis phase of a compiler do?
-The lexical analysis phase takes lexemes as input and generates tokens. It is responsible for identifying and categorizing sequences of characters into meaningful tokens that the compiler can understand.
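As a minimal illustration (not the lecture's own implementation), tokenizing the expression `x = a + b * c;` can be sketched in Python; the token names and pattern ordering here are assumptions chosen for clarity:

```python
import re

# Token classes for the lecture's expression; the names and the ordering
# of patterns are illustrative assumptions, not the lecture's notation.
TOKEN_SPEC = [
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),   # identifiers
    ("ASSIGN", r"="),                        # assignment operator
    ("PLUS",   r"\+"),                       # arithmetic operator
    ("MUL",    r"\*"),                       # arithmetic operator
    ("SEMI",   r";"),                        # statement terminator
    ("SKIP",   r"\s+"),                      # whitespace, discarded
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC))

def tokenize(source):
    """Map a string of lexemes to a stream of (token type, lexeme) pairs."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(source)
            if m.lastgroup != "SKIP"]
```

Running `tokenize("x = a + b * c;")` yields the stream of tokens that the syntax analyzer would then consume.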
What is a lexeme and how does it differ from a word?
-A lexeme is similar to a word, with one difference: words individually have their own meanings, whereas a group of lexemes conveys meaning only in its entirety. For instance, 'x' individually doesn't convey any meaning until it is considered within the context of an entire arithmetic expression.
What are tokens and how are they generated?
-Tokens are the meaningful units of a programming language that represent identifiers, operators, literals, and other syntactic elements. They are generated by the lexical analyzer, which recognizes patterns using regular expressions.
Can you explain the role of regular expressions in lexical analysis?
-Regular expressions are used by the lexical analyzer to define patterns for recognizing different types of tokens. For example, a regular expression can define what constitutes a valid identifier in a language.
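A minimal sketch of such a rule in Python, following the lecture's pattern of a letter (l) or underscore followed by letters or digits (d); the concrete character classes are assumptions standing in for l and d:

```python
import re

# The lecture's identifier rule: a letter or underscore first, then any
# number of letters or digits. Names starting with a digit are rejected.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9]*$")

def is_identifier(lexeme):
    return bool(IDENT.match(lexeme))
```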
How does the syntax analyzer use context-free grammars?
-The syntax analyzer uses context-free grammars to analyze the stream of tokens and produce a parse tree. It follows a set of production rules to ensure that the tokens conform to the language's syntax.
What is the purpose of a parse tree in compiler design?
-A parse tree is a hierarchical structure that represents the syntactic structure of a program according to its grammatical rules. It is used to verify the syntactic correctness of the source code.
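To make this concrete, here is a recursive-descent sketch in Python for the lecture's grammar (S → id = E ;, E → E + T | T, T → T * F | F, F → id). The list-of-lexemes input and nested-tuple tree shape are simplifying assumptions, and the left-recursive rules are handled with loops:

```python
# Grammar from the lecture:  S -> id = E ;   E -> E + T | T
#                            T -> T * F | F  F -> id
# Left recursion is replaced by loops, which derives the same trees.

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected=None):
        nonlocal pos
        tok = tokens[pos]
        if expected is not None and tok != expected:
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        pos += 1
        return tok

    def factor():                       # F -> id
        return ("F", eat())

    def term():                         # T -> T * F | F
        node = ("T", factor())
        while peek() == "*":
            eat("*")
            node = ("T", node, "*", factor())
        return node

    def expr():                         # E -> E + T | T
        node = ("E", term())
        while peek() == "+":
            eat("+")
            node = ("E", node, "+", term())
        return node

    name = eat()                        # S -> id = E ;
    eat("=")
    tree = expr()
    eat(";")
    return ("S", name, "=", tree, ";")
```

Reading off only the terminals of the returned tree reproduces the input token stream, which is exactly the "yield" check the lecture describes.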
What does the semantic analyzer check for in a parse tree?
-The semantic analyzer checks for type correctness, array bounds, scope resolution, and logical consistency within the parse tree. It ensures that the program makes sense from a semantic point of view.
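A toy sketch of two such checks in Python; the dict-based symbol table and its "kind" field are illustrative assumptions, not the lecture's design:

```python
# A toy semantic check: every operand must be declared, and the target of
# the assignment must be a variable rather than a constant.

def check(target, operands, symbols):
    errors = []
    if symbols.get(target, {}).get("kind") != "variable":
        errors.append(f"cannot assign to {target!r}")
    for name in operands:
        if name not in symbols:
            errors.append(f"undeclared identifier {name!r}")
    return errors

# Symbol table for x = a + b * c; with everything declared as a variable.
symbols = {n: {"kind": "variable"} for n in ("x", "a", "b", "c")}
```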
What is the intermediate code generator's role in the compiler?
-The intermediate code generator takes the semantically verified parse tree and produces intermediate code, such as three-address code, which is a lower-level representation of the program that can be further optimized or translated into target code.
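The translation from tree to three-address code can be sketched as a bottom-up walk; the nested-tuple AST used here is an assumed simplification of the parse tree:

```python
from itertools import count

# Walk the expression tree bottom-up: each operator node gets a fresh
# temporary, so the deepest operation (the multiplication) is emitted first.
def gen(node, code, temps):
    if isinstance(node, str):            # a bare identifier
        return node
    op, left, right = node
    l = gen(left, code, temps)
    r = gen(right, code, temps)
    t = f"t{next(temps)}"                # fresh temporary variable
    code.append(f"{t} = {l} {op} {r}")
    return t

def three_address(target, ast):
    code, temps = [], count()
    result = gen(ast, code, temps)
    code.append(f"{target} = {result}")
    return code
```

For `x = a + b * c` (multiplication binds tighter, so it sits deeper in the tree) this produces the three statements the lecture shows: `t0 = b * c`, `t1 = a + t0`, `x = t1`.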
How does the code optimizer reduce the length of the code?
-The code optimizer reduces the length of the code by eliminating unnecessary operations and variables, such as by directly assigning the result of an expression to a variable instead of using a temporary variable.
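The specific optimization described here can be sketched as a narrow special case of copy propagation; representing each statement as a "dest = rhs" string is an assumption for brevity:

```python
# If the final statement merely copies a temporary that was defined in the
# statement immediately before it, fold that definition into the assignment.

def fold_final_copy(code):
    if len(code) >= 2:
        last_dest, last_rhs = [s.strip() for s in code[-1].split("=", 1)]
        prev_dest, prev_rhs = [s.strip() for s in code[-2].split("=", 1)]
        if last_rhs == prev_dest and prev_dest.startswith("t"):
            return code[:-2] + [f"{last_dest} = {prev_rhs}"]
    return code
```

Applied to `["t0 = b * c", "t1 = a + t0", "x = t1"]`, this yields the two-statement form `["t0 = b * c", "x = a + t0"]` from the lecture.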
What tools can be used to implement the lexical analysis phase of a compiler?
-The tool Lex can be used to implement the lexical analysis phase. It generates a lexical analyzer from a user-provided specification.
What is the role of YACC in compiler implementation?
-YACC (Yet Another Compiler Compiler) is used to implement the syntax analysis phase. It generates a parser from a set of context-free grammar rules.
What is the significance of the Lance C compiler platform mentioned in the script?
-The Lance C compiler platform is a software platform that can be used to implement the entire front end of a C language compiler for embedded processors.
Outlines
Introduction to Compiler Phases
The instructor begins by welcoming students to a course on compiler design. The lecture aims to provide an overview of the various phases of a compiler using an illustration. The expected outcome includes understanding the different phases such as the preprocessor, compiler, assembler, and linker/loader. The process starts with the preprocessor which removes directives and embeds header files into the source code. The compiler then translates this high-level code into assembly language code. To save time, the instructor decides to focus on a single arithmetic expression to demonstrate how it goes through the compiler phases. The first phase discussed is lexical analysis, where the lexical analyzer processes lexemes to generate tokens. The tokens represent the meaning of lexemes, and the analyzer uses regular expressions to identify them. An example of a regular expression for identifiers is provided, and a finite automata diagram is used to illustrate the process of recognizing identifiers.
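The identifier automaton described above (states q0 through q3) might be simulated as below; accepting q1 and q2 in addition to q3, so that one-character names like "x" are recognized, is an added assumption:

```python
# From q0 a letter leads to q1 and an underscore to q2; from there any
# run of letters or digits ends in the accepting state q3. A leading
# digit is rejected immediately, matching the lecture's logic.

def accepts(name):
    state = "q0"
    for ch in name:
        if state == "q0":
            if ch.isalpha():
                state = "q1"
            elif ch == "_":
                state = "q2"
            else:
                return False          # names cannot start with a digit
        else:                         # q1, q2, or q3
            if ch.isalnum():
                state = "q3"
            else:
                return False
    return state in ("q1", "q2", "q3")
```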
Lexical Analysis and Syntax Analysis
The second paragraph delves into the syntax analysis phase, overseen by the syntax analyzer or parser. The parser uses context-free grammars to form a parse tree from the stream of tokens. The instructor illustrates the formation of the parse tree using production rules. The yield of the parse tree is the same as the input expression, indicating no syntax errors. The semantic analyzer then takes the parse tree and performs semantic analysis, checking for type correctness, array bounds, scope resolution, and other logical aspects of the code. The semantic analyzer ensures the meaningfulness of the parse tree and identifies any semantic errors.
Intermediate Code Generation and Optimization
The third paragraph covers the intermediate code generation phase, where the semantically verified parse tree is transformed into intermediate code, specifically three-address code (3AC). The precedence of operators in the parse tree is used to determine the order of operations. The intermediate code generator assigns results to temporary variables to maintain this order. The code optimizer then takes this intermediate code and optimizes it by reducing the number of statements, for example, by assigning the result of an addition directly to the final variable instead of using a temporary variable. This optimization shortens the code and potentially improves efficiency.
Target Code Generation and Compiler Tools
The final paragraph discusses the target code generation phase, where the optimized intermediate code is converted into assembly code. The instructor explains the assembly code segment line by line, detailing how each command moves and manipulates data in registers to perform the arithmetic operation. The assembly language code is a translation of the high-level arithmetic expression into machine-readable instructions. The instructor then discusses tools like lex and yacc for implementing the lexical and syntax analysis phases of a compiler. The lecture concludes with a mention of the LANCE C compiler platform for implementing the front end of a C language compiler and directs interested learners to a research paper for further study. The session ends with a preview of the next topic, the symbol table.
Keywords
Compiler
Phases of Compiler
Lexical Analysis
Tokens
Syntax Analysis
Parse Tree
Semantic Analysis
Intermediate Code
Code Optimization
Assembly Code
Lex
Yacc
Highlights
Introduction to the phases of a compiler with an illustration.
Overview of various phases of the compiler.
Introduction to tools for implementing different compiler phases.
Explanation of the role of the preprocessor in converting source code.
The function of the compiler in producing assembly language code.
Detailed walkthrough of an arithmetic expression through the compiler phases.
Lexical analysis phase explanation using lexical analyzer.
Explanation of lexemes and tokens in lexical analysis.
Regular expressions used by the lexical analyzer.
Finite automata illustration for identifier recognition.
Syntax analysis phase controlled by the parser.
Formation of the parse tree using context-free grammars.
Yield of the parse tree and syntax error detection.
Semantic analysis phase for type checking and scope resolution.
Intermediate code generation from the semantically verified parse tree.
Introduction to three-address code (3AC) as an intermediate code.
Code optimization phase for reducing code length.
Target code generation phase producing assembly code segment.
Explanation of assembly language code segment translation.
Tools for implementing compiler phases: Lex and Yacc.
Lance C compiler platform for implementing the front end of a C compiler.
Conclusion and anticipation for the next lecture on symbol tables.
Transcripts
hello everyone
welcome back to the course of compiler
design
as promised today we will observe the
different phases of the compiler with
the help of an illustration
so without any further ado let's get to
learning
now if we talk about the expected
outcome of this particular lecture
first we are going to have a brief
overview of the various phases of the
compiler
thereafter we will learn about the
various tools using which the different
phases can be implemented
well in the last lecture we have already
observed that in order to convert the
human readable source code into the
machine code
we need a language translator
and the language translator has got four
different phases
the preprocessor then the compiler the
assembler and finally the linker and
loader
now after going through the preprocessor
the high level language code gets
converted into the pure high level
language code
basically the preprocessor will embed
the required header files with the
source code omitting all the
preprocessor directives from that
now this pure high level language code
will be given to the compiler which in
turn will produce the equivalent
assembly language code
now this much we have already observed
in the previous session
so today we will take this pure high
level language code and observe how it
goes through the various phases of the
compiler now to be really honest
converting this entire piece of code
will be a little time consuming
so what we will do instead of the entire
code
let's consider this statement that is
the arithmetic expression
we will observe how this expression goes
through the compiler
and finally how this particular assembly
language code segment is produced
basically we will observe how this
expression is treated by all the six
phases of the compiler
now coming to the first one that is the
lexical analysis phase
here the entire process is taken care of
by the lexical analyzer
so here the expression is given to the
lexical analysis phase
and the lexical analyzer takes lexemes as
inputs
and generates the tokens
now lexemes are similar to words with
only one small difference that is
words individually have their own
meanings whereas a group of lexemes in
their entirety conveys the meaning
for instance this word analysis means a
detailed examination of anything complex
in order to understand its nature or to
determine its essential features
on the contrary this x individually
doesn't convey any meaning until we
consider the entire arithmetic
expression
now coming to tokens
these are actually the meanings of the
lexemes
so if we traverse this statement from
left to right
particularly in this statement
x is an identifier
then the equal symbol is an operator to
be precise it is the assignment operator
thereafter a is again an identifier
the plus symbol is an arithmetic
operator and so on
so this is the output of the lexical
analysis phase that is a stream of
tokens
and the job of the lexical analyzer is
to find out the meaning of every lexeme
it recognizes the tokens with the help
of regexes or regular expressions
exempli gratia this is the regular
expression for identifiers
where l stands for letter
d for digit and this special character
is the underscore
now allow me to illustrate this using
its equivalent finite automata
observe there are two regexes for
identifier
let's consider the first one
so from the initial state that is q0
seeing a letter we will go to the next
state q1
and from this one
seeing any number of letters or digits
we will end up at the final state that
is q3
coming to the next form
starting from the initial state
if we see an underscore we will move
towards the next state that is q2
thereafter
seeing any number of letters or digits we
will end up at the final state q3
many of you may know this that we cannot
have an identifier name starting with
digits
and this is the logic which rejects the
identifier names beginning with digits
so for examining the lexemes the
lexical analyzer makes use of the type 3
or regular grammars of the family of
grammars for formal languages
now let's move on to the next phase
here the syntax analyzer also known as
the parser is in control
the stream of tokens is passed to the
syntax analysis phase
the syntax analyzer depends on the type
2 or context-free grammars
for this particular expression
these are the cfg production rules that
the parser will use in order to form the
parse tree
let me illustrate the formation of the
parse tree
consider the first production rule
the start symbol s can be rewritten as
id equals e semicolon
it means an expression can be assigned
to an identifier
and the expression has to be followed by
the statement terminator that is the
semicolon so from s
we can derive
id
equals operator e and semicolon
coming to the next production rule
e can be rewritten as e plus t
that is expression can be expression
plus term
so from this e
we can derive e plus t
using the production rule
now e can also be rewritten as t
that is expression can also be a single
term
so using this production rule from e
we can derive t
coming to the next one
t can be rewritten as t into f
that means term can be a term multiplied
by a factor
so from this t
we will derive t into f
that is t
then the multiplication operator and
then f
additionally t can also be written as f
that is term can also be a single factor
so from t
we can derive f in these two instances
finally consider the last production
rule
f can be rewritten as id
that is factor is an identifier
and using this we can derive id from all
these f's
so this is the parse tree
now before observing the yield of the
parse tree let me tell you a few things
about this particular grammar here id
the equals operator semicolon the plus
and the multiplication operators are the
set of terminals
and the ones in the upper case that is s
e
t
f
these are the set of variables or
non-terminals
now in order to find out the yield of
the parse tree
we will have to traverse it top to
bottom left to right taking notes of
only the terminals
so let's begin
during traversal this id is the first
terminal that we encounter
remember
top to bottom left to right
so the next one is this equals operator
next this id
thereafter this plus operator
then this id
then this multiplication operator
after that this id
and finally this semicolon
therefore the yield of the parse tree
would be
id equals id plus id into id semicolon
and since the yield of the parse tree
and the expression are the same
the syntax analyzer will not produce any
errors
so in short taking the stream of tokens
the syntax analyzer analyzes them
following specific set of production
rules and produces the parse tree and if
the yield of the parse tree and the
provided stream of tokens are the same
then there is no error otherwise there
is some syntax error in the statement
now let's move on to the next phase
in this phase the semantic analyzer
takes the control
the parse tree produced by the syntax
analyzer is given to semantic analysis
phase and the semantic analyzer in turn
produces the semantically verified parse
tree
semantic analyzer is responsible for
type checking array bound checking and
the correctness of scope resolution
basically it does the logical analysis
of the parse tree
like in this parse tree these three
identifiers can be constants
whereas this particular identifier
because it is followed by this
assignment operator cannot be a constant
here the semantic analyzer will examine
whether the type of this identifier is
variable
semantic analyzer detects type mismatch
errors undeclared variables misuse of
reserved words multiple declaration of a
variable within a single scope accessing
an out of scope variable
mismatch between actual and formal
parameters etc
in simpler terms semantic analyzer looks
for the meaningfulness of the parse tree
and verifies that
now the next phase is handled by
intermediate code generator the
semantically verified parse tree is
given to the intermediate code
generation phase where the intermediate
code generator in turn produces the
intermediate code
so this was our parse tree
and this is the yield of the parse tree
which is similar to the pure high level
language expression
after being semantically verified
the intermediate code generator will
produce the intermediate code
here we are using the very popular
intermediate code that is the three
address code or 3ac
now if you observe this parse tree
carefully
the precedence of the operators are
visible here if we look at the tree from
bottom to upwards
so at the lowest level the
multiplication operator is there and
therefore this would be performed at
first
for this the intermediate code generator
will create this temporary variable say
t0 and assign the result of b into c to
it
coming to the next level the result of
this is being added with this identifier
which in the expression is a
therefore in the intermediate code the
result of a plus t0 is being assigned to
another temporary variable t1
finally in the last level the result of
the expression is being assigned to this
identifier which in the arithmetic
expression is the variable x so in the
intermediate code t1 is being assigned
to x
now this intermediate representation of
the code is called three address code
because if you observe all the
statements in here at most they have
three addresses in them
and by saying address we are mentioning
the addresses of these variables
now till this phase it is called the
front end because taking this
intermediate code
if we want to generate the target code
which is specific to the platform all we
have to do is modify the next two phases
that is the backend according to the
platform which we want to produce our
target code for
now let's move on to the next phase
shall we
so here the code optimizer is in control
basically the intermediate code is
provided to the code optimization phase
there the code optimizer in turn
generates the optimized code
now code optimization can either be
machine dependent or machine independent
we will observe them in details in due
time for now for the sake of
understanding let me illustrate the
optimization procedure so as you can
observe
our intermediate code that is the three
address code had three statements
in the first one t0 is being assigned
with the result of b into c
then in the second one t1 is being
assigned with the result of a plus t0
and finally in the third one
we're assigning t1 to x
now if you observe carefully the second
and the third statements
here
instead of t1 if we assign a plus t0
directly to x
we can reduce the length of the code
so the optimized code will have only two
statements
and in a nutshell
this is exactly what the code optimizer
does
it optimizes the intermediate code
next up is the target code generator
so the optimized code is given to the
target code generation phase which
happens to be the last phase of the
compiler
now taking this optimized code the
target code generator will finally
produce this assembly code segment
allow me to walk you through this code
segment
here in the first line we have mov that
is the mnemonic for moving
then we have eax
now eax is actually the extended version
of the ax register
which is a combination of ah and al
ah stores the higher order 8 bits of ax
and al stores the remaining 8 lower
order bits
so eax can actually store 32 bits it is
actually the accumulator register
next we have dword ptr that is the
double word pointer which is pointing at
the register base pointer rbp with an
offset of minus 8
basically this pointer is pointing to
the variable b
and using this entire command the value
held by variable b is being moved to the
eax register
now this imul is the mnemonic for signed
multiplication
dword ptr rbp minus 12 is pointing to
the value stored by the variable c
and using this command the value stored
in variable c is being multiplied with
the value stored in the register eax
and the result is also stored in the eax
register
basically by the end of these two
commands
we will have the result of b into c in
the accumulator
coming to the next command here the
contents of the eax register are being
moved to another register edx which also
is a 32-bit register
similar to eax edx extends the 16-bit dx
register whose higher order 8 bits are
stored in dh and the lower order 8 bits
will be stored in dl
now in the next command we are moving
the double word pointed to by register
base pointer with the offset minus 4
that is the content of the variable a
into the accumulator
thereafter in this command we are adding
the contents of the register eax and edx
and also storing the result in the eax
register that is the accumulator
finally we are moving the content of the
eax register to the address pointed by
the dword ptr rbp minus 16 which
is actually the address of the variable
x
so after execution of these commands we
will have the calculated value in the x
so this is the meaning of this assembly
language code segment
so this is how that arithmetic
expression of our pure high level
language code
finally gets translated into this
assembly language code segment after
going through all these phases
now let's observe the tools using which
we can practically implement various
phases of the compiler
so the compiler has six phases
in order to implement the lexical
analysis phase we can use the program
called lex
lex is the standard lexical analyzer
generator on many unix-based systems
it reads an input stream specifying the
lexical analyzer and writes the source
code which implements the lexical
analyzer for the c programming language
yacc which stands for yet another
compiler compiler is a lalr parser
generator we will study the lalr
parsers in chapter 4
anyway using yacc we can implement the
syntax analysis phase
lex is commonly used with yacc
now we already know that among these six
phases
the first four are collectively called
the front end
and the last two are known as the back
end
using the software platform lance c
compiler we can implement the entire
front end of a c language compiler for
an embedded processor
interested learners are advised to go
through the research paper lance a c
compiler platform for embedded
processors by dr rainer leupers of the
university of dortmund germany
the link to the paper has been provided
in the description of this lecture
all right so in this lecture we have
gone through the overview of the various
phases of compilers
also we have learned about the tools
using which we can implement different
phases
all right people that will be all for
this session in the next session we will
discuss about the symbol table so i hope
to see you in the next one thank you all
for watching