Tips and tricks for reading unfamiliar code
Summary
TLDR: The speaker discusses strategies for reading and understanding code, emphasizing the importance of being fearless and comfortable with confusion. They advocate for multiple 'passes' through code, akin to how one might read a book, to gradually build understanding. The talk touches on the psychological aspects of learning, such as the brain's pattern recognition and the value of familiarity over deep understanding. Technical advice includes having a strong foundation in data structures and algorithms, and using an algorithmic approach to exploring code. The speaker also discourages starting with 'Main' when reading code, suggesting instead to focus on the human-facing aspects of code organization.
Takeaways
- 😌 **Embrace Fearlessness**: To understand a codebase, especially large ones, one must be fearless and accept that it's often too large for a single person to fully grasp.
- 🧐 **Accept Confusion**: It's crucial to be comfortable with not understanding everything immediately and to move on without getting stuck on minor details.
- 🔁 **Multiple Passes**: Reading and understanding code often requires multiple passes, similar to how one might approach reading a complex book.
- 🧠 **Understand the Brain's Learning Process**: Recognize that the brain processes information in a probabilistic and pattern-matching way, akin to machine learning algorithms.
- 🎯 **Have a Clear Objective**: When exploring code, it's important to know what you're looking for, which helps focus your attention on relevant parts of the codebase.
- 🤝 **Familiarity Over Understanding**: Sometimes, being familiar with code is more practical than striving for a deep understanding, especially when time is limited.
- 💻 **Technical Proficiency**: Having a solid foundation in data structures, algorithms, and programming languages is essential for effectively reading and understanding code.
- 📚 **Balanced Learning**: Balance reading code with writing code and studying related materials to reinforce learning and understanding.
- 🔍 **Algorithmic Exploration**: Develop a systematic approach to exploring code, such as iterative deepening, to efficiently find the most relevant parts of a codebase.
- 🛠️ **Focus on Human-Facing Artifacts**: Prioritize understanding the organization and structure of the code as presented to humans, rather than getting lost in machine execution details.
Q & A
What is the main challenge when trying to understand a large codebase?
- The main challenge is that most codebases of interest are too large to be fully understood by a single person, as they are often worked on by multiple teams and broken down into specialized components.
Why is it important to be fearless when reading code?
- Being fearless is important because it allows you to approach complex codebases without being intimidated by their size or complexity, and to engage with the code without fear of not understanding every part.
What does the speaker suggest about being comfortable with confusion when reading code?
- The speaker suggests that being comfortable with confusion is essential because you won't understand everything at first glance, and accepting this fact allows you to move forward and learn over time.
How does the concept of 'doing several passes' help in understanding code?
- Doing several passes over the code allows you to gradually build understanding, as each pass may reveal new insights or clarify previous confusions, much like how repeated exposure helps in learning and pattern recognition.
What role does the brain's pattern recognition play in reading code?
- The brain's pattern recognition is crucial in reading code because it helps in identifying common structures and algorithms across different codebases, much like how machine learning algorithms learn from data.
Why is it beneficial to have a clear idea of what you are looking for in a code repository?
- Having a clear idea of what you are looking for helps focus your attention on relevant parts of the code, making the process more efficient and targeted, rather than getting lost in irrelevant details.
How does familiarity differ from understanding in the context of learning to code?
- Familiarity refers to becoming comfortable and accustomed to certain coding patterns and structures, which is often more achievable and practical than deep understanding, especially when dealing with complex topics or large codebases.
What background knowledge is necessary to effectively read and understand code?
- Background knowledge such as data structures, algorithms, and programming language fundamentals is necessary to recognize patterns and understand the logic within the code.
What is the speaker's approach to exploring code within a repository?
- The speaker's approach involves an algorithmic mindset, starting with identifying the root folder, then progressively exploring subfolders, and focusing on the parts of the code that match their interests or goals.
Why does the speaker recommend not starting with the 'Main' function when exploring new code?
- Starting with the 'Main' function is often not useful because it typically involves command line parsing and branching that leads in multiple directions, which can be complex and not representative of the core logic or structure of the program.
What is the importance of being honest with oneself about what is understood and what is not when reading code?
- Being honest with oneself is crucial to avoid self-deception and to ensure that the learning process is genuine and effective, as it helps in identifying areas that require further attention or clarification.
Outlines
😎 Mental Game for Reading Code
The speaker begins by addressing the complexity of reading code and how it relates to their views on education and learning. They emphasize the importance of being fearless and comfortable with confusion when trying to understand a codebase. The analogy of having a conversation with a computer is used to illustrate the process of understanding code. The speaker also discusses the idea of making multiple passes through the code to gradually build understanding, much like how one might approach reading a book. The mental game involves accepting confusion and the fact that complete understanding of a large codebase is often unattainable and not necessarily beneficial.
🧠 Psychological Insights on Code Comprehension
This paragraph delves into the psychological aspects of learning and understanding code. The speaker suggests that the brain operates similarly to a machine learning algorithm, processing information probabilistically and recognizing patterns. They recommend approaching code as one would approach training an AI, focusing on recognizing patterns across multiple repositories rather than trying to understand every detail of a single one. The speaker also stresses the importance of knowing what you are looking for in a codebase, using the differences between repositories as a guide to focus your attention. Lastly, they touch on the idea that familiarity can be more valuable than a false sense of understanding, advocating for practical engagement with code over theoretical comprehension.
💻 Technical Foundations for Navigating Code
The speaker discusses the technical knowledge required to effectively read and understand code. They highlight the necessity of having a background in data structures, algorithms, and programming languages. The paragraph emphasizes the importance of practical programming experience and suggests a balanced approach to learning that includes reading code, writing code, and studying related information. The speaker also recommends having an algorithmic approach to exploring code, starting with a broad overview and gradually delving deeper into specific areas of interest. They advise against starting with the 'Main' function, as it often leads to complex branching and initialization routines that can obscure the core logic of the program.
🔍 Practical Strategies for Code Exploration
In this paragraph, the speaker focuses on practical strategies for exploring code. They advocate for an approach resembling iterative deepening depth-first search to navigate through the file system and understand the organization of the code. The speaker suggests starting with the root folder and progressively examining subfolders to identify areas of interest. They also discuss the importance of being honest with oneself about what is understood and what is not, warning against the pitfall of overestimating one's comprehension. The paragraph concludes with a reminder to focus on the human-facing aspects of code, such as the file system and organization, rather than the machine-facing aspects like execution traces.
🕵️‍♂️ The Value of Honesty in Code Understanding
The final paragraph reinforces the idea of being honest with oneself about the level of understanding when reading code. The speaker relates this to the earlier point about being comfortable with confusion, suggesting that self-deception can hinder true learning. They emphasize the importance of recognizing one's limitations and the emotional barriers that can prevent a clear understanding of complex topics. The speaker concludes by acknowledging that while they have covered many points, there is always more to learn and that the process of reading and understanding code is an ongoing journey.
Keywords
💡Fearlessness
💡Mental Game
💡Pattern Recognition
💡Iterative Learning
💡Familiarity vs. Understanding
💡Background Knowledge
💡Code Exploration Algorithm
💡Human-facing vs. Machine-facing Artifacts
💡Honesty with Self
💡Ego and Learning
Highlights
The necessity of being fearless when attempting to understand a large codebase.
The analogy of understanding a codebase to having a conversation with a computer about its behavior.
The importance of being comfortable with confusion when reading code.
The strategy of making multiple passes through code to gradually improve understanding.
The brain's processing likened to a machine learning algorithm, focusing on pattern recognition.
The concept of focusing on what is different between codebases to understand unique aspects.
The psychological insight that familiarity often matters more than deep understanding.
The advice to have a background in data structures and algorithms to effectively read code.
The recommendation to practice reading, writing, and learning about programming languages simultaneously.
The approach of using an algorithmic mindset, similar to iterative deepening depth-first search, to explore code.
The suggestion that starting at the 'Main' function is often not the most productive way to understand a program.
The preference for focusing on human-facing artifacts of code over machine-facing artifacts.
The importance of being honest with oneself about what is understood and what is not, to avoid self-deception.
The idea that the emotional response to understanding complex topics can sometimes hinder true comprehension.
The emphasis on the practicality of code understanding over theoretical knowledge.
Transcripts
so I've gotten a lot of questions about
you know how do I read code as in how do I
personally read code but also like what
can people
do to get better at reading code or
understanding code or writing code and
uh initially I had started writing out a
a whole script uh for for this and I
started kind of spiraling out of control
because the reality is that the the way
that I read code is pretty um dependent
on and embedded in the my ideas about
how Education Works how Learning Works
in
general and that's something that we are
going to focus on in this
channel uh soon now that we're leaving
this old regime but we haven't really
talked about it yet and I didn't really
want to put the the cart before the
horse and it was getting into a
situation the the script was getting
into a situation where um I felt like I
was perhaps raising more questions than
I was answering so what I want to do
here is just give a few like tips and
tricks about here's how I think about
reading code and um they are the
following so I guess we might as well do
this as an org buffer
uh
so here's how I'll set it out so
psychological um
uh computer science these will be just
like little headings just to hang things
on
um
and is there a last one I don't
know we'll start here maybe this will be
technical so uh how about mental game
mental game will be first so the first
mental game point is uh
you have to be you have to be relatively
Fearless uh if you want to do something
like um trying to understand a codebase
is kind of a foolish thing to do most
code bases that are of Interest are too
large to uh to understand even for one
person if you take something uh that
like you know a large company like
Google put out puts out or Meta or
Twitter or Netflix often times those big
projects are going to be worked on by
several teams and if you're really lucky
a small number of the team leaders on
those teams will understand like One
Directory um that's maybe exaggerating a
little bit but
um really things are broken down and
specialized and you don't really have
someone whose job it is to understand
the whole code base because it's not
clear that uh even if it were possible that
there would be a whole lot of a whole
lot to gain from that instead what the
codebase does is you've written down
into the
computer uh how the how the machine will
act and so that's really the record of
the code base and then in terms of
understanding it your job is to sort
of
um figure be able to have a conversation
about with the computer in some sense
about about how you want it to act so
it's maybe a little bit like uh
I have kids uh you may not have kids but
it's a little bit like having a kid it's
not really my job to understand
everything that goes on in my kid's
brain because literally nobody does but
um it is my job to like if my kid is
having a hard time be able to have a
conversation with them and help them
have a better time and it's kind of the
same with code and in order to do that
and you have to be uh you have to be
also willing to be confused
and
not these are related right so you have
to be willing to like as you're looking
at something uh be comfortable with the
fact that you don't understand what's
going on and you see that a lot in this
channel um and you know very often I'll
just say I don't know what's going on I
don't know what this variable is for I
don't know what this piece of syntax is
and I I typically just move on rather
than trying to drill down and that's a
that's a strategy I think like earlier
in my life it would have really bothered
me that I didn't understand something
because I was very good at understanding
things and so I learned over time that
it's vastly better to do a pass where
you're accepting of the fact that there
are things that you don't understand
even if you know they are silly and
you've made some sort of mistake and you
would understand it if you had not made
some simple mistake or whatever um and
then
just being okay with that uh is really
powerful and then uh I guess relatedly
is like um think in terms
of of doing
um
several
passes is how I'll put it there's a good
book here
um I uh I think Mortimer J Adler is the guy
uh this is not like a like life-changing
book but I think it's a book that's
worth maybe getting and looking at and
the idea uh the idea of How to Read a
Book is just like it's a you know um a
lot of kids like go to go to college and
they
don't have a relationship with uh with
literature the way that you know from
High School the way that you do in
college and uh when you're faced with
like a huge library of
literature uh on on your field like you
may be an engineer there's an
engineering literature you may be a
philosopher there's a philosophy
literature you have to have some
strategy for for dealing with all of
that and the strategy can't really be
like depth first search where you're
going to like pick up uh Nietzsche and
understand everything that's in it
instead what you want to do is you want
to read books in passes you want to
make a pass through each individual book
you probably also want to make a pass
through several books at the same time
so maybe you want to pick up the Nietzsche book
also pick up a Plato book pick up a
book on Classics of philosophy and kind
of work through them sort of
simultaneously and so that's a little
bit of how I've I've done things in this
um in this stream it's been rare
recently that I've Revisited a
repository many times and that's really
an artifact of uh of my trying to get
through the
backlog uh but if it were me left to my
own devices I would revisit the same
repository that I was trying to
understand multiple times doing multiple
passes and that's useful because when I
don't understand something uh almost
always I find I eventually understand it
with little effort just by doing more
passes I may not understand it in the next
pass it may take a couple of passes um
but that in my experience is like the
least effort approach to getting the
most uh the most out of um the most out
of what you're doing so that's that's
the mental game we'll uh hide that for
now so we can focus here on the on kind
of psychological
points and the first psychological point
is
um maybe have a sense
of have a sense of how the brain
processes in
in my opinion I think this is now pretty
mainstream the the brain is more or less
like a machine learning algorithm rather
than a traditional computer so the brain
takes like a probabilistic approach it
does a lot of pattern matching and it
does a lot of that almost all of it I I
would say uh kind of behind the scenes
in a way that you're not really aware
of so what that means is you can kind of
take it imagine that your brain
literally is a machine learning
algorithm look at how they train
algorithms and take that sort of
approach so when you're training um an
AI algorithm to let's say understand
text uh at the end of the training it's
able to quote unquote understand text in
this in the sense that you can ask it a
question and it will spit out a response
but kind of in the beginning of the
training or in the middle of the
training it doesn't really understand
anything it's like updated getting a
bunch of weights and if you tried to get
information out of it it would be
terrible and garbled and so what that
what that implies is that uh this
algorithm is iterating over a bunch of
documents and it doesn't really
understand any one document instead what
it's doing is kind of like pattern Rec
pattern recognition across a corpus of
documents and that's the that's the same
sort of approach that I'm taking here is
I've looked at a bunch of repositories
I'm not really trying to understand any
one repository instead I'm trying to see
what things have in common what things
stick out and what are different and and
that sort of
thing um I guess relatedly is have a
sense of what you're looking
for in a in a repo so if you have
uh if you've gotten to the point where
you've looked at a bunch of different
pieces of code then it'll be clear to
you as I just said what's in common but
it would also be clearer to you uh
what's different and it's really the
differences you can imagine like
literally running a diff algorithm like
a conceptual diff algorithm um between
two repositories and those diffs are the
sorts of things that you want to focus
on and so for example when I was looking
at databases I wanted to know the
database type thing like how are their
how are their file systems different
from other repositories a big part of
databases is that they're fast to
search so I wanted to know how the
querying um algorithms worked and in
particular we'd already seen a lot of
like
parsing and in compiler type stuff so I
was interested more in the query
planning rather than the query like uh
parsing and and mapping onto specific
functions or
whatever uh so having that sense of what
you're looking for is will help Focus
the mind because especially if you have
a a bounded amount of time
it'll it'll help you figure out whether
something's worth looking at or not and
similarly if you were like
traveling if you were like going to
Paris you would probably want to go to
the parts of if as a tourist you would
want to go to the the parts of Paris
that are different from all of the other
places you wouldn't like go to Paris and
like spend the whole time eating at
McDonald's or
whatever
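The "conceptual diff" between repositories described above can be approximated very crudely in code. What follows is only a sketch, not a tool the speaker uses: it assumes each repository can be summarized by the set of its top-level directory names, and the repository names and layouts are invented for illustration.

```python
from collections import Counter


def common_and_distinctive(layouts):
    """Split directory names into those shared by every repo and those unique to one.

    `layouts` maps a (hypothetical) repository name to the set of its
    top-level directory names.
    """
    # Each name is counted once per repository because the values are sets.
    counts = Counter(name for dirs in layouts.values() for name in dirs)
    total = len(layouts)
    common = {name for name, n in counts.items() if n == total}
    distinctive = {
        repo: {name for name in dirs if counts[name] == 1}
        for repo, dirs in layouts.items()
    }
    return common, distinctive


# Invented layouts, for illustration only.
layouts = {
    "database_a": {"src", "docs", "tests", "planner", "storage"},
    "database_b": {"src", "docs", "tests", "optimizer", "wal"},
    "compiler_c": {"src", "docs", "tests", "parser", "codegen"},
}

common, distinctive = common_and_distinctive(layouts)
print(sorted(common))
for repo in sorted(distinctive):
    print(repo, sorted(distinctive[repo]))
```

The shared names are the McDonald's of the trip; the names that appear in only one repository are the candidates worth spending a bounded amount of time on.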
um the last maybe psychological Point uh
I want to make is like familiarity
in practice I'll literally write this
down in practice is often more important than
understanding so and I'll put these in
in quotes in scare quotes
because I don't think these are super
well-defined um terms
psychologically the way that I think
about it is that familiarity and
understanding are kind of like
feelings and what the psychological
literature says in my reading of it is
that often when people think that they
understand something they don't actually
understand it they overestimate how much
they understand especially about
technical
topics
uh and so understanding is kind of this
emotion that's that's in my opinion
really kind of about comfortableness
with the topic uh and it can often be
illusory on the other hand in practice
familiarity seems to be really important
um there's a famous quote from uh John
von Neumann that in mathematics you never
understand things you you just get
familiar with them uh that's not word
for word the quote but that's the basic
idea and I think that's correct like von
Neumann was one of the one of the best
mathematicians in the last however many
years arguably one of the smartest
people in history and and I think he's
correct in the sense that like you you
sometimes see people trying to chase
understanding or like
intuition about a
topic um and it's often not clear that
that pays off that often seems like
they're in retrospect they're they are
wasting their time it would be uh more
time efficient to just accept the accept
the things that you know about them
become familiar with them play around
with them explore the tools that you
have but not really
try to pay too much attention to whether
you feel internally like you've grokked
something and I think that's a I think
that's an important
shortcut
um yeah okay and then in terms of
technical stuff this is now kind of
getting to the computer
science uh computer sciencey area one is
um so I'm coming to this having already
uh programmed professionally having
studied uh some computer science but
having studied lots of other technical
things solved you know many many math
problems uh solved many fewer but still
a fair number of programming problems
and so there's just some level of
background knowledge that you'll need in
order to understand uh to understand
like a something as big as a as a
repository and so I'll say like you
should know data structures
algorithms how much do you need to know
them I don't know but you should be
familiar with them you should have
probably have
implemented things like depth first
search breadth first search uh a few
times under
different uh under in under different
conditions exploring different problems
um and you should obviously know the
basics like for loops and while loops
and all all of that stuff and if you
struggle there then you're going to
struggle reading other other people's
codes because you just won't have the P
you won't have those patterns very well
in your system in order to recognize
them in others so this is just like
background knowledge I
guess um you should know some
programming
language and have a programming practice
I guess
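For reference, the depth-first and breadth-first searches mentioned above as things worth having implemented a few times look roughly like this. This is a minimal sketch on a small made-up graph stored as an adjacency dictionary; the graph and the function names are illustrative assumptions, not something taken from the talk.

```python
from collections import deque


def dfs(graph, start):
    """Return nodes in depth-first order (iterative, using an explicit stack)."""
    visited, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)
        # Push neighbours in reverse so the first-listed neighbour is visited first.
        stack.extend(reversed(graph.get(node, [])))
    return order


def bfs(graph, start):
    """Return nodes in breadth-first order (using a queue)."""
    visited, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order


# A tiny invented graph: "a" points to "b" and "c", both of which point to "d".
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(dfs(graph, "a"))  # ['a', 'b', 'd', 'c']
print(bfs(graph, "a"))  # ['a', 'b', 'c', 'd']
```

The only structural difference is the container: a stack goes deep first, a queue goes wide first. Having typed that out a few times is what makes the pattern easy to spot in someone else's code.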
so this is like um for those of you who
are learning how to code uh and want to
read in order to learn how to code this
maybe kind of like a chicken and egg
problem but the way that I would do it
personally if I were starting out is I
would um have a practice of reading code
have a practice of writing code and also
have a practice of reading information
about the the topic and programming
language I was I was working on this
could be that could be books this could
be
uh like websites documentation it could
be videos um and kind of go back and
forth between those three things like
split your time because those are sort
of M mutually reinforcing uh habits and
uh what you want to really have is the
Rhythm between those
things in order to make the the fastest
progress I
guess
um and then uh
I guess the last technical point I will
have is that
um how maybe have an
algorithm in mind for exploring the
code I I typically do something that's
maybe a little bit like uh iterative
deepening depth-first search uh what
you'll see that I do in almost all cases
is I the first thing I want to know is
like
uh which which
file so often there's like one folder
that has all of the goodies and it's
like the source folder or something it
it depends on the language but like
first of all finding out the right root
folder but then inside that root folder
looking at all of the folders in there
that's like depth equals 1 and figuring
out okay which of these looks like it's
probably going to have the sort of sort
of information I'm looking for
so here or have a sense of what you're
looking for if I have a sense of what
I'm looking for like I know I'm looking
for the uh I know I'm looking for the
parsing routine or whatever then the
that should hopefully be evident where
that is from the the first level of
folders if not or if it's a project and
you don't really know where the most
interesting code or routines is then you
kind of have to make some guesses and so
you like peek deeper into the folders
and and try to do the same thing and you
kind of go down bit by bit and so that's
sort of how I think of that's how sort
of how I think of things
um and we'll see that I think a little
bit in the um in the Blender build
system section
video
uh and then the last the real last I
said that was the last but this is the
real last
um this is there's a point that comes up
repeatedly I'm not quite sure how to put
it in writing but we'll talk about it
and then I'll figure a way to summarize
it um when I first started people were
asking me about like why not start at
Main and kind of like Trace through the
execution and uh after people started
suggesting it I did start looking more
at the main file more often
and almost all the time the main file is
not an interesting place to start
because what happens is like uh Main
it's going to kick off some like often
command line parsing routine and then
it's going to branch in a bunch of
different directions depending on the
the command line flags and then you have
to track all that branching and then
there's going to be like some loading
and initialization routines and that
probably has a bunch of different
branching because that's also looking
for things like what platform are you on
etc etc and so um often like going into
Main and tracking execution in that way
is not super useful and also like if
you're tracking uh if you're tracking in
like a debugger you kind of have to
figure out what level you want to debug
because if you like start uh by setting
some break
point and
um the if you just stay at the same
level it's equivalent to just reading
the code it's going to execute line by
line if you step into and out of
functions then you're going to like
quickly get down uh deeper and deeper
into uh kind of things might be
implementation details compared to what
you actually want to
know and so what I have found is that
the tracing the execution is is really
for the computer that's how the computer
thinks but the way that code is
organized for humans is via the file
system that's how uh organizations the
the the files layout typically reflects
how organizations have devoted human
resources to writing the code it's uh
it's the the files are there for humans
to put their thoughts into and then the
the it's
the once it's compiled and executed that
that's for the machine so my personal
preference and you can differ here but
my personal preference is focus on the
files the focus on the human facing
artifacts rather than the machine facing
artifacts is how I'll put that
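To make the earlier point about main concrete, here is a made-up entry point of the kind described: command-line parsing, then branching on flags, then platform-dependent initialization. The program name `tool`, its subcommands, and the return values are all hypothetical; the sketch only illustrates how much of a typical main is plumbing rather than core logic.

```python
import argparse
import sys


def build_parser():
    parser = argparse.ArgumentParser(prog="tool")  # "tool" is a made-up program name
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("build")
    run = sub.add_parser("run")
    run.add_argument("--verbose", action="store_true")
    return parser


def initialize():
    """Platform-dependent setup: yet another branch before any real work happens."""
    return "windows-setup" if sys.platform.startswith("win") else "posix-setup"


def main(argv):
    args = build_parser().parse_args(argv)
    setup = initialize()
    # Each flag fans out into a different part of the codebase.
    if args.command == "build":
        return ("build", setup)
    if args.command == "run" and args.verbose:
        return ("run-verbose", setup)
    return ("run", setup)


print(main(["build"]))
print(main(["run", "--verbose"]))
```

Even in this toy, tracing from main means following three command paths times two platforms before reaching anything interesting, whereas the folder layout of a real project usually says directly where the parser or the planner lives.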
now you do sometimes need to look at the
machine facing artifact so for example
if you want to debug a program that's
gone Haywire or if you want to
understand the uh like assembly code or
if you want to disassemble some program
you do have to dig into the weeds that
again goes back to knowing what you want
to look for um and so if that is what
you want to look for then uh like
stepping through with a debugger Etc
that's a that's a really good way to
look at things and I do I do enjoy
debuggers I use them uh for tasks that
are different from reading code but
um but they're great they're life-saving
tools uh I had one more thought I think
we check my notes Here
yeah I guess this maybe goes into
psychological or maybe I'll call it
mental
game uh and that is
um this is probably this is closely
related to be willing to be confused and
that is be honest about what you
understand so if you're going to be
looking through something and are will
are willing to to be confused then that
only works if you're honest with
yourself about what you understand and
and what you're confused about um the
danger of not going through something
carefully and trying to understand it at
every step of the way is that you might
fool yourself into thinking you've
understood something when you haven't
and that might take practice I don't
know that that might just take some
humility just but be aware that uh you
have have limitations some of these Li
some of these limitations are
informational like uh the you can think
about different ways of trying to
understand something some of those will
have like exponential running time some
might have polynomial running time and
it's kind of your uh your choice about
which of those algorithms to
run but um in addition to being having
like this information constraints like
the sort of running time intrinsic
constraints there's also um constraints
about like your ego might be involved in
being able to solve some problem or
being able to understand something
complicated that your friends are having
trouble with or being able to prove
yourself
or uh you know all of that stuff kind of
goes into any topic that's sufficiently
complex or wanting to to dig your way
out of poverty or all of that stuff so
um you just have to kind of be aware of
when
you're
uh you have to be aware of kind of when
those like emotional responses are are
making you think that you've mastered
something that maybe you haven't
mastered um and I guess maybe that's the
last point I'm sure I'll think of uh of
more stuff but for now we'll leave it
here and I'm going to move on
to the Blender build system