How to write code with few bugs?

Fredrik Christenson
6 Jul 2024 · 13:15

Summary

TL;DR: This video script discusses strategies for writing code with minimal bugs, emphasizing the importance of understanding automated testing, including unit, integration, and visual regression testing. It highlights the significance of the testing pyramid and the need for a comprehensive approach to software delivery, from development to production. The speaker shares real-world examples to illustrate common pitfalls and suggests using feature flagging and blue-green deployment to mitigate risks. The script concludes by advocating for simplicity in code to reduce complexity and potential bugs.

Takeaways

  • **Understand Automated Testing**: The importance of automated testing, including unit, integration, and visual regression testing, is emphasized to ensure each layer of the application works as expected.
  • **Learn the Testing Pyramid**: Familiarize yourself with the testing pyramid to understand the different levels of testing and how they are applied in your specific tech stack.
  • **Know Your Stack's Tools**: Gain knowledge of the tools commonly used in your tech stack for effective bug detection and management at various levels.
  • **Bugs are Unintended Behaviors**: Recognize that bugs are simply unintended behaviors that can affect user experience, and understanding this can help in identifying potential failure points.
  • **Environment Synchronization**: Ensure that updates across different environments are synchronized to prevent inconsistencies that can lead to production issues.
  • **Prevent Temporary Outages**: Utilize strategies like feature flagging and blue-green deployment to avoid temporary outages during updates.
  • **Continuous Learning**: Acquire a full-stack understanding to effectively manage the software delivery process and mitigate potential production issues.
  • **Implement Fail-Safes**: Apply necessary safeguards such as health checks and redundancies to handle potential failures in production.
  • **Rollbacks and Redundancies**: Prepare for the possibility of bugs by having rollback procedures and redundancy systems in place.
  • **Simplicity in Code**: Write simple and stable code to reduce complexity and the likelihood of bugs, especially in the delivery process.
  • **Consider Global Issues**: Be aware of potential issues like character encoding that can affect users globally and ensure your application is prepared for such scenarios.

Q & A

  • What is the primary focus of the discussion in the video script?

    -The primary focus of the discussion is on the best approaches to write code with very few bugs, emphasizing the importance of understanding automated testing and the testing pyramid.

  • Why is automated testing crucial in reducing bugs in software development?

    -Automated testing is crucial because it helps to ensure that each layer of the application is working as expected, thus preventing unintended behavior that affects the user experience.

  • What is the testing pyramid and why is it important?

    -The testing pyramid is a concept that illustrates the different levels of testing in software development, from unit tests at the base to UI tests and end-to-end tests at the top. It's important because it helps developers understand the general tools and patterns used to deal with bugs at different levels.

  • What is an example of a production issue mentioned in the script?

    -An example of a production issue mentioned is when a developer updated the base URL of an application without checking the change with the operations team. The load balancer that forwards requests was not updated, so routing failed and the whole application broke in production.

  • Why did the QA environment not help in the scenario described with the developer updating the base URL?

    -A QA environment would likely have revealed the breakage and the need to update the routing, but it would not have shown how to make that routing change in production at the same moment as the deploy. With no way to synchronize the two, there would still be a window in which either the routing or the application was out of date.

  • What is feature flagging and how can it help in reducing bugs?

    -Feature flagging is a technique that allows developers to enable or disable features without deploying new code. It reduces risk by separating the release of a feature from the deployment of the code. The speaker pairs it with a blue-green style of deployment, where a new version is brought up alongside the old one, traffic is moved to it, and only then is the old version retired.

  • What is the significance of understanding the entire delivery process from code writing to production?

    -Understanding the entire delivery process is significant because it allows developers to identify potential points of failure and implement necessary fail-safes, thus reducing the likelihood of bugs in production.

  • What is the importance of writing simple code to minimize bugs?

    -Writing simple code is important because it reduces unnecessary complexity in the delivery process, making it easier to manage and less prone to introducing bugs.

  • Can you provide an example of a common mistake that leads to bugs, as mentioned in the script?

    -An example of a common mistake is when developers forget to consider the execution environment or logical errors, such as not properly importing code libraries into a Docker container, leading to a broken container build.

  • What strategies are suggested for dealing with bugs in production?

    -Strategies suggested include using feature flagging, implementing redundancies like multiple availability zones, running end-to-end tests in production, and ensuring that there are checks and health monitors for running code.

  • Why is it recommended to have more than one availability zone when dealing with cloud providers?

    -Having more than one availability zone is recommended to provide redundancy and increase the reliability of the application, ensuring that if one zone goes down, the application can still function in another zone.

Outlines

00:00

Best Practices for Writing Bug-Free Code

The speaker emphasizes the importance of automated testing as a fundamental approach to minimize bugs in software development. They introduce the concept of the testing pyramid and suggest understanding the tools and patterns specific to one's tech stack. The narrative includes a real-world example where a lack of communication between development and operations teams led to a production failure due to an untested environment. The speaker advocates for feature flagging and blue-green deployment strategies to ensure backward compatibility and prevent downtime during updates.

05:00

Understanding the Delivery Pipeline to Reduce Bugs

This paragraph delves into the complexities of the software delivery process and the potential for bugs at various stages. The speaker recounts an incident where a developer's update caused a Docker container to fail due to incorrect library paths. The importance of understanding the entire code journey from development to production is highlighted, along with the necessity of implementing fail-safes like automated testing, health checks, and feature flagging. The speaker also stresses the value of writing simple code to reduce unnecessary complexity and the potential for bugs.
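One inexpensive safeguard against the container incident described above is a smoke test that runs inside the built image and checks that every library the code depends on can actually be found. The sketch below is an assumption about how such a check might look, not what the speaker's team ran; the module names passed in are placeholders for your own dependencies.

```python
import importlib.util
from typing import Iterable, List


def is_importable(name: str) -> bool:
    """True if the module can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is missing.
        return False


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the subset of module names that cannot be located."""
    return [name for name in names if not is_importable(name)]
```

Running this as a step in the image build, and failing the build when the returned list is non-empty, would surface a path problem before the container ever reaches production.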

10:01

Comprehensive Knowledge for Minimizing Software Bugs

The final paragraph underscores the need for developers to be well-informed about the full spectrum of tools and strategies available to prevent bugs. It discusses the importance of continuous learning in areas such as site reliability engineering. The speaker provides advice on creating a robust system for checking and rectifying issues at each step of the software delivery process. They also mention the significance of keeping the code simple and having strategies in place for production environments, such as end-to-end testing and feature toggling, to ensure software reliability.
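The idea of a gateway system, with a check at each delivery step and a way to rectify a failure, might be sketched as follows. The gate names and the `rollback` callback are illustrative assumptions; real pipelines are usually expressed in a CI tool's configuration rather than in application code.

```python
from typing import Callable, List, Tuple

Gate = Tuple[str, Callable[[], bool]]


def deliver(gates: List[Gate], rollback: Callable[[], None]) -> Tuple[bool, List[str]]:
    """Run gates in order; on the first failure, roll back and stop."""
    log: List[str] = []
    for name, check in gates:
        if check():
            log.append(f"ok: {name}")
        else:
            log.append(f"failed: {name}")
            rollback()
            log.append("rolled back")
            return False, log
    return True, log
```

Later gates never run once an earlier one fails, so a broken container build cannot reach the end-to-end or production stages.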

Keywords

Automated Testing

Automated Testing refers to the use of software to execute tests on a program or system to identify any errors or bugs. In the video, it is emphasized as a fundamental practice in reducing bugs, including unit testing and integration testing. The script mentions the importance of understanding the testing pyramid and using appropriate tools for different levels of testing to ensure each layer of the application is functioning as expected.

Testing Pyramid

The Testing Pyramid is a concept in software testing that illustrates the different levels of tests and their recommended proportions. It typically consists of a large base of unit tests, a middle layer of integration tests, and a smaller top layer for UI tests. The video script discusses the pyramid to highlight the importance of having a balanced approach to testing at various levels to catch bugs early in the development process.

Unit Testing

Unit Testing is a method where individual components or units of a software are tested to determine if they are fit for use. The script mentions unit testing as the first step in the testing process, ensuring that the code written does not have logical errors and functions correctly in isolation.
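As a minimal illustration, here is a unit test written with Python's standard `unittest` module. The `join_url` function is a made-up example chosen to echo the base-URL story from the video; it is not code from the speaker's project.

```python
import unittest


def join_url(base: str, path: str) -> str:
    """Join a base URL and a path with exactly one slash between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")


class JoinUrlTest(unittest.TestCase):
    def test_removes_duplicate_slash(self) -> None:
        self.assertEqual(
            join_url("https://example.com/app/", "/users"),
            "https://example.com/app/users",
        )

    def test_adds_missing_slash(self) -> None:
        self.assertEqual(
            join_url("https://example.com/app", "users"),
            "https://example.com/app/users",
        )
```

Saved in a file, the tests can be run with `python -m unittest`. Note what they do not cover: whether a load balancer in front of the application knows about the base URL, which is exactly the gap the video's first example fell into.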

Integration Testing

Integration Testing is the phase where individual units are combined and tested as a group to evaluate their interactions. The video script uses this term to explain the importance of ensuring that when components work together, they perform as expected, which is crucial for catching bugs that may not be evident in unit tests.
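A small sketch of the idea, using an in-memory SQLite database from the standard library so that the code and a real database engine are exercised together. The table layout and function names are assumptions for illustration only.

```python
import sqlite3
from typing import Optional


def make_test_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with the schema under test."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    return conn


def save_user(conn: sqlite3.Connection, name: str) -> int:
    """Insert a user and return the generated row id."""
    cursor = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()
    return cursor.lastrowid


def load_user(conn: sqlite3.Connection, user_id: int) -> Optional[str]:
    """Return the stored name, or None if the id does not exist."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None
```

An integration test would call `save_user` and then `load_user` against the same connection and compare the result. Using a name with non-ASCII characters in that test also checks that text survives the trip through the database.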

Visual Regression Testing

Visual Regression Testing involves comparing the visual representation of a user interface over time to detect unintended changes. The script briefly touches on this as a type of testing that can be important, especially for UI changes, to ensure that updates do not negatively impact the user experience.
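The core of the technique is a pixel comparison between a stored baseline and a fresh screenshot. The sketch below works on plain nested lists of RGB tuples so it has no dependencies; real tools operate on image files and add tolerances for anti-aliasing, so treat this as an illustration of the principle rather than a usable tool.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def diff_ratio(baseline: Image, candidate: Image) -> float:
    """Fraction of pixels that differ; 1.0 if the dimensions do not match."""
    same_shape = len(baseline) == len(candidate) and all(
        len(a) == len(b) for a, b in zip(baseline, candidate)
    )
    if not same_shape:
        return 1.0
    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0
    changed = sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(baseline, candidate)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )
    return changed / total
```

A test would fail the build when `diff_ratio` exceeds a threshold the team has agreed on.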

Feature Flagging

Feature Flagging is a technique used to toggle features on or off in a software application without deploying new code. The video script discusses feature flagging as a strategy to safely deploy new features, allowing developers to test and roll out changes gradually and control the feature release process.
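A minimal sketch of a flag read from the environment. The `FEATURE_` prefix, the flag name, and the two base paths are invented for the example; hosted flag services offer the same idea with remote control and gradual rollout.

```python
import os
from typing import Mapping, Optional


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Read a flag such as FEATURE_NEW_BASE_URL from the environment; default is off."""
    source = os.environ if env is None else env
    value = source.get(f"FEATURE_{name.upper()}", "off")
    return value.strip().lower() in {"1", "true", "on"}


def base_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Serve the new base path only while the flag is on (paths are hypothetical)."""
    return "/v2" if flag_enabled("new_base_url", env) else "/v1"
```

Because both code paths ship in the same deployment, turning the flag off again restores the old behavior without a rollback.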

Blue-Green Deployment

Blue-Green Deployment is a strategy that reduces downtime and risk by running two identical production environments, one active (blue) and one idle (green). The video script uses this term to illustrate a method of deployment that allows for zero-downtime updates and easy rollback if something goes wrong with the new version.
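The cut-over logic can be sketched in a few lines. Everything here is a toy model under stated assumptions: the two "environments" are plain functions, and `BlueGreenRouter` is a hypothetical stand-in for what a load balancer or service mesh does in practice.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


class BlueGreenRouter:
    """Send all traffic to one of two versions and switch only after a check passes."""

    def __init__(self, backends: Dict[str, Handler], live: str) -> None:
        self.backends = backends
        self.live = live

    def handle(self, path: str) -> str:
        """Forward a request to whichever version is currently live."""
        return self.backends[self.live](path)

    def switch_to(self, color: str, check: Callable[[Handler], bool]) -> bool:
        """Make `color` live if its handler passes the check; otherwise keep the old one."""
        try:
            healthy = check(self.backends[color])
        except Exception:
            healthy = False
        if healthy:
            self.live = color
        return healthy
```

The old version stays running until the switch succeeds, so a failed check leaves users on the version that was already working.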

Backwards Compatibility

Backwards Compatibility ensures that new versions of a product can still function with older versions or components. In the script, it is mentioned in the context of making sure that updates do not break existing functionality, which is essential for maintaining stability when introducing new features or changes.
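Applied to the base-URL change from the video, one backwards-compatible approach is for the application to accept both the old and the new base path during the transition, so it keeps working whichever way the load balancer is currently routing. The two paths below are hypothetical, and this is a sketch of the idea rather than the fix the speaker's team used.

```python
from typing import Optional

OLD_BASE = "/app"     # hypothetical base path before the change
NEW_BASE = "/portal"  # hypothetical base path after the change


def strip_base(path: str) -> Optional[str]:
    """Return the route below either base path, or None if neither matches."""
    for base in (NEW_BASE, OLD_BASE):
        if path == base or path.startswith(base + "/"):
            return path[len(base):] or "/"
    return None
```

Once the routing has been updated everywhere and no traffic arrives on the old path, `OLD_BASE` can be removed in a later release.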

Code Simplicity

Code Simplicity advocates for writing code that is easy to understand, maintain, and less prone to errors. The video script emphasizes the importance of simplicity in code design to reduce complexity and the potential for bugs, suggesting that simpler code is easier to manage and test.

Environment Consistency

Environment Consistency refers to the practice of keeping development, testing, and production environments as similar as possible to avoid discrepancies that can lead to unexpected behavior in production. The script uses the example of a developer who did not consider the production environment's specific conditions, leading to a failed update.
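One way to make drift between environments visible is to compare their settings key by key. The keys and values below are invented to mirror the routing example; where the settings come from (files, a parameter store, the load balancer's API) is left open.

```python
from typing import Dict, Mapping, Tuple

MISSING = "<missing>"  # marker for a key that one side does not define


def config_drift(reference: Mapping[str, str], other: Mapping[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return {key: (reference_value, other_value)} for every setting that differs."""
    keys = sorted(set(reference) | set(other))
    return {
        key: (reference.get(key, MISSING), other.get(key, MISSING))
        for key in keys
        if reference.get(key, MISSING) != other.get(key, MISSING)
    }
```

Some differences between environments are intentional, such as hostnames or credentials, so in practice the comparison would be limited to the keys that are supposed to match.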

SRE (Site Reliability Engineering)

Site Reliability Engineering (SRE) is an engineering discipline that incorporates aspects of software engineering and practices from reliability engineering to maintain large-scale, distributed systems. The video script mentions SRE to highlight the specialized knowledge required to master the various aspects of software delivery and ensure reliability, including understanding how to prevent and handle bugs in production.

Highlights

The importance of understanding automated testing, including unit testing and integration testing, to reduce bugs.

The significance of the testing pyramid and its role in identifying the right tools for different testing levels.

The necessity for developers to comprehend the layers of the application that can fail and how to assert each layer's functionality.

The example of a production issue caused by an update to the base URL not being checked with the operations team.

The concept of feature flagging and blue-green deployment as a safeguard against production issues.

The importance of understanding the entire delivery process from code writing to production to identify potential failure points.

The example of a Docker container issue due to a path problem with code libraries not being imported correctly.

The necessity for developers to consider the execution environment and potential logical errors in their code.

The role of health checks in identifying outages in production and the importance of fail-safes.

The recommendation to have more than one availability zone for redundancy, especially when dealing with cloud providers.

The strategy of using feature flagging, rollbacks, and redundancies to mitigate production issues.

The emphasis on writing simple code to reduce unnecessary complexity and potential bugs.

The example of a wrong encoding format in a database, which caused placeholder symbols to be shown in place of German and other non-ASCII characters.

The need for a comprehensive understanding of the software delivery process and its components to prevent bugs.

The importance of having a gateway system to check the functionality at each step of the delivery process.

The role of unit testing in ensuring the logical correctness of newly written code.

The value of automated testing before production deployment to catch issues like container failures early.

The strategies for ensuring code works correctly in production, including end-to-end testing and feature flagging.

Transcripts

play00:00

hey guys so today you and I are going to

play00:04

talk about

play00:06

very few bugs so let's get into it so

play00:10

the question in question was hi

play00:11

Fredrik what are the best approaches

play00:15

to write code with very few bugs

play00:20

well uh there are a few tried and true

play00:23

patterns for how to do this

play00:26

and I'll I'll try to basically give you

play00:32

sort of those tips and tricks that I

play00:34

usually use in order to try to reduce it

play00:36

down to the bare minimum that I can

play00:38

sustain

play00:39

so first and foremost you need to

play00:41

understand automated testing hopefully

play00:43

that is no-brainer for those of you who

play00:45

have been doing some professional work

play00:46

basically that means unit testing and

play00:49

end-to-end testing if you have a UI of some

play00:51

sort, visual regression testing

play00:54

etc etc there are many patterns for how

play00:56

to do this but in essence it's good for

play00:58

you to take a look at the testing

play00:59

pyramid and have an understanding of

play01:01

whatever stack you're using what is

play01:04

the general way that people use like

play01:08

what are the general tools that your

play01:10

specific stack leverages in order to

play01:12

deal with the bugs at different

play01:15

levels because this is a very important

play01:17

factor a bug is unintended it's just

play01:21

unintended Behavior at some part of the

play01:24

application that somehow affects the

play01:26

user experience the reason why I say

play01:28

that it might sound obvious but it's

play01:30

actually very important is that you have

play01:32

to understand as a software developer

play01:36

what layers of the application can fail

play01:42

and how do you assert that each

play01:45

individual layer is working as expected

play01:48

this is very complicated

play01:51

and the reason I can give you an example

play01:53

so I was working with a developer of

play01:57

mine

play01:58

a few a while back we I asked this this

play02:02

person was updating making an update to

play02:05

the

play02:07

base URL of of an application

play02:11

and so in the code review he had written

play02:14

unit tests like he had made sure that he

play02:16

had tested everything you know in like

play02:19

properly all the like end-to-end tests

play02:23

had been passing and Etc so basically

play02:25

the whole suite and he

play02:27

had done quote unquote all the things

play02:29

that on a normal story he was expected

play02:32

to do

play02:33

and then he pushes like he merges this

play02:36

code sends it out into production boom

play02:39

production issue immediately

play02:40

uh the reason being why that in what it

play02:44

had why what happens is that the update

play02:47

that he had made had not been checked

play02:50

with the operations team which meant

play02:52

that the load balancer that he was

play02:54

basically in back behind that forwards

play02:58

all the requests was not updated and

play03:00

also so the routing wasn't happening

play03:02

correctly which meant the whole

play03:04

application broke because there was an

play03:06

environment which in which he had not

play03:08

tested before he released which

play03:11

was the production environment

play03:13

now I talked to him about this and I he

play03:17

said oh well because in this case

play03:18

scenario he he said yeah it would have

play03:21

been great if we had a QA environment or

play03:23

a test environment before we pushed this

play03:25

thing so I could have tested it in that

play03:27

environment and I went that wouldn't

play03:28

have helped you

play03:30

because the real this was in this case

play03:33

it was a front-end router because and I

play03:35

said the reason why it wouldn't have

play03:36

helped you was because when you have

play03:38

pushed would you would have pushed it to

play03:40

the QA environment

play03:42

yes very likely you would have seen a

play03:44

breakage and you would have seen that

play03:46

you needed to update the routing of this

play03:48

thing

play03:49

but the question is would you have known

play03:54

how to

play03:56

do that for production as well you might

play03:59

have known that you needed to make the

play04:01

update for production as well but how

play04:02

would you do it in this manner because

play04:05

in this specific manner like there was

play04:06

no way to synchronize the updates of the

play04:10

routing at the same time as pushing into

play04:12

production in other words there's a

play04:14

window where something is inconsistent

play04:18

either the routing is not going to work

play04:20

or the application is not going to be

play04:22

updated so something's going to be

play04:24

broken for a little window of time in in

play04:27

theory basically and so what he and I

play04:30

started talking about was that okay so

play04:32

how do you Safeguard yourself from not

play04:36

making a mistake like this and I said

play04:38

well it's very simple you have to

play04:39

understand how to in this case the like

play04:42

I can give you the short answer feature

play04:44

flagging basically where you need to be

play04:46

able to do a blue green type of

play04:48

deployment thing where you need to be

play04:50

able to

play04:52

spin up a new version of the code like

play04:55

basically a backwards compatibility

play04:57

make sure that all the traffic hits the

play05:00

new application and then degrade the old

play05:03

one because if you don't do that you're

play05:05

basically in a situation where as I said

play05:07

you have temporary outage

play05:08

and maybe that's fine and in the

play05:11

scenario it wasn't a big deal but that's

play05:13

the sort of thinking that you need to

play05:15

have when you're trying to figure out

play05:17

how to reduce the amount of bugs you

play05:19

have because the the bugs that guys

play05:22

bugs can happen in so so so many ways

play05:25

and so it's really important for you to

play05:28

have an understanding of the impacts

play05:31

like like all the parts that make up

play05:34

everything from how you get the code

play05:36

from your laptop all the way into the

play05:38

production environment and when it's

play05:39

actually running in production this

play05:41

takes years

play05:45

and it's a full stack knowledge level

play05:48

that is required to do this effectively

play05:50

to to really do it effectively I had

play05:53

another co-worker the other day who made

play05:57

a update to the source code

play06:00

was very happy with it had written unit

play06:03

tests etc etc and then he pushed it and

play06:06

luckily for us we had automated testing

play06:08

that in this case it was a Docker

play06:10

container everything was running in so

play06:12

the docker container was trying to be

play06:13

run and this is where you want to have

play06:16

something like automated testing before

play06:18

things go out into production which is

play06:20

in this case it was so the container was

play06:22

trying to be built and then it throws an

play06:24

error because the change that he had

play06:26

made was only the bug only existed

play06:29

basically here in a routing problem not

play06:31

a routing problem he had a path problem

play06:33

to certain libraries code libraries that

play06:37

he was depending on those code libraries

play06:39

were not being imported into the

play06:40

Container correctly so now the container

play06:42

broke

play06:43

and when he made the change

play06:47

he didn't think about that because he

play06:48

didn't think about the next step okay

play06:50

what happens when this thing goes into

play06:51

the next container now this is the main

play06:54

reason why bugs happen guys the

play06:56

developers are forgetting about some

play06:59

execution environment or like there

play07:01

might be a logical error Etc so there's

play07:03

tons of ways for this to happen but for

play07:05

you to understand how the testing

play07:07

Pyramid has to work

play07:09

you have to also understand all the

play07:12

things that are going to happen between

play07:14

you wrote the code in your on your

play07:18

workstation all the way until it's out

play07:20

in production and what can possibly go

play07:22

wrong when it is in production and apply

play07:27

the necessary fail safes in order to

play07:30

deal with that problem so as I said unit

play07:32

test that's what the junior learns

play07:34

that's the first step just to make sure

play07:35

that the thing that you just wrote

play07:37

doesn't like logically break somehow

play07:39

then the next thing is integration test

play07:41

you need to be able to connect to

play07:43

systems or something like that to make

play07:44

sure that the system still works then

play07:46

you might have health checks because

play07:48

something might you know if something's

play07:49

running in production you don't know

play07:51

maybe something there's an outage

play07:53

somewhere and some service Goes Down And

play07:55

if you don't know about it well then

play07:56

it's down and your system is now you

play07:59

know there's a bug there's a temporary

play08:00

outage etc etc and then you have

play08:02

questions about okay so if now something

play08:04

breaks into production how do you

play08:05

actually go back from that how do you

play08:06

save yourself from that that's for

play08:08

feature flagging for example can be an

play08:09

alternative you can have multiple

play08:11

redundancies with like this is one of

play08:14

the reasons why we recommend usually to

play08:16

have more than one

play08:17

you know availability zone or something

play08:19

like that if you're dealing with Cloud

play08:21

providers etc etc there's tons and tons

play08:24

and tons of ways everything from

play08:26

rollbacks to redundancies to feature

play08:28

flagging to like backwards compatibility

play08:30

uh like graceful degradation etc etc and

play08:34

in order for you to write as few bugs as

play08:37

possible you basically need to know

play08:39

as I said how does the entire delivery

play08:41

look like what can go wrong in

play08:44

production and what are the remedies

play08:46

that you can apply in order to mitigate

play08:48

those sorts of problems and I'll give

play08:50

you one more which is a very important

play08:52

one make sure that the code that you are

play08:55

writing is simple

play08:58

the reason why I say that it might sound

play09:00

simple and like obvious but the reason

play09:03

why I say that is because the simpler

play09:07

you can design the code

play09:09

the less likely you are to create

play09:13

unnecessary complexity around the

play09:16

delivery process if you have the option

play09:18

of picking something that is very simple

play09:20

very stable something that like where

play09:22

you're basically just trying to keep

play09:24

down the complexity of what you're doing

play09:27

that's always almost always a good

play09:30

investment for you because a lot of

play09:31

these like the more complex as I was

play09:34

saying with these two examples the more

play09:36

complicated your environment is the more

play09:39

of things you have to keep in mind

play09:41

whenever you're doing the coding because

play09:42

as you can imagine it's almost

play09:44

impossible for you to sit there and go

play09:46

yeah I wrote I changed an example would

play09:50

be I changed the uh symbols here like I

play09:55

I added a few classic one is a German

play09:57

signs so like something like a German

play09:59

characters or Chinese characters or

play10:00

something like that here in this little

play10:01

text here that's going to be translated

play10:03

I wonder if we have the right game for

play10:07

the encoding format in our database

play10:10

which is another one that I've found as

play10:11

well like we've I've been working on

play10:13

projects where literally they have the

play10:15

wrong encoding format they store things

play10:17

in the incorrect format and send it back

play10:19

to us so all we get is this little like

play10:21

you know the placeholder

play10:24

symbol for all of the non-utf I don't

play10:27

remember which encoding they were using

play10:28

but basically it was okay we had to show

play10:31

broken text to all of our customers from

play10:33

Germany etc etc because they had not

play10:36

considered that this would have been a

play10:38

problem so what I want you to take away

play10:40

from this is that the way that you write

play10:42

as few bugs as possible is number one

play10:44

you have to get yourself informed of the

play10:48

main testing tools you usually use the

play10:49

common ones are you know unit testing and

play10:51

end-to-end testing visual regression testing

play10:55

feature flagging things like that but

play10:58

there's more to it because I mean

play11:00

guys it's a whole education to learn all

play11:02

this sort of stuff this is sort of what

play11:03

the site reliability engineers and

play11:05

things like you know there's there's an

play11:07

entire job to master all of this stuff

play11:09

right but that is basically what it

play11:11

comes down to because what you have to

play11:13

understand is that there is

play11:15

no like like the entire way from your

play11:19

laptop all the way up

play11:21

to how something is running in

play11:23

production can break

play11:25

and the best way for you to reduce the

play11:28

amount of bugs is for you to create a

play11:29

Gateway system where you know that you

play11:32

have a proven way to determine at each

play11:34

step of the delivery process

play11:37

if something is working if it's working

play11:39

correctly and some way to rectify the

play11:43

problem if something did not

play11:45

function correctly so as I was saying in

play11:48

production environments you have a few

play11:50

strategies at the code level you have a

play11:52

few strategies and there's tons of this

play11:54

stuff but basically try to keep things

play11:57

as simple as possible and then inform

play11:59

yourself about all right how do I make

play12:01

sure that all quote unquote things that

play12:03

can go wrong with my code have been

play12:06

accounted for and that there is some way

play12:08

for me to check if that thing is

play12:10

actually working

play12:12

unit test is a good start making sure

play12:14

that you know your containers and your

play12:16

production builds are working correctly

play12:17

making sure that you have some type of

play12:19

way of checking the running code before

play12:21

it goes on into production is a good

play12:22

investment making sure that as I said

play12:25

visual regression testing or end-to-end

play12:27

testing like features are working even

play12:28

in like a QA environment doesn't have to

play12:31

be a QA environment and then when

play12:32

things are in production make sure that

play12:34

things are working in production there's

play12:35

tons of different ways of doing that

play12:37

everything from running end to end

play12:38

testing in production to you know pinging

play12:40

things or you know doing feature

play12:43

flagging and turning features on and off

play12:46

etc etc there's a whole like there's so

play12:49

much content related to this guys so the

play12:52

I suggest that you have a look but keep

play12:54

that fundamental thing with you the most

play12:57

important thing is that you understand

play12:58

how your software it goes from your

play13:01

laptop all the way out into production

play13:03

and all the things that are happening as

play13:06

part of that process because then you

play13:08

know what all the components are and how

play13:10

they in their unique way can fail you

play13:13

have a great day

Related Tags
Code Quality, Testing Tips, Bug Reduction, Software Development, Automated Testing, Unit Testing, Integration Testing, Production Bugs, Feature Flagging, Developer Best Practices, SRE Knowledge