A Beginners Guide to Code Review

The Cyber Mentor
1 Mar 2024 · 13:17

Summary

TLDR: This video offers a beginner's guide to conducting a security-focused code review to identify vulnerabilities. It introduces a general methodology, emphasizing the importance of understanding application structure and behavior. The tutorial reviews a sample code snippet, highlighting the detection of sources, sinks, and middleware issues, and the need for secure coding practices. The presenter recommends using tools like Snyk for real-time vulnerability scanning and suggests resources for further learning, such as SourceCodester (sourcecodester.com) and PentesterLab, to enhance one's code review skills.

Takeaways

  • 🛠️ The video introduces a methodology for conducting a security-focused code review to identify weaknesses and vulnerabilities in code.
  • 🤖 AI tools can assist in writing code, but they may also introduce security vulnerabilities, because they are only as good as the data they were trained on.
  • 🔒 Snyk is a tool highlighted in the video that helps secure code by scanning it for vulnerabilities in real time and recommending fixes.
  • 🔎 Code review from a security perspective is different from the typical development process and is crucial for identifying unique security issues.
  • 👀 Understanding the application's environment and how input flows through it can reveal the need for unique security measures not found by automated testing.
  • 📚 The script emphasizes the importance of learning about dangerous functions and their potential exploits in the technology being used.
  • 🔍 During code review, it's essential to identify all sources of input and sinks where input is processed, as well as any middleware that influences the data flow.
  • 🚫 The video points out the risk of hardcoded credentials in the source code and the need to avoid this practice for security reasons.
  • ⚠️ The use of dynamic queries instead of parameterized statements can lead to vulnerabilities such as SQL injection.
  • 🧐 The script discusses the importance of scrutinizing sanitization functions to ensure they are effectively removing or neutralizing harmful input.
  • 🔄 Consistency in code review practices is key to mastering the skill and improving the ability to find vulnerabilities over time.
  • 🌐 Resources like SourceCodester (sourcecodester.com) and PentesterLab are recommended for practicing code review and learning more about web application security.

Q & A

  • What is the primary focus of the video script?

    -The video script focuses on code review from a security perspective, aiming to identify weaknesses and vulnerabilities in code, particularly when it is generated by AI tools.

  • What is the role of Snyk as mentioned in the script?

    -Snyk is a tool that scans code for vulnerabilities in real time, providing recommended fixes that can be applied with a single click. It is designed to secure code whether it is written by humans or generated by AI.

  • Why is code review important from a security standpoint?

    -Code review is important from a security standpoint because it helps identify weaknesses and vulnerabilities within applications before they can be exploited, which is more proactive than finding issues through fuzz testing live applications.

  • What are 'sources' and 'sinks' in the context of code review for security?

    -In the context of security, 'sources' are points where user input enters the application, and 'sinks' are functions that can execute or process this input, potentially leading to security issues if not handled correctly.

  • What is the recommended approach to understanding the application structure and its source code during code review?

    -The recommended approach is to first understand the routing, input structure, and general application layout. Then, identify the dangerous functions within the technology being used and learn about their potential exploits.

  • Why is it not advisable to dive too deeply into specific issues during the initial pass of code review?

    -Diving too deeply into specific issues during the initial pass can lead to losing sight of the overall application structure and potentially wasting time on less critical issues. It's better to get a holistic understanding first before focusing on specific vulnerabilities.

  • What is the significance of identifying hardcoded credentials in the source code during code review?

    -Identifying hardcoded credentials is significant because it is a common security flaw that can lead to unauthorized access to sensitive information. It's important to note such issues as they can be exploited by attackers.

  • What is the potential security risk associated with dynamic queries in the script?

    -Dynamic queries, which are not parameterized or prepared statements, can lead to security risks such as SQL injection because they mix data and code, allowing for potential manipulation of the query by an attacker.

  • What is the purpose of the 'sanitize' function mentioned in the script, and what are its limitations?

    -The 'sanitize' function is meant to remove potentially harmful SQL keywords (select, insert, delete, update, drop, and alter) from the input to prevent SQL injection. It uses the global and case-insensitive flags, so it removes every instance of a keyword regardless of case. Its limitations are that it may not be recursive, it misses keywords such as UNION, and it does not remove special characters.

  • What is the recommended approach to dealing with input sanitization to follow best practices?

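    -The recommended approach is to use the standard, widely accepted way of sanitizing input rather than a custom function, such as the MySQL library's escape function (the presenter suggests something like mysql.escape, after checking the documentation), to update the query statement, and to ensure that the sanitization is applied consistently and securely across the codebase.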

  • What resources are suggested in the script for someone looking to improve their code review skills?

    -The script suggests starting with code snippets, then practicing on projects from sites like SourceCodester (sourcecodester.com) and on PentesterLab, which offers a free introduction to code review and further exercises for a subscription fee.

Outlines

00:00

🔍 Introduction to Code Review for Security

In this introductory paragraph, the speaker sets the stage for a video focused on code review from a security perspective. The goal is to identify vulnerabilities in code, whether written by humans or generated by AI. The speaker introduces Snyk, a tool that scans code in the IDE in real time for vulnerabilities and recommends fixes that can be applied with a single click. The video aims to provide a general methodology, useful resources, and tools for improving code security. The importance of understanding the application's environment and the flow of inputs is highlighted as crucial for identifying unique security issues that automated testing might miss.

05:02

📝 Methodology for Conducting a Code Review

This paragraph delves into the methodology of conducting a code review with an emphasis on security. It begins by defining what a code review is in the context of security and why it's essential beyond just testing live applications. The speaker suggests starting with understanding the application's structure and identifying sources (like user input) and sinks (like dangerous functions). The paragraph also touches on the importance of recognizing risky developer behaviors, such as hardcoding credentials or creating backdoors. The speaker recommends learning about dangerous functions in the technology stack being used and understanding the application's behavior to find vulnerabilities more easily.
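The preparation step of learning a technology's dangerous functions can be turned into a first-pass search. The following is a minimal sketch, not something shown in the video: it assumes PHP source held in a string, and the short `DANGEROUS` list and the helper name `find_candidate_sinks` are illustrative choices. A hit only marks a place to read more closely; it does not prove a vulnerability.

```python
import re

# Illustrative, deliberately short list of PHP functions that can run
# code or commands. A real review would start from a fuller list.
DANGEROUS = ("eval", "exec", "system", "passthru", "shell_exec")

# Match a listed name as a whole word, followed by an opening parenthesis.
PATTERN = re.compile(r"\b(" + "|".join(DANGEROUS) + r")\s*\(")


def find_candidate_sinks(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each candidate sink call."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits
```

Each hit is a starting point for tracing backwards to see whether any user-controlled source can reach it.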

10:07

👀 Practical Code Review: Identifying Sources, Sinks, and Middleware

The speaker provides a practical example of a code review by analyzing a snippet of code written by an AI tool. The focus is on identifying all sources of input, such as 'username', 'email', and 'password', and all sinks, which are points where input is executed or stored, such as database queries. The paragraph discusses the importance of middleware that sanitizes input and the potential issues with inadequate sanitization methods. It also points out the risks associated with dynamic queries that are not parameterized, which can lead to SQL injection vulnerabilities. The speaker emphasizes the need to scrutinize functions for their effectiveness and to ensure they follow best practices for security.

🛠️ Tools and Resources for Enhancing Code Review Skills

In the final paragraph, the speaker wraps up the video by discussing the next steps in enhancing one's code review skills. They recommend starting with code snippets and using resources like SourceCodester (sourcecodester.com) and PentesterLab for practice. The speaker stresses the importance of consistency in code review to improve over time and suggests stepping through code manually first, then adding tools to see what was missed. They also caution about the potential risks of working with untrusted code sources and the need for precautions like using virtual machines. The video concludes with a reminder to focus on understanding the application deeply to uncover hidden weaknesses and vulnerabilities.

Keywords

💡Code Review

Code review is the process of examining source code to identify weaknesses and vulnerabilities, particularly from a security perspective. In the video, it is the primary focus for improving software security by manually checking code for potential issues. The script notes that security-focused code review differs from the code review software engineers carry out for development purposes, emphasizing the need to understand the application's environment and data flow.

💡Security Perspective

A security perspective in code review involves looking at the codebase to find potential risks that could be exploited. The video script discusses this approach, explaining that it helps to identify unique vulnerabilities that automated testing might miss, such as second-order attacks or application-specific payloads.

💡AI Tools

AI tools are mentioned in the context of generating code that may contain security vulnerabilities due to the quality of the training data. The video suggests that while AI can assist in various tasks, including coding, the security of the generated code is dependent on the data it was trained on.

💡Snyk

Snyk is a tool highlighted in the video for securing code, whether written by humans or generated by AI. It works by scanning code in real time within an IDE, flagging vulnerabilities, and providing recommended fixes that can be applied with a single click.

💡Vulnerabilities

Vulnerabilities refer to weaknesses in the code that can be exploited by attackers. The script discusses how code review from a security perspective aims to identify these vulnerabilities, such as hardcoded credentials or improper input sanitization.

💡Hardcoded Credentials

Hardcoded credentials are sensitive values such as usernames, passwords, secrets, tokens, or keys embedded directly in the source code. The video script points out the hardcoded credentials in the example code's MySQL connection setup as something to be avoided.
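One common alternative to hardcoding is to read credentials from the environment at startup. The sketch below is an assumption-laden illustration rather than the video's code: the variable names `DB_HOST`, `DB_USER`, and `DB_PASSWORD` and the helper `load_db_config` are hypothetical.

```python
import os


def load_db_config(env=os.environ) -> dict:
    """Build database settings from environment variables.

    The variable names are hypothetical. Failing fast when a secret is
    missing avoids silently falling back to a default password.
    """
    missing = [key for key in ("DB_USER", "DB_PASSWORD") if not env.get(key)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    return {
        "host": env.get("DB_HOST", "localhost"),
        "user": env["DB_USER"],
        "password": env["DB_PASSWORD"],
    }
```

Passing the mapping in as a parameter keeps the function easy to test without touching the real environment.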

💡Sources and Sinks

In the context of security, sources are inputs like user data, and sinks are functions that can potentially execute harmful input as code. The video emphasizes the importance of identifying both sources and sinks during a code review to understand the flow of data and where vulnerabilities may arise.
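A source reaching a sink can be shown in a few lines. This sketch assumes a request body represented as a plain dictionary with a hypothetical `expr` field; the handler names are illustrative. It uses Python's built-in `eval` as the sink, mirroring the eval function the video names as the most famous example.

```python
import ast


def handle_unsafe(body: dict) -> object:
    expr = body["expr"]   # source: value controlled by the user
    return eval(expr)     # sink: executes the input as code


def handle_safer(body: dict) -> object:
    expr = body["expr"]   # same source
    # literal_eval only accepts Python literals (numbers, strings,
    # lists, dicts, ...) and raises ValueError for anything else,
    # such as a function call.
    return ast.literal_eval(expr)
```

During review, the question for each source is which sinks it can reach and what checks or middleware sit on the path between them.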

💡Input Sanitization

Input sanitization is the process of cleaning up user input to prevent security threats like SQL injection or cross-site scripting. The script critiques an example of inadequate sanitization and suggests following best practices for securely handling user input.
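The kind of keyword-stripping sanitizer critiqued in the video can be approximated to see why it is fragile. This is a sketch under assumptions: the video's code is JavaScript and is not reproduced on this page, so the single combined pattern below is a stand-in, with Python's `re.IGNORECASE` playing the role of the `i` flag and `re.sub` replacing every match as the `g` flag would.

```python
import re

# Stand-in for the reviewed sanitizer: strip a fixed list of SQL keywords,
# every occurrence, ignoring case.
KEYWORDS = re.compile(r"select|insert|delete|update|drop|alter", re.IGNORECASE)


def sanitize(data: str) -> str:
    """Remove listed keywords in a single pass (not recursive)."""
    return KEYWORDS.sub("", data)


if __name__ == "__main__":
    print(sanitize("SeLeCt"))        # mixed case is still removed
    print(sanitize("selselectect"))  # the inner keyword is removed, rebuilding "select"
    print(sanitize("' UNION"))       # quotes and UNION pass through untouched
```

Because the replacement runs once over the input, a keyword nested inside another copy of itself survives, which is the recursion question the video raises.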

💡SQL Injection

SQL injection is a type of security vulnerability where an attacker can insert or 'inject' malicious SQL code into a query. The video script identifies dynamic queries without parameterization as a risk for SQL injection.
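The difference between a dynamic query and a parameterized one can be sketched with Python's standard `sqlite3` module. This is not the video's code, which uses MySQL from JavaScript; the table layout and function names here are assumptions chosen to mirror the username, email, and password insert described in the video.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with a users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (username TEXT, email TEXT, password TEXT)")
    return conn


def insert_dynamic(conn, username, email, password_hash):
    # Vulnerable: user data is pasted into the SQL text, mixing data and code.
    query = (
        "INSERT INTO users (username, email, password) "
        f"VALUES ('{username}', '{email}', '{password_hash}')"
    )
    conn.execute(query)


def insert_parameterized(conn, username, email, password_hash):
    # Safer: the SQL text is fixed and the values are bound separately.
    conn.execute(
        "INSERT INTO users (username, email, password) VALUES (?, ?, ?)",
        (username, email, password_hash),
    )
```

With a username such as `x', 'attacker@example.com', 'chosen') --`, the dynamic version stores the attacker's email and password values and comments out the rest of the statement, while the parameterized version stores the whole string as a plain username.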

💡Cross-Site Scripting (XSS)

Cross-site scripting is a security vulnerability that allows attackers to inject malicious scripts into web pages viewed by other users. The video mentions the potential for XSS if user input becomes part of an error message displayed on a web page.
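Escaping output before it reaches the page is the usual guard against this. The sketch below uses Python's standard `html.escape`; the function name `render_error` and the surrounding markup are illustrative assumptions, not the video's code.

```python
import html


def render_error(message: str) -> str:
    """Return an HTML fragment with the message escaped for display."""
    # html.escape converts <, >, & and quote characters to entities,
    # so any markup in the message is shown as text instead of running.
    return f'<p class="error">{html.escape(message)}</p>'
```

If part of the message can be influenced by user input, as the video suggests may be the case for the SQL error message, escaping at the point of output means the browser treats it as data.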

💡Best Practices

Best practices refer to the recommended methods or techniques for performing a task, such as sanitizing input or using prepared statements in database queries. The video script encourages following these practices to improve code security.

Highlights

Introduction to code review from a security perspective to identify vulnerabilities.

AI tools can generate code, but the security of the code depends on the data it's trained on.

Snyk tool introduced for securing code, whether written by humans or AI, with real-time vulnerability scanning and fixes.

Code review is essential for understanding the application's environment and input flow for identifying unique vulnerabilities.

The importance of identifying sources and sinks in the code for security analysis.

Methodology for preparing to review code by understanding application structure and dangerous functions.

Risks associated with hard-coded credentials in source code.

Reviewing the MySQL connection setup for hardcoded credentials as a common finding.

The role of input functions in security and their potential impact on application behavior.

Identifying sources such as 'username', 'email', and 'password' from the request body in the code.

Use of bcrypt for hashing passwords and the importance of secure password handling.

Risks of dynamic queries without parameterization leading to vulnerabilities like SQL injection.

Analysis of the 'sanitize' function as middleware and its effectiveness in preventing SQL injection.

The importance of testing sanitization functions for recursion and case insensitivity.

Recommendation to follow standard best practices for input sanitization.

The need to understand the application's behavior to identify hidden weaknesses and risky behavior.

Consistency in code review as a skill that improves over time and leads to better vulnerability detection.

Use of manual code review followed by tools to find new areas of the application.

Resources like SourceCodester (sourcecodester.com) and PentesterLab for practicing code review with real vulnerabilities.

Final thoughts on the importance of code review in web application security.

Transcripts

play00:00

welcome back to another video and today

play00:02

we're going to get started with code

play00:04

review to find vulnerabilities I'll

play00:07

share with you how to get started a

play00:09

general methodology ways to improve and

play00:12

some useful resources and tools too just

play00:16

before we dive in though I should

play00:17

mention that we're going to be looking

play00:19

at code review from a security

play00:21

perspective to identify weaknesses and

play00:24

vulnerabilities so this is a little bit

play00:27

different to how code review is carried

play00:29

out from a development standpoint by

play00:31

software Engineers pardon the

play00:33

interruption AI tools can be super handy

play00:36

they can help you write a poem solve

play00:38

CAPTCHAs and even write code but is that

play00:42

code secure well AI is only as good as

play00:45

the data it's trained on which means

play00:47

vulnerabilities in code could be there

play00:49

as it's generated or written but that's

play00:52

where Snyk comes in Snyk makes it fast

play00:55

and easy to secure code whether it's

play00:57

being written by you or generated by AI

play01:01

and here's how it works you use your AI

play01:03

tools to generate code and put it into

play01:05

your IDE and Snyk scans that code

play01:07

flagging vulnerabilities in real time

play01:10

you then get recommended fixes for those

play01:12

vulnerabilities that you can apply with

play01:14

just a single click and so whether

play01:16

you're using AI or writing code yourself

play01:19

you can give Snyk a try for free today

play01:22

by going to snyk.io

play01:24

slash the cyber mentor and of course there is

play01:27

a link in the description below if you

play01:30

enjoyed the video don't forget to like

play01:32

And subscribe and let's dive in so first

play01:35

up what is code review exactly well in

play01:37

the context of security code review

play01:39

helps us identify weaknesses and

play01:42

vulnerabilities within applications and

play01:44

you might be thinking why can't I just

play01:47

fuzz and test the live application to

play01:49

find issues well of course you can but

play01:52

understanding the environment in which

play01:54

the application is operating and looking

play01:56

at how input flows from sources to sinks

play01:59

and the journey that it takes to get

play02:01

there which can include things like

play02:03

Security checks and going through

play02:05

middleware could unveil the need for a

play02:08

payload that is unique to the

play02:09

application or a variation on a pattern

play02:12

that hasn't been seen before it might be

play02:15

that multiple inputs are needed or the

play02:17

attack is a second order attack and

play02:19

therefore scanners struggle to find it

play02:22

so code review is a useful skill to

play02:25

develop especially if you're on the hunt

play02:27

for cves or looking to contribute to

play02:29

open-source projects so how do we get

play02:32

started well let's talk about the

play02:34

methodology first most of the time we

play02:37

have an application and we want to look

play02:39

for sources and sinks sources are things

play02:42

like user input and sinks are things

play02:45

like dangerous functions that can

play02:47

execute the input as code most famously

play02:51

we have the eval function if you're just

play02:53

starting out though what I'd recommend

play02:55

doing is two things to prepare first try

play02:58

to understand the structure of the

play03:00

application and its source code for

play03:02

example how is the routing handled what

play03:05

does the input look like and the general

play03:07

structure of the

play03:09

application second what dangerous

play03:11

functions exist within the technology

play03:14

that you're working with with PHP for

play03:16

example we can find a list to get

play03:18

started with here and start to learn

play03:20

about what these functions do and how we

play03:22

can exploit an application that is using

play03:25

them it is also worth mentioning at this

play03:27

point that these are not the only things

play03:29

we're on the lookout for as we're trying

play03:31

to understand the application Behavior

play03:33

we also want to look for risky things

play03:35

that the developer might have done maybe

play03:38

there is some hard-coded back door for

play03:40

admins or input that's stored in the DB

play03:42

and then saved to a PHP file which

play03:45

can then be executed later when the page

play03:47

is called so really our goal is to

play03:51

understand the application and once we

play03:54

understand it vulnerabilities are

play03:55

relatively easy to find later on in our

play03:58

journey we can dive more into

play04:00

different vulnerabilities their patterns

play04:02

or signatures and deeper behavior of the

play04:05

application but I think this is enough

play04:07

for us to get started so let's find a

play04:10

nice snippet of code to take a look at

play04:13

so here we are and I have some code that

play04:15

I got chat GPT to write for me and then

play04:18

we're just going to review it and what

play04:22

I'm going to do is I'm going to step

play04:23

through it and of course in larger

play04:25

applications you probably want to break

play04:26

it down into subsets of functionality or

play04:29

into some logical sections um but we're

play04:32

just going to review this whole code and

play04:35

as we step through we have a few things

play04:37

that we want to achieve we want to

play04:39

identify all of these sources or inputs

play04:42

we want to identify all of the sinks or

play04:45

outputs and then identify any middleware

play04:48

or the route that sources take to their

play04:50

sinks so I think up here we can ignore

play04:54

these includes and we can ignore the

play04:57

express information and here we come

play05:01

down to the MySQL connection setup this is

play05:04

probably our first finding something

play05:06

that you'll run into a lot and that is

play05:08

hardcoded credentials so we don't really

play05:11

want to see this in our source code

play05:12

although it's fairly common to see

play05:14

usernames and passwords and other

play05:16

sensitive information like Secrets or

play05:18

tokens or keys hardcoded so this is

play05:22

something that we would note down

play05:23

straight away as we keep going down we

play05:25

have our connection to MySQL and then we

play05:28

have a sanitize input function and when I

play05:32

see this function I think ah this is to

play05:34

do with security this could impact the

play05:36

application and how it behaves so

play05:39

probably what I'd do is I'd add this to

play05:40

my notes to review later on as a general

play05:43

rule especially on my first pass I try

play05:46

not to dive into anything too deeply

play05:48

otherwise I find that I'll lose all of

play05:50

my time going into a rabbit hole I want

play05:52

to review the whole application first

play05:54

and then get a better understanding of

play05:56

what areas I need to put my time into so

play05:59

so we'd note this down to come back to

play06:02

later on as we come further down we can

play06:05

see some of our first sources so here we

play06:08

have username and this is the

play06:11

request.body.username and then we have email

play06:13

request.body.email and of course the

play06:16

password request.body.password here

play06:19

further down we can see that we're using

play06:21

bcrypt to Hash the password and then if

play06:23

there's an error we return error hashing

play06:26

password to the user otherwise the

play06:28

password becomes the output hash and

play06:30

then here we're going to insert the data

play06:33

into the database by the looks of it and

play06:36

this is our first sink so for example

play06:39

here we have insert into users and we

play06:42

have username email and password and the

play06:45

first thing to notice is that this is a

play06:47

dynamic query so it's not parameterized

play06:50

it's not a prepared statement we're

play06:51

actually taking this variable and

play06:53

placing it straight into the code so

play06:56

mixing data and code is bad practice and

play06:58

that's something to be on the lookout

play07:00

for and there are lots of

play07:01

vulnerabilities like SQL injection that

play07:04

arise from mixing data and code and we

play07:07

come back here and we also see another

play07:10

sink here so we see this error. SQL

play07:13

message now we obviously didn't see any

play07:16

particular input that was SQL message

play07:18

maybe we have control over this or

play07:20

partial control over this or maybe we

play07:22

don't if our username input for example

play07:25

becomes part of this error. SQL message

play07:28

then maybe we can get cross-site

play07:29

scripting but we need to go back through

play07:32

and understand how this is formed and

play07:35

what inputs can be used to influence it

play07:38

so those are our two sinks all right so

play07:41

let's come back up and what we want to

play07:42

do is check to see whether this username

play07:46

and email is exploitable so we can see

play07:49

that they're both using this sanitize

play07:51

function as middleware so if we scroll

play07:54

up we can now scrutinize this function

play07:58

and take a look and see whether

play07:59

it's effective so we're essentially

play08:01

doing data replace so it's checking for

play08:05

keywords like select insert delete

play08:07

update drop and alter and whenever we do

play08:11

this kind of function there are some

play08:12

things to think about so is it done

play08:15

recursively is everything in there

play08:17

that's needed to be in there is it case

play08:19

insensitive and there are generally just

play08:22

a lot of edge cases that can bypass this

play08:24

kind of thing so what this is going to

play08:26

do is when we pass in some data if it

play08:28

finds the select keyword it's going to

play08:31

Simply remove it and since we have the G

play08:34

flag here this is a global flag so if we

play08:37

have multiple instances of Select it's

play08:39

going to get removed and also we have

play08:42

the I flag as well so this is going to

play08:44

be case insensitive so for

play08:47

example select like this is not going to

play08:50

work because it's case insensitive or if

play08:54

we do something like select select it's

play08:57

going to find all of them notice that

play09:00

it's not removing things like special

play09:02

characters and also some keywords like

play09:05

Union for example are also missing

play09:07

something I would test for here is

play09:10

whether it's recursive so we know that

play09:12

the global flag is going to find all

play09:14

instances of the keyword select for

play09:17

example but I don't know whether it's

play09:19

going to find select here remove it and

play09:24

then the output will still be vulnerable

play09:26

something I would have to test and the

play09:28

key thing to remember here is that some

play09:30

scanners might be fooled by inadequate

play09:33

input sanitization and sometimes they

play09:35

might not but what we want to do is

play09:38

identify this as a weakness now even if

play09:41

this isn't exploitable if you're an

play09:42

AppSec engineer or if you're working as

play09:44

part of a development team maybe suggest

play09:46

that they follow the standard best

play09:48

practice or the normal way of sanitizing

play09:51

input and I can't remember off the top

play09:54

of my head exactly how to do this but I

play09:56

suspect we want something like return

play10:01

mysql mysql.escape(data) and we

play10:06

probably want to trim this as well so

play10:10

something like this of course check the

play10:12

documentation check the best practice

play10:14

but again when we see something that is

play10:17

a little weird even if we can't exploit

play10:19

it think about what the right way or the

play10:21

standard way that's widely accepted to

play10:24

do things is and then try and Implement

play10:26

that instead and of course down here as

play10:29

well we'd want to update this statement

play10:32

and then maybe we would want to try and

play10:34

make sure that we escape this output as

play10:37

well so that we're not just putting raw

play10:39

data into this alert box but very

play10:43

quickly you can see that we've

play10:44

identified some issues and some sources

play10:47

and sinks and some middleware that might

play10:49

be faulty and this is the goal of code

play10:52

review understanding the application how

play10:55

it's behaving and how the data is

play10:57

Flowing between different branches of

play11:00

code so now when we are looking at code

play11:02

we can look at what middleware or

play11:04

functions are being used for security

play11:07

and determine if they are standard

play11:08

libraries or custom written are they

play11:10

applied consistently across the code

play11:12

base written securely and following the

play11:15

best practices and understand the

play11:17

context around how code is reaching the

play11:20

function and what's being returned now

play11:23

to start with of course you're going to

play11:24

be looking for low-hanging fruit and a

play11:27

lot of tools can detect these issues

play11:30

automatically so what I recommend is

play11:32

instead of using vulnerabilities as a

play11:34

measure of success use understanding of

play11:37

the application instead think to

play11:39

yourself do I understand how this

play11:42

application works can I follow the code

play11:45

and the more you do that the more likely

play11:47

you are to find hidden weaknesses risky

play11:49

behavior and ultimately more

play11:51

vulnerabilities one more thing to think

play11:53

about is consistency code review is a

play11:56

skill that requires time to learn and

play11:58

eventually master so if you are

play12:01

consistent with it then over time you'll

play12:03

be able to reap the rewards I recommend

play12:06

you step through the code manually first

play12:08

and then add tools later on to see what

play12:10

you missed or if they help you find new

play12:13

areas of the application to explore so

play12:16

where do we go from here I usually

play12:17

recommend that you start out with code

play12:19

snippets but actually this site

play12:22

sourcecodester.com has a ton of projects and

play12:25

many of them have critical

play12:27

vulnerabilities waiting for you to find

play12:30

just be aware that when you're working

play12:31

with code from an untrusted Source you

play12:33

might be dealing with something that has

play12:35

a back door or something malicious

play12:36

inside so take precautions use a virtual

play12:39

machine etc etc another resource that

play12:42

I'd recommend to get started is

play12:44

PentesterLab there is a free

play12:46

introduction here with some code for you

play12:49

to review and unfortunately the rest of

play12:51

the exercises for the code review badge

play12:54

require a subscription but if you can

play12:56

afford it it's definitely worthwhile in

play12:58

my opinion PentesterLab is a great

play13:00

platform in general if you're interested

play13:03

in taking more steps into web

play13:05

application security and web app pen

play13:08

testing and that's it for this video I

play13:10

hope it helps you get started on your

play13:12

journey into code review and I will

play13:14

catch you next time

Related Tags
Code Review, Security, AI Tools, Vulnerability, Web Security, Best Practices, Sanitization, Input Validation, Code Snippets, Secure Coding, Educational