Algorithms: Solve 'Shortest Reach' Using BFS

HackerRank
27 Sept 201604:43

Summary

TLDRGayle Laakmann McDowell demonstrates how to implement the shortest reach algorithm using breadth-first search in code. She initializes key data structures like a queue and distances array, walks through traversing graph neighbors to update distances, and returns the final distances array. She emphasizes only using necessary variables, choosing the right algorithm upfront, and being comfortable with foundational graph search algorithms that come up in interviews and sometimes in real engineering work.

Takeaways

  • 😊 Use BFS when trying to find the shortest path, as it is designed for that
  • 👍 Have a queue to keep track of nodes to process next in BFS
  • 🔑 No need for a separate visited array, can check if distance != -1
  • 📏 Initialize distances array with -1, update to 0 for start node
  • 😎 Distance to neighbor = distance to current node + 1
  • ⚠️ Avoid too many unnecessary variables like visited array
  • 🤔 Understand the problem before coding to know what's needed
  • 🚫 DFS would not work well for shortest path problem
  • 📚 Good to be familiar with BFS and DFS as fundamentals
  • 💪 BFS useful for shortest path and real world problems

Q & A

  • What algorithm does Gayle recommend using for the shortest reach problem?

    -Gayle recommends using breadth-first search, as it is designed to find the shortest path between nodes in a graph.

  • Why does Gayle create a queue data structure?

    -Gayle creates a queue to track the next nodes to process in the breadth-first search. The queue follows the first-in, first-out ordering needed for BFS.

  • What array does Gayle initially create and then realize is unnecessary?

    -Gayle initially creates a boolean visited array to track which nodes have been visited. She then realizes it is unnecessary because that information is already encoded in the distances array.

  • How does the distances array work?

    -The distances array tracks the shortest distance from the start node to each other node. Unreached nodes are marked with -1, while reached nodes store the distance.

  • How is the distance to a neighbor node calculated?

    -The distance to a neighbor is the distance to the current node plus 1, representing the weight of the edge between them, which is defined to be 1.

  • What value does Gayle initialize the start node's distance to in the distances array?

    -Gayle initializes the start node's distance to 0 in the distances array, since the distance from a node to itself is 0.

  • What common mistake does Gayle say candidates make on this problem?

    -Gayle says a common mistake is using too many variables, like separate visited and distances arrays, instead of understanding what is actually necessary.

  • Why is depth-first search a poor choice for this problem?

    -Depth-first search is poor because it does not guarantee finding the shortest path between nodes like breadth-first search does.

  • Where can breadth-first and depth-first search be useful outside of interviews?

    -Breadth-first and depth-first search can be useful in real-world applications like social networking, routing algorithms, and AI pathfinding.

  • What key things does Gayle recommend remembering about graph searches?

    -Gayle recommends being comfortable with breadth-first and depth-first search, understanding when to apply each one, minimizing variables, and double checking your algorithm choice.

Outlines

00:00

😊 Introducing the Shortest Reach Problem and Solution Approach

The paragraph introduces the shortest reach problem, where given a graph with labeled nodes, the goal is to return the shortest path distances from a starting node to all other nodes. It mentions that breadth-first search is a good algorithm for this since it finds shortest paths. The solution sets up data structures like a queue to track nodes and a distances array to ultimately return.

😉 Explaining and Simplifying the BFS Solution

The paragraph simplifies the initial BFS solution by removing the separate visited array, instead using the distances array for tracking visited status. It then explains how distances will be filled out in the BFS traversal. It also adds logic to process neighbors, updating distances and queue appropriately.

😁 Finalizing Solution and Demonstrating Correct Output

The paragraph finalizes the BFS solution by setting the initial distance value for the start node to 0. It then runs the code, showing that it works correctly. The speaker advises avoiding extraneous variables and choosing the right algorithm upfront.

Mindmap

Keywords

💡Breadth-first search

Breadth-first search is an algorithm for traversing or searching a graph data structure. It starts at a root node and explores all the neighboring nodes first, before moving to the next level of neighbors. In this video, breadth-first search is the perfect algorithm for finding the shortest path from a starting node to all other nodes in the graph. It allows tracking of the shortest distances.

💡Queue

A queue is a data structure that maintains the order of elements in a first-in-first-out (FIFO) manner. In a breadth-first search, a queue is used to track the next nodes to process. As nodes are visited, their neighbors get added to the back of the queue to be processed later.

💡Distances array

The distances array in the code holds the shortest calculated distance of each node from the starting node. It is initialized with -1 for all nodes. As nodes are visited in the breadth-first search, the array gets populated with actual distance values, which are ultimately returned.

💡Visited array

A visited array is a boolean array that tracks which nodes have already been visited in the search. It avoids visiting the same node multiple times. In this code, a separate visited array was created but later removed as the distances array itself served the purpose of tracking visited status.

💡Neighbors

In a graph data structure, each node has edges connecting it to other nodes, called its neighbors. In the breadth-first search code, the neighbors of each visited node are iterated through and added to the queue for future processing if not already visited.

💡Edge distance

An edge is a connection between two nodes in a graph. An edge can have an associated value or distance. In this code, all edge distances are defined to be 1, so the distance to a node is the distance to parent node + 1 for the connecting edge.

💡Start ID

The start ID refers to the starting node in the graph from which shortest paths need to be calculated to every other node. The distance of the start ID node from itself is initialized as 0 in the distances array.

💡Depth-first search

Depth-first search is another graph traversal algorithm, but it explores as far as possible along each branch before backtracking. Depth-first search would not work for this shortest reach problem because it does not necessarily find shortest paths.

💡Graph

A graph consists of nodes or vertices connected by edges. The input to this problem is a graph data structure where each node has a unique ID and connections to neighboring nodes. The algorithm performs a traversal across this graph.

💡Shortest path

The shortest path refers to the path between two nodes in a graph with the minimum number of edges. Breadth-first search visits nodes in increasing order of distance from the start node, and thus is able to calculate shortest paths.

Highlights

The shortest reach problem requires finding the shortest path from a starting node to each other node in a graph

Breadth-first search is a perfect algorithm for this problem since it is designed to find shortest paths

A queue data structure tracks the next nodes to process in breadth-first search

A boolean array tracking visited nodes is unnecessary since distances can indicate unvisited nodes

The distance to a neighbor node is the distance to the current node plus 1 for the edge

Initialize the starting node's distance to 0 in the distances array

Too many unnecessary variables overcomplicate the solution

Pick the right algorithm upfront instead of depth-first search

Be comfortable with breadth-first and depth-first search algorithms

These graph search algorithms apply to many interview questions and occasionally real-life problems

The distances array tracks the shortest path distance to each node

Initialize distances array with -1 to indicate unvisited nodes

Only need to check if a node is unvisited by seeing if its distance is -1

Update the neighbor's distance when traversing neighbors

Return the distances array at the end as the shortest path lengths

Transcripts

play00:00

Hi, I'm Gayle Laakmann McDowell, author of Cracking the Coding Interview. In this video I'm

play00:04

going to solve the shortest reach problem. In this problem we are given a

play00:08

graph and each node has a label from 0 through n minus 1, and we want to return the

play00:13

shortest path from some starting node to each of the other nodes and we're going

play00:18

to return this in an array. So breadth-first search is the perfect algorithm to start

play00:23

with here because breadth-first search is designed for finding the

play00:27

shortest path, so its perfect for that problem. Now remember in breadth first

play00:30

search that we're going to need to have some sort of queue that sort of tracks and

play00:35

maintains the next nodes to process. So let's go ahead and add that in here so

play00:40

Java we create this as a linked list, and then we initially add the root node to

play00:45

that because that's where we want to start off with.

play00:47

Now the other thing I typically use in a breadth-first search is some sort of

play00:52

boolean array that tracks what's visited. Let me go ahead and add that in

play00:58

here so thats, in this case we can just use a simple array, and that's going to

play01:01

be, of size nodes dot length. And then I also know I'm gonna have to

play01:09

have some sort of distances array in there, because that's going to be what I

play01:13

ultimately return. So I'll create as a new array, nodes dot length and I also want, if I

play01:26

don't find some node, I want to return negative 1 as the distance from the

play01:31

start ID to that node, so I'm gonna just start off with filling the entire array

play01:36

with negative 1, and then my basic process is going to be that I'm going to

play01:42

go pull off this root node so while...

play01:47

Oh, while the queue is not empty

play01:51

node, actually this is an integer, and node equals queue dot pull do some other stuff here and

play02:04

then eventually I want to go down and return distances. So let's let's go back

play02:09

to this actually. So we're going to be filling in,

play02:13

we're going to walk through our breadth first search, we're going to do this breath first

play02:15

search, and we're going to be filling in distances as we come to some particular

play02:19

node. So until we visited that node it will be negative 1 and after we

play02:24

visited it will have some value of 1 2 3 4 etc. So it's actually unnecessary

play02:29

to have a separate visited array because we can just say that visited

play02:34

essentially false if distances of that node is negative 1. So this array is

play02:40

actually unnecessary and we can just take that out. So then we're going to walk through

play02:43

each neighbor and add it to our queue. So for each neighbor in nodes of node dot

play02:53

neighbors and if it hasn't been visited yet, so remember we have to check that, so if

play03:02

distances of neighbor is still negative 1, then add it to our queue.

play03:12

Okay. But how do we ever update distances?

play03:16

Well the distance to our neighbor is just one more than the distance to us. So

play03:22

that's all we have to do. The distances of neighbor is the distances to where we

play03:29

are right now, the node plus this edge distance at right now is defined to be

play03:33

just six. And then the last thing we need to do is make sure that we have the

play03:38

right starting value for distances. So distances of the start ID, so the

play03:44

distance from myself to myself should be just 0. So now let's run the code and see

play03:50

how we did.

play03:50

Perfect. Seems to be working, so that's the basic approach to doing this problem.

play03:57

One mistake that a lot of candidates will make in this problem is

play04:01

just having way too many variables in there, so they'll throw in distances and

play04:04

they'll have a visited array also and all this other stuff. And so really helps before you

play04:10

start writing code to understand exactly what you're about to do and think about

play04:14

what's actually necessary and what's not necessary. And then obviously remember

play04:18

make sure you're picking the right algorithm to start with, so, depth-first search

play04:22

would not be a good choice for this problem. So keep that in mind, be

play04:26

comfortable doing breadth-first and depth first search, it's a really useful

play04:30

fundamental that comes up in a lot of interview questions, as well

play04:34

as occassionally real life as well. So keep that in mind and good luck.