Programming Illustration : Emergence of Connectedness
Summary
TLDRThis script explores the surprising result that a graph with n nodes becomes connected with just n log n edges when added randomly. It demonstrates through coding in Python using NetworkX how to create a graph, add nodes and random edges, and determine the number of edges needed for connectivity. The script also includes a visual comparison of the actual number of edges required against the theoretical n log n, showing that while not exact, the number of edges needed is approximately of that order, thus validating the claim.
Takeaways
- 📈 The script discusses a surprising result in graph theory, stating that a graph with n nodes can be made connected with just n log n edges.
- 💡 The concept is demonstrated with a practical experiment using a screencast to write code that tests this theory with random graphs of varying sizes.
- 🔍 The code involves creating a graph with a specified number of nodes and then adding random edges one by one until the graph becomes connected.
- 🛠️ The script includes a step-by-step guide on how to write the code in Python using the NetworkX library to handle graph operations.
- 📝 A function named 'add_nodes' is defined to add a specified number of nodes to a graph, initializing the process of making the graph connected.
- 🔗 The 'add_random_edge' function is introduced to add a single random edge to the graph, helping in the gradual connection of the graph.
- 🔄 A loop is used to continuously add random edges to the graph until the 'is_connected' function from NetworkX returns true, indicating the graph is fully connected.
- 📊 The script suggests plotting the relationship between the number of nodes and the number of edges required for connectivity to visualize the results.
- 📈 The plot shows a curve that, with averaging over multiple instances, becomes smoother and more indicative of the theoretical n log n relationship.
- 🔍 The script questions whether the number of edges required is exactly n log n, and adjusts the plot to compare the actual data with the theoretical n log n curve.
- 🧩 Finally, the script concludes that while the exact number of edges may not be n log n, it is of the order of n log n, supporting the initial claim with empirical data.
Q & A
What is the surprising result discussed in the script about graph connectivity?
-The surprising result is that on a graph with n nodes, only n log n number of edges are needed to make the graph connected, allowing a path from any node to every other node in the network.
What is the purpose of the 'connectivity.py' file in the script?
-The 'connectivity.py' file contains the code necessary for creating a graph, adding nodes, and adding random edges to the graph until it becomes connected.
What is the role of the 'add_nodes' function in the script?
-The 'add_nodes' function initializes a graph with a specified number of nodes using the NetworkX library, which is essential for setting up the graph for further operations.
How does the 'add_random_edge' function work in the script?
-The 'add_random_edge' function selects two random vertices from the graph and adds an edge between them, provided the vertices are not the same to avoid self-loops.
What is the purpose of the 'add_till_connectivity' function?
-The 'add_till_connectivity' function continuously adds random edges to the graph until the graph becomes connected, demonstrating the process of achieving graph connectivity.
How does the script verify if the graph is connected?
-The script uses the 'nx.is_connected' function from the NetworkX library to check if the graph is connected after adding edges.
What is the significance of the 'create_instance' function in the script?
-The 'create_instance' function encapsulates the process of creating a graph with a certain number of nodes and determining the number of edges required to connect the graph.
What does the 'plot_connectivity' function do in the script?
-The 'plot_connectivity' function plots the relationship between the number of nodes and the number of edges required for connectivity, providing a visual representation of the graph connectivity process.
Why are there spikes in the initial plot of the script?
-The spikes in the initial plot occur due to the randomness of the experiment. Each execution can yield slightly different results, causing fluctuations in the number of edges required for connectivity.
How does averaging the results over multiple instances affect the plot?
-Averaging the results over multiple instances smooths out the plot by reducing the impact of random fluctuations, providing a more consistent and reliable representation of the data.
What does the script conclude about the relationship between the number of nodes and edges required for connectivity?
-The script concludes that the number of edges required to connect a graph is of the order of n log n, which aligns with the initial surprising result discussed.
Outlines
😲 Introduction to Graph Connectivity Theory
This paragraph introduces a surprising graph theory result: a graph with n nodes can be made connected with just n log n edges. The speaker plans to demonstrate this through a Python screencast, using the NetworkX library to create random graphs and observe the number of edges needed for connectivity. The coding process begins with creating a 'connectivity.py' file and importing necessary packages, followed by defining a function to add a specified number of nodes to a graph.
📝 Coding the Graph Node Addition and Random Edge Functions
The speaker proceeds with the coding tutorial by defining a function to add nodes to a graph and another to add a single random edge. The 'add_nodes' function utilizes NetworkX's built-in capabilities to add a list of nodes to a graph, while the 'add_random_edge' function selects two random nodes and adds an edge between them unless they are the same, to avoid self-loops. The paragraph details the code implementation, including handling a minor spelling error, and verifying the addition of nodes and edges in the graph.
🔁 Automated Edge Addition for Graph Connectivity
The tutorial continues with the creation of a function 'add_till_connectivity' that iteratively adds random edges to a graph until it becomes connected. The process uses a while loop with NetworkX's 'is_connected' function to check for connectivity. The speaker demonstrates the function by adding edges to a graph with ten nodes and observing the number of edges required for connectivity, which in this case, is sixteen.
🌐 Generalizing the Connectivity Process for Different Node Counts
The speaker aims to generalize the connectivity process by creating a 'create_instance' function that takes the number of nodes as input and returns the number of edges needed for connectivity. This function encapsulates the previous steps, including adding nodes and edges, and is used to test different node counts, such as ten, twenty, and thirty, to observe the pattern in the number of edges required.
📈 Plotting and Analyzing Graph Connectivity Data
To visualize the relationship between the number of nodes and edges required for connectivity, the speaker introduces a 'plot_connectivity' function. This function plots the number of edges against the number of nodes for a range of node counts. The resulting plot shows a variable number of edges required, with spikes due to the random nature of the experiment. The speaker suggests averaging the results over multiple instances to smooth out the plot.
🔍 Refining the Connectivity Experiment and Comparing with n log n
The speaker refines the experiment by averaging the connectivity results over multiple instances to obtain a smoother curve. They then compare the empirical data with the theoretical n log n relationship, adjusting the plot to show n log n divided by two to better match the observed data. The adjusted plot demonstrates that the number of edges required for connectivity is of the order of n log n, confirming the initial theoretical claim.
Mindmap
Keywords
💡Graph Theory
💡Connected Graph
💡Random Edges
💡NetworkX
💡Node
💡Edge
💡Logarithmic Growth
💡Python
💡Code Execution
💡Plotting
💡Averaging
Highlights
A graph with n nodes can be made connected with just n log n edges.
Demonstration of a screencast coding session to explore graph connectivity.
Introduction to the NetworkX library for graph operations in Python.
Creating a function to add a specified number of nodes to a graph.
Explanation of how to add random edges to a graph to achieve connectivity.
Utilization of the random.choice function to select nodes for edge creation.
Ensuring no self-loops are created when adding edges between nodes.
Development of a function to add edges until the graph becomes connected.
Use of nx.is_connected to check the connectivity status of a graph.
Plotting the relationship between the number of nodes and edges for connectivity.
Observation of the graph connectivity process through random experimentation.
Averaging results over multiple instances to smooth out the connectivity curve.
Comparing the experimental connectivity curve with the theoretical n log n curve.
Adjusting the theoretical model to n log n / 2 for a closer fit to experimental data.
Final assertion that the number of edges required is of the order of n log n, confirming the initial claim.
Practical application of graph theory concepts through coding and visualization.
Importance of random experimentation in verifying theoretical predictions in graph theory.
Transcripts
We just now looked at a very surprising and interesting result which says that on a graph
if we have n nodes then one just needs n log n number of edges to make this graph connected
that is if we take n nodes and we randomly put n log n edges in this graph the graph
becomes connected so you can start at any node in this network and from this node you can find
a path to every other node in the network. Now we will be doing a screencast for it we
will be writing a piece of code in which we will take random graphs of different different
sizes and then we will look at the number of edges required to make this graph connected so
we will do this we will look at the plot lets get started let's start from very basic coding
so what we want to do is we want to have some nodes in the graph and then we add random edges
in this network one at a time and our aim is to see after how after addition of how many edges
does this network becomes connected. So we first create a file for this. So let's open a new file
in notepad lets name it as connectivity dot py. So here we have a file oh we first of all import
our basic packages which we will need import networkx as nx. See I think its sufficient for
the time being ok so what we do we first define a function to add some specified number of nodes in
the network so the function already exists in networkx you all know but just for the sake of
simplicity and clarification I will again make a function here and lets name this function as add
nodes and you pass a number n as a parameter to this function so what this function is going to
do is it will create a graph g equals two nx dot graph and add n number of nodes in
this graph so we can do g dot I hope you remember the function add nodes from so this function adds
the nodes in the graph from a list and the list we give here so the list is from number zero to
number n minus one so n nodes get added to this graph and we finally return the graph simple
function we will just keep commenting everything side by side so we comment it add n number of
nodes in the graph and return it perfect so we will save it and lets come back to our window
so what we need to do now is import this file we import connectivity. So this gets imported let's
call a function to call a function we have the syntax connectivity dot function name our function
name is add underscore nodes and we had let's say we want to add ten nodes in the graph you
execute it so we get a graph here. So our function return the graph. So let's collect the output in a
graph name g so g equals to connectivity dot add underscore nodes ten now I want to see whether
ten nodes have been actually added in the graph or not so we can see that using the function g
dot number of nodes you see that there are ten nodes in this graph next we will write a function
for adding one random edge to this network. So if you see currently in this network there are
just ten nodes there are no edges so this network is disconnected completely disconnected. So to
check that we have a command nx dot is connected. Let say you want to see whether this graph is
connected or not lets call ok so before using ok nx dot is connected parse the graph g here which
shows us false. If the graph is not connected ok now one by one we add edges to this network. So
first we write a code for adding one random edge to this network we go back to our file which was
connectivity dot py and now we add a second function here and the function is for add one
random edge in this network and one random edge. Lets name the function as add random edge and it
takes us input the graph g graph g which has been already created with no edges so we pass that
graph g actually you can pass a graph g with the edges also that's up to us so the function takes
as input one in graph and adds one extra random edge to this graph define add underscore random
edge g now we need two random vertices from the network so how do we pick a random vertex so to
put to pick a random entry we need one package name random so we have discussed about it before
in the first week what does it do we are going to use the function random dot choice first vertex
v one so this is the first random vertex which we want to pick it is random dot choice random
dot choice function takes as the input one list and outputs as one random value from that list.
So the list which we are going to pass here is g dot nodes that is the list of nodes in
the network. So v one gets first random node from this network lets choose another node
v two v two equals to another random node random dot choice and again we have here
g dot nodes we got two random vertices now, and we want to create an edge between them,
but there might be a case where v one and v two might be same nodes so we would have picked both
the random nodes to be same although thats a very rare case that might happened but we do not want
that to happen because we do not want any self loops in our graph so we just put a condition
here if v one is not equals to v two only then we add an edge that is this two vertices should be
separate so if v one is not equals to v two we go ahead and add an edge g dot add edge from v
v one to v two after adding this edge we return a graph g save it we come back to our console
lets now we not need to reload our file so we reload connectivity since we have made changes
to it reload connectivity now the spelling is correct reload connectivity ok so it is
reloaded we already have a graph here which has ten nodes so what do we do now is g equals to
connectivity dot and we call our function add random edge on the graph g itself ok seems to
be some problem so its a name error which says that global name ok there is a spelling mistake.
So let's go back and correct it if we see here yeah just small spelling error we correct it save
it come back reload it and call the function again it works fine, so we have we now one
random edge should have been added in a graph. So lets see it lets look at the list of edges in
a graph so we do g dot edges and we can see here that a random edge and edge has been
added from two vertices zero to seven so we have written the code till now for adding a specific
number of vertices in the graph and adding one random edge what is our aim our aim is to keep
adding these random edges in the network till our network becomes connected and then to look at how
many edges were these which made this network connectedfor doing that we go back to our code
and lets create another function here what is function does is it add random edges let rather
say keep keeps adding random edges in g till it becomes connected and lets name this function
as add till connectivity add till connectivity and it takes the graph g as input so what it
is going to do till the graph doesnt become sorry till the graphs become connected .
So while nx dot is connected g so this is a function in network x which returns true if the
graph is connected false if it is not connected so while n x dot is connected g is equals to false
it keeps running and what does it do it adds one random edge to the graph for graph g becomes add
random edge g so one random edge is added to this graph and it keeps adding more random edges till
the graph becomes connected and at the end lets return the graph. So lets look at what this code
is doing again we reload connectivity and now what we are going to do is g equals to connectivity dot
add till connectivity. So lets see how many how many edges it adds so we already had one edge
which we have added previously and we pass the graph g here and lets now look at g dot edges
So we see that many edges have been added to this network to be precise the number
of edges added is sixteen. So its seems like in a graph having sixteen nodes just sixteen edges
are required to make this graph connected. Lets also quickly check whether its connected or not
so nx dot is connected g we can say that it is we can see that it is connected it is true .
Till now what we have done is we have written a small piece of code which takes a graph as
input keeps adding edges to this graph till the graph becomes connected next what we want
to do is to repeat this procedure for different number of nodes so here for example in this we
did it for ten nodes and we looked at that the number of edges required was sixteen now want
to do it for twenty look at the number want to do for thirty look at the number and so on.
So for doing that first of all let's create a new function here so this function is going to
simplify what all we have written here lets name the function create creates an instance
of the entire process so what do i mean here by instance of an entire process it it takes
as input so the function will simply take as input number of nodes and returns the number
of edges required for connectivity so it will make our task easier so we will just input a
number and a number of nodes and it will output us it will tell us how many edges are required
to connect a graph having these many nodes. So let's define this function lets name it as
create instance and whatever is required inside the code for this function we have already done
that so define create underscore instance and lets pass a number and here what we do next is first
of all we have to add nodes so g equals to add underscore nodes and we add n nodes so we called
the first function which we defined previously so whatever we have done till now in i python
that only we are replicating here in the file. So we first call this function add underscore
nodes and which returns us the graph and next we can simply call g equals true keep adding till the
graph gets connected so add till connectivity we again pass here g and it will return us what it
will return us the number of edges return g dot number of edges ok lets see how does it how is
it working reload the file and lets call create underscore instance and lets pass the number ten
here ok so it is in our file connectivity dot py so connectivity dot create underscore instance
ten so we see that if there are ten nodes we need twelve edges lets make it twenty nodes we
need twenty seven edges we have thirty nodes it is fifty edges and so on and what we are interested
in i want to plot this and visualize how does this plot looks like so we just needs to we just need
to plot what we have done so for plotting it we go back to the file we can define a function here
so what is this function do plot the desire let say we know what is desired so desired is the
number of edges required for connectivity plot the desired for different number of
edges and lets define a function and lets call it plot and lets call it plot connectivity ok
lets define this function plot connectivity and it doesnt require as input anything we are not taking
any input for simply for different number of nodes will be plot it so for plotting always we need the
arrays corresponding to a two axis x axis and y axis so lets the array associated with x axis be
x and the array associated with y axis be y so we have here x and y then we take a number i equals
to ten so i what does i tells us i is the number of nodes for which we are experimenting so i is
the number of nodes while and lets lets take the number of nodes till hundred or lets rather make
it four hundred so while i is less than or equals to four hundred what is loop will do in x axis we
append the number of nodes which is nothing but i and in y axis we can append or do we append
the number of edges required to connect these many nodes which we can get from our function
create instance so y dot append create instance and we pass i here and we need lets increment i
with ten every time i equals to i plus ten. Now we need to plot this for plotting.This we
need the package matplotlib we import matplotlib dot pyplot as plt come back ok. So what do we do
plt dot plot what we want to plot is x against y and plt dot show. So lets just do a little bit
of decoration here. Before plotting it lets add some labels and the title for the curve. Let's
say plt dot x label, which is the label for the x axis is nothing but the number of nodes plt
dot y label label for the y axis is the number of edges required to connect the graph and plt dot
title. Let's give a title to this curve and lets title it as emergence of connectivity it done.
Lets look at how does this plot look like save the file we will reload it and then
we call connectivity dot plot connectivity I will take some time to execute yes so here we
can see the curve here we have the number of nodes here we have the number of edges and we
can see that as the number of nodes increases the number of edges required to connect it also
increases although we see that although we see this plot where number of nodes with number of
edges we see a lot of spikes in this curve. So why are these spike occurring? It's because
our experiment is a random experiment which gives a different output every time you
execute it. Whenever you execute a randomrandom experiment, it will give you a slightlyslightly
different outputs. So one time it can throw a value three other time five other time four,
but if you take the expected value it gives you a better estimate. So to remove this spikes in the
plot instead of taking one value corresponding to one instance we might want to average it out
so lets how can we do that we go back so what are we doing here is corresponding to one value
of I that is corresponding to one particular instance given a particular number of nodes
stand here we execute this code only one time. So we take a graph having ten nodes we will look
at how many edges are required to connect it. And then we return the number of edges what will give
us a better result. If we do it multiple times so we take a graph having ten nodes then look at the
number of edges we again do this again do this do it ten times and take an average over all those
experiments that will give us a better estimate so lets try doing that so doing that is very easy
so in this create instance so create underscore instances for one instance you give it one number
of nodes and it gives you number of edges. Let's average it over some particular number of
instances lets average it over four instances. So n, see what I am going to do here is we define a
new function n and let's name it create average instance n and what it does is, it call the
function create underscore instance four times and takes an average over them. So we create a
list here list one and for i in range zero to four so its repeated four times what does it do is list
one dot append what does it append to list one is create underscore instance n. So it appends
it to list and what it returns is the average so it return numpy dot average of list one.
so we have yet node imported numpy so lets import numpy here import numpy so its returns
numpy dot average list one so what we are going to do iswhile plotting in plotting connectivity
instead of using creating underscore instance now we create we use create underscore average
underscore instance so the value is averaged over four particular instances now lets look
at the change in the curve it creates we reload connectivity and we again call this function
so since it is doing it four times, it takes some time, and you can see you can say that
the difference in two curves is visible this curve is more smoother because we have averaged out the
values over four instances where it was only one instance in this case lets play around with it a
little bit further and lets change this four to let say ten we do an average over ten instances
we make this four as ten we reload it and we call the function again so let's wait for the output.
So again you can see a visible change in the curve it gets all the more smooth so you can
change this value from ten to twenty to thirty two hundred and the curve will become more and
more smooth. So we can try it for higher values now the final and the biggest question what was
our claim was that if there are n nodes in the network, it takes just n log n number of edges
to connect this network. The question is are these number of edges n log n? So if I look at
this curve is it the curve of n log n oh lets write a piece of code and check that as well.
So what I am going to do is I am going to draw this plot. So I am going to have two plots in
the same figure. One plot is the plot which we have achieved from our code which is the actual
data and lets plot it against n log n and look at what is the result. So I want a very smooth
curve so for a very smooth curve what I do is I average it to hundred instances. So I average it
to hundred instances, and now I want to plot it against the n log n curve so what I do is
after doing this before plotting I do exactly the same procedure which is written above,
but here I am going to plot n log n. So whatever I am going to do here is simply a
replication of the above code with a little bit of change and you can write it as a separate function
as well and I am writing it here so I take a new array x one and a new array y one and then again
I set i equals to ten which is the number of nodes and while i is less than or equals
to four hundred what I do is x dot append x means simply i append i y dot append is the curve i want
and the curve which i want is i multiplied by log i ok sorry its x one dot append and its y one dot
append i into numpy dot log i and then i increment i with ten and i do plt dot plot x one y one so
its going to show me two plots in the same figure one is for xy which is the actual data number of
nodes and the number of edges required to connect them and second is the x one y one which tells me
a number on the x axis and if the number is i it tells me i log i on the y axis and again
reload the file and lets run it and lets give it some time to execute since it is averaging over
hundred instances it will take some decent amount of time ok the code here ends within a error and
next again a spelling mistake lets go back and correct it so just correct it here and what
we do here isI have added an extra statement here print i which keeps showing us how many
iterations has the code complete it. So for the time being let's make
this i two hundred since for four hundred it takes a lot of time so let's see how
does it work for two hundred but if we see a code below. So for the second plot it is
still plotting till the till four hundred nodes. So we also change this and make it
here two hundred and now we just save this code and let's run it so let's
reload our file we call a function connectivity dot plot connectivity.
So here you can see two lines in this plot one is the blue line which corresponds to our data where
we took different different number of nodes and looked at the number of edges required to
connect the corresponding network and the red line it corresponds to the n log n plot so the
linesare not totally overlapping in this case. So here I would like to remind you that according to
the analysis that we have done the number of edges required to connect the graph was not
exactly n log n it is of the order of n log n. So let's do a small a little bit of tweak in our
code lets go back to a notepad file and what I am going to do here is instead of plotting n
log n. What I plot here is n log n divided by two so please note that even if I divide this
term n log n by two in the final analysis the number of edges required to connect the
network still remains of the order of n log n so lets try doing that. So what I do here is
I divide the term log n by two and let's see how our plot changes. Let's plot it.
So here you can see that both the curves become very close to each other. Establishing the fact
that its essentially n log n by two number of edges; almost of the order of n log n which are
required to make our graph connecting hence satisfying a previous claim.
Ver Más Videos Relacionados
Introduction : Emergence of Connectedness
Algorithm Design | Approximation Algorithm | Vertex Cover Problem #algorithm #approximation
Projeto e Análise de Algoritmos - Aula 09 - Limite inferior para o problema de ordenação
3.5 Prims and Kruskals Algorithms - Greedy Method
The Lazy Way to Cut Pizza - Numberphile
AQA A’Level Graphs
5.0 / 5 (0 votes)