Datasets: Analysing Using Networkx

Social Networks
30 Jul 2017 | 37:13

Summary

TLDR: This tutorial video shows how to analyze network datasets with the Python package NetworkX. It covers reading network files in gexf, edgelist, .net and .paj (Pajek), gml, and graphml formats into NetworkX graph objects. It then demonstrates basic network analysis: obtaining network information, visualizing networks with matplotlib, and exploring properties such as degree distribution, density, clustering coefficient, and diameter.

Takeaways

  • 😀 The video demonstrates how to analyze various network datasets using the Python package 'networkx'.
  • 📁 It covers six network datasets in formats such as gexf, edgelist, .net and .paj (both Pajek), gml, and graphml.
  • 🛠️ The tutorial starts by showing how to import the 'networkx' package along with 'matplotlib' for network visualization.
  • 📚 The first dataset analyzed is a Facebook network in edgelist format, using the 'read_edgelist' function from 'networkx'.
  • 🔍 Basic network information like the number of nodes, edges, and whether the network is directed or not, is obtained using the 'info' function.
  • 📈 The script explains how to read different network formats into a 'networkx' graph object using format-specific functions, such as 'read_pajek' for the .net format.
  • 🌐 Visualization of networks is showcased using the 'draw' function from 'networkx' and 'show' from 'matplotlib'.
  • 📊 The video describes how to plot the degree distribution of a network, indicating the number of nodes with a particular degree.
  • 📉 A log-log plot of the degree distribution is introduced to identify if the network follows a power law distribution.
  • 🔢 The concept of network density is explained, which helps determine if a network is sparse or dense based on the ratio of actual to possible edges.
  • 💡 The clustering coefficient is discussed, illustrating how to calculate it for nodes and find the average clustering coefficient for the entire network.
  • 📏 The diameter of a network, which is the longest shortest path between any two nodes, is calculated to understand network connectivity.

Q & A

  • What is the main topic of the video?

    -The main topic of the video is analyzing different network datasets using the NetworkX package in Python.

  • How many network datasets were downloaded in the previous video?

    -Six network datasets were downloaded in the previous video.

  • What are the different formats of the network datasets mentioned in the video?

    -The different formats mentioned are gexf, edge list, .net (equivalent to Pajek), gml, .paj (Pajek), and graphml.

  • Which software is used for editing the Python file in the video?

    -Sublime Text is used for editing the Python file in the video.

  • What function from the NetworkX package is used to read an edgelist format network?

    -The function used to read an edgelist format network is `nx.read_edgelist`.

  • What basic information does the 'info' function from NetworkX provide about a network?

    -The 'info' function provides basic details such as the number of nodes, number of edges, and the average degree of the network.

  • How can one determine if a network is directed or not using NetworkX?

    -One can determine if a network is directed by using the function `nx.is_directed` and passing the graph object as a parameter.

  • What is the purpose of the 'read_pajek' function in NetworkX?

    -The 'read_pajek' function is used to read networks stored in .net or .paj files, both of which use the Pajek format, into a NetworkX graph object.

  • How can the 'draw' function visualize a network in NetworkX?

    -The 'draw' function visualizes a network by plotting it. It requires the graph object as a parameter and uses matplotlib's 'show' function to display the graph.

  • What is a degree distribution in the context of network analysis?

    -Degree distribution refers to the measure that shows the number of nodes in a network that have a particular degree, providing insight into how connectivity is distributed among the nodes.

  • How can one plot the degree distribution of a network in NetworkX?

    -One can plot the degree distribution by first obtaining the degrees of all nodes, then counting the occurrences of each degree, and finally using matplotlib to create a plot with unique degrees on the x-axis and the count of nodes for each degree on the y-axis.

  • What does a log-log plot of degree distribution indicate about a network?

    -A log-log plot of degree distribution can indicate if a network follows a power law distribution. If the plot forms a straight line, it suggests that the network has a power law degree distribution, meaning a few nodes have very high degrees while most have very low degrees.

  • What is the significance of the clustering coefficient in network analysis?

    -The clustering coefficient indicates the degree to which nodes in a network cluster together. It measures the likelihood that two nodes connected to a common node are also connected to each other, reflecting the presence of cliques or groups within the network.

  • How is the diameter of a network calculated?

    -The diameter of a network is calculated as the length of the longest shortest path between any two nodes in the network, essentially representing the greatest distance one must travel to reach any node from another.

  • What does the density value of a graph represent?

    -The density value of a graph represents the ratio of the actual number of edges to the maximum number of possible edges in the graph, indicating how sparse or dense the graph is.
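The log-log check described in the Q & A above can be sketched as follows. This is a minimal illustration, not the video's code: it assumes a synthetic Barabási-Albert graph (chosen only because its degree distribution is heavy-tailed), the built-in `nx.degree_histogram` helper, and a hypothetical output file name. A roughly straight line on log-log axes is suggestive of a power law rather than proof of one.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

# Assumption: a synthetic preferential-attachment graph stands in for a real dataset.
G = nx.barabasi_albert_graph(1000, 2, seed=1)

# hist[k] is the number of nodes whose degree is exactly k.
hist = nx.degree_histogram(G)

# Keep only degrees that actually occur; degree 0 cannot be drawn on a log axis.
degrees = [k for k, count in enumerate(hist) if k > 0 and count > 0]
counts = [hist[k] for k in degrees]

fig, ax = plt.subplots()
ax.loglog(degrees, counts, marker="o", linestyle="none")
ax.set_xlabel("degree")
ax.set_ylabel("number of nodes")

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "degree_loglog.png")  # hypothetical file name
    fig.savefig(out_path)
    saved_bytes = os.path.getsize(out_path)
plt.close(fig)
```

In an interactive session, `plt.show()` can replace the `savefig` call to open the plot window used in the video.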

Outlines

00:00

📚 Introduction to Network Analysis with NetworkX

The speaker introduces network analysis with the Python package NetworkX. They review the six network datasets downloaded in the previous video, in formats including gexf, edgelist, .net and .paj (Pajek), gml, and graphml. The speaker outlines how these datasets are analyzed with NetworkX, with matplotlib for visualization, starting with the Facebook combined edgelist network, and uses the 'read_edgelist' function to load it into a graph object. Basic network information such as the number of nodes and edges and the graph type (directed or undirected) is then extracted with the 'info' function.
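The edge list workflow described above might look like the sketch below. It assumes a tiny made-up edge list written to a temporary file in place of the Facebook dataset, and it computes the summary values directly, because `nx.info` (used in the video) was removed in NetworkX 3.0.

```python
import os
import tempfile

import networkx as nx

# Hypothetical stand-in for the Facebook edge list: one "u v" pair per line.
edge_lines = ["0 1", "0 2", "1 2", "2 3"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_edgelist.txt")  # hypothetical file name
    with open(path, "w") as fh:
        fh.write("\n".join(edge_lines) + "\n")

    # read_edgelist returns an undirected Graph by default; node labels are strings.
    G = nx.read_edgelist(path)

n_nodes = nx.number_of_nodes(G)
n_edges = nx.number_of_edges(G)
directed = nx.is_directed(G)

# Each undirected edge contributes to the degree of two nodes.
avg_degree = 2 * n_edges / n_nodes

print(type(G).__name__, n_nodes, n_edges, directed, avg_degree)
# prints: Graph 4 4 False 2.0
```

On NetworkX 3.x, `print(G)` gives a one-line summary of the graph type with node and edge counts, which covers most of what `nx.info(G)` reported.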

05:07

🔍 Exploring Different Network Formats with NetworkX

This section covers reading the remaining network formats with NetworkX. The speaker reads the football network in .net (Pajek) format with the 'read_pajek' function and notes that it loads as a MultiDiGraph, that is, a directed graph that allows multiple edges between nodes. The same function reads the karate network in .paj format, which loads as an undirected MultiGraph with 34 nodes and 78 edges. The section concludes with the graphml and gexf formats, highlighting the directed nature of the Wikipedia network and the use of the 'read_graphml' and 'read_gexf' functions to load these networks into graph objects for further analysis.
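The readers mentioned above can be tried without the video's datasets by writing a small graph to each format and reading it back. This is a sketch under stated assumptions: the toy graph and the file names are made up for illustration, while the `read_*` and `write_*` functions are the NetworkX ones.

```python
import os
import tempfile

import networkx as nx

# Hypothetical toy graph; the video reads football.net, karate.paj and so on instead.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

with tempfile.TemporaryDirectory() as tmp:
    pajek_path = os.path.join(tmp, "toy.net")
    gml_path = os.path.join(tmp, "toy.gml")
    graphml_path = os.path.join(tmp, "toy.graphml")
    gexf_path = os.path.join(tmp, "toy.gexf")

    # Write the same graph in four formats.
    nx.write_pajek(G, pajek_path)
    nx.write_gml(G, gml_path)
    nx.write_graphml(G, graphml_path)
    nx.write_gexf(G, gexf_path)

    # Read each file back with the matching reader.
    from_pajek = nx.read_pajek(pajek_path)
    from_gml = nx.read_gml(gml_path)
    from_graphml = nx.read_graphml(graphml_path)
    from_gexf = nx.read_gexf(gexf_path)

# read_pajek returns a MultiGraph (or MultiDiGraph for directed data),
# which matches the "multi graph" type reported in the video.
pajek_is_multi = from_pajek.is_multigraph()

# Collapsing to a simple Graph merges any parallel edges.
simple = nx.Graph(from_pajek)

for name, g in [("pajek", from_pajek), ("gml", from_gml),
                ("graphml", from_graphml), ("gexf", from_gexf)]:
    print(name, type(g).__name__, g.number_of_nodes(), g.number_of_edges())
```

Converting the Pajek result with `nx.Graph(...)` is a design choice for this sketch: some algorithms, such as `nx.clustering`, are not defined for multigraphs.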

10:07

🖼️ Visualizing Networks in NetworkX

The speaker introduces the concept of visualizing networks using NetworkX. They demonstrate how to use the 'draw' function along with matplotlib's 'show' function to visualize the karate network in gml format. The paragraph showcases the interactive features of the visualization, such as moving and zooming into the graph. The speaker also discusses different layout options available in NetworkX, such as circular, spectral, and spring layouts, and how they can be applied to visualize the network differently. The paragraph concludes with a brief mention of saving the visualized graph as an image file.
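The drawing calls and layouts described above can be sketched as follows. The built-in `nx.karate_club_graph()` is an assumption standing in for the video's karate file, the output file name is hypothetical, and the figure is saved rather than shown so that the sketch also runs without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import networkx as nx

# Assumption: the built-in karate club graph stands in for the video's karate.gml file.
G = nx.karate_club_graph()

# A fixed seed makes the spring layout reproducible between runs.
pos = nx.spring_layout(G, seed=42)

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
nx.draw(G, pos=pos, ax=axes[0], with_labels=True, node_size=150, font_size=6)
nx.draw_circular(G, ax=axes[1], with_labels=True, node_size=150, font_size=6)
nx.draw_spectral(G, ax=axes[2], with_labels=True, node_size=150, font_size=6)
for ax, title in zip(axes, ["spring", "circular", "spectral"]):
    ax.set_title(title)

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "karate_layouts.png")  # hypothetical file name
    fig.savefig(out_path)
    saved_bytes = os.path.getsize(out_path)
plt.close(fig)
```

With an interactive backend, dropping the `matplotlib.use("Agg")` line and calling `plt.show()` opens the pan and zoom window demonstrated in the video.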

15:10

📊 Analyzing Degree Distribution in Networks

The speaker discusses the concept of degree distribution in networks, explaining how it represents the number of nodes with a specific degree. They provide a step-by-step explanation of how to calculate the degree distribution for the karate network using NetworkX. This includes using the 'degree' function to obtain a dictionary of node degrees, converting this into a set to find unique degrees, and then counting the occurrences of each degree to create a distribution list. The speaker also describes how to plot this distribution using matplotlib and suggests that real-world networks often exhibit a power-law degree distribution, which is illustrated in the plot of the karate network.
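The counting steps described above can be sketched as a small function. Two deviations from the video are assumptions of this sketch: in NetworkX 2.0 and later, `degree` returns a view of (node, degree) pairs rather than a dictionary, so `.values()` is replaced by a comprehension; and the unique degrees are sorted, since a set has no guaranteed order. The built-in karate club graph stands in for the video's file.

```python
import networkx as nx


def degree_distribution(G):
    """Return (unique_degrees, count_of_degrees) for graph G."""
    # One degree value per node.
    all_degrees = [d for _, d in G.degree()]
    # Sorting gives a well-ordered x axis when the result is plotted.
    unique_degrees = sorted(set(all_degrees))
    # For each distinct degree, count how many nodes have it.
    count_of_degrees = [all_degrees.count(k) for k in unique_degrees]
    return unique_degrees, count_of_degrees


# Assumption: the built-in karate club graph stands in for the video's karate file.
G = nx.karate_club_graph()
unique_degrees, count_of_degrees = degree_distribution(G)

for k, c in zip(unique_degrees, count_of_degrees):
    print(k, c)
```

Passing the two lists to `plt.plot(unique_degrees, count_of_degrees)` followed by `plt.show()` reproduces the plot described in the video.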

20:17

📉 Examining Network Density and Properties

The speaker explores network density, which indicates whether a network is sparse or dense by comparing the actual number of edges to the maximum possible number of edges. They explain how to calculate the density with the 'density' function in NetworkX and provide examples, including a complete graph and an empty graph. The speaker also touches on additional network properties such as the clustering coefficient, which measures how densely the neighbors of a node are connected to one another, and the diameter, which is the longest shortest path between any two nodes in the network. The section concludes with a brief discussion of how to calculate these properties using NetworkX.
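The density examples mentioned above can be checked with a short sketch. The graph sizes are arbitrary choices for illustration, and the built-in karate club graph is an assumption standing in for the video's dataset.

```python
import networkx as nx

complete = nx.complete_graph(5)   # every possible edge is present
empty = nx.empty_graph(5)         # five nodes, no edges
karate = nx.karate_club_graph()   # assumption: built-in stand-in for the video's file

d_complete = nx.density(complete)
d_empty = nx.density(empty)
d_karate = nx.density(karate)

# Manual check for an undirected graph: actual edges over possible edges.
n = karate.number_of_nodes()
m = karate.number_of_edges()
manual = 2 * m / (n * (n - 1))

print(d_complete, d_empty, round(d_karate, 3), round(manual, 3))
```

A value close to 0 indicates a sparse graph and a value close to 1 a dense one; the karate club network, at roughly 0.14, sits toward the sparse end.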

25:17

🔬 Further Network Analysis Techniques

In this paragraph, the speaker continues the discussion on network analysis, focusing on the calculation of the clustering coefficient for individual nodes and the average clustering coefficient for the entire network. They demonstrate how to use the 'clustering' function in NetworkX to obtain these values and emphasize the significance of the clustering coefficient in understanding the interconnectedness within a network. The speaker also revisits the concept of network diameter, explaining its importance in gauging the overall connectivity of a network. The paragraph concludes with a practical example of how to determine the diameter of a network using NetworkX, highlighting the efficiency of real-world networks in reducing the distance between nodes.
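The clustering and diameter calculations described above can be sketched as follows, assuming the built-in karate club graph in place of the video's file. Note that `nx.diameter` raises an error on a disconnected graph, so the sketch checks connectivity first.

```python
import networkx as nx

# Assumption: the built-in karate club graph stands in for the video's karate file.
G = nx.karate_club_graph()

# Per-node clustering coefficient: a dict mapping node -> value in [0, 1].
per_node = nx.clustering(G)

# Mean of the per-node values over the whole network.
avg = nx.average_clustering(G)

# The diameter is only defined when every node can reach every other node.
connected = nx.is_connected(G)
diam = nx.diameter(G) if connected else None

# Two tiny sanity checks on graphs whose answers are known by construction.
triangle_clustering = nx.clustering(nx.complete_graph(3))  # all neighbors are linked
path_diameter = nx.diameter(nx.path_graph(4))              # end to end takes 3 hops

print(round(avg, 3), diam, triangle_clustering, path_diameter)
```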

Keywords

💡NetworkX

NetworkX is a Python package used for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. In the video, it is the primary tool used to analyze various network datasets, demonstrating its functions for reading different file formats and visualizing networks.

💡Datasets

Datasets refer to a collection of data, often used for analysis. In the context of the video, the datasets are networks stored in various formats, namely gexf, edgelist, .net and .paj (Pajek), gml, and graphml, which are analyzed using NetworkX.

💡Edgelist Format

Edgelist format is a way of representing a graph, where each edge (or connection between nodes) is listed explicitly. The video describes using the 'read_edgelist' function from NetworkX to import a Facebook network dataset in this format.

💡Graph Object

A graph object in NetworkX is a data structure that represents a graph in memory, allowing for the application of various graph-theoretic functions and algorithms. The video explains how different network datasets are converted into graph objects to facilitate analysis.

💡Visualization

Visualization in the context of the video refers to the graphical representation of networks. NetworkX, along with matplotlib, is used to draw networks, providing a visual understanding of their structure and relationships.

💡Degree Distribution

Degree distribution is a measure of the variation in the number of connections (or degrees) that nodes in a network have. The video discusses how to calculate and plot the degree distribution of a network, illustrating the concept with the karate club network.

💡Density

Density in graph theory is a measure of how many potential edges are present in the graph compared to the maximum number of edges a graph of the same order (number of nodes) could have. The video explains how to determine if a graph is sparse or dense by calculating its density value.
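The ratio described above can be written out explicitly. The symbols n (number of nodes) and m (number of edges) are notation introduced here for illustration; the formulas are the standard ones for simple graphs without self-loops.

```latex
% Density of a simple graph with n nodes and m edges
d_{\text{undirected}} = \frac{2m}{n(n-1)},
\qquad
d_{\text{directed}} = \frac{m}{n(n-1)}

% Worked example: the karate network in the video has n = 34 and m = 78, so
d = \frac{2 \cdot 78}{34 \cdot 33} = \frac{156}{1122} \approx 0.139
```

The factor of 2 in the undirected case reflects that an undirected graph has n(n-1)/2 possible edges, half as many as a directed one.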

💡Clustering Coefficient

The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. In the video, it is used to assess the likelihood that friends of a node are also friends of each other within a network.

💡Diameter

The diameter of a network is the longest shortest path between any two nodes in the network. It is a measure of how 'spread out' the network is. The video explains the concept and shows how to calculate the diameter of a network using NetworkX.

💡GraphML

GraphML is an XML-based file format for graphs, which allows for the representation of complex network structures. In the video, one of the datasets is in GraphML format, and the 'read_graphml' function from NetworkX is used to analyze it.

💡GEXF

GEXF (Graph Exchange XML Format) is a language-independent, flexible, and extensible file format for complex network structures. The video mentions using the 'read_gexf' function to import a GEXF formatted network into a NetworkX graph object.

Highlights

Introduction to analyzing network datasets using the networkx package in Python.

Overview of six network datasets in various formats: gexf, edgelist, .net and .paj (Pajek), gml, and graphml.

Demonstration of creating a Python file for network analysis and importing necessary libraries.

Using the read_edgelist function from networkx to analyze the Facebook network in edgelist format.

Explanation of the nx.info function to provide basic details about the network, such as number of nodes and edges.

Methods to determine if a network is directed or not using nx.is_directed function.

Conversion of .net format networks, equivalent to Pajek format, into networkx objects using the read_pajek function.

Reading .paj (Pajek) formatted networks with the same read_pajek function, and GML formatted networks with the read_gml function.

Utilizing read_graphml function to handle and analyze Wikipedia network in graphml format.

Reading gexf formatted networks into a graph object using the read_gexf function.

Visualization of networks using nx.draw and plt.show functions from networkx and matplotlib.

Introduction of different network layouts in networkx, such as circular, spectral, and spring layouts.

Analysis of degree distribution in networks to understand the connectivity of nodes.

Explanation of how to plot degree distribution and the significance of power law degree distribution in real-world networks.

Investigation of network density to determine if a graph is sparse or dense.

Calculation of clustering coefficients to measure the degree of clustering or community structure within a network.

Determination of network diameter to understand the longest shortest path between any two nodes.

Practical applications and flexibility of networkx for various network analysis tasks.

Transcripts

00:04

Hey everyone! In the previous video we downloaded a number of network datasets in

00:10

different formats. In this video we are going to see how we can analyze them using the networkx

00:15

package of Python. Let's take a look at the datasets that we have downloaded. So these are

00:26

the datasets that we downloaded in the previous video. We had six networks: we had a network in

00:31

gexf format, we also had a Facebook network in edge list format, we had a football network in .net

00:38

format, which is equivalent to Pajek format, and we had a karate club network in gml format. We also

00:46

had a karate network in Pajek format, and we also had a Wikipedia network in graphml format.

00:52

So these were the six network datasets that we had. Now we are going to see how we can analyze them.

00:59

So I have all the datasets in this folder. I am going to create a new Python file here where I

01:07

will be writing all the code, so datasets.py. I am going to open it in an editor. I am

01:17

using Sublime Text here; you can use any editor for that matter. OK, since we are going to make use of

01:24

the networkx package, I am going to import it. We are also going to visualize the networks, so I am going

01:35

to import matplotlib as well. Now let's take the first network in this folder. Let's make use

01:53

of this facebook combined .txt network, which is in edgelist format. Let me copy this name.

01:59

So now, since the Facebook network is in edgelist format, the function that we are going to make use

02:06

of is read_edgelist. So I am going to write g = nx.read_edgelist, and here I am going

02:16

to get the name of the network. Now our datasets are kept in a folder... I am sorry, I think that's

02:28

the name, OK. So this is the name of the network. So you see, the function that we are

02:38

using is the read_edgelist function, which is present in the networkx package, and as a parameter we are

02:44

giving the name of the network. Now this function basically takes the network in the edgelist

02:53

format and returns a graph object, and then we can apply any function on this graph object.

02:59

For example, if you want to look at the basic information of this network, we can write

03:05

nx.info and as a parameter here give g. So info is a function which provides basic details as

03:12

to the number of nodes, number of edges, etcetera, about the graph. So let's save this file, and I

03:20

am going to open my terminal here. I am going to run this file. All right, so here you

03:37

see, firstly it tells us the type. The type is Graph, as in it basically tells us whether it

03:44

is a digraph, that is a directed graph, or a multigraph, and it tells us the number of nodes. It also tells

03:51

us the number of edges and the average degree. So these are just the basic details about the graph.

03:56

Now let's go back to our file and add some more things. So these were the basic things. If you

04:03

just want to get the number of nodes, you can write... so now, nx.number_of_nodes is a function

04:16

which returns the number of nodes. If you have to just get the number of nodes, you can

04:22

use this function, and similarly if you want the number of edges, if you don't want all the other

04:29

information, you can use this. And in case you want to know whether the graph is directed or not, then

04:37

you can make use of this function: print nx.is_directed, and as a parameter you pass the graph.

04:45

So this should tell us whether the graph is directed or not. Let's go back and try to run

04:53

this. So here you see, after the basic statistics it told us the number of nodes and edges, and

04:59

it's False, that is, it is not directed. So this Facebook network, which was a friendship network,

05:06

is basically an undirected network. So that was about how we can read an edge list network

05:13

into a networkx object. Now let us see the other kinds of networks that we have. We also have

05:19

here the .net format. As I told you in the previous video, the .net format is basically the Pajek

05:25

format, so in order to read this network into a networkx object we will use Pajek functions.

05:33

So let me show you how to do it. What I am going to do is use this function,

05:41

read_pajek, which is a function that is used to read a .net or .paj file. So let me

05:49

check the name of the network: football.net, so I change it. OK, so I am reading this .net

06:00

network through this function read_pajek into a graph object g and then applying all these

06:06

operations. Let's run this. So here you see that the type of graph is MultiDiGraph. That

06:14

means there are multiple edges between the nodes and it's also a directed graph, and it's telling us the

06:20

number of nodes, number of edges, and since it is a directed graph it's telling us the average in

06:27

degree and average out degree. And after that, again it's giving the result of the functions

06:33

number of nodes, number of edges, and it's True, which means it is a directed graph.

06:37

So these are just the basic functions. Let's see what other kinds of networks we have. We have

06:44

a gml network, we have a Pajek format as well. Let me show you that for reading the Pajek

06:52

files as well you use the same function, that is read_pajek. So the network name is karate.paj,

07:06

so I will replace this. OK, so when I run this I am getting that this is a MultiGraph,

07:19

and the number of nodes is thirty-four, the number of edges is seventy-eight, and this

07:25

is the average degree, and it's not a directed graph, so I am getting False here.

07:34

Now two more network formats that we have are graphml and gexf. So let me

07:40

quickly show you how we can read them as well. OK, so graphml: how do we handle it?

07:47

The function that we use is read_graphml and the file name is wikipedia.graphml,

08:03

so this is how we will use it. OK, since this is redundant information, I am going

08:13

to comment this. Now let me run this. All right, so what we are getting here is

08:24

that it's a DiGraph, Wikipedia, because this is a graph in which the nodes are the articles, so it

08:33

basically tells us whether an article is referred to in another article or not. So it's basically

08:38

a directed graph. 921 is the number of nodes, the number of edges is given, and since it's

08:43

directed we are getting the in degree as well as the out degree, the average in degree and out degree,

08:48

and it's True, which means it is a directed graph. So let's go back here, and the only format that

08:55

is left is gexf. Let me quickly show you how to convert it into a graph object as well. I am copying

09:00

its name, and here let's go back. OK, since this is a gexf format we will make use

09:15

of the function read_gexf. And what's the name of the function? Let me rename this so that it doesn't

09:41

create any problem, and let me rename this as well. OK, so it should work, let's see. So this

10:01

is how we can read various networks in different formats and convert them into a graph object.

10:06

Once they are converted into a graph object we can apply various functions on them and we can play

10:11

around with them. Now let me also show you how we can visualize a network in the networkx package.

10:17

So let me take a small network. Let me take the karate network itself, and we had it in gml

10:27

format. I think we have it in that video; OK, we can quickly add it. So the function is

10:36

read_gml. So if you have a graph which is in gml format, the function that you will use is

10:44

nx.read_gml. All right. So we read the karate network, which was in gml format; we

11:01

made use of the function nx.read_gml. So let's execute this program. OK, so this is a simple graph,

11:11

and this is the number of nodes and edges, and it is an undirected graph. Now let me show you how we can

11:18

visualize this graph. I am going to comment this; I am going to comment this as well.

11:22

Now the function that we use to visualize the graph is nx.draw, and the parameter will

11:31

be the graph that we have to draw. In order to see that graph we have to use this function, plt.show,

11:38

basically the show function which is available in matplotlib. So that is how we will

11:44

be able to see this graph. Now let me run this. All right, so this is our graph, the karate network,

12:02

and the labels are given here. Let me also show you a few features that this interface provides.

12:10

For example, when you click this you can just move the graph the way you want, and the

12:19

next option is zoom to rectangle, so when you click that, if you want to carefully observe

12:24

some part of the graph you can just zoom in and see, and if you further want to do that you can do it

12:30

the way you want, and then you can go back, and then go back again. So these are a few

12:35

features that you can make use of. And this is configure subplots; this will be used

12:41

when we plot something. This is just a graph; I will show you the functionality of this later.

12:46

Then this is how you can save the figure. OK, let me close this window. Now let me go back to

12:52

the program. Let me also show you how a directed graph in networkx looks. Since this karate network is

13:00

an undirected graph, let me comment this, and if I am not wrong this football network was

13:06

a directed network, so I am doing the same analysis on football.net. Let me execute

13:13

it again. You see, this is how a directed graph in networkx looks. So you see, the arrows

13:21

are represented like this. If you want to zoom in again you can make use of this feature.

13:27

If you want to zoom into this area you can zoom further as well. So this is how you can closely analyze

13:34

whatever you want in the graph, and you can then go back, or you can just press home

13:40

here. Then you can save the graph as well. OK, let me close this now. I am going to continue

13:49

the rest of the analysis on the karate network, so I am going to uncomment this back. Let

13:57

me also show you that there are different layouts which are available in networkx. For

14:01

example, as of now we have used this function nx.draw; we can use another function if you want

14:07

a different layout, so we can use nx.draw_circular. So this is one of the layouts; there

14:19

are various layouts available. Let me show you the output that we get in this case.

14:23

So here all the nodes are arranged along a circle and the edges between them are shown

14:31

like this, so this is called the circular layout. There are a number of other layouts as well, for

14:36

example spectral layout and spring layout, so you can just read the documentation about them.

14:44

So we have visualized the network. Let's close this and let's go back to our program, and we can

14:51

do some basic analysis on this. One of the things that we can check on this network is

14:56

the degree distribution. Now what is degree distribution? Degree

15:04

distribution basically tells us how many nodes there are in the network that have a particular degree,

15:10

and this is done for all possible degrees that a node can have in the graph. Let's take this

15:16

example graph. So here, how many nodes have degree one? Node number four and node number

15:22

nine have degree one, so corresponding to one we will have two, and similarly we can check

15:29

how many nodes have degree two, how many nodes have degree three, four and five. So these

15:35

are all the possible degrees that nodes can have in this graph, and corresponding to each

15:41

degree we have the number of nodes that have that particular degree. So this basically is called the

15:47

degree distribution of the nodes in a graph. Now it is always a nice idea to plot the degree distribution

15:55

to get a better idea of the graph. So when we plot this we get this kind

16:00

of distribution. So on the x axis we keep the degree and on the y axis we keep the number of

16:07

nodes that have that particular degree. So this is the kind of plot that we get for this example graph.

16:13

We can check for our datasets what kind of degree distribution they exhibit. So let's go back to our

16:21

program and let's try to check what kind of degree distribution this karate network has. For that

16:28

purpose I am going to create a function to plot the degree distribution of this graph g. So let

16:34

me create a function here. OK, before we implement the function I want to show you a few things on the

16:48

ipython console, so let me copy this and let me open the ipython console here. So basically, I

17:08

just copied those first two statements here, and let me also copy this statement here.

17:20

So we have the karate network in the object g. Now if I want to see the degree of each node

17:27

in this graph I can make use of this function, nx.degree(g). So what this

17:36

degree function does is it basically returns a dictionary where the key is the number of the node and the

17:43

value is the degree of that node. So here we get the dictionary for all thirty-four nodes;

17:49

we are getting the degrees of these nodes. Now let's go one step back and recall what our aim is

17:55

here. Our aim is to have the degree distribution of nodes in the graph.

17:59

So basically we want to know, for a particular degree, how many nodes there are in the network that

18:06

have that degree. So first of all we basically want to get the possible degrees that the nodes

18:12

have in this network. So we are getting the dictionary here, where we are getting all

18:19

the possible values of the degrees that the nodes can have. So what we are interested in here

18:24

is basically the values. So what I can write is nx.degree(g), so this is going to give me

18:33

a dictionary; all I am interested in is the values, so I am going to write .values() here. So it should

18:41

return me a list which has all the values. So what we get here is basically all the

18:48

degrees that the nodes have. Now my aim is to get the possible degrees that the nodes can

18:53

have. Here you see there is a lot of repetition, so I want to get the unique degrees. So in that case

19:01

what I can do is I can just write all this inside the function set. So what it will do is it will

19:11

convert the output into a set, and we know that in a set there cannot be repetitions, so what we get

19:19

is the unique values, basically the unique degrees that the nodes can have in this particular network.

19:26

Now a list is a more flexible data structure as compared to a set because we can perform a lot of

19:32

operations, so I can further convert this into a list if I want. It is basically up to you how you

19:42

want to handle the data. So I convert this into a list. Now what I get finally is a list of all the

19:48

unique values of degrees that the nodes can have in this network. So I showed you this here so that we can

19:55

see what's going on in the function. Now let's get back and use these functions in the function

play20:03

that we are creating so firstly I want all the  degrees so I will write nx dot degree gI want  

play20:16

all the values only I dont want the dictionary  so I will write dot values so here I get all the  

play20:24

degrees let me comment all the degrees . Now I am also interesting getting all the  

play20:34

unique degrees so that I can see what is the  possibledegree values so what I am going to dothe  

play20:41

same thing that I did on the console I am going to  passall degrees here ok so here I will get all the  

play20:51

unique degrees now to get the degree distribution  what I basically want is I want to out of all the  

play21:04

values that are there in this list unique degrees  I want to see how many nodes are having that  

play21:09

particular degree so basically what I will do is  I will fetch one element out of this list that is  

play21:15

unique degrees. I will see how many occurrences  of that value are there in all degrees right.  
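
The console steps described above can be sketched roughly as follows. This is a minimal sketch, not the code typed in the video: `nx.karate_club_graph()` is used as a stand-in for the karate club file that the video loads (an assumption), and the variable names `all_degrees` and `unique_degrees` follow the spoken names. In the NetworkX version used in the video, `nx.degree(G)` returned a dictionary, hence `.values()`; in NetworkX 2.0 and later it returns a view of (node, degree) pairs, so the sketch wraps it in `dict(...)`, which works in both cases.

```python
import networkx as nx

# Stand-in for the karate club network that the video reads from a file (assumption).
G = nx.karate_club_graph()

# dict(...) works whether nx.degree returns a dict (NetworkX 1.x)
# or a view of (node, degree) pairs (NetworkX 2.0 and later).
all_degrees = list(dict(nx.degree(G)).values())

# A set drops the repetitions. sorted(...) returns a list, like list(set(...))
# in the video, and additionally puts the degrees in increasing order.
unique_degrees = sorted(set(all_degrees))

print(all_degrees)
print(unique_degrees)
```

Sorting is a small design choice that goes beyond what the video does: a set has no guaranteed order, and an ordered x axis keeps a line plot from zig-zagging.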

play21:22

So probably I can start a for loop here, so I will write for i in unique degrees. So what

play21:38

I want to check is how many occurrences of i are there in all degrees, so I can start a

play21:44

variable, say x is equal to all degrees dot count of i. We have this function count

play21:51

in a list which tells us the number of occurrences of a particular element in that list,

play21:56

so x will tell us the number of occurrences of the degree i in the list all degrees. So probably

play22:05

we can keep track of all these values, so we can create another list, count of degrees.

play22:13

I start with an empty list, and I am going to append the x values to this list. So basically

play22:29

the occurrences of the first degree are now stored in the list count of degrees.

play22:35

So after this for loop is finished we will have the whole degree distribution in this list that is

play22:43

called count of degrees, and then we can plot it. So let's try plotting it: plt dot plot. As

play22:50

you might know, there are two parameters that have to be passed here: on the x axis we

play22:56

want all the unique degrees, and on the y axis we want how many nodes have that degree,

play23:02

which we have stored in count of degrees, so we will pass that here. Let's try plotting it, so I

play23:10

will write plt dot show. OK, we haven't called this function yet, so let me call this function here and

play23:20

comment this. So I will call this function plot degree distribution and I will pass this g here.

play23:30

OK, let's go back here and try running this. OK, so this is the degree distribution

play23:45

that we are getting for the karate club network. So this is the x axis where the possible degrees

play23:52

are, and on the y axis is the number of nodes having that degree. And you can

play23:57

again play around with this plot. As I told you, you can use this to move the plot,

play24:03

you can use this to zoom into a particular part, and you can go back as well,

play24:09

and this is a feature that you can use here: basically you can increase or decrease

play24:16

the x and y margins. So this is up to you, whichever way you want it, and you can always reset it as

play24:26

well. So let me close it. So this is basically up to you how you want to visualize it. OK.

play24:32

Let me close it. There were no x and y axis labels, as you see, so we can add that. Let me just quickly do

play24:39

that. Before that, you can also change the way this plot appears, so if I put these dots you

play24:53

will get the plot in the form of these dots. OK, so you get yellow dots here, and you can also put a

play25:02

line here and dots as well, so we will get both things. So this is the plot that you are

play25:09

getting. One thing to observe here is that most real-world networks exhibit a power

play25:17

law degree distribution, which means that there are very few nodes which have a very high degree

play25:22

and there are a lot of nodes which have a very low degree. The same is being followed

play25:28

in this small network as well, apart from the one exception of this node. That might happen in

play25:36

some cases, but in general real-world networks have this power law degree distribution.
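
Put together, the function built up in this part of the video might look roughly like the sketch below. It is an approximation under stated assumptions rather than the exact code on screen: `nx.karate_club_graph()` stands in for the karate club file, the helper `degree_distribution` is a hypothetical split that keeps the counting separate from the plotting, and `"yo-"` is one way to ask matplotlib for yellow dots joined by a line.

```python
import networkx as nx
import matplotlib.pyplot as plt


def degree_distribution(G):
    """Return (unique_degrees, count_of_degrees) as two parallel lists."""
    # dict(...) keeps this working on both old and new NetworkX versions.
    all_degrees = list(dict(nx.degree(G)).values())
    unique_degrees = sorted(set(all_degrees))

    count_of_degrees = []
    for i in unique_degrees:
        # list.count gives the number of nodes whose degree equals i.
        x = all_degrees.count(i)
        count_of_degrees.append(x)
    return unique_degrees, count_of_degrees


def plot_degree_distribution(G):
    """Plot the possible degrees (x axis) against the number of nodes (y axis)."""
    unique_degrees, count_of_degrees = degree_distribution(G)
    # "yo-" means yellow circular markers joined by a line.
    plt.plot(unique_degrees, count_of_degrees, "yo-")
    plt.show()


# Stand-in for the karate club network loaded in the video (assumption).
G = nx.karate_club_graph()
unique_degrees, count_of_degrees = degree_distribution(G)
print(list(zip(unique_degrees, count_of_degrees)))
```

Calling `plot_degree_distribution(G)` opens the interactive matplotlib window with the pan, zoom and reset controls mentioned above.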

play25:41

So let's go back and add a few more things. Let's add the x label, so let's add degrees here, and

play25:53

then we can add the y label as well, maybe number of nodes, and maybe we can add the title as well:

play26:07

degree distribution of karate network. So let's run this. OK, you get the title and the x and y labels

play26:22

as well. So you can use more features, more functions, and decorate this plot the way

play26:29

you want. There's one more thing that is usually done in the case of a power law degree distribution:

play26:35

we can also check the log-log plot of it. So how can we do that? We can just replace this

play26:42

plot by loglog, and in that case it will give us a log-log plot, which basically means it takes

play26:49

the log of the x axis and the log of the y axis, and if a network follows a perfect power law,

play26:56

the log-log plot should be a straight line. So let's see what kind of plot we get in this case.

play27:02

So here, firstly, we had an exception here, and secondly, it wasn't a perfect power law, so we have this kind

play27:10

of line which is not exactly straight. So this is the kind of log-log plot that we are getting in this

play27:16

case. So this was about the degree distribution. Now I am closing it; let's go back here and see some

play27:22

more properties that we can analyze on this network. We are done with degree distribution, let's go ahead.
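
The labelled log-log version of the plot could be sketched like this. The function names `degree_counts` and `plot_loglog_degree_distribution` are hypothetical, `collections.Counter` is swapped in for the explicit counting loop of the video purely for brevity, and `nx.karate_club_graph()` again stands in for the karate club file. Degree zero is filtered out because its logarithm is undefined.

```python
from collections import Counter

import networkx as nx
import matplotlib.pyplot as plt


def degree_counts(G):
    """Return the positive degrees in increasing order and the node count for each."""
    counts = Counter(dict(nx.degree(G)).values())
    # log(0) is undefined, so isolated nodes are left out of a log-log plot.
    degrees = sorted(d for d in counts if d > 0)
    return degrees, [counts[d] for d in degrees]


def plot_loglog_degree_distribution(G, title):
    """Same plot as before, but with both axes on a logarithmic scale."""
    degrees, n_nodes = degree_counts(G)
    plt.loglog(degrees, n_nodes, "yo-")
    plt.xlabel("Degrees")
    plt.ylabel("Number of nodes")
    plt.title(title)
    plt.show()


# Stand-in for the karate club network loaded in the video (assumption).
G = nx.karate_club_graph()
degrees, n_nodes = degree_counts(G)
print(list(zip(degrees, n_nodes)))
```

A call such as `plot_loglog_degree_distribution(G, "Degree distribution of karate network")` then shows the plot; a roughly straight line would suggest a power law, and for a network this small the line is only approximately straight.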

play27:28

So the next thing that we can check is density. The density value of a graph basically tells us whether it

play27:36

is a sparse graph or a dense graph with respect to the number of edges present. So if

play27:42

there are n nodes in a network, the total possible edges that the network can have will be n choose

play27:49

two. Out of these n choose two edges, the fraction of the edges that are present in the

play27:55

graph is basically what is stored by the density value. So if it is a simple graph the density

play28:01

value will be between zero and one: if it is an empty graph the density will be zero, and if it

play28:07

is a complete graph the density will be one. However, if it is a multigraph, where more than one edge is

play28:13

allowed between two nodes, in that case the density value can be more than one. For

play28:20

example, in this diagram you see a simple graph with nine nodes, so the total possible edges that

play28:26

can be there in this network will be nine choose two, which is nine into eight divided by

play28:31

two, that is thirty-six. Now the number of edges that are actually present in this graph is eleven,

play28:37

so the density is going to be eleven divided by thirty-six, and that is equal to point three one. So

play28:44

the graph is not very dense. So this is the kind of indication that the density value gives us.
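
The arithmetic of this nine-node example can be checked with a few lines of plain Python. Only the node and edge counts quoted above are used; the variable names are illustrative.

```python
# Numbers taken from the nine-node example: 9 nodes, 11 edges.
n_nodes = 9
n_edges = 11

# n choose 2: the number of possible edges in a simple undirected graph.
possible_edges = n_nodes * (n_nodes - 1) // 2

# Density is the fraction of possible edges that are actually present.
density = n_edges / possible_edges

print(possible_edges)      # 36
print(round(density, 2))   # 0.31
```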

play28:51

So we can go back to the console and see the density value for a few networks. For example, let me go here

play29:00

and let's create a complete graph. So I am going to write g is equal to nx dot complete

play29:12

graph of, say, hundred nodes. OK, so since it is a complete graph, what should be the density

play29:20

value? Let's check nx dot density. So this is the function density which is available in networkx

play29:27

which will give us the density value; obviously in the case of a complete graph it is going to be one. Let

play29:34

me now create another graph with nx dot Graph. I haven't added any edges into

play29:45

it; let me add a few nodes here, so I am going to pass a list. So I am passing only four nodes.

play29:58

Let's check the density value of this network. Since we have not added any

play30:07

edge, obviously the density value was going to be zero. And let's go back here and see what is

play30:14

going to be the density value for our network, the karate network. So what I will write is print

play30:22

density is, and nx dot density, and I am to pass g here. Let's go back and check here: density

play30:42

is point one three nine, so basically it is sort of a sparse graph. So that we can check, so

play30:49

that is about the density. Let's go back and see a few more things that we can check on these networks.
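
The three density checks from this part of the video can be reproduced with the sketch below. `nx.complete_graph`, `nx.Graph`, `add_nodes_from` and `nx.density` are existing NetworkX calls; the node labels `1` to `4` and the use of `nx.karate_club_graph()` in place of the karate club file are assumptions made for illustration.

```python
import networkx as nx

# A complete graph on 100 nodes has every possible edge, so its density is 1.
complete = nx.complete_graph(100)
print("complete graph density:", nx.density(complete))

# A graph with four nodes and no edges has density 0.
empty = nx.Graph()
empty.add_nodes_from([1, 2, 3, 4])  # hypothetical node labels
print("edgeless graph density:", nx.density(empty))

# Stand-in for the karate club network loaded in the video (assumption).
karate = nx.karate_club_graph()
print("density is", round(nx.density(karate), 3))
```

For the built-in karate club graph (34 nodes, 78 edges) this gives 78 / 561, which rounds to the 0.139 reported in the video.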

play30:55

Next you can see the clustering coefficient. For a given node, the clustering coefficient basically

play31:02

tells us the number of links that are present amongst the neighbours of this node with respect

play31:08

to the total number of links that are possible. I will show you using an example. Let's

play31:15

take this network; let's try to find out the clustering coefficient for this node six. So

play31:21

as you can see this node has five neighbours, right, these are one, two, three, five and eight. So there

play31:28

are five neighbours, so what we have to check here is the number of links that are present

play31:34

amongst these neighbours, that is the number of links amongst one, two, three, five and eight.

play31:40

As you can see, we see only one link, that is between two and three; there is no other

play31:47

link present amongst these neighbours. So in the numerator we will put one, and in the denominator

play31:53

we will put the total possible links that can be there amongst these five nodes. Amongst five

play32:01

nodes the total possible links can be five choose two, which is five into four divided by two,

play32:07

that is ten. So, in this case, the clustering coefficient of this node six will be one divided

play32:13

by ten. In the case of a friendship network this clustering coefficient basically tells us how

play32:19

closely knit the friends of a particular node are. So we can calculate the clustering coefficient value

play32:26

for every node and then we can find the average as well; the average clustering coefficient

play32:30

tells us the amount of clustering present amongst the nodes in the graph.
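
The node six example can be checked with a small sketch. The full diagram from the video is not available here, so the graph below contains only node six, its five neighbours and the single link between two and three; that reduced graph is an assumption, but it is all the clustering coefficient of node six depends on.

```python
import networkx as nx

# Only the neighbourhood of node 6 from the example (assumption: the rest of
# the diagram is omitted because it does not affect node 6's coefficient).
G = nx.Graph()
G.add_edges_from([(6, 1), (6, 2), (6, 3), (6, 5), (6, 8)])
G.add_edge(2, 3)  # the only link amongst the neighbours

neighbours = list(G.neighbors(6))
k = len(neighbours)

# Numerator: links actually present amongst the neighbours.
links_present = G.subgraph(neighbours).number_of_edges()
# Denominator: k choose 2 possible links amongst k neighbours.
links_possible = k * (k - 1) / 2

by_hand = links_present / links_possible
print(by_hand)               # 1 / 10
print(nx.clustering(G, 6))   # NetworkX's value for the same node
```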

play32:38

So let's go back and try to check the same for our network. I am going to comment this. The function

play32:49

that we can use for finding the clustering is nx dot clustering. However, this function basically

play32:56

returns a dictionary which gives the clustering coefficient value for every node, so you can always

play33:02

iterate over this dictionary. So what I will do is for i in nx dot clustering of g; I am interested

play33:15

in all the items, so I will write dot items, and I want to print i. So I am going to comment this.

play33:24

So what we are doing here is this: nx dot clustering is returning a dictionary which

play33:29

contains the clustering coefficient values for all the nodes, and we are just going to print

play33:35

them. So let's run this. So we are getting this dictionary where for every node we are

play33:43

getting the clustering coefficient value. If you want the average clustering coefficient, we

play33:49

can either average these values or we can directly make use of another function, which is

play33:55

nx dot average clustering. So this should tell us the average clustering

play34:07

present in the network. So you see point five seven is the average clustering. The higher this

play34:15

value, the more the clustering and the more tightly knit the people in the friendship network are.

play34:21

That was about the clustering coefficient; let's go back and check a few more properties.
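
Both routes to the average clustering coefficient mentioned above can be sketched as follows, again assuming `nx.karate_club_graph()` as a stand-in for the karate club file used in the video.

```python
import networkx as nx

# Stand-in for the karate club network loaded in the video (assumption).
G = nx.karate_club_graph()

# nx.clustering returns a dictionary: node -> clustering coefficient.
clustering = nx.clustering(G)
for i in clustering.items():
    print(i)  # each item is a (node, coefficient) pair

# Route 1: average the dictionary values by hand.
manual_average = sum(clustering.values()) / len(clustering)

# Route 2: let NetworkX compute the average directly.
library_average = nx.average_clustering(G)

print(round(manual_average, 2), round(library_average, 2))
```

The two averages agree, and for the built-in karate club graph they round to the 0.57 quoted in the video.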

play34:28

So what is the diameter of a network? The diameter is basically the maximum shortest path that

play34:36

we have to travel to go from one node to another. For example, if you know about the all

play34:43

pairs shortest path algorithm, it basically returns the matrix where the values

play34:49

are the lengths of the shortest paths between two nodes, and it does that for every pair

play34:55

of nodes. So whatever is the maximum value in that matrix will be the diameter of the network;

play35:03

in other words, it is the shortest path between the two most distant nodes in the network. So for

play35:10

example, if you see node one and node nine, if you have to go from one to nine you would

play35:16

have to traverse this path: one to six to five to nine. There is no other shortest path between them,

play35:22

so the length of this path is three, and we don't see any other shortest path which is

play35:31

longer than three. If you want to go from one to four, again the length of the path is three; if

play35:37

you want to go from three to nine, the length is three. So we don't see any other shortest

play35:43

path which is more than three, so that's why the diameter of this network will be three. We can

play35:50

check the diameter of our network here as well, so I am going to comment this and let's check the

play35:58

diameter. So I will write diameter is, and the function that we can use is nx dot diameter of g.

play36:14

So it should give us the diameter. So the diameter is five. So there are thirty-four nodes here and

play36:26

the diameter is five. It is generally observed that in real-world networks the diameter is

play36:33

very small, because the nodes are connected to each other and that makes the

play36:38

distance between them very small, and that is how the diameter reduces. So these were just a few points of

play36:46

analysis that we performed on the networks that we downloaded. The main thing to notice here

play36:52

is that once you get the network into a networkx graph object you can just play around with it:

play36:58

you can apply all the functions that are available, you can just read the documentation and apply the

play37:04

functions which are relevant for that network, and you can go ahead with your analysis.
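
The link between all-pairs shortest paths and the diameter can be illustrated with the sketch below, once more assuming `nx.karate_club_graph()` as a stand-in for the karate club file. Note that `nx.diameter` expects a connected graph; on a disconnected one it raises an error.

```python
import networkx as nx

# Stand-in for the karate club network loaded in the video (assumption).
G = nx.karate_club_graph()

# Shortest path length for every pair of nodes: source -> {target: length}.
# Wrapping in dict(...) works on both old and new NetworkX versions.
lengths = dict(nx.all_pairs_shortest_path_length(G))

# The diameter is the largest entry in this "matrix" of shortest path lengths.
longest_shortest_path = max(max(dist.values()) for dist in lengths.values())

print("diameter is", nx.diameter(G))
print("largest all-pairs value:", longest_shortest_path)
```

Both numbers come out as 5 for the 34-node karate club graph, matching the value reported in the video.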
