Today, I want to reinforce what we have learned over the course of this series by summarizing it. This article will be useful to bookmark and revisit as a resource, as I will be discussing the different scales and views that are possible in network analysis.
Why Do This?
In my opinion, the hardest part of learning to analyze networks is learning to look at the different parts of a network rather than the whole thing. It’s very easy to make a very large graph about anything. However, once the graph is made, people get stuck. I have seen this repeatedly. People have graph data, and then they go, “Now what?” Teams consider whether they need a graph database, and then cycles are spent researching graph databases. Months go by and nothing gets done. People then sour on network analysis.
It’s a very confusing moment, after you realize that you have a ton of graph data and no idea how to use it.
One of the reasons I wrote my book was to take away this specific pain. Throughout my book, I do not teach graph databases. Over the course of this blog series, I’ve shown how to create and analyze several different kinds of graphs: collaboration networks, topic graphs, social networks, and more. And I’ve shown how to do this without requiring you to learn a graph database, which makes for an easier learning path.
A graph database may eventually be useful to you. But it is not a requirement for learning to analyze and pull useful insights from networks. I do not use one for anything.
For me, things suddenly became easy when I learned to dissect networks, cut them into pieces, and look at the pieces individually. This way, a network with millions of nodes becomes pieces with thousands of nodes (or fewer), and those are very easy to investigate. Think of this as untangling complexity, or flying around a network.
High Level Overview
This table describes the approaches that can be used to look at the parts of a network.
In this article, I’m going to describe each of these and provide links to the networkx documentation. I talk about these concepts throughout my book and blog series, so I’m not going to look up each individual article where I mention them. You can use this image as a guide and find the articles where I discuss these concepts. Later, I will create an index for these posts, but they are still being written, and I don’t want to manage that right now.
Views and Scales
I’m calling these views and scales. Each one provides a different view of the network, at a different scale.
Whole Network
Whole Network Analysis (WNA) is the analysis of the whole network. At this level, you can capture high level insights:
Centralities and Importances will tell you which nodes are the most important in a given context.
For speed, I recommend that you start with PageRank. If your network is small, you can use Betweenness Centrality instead.
If your network is large, Betweenness Centrality will be very slow and so will Closeness Centrality. Part of this learning is learning which measures are useful with the network size that you have.
Big time saver: use PageRank by default on any network, then add other measures as you notice they are suitable and usable.
If your network is small enough, you can explore the “shortest paths”.
Node and edge counts will tell you the size of the network.
Density and other measures can tell you about the characteristics of the whole network.
Bridges are cool to explore. A bridge is an edge whose removal would split the network apart, so bridges show you what is holding the network together.
Modularity can tell you the extent to which group mixing is taking place in the network.
At the level of Whole Network Analysis (WNA), I think of this as Network EDA (Exploratory Data Analysis). At this level, it is about building a mental picture of the size, complexity, and key movers of a network.
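As a rough sketch of that EDA pass, the function below gathers a few whole-network numbers with networkx. The name `network_eda` and the choice of measures are my own illustration, not a fixed recipe. It assumes an undirected graph for the bridge and shortest-path checks and simply skips them for directed graphs.

```python
import networkx as nx


def network_eda(G):
    """Collect a few whole-network numbers for a first look (a sketch)."""
    summary = {
        # Size of the network.
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        # Density: the share of all possible edges that actually exist.
        "density": nx.density(G),
    }
    if not G.is_directed():
        # Bridges: edges whose removal would split the network apart.
        summary["bridges"] = list(nx.bridges(G))
        # Average shortest path length is only defined for a connected
        # graph and is only practical to compute on smaller networks.
        if G.number_of_nodes() > 0 and nx.is_connected(G):
            summary["avg_shortest_path"] = nx.average_shortest_path_length(G)
    return summary
```

For example, `network_eda(nx.karate_club_graph())` reports 34 nodes and 78 edges, and its bridges include the edge to node 11, which has only one connection.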
Connected Components
Connected Components are essentially the connected structures of a network. Networks typically have one supermassive component, several smaller components, and lots of isolates (nodes with no edges). In my own book, I describe them as continents and islands.
Throughout this series, I’ve shown how to identify the connected components in a network, and you can read the networkx documentation here.
This part has a lot of overlap with community detection, so I tend to just identify the individual connected components, inspect the largest ones, and move along. Community detection is a bit more useful to me, so I keep this part light.
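A minimal sketch of that light-touch workflow, assuming an undirected graph (for a directed graph you would swap in `nx.weakly_connected_components`). The helper name `component_overview` is hypothetical.

```python
import networkx as nx


def component_overview(G):
    """Split an undirected graph into connected components, largest first."""
    components = sorted(nx.connected_components(G), key=len, reverse=True)
    sizes = [len(c) for c in components]
    # Single-node components are the isolates: the smallest islands.
    isolate_count = sum(1 for size in sizes if size == 1)
    # Copy the largest component (the "continent") out as its own graph.
    largest = G.subgraph(components[0]).copy() if components else nx.Graph()
    return sizes, isolate_count, largest
```

The list of sizes alone is often enough to see the continent-and-islands shape before moving on to community detection.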
Communities
You can use Community Detection algorithms to identify the communities in any given network. If the network is a social network of people or animals, you will find the groups that interact. If the network is not made of living things, you will find groups of things. You can think of community detection algorithms as clustering algorithms: they identify the clustering taking place in a network based on node connectivity.
There are several community detection algorithms, and I’ve written about them in my book and in this series. My favorite is the Louvain Method, and you will see me use it most days.
There are so many different approaches to community detection that networkx has a whole section on this topic. Here are a few I recommend looking at:
In my book, I discuss these and others. I also have a whole chapter on using Graph Machine Learning to identify communities, and showcase a cool algorithm called Scalable Community Detection (SCD) which rivals the Louvain Method in both speed and accuracy. You should buy my book. You should also buy this book, as it shows how Machine Learning can be used for community detection and other things.
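A minimal sketch of running the Louvain Method with networkx’s built-in `louvain_communities` (available in networkx 2.8 and later). The wrapper `louvain_groups` and the seed value are illustrative assumptions; Louvain is randomized, so the seed is only there to make runs repeatable.

```python
import networkx as nx


def louvain_groups(G, seed=42):
    """Detect communities with the Louvain Method, largest community first."""
    # Louvain is randomized, so a fixed seed makes runs repeatable.
    communities = nx.community.louvain_communities(G, seed=seed)
    # Each community is a set of nodes; sort so the biggest groups come first.
    return sorted(communities, key=len, reverse=True)
```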
K-core
K-core doesn’t get the attention it deserves. It is rarely talked about, and I only accidentally discovered it in one of the Social Network Analysis books that I own. I use K-core for two different reasons:
It’s a great shortcut for removing all isolates with one line of code
It’s useful for zooming in on the core of the network
The core of a network can tell you a lot about the key movers and shakers. If you zoom in on the core of any network, you will see a small group of densely connected nodes. Once you find these nodes, you should start coming up with questions to learn more about them.
You can find examples of me using k-core throughout this series, and you can read the documentation here.
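Both uses can be sketched as follows. The helper `core_views` is hypothetical. It assumes a simple `Graph` or `DiGraph`, since `nx.k_core` refuses multigraphs and graphs with self-loops, which is why self-loops are dropped on a copy first.

```python
import networkx as nx


def core_views(G):
    """Return (G without isolates, main core of G) using k-core."""
    # nx.k_core rejects graphs with self-loops, so drop them on a copy.
    H = G.copy()
    H.remove_edges_from(list(nx.selfloop_edges(H)))
    # The 1-core keeps only nodes with at least one edge: no isolates.
    no_isolates = nx.k_core(H, k=1)
    # With k left out, k_core returns the main core (the largest k present).
    main_core = nx.k_core(H)
    return no_isolates, main_core
```

The one-line isolate removal is the `nx.k_core(H, k=1)` call; everything else is setup.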
K-corona
Where k-core is about looking at the core of a network, k-corona is about being able to look at the layers of a network. Think of a network as an onion. There will be nodes with:
0 edges (no connections)
1 edge
2 edges
3 edges
and so on, up to the deepest layer of the network (strictly speaking, from k = 2 upward the k-corona counts only a node’s edges to other nodes in the k-core, so a node’s layer is not always its raw edge count)
With k-corona, you can look at each of these layers. Personally, since k-core is useful for quickly removing all isolates, I mainly use k-corona for quickly finding all isolates so that I can analyze them rather than discard them. To do this, I would do something like this:
isolates = nx.k_corona(G, k=0)
However, you are not limited to using this to inspect isolates. You could use this to analyze any layer. You can read more about k-corona here.
So that this concept sticks: think of this as peeling an onion and being able to analyze each peeled layer.
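To look at every layer rather than just one, one approach is to loop over each k up to the largest core number. This is a sketch under the same assumptions as `nx.k_corona` itself (no self-loops, no multigraphs); `corona_layers` is a hypothetical helper, and empty layers are skipped because a given k can have no nodes in its corona.

```python
import networkx as nx


def corona_layers(G):
    """Map each k to the nodes in the k-corona: one onion layer per k."""
    # Like k_core, k_corona rejects self-loops, so drop them on a copy.
    H = G.copy()
    H.remove_edges_from(list(nx.selfloop_edges(H)))
    max_k = max(nx.core_number(H).values(), default=0)
    layers = {}
    for k in range(max_k + 1):
        # Nodes in the k-core that have exactly k neighbors inside it.
        layer = nx.k_corona(H, k)
        if layer.number_of_nodes() > 0:
            layers[k] = set(layer.nodes)
    return layers
```

On a small graph made of a 4-node clique, one extra node hanging off it, and one isolate, this returns `{0: {5}, 1: {4}, 3: {0, 1, 2, 3}}`. The 2-corona is empty, which shows that the layers follow core numbers rather than raw edge counts.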
Egos and Alters
Egocentric Network Analysis is one of my favorite parts of any network analysis. I’ve written about this in my book and in this series. Here is another book that I own on the topic. It’s a good read.
In Egocentric Network Analysis, you zoom in on one node, the ego node. That ego node is where the name Egocentric Network Analysis comes from.
In Egocentric Network Analysis, you pull an ego network from the whole network. Usually, I do this after identifying the most important nodes using centralities and importances. That task leads right to this. The first task makes a shortlist of important nodes, and then in Egocentric Network Analysis, I inspect each one.
In yesterday’s article, I took Egocentric Network Analysis further than I ever have before by measuring the modularity of a single ego network. I am still thinking about what I can use this for and how to make the process simpler.
In an ego network, the ego node is in the center, and all of its related nodes (friends, enemies, acquaintances, coworkers, etc.) also appear. These related nodes are known as alters. Egos and alters: those are the words to know. Ego is the center, alters are the others. Ego is the subject, alters the co-stars.
You can find plenty of examples throughout this series, and you can read the networkx documentation here.
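A minimal sketch of that shortlist-then-inspect workflow using `nx.ego_graph`. The helper `ego_networks` is hypothetical, and the shortlist is assumed to come from a centrality step such as PageRank. For directed graphs, `ego_graph` follows outgoing edges unless you pass `undirected=True`.

```python
import networkx as nx


def ego_networks(G, shortlist, radius=1):
    """Pull one ego network per node in `shortlist` (e.g. top PageRank nodes)."""
    views = {}
    for ego in shortlist:
        # ego_graph keeps the ego plus every alter within `radius` hops,
        # along with the edges among all of them.
        ego_net = nx.ego_graph(G, ego, radius=radius)
        alters = [n for n in ego_net if n != ego]
        views[ego] = (ego_net, alters)
    return views
```

In practice, the shortlist could be built with something like `scores = nx.pagerank(G)` followed by `sorted(scores, key=scores.get, reverse=True)[:5]`.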
Subgraphs
A subgraph is just a part of a graph: a graph within a graph. Several of the above extract subgraphs. An ego network is a subgraph of the entire network, focused on one ego node and its alters. A connected component is a subgraph of the entire network, focused on one single structure in the network. Each community is also a subgraph of the entire network, focused on one of the individual groups.
But there is one networkx function that can be used directly to pull a subgraph.
I personally use this alongside community detection. I’ll use community detection algorithms to identify the nodes that belong together, and then I’ll use a subgraph to pull the community from the larger network.
It’s also just a generally useful tool to have around for any time you want to inspect only part of the network. You can use it to inspect any set of nodes.
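Pairing community detection with subgraphs might look like the sketch below. I’m assuming the function meant here is the graph’s `subgraph` method (networkx also exposes `nx.subgraph(G, nodes)`); `community_subgraphs` is a hypothetical helper. `G.subgraph` returns a read-only view, so `.copy()` is used to get an independent graph you can modify.

```python
import networkx as nx


def community_subgraphs(G, communities):
    """Pull each detected community out of the larger network as its own graph."""
    # G.subgraph keeps the given nodes and only the edges between them;
    # .copy() turns the read-only view into an independent graph.
    return [G.subgraph(nodes).copy() for nodes in communities]
```

The `communities` argument could come straight from `nx.community.louvain_communities(G, seed=42)`.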
Modularity
Modularity is a concept I have been learning about recently. You should read yesterday’s post to learn more about it, so that I don’t have to type all of that again.
Here is an image from Networks by Mark Newman.
Modularity has to do with the extent that different groups mix in a network. Think of it this way: communities are made up of people, but people are not all the same. Modularity allows us to peer into networks, to understand how relationships exist between groups. This means that community detection and ego networks aren’t the deepest we can go. We can use modularity to investigate each of these small groups further.
I linked to the networkx resources in yesterday’s post. Please read it.
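As a sketch of the idea, networkx can score any partition with `nx.community.modularity`. Here the partition comes from Louvain, which is my assumption for illustration; the helper `partition_modularity` is hypothetical. Modularity is, roughly, the fraction of edges that fall inside communities minus the fraction you would expect if edges were placed at random with the same node degrees.

```python
import networkx as nx


def partition_modularity(G, seed=42):
    """Split G with Louvain, then score that split with modularity."""
    communities = nx.community.louvain_communities(G, seed=seed)
    # Modularity: share of edges inside communities, minus the share expected
    # if edges were rewired at random while keeping every node's degree.
    score = nx.community.modularity(G, communities)
    return communities, score
```

The same call works on a whole network or on a single ego network, for example `partition_modularity(nx.ego_graph(G, node))`. For two 5-node cliques joined by a single edge (`nx.barbell_graph(5, 0)`), Louvain should recover the two cliques, and that split scores a modularity of 19/42, about 0.45.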
Node Centralities and Importances
Last but absolutely not least, node centralities and importances are probably the first exciting thing that anyone encounters when they begin learning to analyze networks. It is the first OH MY GOODNESS moment in the learning process, when you realize you can take a gnarly spider web and pull out a clean list of the most important and influential nodes.
Node centralities and importances tell you who the stars are of any network. There are several that you should know about:
PageRank: your go-to, no matter the network. It is fast and useful for finding key nodes, no matter the size of your network. A node is important if important nodes link to it, so its score depends on its inbound edges and on the importance of the nodes those edges come from. The Google founders created this algorithm and used it in their search engine to find important websites on any given topic.
Betweenness Centrality: this is the one people usually learn first. Betweenness centrality measures how often a node appears on the shortest paths between other nodes. Nodes with the highest betweenness centrality scores sit between people and groups. Information flows through them. They become important to decision making. They could also be blockers or gatekeepers. This algorithm becomes unusable as your network grows. If it becomes unusable, PageRank or Degree Centrality will do.
Degree Centrality: nodes are important if they have more edges (a higher degree). This is always fast and can be useful on very large networks, when Betweenness Centrality is no longer usable.
Closeness Centrality: nodes are important if their average shortest-path distance to the other nodes is small. This algorithm becomes unusable as your network grows. If it becomes unusable, PageRank or Degree Centrality will do.
There are many more to explore, so many that networkx has a whole page about them. There are more available than I have found time to learn. This page is a very good resource for learning.
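A sketch of ranking nodes with the four measures above. The helper `rank_nodes` and its `include_slow` and `betweenness_sample` parameters are hypothetical; `betweenness_sample` is passed to the `k` parameter of `nx.betweenness_centrality`, which estimates betweenness from a random sample of source nodes instead of all of them. Recent networkx versions compute PageRank with SciPy, so that needs to be installed.

```python
import networkx as nx


def rank_nodes(G, top_n=5, include_slow=True, betweenness_sample=None, seed=42):
    """Return the top-n nodes for several centralities, keyed by measure name."""

    def top(scores):
        # Sort node ids by score, highest first, and keep the first top_n.
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    rankings = {
        # PageRank: the fast default, usable at almost any network size.
        "pagerank": top(nx.pagerank(G)),
        # Degree centrality: fraction of other nodes each node touches.
        "degree": top(nx.degree_centrality(G)),
    }
    if include_slow:
        # Betweenness: how often a node sits on shortest paths between others.
        # k=None uses every node as a source; an int k samples that many.
        rankings["betweenness"] = top(
            nx.betweenness_centrality(G, k=betweenness_sample, seed=seed)
        )
        # Closeness: based on average shortest-path distance to other nodes.
        rankings["closeness"] = top(nx.closeness_centrality(G))
    return rankings
```

On a large network, `include_slow=False` keeps only the fast measures, which matches the advice to fall back to PageRank or Degree Centrality.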
That’s All for Today
Learning to fly around networks and inspect different levels is useful for capturing a wealth of useful context and network insights.
Thank you for reading today’s article. Today’s post is a bit of review, and I hope it gave you a fresh perspective on things that can be immediately useful in any network analysis. This is a good page to bookmark as a resource for later use.
Thanks to everyone who has been following along with this series. Happy learning! If you would like to learn more about networks and network analysis, please buy a copy of my book!