Today’s post is going to be high level. No code will be written, and no networks will be analyzed. I don’t always feel like writing code. But I enjoy thinking about networks during my weekend. So, I wanted to do a post today, but I wanted to avoid writing code. Rest is good.
My first thought was that I should do a post about Network EDA. I will eventually do one on this topic, as I believe that we should strive for some standardization. Many blog posts and videos have already been made about Network EDA, and my book is about exploring networks (the E in EDA), but I would like there to be a tool someday that does something similar to df.describe(), if you are familiar with Pandas.
But I will save that for another day. Today, I decided to talk about the different types of network analysis, because there is more than one. In fact, I’ve already shown several. When I create reports about networks, I typically have a few sections for different types of network analysis, and I do the same in this blog series. I’ll start with a high level overview of the network, then look for key nodes, then look at ego networks, then look at communities, and so on. Each of these is a separate analysis, but together they provide a lot of context.
Types of Network Analysis
As I mentioned above, my reports often have a few key sections:
High Level Overview
Key Nodes
Ego Networks of Key Nodes
Communities and Community Context
That’s a good outline for any network analysis, but different techniques are used in each section. I’ll move away from the simpler names above and use more appropriate network analysis terminology. Here are some of the types of network analysis I do and describe:
High Level EDA (Whole Network Analysis - WNA)
Thorough Whole Network Analysis (Similar to above but much deeper context)
Centrality Analysis / Key Node Analysis
Egocentric Network Analysis
Community / Clique / Group / Cluster Analysis
Hub and Authority Analysis
Bridge Analysis
Core and Corona Analysis
I am leaving off things like edge prediction or attack simulation. Those are things that can be done with network graphs, but they are not analysis in themselves, although their outputs can be analyzed.
This is only scratching the surface. Use the arXiv data collector and start reading Network Science articles if you want to learn more about what researchers are researching. In this blog post, I am writing about what I personally use in my work and personal life. I write about Practical, Applied, Hands-on Network Analysis.
Discussion of Analysis Types
Let’s discuss what each of these is, where I’ve demonstrated it, and which techniques or functionality are useful.
High Level EDA
In this analysis, your goal is quick high level findings that give you a map for deeper investigation.
High level Network EDA is done using what is called Whole Network Analysis, shortened as WNA. In high level EDA, my goal is to quickly learn as much high level information as I can about the network. EDA gives us a map for deeper exploration and analysis. When I am doing this, I am looking for node and edge counts, key node identification (centralities), connected component counts and sizes, overall density, and the density of each connected component. You can see examples of many of these in these posts.
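For readers who want something concrete, the checklist above can be sketched in a few lines. This is only a minimal sketch, assuming networkx is installed. The describe_network helper is a hypothetical name invented for this illustration (it is not a networkx function), and the built-in karate club graph stands in for whatever network you are exploring.

```python
import networkx as nx


def describe_network(G):
    """Hypothetical helper: collect a few whole-network summary numbers.

    Assumes G is an undirected graph, because connected components are
    only defined for undirected graphs in networkx.
    """
    # Largest component first, so index 0 is the biggest one.
    components = sorted(nx.connected_components(G), key=len, reverse=True)
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "components": len(components),
        "largest_component_size": len(components[0]) if components else 0,
    }


# Stand-in data: Zachary's karate club, which ships with networkx.
G = nx.karate_club_graph()
summary = describe_network(G)

for name, value in summary.items():
    print(f"{name}: {value}")
```

The point of a summary like this is only to decide where to look next, for example whether the network is one connected piece or many small ones.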
Thorough Whole Network Analysis
In this analysis, your goal is to learn as much as you can about the structure of the entire network. This can be very time consuming, so set a cutoff date or goal, or this will be aimless and wasteful.
With networks, we’re typically looking for something. To me, personally, WNA context is more trivia than useful. Like, what am I going to do with the knowledge that a 500,000 node network has an overall density of 0.00234353? It tells me that it is a sparse network, but that is not super helpful. What am I going to do with the knowledge that there are 28,698 triangles in the network?
But you can spend days on WNA. There is always more to explore, and more insights to find, and good insights sometimes come from where we least expect it. It’s a great place to be, for learning about network science and network science research.
Whole Network Analysis is the top down analysis of the entire network at once, completely zoomed out.
Centrality Analysis / Key Node Analysis
In this analysis, your goal is to identify the key nodes in a network. There are many different kinds of centrality algorithms, depending on the context you are after. Explore them all!
Other than the moment of visualizing your first network, centrality analysis is probably the first memorable experience you will have in network analysis. Centrality analysis is about identifying the key nodes in any network. The graph is your input data, and you receive a list of important nodes, but you can also see how important each one is.
These are the PageRank scores of the top twenty most important characters in Alice in Wonderland (according to PageRank). It is clear that Alice is the main character.
I work this into every network analysis I do, because there is value in understanding the key players in any network. I recommend that you learn as much as you can about different kinds of centralities.
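As a rough illustration of why it pays to compare centralities, here is a minimal sketch that assumes networkx and uses the built-in karate club graph as stand-in data. It uses degree centrality and betweenness centrality. Other measures such as nx.pagerank return the same kind of node-to-score dictionary, so they can be ranked the same way.

```python
import networkx as nx

# Stand-in data: Zachary's karate club, which ships with networkx.
G = nx.karate_club_graph()

# Each call returns a dict mapping node -> score.
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)


def top_nodes(scores, n=5):
    """Return the n highest-scoring nodes, best first."""
    return sorted(scores, key=scores.get, reverse=True)[:n]


top_by_degree = top_nodes(degree)
top_by_betweenness = top_nodes(betweenness)

print("Top by degree:     ", top_by_degree)
print("Top by betweenness:", top_by_betweenness)
```

On this graph the two measures disagree about which node is number one: node 33 has the most connections, while node 0 sits on the most shortest paths. That kind of disagreement is exactly the context worth digging into.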
I actually already wrote a bit about centralities.
This leads naturally to Egocentric Network Analysis.
Egocentric Network Analysis
In this analysis, your goal is to zoom in on nodes of interest.
There are whole books about Egocentric Network Analysis. Egocentric Network Analysis is all about “zooming in” on the nodes in a network, by investigating their Ego Networks. I have written about this in my book and in this series, multiple times, and shown examples.
This is Alice’s Ego Network, from Alice in Wonderland. These are the characters she is affiliated with, and you can even see who the alter nodes (others) are affiliated with as well.
Keep this simple by thinking of it as zooming in on a node. Network Analysis feels a lot like using a microscope.
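Zooming in can be sketched with nx.ego_graph. This is a minimal example, assuming networkx, with the karate club graph standing in for a real network and node 0 chosen arbitrarily as the node of interest.

```python
import networkx as nx

# Stand-in data: Zachary's karate club, which ships with networkx.
G = nx.karate_club_graph()
ego_node = 0  # arbitrary node of interest for this illustration

# radius=1 keeps the ego, its direct neighbors (the alters),
# and any edges among them.
ego_net = nx.ego_graph(G, ego_node, radius=1)
alters = set(ego_net) - {ego_node}

# radius=2 zooms out one step: friends of friends are included too.
wider_net = nx.ego_graph(G, ego_node, radius=2)

print("Alters:", sorted(alters))
print("Ego network size:", ego_net.number_of_nodes(), "nodes")
print("Radius-2 network size:", wider_net.number_of_nodes(), "nodes")
```

Changing the radius is the microscope's focus knob: a small radius shows the immediate neighborhood, and a larger one shows how that neighborhood sits in the wider network.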
Community / Clique / Group / Cluster Analysis
In this analysis, your goal is to identify groups and understand what makes them different.
Networks typically have multiple communities, groups, or clusters of some sort. You can find them by looking at connected components, as I’ve shown several times. These communities can act as their own tiny ecosystems, acting very differently than other parts of the network. Understanding their differences can be illuminating. Why is group A so different from group B? As you do network analysis, write down your questions. They will lead you to more discovery.
Beyond investigating connected components, you should spend time learning about Community Detection Algorithms. So far, I like the Louvain Method and SCD (Scalable Community Detection) best, as they both supposedly scale to billion-scale networks.
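Here is a minimal sketch of the Louvain Method, assuming a recent networkx release that provides nx.community.louvain_communities. The karate club graph is stand-in data, and the seed value is an arbitrary choice made only so that repeated runs give the same result. SCD is not part of networkx as far as I know, so it is not shown.

```python
import networkx as nx

# Stand-in data: Zachary's karate club, which ships with networkx.
G = nx.karate_club_graph()

# Louvain is randomized, so fix a seed (arbitrary value) for repeatability.
communities = nx.community.louvain_communities(G, seed=42)

# Modularity is one common score for how well separated the groups are.
score = nx.community.modularity(G, communities)

# Largest community first.
for i, members in enumerate(sorted(communities, key=len, reverse=True)):
    print(f"Community {i}: {len(members)} nodes -> {sorted(members)}")
print("Modularity:", round(score, 3))
```

Note that this graph is a single connected component, so connected components alone would report one group. Community detection finds structure inside that component.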
Hub and Authority Analysis
In this analysis, your goal is to identify the key hubs in a network, or to identify the key authorities in a network.
I haven’t written about it on my blog, yet, but I think I wrote about it in my book. I need to go back and check. But here is a very cool algorithm for identifying both the hubs and authorities of any network.
Hubs: many outbound links
Authorities: many inbound links
Think about the internet, or social media. News aggregators are hubs. A news aggregator will link to hundreds of websites. A popular website is an authority. Hundreds of websites will link to a popular website.
On social media, a person who constantly shares new stories from many sources is a hub. A person with a million followers is an authority.
There are different uses for this knowledge, depending on what you are doing. If you are doing internet research, then hubs and authorities are important to analyze. They are also useful for understanding any kind of flow (dataflow, information flow, influence, etc).
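One algorithm that scores both hubs and authorities is HITS, available in networkx as nx.hits. I am assuming here that this is the kind of algorithm meant above. The sketch below uses a tiny made-up web graph, so every node name is hypothetical. In recent networkx releases nx.hits also needs SciPy installed.

```python
import networkx as nx

# Hypothetical toy web: an edge (u, v) means "page u links to page v".
G = nx.DiGraph()
sites = ["site_a", "site_b", "site_c", "site_d"]
G.add_edges_from(("aggregator", site) for site in sites)
G.add_edges_from([("curator", "site_a"), ("curator", "site_b")])
G.add_edge("blog", "site_a")

# nx.hits returns two dicts: hub scores and authority scores.
hubs, authorities = nx.hits(G)

top_hub = max(hubs, key=hubs.get)
top_authority = max(authorities, key=authorities.get)

print("Top hub:", top_hub)
print("Top authority:", top_authority)
```

The aggregator links out to every site, so it comes out as the strongest hub. site_a is linked to by every hub, so it comes out as the strongest authority.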
Bridge Analysis
In this analysis, your goal is to identify what is holding the network together.
Networks are held together by bridge nodes. Let’s say that there are two communities: gamers, and anime fans. These are two different things, but some gamers love anime. Some gamers will be a bridge between these two communities. If you look at the two communities themselves, often their behavior will be subtly different than where the bridges form. I think about it like when paint mixes. When ideas collide, interesting things happen. Information isn’t a steady thing. Ideas compete to be included. This is influence.
Bridge nodes are the glue of a network. They pull separate communities together, forming a larger ecosystem. Node importance algorithms such as betweenness centrality and PageRank are useful for finding these bridge nodes. networkx also has an nx.bridges(G) function, which makes short work of identifying bridge edges: the edges whose removal would disconnect part of the network. The nodes at either end of those edges are good bridge node candidates.
Seek to answer the question of who the key bridges are and what makes them special.
On days 22 and 23, I show the impact of attacking bridge/key nodes in a network. If you attack bridge nodes, the network shatters into pieces, leaving only the most dense network structures remaining.
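The gamer and anime example can be sketched on a toy graph. This is a minimal illustration, assuming networkx. All node names are made up, and each community is drawn as a small fully connected group so that the single link between them stands out.

```python
import itertools

import networkx as nx

# Hypothetical toy network: two tight groups joined by one friendship.
gamers = ["g1", "g2", "g3", "g4"]
anime_fans = ["a1", "a2", "a3", "a4"]

G = nx.Graph()
G.add_edges_from(itertools.combinations(gamers, 2))
G.add_edges_from(itertools.combinations(anime_fans, 2))
G.add_edge("g4", "a1")  # the only link between the two communities

# nx.bridges yields edges, not nodes (undirected graphs only).
bridge_edges = [tuple(sorted(edge)) for edge in nx.bridges(G)]

# Articulation points are nodes whose removal disconnects the graph.
cut_nodes = set(nx.articulation_points(G))

# Simulate an attack: remove those nodes and count what is left.
attacked = G.copy()
attacked.remove_nodes_from(cut_nodes)
pieces = nx.number_connected_components(attacked)

print("Bridge edges:", bridge_edges)
print("Cut nodes:", sorted(cut_nodes))
print("Components after attack:", pieces)
```

Removing the two nodes at either end of the bridge splits the network back into its two separate communities, which is the shattering effect described above in miniature.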
Core and Corona Analysis
In this analysis, your goal is to understand what is happening in different layers of the network.
Network Analysis is compared to lots of things (using a microscope or telescope, zooming in, exploring the universe), but I think of peeling an onion when I think of core and corona analysis.
Networks have layers. When I think of the layers of a network, I am thinking of k_core and k_corona.
With k_core, I can investigate what the nucleus/core of a network is made of, to understand what is guiding the network.
With k_corona, I can explore the layers separately. K_corona(0) will give me all nodes with zero edges. K_corona(1) will give me all nodes with one edge, and any connections between them and other nodes with only one edge. K_corona(2) is where it gets more precise: it gives me the nodes in the 2-core that have exactly two neighbors inside the 2-core, and any connections between them. In general, k_corona(k) keeps the nodes of the k-core that have exactly k neighbors within the k-core. And so on. It is another way to look at the network, like peeling an onion.
I personally don’t have a lot of use for this, but I often use k_core(G, 1) as a shortcut to quickly throw away all isolate nodes. That’s my main use. But I would love to explore this further and learn more.
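The layers are easiest to see on a tiny graph. This is a minimal sketch, assuming networkx, with a hypothetical five-node graph: a triangle as the core, one leaf hanging off it, and one isolate.

```python
import networkx as nx

# Hypothetical toy graph: triangle a-b-c, leaf d attached to a, isolate e.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("a", "d")])
G.add_node("e")

# k_core(G, 1) drops nodes with no edges: a quick way to remove isolates.
no_isolates = set(nx.k_core(G, 1))

# k_core(G, 2) keeps only nodes that still have 2+ neighbors in the core.
core = set(nx.k_core(G, 2))

# k_corona(G, k): nodes of the k-core with exactly k neighbors in the k-core.
corona_0 = set(nx.k_corona(G, 0))
corona_1 = set(nx.k_corona(G, 1))
corona_2 = set(nx.k_corona(G, 2))

print("Without isolates:", sorted(no_isolates))
print("2-core:", sorted(core))
print("0-corona:", sorted(corona_0))
print("1-corona:", sorted(corona_1))
print("2-corona:", sorted(corona_2))
```

Notice that node a has three edges in the full graph but still lands in the 2-corona, because only two of its neighbors are inside the 2-core. The corona is measured against the core, not against the raw degree.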
That’s All, Folks!
There’s no code today, just reading. Take this time to think through what you have read and brainstorm the kinds of networks you’d like to explore and how you would use these different types of analysis to learn more.
That’s all for today! Thanks for reading! If you would like to learn more about networks and network analysis, please buy a copy of my book!