Welcome to day 2 of #100daysofnetworks.
If you would like to learn more about networks and network analysis, please buy a copy of my book!
Today, we are going to start with the WHAT and WHY behind understanding networks. I'm going to explain what networks are, where we can find them, and why they are useful to explore, analyze, and understand.
Let's do that real fast, and I'll go deeper in the remainder of this post.
WHAT: a network is just a manifestation of things and their relationships.
WHERE: networks are all around us, and network data is easy to get.
WHY: being able to analyze networks gives us new ways to understand the world and universe.
You should study network analysis because networks are everywhere, and learning how to do it will give you the skills to analyze and explore data in new ways.
Ok, back to the beginning. What even is all of this? What are we going to discuss during #100daysofnetworks? What am I going to discuss in this specific post? My plan is to start at the beginning, describing networks and parts of networks, and describing why you should care.
In this post, I will discuss the following:
What is a network?
What is a node?
What is an edge?
What is a community?
What is a subgraph?
If you read my book or followed along with the first iteration of #100daysofnetworks, then you probably know the answer to all of these questions, but I am hoping to introduce this topic to more people, so it is important to start at the beginning. Today's discussion will be about networks, their parts, and how they manifest in seemingly everything around us.
The point of this is to show how pervasive networks are and to explain that being able to explore and interrogate them gives us new abilities in understanding life and the world around us. Life is not flattened, structured data. Life is complex.
What is a network?
As mentioned above, a network is just a manifestation of things and their relationships. I'm sure that a more academic definition can be found, but at the end of the day, a network is things and relationships.
A graph, on the other hand, is a representation of a network, for use in analysis, prediction, etc.
Often, I will use the terms graph and network as synonyms. That's common. People will often use the following phrases:
Graph Theory
Network Science
Social Network Analysis
Graph Data Science
There is so much overlap. I personally think of all of these as related. I apply Network Science when I am doing Social Network Analysis. I do not call what I do Graph Data Science, but other people do. I like to keep things simple and just consider all of these as parts of Network Science, similar to how there are different domains in Data Science, or different domains in Software Engineering. That's how I think about all of this.
A network is a manifestation of things and their relationships, and a graph is a representation of a network. That's how I see it.
A network exists in the real world, and we often cannot know all of its parts. We can be aware that the network exists, and we can be aware of parts of the network, but we cannot see or understand everything.
A graph is our representation of what we know of the network. Perhaps we have, on paper, created an edgelist after watching people's social interactions and then hand-drawn the network of how people have interacted. We've created a graph. If we add arrows showing the directionality of the relationship, we've created a Directed Graph, etc, etc.
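To make the edgelist idea concrete, here is a minimal sketch using only the Python standard library. The names and interactions are made up for illustration, and the helpers `build_undirected` and `build_directed` are hypothetical, not from any library. In practice a graph library would usually handle this for you.

```python
from collections import defaultdict

# Hypothetical edgelist: each pair is one observed interaction (who reached out to whom).
edgelist = [
    ("Alice", "Bob"),
    ("Bob", "Carol"),
    ("Carol", "Alice"),
    ("Dave", "Alice"),
]


def build_undirected(edges):
    """Return an adjacency mapping where every edge goes both ways."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        graph[target].add(source)
    return dict(graph)


def build_directed(edges):
    """Return an adjacency mapping that keeps the direction source -> target."""
    graph = defaultdict(set)
    for source, target in edges:
        graph[source].add(target)
        graph[target]  # touch the target so it exists as a node even with no outgoing edges
    return dict(graph)


undirected = build_undirected(edgelist)
directed = build_directed(edgelist)

print(sorted(undirected["Alice"]))  # everyone Alice is connected to, ignoring direction
print(sorted(directed["Alice"]))    # only the people Alice reached out to
```

The same edgelist produces both graphs. The only difference is whether we keep the arrows.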
But I slip up all the time. I'll have a whole day where I will call everything a network. Sometimes this is intentional. If in one sentence I call something a graph, and then in the next sentence I call something a network, a person may think I am talking about two different things. Or if they are from cybersecurity, they'll by default think I am talking about a computer network.
So, graphs, networks, they're synonymous to me, and I work with them every day. You can be more strict with yourself, if you like.
Regardless, networks exist in the real world and they are represented as graphs, and we can use graphs for analysis, prediction, and so on. Graphs are simultaneously a tool, a map, and a usable data structure. When they are visualized, they are often beautiful, like art.
Networks are all around us. Here are some examples, and we'll cover several of these during this adventure.
People Networks
Adversaries and Allies (Social Network)
Collaboration Network (Authors, Teams, etc)
Communications Network (Email, Tweets, Telegram, etc)
Computer Networks
Music Networks
Songs and Genres
Artist Collaborations
Song Evolution (Song -> Remix)
User Songs (for recommendations)
Data Networks
Entity Relationship Diagram (RELATIONSHIP is a giveaway)
Dataflow Diagram (Code -> Data)
Amplification (websites, social media accounts, etc)
Knowledge Graph
This is a very short list, just off the top of my head. What other kinds of networks can you imagine?
Think about what we are doing when we try to 'network' with other people. We are attempting to start some kind of relationship with other people so that we can find opportunities. Analyzing networks is another way to identify opportunities or understand reality.
Here is a visualization of the social network from Les Miserables.
Here is the same network with labels.
What is a node?
As mentioned before, a node is a thing in a network.
A node can be a person, a song, a food ingredient, an outcome, a computer program, a data file, a database table, etc. Use your imagination. A node is just a thing.
To keep things simple, hold the idea of a node as being a person or a computer. We all understand that human relationships are a thing, and we are all probably aware that computer networks exist. Computers are nodes on a computer network. People are nodes on a social network.
A node is shown on a network visualization as a dot or circle.
What is an edge?
An edge is a relationship between two nodes. Put another way, things have relationships with other things, and each of those relationships is portrayed as an edge.
An edge is shown on a network visualization as a line. The edge is the line that exists between two nodes, the line that exists between two dots or circles.
I am one person. You are another person. If you are reading this, you are interacting with my words. We now have an author/reader relationship.
I am one dot, you are another, and there is a line between us. In the real world, nobody can see that line, but it exists.
What is a community?
A community is a group of connected things. Typically, when we talk about communities, we are talking about living things. However, there is such a thing as communities of websites, communities of social media accounts, etc. We could say that that's because there are people behind those websites and people behind those accounts, and I'd agree with that. But community detection algorithms are useful beyond studying living things.
I will show how to identify and visualize communities in the near future. We will use community detection very often.
To keep things simple, going back to our idea of people networks, a community would be a network of connected individuals.
For instance, families are densely connected. We tend to interact with people we live with. Work networks are less densely connected, and there are clear cliques/communities that work together. If you were to construct a network of every single person on the planet, it would be sparsely connected, and it would also include communities.
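One common way to put a number on "densely" versus "sparsely" connected is graph density: the number of edges that exist divided by the number that could exist. For an undirected graph with n nodes and m edges, that is 2m / (n(n - 1)). The sketch below is only an illustration. The family and workplace edgelists are made up, and the `density` helper is hypothetical.

```python
def density(edges):
    """Density of an undirected graph given as a list of (node, node) pairs."""
    nodes = {node for edge in edges for node in edge}
    n = len(nodes)
    if n < 2:
        return 0.0
    # Store each edge as a frozenset so (a, b) and (b, a) count once; skip self-loops.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    possible_edges = n * (n - 1) / 2
    return len(unique_edges) / possible_edges


# Hypothetical family of four where everyone interacts with everyone.
family = [
    ("Mom", "Dad"), ("Mom", "Kid1"), ("Mom", "Kid2"),
    ("Dad", "Kid1"), ("Dad", "Kid2"), ("Kid1", "Kid2"),
]

# Hypothetical workplace of six people: two small teams joined by a single link.
workplace = [
    ("Ann", "Ben"), ("Ben", "Cat"), ("Ann", "Cat"),
    ("Dan", "Eve"), ("Eve", "Fay"), ("Dan", "Fay"),
    ("Cat", "Dan"),
]

print(density(family))     # 6 of 6 possible edges exist
print(density(workplace))  # 7 of 15 possible edges exist
```

A density of 1.0 means everyone is connected to everyone else. The larger the network gets, the lower this number tends to be.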
If you are reading this blog post, you are probably part of the IT community on LinkedIn, and the IT community has smaller communities for Data Science, Cybersecurity, Data Engineering, etc.
Try to think about communities that you belong to, online and offline.
In a network, a community is a group of connected things. In life, we are connected to people we interact with.
What is a subgraph?
In a network, a subgraph is similar to a community. A community IS a subgraph of the whole graph/network.
Let's keep this simple:
Graph: representation of the entire network
Community: connected things that are part of that graph; a smaller section
Subgraph: a smaller section of the entire graph
A subgraph is a smaller section of a larger graph. You can extract a part of the whole network for analysis, rather than working with the whole network.
And a community is also a smaller section of a larger graph.
But a subgraph does not need to be a community. For instance, if I wanted to see the subgraph of the whole graph that contained three nodes from community A and three nodes from community B, we'd likely end up with two separate networks. If you visualized it, you'd see two clusters.
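Here is a minimal sketch of that situation in plain Python. The graph, the node names, and the helpers `induced_subgraph` and `connected_components` are all hypothetical and exist only for illustration. The two groups are simply assumed to be the communities here. No community detection is run.

```python
from collections import defaultdict

# Hypothetical graph: two tight groups (A and B) joined only through the pair a4 - b4.
edges = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"), ("a3", "a4"),
    ("b1", "b2"), ("b2", "b3"), ("b1", "b3"), ("b3", "b4"),
    ("a4", "b4"),
]


def induced_subgraph(edges, keep):
    """Keep only the edges whose two endpoints are both in `keep`."""
    keep = set(keep)
    return [(u, v) for u, v in edges if u in keep and v in keep]


def connected_components(nodes, edges):
    """Group nodes into connected pieces using a simple depth-first search."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        components.append(component)
    return components


# Three nodes from group A and three from group B, leaving out the bridge nodes.
chosen = ["a1", "a2", "a3", "b1", "b2", "b3"]
sub_edges = induced_subgraph(edges, chosen)
pieces = connected_components(chosen, sub_edges)

print(len(pieces))                          # number of separate clusters in the subgraph
print([sorted(piece) for piece in pieces])  # which nodes ended up in each cluster
```

Because the only link between the two groups runs through nodes we left out, the extracted subgraph falls apart into two disconnected pieces, even though the full graph is one connected whole.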
A community is a subgraph, but a subgraph is not necessarily a community.
For instance, here is a subgraph taken from the larger Les Miserables network.
We will explore subgraphs more throughout this adventure, as they are extremely useful.
That's Enough for Today
I hope you found this to be an enjoyable read, and I hope my explanations made sense. This blog post was written quickly. If you would like to learn more about networks and network analysis, please buy a copy of my book!