Welcome to day 8 of #100daysofnetworks!
If you would like to learn more about networks and network analysis, please buy a copy of my book!
So far, through the course of this adventure, we've already learned some useful things such as:
How to use internet resources (such as Wikipedia) to create network data that can be useful for learning network analysis.
How to create network graphs using this data.
How to identify important nodes in the network.
How to identify communities in the network.
How to explore ego networks in the overall network.
This is already more than enough to be very effective in network analysis, because it is useful to be an explorer of networks and graphs, not just a user of graph data. I consider these things to explore, not just things to use. By that, I mean that graphs are more than just input data for Machine Learning models, and that whole network metrics (overall density, number of nodes, number of edges, number of triangles) are really not all that useful on their own. When I do network analysis, I am looking for things, and it is never just a count of edges, a count of triangles, or a count of anything.
With the above skills, we already know enough to get started exploring networks, so let's get away from the dry fact learning and do something interesting. Let's take what we know and use it for MUSIC DISCOVERY! This is an excellent weekend activity.
I have one goal: I want to learn three to five new things about my favorite band. I specifically set that to three to five, because it is easily possible to spend hours and hours and hours digging through a network. You should set goals so that you have something to aim for, and so that you know when enough is enough. You can always return to an interesting network.
Let's get to work!
Wilco! My Favorite Band!
My favorite band is Wilco. I have been listening to them since 1997. I love their folk/rock sound and their experimental nature. They aren't just folk music. They get almost jam band'ish at times. They play in chaos. But they are also really mellow, and I love their folk and rock roots. I play guitar, and I find their music approachable. I am insecure about my ability to play lead guitar, and their lead guitarist has a style and skill that is approachable, where other artists are much more difficult.
Here's a picture of Wilco I found on a Google search.
In the front is Jeff Tweedy, the singer and songwriter. He is easily my favorite songwriter and frontman.
Here's one of my favorite Wilco songs.
They're just a great band, and I love to listen to them when I need to relax. I love playing guitar along with them as well.
You can learn more about them here.
Yesterday, I used my Wikipedia crawler to create an edgelist for exploring the Wikipedia network that exists around Wilco. Let's get to it!
Network Analysis Time
As always, the code is available on Github.
Datasets are also available. Use these to explore if you don't want to use the crawler!
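If you go the dataset route, turning an edgelist into a graph only takes a few lines. Here is a minimal sketch, assuming the edgelist is a CSV with `source` and `target` columns. Those column names, the sample rows, and the `load_edgelist` helper are placeholders for illustration, so check the actual files before relying on them.

```python
import csv
import io

import networkx as nx


def load_edgelist(handle):
    """Build an undirected graph from CSV rows with 'source' and 'target' columns."""
    G = nx.Graph()
    for row in csv.DictReader(handle):
        G.add_edge(row["source"], row["target"])
    return G


# Illustrative rows only; a real file would be opened with open(path, newline="")
sample_csv = io.StringIO(
    "source,target\n"
    "Wilco,Jeff Tweedy\n"
    "Wilco,Uncle Tupelo\n"
    "Jeff Tweedy,Uncle Tupelo\n"
    "Wilco,Mermaid Avenue\n"
)

G = load_edgelist(sample_csv)
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges")
```

The graph is built as undirected here for simplicity. Wikipedia links have a direction, so `nx.DiGraph()` is the alternative if direction matters to your question.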
This time, the dataset was a little noisy/messy, so I used Wilco's "ego network" with a radius of 2 to create a purer Wilco graph. You can see how I did that in the code. I do not explain how to do that in my book. It is a useful option for quick cleanup.
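The radius-2 cleanup can be sketched with NetworkX's `ego_graph`. This is a toy illustration rather than the notebook's exact code: the edges and the "Noise page" nodes are made up, and only the idea (keep everything within two hops of Wilco, drop the rest) comes from the paragraph above.

```python
import networkx as nx

# Toy crawl result: a chain of links leading away from the seed page
G = nx.Graph()
G.add_edges_from([
    ("Wilco", "Jeff Tweedy"),           # 1 hop from Wilco
    ("Jeff Tweedy", "Uncle Tupelo"),    # 2 hops from Wilco
    ("Uncle Tupelo", "Noise page 1"),   # 3 hops from Wilco (hypothetical node)
    ("Noise page 1", "Noise page 2"),   # 4 hops from Wilco (hypothetical node)
])

# Keep only the nodes within 2 hops of the seed, plus the edges among them
wilco_graph = nx.ego_graph(G, "Wilco", radius=2)

print(sorted(wilco_graph.nodes))
```

Anything more than two hops out is dropped, which is why this works as a quick filter when a crawl wanders off topic.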
After creating the network graph, the first thing I usually do is inspect the core of the overall network.
This is one of the most interesting network cores I have seen in a long time. I saw a similar "split nucleus" (what I call it) when I inspected an NRA (National Rifle Association) graph a few years ago and could clearly see that there were two parts (politics + gun enthusiasts) in the network.
I can see that there are two parts to this core: there's the actual Wilco stuff on the top left, and there's a bunch of stuff related to Roger Wilco on the bottom right. Because of the name, Roger Wilco was pulled into the dataset while crawling, using "snowball sampling". When using snowball sampling, you're always going to get some stuff you didn't expect, and it can be seen as junk or as something interesting. To me, this is interesting, not junk.
Notice that it looks like this core would fly apart if someone took a pair of scissors and made a few snips in the middle. If we were to cut four edges, the network would cleanly split in half. We're going to actually explore network resilience soon and will play with that idea.
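The "pair of scissors" idea can be tried out on a toy graph. The sketch below is an assumption-heavy stand-in, not the Wilco core itself: two fully linked groups of six nodes represent the two halves, and four hand-picked links hold them together. `nx.minimum_edge_cut` then finds the smallest set of edges whose removal disconnects the graph.

```python
import networkx as nx

# Two dense clusters (complete graphs on 6 nodes each) standing in for the two halves
G = nx.compose(nx.complete_graph(range(6)), nx.complete_graph(range(6, 12)))

# Four links hold the two halves together
links = [(0, 6), (1, 7), (2, 8), (3, 9)]
G.add_edges_from(links)

# Smallest set of edges whose removal disconnects the graph
cut = nx.minimum_edge_cut(G)
print(len(cut), "edges to snip")

# Make the snips on a copy and count the pieces that remain
H = G.copy()
H.remove_edges_from(cut)
print(nx.number_connected_components(H), "pieces after cutting")
```

On a real core, the size of this cut is a rough measure of how firmly the two halves are attached to each other.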
I look at the core of every network to understand what is influencing the network. The core is made up of the nodes that are most connected in the network. The core is the foundation, and it can say a lot about the overall makeup of the network.
But the core is not everything. Isolates (nodes with no edges) can be very important, too.
However, the core gives me an overall picture of what is driving this network.
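One common way to pull out a core is a k-core, which keeps only the nodes that still have at least k connections to each other after the weakly attached ones are peeled away. Here is a minimal sketch, assuming a k-core is an acceptable stand-in for "the core" described above. The edges are invented for illustration, and "Orphan page" is a hypothetical isolate.

```python
import networkx as nx

G = nx.Graph()

# A tightly linked group: every page links to every other page
G.add_edges_from([
    ("Wilco", "Jeff Tweedy"), ("Wilco", "Glenn Kotche"), ("Wilco", "John Stirratt"),
    ("Jeff Tweedy", "Glenn Kotche"), ("Jeff Tweedy", "John Stirratt"),
    ("Glenn Kotche", "John Stirratt"),
])

# Loosely attached pages, each hanging off the group by a single link
G.add_edge("Wilco", "Billy Bragg")
G.add_edge("Jeff Tweedy", "Uncle Tupelo")

# A page with no links at all (hypothetical)
G.add_node("Orphan page")

# With no k given, k_core returns the innermost (main) core
core = nx.k_core(G)
print("core:", sorted(core.nodes))

# Isolates are nodes with zero edges
isolates = list(nx.isolates(G))
print("isolates:", isolates)
```

Passing an explicit `k` (for example `nx.k_core(G, k=2)`) loosens or tightens the cut, which helps when the main core is too small or too crowded to read.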
Nodes of Interest
After looking at the core of a network, I want to understand which nodes are the most important in the network. This can easily be done using Page Rank or one of the many centrality measures that we discussed previously, such as Betweenness Centrality.
This is a small network of only a few hundred nodes, so I will use Betweenness Centrality (which is slow on massive networks) and Page Rank to identify important nodes based on their network positioning.
First, here are the nodes that Page Rank found to be most important.
And here are the nodes that Betweenness Centrality found to be most important.
Here is an opportunity to build intuition: Look at the images without clicking on them to look closer. Notice that Betweenness Centrality looks very different than Page Rank. Why do you think that is? For both of them, Wilco and Jeff Tweedy are the most important nodes, but in Page Rank, the values are very different, and Jeff Tweedy has a much lower Betweenness Centrality than Wilco does. This is a good research topic, for learning. Get to know the centralities. They will quickly become intuitive, with use.
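To experiment with the two measures side by side, here is a small sketch on a made-up graph. The edges are invented for illustration and will not reproduce the rankings in the images. Depending on your NetworkX version, `nx.pagerank` may also need SciPy installed.

```python
import networkx as nx

# Invented edges: Wilco sits between several otherwise separate groups
G = nx.Graph()
G.add_edges_from([
    ("Wilco", "Jeff Tweedy"), ("Wilco", "Glenn Kotche"), ("Wilco", "John Stirratt"),
    ("Wilco", "Billy Bragg"), ("Wilco", "Mermaid Avenue"),
    ("Billy Bragg", "Mermaid Avenue"),
    ("Jeff Tweedy", "Uncle Tupelo"), ("Jeff Tweedy", "Tweedy"),
    ("Jeff Tweedy", "Jay Farrar"), ("Uncle Tupelo", "Jay Farrar"),
])

pagerank = nx.pagerank(G)
betweenness = nx.betweenness_centrality(G)


def top_nodes(scores, n=3):
    """Return the n highest-scoring (node, score) pairs."""
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n]


for name, scores in [("Page Rank", pagerank), ("Betweenness Centrality", betweenness)]:
    print(name)
    for node, score in top_nodes(scores):
        print(f"  {node}: {score:.3f}")
```

Betweenness Centrality counts how often a node sits on the shortest paths between other nodes, so pages that do not bridge anything score zero. Page Rank gives every node some score and rewards being linked from well-linked pages. That difference in what is being measured is one reason the two charts can look so unlike each other.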
With my goal of finding 3-5 new things about Wilco, I'm going to look closer and see if anything stands out, and turn it into a question:
Who is Glenn Kotche?
Who is Billy Bragg?
Who or What is Uncle Tupelo?
Who is Jay Farrar?
Who or What is Tweedy?
Who is John Stirratt?
Now we are entering the realm of open source intelligence (OSINT)! Let's find out who each of them is and how they relate to Wilco!
Glenn Kotche is the drummer in Wilco. Cool!
Billy Bragg made two albums with Wilco. I love these albums!
Uncle Tupelo was Jeff Tweedy's band before Wilco, from 1987 to 1994.
Jay Farrar was part of Uncle Tupelo, Jeff Tweedy's band before Wilco.
Tweedy is a band that Jeff Tweedy made with his son.
John Stirratt plays bass in Wilco and other bands.
Other than Billy Bragg, I did not know about any of them! I struggle to memorize names, and I haven't kept up with the individual musicians in Wilco. So, already, I've identified two bands that might be cool to listen to (Uncle Tupelo and Tweedy) and found musicians that might be worth further exploration (Jay Farrar and John Stirratt). The goal of finding 3-5 new things is already met, and we are just getting started. That's real value, and amazingly quick insights.
Community Detection
Next, I use Community Detection to see how Wikipedia pages link together to form network communities. I prefer the Louvain Method, but Karate Club has a Machine Learning model called Scalable Community Detection (SCD) that is also excellent. I show how to use that in my book, and we will use it later in this #100daysofnetworks adventure.
Community Detection essentially separates a graph into a bunch of smaller graphs that can be explored separately. It is useful on massive graphs, to break them apart for separate analysis, and it is useful on small graphs as well, to detect the cliques and communities that exist.
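Splitting a graph into community subgraphs can be sketched as follows. NetworkX's built-in `louvain_communities` (available in recent NetworkX versions) stands in for the Louvain Method here, and the notebook may well use a different package. The two groups and their edges are invented for illustration.

```python
import networkx as nx

wilco_side = ["Wilco", "Jeff Tweedy", "Glenn Kotche", "John Stirratt"]
tupelo_side = ["Uncle Tupelo", "Jay Farrar", "Anodyne", "No Depression"]

G = nx.Graph()
# Fully link the pages inside each group
G.add_edges_from(nx.complete_graph(wilco_side).edges)
G.add_edges_from(nx.complete_graph(tupelo_side).edges)
# A single link joins the two groups
G.add_edge("Jeff Tweedy", "Uncle Tupelo")

# Each community comes back as a set of nodes; the seed makes the run repeatable
communities = nx.community.louvain_communities(G, seed=42)

# One smaller graph per community, ready to be explored separately
subgraphs = [G.subgraph(nodes).copy() for nodes in communities]

for sub in subgraphs:
    print(sorted(sub.nodes))
```

Each subgraph can then be drawn or ranked on its own, for example by running Page Rank inside it to find the node that stands out.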
In the code, you can see more communities, but I will show and talk through just a few, here.
This is the largest community in the network. The node for "Billy Bragg" stands out. The node is colored red because it has the highest Page Rank value of all the nodes in this community graph. This community is related to the albums Mermaid Avenue and Mermaid Avenue Vol. II. I highly recommend listening to those albums. They are great, especially California Stars and One by One.
It makes sense that this is the largest community in the network, as this is one of their older albums. The album itself is a beautiful tribute to Woody Guthrie.
Let's look at another community.
This community is very useful to me, as it is related to the band Uncle Tupelo, who I have not listened to. It was Jeff Tweedy's band before Wilco, so I should check them out! My goal was to learn 3-5 things, and I have already learned five, but it looks like there are a few more things to learn!
Will I like the album Anodyne?
Will I like the album No Depression?
Are there other albums on Jay Farrar's discography page that I might like?
Will I enjoy the rapper IDK?
What is Day of the Doug? Is that a band or an album? Will I like it?
I'm not going to chase those answers down right now. Let's keep going! That gives me something to explore when I need new music!
Let's look at another community!
This community relates to the video documentary "I Am Trying to Break Your Heart". If you enjoy Wilco after reading this post, you will love that documentary. It's got great footage, and I have probably watched it over a dozen times by now. This introduces me to some new names and brings up questions:
Who is Jay Bennett?
Who or what is Kamera?
What is the Conet Project? What's it about?
What other documentaries has Fred Armisen made? Will I like them? What is his most recent?
What is Sam Jones' Wilco photography like? Are any of his shots for sale? Are they affordable?
What is "What I Mean to Say is Goodbye"? Is this a movie, album, or what?
Who is Bill Fay?
Notice something. Everything that we explore in a network brings up more questions. Graph exploration can unveil a MOUNTAIN of insights. Keep a notebook next to you, or a way to keep notes on your computer. Networks are complex and gradually an overwhelming feeling builds up if you explore too long without taking breaks. Look around, write notes, look around more, write more notes, then walk away for a while and take a long break. Good network analysis isn't done in one shot. It is iterative. Find insights, ask questions, do more digging, repeat. Do this until you have as many answers as you need. There is no DONE. You are done when you have enough.
Let's jump to Ego Networks, as I had the most success working with them for this analysis, more than communities. Check out the code to see more communities!
Egocentric Network Analysis
With community detection, we were interested in how pages linked together to form a community of nodes.
With Egocentric Network Analysis, we are interested in seeing what nodes exist around a node of interest, and how they link to each other. The main node is called the 'ego node', and all other nodes are called 'alters' or 'alter nodes'.
In an "ego network", the node of interest will typically be in the center, and alter nodes will surround it. However, one of my personal techniques that I find useful is to DROP the center node, the ego node. By doing this, the ego network can split apart, revealing the distinct groups that exist in an ego network. For instance, if a person is both an athlete and a video gamer, their social circle might contain both groups, and they may not interconnect. That's one example.
Opportunity for intuition: what other combinations could cause a person's ego network to have different groups? Think hobbies, religions, dating life, education, etc. Humans are complicated.
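The "drop the ego node" trick maps onto the `center=False` option of NetworkX's `ego_graph`. Here is a minimal sketch of the athlete and video gamer example. The person "Alex" and every alter name are hypothetical.

```python
import networkx as nx

ego = "Alex"  # hypothetical person who is both an athlete and a gamer
athletes = ["Teammate A", "Teammate B", "Coach"]
gamers = ["Guild friend A", "Guild friend B"]

G = nx.Graph()
for group in (athletes, gamers):
    # The ego knows everyone in the group
    G.add_edges_from((ego, alter) for alter in group)
    # Everyone inside a group knows each other
    G.add_edges_from(nx.complete_graph(group).edges)

# Standard ego network: the ego holds everything together in one piece
with_ego = nx.ego_graph(G, ego)
print(nx.number_connected_components(with_ego), "component with the ego")

# Same network with the ego removed: the separate social circles fall apart
without_ego = nx.ego_graph(G, ego, center=False)
groups = [sorted(c) for c in nx.connected_components(without_ego)]
for members in groups:
    print(members)
```

Each connected component that remains is one of the distinct groups around the ego. If the alters cross-link heavily, the network stays in one piece even without the ego.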
Let's look at some ego networks!
This is the ego network for Wilco. When I said that today I had better luck with ego networks than I had with community detection, this is why. With the ego network of Wilco, I get nodes that are linked to or linking to the Wilco page, so they are all relevant in some way. And this is a very interesting ego network, because there is so much cross linking between alters. That is not always the case in networks. We are lucky! I could easily find several things that catch my interest, but let's keep going.
This is the ego network for Jeff Tweedy. As he is the lead singer for Wilco, it is going to be very similar to Wilco's ego network. In fact, Wilco and Jeff Tweedy's nodes show different node colors indicating high Page Rank values. Jeff Tweedy is in Wilco's ego network, and Wilco is in Jeff Tweedy's ego network. Let's continue.
This is the ego network of the band Tweedy, Jeff Tweedy and his son's band. I had never heard of them before this analysis, and I started listening to them immediately. So far, I like them! I will definitely explore this more. I enjoy playing music with my kids, so I find this network relatable and sweet. I need to know more. Also, what is that band The Bronx like? New music! Let's continue.
This is the ego network of the Wilco discography--Wilco's albums! I accidentally stopped keeping up with them around 2012, so which ones have I not yet listened to?
Cate Le Bon is producing a new Wilco album, out in two weeks!!!!!
Star Wars: this is an album I have not listened to!
Schmilco: this is an album I have not listened to!
Ode to Joy: this is an album I have not listened to!
That's Enough!
And that's enough for one day! It is important to pace yourself when you do this kind of analysis.
My goal was to identify three to five new things I did not know about Wilco and I found many more than that.
What are the takeaways?
Networks and graphs are amazing exploratory devices for knowledge DISCOVERY
Networks and graphs are useful for much more than just Whole Network Analysis metrics (triangles, density, degrees, etc)
You can use OSINT to learn about GOOD things that interest you! OSINT isn't just for investigating bad or scary things! OSINT is literally knowledge discovery.
Everyone can use network data to learn about things. It is a choice to NOT use them. It is self-limiting to NOT learn to explore and analyze networks.
This was a fun post! Next post, I'll have some cool music to listen to while I am doing the work! I hope you enjoyed this!
Book Giveaway!
And for those who took the time to read this post, I am now going to announce a book giveaway! I am going to give away five copies of my book in digital and physical format.
Winners who live in the United States can receive either a physical copy (signed or unsigned), if they don't mind waiting a few days, or a digital copy immediately. I already have the digital copies ready for winners.
Winners who live outside the United States can receive a digital copy.
Here is how to participate:
Find a topic of interest. For today's post, the topic of interest was "wilco".
Use the crawler from day 5 to create a network edgelist. Limit the crawl to 3-4 iterations. Beyond that, if you are a beginner, it will be overwhelming. But I do encourage people to be bold in their learning.
Do network analysis like I did today. You can use today's Jupyter notebook as a template.
Create an "Insights Document" that has your findings. Use my blog posts for inspiration. You want to show some Whole Network Analysis findings, identify important nodes using Page Rank or Centralities, and look at a few ego networks. If community detection is difficult, skip it for now!
Message me the insights document on LinkedIn. I need to see your work and the insights you discovered. A Google document is easiest!
You don't have to be too thorough. Just do what you can. If you know nothing, then building a graph and finding important nodes is enough! If you know your way around networks, then push a bit further and show me what you can do!
I will accept submissions until September 30, 2023!
I will take all acceptable submissions and add them to a raffle. I will call out the winners on October 1!
And if you make your own LinkedIn post and tag #100daysofnetworks, I will add your name TWICE!
If a submission is not complete enough, I will ask for a bit more. I will try not to do that. Do the work, and you'll have a shot to win. Do the work.
Thank You!
Thank you for reading this post! I hope you enjoyed it! I wanted to do something a bit more interesting than usual, as it is important to do things that keep us creatively engaged. I look forward to seeing what you can do!