Hello, everyone! About a week ago, I was tagged in a LinkedIn post about Cosmograph. So, big thank you to Sergey Mastitsky for that! He mentioned that I should consider writing about Cosmograph for the second edition of my book.
This really made my day! I really appreciate the callout!
I spent a little time looking at Cosmograph, to get a sense of whether it would be useful in my workflows. Good news: it can be swapped right in over my previous method, and I’ll show you how!
Here are a few of Cosmograph’s cool demos:
It was actually the “English Words” demo that really caught my attention. Cosmograph did quite well rendering a large and dense network in a notebook. That’s impressive and not common with Python libraries. So, I became curious. The demos made it look simple, so I gave it a try.
Actually, this wasn’t obvious. It took a couple hours of figuring out, and the solution came to me when I inspected the example DataFrames, not the graph visualization code itself. Often, the problem is in the data itself.
The only real problem that I ran into was that:
In networkx, you can construct a graph with a DataFrame, and the ‘source’ and ‘target’ fields can be strings or integers.
Cosmograph expects integers and doesn’t complain about strings. It just won’t work with strings. It’ll appear to be working but not actually work.
So, I spent about an hour trying to figure out why none of the edges were showing, but all of the nodes were. Inspecting their DataFrame led me to the answer. So, there’s the lesson in troubleshooting things. Check the data.
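That data check can be turned into a small helper. The sketch below is not part of Cosmograph: check_links is a hypothetical name, and the 'id', 'source', and 'target' column names are assumptions about how the DataFrames are laid out. It flags link columns that are not integers, and integer values that do not match any point id.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

def check_links(points, links, id_col="id", source_col="source", target_col="target"):
    """Return a list of human-readable problems found in the links DataFrame."""
    problems = []
    for col in (source_col, target_col):
        if not is_integer_dtype(links[col]):
            # Strings here are the silent failure described above
            problems.append(f"'{col}' has dtype {links[col].dtype}, expected integers")
        elif not links[col].isin(points[id_col]).all():
            problems.append(f"'{col}' contains ids that are missing from points")
    return problems

points = pd.DataFrame({"id": [0, 1, 2], "label": ["a", "b", "c"]})
bad_links = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"]})
good_links = pd.DataFrame({"source": [0, 1], "target": [1, 2]})

print(check_links(points, bad_links))   # two problems, one per column
print(check_links(points, good_links))  # empty list
```

Running something like this before rendering would have pointed straight at the string columns.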
First, Code
Today’s code is available in Google Colab, and you can run the notebook directly. I have made it available for public use. Please give it a try and enjoy looking at the interactive graphs! Win, win!
In the Jupyter repo for day 41, I link to the Google Colab notebook, so you can always find the link there as well.
You can access the Google Colab notebook here. ← THIS IS TODAY’S CODE.
To run the notebook, click “Runtime” and then “Run All”, or you can click through the individual cells.
What’s the Workflow?
The workflow for using Cosmograph isn’t very different from using Scikit-network with NetworkX. I have created an identically named function, “draw_graph”, to show how it can be a direct replacement for how I have been doing graph visualizations.
The process is simple:
1. Load the data
2. Create the ‘points’ dataframe
3. Create the ‘links’ dataframe
4. Visualize the graph
Step three took the most figuring out, as networkx will create a graph when nodes are strings, but Cosmograph will not. So, I needed to create a numeric edgelist, and then this began to work. See the code to understand.
Finally, to be practical, I need steps 2-4 to be bundled up in a function that returns a visualization widget. If I do that, this will be usable as a replacement for what I already do, with some parameters tweaked.
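Here is a minimal sketch of steps 1 to 3, starting from a plain edgelist rather than a graph. It is an illustration, not the notebook's code: build_points_and_links is a hypothetical name, the tiny inline DataFrame stands in for a loaded dataset, and the 'id' and 'label' column names are my assumptions.

```python
import pandas as pd

def build_points_and_links(edges):
    """Turn a string edgelist into integer-keyed 'points' and 'links' DataFrames."""
    # Every distinct node label, in order of first appearance
    labels = pd.unique(edges[["source", "target"]].to_numpy().ravel())
    points = pd.DataFrame({"id": range(len(labels)), "label": labels})
    # Lookup table from the original string label to its integer id
    label_to_id = dict(zip(points["label"], points["id"]))
    links = pd.DataFrame({
        "source": edges["source"].map(label_to_id),
        "target": edges["target"].map(label_to_id),
    })
    return points, links

# Step 1: stand-in for loading the data (string node names, as networkx accepts)
edges = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"]})

# Steps 2 and 3: points keep the readable labels, links use only integers
points, links = build_points_and_links(edges)
print(points)
print(links)
```

The labels stay in the points DataFrame, so nothing readable is lost by switching the edgelist to integers.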
The draw_graph Function
The code to render the graph is about as simple as it is in my other method that I use with Scikit-Network. It’s just different. Here it is:
There are three parts to it:
Create the points DataFrame
Create the links DataFrame
Render the graph
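For readers without the notebook open, the three parts can be sketched roughly like this. Treat it as an approximation under stated assumptions, not the exact function from the notebook: the cosmo import and its point_id_by, point_label_by, link_source_by, and link_target_by keyword names reflect my understanding of the Cosmograph Python package and should be checked against its documentation. The render parameter is a hypothetical addition so the function can be tried without Cosmograph installed.

```python
import networkx as nx
import pandas as pd

def draw_graph(G, render=None):
    """Build points/links DataFrames from a networkx graph and hand them to a renderer."""
    # Part 1: points, one row per node, with an integer id and the original label
    labels = list(G.nodes())
    index = {label: i for i, label in enumerate(labels)}
    points = pd.DataFrame({"id": range(len(labels)),
                           "label": [str(label) for label in labels]})

    # Part 2: links, expressed with the integer ids instead of node names
    links = pd.DataFrame({"source": [index[u] for u, _ in G.edges()],
                          "target": [index[v] for _, v in G.edges()]})

    # Part 3: render. Assumption: cosmograph exposes `cosmo` with these keywords.
    if render is None:
        from cosmograph import cosmo
        render = cosmo
    return render(points=points, links=links,
                  point_id_by="id", point_label_by="label",
                  link_source_by="source", link_target_by="target")

# Try it with a stand-in renderer that only reports what it received
def fake_render(**kwargs):
    return kwargs

result = draw_graph(nx.Graph([("a", "b"), ("b", "c")]), render=fake_render)
print(result["links"])
```

Keeping the renderer swappable is only a convenience for testing. In a notebook, calling draw_graph(G) with Cosmograph installed would return the widget.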
I have done it this way intentionally. In investigations, I work with networkx to do things, and then I typically render the part of the graph I want to look at. I don’t typically render a whole network, as that’s not very useful unless the graph has already been denoised.
So, this function takes a networkx graph as input and then renders it, keeping things simple. I can pass in any graph, and this will render it. For example, this code:
ego = nx.ego_graph(G, 'Network science')
draw_graph(ego)
I’m using networkx to pull the ego graph for the “Network science” node from the full graph and then visualizing it.
Here is another example:
ego = nx.ego_graph(G, 'Artificial intelligence', radius=2)
draw_graph(ego)
In this example, I am doing something slightly different in networkx, and then the draw_graph function does what it does.
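A note on the networkx side: in nx.ego_graph, the radius parameter controls how many hops out from the center node to include, while the distance parameter names an edge attribute to use as edge length. A toy path graph (my own example, not from the notebook) shows the effect of radius:

```python
import networkx as nx

# Toy path graph: 0 - 1 - 2 - 3 - 4
G = nx.path_graph(5)

# Neighbors within one hop of node 2, then within two hops
one_hop = nx.ego_graph(G, 2, radius=1)
two_hops = nx.ego_graph(G, 2, radius=2)

print(sorted(one_hop.nodes()))   # [1, 2, 3]
print(sorted(two_hops.nodes()))  # [0, 1, 2, 3, 4]
```

Either result can be handed to draw_graph, since both are ordinary networkx graphs.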
I could pass in any graph. I could pass in any toy graph from networkx, or I could pull out the communities from the whole graph and render those, or I could look at other ego graphs. The point is, this is flexible and suitable as a swap-in for analysis. It is ready and suitable for use for real investigations.
Work in Progress
This is a work in progress. By default, it does not look better than what I do now, but it is fast, and it is flexible. There are an overwhelming number of parameters that you can play with to tweak the visualizations. You can read about them here.
I am currently showing my visualizations with a bold orange background. This happened out of frustration, when the edges weren’t appearing and I couldn’t figure out why. I was playing with all kinds of settings, trying to get edge colors to show, playing with opacity, etc, etc. But now, I kind of like it. I like that I can easily play with background colors. This feels like Halloween, and it looks alright too!
That’s actually very good. That’s 7000 nodes, and it rendered almost immediately, and it is interactive. I can zoom in, zoom out, and drag things around. This is actually already an improvement, because this is zoomed out. But I need to see how high this can scale up, though it really doesn’t matter all that much. It seems good enough. Because if you apply community detection to a graph:
A billion scale network becomes million scale
A million scale network becomes thousand scale
And so on, for the most part. All graphs are unique, but in general, I will do preprocessing to remove stuff before I attempt to visualize a graph. As I showed above, I will just visualize the part of the graph I want to look at, which is often just dozens of nodes out of thousands. So, the performance seems fine.
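As a rough sketch of that idea, here is one way to split a graph into communities with networkx and keep only one of them for rendering. Greedy modularity is just one algorithm I picked for the illustration, and the barbell graph is a toy stand-in for a real network.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Toy graph: two 5-node cliques joined by a single edge
G = nx.barbell_graph(5, 0)

# Each community is a set of nodes; together they cover the whole graph
communities = greedy_modularity_communities(G)

# Keep only the largest community as its own, much smaller graph
largest = max(communities, key=len)
subgraph = G.subgraph(largest).copy()

print(G.number_of_nodes(), "nodes in the full graph")
print(subgraph.number_of_nodes(), "nodes in the largest community")
```

The subgraph is a regular networkx graph, so it can go straight into a function like draw_graph.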
And, it does look decent, zoomed in. It just needs work to look nice.
As you zoom in, it shows more node names. It becomes a little more difficult to take nice screenshots while zooming in, so just play with the notebook. But here is another example:
Challenges and Opportunities
Every new capability provides both challenges and opportunities, not just opportunities. These are just some that I immediately see on day one of using this library.
Challenges:
Everything takes time and effort to learn. Using this means using something I am less familiar with. It will slow me down for a bit, but that’s the way with learning. Sometimes, we slow down temporarily to go much faster.
Everything is different. Nodes are called points. Edges are called links. Also, Cosmograph doesn’t work directly with networkx, and I need to familiarize myself with dozens of parameters I have no idea how to use and learn their nuances.
Cosmograph isn’t playing nicely with my laptop. I will figure it out. If you have this issue, use Google Colab. Use my notebook as an example and create your own. You can read directly from the 100daysofnetworks datasets on my GitHub like I did in my Colab notebook.
Opportunities:
It really wasn’t hard to get working once I realized that I just needed to modify the edgelist a bit. This is easy to work with and opens up new doors.
This is interactive. My other visualization approach is not. With this, you can “fly around” even complex graphs. That will probably impact how I think about investigations, and I’m not sure how, yet. It used to be more work to get to interactive, so I’d just stick to 2D, but this opens up new doors that I am not even able to imagine, yet. Just like I encourage you, I need to play with this to learn more of what it can do.
There’s so much flexibility with the settings. You can have curvy edges or straight, tweak opacity, tweak node and edge colors, and so on.
But first things first:
I want to redo this notebook and see if I can get the graphs to render in an easy to read format, like my current visualizations. The visualizations in my book and previous blog posts are easy enough to read, and Cosmograph sometimes spasms, and the defaults don’t look great. Just needs some fiddling with parameters, I bet. It’s good enough, so long as I can make it more readable and suitable for print. That means, no spasms, and definitely crisp text.
I also want to stress test Cosmograph, seeing how well it will render 10,000 nodes, then 50,000, then 100,000 (if it makes it that far). But, as I said, it’s fast enough, because I’m typically passing in a subgraph or ego graph anyway, not visualizing 100,000 at once. I just want to see how far it’ll go before falling apart.
There are still a lot of unknowns. I’m still just getting started, but I wanted to share this. I wasn’t planning on writing today, but I was too excited after getting this to work and thinking about it. My mind was noisy, so I needed to write.
What Are You Going to Do?
I’ve written dozens of articles, showing many ways that Graph Analysis is useful. Hopefully, my writing has given you some ideas for using it in your own life and work. What are you up to? What are you using it for, or thinking about using it for? Have an idea and you need someone to hear you out? Do something with this!
Begin where you are. In order to find usefulness, you have to start using this, and learning from the insights you discover. Let me know if you get stuck. I challenge you to start playing with graphs on your own a bit. Please read my old articles or my book if you want to get back to the basics.
That’s All for Today
Special thank you to Sergey Mastitsky for letting me know about Cosmograph! I’m really grateful to my LinkedIn network and have learned a lot from my connections! Likewise, if you learn about or know of anything that can be helpful with #100daysofnetworks or graph analysis, please let me know, so I can play with it and possibly introduce others to it, like I am doing with this short article.
Thanks to everyone who has been following along with this series. Happy learning! If you would like to learn more about networks and network analysis, please buy a copy of my book!