Day 56 of #100daysofnetworks
GraphDB -> Get Data -> Use Data -> Visualize
Hi everyone. Today, we’re going to continue our progress in learning to work with Graph Databases. In previous articles, I gave a gentle introduction to Cypher, a graph database query language. Here are some articles to catch you up, if you are new to this series.
On this first day, I planned out several days of learning to work through, but what would this blog be without pivots? We will still learn those things, but we will learn them as we build rather than before building. We will learn what we need when we need it.
On day 52, we began actually learning the Cypher language. This is a good “get useful with Cypher fast” article. It will help you learn to write queries and see results.
Right after that, I lost my job and founded a company on the same day. So, Day 53 had to do with that. I am building a company now, so I have less time for things that are not practical.
On day 54, we returned to Cypher and learned about filtering, ways to remove noise and get to the signal of interest.
On day 55, I explained more of what is going on (Verdant Intelligence, GrooveSeeker, #100daysofnetworks) and mentioned that we are going to focus on building.
Today’s Plan
Today, we are returning to Cypher learning, but less about Cypher syntax. Today, we are going to learn to:
Connect to a graph database programmatically
Run a query and fetch results
Do something with the results (simple: save edgelist files)
Visualize the results
This is a pretty practical workflow that happens frequently in data work. A person queries a database, gets data, and then uses it. We’re going to do it with a Graph Database. It’s not difficult; it just required some learning for me to do it, and I will spend the rest of my life improving my Cypher skills, I’m sure.
Artificial Life Database
After this article, I am going to pause the Artificial Life database and use new instances. I will keep that around for exploration and learning, but I will pause it so that it isn’t costing as much.
So, I’m excited to write this article, because next is some really creative development. After this article, we are going to learn about Knowledge Graph development, FROM SCRATCH. We are going to create them using Graph Databases and our #100daysofnetworks datasets. There are many fun ones to use. We are going to learn to create KGs from beginning to end.
But that’s not all we are going to work on. I am a creative person. I feel stuck when we work on one topic for too long. I’m a builder. I like to build things.
Today’s Show and Tell
Today’s demonstration is technical. I am going to share code and outputs. I will be concise to move fast, so my writing style will shift from here. To pull data, I used the neo4j Python library.
from neo4j import GraphDatabase
import pandas as pd

I created some minimal working code to get a driver.
def get_driver():
    uri = ""
    user = "neo4j"
    password = ""
    driver = GraphDatabase.driver(uri, auth=(user, password))
    return driver

This was for Neo4j Aura. Other instances may be different. Claude will help you. After that, I wrote a function to simply pull data from Neo4j based on a query.
def get_data(query):
    # connect
    driver = get_driver()
    # run the query, get the data
    with driver.session() as session:
        result = session.run(query)
        records = list(result)
    # close the connection
    driver.close()
    # return the data
    return records

And today, I am specifically pulling Arxiv author collaboration networks, so I want to fetch this data as an edgelist DataFrame.
def get_edgelist_data(query):
    # run the query, get the data
    result = get_data(query)
    # customize it for this particular use-case
    edgelist = []
    for record in result:
        source = record["a"].get("name")
        source_description = record["a"].get("description")
        relationship = record["r"].type
        relationship_description = record["r"].get("description")
        target = record["b"].get("name")
        target_description = record["b"].get("description")
        edgelist.append((source, relationship, target, source_description, relationship_description, target_description))
    # build and return the dataframe
    edgelist_df = pd.DataFrame(edgelist)
    edgelist_df.columns = ['source', 'relationship', 'target', 'source_description', 'relationship_description', 'target_description']
    return edgelist_df

With those functions, I am set up nicely to run code like this, fetch data, and even visualize it. The code is a bit gnarly, but I was going fast and wanted data.
query = """
MATCH (a)-[r:authored_by|written_by|authored|author_of|author|is_author_of|is_author|authors|_authored_by|co_authored|co_authored_with|co_author|co_authored_by|co_author_of_study|co_authored_paper|co_authored_study_on|co_authored_work|co_authored_study|co_author_of|authored_study|authored_on|authored_work_on|authored_study_about|authored_research_on|authored_on_date|authored_results|wrote|wrote_about|wrote_on|is_written_by|co_written_by]-(b)
WHERE a.description CONTAINS 'interstellar'
RETURN a, r, b
"""

df = get_edgelist_data(query)
df.to_csv('/work/edgelists/edgelist_alife_interstellar.csv', index=False, header=True)
df

The query ran and fetched data.
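If you want to sanity-check the reshaping inside get_edgelist_data without a live database, you can stand in for the Neo4j results with plain dicts and a tiny stub class. This is hypothetical toy data; real results are neo4j Record objects, but the ["a"] lookup, .get(), and .type access behave the same way:

```python
import pandas as pd

# Toy stand-ins for Neo4j results (hypothetical data), so the reshaping
# logic can be checked offline without a live instance.
class FakeRel:
    def __init__(self, rel_type, description=None):
        self.type = rel_type
        self._props = {'description': description}
    def get(self, key):
        return self._props.get(key)

records = [
    {'a': {'name': 'Paper X', 'description': 'a paper on interstellar objects'},
     'r': FakeRel('authored_by'),
     'b': {'name': 'Author Y', 'description': 'an astronomer'}},
]

# Same reshaping as get_edgelist_data
edgelist = []
for record in records:
    edgelist.append((
        record['a'].get('name'),
        record['r'].type,
        record['b'].get('name'),
        record['a'].get('description'),
        record['r'].get('description'),
        record['b'].get('description'),
    ))

df = pd.DataFrame(edgelist, columns=[
    'source', 'relationship', 'target',
    'source_description', 'relationship_description', 'target_description'])
print(df.loc[0, 'source'], df.loc[0, 'relationship'], df.loc[0, 'target'])
# → Paper X authored_by Author Y
```

Handy for testing the DataFrame logic in isolation before paying for database round trips.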
I chose these fields and to build an edgelist because I wanted to visualize this with Cosmograph. But before doing that, I fetched a few different collaboration networks, using different WHERE statements.
WHERE a.description CONTAINS 'interstellar' (37 edges)
WHERE a.description CONTAINS 'comet' (5 edges)
WHERE a.description CONTAINS 'artificial life' (3 edges)
WHERE a.description CONTAINS 'network' (438 edges; nice)
NO FILTER (13,054 edges)

I wasn’t interested in visualizing a small graph, so I explored a few keywords. The ‘network’ and NO FILTER ones will be neat to inspect.
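A keyword sweep like this can also be scripted rather than run by hand. Here is a minimal sketch; make_query is a hypothetical helper with a simplified relationship pattern (the real query lists every author-relationship type), and the actual fetch-and-save step is commented out because it assumes the get_edgelist_data function from earlier plus a live instance:

```python
# Hypothetical query builder; the relationship pattern is simplified here
def make_query(keyword):
    return (
        "MATCH (a)-[r:authored_by|co_authored_by|wrote]-(b)\n"
        f"WHERE a.description CONTAINS '{keyword}'\n"
        "RETURN a, r, b"
    )

keywords = ['interstellar', 'comet', 'artificial life', 'network']
paths = {}
for kw in keywords:
    slug = kw.replace(' ', '_')
    paths[kw] = f'/work/edgelists/edgelist_alife_{slug}.csv'
    # With a live instance, each step of the sweep would be:
    # get_edgelist_data(make_query(kw)).to_csv(paths[kw], index=False)

print(paths['artificial life'])
# → /work/edgelists/edgelist_alife_artificial_life.csv
```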
I can see the specific papers that are associated with a topic ecosystem, for any topic.
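Before (or instead of) visualizing, it is easy to pull quick structural numbers from one of these edgelists with networkx. A sketch, using a hypothetical toy edgelist in place of a saved CSV:

```python
import pandas as pd
import networkx as nx

# Hypothetical toy edgelist standing in for one of the saved CSVs;
# in practice this would be pd.read_csv on an edgelist file.
df = pd.DataFrame({
    'source': ['Paper 1', 'Paper 1', 'Paper 2'],
    'target': ['Author A', 'Author B', 'Author C'],
})
G = nx.from_pandas_edgelist(df, source='source', target='target')

# Quick structural summary before any visualization
print(G.number_of_nodes(), G.number_of_edges())  # → 5 3
print(nx.number_connected_components(G))         # → 2
```

Node and edge counts plus connected components are a cheap way to decide which keyword networks are worth a closer look.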
Finally, I created multiple files to visualize with Cosmograph.
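One small practical note before exporting: some graph tools assume the first two CSV columns are the edge endpoints, so it can help to move source and target to the front (a sketch on a hypothetical edgelist; check what your visualization tool actually expects):

```python
import pandas as pd

# Hypothetical edgelist with the endpoint columns buried in the middle
df = pd.DataFrame({
    'source_description': ['a paper'],
    'source': ['Paper X'],
    'target': ['Author Y'],
    'relationship': ['authored_by'],
})

# Move source and target to the front before exporting
front = ['source', 'target']
df = df[front + [c for c in df.columns if c not in front]]
print(list(df.columns))
# → ['source', 'target', 'source_description', 'relationship']
```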
Cosmograph Visualization
Two days ago, Nikita Rokotyan announced a new release of Cosmograph, and I am a huge fan of Cosmograph, so I had to check it out. I’ve written about it previously on this blog, but if you are not aware, Cosmograph is MILLION scale Graph Visualization and Analysis software. Here are some of my articles about it:
If I visualize the whole NO FILTER network, it looks like this:
My heart was really happy when I saw this. Usually, with a graph at this scale, I don’t see this clarity. Even without zooming in, I can see a lot of life, a lot of structure. I love this latest release. The coloring is another nice improvement! I’ve removed labels so that you can see the shapes.
There is a lot to explore! So that’s the end of the tour! Thank you for reading and checking this out! This is useful progress!
Please Support this Work!
I have written over 50 articles for this series. Each one takes about four hours of research, and several pages of writing and editing. Here are some ways you can support the blog!
Please subscribe, if you have not.
LET’S DO BUSINESS. Reach out to me if you need data or AI help! Happy to help!
BIGGEST HELP to BLOG: Please consider upgrading if you are a subscriber. Thank you to all current paying subscribers for making this research and development possible!
Please buy my book to understand how I think about Natural Language Processing and Network Science combined.
Feel free to hang out in the comments and have a good time!
We have come so far since the very first day of the very first #100daysofnetworks. I love writing for this series. Thank you for being a part of it!