Day 57 of #100daysofnetworks
Training Wheels are OFF; Graph DBs, SNA, and GraphRAG!
Hello everyone,
Since August, we have had a lot of fun on this blog, exploring Graph Databases, Knowledge Graphs, and GraphRAG for AI engineering. But during all of that, I was leaning on some things for my own learning.
Cognee is a beautiful tool, and you should learn about it. My use of it allowed us to quickly spin up Knowledge Graphs and GraphRAG interfaces, and we used it to explore various things:
Red Letter Scripture
Artificial Life Research
Literature
It was nice to rely on Cognee, because it helped me build familiarity in working with Graph Databases and GraphRAG architecture. I wrote a book on how to work with graphs programmatically, but I purposefully avoided graph databases for a long time, as described at the beginning of this blog series.
Exploration is important to understanding anything, and Cognee opened up an opportunity for all of us to learn about Graph Databases and GraphRAG, so we did!
But I realized something this morning: we faced some really tedious and challenging hurdles, and we overcame them. Hurdles like that take time and effort, but we did the work, and now we have a strong enough foundation to continue our learning and development without assistance. This is a powerful position to be in, and a faster one.
The result will probably be that articles will be shorter and more tactical while we learn to overcome smaller challenges. The writing will be easier, and hopefully more frequent.
Yesterday’s Breakthrough
Last night, I was just being pestered by my mind, wanting to take some GrooveSeeker data and put it into Neo4j. It was driving me nuts. My mind is like that, with software engineering and writing. Once an idea wants to be worked on, it will pester me until I do it.
I tried this back in August or September, and no matter what, I just couldn’t get the data to land right in Neo4j. I didn’t know what I was doing wrong. I was too new to Neo4j, too new to Neo4j Aura, Cypher was a complete unknown, and I wasn’t even sure what the correct process was.
Should I import an edgelist? Is there an ETL tool? Is that the way? That was my first thought, which I tried, and failed badly.
The articles I wrote about Cypher helped me the most. Here’s a bunch of relevant articles if you want to follow my journey:
So, yesterday, I downloaded a GrooveSeeker edgelist and decided to try to import it into Neo4j. Here’s some relevant code chunks:
from neo4j import GraphDatabase
import pandas as pd
data = '/work/gsv3_edgelist_20251205.csv'
edgelist_df = pd.read_csv(data)
edgelist_df.columns = ['source', 'target']
edgelist_df.dropna(inplace=True)
#edgelist_df = edgelist_df.sample(frac=0.01)
edgelist_df
You won’t be able to run this, but I want you to see what I did. I loaded a GrooveSeeker edgelist that I pulled on 12/05/2025. I renamed the columns to ‘source’ and ‘target’, and dropped all rows with null values. Finally, and importantly, when I try to do big things, I often throw in a .sample() to give myself a way to gradually turn up the heat. I don’t start at 100%. I start at 1% and get the data to look right, then I increase to 5%, then 20%, then 50%, then 100%. This validates things while they are small, and lets me go to full speed once I am comfortable.
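The ramp-up idea can be sketched like this, using a small synthetic edgelist in place of the real GrooveSeeker CSV (the site names here are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the GrooveSeeker edgelist (the real CSV is not public)
edgelist_df = pd.DataFrame({
    'source': [f'site{i}.com' for i in range(1000)],
    'target': [f'site{(i * 7) % 1000}.com' for i in range(1000)],
})

# Turn up the heat gradually: validate the pipeline on a small slice,
# then rerun it at larger fractions once the data looks right
for frac in (0.01, 0.05, 0.20, 0.50, 1.00):
    sample_df = edgelist_df.sample(frac=frac, random_state=42)
    # ...run the import/validation on sample_df here...
    print(f"{frac:.0%}: {len(sample_df)} rows")
```

Each pass is cheap to throw away, so mistakes get caught at 1% instead of at full scale.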
# connect to neo4j (uri, username, and password are your Neo4j Aura credentials)
driver = GraphDatabase.driver(uri, auth=(username, password))
# convert dataframe to list of dicts
links = edgelist_df[['source', 'target']].to_dict('records')
# cypher query
query = """
UNWIND $batch AS link
MERGE (source:Website {name: link.source})
MERGE (target:Website {name: link.target})
MERGE (source)-[:LINKS_TO]->(target)
"""
# Batch import
batch_size = 1000 # 1000 edges per batch
for i in range(0, len(links), batch_size):
    batch = links[i:i+batch_size]
    with driver.session() as session:
        session.run(query, batch=batch)
    print(f"Processed {min(i+batch_size, len(links))} / {len(links)} links")
# close connection
driver.close()
print("Import complete!")
This was some AI-generated code to figure out batching. It worked and looks good.
# convert dataframe to list of dicts
links = edgelist_df[['source', 'target']].to_dict('records')
# cypher query
query = """
UNWIND $batch AS link
MERGE (source:Website {name: link.source})
MERGE (target:Website {name: link.target})
MERGE (source)-[:LINKS_TO]->(target)
"""This is what is most important. With the links line, I can gradually extend to more fields in the dataset, and then I can make adjustments to the cypher query to pull in these new relationships.
This means that we are at the beginning of opportunity. From here, things get fun, and we can be creative.
This successfully wrote the complete edgelist to Neo4j, and I can see the entire graph!
I can zoom in, interact with nodes, look at the communities. Even at this view, you should be able to see that there are communities and clusters. That isolated cluster on the right is interesting. If I look closer, I can see that it is Italy! Cool!
There’s not a lot to look at, at this stage. This was the breakthrough in thinking and the moment of impact.
What’s Next?
This opens up a lot of possibilities. All of the datasets that we created for 100 Days of Networks since the beginning of this blog can be explored and used to create Knowledge Graphs, and we have many different kinds of networks.
But more generally, this is what I have in mind:
Find a large, complex edgelist that looks fun to explore and has content attached
Push it to Neo4j, with its content
Learn how to do Social Network Analysis using Neo4j
Learn how to design a Knowledge Graph using Neo4j
Learn how to create GraphRAG using Neo4j
Learn how to create Reliable AI from scratch using any or no Graph Database
A social network is a very simple graph. It is simply a graph of (a)-[:LINKS_TO]-(b), or even just a→b.
Knowledge Graphs have many more kinds of edges, so that’s going to be a fun and creative thing to work on. But it is a natural progression to start with a Social Network graph, then a Knowledge Graph.
I am also doing that because Social Network Analysis (SNA) is near and dear to me, and I want to see what Neo4j can do with the techniques that I have come up with over the years. I want to see how well it actually does for SNA, including Network Science metrics and approaches.
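Before moving the analysis into Neo4j, the same (a)-[:LINKS_TO]->(b) shape can be checked against classic SNA metrics in plain Python. Here is a minimal sketch using networkx (the tiny edgelist is invented for illustration), which gives baseline numbers to compare Neo4j's results against later:

```python
import networkx as nx

# A tiny stand-in edgelist in the same (a)-[:LINKS_TO]->(b) shape
edges = [('a.com', 'b.com'), ('a.com', 'c.com'),
         ('b.com', 'c.com'), ('d.com', 'a.com')]
G = nx.DiGraph(edges)

# Classic SNA metrics to sanity-check against the graph database later
degree = nx.degree_centrality(G)
pagerank = nx.pagerank(G)

# 'a.com' is the best-connected node in this toy graph
print(sorted(degree, key=degree.get, reverse=True)[0])
```

Running the same metrics in two places is a good way to verify the import landed correctly.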
That’s All for Today!
Really cool: this little breakthrough also leads to simpler articles. We may be able to stay more tactical for a bit and push forward in smaller learning chunks. It also makes the research and writing easier! Have a great day!
Please Support this Work!
I have written over 50 articles for this series. Each one takes about four hours of research, and several pages of writing and editing. Here are some ways you can support the blog!
Please subscribe, if you have not.
LET’S DO BUSINESS. Reach out to me if you need data or AI help! Happy to help! I am looking for partners and customers.
BIGGEST HELP to BLOG: Please consider upgrading if you are a subscriber. Thank you to all current paying subscribers for making this research and development possible!
Please buy my book to understand how I think about Natural Language Processing and Network Science combined.
Feel free to hang out in the comments and have a good time!
We have come so far since the very first day of the very first #100daysofnetworks. I love writing for this series. Thank you for being a part of it!






Fantastic walkthrough of that moment when theory clicks into practice. The incremental batch sizing approach (1% to 100%) is such a slept-on pattern when loading into graph DBs; it saves so much pain compared to going all-in on the first try. One thing worth noting is that MERGE can get slow with large batches if the indexes aren't tuned upfront. Might be worth a future post on when to use CREATE vs MERGE.