Day 62 of #100daysofnetworks
GraphRAG: When the Graph Speaks Back
Hello everyone! Merry Christmas and Happy Holidays!
Only a few days have passed since the previous article, but I have many updates to announce, so let’s start there!
First, my company Verdant Intelligence's first brand, GrooveSeeker, just got its first swag made! We have t-shirts and stickers!
This was my first time seeing them, right out of the box. We will make these available soon! The sticker looks great on my laptop, and I love the t-shirt! The website is being brought online, and the community is gradually taking shape!
You can learn about GrooveSeeker here.
Second, and most importantly, my company brought its first real-time AI online. GrooveSeeker is related, actually: GrooveSeeker relies on "living information," information that is in motion rather than collecting dust. We call this our "Living Library of Knowledge," and we are making the platform and API available right now. I will write more about this. AI should be able to answer questions about things as they happen. My platform makes that possible without web search, giving AI the ability to integrate with the heartbeat of the internet itself. This is very useful.
Next, and related to this blog, [BREAKTHROUGH!!!] I successfully loaded 100% of the Artificial Life dataset into a Knowledge Graph! Previously, we were stuck at 60%, and then at 10% before that. Now, we have the whole thing, and that sets us up for GraphRAG and building with Artificial Intelligence!
Verdant Intelligence now has two running GraphRAG setups, and we will make them both available for researchers, AI builders, and others.
Minimal GraphRAG from Scratch!
That was a lot of updates. After creating the Artificial Life Knowledge Graph, I then immediately went to work building a Minimal GraphRAG setup for it. As I mentioned in the last article, I want to create a MINIMAL thing first.
By doing it this way, I learn all of the parts. I can build each part independently, and I can test each part independently. That is very useful, because in AI and Data Science there is often an urge to start with complexity rather than simplicity. Nope, that breaks engineering principles. We start simple, and we add complexity as we need it.
There is a famous quote attributed to Albert Einstein: “Everything should be made as simple as possible, but not simpler.”
That means to make things simple enough to be useful and functional, but not so simple that they do not work. Common sense.
Today’s approach is very simple. It looks like this:
There's a sequence, and the first step pulls in the Knowledge Graph schema to help design Cypher queries. There is also a general-purpose `prompt_llm` function that is used in this workflow but is useful elsewhere as well.
In reality, it works more like this.
IMPORTANT ANNOUNCEMENT: Code will be provided to paying subscribers. Free readers can read about what I have done and how I think about things, and I will be writing code walkthroughs for paying readers. Paying readers will also have the option of Graph Database access (if they want it). They can use this to do their own GraphRAG experiments with fewer headaches!
I’ll explain the process.
The interface receives a prompt. It could come from a human, or it could come from an AI; it doesn't matter. The input is a prompt. The goal is to take this prompt ('show me papers about Jupiter.') and convert it into a Cypher query. I pass in the KG schema and relationships as a hint for how to design the queries, so it doesn't just invent a bunch of nonsense edge types and try to find them.
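A minimal sketch of this first step might look like the following. The `prompt_llm` function is assumed to exist elsewhere, and the schema fragment, template wording, and function names here are illustrative, not the actual implementation:

```python
# Sketch of step one: prompt-to-Cypher with the KG schema as a hint.
# The template text and the schema fragment below are illustrative
# assumptions, not the real prompt or the real schema.

CYPHER_PROMPT_TEMPLATE = """You are a Cypher query generator for a Neo4j database.
Use ONLY the node labels and relationship types in this schema:

{schema}

Rules:
- Query only the node types mentioned or implied in the prompt.
- Do not invent labels, properties, or relationship types.
- Return complete results, but always include a LIMIT.

User prompt: {prompt}
Return only the Cypher query, nothing else."""


def build_cypher_prompt(schema: str, user_prompt: str) -> str:
    """Fill the template with the KG schema hint and the user's question."""
    return CYPHER_PROMPT_TEMPLATE.format(schema=schema, prompt=user_prompt)


# Hypothetical schema fragment, matching the relationship seen later in this post:
schema = "(:Paper)-[:BELONGS_TO]->(:Category)"
prompt_text = build_cypher_prompt(schema, "show me papers about Jupiter.")
```

The resulting string would then be handed to something like `prompt_llm` to get the Cypher back; the important idea is that the schema travels with every request, so the model has no room to invent edge types.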
With this generated Cypher query, a dataset is pulled from the Graph Database. The data contained in this dataset should be relevant to the prompt. If it is not, you have a big problem, and you should spend time improving step one. Step two is straightforward. Input Cypher, output data.
The third step is important as it improves AI response times and cuts down on your AI costs. There is a lot of extra noise that is returned from a Graph Database. I simplified it as much as possible without losing any information. I trimmed the fat.
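One way to sketch this trimming step is below. Real Neo4j driver records carry extra structure (element ids, key lists, and so on); here I assume each record has already been turned into a plain dict of properties, and the sketch just drops empty values and deduplicates rows, which is one way to cut token noise before handing data to the LLM:

```python
# Sketch of step three: trimming the fat from Graph Database results.
# Assumes records have already been converted to plain property dicts;
# the real simplification may differ.

def simplify_records(records: list[dict]) -> list[dict]:
    """Drop empty property values and exact duplicate rows."""
    seen, out = set(), []
    for record in records:
        slim = {k: v for k, v in record.items() if v not in (None, "", [], {})}
        key = tuple(sorted((k, str(v)) for k, v in slim.items()))
        if key not in seen:
            seen.add(key)
            out.append(slim)
    return out


rows = [
    {"title": "Jupiter - friend or foe?", "doi": None, "tags": []},
    {"title": "Jupiter - friend or foe?", "doi": None, "tags": []},
]
slim = simplify_records(rows)  # one row, only the title survives
```

Fewer tokens in means faster, cheaper answers out, with no information lost.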
And finally, in the fourth step, the simplified data is used to answer the question in the way that the user has asked.
Or to understand this even simpler:
Convert prompt to database query
Get data using the database query
Simplify the data so that things go faster and cost less
Use the data to answer the question
That’s really it for this minimalistic GraphRAG setup. If it can do these things and answer simple questions, I’ll be happy. This is version 0.1.0, not version 1.16.3.
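The four steps above can be sketched end to end as one small function. The `llm` and `run_cypher` callables are stand-ins: in a real setup they would call an LLM API and a Neo4j driver, but stubbing them keeps the control flow visible on its own:

```python
# End-to-end sketch of the four-step minimal GraphRAG loop.
# `llm` and `run_cypher` are injected stand-ins for an LLM API call
# and a graph database query runner; names here are illustrative.

def graphrag_answer(prompt: str, schema: str, llm, run_cypher) -> str:
    # 1. Convert prompt to database query, with the schema as a hint.
    cypher = llm(f"Schema:\n{schema}\n\nWrite a Cypher query for: {prompt}")
    # 2. Get data using the database query.
    records = run_cypher(cypher)
    # 3. Simplify the data so that things go faster and cost less.
    slim = [{k: v for k, v in r.items() if v} for r in records]
    # 4. Use the data to answer the question.
    return llm(f"Answer '{prompt}' using only this data: {slim}")
```

Because each of the four pieces is passed in or isolated, each can be built and tested independently, which is the whole point of starting minimal.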
Artificial Life Comes Alive!
The moment a Knowledge Graph talks back is a really incredible moment. Having done a lot of Graph Analysis over many years, I have always found the first visualization of a new network to be a lightswitch AHA! moment, and that feeling is unforgettable.
But experiencing GraphRAG come to life is another thing. When you go from crafting your own queries and manually inspecting the data to just asking, “What papers are about Jupiter?”
And then the Graph talks back! That’s a neat experience. Here is the first breath from the Artificial Life Knowledge Graph.
what papers are about jupiter?
```
CALL db.index.fulltext.queryNodes('fulltext_search', 'jupiter') YIELD node AS paper, score
RETURN properties(paper)
ORDER BY score DESC
LIMIT 20
```
Here is a list of papers that discuss various aspects related to Jupiter:
1. **Title:** [Jupiter - friend or foe? III: the Oort cloud comets](http://arxiv.org/pdf/0911.4381v1)
**Authors:** J Horner, B W Jones, J Chambers
**Published Date:** November 23, 2009
**Summary:** This paper examines how Jupiter impacts the collision rate of Oort cloud comets on Earth. It finds that a giant planet like Jupiter reduces the impact rate as its mass increases, suggesting that life on Earth thrived partly due to Jupiter's mitigating effect.
... a lot more ...
12. **Title:** [Strategies for the search of life in the universe](http://arxiv.org/pdf/astro-ph/9604131v1)
**Authors:** Jean Schneider
**Published Date:** April 23, 1996
**Summary:** Discusses the characteristics necessary for habitable zones and how Jupiter-like planets factor into the search for extraterrestrial life.
These papers explore a variety of topics surrounding Jupiter, from its role in shielding Earth from impacts to its influence on climate and planetary habitability.

Cool stuff! There are a few things I want to explain:
I asked: “What papers are about Jupiter”
It correctly used the papers full-text index I had set up for NLP search, and it returned results ordered by score/relevance.
It only returned properties of the paper node, not additional node types. It didn’t attempt to craft some overly complicated query, because I was asking about one single node type.
It only returned 20 papers, instead of the entire database, which could have crashed my GraphRAG instance or made answers very slow.
This gets to some design principles. How did I get this to do the queries well?
It’s because I experimented with Prompt to Cypher. My first approach was ok, but my second approach was much better. It emphasizes simplicity AND completeness, only querying the types of nodes that are mentioned or implied in the prompt but demanding that no data be left behind. The first approach led to slower queries, datasets with data that went unused, and slower answers. The second approach made things a lot more stable, gave me the completeness that I wanted, and sped things up as well.
So, we’re going to be doing more with Prompt to Cypher, and it is worth researching.
Let’s look at another output:
what categories are related to papers about jupiter?
```
CALL db.index.fulltext.queryNodes('fulltext_search', 'jupiter') YIELD node AS paper
MATCH (paper)-[:BELONGS_TO]-(category:Category)
RETURN DISTINCT properties(category)
```
Papers about Jupiter are related to the following categories:
1. **Earth and Planetary Astrophysics (astro-ph.EP)**: This category covers a broad range of topics related to the interplanetary medium, planetary physics, planetary astrobiology, extrasolar planets, comets, asteroids, and meteorites, as well as the structure and formation of the solar system.
2. **Solar and Stellar Astrophysics (astro-ph.SR)**: Though it primarily focuses on stars and their evolution, studies about Jupiter may intersect with this field when considering the solar influences on planetary behavior.
3. **Instrumentation and Methods for Astrophysics (astro-ph.IM)**: This category involves detector and telescope design, as well as data analysis methods important for astrophysical research, which can include studies of Jupiter.
4. **Cosmology and Nongalactic Astrophysics (astro-ph.CO)**: This category addresses broader astrophysical phenomena and might tie into research themes regarding the formation and evolution of planets like Jupiter.
5. **Symplectic Geometry (math.SG)**: While more mathematical in nature, symplectic geometry can be relevant for modeling planetary dynamics, including those of Jupiter.
Here’s a complete look at the categories:
- **astro-ph.EP**: Earth and Planetary Astrophysics - Interplanetary medium, planetary physics, planetary astrobiology, extrasolar planets, comets, asteroids, meteorites. Structure and formation of the solar system.
- **astro-ph**: No specific name or description provided.
- **astro-ph.SR**: Solar and Stellar Astrophysics - White dwarfs, brown dwarfs, cataclysmic variables. Star formation and protostellar systems, stellar astrobiology, binary and multiple systems of stars, stellar evolution and structure, coronas. Central stars of planetary nebulae. Helioseismology, solar neutrinos, production and detection of gravitational radiation from stellar systems.
- **math.SG**: Symplectic Geometry - Hamiltonian systems, symplectic flows, classical integrable systems.
This gives a broader context for where Jupiter studies might fit in the academic landscape.
Query looks great. I asked about categories related to papers about Jupiter, and a good query was created, which fetched correct data, which led to a good answer.
what categories are there?
```
MATCH (c:Category) RETURN properties(c)
```
There are numerous categories provided within the data, which span a wide range of scientific and mathematical fields. Here's a comprehensive list of all the categories along with their respective descriptions:
1. **Adaptation and Self-Organizing Systems (nlin.AO)** - Adaptation, self-organizing systems, statistical physics, fluctuating systems, stochastic processes, interacting particle systems, machine learning.
... a lot more ...
32. **High Energy Physics - Theory (hep-th)** - Formal aspects of quantum field theory and string theory.
33. **Human-Computer Interaction (cs.HC)** - User interfaces, usability, and collaborative computing.

Very good! I asked what categories there were; it searched the Category nodes and returned them all. The answer gave 33 categories. This is what COMPLETENESS looks like in an AI answer.
Can it filter?
what categories are there other than optics??
```
MATCH (c:Category) WHERE c.name <> 'Optics' RETURN properties(c)
```

Yes, it can! Can it do specific counts?
tell me about ten papers that are related to the category Physics
```
CALL db.index.fulltext.queryNodes('fulltext_search', 'Physics') YIELD node AS paper, score
RETURN properties(paper)
ORDER BY score DESC
LIMIT 10
```

Yes, it can! It correctly set the limit to ten, and it will do three, five, one, seventeen, whatever.
However, in this instance, it incorrectly searched the papers' full text for the word "Physics" rather than finding the "Physics" category node and then identifying related papers. It's not perfect. It's 0.1.0, the first release. It worked and ran; it is just suboptimal and might have missed things. Careful review catches things like that.
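For contrast, here is a sketch of what a better-shaped query for that prompt might look like, walking the `BELONGS_TO` relationship to the Category node instead of full-text searching papers for the literal word "Physics." The label, property, and relationship names are assumptions carried over from the queries shown earlier in this post:

```python
# Hypothetical improved query for "ten papers related to the category Physics":
# match the Category node first, then follow BELONGS_TO to its papers.
better_cypher = """
MATCH (c:Category)-[:BELONGS_TO]-(paper)
WHERE c.name CONTAINS 'Physics'
RETURN properties(paper)
LIMIT 10
""".strip()
```

Getting the prompt-to-Cypher step to reliably prefer graph traversal over full-text search when a node type is named is exactly the kind of improvement that belongs in 0.2.0.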
What’s Next?
So, where do we go from here? Anywhere! This is version 0.1.0. Next is 0.2.0. After that is 0.3.0. I use Semantic Versioning for both KG construction and GraphRAG code, keeping each part independent. This way, I don't feel like I have to get everything perfect in one shot. Learning and progress are gradual.
The real what’s next is simple: we make this more capable.
It is currently doing well for simple things. The KG is simple (0.1.0), and the GraphRAG is minimalistic and functional (0.1.0). We have options:
Do we improve the KG and set aside the GraphRAG work?
Do we keep the KG as is and use it as a baseline for AI research and development?
I can’t stand perpetual chasing and not getting anywhere. The second option is better for me. We have a working, usable, and useful Knowledge Graph. It isn’t blocking us. Let’s stay in the fun AI space, building out capabilities for GraphRAG. We can come back for KG 0.2.0 later.
Some of my ideas:
I want to be able to pass in more descriptive prompts, have them split into tasks, and then have the individual tasks worked on. In other words, instead of asking about papers about Jupiter, I want to say, “Find the most recent 20 papers about the moon, and then do two things: First, do a keyword analysis, and return the keyword counts to me. Second, do a network analysis of the author collaboration network and return the top authors to me based on Betweenness Centrality. Finally, write a three paragraph summary about the insights from these steps, and email it to me.”
We need to use reasoning models to identify the sequence of steps, and it’s not always a straight line. Mine will use DAGs, probably.
We need to pass them off to be worked on.
Stuff gets complicated.
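The DAG idea can be sketched with the standard library alone. The task names below mirror the moon-papers example above and are purely illustrative; `graphlib` handles the ordering:

```python
# Sketch of splitting a descriptive prompt into dependent tasks and
# executing them in a valid order. Task names are hypothetical,
# mirroring the moon-papers example; graphlib is in the stdlib.
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
tasks = {
    "fetch_papers": set(),                                  # 20 most recent moon papers
    "keyword_analysis": {"fetch_papers"},                   # keyword counts
    "network_analysis": {"fetch_papers"},                   # author collaboration network
    "write_summary": {"keyword_analysis", "network_analysis"},
    "send_email": {"write_summary"},
}

order = list(TopologicalSorter(tasks).static_order())
# fetch_papers always comes first; send_email always comes last
```

Because the two analyses both depend only on `fetch_papers`, a smarter executor could even run them in parallel, which is where the "not always a straight line" part comes in.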
I want the best Prompt to Cypher imaginable. This is fundamental and crucial.
I want flexibility but completeness in prompt answers. If I want it to tell me the answer in a summary, it should be able to do that. If I want it to tell me the answer in a Haiku, that is fun sometimes, too.
I want to collapse all the bottlenecks and make this as fast as possible. I want complete answers in less than eight seconds, echoing the old eight-second rule for website load times from the early 2000s.
So, there is a lot more to do, and we are not done. This blog from now on will prioritize AI Research and Engineering.
Merry Christmas and Happy Holidays!
The Artificial Life GraphRAG was brought online on Christmas Day, December 25, 2025! After bringing it online, I had fun playing with pictures for this article. I have two to show! I want these on a t-shirt.
This was the GraphRAG being brought online, on Christmas.
And here it is answering my questions about Jupiter, categories, and articles. Later, it will be able to do more! Can’t wait!
Please Support this Work!
I have written over 60 articles for this series. Each one takes about four hours of research, plus several pages of writing and editing. Here are some ways you can support the blog!
Please subscribe if you have not. This motivates me like nothing else!
LET’S DO BUSINESS. Reach out to me if you need data or AI help! Happy to help! You can read about the partners we are currently looking for here.
BIGGEST HELP to the BLOG: Please consider upgrading if you are a subscriber. Thank you to all current paying subscribers for making this research and development possible! You can have Database access and minimal GraphRAG code if you want it.
Please buy my book to understand how I think about Natural Language Processing and Network Science combined.
Please reach out to me if interested in training related to Data Science, Data Engineering, Network Science, Knowledge Graphs, Artificial Intelligence, or anything else I write about. Feel free to message me on Substack or on LinkedIn!
Feel free to hang out in the comments and have a good time!
We have come so far since the very first day of the very first #100daysofnetworks. I love writing for this series. Thank you for being a part of it!