GraphMinds
This was my MSc thesis at the University of Birmingham, supervised by Professor Christopher Baber. The question was straightforward: can you get useful answers from a language model about sensitive documents without sending those documents to the cloud, and can you verify where each answer comes from?
GraphMinds extracts entities and relationships from unstructured text using sentence transformers and named entity recognition, then builds a knowledge graph in NetworkX. When a user asks a question, the system queries the graph first, retrieves the most relevant subgraph, and feeds that subgraph as grounding context to a locally running LLM through Ollama.
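The core of that loop can be sketched in a few lines. This is a minimal illustration, not the project's actual code: the function names and the (subject, relation, object, document) triple format are assumptions, and real triples would come from the NER and relation-extraction stage.

```python
# Hypothetical sketch of the graph-then-ground loop: extracted triples
# become edges in a NetworkX graph, and a question is answered by handing
# the LLM only the subgraph around the entities the question mentions.
import networkx as nx

def build_graph(triples):
    """Build a directed knowledge graph from (subject, relation, object, doc_id) triples."""
    g = nx.DiGraph()
    for subj, rel, obj, doc_id in triples:
        # Each edge remembers which document it came from, for citations later.
        g.add_edge(subj, obj, relation=rel, source=doc_id)
    return g

def retrieve_subgraph(g, question_entities, hops=2):
    """Collect every node within `hops` of the entities found in the question."""
    undirected = g.to_undirected()  # context should flow both ways
    nodes = set()
    for ent in question_entities:
        if ent in g:
            nodes |= set(nx.ego_graph(undirected, ent, radius=hops))
    return g.subgraph(nodes)

def subgraph_to_context(sub):
    """Serialise the subgraph as plain-text facts for the LLM prompt."""
    return "\n".join(
        f"{u} --{d['relation']}--> {v}  [source: {d['source']}]"
        for u, v, d in sub.edges(data=True)
    )
```

The serialised facts would then be prepended to the user's question in the prompt sent to the local model, so the LLM reasons over graph-verified statements rather than raw documents.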
Every answer includes a citation trail: which documents contributed, which entities and relationships were used, and how the graph was traversed. The PyVis visualisation lets you see the graph structure directly and judge whether the reasoning holds up. Transparency was a core design goal. If the system cannot find supporting evidence, it says so.
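The citation trail is essentially provenance threaded through the pipeline. A minimal sketch of the idea, with hypothetical class and field names, might look like this: every fact handed to the LLM keeps its source document, so the answer can report exactly what supported it.

```python
# Illustrative sketch (hypothetical names): facts carry provenance, and an
# answer object can list the documents and relationships that backed it.
from dataclasses import dataclass, field

@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    doc_id: str  # which source document this fact was extracted from

@dataclass
class CitedAnswer:
    text: str
    facts: list = field(default_factory=list)

    @property
    def documents(self):
        """Deduplicated, sorted list of contributing documents."""
        return sorted({f.doc_id for f in self.facts})

    def trail(self):
        """Human-readable citation trail, one supporting fact per line."""
        return "\n".join(
            f"- {f.subject} {f.relation} {f.obj} (from {f.doc_id})"
            for f in self.facts
        )
```

If the fact list comes back empty, there is no supporting evidence to cite, which is exactly the case where the system declines to answer.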
Nothing leaves the local machine. No API calls, no cloud processing, no telemetry. The full pipeline from document ingestion to graph construction to LLM inference runs offline. This matters for legal, medical, and proprietary contexts where data confidentiality is not negotiable.
Testing against plain RAG pipelines showed similar performance on straightforward factual recall. The difference appeared on multi-hop questions, where the answer depends on connecting information across several documents. The graph gives the system a structured map of how facts relate to each other. That structure is exactly what multi-hop reasoning demands.
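Why the graph helps here can be made concrete: connecting two entities mentioned in a question reduces to a path search, and each hop on the path may come from a different document. A sketch of that idea, assuming the same hypothetical edge attributes as above (`relation` and `source` on each NetworkX edge):

```python
# Sketch of multi-hop reasoning as path search (hypothetical names):
# the chain of facts linking two entities is recovered hop by hop,
# with each hop potentially sourced from a different document.
import networkx as nx

def multi_hop_chain(g, start, end):
    """Return the chain of facts linking two entities, or None if unconnected."""
    try:
        path = nx.shortest_path(g.to_undirected(), start, end)
    except (nx.NetworkXNoPath, nx.NodeNotFound):
        return None
    chain = []
    for u, v in zip(path, path[1:]):
        # The edge may point either way in the directed graph.
        data = g.get_edge_data(u, v) or g.get_edge_data(v, u)
        chain.append((u, data["relation"], v, data["source"]))
    return chain
```

A plain-RAG retriever has to hope both documents land in the same top-k window; the path search makes the connection explicit and cites each document along the way.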
The project convinced me that knowledge graphs and language models are complementary. Graphs are precise, verifiable, and efficient at structured relationships. Language models handle ambiguity and reason in natural language. Combining them produces something more trustworthy than either alone.