Building Knowledge Graphs with an AI Wingman

Author: Stratos Kontopoulos

Editorial Contributions: Benjamin Paul Rode, Bilal Ben Mahria, and Garima Gujral

LLMs as Useful Assistants

Through three practical SPARQL examples, this tutorial shows how LLMs can support knowledge graph (KG) development as useful assistants, while the human KG engineer stays responsible for validation, meaning, and final decisions.

Newsletter Outline

In this newsletter, we look at how to use LLMs to support human KG engineers, rather than as a means of building complete knowledge graphs from scratch:

Why humans remain the final authority
How to set up the tutorial
How to work through the three example queries
What additional resources are found in the GitHub repository
Why keep human expertise at the center

Humans as the Final Authority

In a previous AI Infused article by Bilal Ben Mahria, we discussed a recent trend that casts LLMs in the role of seasoned ontology and KG engineers. Contrary to that trend, LLMs are not capable of building complete, consistent, and logically correct knowledge structures on their own. As the article argues, they can substantially accelerate ontology drafting and make the whole process more accessible. However, LLMs should be used only as assistants: human experts must remain in the loop to ensure correctness, sound reasoning, and quality in the resulting ontology or knowledge graph.

This tutorial takes things further by showing what this collaboration can look like in practice: a set of SPARQL queries where the LLM helps enrich a KG step by step. In this workflow, the LLM acts as a well-versed “wingman”, serving as a mapper, candidate generator, and disambiguator inside a human-designed process. The human remains the final authority: the LLM proposes, but the KG engineer validates and commits.

Tutorial Setup

We use Graphwise GraphDB as the RDF triplestore for running the examples in this tutorial. GraphDB features an elegant integration with LLMs through SPARQL magic predicates, which allow SPARQL queries to communicate with GPT models and combine model output with KG data. Specifically, GraphDB exposes gpt:ask, gpt:list, and gpt:table as SPARQL magic predicates, and also provides helper functions such as helper:tupleAggr and helper:rdf for passing structured local graph context into the prompt. The temperature can be set as the final argument of these GPT calls.

Our reference dataset is the Ontotext Star Wars RDF KG, which is compact but still rich: it includes characters, species, films, planets, spaceships, and vehicles from the Star Wars universe.

🔧 To configure GraphDB, obtain an API key for the LLM provider and set the graphdb.llm.api-key value in the tool’s configuration.
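
Once the key is in place, a quick smoke test can confirm that GraphDB reaches the model before the larger queries are run. The sketch below is only an illustration: it assumes that gpt:ask accepts the same argument list shape (prompt first, temperature last) that gpt:list uses in the first query of this tutorial, and the prompt text is arbitrary.

```sparql
PREFIX gpt: <http://www.ontotext.com/gpt/>

# Smoke test: send one prompt and read back a single answer.
# Assumption: gpt:ask takes (prompt temperature), mirroring the
# gpt:list call used later in this tutorial.
SELECT ?answer WHERE {
    ?answer gpt:ask ("Reply with the single word OK." 0.2)
}
```

If this query fails, the API key setting is the first thing worth checking.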

💡 An alternative triplestore with similar capabilities is AllegroGraph, while Stardog also has strong LLM and KG functionality through Voicebox. Voicebox, however, is mainly a conversational layer: it offers LLM-assisted querying over a KG, but not magic predicates inside arbitrary SPARQL patterns.

Breaking Down 3 Queries

In the rest of this tutorial, we present a practical showcase of human + LLM collaborative KG engineering through a set of SPARQL queries over the Ontotext Star Wars RDF dataset.

The full queries are included below so that the tutorial can be more easily followed, while an accompanying GitHub repository provides the runnable files and outputs for local experimentation.

1. Lexical Enrichment

Scope

To add aliases / synonyms to entities in the KG.

Description

This query takes entities that already exist in the graph and asks the LLM to suggest alternative names for them. The goal is not to change the meaning of the entities in the graph, but to make it richer lexically. In practice, this helps with search, matching, and usability, because the same entity may be referred to in different ways.

How does the query work?

First, the query retrieves relevant Star Wars entities, such as humans, droids, films, planets, starships, and species.
Then it builds a prompt asking the LLM to suggest up to three aliases for each entity.
The gpt:list predicate sends this prompt to the LLM and returns one alias per row.
The query filters out empty or useless results, such as NONE or aliases identical to the original label.
Finally, it inserts the accepted alias suggestions into a separate named graph for review.

SPARQL query

PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gpt: <http://www.ontotext.com/gpt/>
PREFIX ex: <http://www.example.com/>

INSERT {
    GRAPH ex:q1-synonyms-output {
        ?entity skos:altLabel ?alias .
    }
}

WHERE {
    {
        SELECT DISTINCT ?entity ?label WHERE {
            ?entity a ?type ;
                rdfs:label ?label .
            VALUES ?type { # Modify the set of entity types as needed
                voc:Character
                voc:Human
                voc:Droid
                voc:Planet
                voc:Film
                voc:Starship
                voc:Vehicle
                voc:Species
            }

            FILTER(LANG(?label) = "") # Make things simple: Keep only the "default" labels
        }
    }

    BIND(
        CONCAT(
            "For the Star Wars entity '", STR(?label), "', suggest up to 3 short alternative labels or aliases. ",
            "Only return plausible Star Wars aliases, lexical variants, or shortened forms. ",
            "If there is no good alias, return NONE. ",
            "Return one alias per line. No numbering. No explanation."
        ) AS ?prompt
    )

    ?alias gpt:list (?prompt 0.2)

    FILTER(UCASE(STR(?alias)) != "NONE")
    FILTER(LCASE(STR(?alias)) != LCASE(STR(?label))) # Make sure the proposed alias does not match the existing label
    FILTER NOT EXISTS { # Make sure the proposed alias has not already been added
        ?entity skos:altLabel ?alias
    }
}

Sample Outputs

Darth Vader was assigned aliases “Lord Vader”, “The Dark Lord”, and “Vader”, while Luke Skywalker was assigned aliases “Luke”, “Commander Skywalker”, and “Red Five”.
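
Since the suggestions land in a named graph for review, the KG engineer needs a convenient way to inspect them. A minimal review query might look like the sketch below; it assumes the original labels are visible outside the output graph, as they are in the enrichment query itself.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex: <http://www.example.com/>

# List every proposed alias next to the label it was generated for
SELECT ?entity ?label ?alias WHERE {
    GRAPH ex:q1-synonyms-output {
        ?entity skos:altLabel ?alias .
    }
    ?entity rdfs:label ?label .
    FILTER(LANG(?label) = "") # Same "default" label used in the prompt
}
ORDER BY ?label ?alias
```

Rows the reviewer rejects can be deleted from ex:q1-synonyms-output before the remaining aliases are copied into the main graph.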

Remarks

For convenience, this query and the following ones insert their results into separate named graphs, e.g., ex:q1-synonyms-output.
In all the queries, we apply a low LLM temperature since we want low-variance outputs.
Token limit note: Depending on the LLM provider and model used, the query may hit token limits if too many entities are processed at once. In that case, reduce the number of entities with a LIMIT, restrict the VALUES ?type list, or run the enrichment in smaller batches. The same workflow still applies; only the amount of data sent to the LLM in each run needs to be smaller.
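
One way to batch the work is to page through the entities in the inner sub-select. The sketch below shows that sub-select as a standalone query so the batch can be previewed first; the batch size of 25 and the reduced type list are arbitrary choices for illustration, not values from the original query.

```sparql
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Preview one batch of entities before sending them to the LLM
SELECT DISTINCT ?entity ?label WHERE {
    ?entity a ?type ;
        rdfs:label ?label .
    VALUES ?type { # Restricted type list (illustrative)
        voc:Human
        voc:Droid
    }
    FILTER(LANG(?label) = "")
}
ORDER BY ?entity # A stable order keeps the pages from overlapping
LIMIT 25         # Hypothetical batch size
OFFSET 0         # Increase by the batch size on each run: 0, 25, 50, ...
```

The same ORDER BY, LIMIT, and OFFSET clauses can be added to the sub-select inside the enrichment query, which is then run once per batch.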

2. Controlled Semantic Enrichment

Scope

To have the LLM choose links from a human-curated target vocabulary.

Description

This query asks the LLM to help attach higher-level meaning to entities that already exist in the graph. Instead of inventing totally new facts, it chooses from a controlled set of concepts that a human has already decided are relevant. So the model is acting more like a classifier or interpreter: it looks at what is already known about an entity and suggests which predefined themes or categories fit best.

How does the query work?

First, the query retrieves the films and their opening crawl text.
Then it builds a prompt asking the LLM to choose themes from a fixed human-defined list.
The gpt:table predicate returns the selected theme ID and a short reason.
The query matches the returned theme ID against the controlled list of allowed themes.
Finally, it inserts the selected theme links and the LLM’s reason into a separate named graph.

SPARQL query

PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX gpt: <http://www.ontotext.com/gpt/>
PREFIX ex: <http://www.example.com/>

INSERT {
    GRAPH ex:q2-semantic-enrichment-output {
        ?film ex:theme ?theme .
        << ?film ex:theme ?theme >> ex:reason ?reason .
        ?theme a ex:Theme ; 
            rdfs:label ?themeLabel .
    }
}

WHERE {
    {
        SELECT ?film ?title ?filmContext WHERE {
            ?film a voc:Film ;
                rdfs:label ?title .

            OPTIONAL {
                ?film voc:openingCrawl ?filmContext .
            }

            FILTER(LANG(?title) = "") # Make things simple: Keep only the "default" titles
        }
    }

    BIND(
        CONCAT(
            "You are enriching a Star Wars film knowledge graph. For the film '", STR(?title), "', choose up to 2 theme IDs. ",
            "Allowed theme IDs only: EmpireVsRebellion, Redemption, Mentorship, Destiny, Betrayal, PoliticalIntrigue, FoundFamily, Survival. ",
            "Available film context: ", COALESCE(STR(?filmContext), ""), " ",
            "Return up to 2 rows with exactly 2 columns: theme-ID, short reason. ",
            "If nothing fits, return one row: NONE, no-fit. "
        ) AS ?prompt
    )

    (?pickedId ?reason) gpt:table (?prompt 0.2)

    FILTER(UCASE(STR(?pickedId)) != "NONE")

    VALUES (?theme ?themeId ?themeLabel) {
        (ex:EmpireVsRebellion "EmpireVsRebellion" "Empire vs Rebellion")
        (ex:Redemption        "Redemption"        "Redemption")
        (ex:Mentorship        "Mentorship"        "Mentorship")
        (ex:Destiny           "Destiny"           "Destiny")
        (ex:Betrayal          "Betrayal"          "Betrayal")
        (ex:PoliticalIntrigue "PoliticalIntrigue" "Political Intrigue")
        (ex:FoundFamily       "FoundFamily"       "Found Family")
        (ex:Survival          "Survival"          "Survival")
    }
    FILTER(STR(?pickedId) = ?themeId)
}

Sample Outputs

The table below contains the results for “Star Wars Episode IV: A New Hope”, which the LLM categorized under the “Survival” and “Empire vs Rebellion” themes:
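
A results table of this kind can be read back from the output graph with a query along the following lines. This is a sketch rather than the exact query used for the tutorial; it lists the themes and reasons for every film, and assumes the film titles are visible outside the output graph.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://www.example.com/>

# Read back each proposed theme together with the LLM's reason
SELECT ?title ?themeLabel ?reason WHERE {
    GRAPH ex:q2-semantic-enrichment-output {
        ?film ex:theme ?theme .
        ?theme rdfs:label ?themeLabel .
        << ?film ex:theme ?theme >> ex:reason ?reason .
    }
    ?film rdfs:label ?title .
    FILTER(LANG(?title) = "") # Same "default" title used in the prompt
}
ORDER BY ?title ?themeLabel
```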

Remarks

A few new constructs are introduced in the INSERT part of the above query: Class ex:Theme and predicates ex:theme and ex:reason.
To keep things simple, we do not provide a corresponding ontology in which these (and potentially other) constructs are formally specified. The ideal approach, however, would be to formally specify every custom construct in an ontology.
The reasoning behind the LLM’s choices is attached via RDF-star as an edge property (ex:reason) on each ex:theme statement.
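
As a rough idea of what such a formal specification could look like, the sketch below declares the three custom constructs. Everything in it is an assumption made for illustration: the graph name ex:q2-ontology-sketch, the choice of OWL term types, and the domain and range of ex:theme are not part of the tutorial's dataset.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX ex: <http://www.example.com/>

INSERT DATA {
    GRAPH ex:q2-ontology-sketch { # Hypothetical graph name
        ex:Theme a owl:Class ;
            rdfs:label "Theme" .

        # Assumed domain and range, matching how Query 2 uses the predicate
        ex:theme a owl:ObjectProperty ;
            rdfs:label "theme" ;
            rdfs:domain voc:Film ;
            rdfs:range ex:Theme .

        # Modeled as an annotation, since it describes a statement
        ex:reason a owl:AnnotationProperty ;
            rdfs:label "reason" ;
            rdfs:comment "Short LLM-provided justification for a proposed link." .
    }
}
```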

3. Cross-graph Reconciliation

Scope

To have the LLM choose the appropriate Wikidata entity to link our custom entities to.

Description

This query is about linking together entities that appear in different datasets or graphs. The LLM looks at a set of candidates that are likely to match, considers the context, and picks the best one. So here the LLM is not searching the whole world on its own; instead, it is helping make the final judgment among plausible options.

How does the query work?

First, the query retrieves Princess Leia from the local Star Wars graph.
Then it collects a small set of candidate Wikidata entities from the local Wikidata slice.
The candidates include the correct entity and similar-but-wrong entities, such as Princess Leia's bikini and a Princess Leia action figure.
The gpt:table predicate asks the LLM to choose the best match.
Finally, the query inserts the proposed Wikidata link and the LLM’s reason into a separate named graph.

SPARQL query

PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX gpt: <http://www.ontotext.com/gpt/>
PREFIX helper: <http://www.ontotext.com/helper/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX ex: <http://www.example.com/>

INSERT {
    GRAPH ex:q3-reconciliation-output {
        ?source voc:wikidataLink ?match .
        << ?source voc:wikidataLink ?match >> ex:reason ?reason .
    }
}

WHERE {
    VALUES ?source {
        <https://swapi.co/resource/human/5> # Princess Leia
    }
    ?source a voc:Human ;
        rdfs:label ?sourceLabel .
    FILTER(LANG(?sourceLabel) = "")

    # Restrict to entities that are not already linked to Wikidata.
    # Princess Leia is not already linked to any Wikidata entity, but this
    # filter could be applied for other entities in the Star Wars graph.
    #FILTER NOT EXISTS {
    #    ?source voc:wikidataLink ?_ .
    #}

    {
        SELECT (helper:tupleAggr(?row) AS ?candidates) WHERE {
            GRAPH ex:wikidata-slice {
                VALUES ?cand {
                    wd:Q51797     # Princess Leia
                    wd:Q15136385  # Princess Leia's bikini
                    wd:Q125307040 # Princess Leia Organa miniature action figure
                    wd:Q51746     # Luke Skywalker
                    wd:Q51802     # Han Solo
                }

                ?cand rdfs:label ?candLabel .
                FILTER(LANG(?candLabel) = "en")
                OPTIONAL {
                    ?cand schema:description ?candDesc .
                    FILTER(LANG(?candDesc) = "en")
                }

                BIND(
                    helper:tuple(
                        STR(?cand),
                        STR(?candLabel),
                        COALESCE(STR(?candDesc), "")
                    ) AS ?row
                )
            }
        }
    }

    BIND(
        CONCAT(
            "We are reconciling a Star Wars knowledge graph entity with Wikidata. ",
            "The local entity label is: ", STR(?sourceLabel), ". ",
            "Candidate rows are given as: Wikidata IRI, label, description. ",
            "Choose the best candidate only if it clearly refers to the same entity. ",
            "Return exactly one row with 3 columns: Wikidata IRI, label, short reason. ",
            "If none is a good match, return: NONE, NONE, no-match."
        ) AS ?prompt
    )

    (?matchIri ?matchLabel ?reason) gpt:table (?prompt ?candidates 0.2)

    FILTER(STR(?matchIri) != "NONE")

    BIND(IRI(STR(?matchIri)) AS ?match)
}

Sample Outputs

In the above example, the LLM links Princess Leia’s entry <https://swapi.co/resource/human/5> to the correct Wikidata entity wd:Q51797, rather than to the other candidates, such as Princess Leia's bikini (wd:Q15136385) or the Princess Leia action figure (wd:Q125307040). Its justification is that wd:Q51797 is “clearly the main Star Wars character Leia Organa”.
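
Because the reconciliation query turns whatever string the model returns into an IRI, a reviewer may want to confirm that each proposed match really is one of the known candidates before committing it. The following is a minimal sketch of such a check, using only the graphs and predicates from the query above; the variable ?inSlice is a name chosen here for illustration.

```sparql
PREFIX voc: <https://swapi.co/vocabulary/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex: <http://www.example.com/>

# Show each proposed link, its reason, and whether the target
# actually exists in the local Wikidata slice
SELECT ?source ?match ?reason ?inSlice WHERE {
    GRAPH ex:q3-reconciliation-output {
        ?source voc:wikidataLink ?match .
        OPTIONAL { << ?source voc:wikidataLink ?match >> ex:reason ?reason . }
    }
    BIND(EXISTS { GRAPH ex:wikidata-slice { ?match rdfs:label ?anyLabel } } AS ?inSlice)
}
```

Any row where ?inSlice is false points to an IRI the model made up or mistyped, and should be rejected.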

Remarks

For this example, we downloaded a small slice of Star Wars-related entities from Wikidata and imported it as a separate named graph ex:wikidata-slice.
Alternatively, the same functionality could be achieved through SPARQL federated queries over the Wikidata SPARQL endpoint.
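
As an illustration of the federated alternative, the candidate lookup could be sent to the public Wikidata endpoint with a SERVICE clause. The sketch below covers only the candidate retrieval step, not the full reconciliation query, and it assumes the GraphDB instance has outbound network access.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <http://schema.org/>
PREFIX wd: <http://www.wikidata.org/entity/>

# Fetch candidate labels and descriptions live from Wikidata
SELECT ?cand ?candLabel ?candDesc WHERE {
    SERVICE <https://query.wikidata.org/sparql> {
        VALUES ?cand {
            wd:Q51797     # Princess Leia
            wd:Q15136385  # Princess Leia's bikini
            wd:Q125307040 # Princess Leia Organa miniature action figure
        }
        ?cand rdfs:label ?candLabel .
        FILTER(LANG(?candLabel) = "en")
        OPTIONAL {
            ?cand schema:description ?candDesc .
            FILTER(LANG(?candDesc) = "en")
        }
    }
}
```

The trade-off is that each run then depends on a remote endpoint, whereas the local slice keeps the example reproducible offline.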

Additional Resources in the GitHub Repository

An accompanying GitHub repository serves as a practical companion to the tutorial and contains all the relevant resources:

The Star Wars RDF KG used as the main input dataset;
A small Wikidata slice with selected Star Wars-related entities, used for the reconciliation example;
The three SPARQL Update queries presented in the tutorial, showcasing the different levels of LLM-assisted KG development;
The resulting output named graphs produced by the three queries;
Documentation on how to set it all up and run the examples locally.

Keeping Human Expertise at the Center

In this tutorial, we showed how an LLM can support KG development at different levels of complexity: first by suggesting alternative labels, then by helping with controlled semantic enrichment, and finally by assisting in cross-graph reconciliation.

The main point is not that the LLM builds the KG by itself, but that it can act as a helpful wingman inside a workflow designed and controlled by humans. This way, we can benefit from the speed and flexibility of LLMs, while still keeping human expertise at the center of the process.
