Large Language Model as a Source for Ontology

Roman Suzi
4 min readFeb 16, 2023

The Internet is abuzz about ChatGPT and what it means for software development. How do we get the most benefit out of it? A forecast.

Photo by Jodie Cook on Unsplash

Now that everyone has played with advanced NLP (natural language processing) artificial intelligence, there are concerns that it will make certain kinds of work obsolete. Here is my view on how advanced language models can start fixing the software industry's problems.

Writing software is ultimately about explaining domain knowledge to the computer while keeping it accessible to human users. Software can be seen as an ontology enhanced by rules. For any project larger than a small set of scripts, it usually makes sense to discover an ontology in one way or another. An ontology is supposed to be fairly static, and it finds its representation in the database schema, interfaces, APIs, and so on. It can be represented by classes, properties, and other relations between entities and value objects (be it an OOP representation or a special-purpose ontology language like OWL). It is a necessary part of any high-quality software, and it is declarative.
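To make the distinction concrete, here is a minimal sketch of such a declarative backbone in plain Python (the class and field names are illustrative, not taken from any particular project): entities, value objects, and the relations between them, with no behavior yet.

# Illustrative sketch of the declarative part: entities, value objects,
# and relations, expressed as plain Python dataclasses.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Metadata:
    """Value object: compared by value, carries no identity."""
    title: str
    tags: tuple = ()

@dataclass
class Collection:
    """Entity: has an identity and descriptive metadata."""
    id: str
    metadata: Metadata

@dataclass
class Asset:
    """Entity with a 'belongs to' relation to a Collection."""
    id: str
    metadata: Metadata
    collection: Optional[Collection] = None

The same structure could just as well be expressed in an ontology language such as OWL; the point is that this part is declarative.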

What can’t be done statically can then be implemented in some Turing-complete language or rule-based system. This part is what is usually understood as software; this is what software developers are in a great hurry to put into code, oftentimes without any regard for the declarative ontology part.

Of course, it is possible to write rules as a knowledge base (e.g., in the SWRL language), but that does not come naturally to most programmers, and for a good reason: rules and procedures can be encoded more naturally in a sequential or functional way than as a set of equations and constraints.
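For example, a simple publishing rule that a rule language would state as a logical constraint is, for most developers, easier to write and test as an ordinary function. A minimal sketch (names are illustrative):

# The SWRL-style rule
#   User(?u) ^ hasPermission(?u, "publish") ^ Asset(?a) -> canPublish(?u, ?a)
# encoded procedurally as an ordinary Python function.
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    permissions: set = field(default_factory=set)

def can_publish(user: User, asset_id: str) -> bool:
    # The rule: only users holding the "publish" permission may publish.
    return "publish" in user.permissions

editor = User("alice", {"read", "publish"})
print(can_publish(editor, "asset-42"))  # True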

If we look at ChatGPT, we can see that it's a giant ontology in which a lot of human knowledge is encoded and ready to be represented in a nearly perfect form. To illustrate the point, I asked ChatGPT to come up with an ontology for a CMS (content management system). The bot came up with the following list of concepts and a YAML-like outline of the ontology:

Assets: binary files and metadata
Collection: set of assets with shared metadata
Library: a collection of collections with shared metadata
Workspace: a user's personal collection
User: individuals with permissions to access and modify assets
Permission: defines user's access and modifications rights
Metadata: descriptive information about assets and collections
Version control: tracking of changes and history of assets
Workflow: steps and permissions required to publish assets
Search and retrieval: finding assets based on metadata or keywords.

Prefix: cms: <http://www.example.com/cms#>

Ontology: cms

Class: cms:Asset
SubClassOf: owl:Thing
ObjectProperties:
cms:belongsTo cms:Collection
DataProperties:
cms:hasMetadata rdf:PlainLiteral

Class: cms:Collection
SubClassOf: owl:Thing
ObjectProperties:
cms:belongsTo cms:Library
DataProperties:
cms:hasMetadata rdf:PlainLiteral

Class: cms:Library
SubClassOf: owl:Thing
ObjectProperties:
cms:hasCollection cms:Collection
DataProperties:
cms:hasMetadata rdf:PlainLiteral

Class: cms:Workspace
SubClassOf: cms:Collection
ObjectProperties:
cms:belongsTo cms:User

Class: cms:User
SubClassOf: owl:Thing
ObjectProperties:
cms:hasWorkspace cms:Workspace
DataProperties:
cms:hasPermission cms:Permission

Class: cms:Permission
SubClassOf: owl:Thing
DataProperties:
cms:allowsAccess cms:Access
cms:allowsModification cms:Modification

Class: cms:Metadata
SubClassOf: owl:Thing
DataProperties:
cms:describes cms:Asset
cms:describes cms:Collection
cms:describes cms:Library

Class: cms:VersionControl
SubClassOf: owl:Thing
ObjectProperties:
cms:tracks cms:Asset

Class: cms:Workflow
SubClassOf: owl:Thing
ObjectProperties:
cms:appliesTo cms:Asset
DataProperties:
cms:requiresPermission cms:Permission
cms:requiresStep cms:Step

Class: cms:SearchAndRetrieval
SubClassOf: owl:Thing
ObjectProperties:
cms:searches cms:Asset
DataProperties:
cms:usesMetadata cms:Metadata
cms:usesKeywords rdf:PlainLiteral

Of course, it’s not yet literally usable, and more details would need to be added to make it the backbone of a CMS, but it can already be used for a conceptual design.
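As a rough illustration of how such a sketch could become machine-readable, here is one possible next step, assuming Python with the rdflib library; only a slice of the generated ontology is encoded, following the cms: prefix ChatGPT suggested.

# Rough sketch, assuming rdflib is installed (pip install rdflib).
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

CMS = Namespace("http://www.example.com/cms#")

g = Graph()
g.bind("cms", CMS)
g.bind("owl", OWL)

# cms:Asset, cms:Collection and cms:Library as OWL classes
for cls in (CMS.Asset, CMS.Collection, CMS.Library):
    g.add((cls, RDF.type, OWL.Class))

# cms:belongsTo as an object property from Asset to Collection
g.add((CMS.belongsTo, RDF.type, OWL.ObjectProperty))
g.add((CMS.belongsTo, RDFS.domain, CMS.Asset))
g.add((CMS.belongsTo, RDFS.range, CMS.Collection))

# cms:hasMetadata as a datatype property
g.add((CMS.hasMetadata, RDF.type, OWL.DatatypeProperty))

print(g.serialize(format="turtle"))

From there, the same graph could be checked with a reasoner or used to generate schema code, which is where a conceptual design starts to pay off.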

My medium-range forecast for the prevailing use of NLP in software engineering differs from the doom scenario many foresee for the industry. Right now, large language models can be seen as a concentrate of fuzzy knowledge, which can be expanded into some syntactic form.

However, code generation approaches have failed throughout history, and they will fail this time as well, because it is in the nature of software to factor things out, not factor them in. There may be little sense in expanding some algorithm hundreds of times instead of writing it once and reusing it. The current hype and joy around syntactic expansion will be followed by a quality leap of compressing the knowledge into proven “libraries”, which will be the building blocks for the next generation of AI models and human developers to play with. This is the natural way for technical systems to evolve.

Most probably, programming languages and systems specifically designed for human-AI collaboration will appear. For three decades or so, programming languages were designed primarily for humans; in the next decade, languages for hybrid use will emerge. And not only programming languages: AI-powered solutions may boost the utilization of large amounts of data on the logical level (as contrasted with the statistical level of neural networks). Bridging data will become mostly automatic. This is probably the true paradigm shift just around the corner.

The picture wouldn’t be complete without mentioning special-purpose AIs, which complement language models. It’s possible that AI interchange will emerge from the need to seamlessly connect different types of systems. This is how AGI (artificial general intelligence) will gradually be achieved. These efforts will accelerate, just as the development of computer hardware has accelerated, allowing for the development of finer and more optimized mechanisms and processors.

This is a fascinating future for people to live in. In a couple of decades, parallel developments in neurocomputing and quantum computation may lead to even greater breakthroughs, which philosophers will need to contemplate.

PS. One of the paragraphs has been rewritten by ChatGPT. Can you guess which one?
