Introduction

Every day we wake up to a new development in the field of artificial intelligence. Most recently, NVIDIA CEO Jensen Huang made headlines with his statement that “OpenClaw reached the level of adoption Linux achieved in 30 years within just a few weeks.” The statement reflects the reality clearly: progress in the AI world is moving at an exponential pace.

Across the organizations I’ve worked with, there is a shared sense of urgency: not falling behind. Every company is trying to answer the same questions: how can we make our processes more efficient, and which tasks can we automate? From MCP to UCP, every new protocol creates excitement in the industry. The goal is clear: equip existing systems with AI capabilities and allow agents to interact with those systems directly.

However, this raises a fundamental question: how well do we actually understand our existing systems? Do we truly know the structure of the data we own or what the code running in our systems actually does? Or do we rely on the tacit knowledge accumulated by domain experts over many years? If we ourselves do not fully understand this infrastructure, how can we expect an AI agent to understand it?

In this article, we will explore these questions. How can we enrich and structure the data we already have? How can we make our systems more accessible to AI agents? How can we extract and systematize the domain knowledge that currently lives in the minds of experts? And ultimately, how can we build systems that are both AI-compatible and truly efficient?


Why Are Organizations Acting So Quickly?

Agentic AI offers organizations an unprecedented level of automation. Tasks that seemed nearly impossible to automate just a few years ago can now be executed with a few lines of prompting. Yet these tasks are still performed by people today. They require time, labor, and salaries, and organizations bear significant operational costs to keep these processes running.

The motivation for companies is straightforward: increase efficiency and generate profit. If certain processes can be performed faster, cheaper, and more reliably by AI systems, organizations naturally want to take advantage of this opportunity.

This explains the urgency. Companies do not want to miss the enormous efficiency gains promised by AI, so they are trying to integrate AI into their processes as quickly as possible. However, many organizations overlook an important reality: the knowledge embedded within long-running systems often exists only in the minds of domain experts.

Outside of these experts, very few people truly understand how the systems operate, what the data actually represents, or how the business processes really function. Yet in such an environment we expect an AI agent, whose knowledge is limited to the information provided within its context window, to generate meaningful value from these systems.


Why Existing Systems Fall Short

The root cause is simple: most enterprise systems were designed with humans in mind.

These systems are typically developed by a specific team and managed by humans over many years. Over time, the people who originally built the system leave, new people join, systems evolve, and new requirements emerge. However, during this process, the underlying knowledge rarely becomes embedded in the system itself. Instead, it remains tied to the individuals who operate it. Eventually, processes continue to be executed even though no one fully remembers why they exist.

Consider a scenario: a very capable junior developer is placed in an environment where domain knowledge exists only within the minds of experts. Without sufficient context or documentation, the junior developer will attempt to produce results using limited information, but the quality and efficiency of the output will rarely match that of someone who truly understands the domain.

AI agents face exactly the same challenge. Much of the domain knowledge does not live within the system—it resides in people’s minds. In such an environment, expecting an agent to produce reliable and efficient outcomes is unrealistic. The necessary knowledge infrastructure simply does not exist. Therefore, systems must be designed not only for human interaction but also to be readable and understandable by machines.

Human-Centric Approach


Foundations of Knowledge Infrastructure

Knowledge infrastructure is the structured layer that defines what data, processes, and code components exist within an organization, what they represent, and how they relate to one another. It sits above the technical systems and acts as a semantic map that allows both humans and machines to understand how those systems function.

In many organizations, the foundations of such work already exist. Data governance teams attempt to define the meaning, usage, and relationships of data. A similar approach can also be applied to the codebase: code blocks, their dependencies, and their usage scenarios can be systematically cataloged.

For example, when a database table is created, the technical schema alone is not sufficient. The purpose of the table, the business processes it supports, and its semantic relationships with other tables should also be documented. Similarly, code blocks should not merely function correctly; they should also be documented as part of the organization’s knowledge infrastructure.

By doing this, organizations can build reusable knowledge and code catalogs. Developers—or even systems themselves—can retrieve the components they need directly from these catalogs. Such an approach benefits not only humans but also enables AI agents to better understand systems and operate more effectively.
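To make this concrete, here is a minimal sketch of what such a catalog could look like in code. The `CatalogEntry` fields and the example `orders` table are illustrative assumptions, not a prescribed standard; the point is that purpose, ownership, and relationships live alongside the asset itself, queryable by humans and agents alike.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    """One documented asset: a table, a code module, or a business process."""
    name: str
    kind: str                     # e.g. "table", "module", "process"
    purpose: str                  # why the asset exists, in business terms
    owner: str                    # accountable team or person
    related: list[str] = field(default_factory=list)  # names of linked assets

class KnowledgeCatalog:
    """Minimal in-memory catalog that humans and agents can query alike."""
    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def describe(self, name: str) -> str:
        # Render an entry as a single sentence an agent can drop into context.
        e = self._entries[name]
        related = ", ".join(e.related) or "none"
        return f"{e.name} ({e.kind}): {e.purpose}. Owner: {e.owner}. Related: {related}."

catalog = KnowledgeCatalog()
catalog.register(CatalogEntry(
    name="orders",
    kind="table",
    purpose="Stores confirmed customer orders for fulfilment and billing",
    owner="commerce-platform team",
    related=["customers", "invoices"],
))
print(catalog.describe("orders"))
```

A real catalog would of course persist entries and sync with the systems it describes; the shape of the entry is what matters.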


How Can We Adapt Existing Systems?

Transforming legacy systems into knowledge-aware infrastructures is not a one-time project; it is a gradual transformation. Several complementary approaches can be applied to achieve this.

Automated Documentation

The fastest way to close the knowledge gap in existing systems is to automate documentation. Codebases can be scanned using static analysis tools to extract information about functions, dependencies, and call relationships. Database schemas can reveal table relationships, column meanings, and usage patterns. AI models can also help transform raw code and schemas into readable documentation. However, automatically generated documentation must always be validated by humans, which leads to the next approach.
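As a small illustration of the static-analysis idea, the sketch below uses Python’s standard `ast` module to pull function names, parameters, and docstrings out of source code. The `total_price` sample function is invented for the example; a real pipeline would walk an entire repository and feed the extracted facts to a model for prose generation.

```python
import ast

# Invented sample source; in practice this would be read from the codebase.
SOURCE = '''
def total_price(items, tax_rate=0.18):
    """Return the order total including tax."""
    return sum(items) * (1 + tax_rate)
'''

def extract_function_docs(source: str) -> list[dict]:
    """Walk the AST and collect name, parameters, and docstring per function."""
    docs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            docs.append({
                "name": node.name,
                "params": [a.arg for a in node.args.args],
                "docstring": ast.get_docstring(node) or "(undocumented)",
            })
    return docs

for entry in extract_function_docs(SOURCE):
    print(f"{entry['name']}({', '.join(entry['params'])}): {entry['docstring']}")
    # → total_price(items, tax_rate): Return the order total including tax.
```

The `(undocumented)` fallback is useful in itself: the same scan that generates documentation also surfaces where documentation is missing.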

Human in the Loop

Every output generated by agents should be reviewed and validated by domain experts. This is not only a quality assurance step but also a learning loop. Corrections and feedback from experts can be fed back into the system, improving prompts, fixing mismatches, and enriching the knowledge base. Over time, the agent will require fewer corrections, though human oversight will always remain necessary for critical decisions.
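The learning-loop part of this can be sketched very simply: route agent outputs through an expert, and keep the expert’s corrections in a form that can be re-injected into future prompts. The `ReviewLoop` class and the example correction below are assumptions for illustration, not a reference design.

```python
class ReviewLoop:
    """Route agent outputs through expert review; keep corrections as reusable context."""

    def __init__(self) -> None:
        self.pending: list[str] = []                   # outputs awaiting review
        self.corrections: list[tuple[str, str]] = []   # (original, fixed) pairs

    def submit(self, agent_output: str) -> None:
        self.pending.append(agent_output)

    def approve(self) -> str:
        # Expert accepted the output as-is.
        return self.pending.pop(0)

    def correct(self, fixed: str) -> str:
        # Expert rejected the output; remember the fix for future prompts.
        original = self.pending.pop(0)
        self.corrections.append((original, fixed))
        return fixed

    def feedback_context(self) -> str:
        """Past corrections, formatted for inclusion in future prompts."""
        return "\n".join(f"WRONG: {a}\nRIGHT: {b}" for a, b in self.corrections)

loop = ReviewLoop()
loop.submit("The 'ord' table stores order drafts")
loop.correct("The 'ord' table stores confirmed orders only")
print(loop.feedback_context())
```

The key design choice is that corrections are data, not one-off chat messages: each one permanently enriches the knowledge base the agent draws on.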

Semantic Layer

For agents, context is everything. Only the most relevant information should be included in the context window, as this directly affects output quality. Encoder-based models can generate semantic embeddings that identify the most relevant pieces of information for a given query. This semantic retrieval layer acts as a filter between raw data and the agent, reducing noise and increasing the probability of producing accurate results.
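The retrieval step can be sketched as follows. To keep the example self-contained, a bag-of-words vector stands in for a real encoder model (that substitution, and the sample documents, are assumptions); the ranking-and-filtering logic is the same either way.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in for a real encoder: a bag-of-words vector.
    # A production system would use a sentence-encoder model here.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query and keep only the top k."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "The orders table stores confirmed customer purchases",
    "The HR handbook describes the vacation policy",
    "Invoices are generated nightly from the orders table",
]
print(retrieve("where are customer orders stored", docs, k=2))
```

Only the top-k results ever reach the agent’s context window; the handbook document, being irrelevant to the query, is filtered out before the agent sees it.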

Knowledge Graphs

While the semantic layer helps identify the most relevant pieces of information, knowledge graphs make the relationships between those pieces of information explicit. When the entities within an organization—such as customers, products, processes, and systems—are represented within a knowledge graph, the system evolves from a simple data storage platform into a structure that represents organizational knowledge.

Knowledge graphs model entities and the relationships between them in a graph structure that machines can interpret. In such a graph, concepts like customer, order, product, and support ticket are represented as nodes, while the relationships between them are represented as edges.

This structure allows an AI agent to understand not only individual pieces of data but also the context in which they exist. For example, an agent that understands how a customer complaint relates to a product, a business process, and the responsible team can generate far more consistent and comprehensive solutions.

In this sense, knowledge graphs transform operational data into a semantic knowledge network that agents can navigate and reason over. Rather than relying on isolated data points, agents gain access to structured relationships that provide meaningful context for decision-making.
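A minimal version of such a graph can be built from (subject, relation, object) triples. The complaint/product/team example below mirrors the scenario above; the node names and the two-hop context expansion are illustrative assumptions, and a real deployment would typically use a dedicated graph store.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Tiny triple store: (subject, relation, object) edges over named nodes."""

    def __init__(self) -> None:
        self.edges: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def context(self, start: str, depth: int = 2) -> set[str]:
        """Collect every node reachable within `depth` hops from `start`."""
        seen, frontier = {start}, [start]
        for _ in range(depth):
            frontier = [obj for node in frontier
                        for _, obj in self.edges[node] if obj not in seen]
            seen.update(frontier)
        return seen

g = KnowledgeGraph()
g.add("complaint-4711", "about", "product-A")
g.add("product-A", "owned_by", "team-payments")
g.add("complaint-4711", "raised_by", "customer-42")

# Starting from the complaint, two hops reach the product, the customer,
# and the team responsible for the product.
print(sorted(g.context("complaint-4711")))
# → ['complaint-4711', 'customer-42', 'product-A', 'team-payments']
```

This is exactly the context-assembly step described above: instead of handing the agent an isolated complaint record, the graph hands it the complaint together with its neighborhood of related entities.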

Governance Standards

For all these approaches to remain sustainable, governance standards are required. Rules must define how data is described, how code is documented, how ontologies are updated, and how the semantic layer is maintained. Without these standards, knowledge infrastructure quickly becomes outdated and unreliable. Many organizations already apply governance frameworks for data—such as data ownership, quality metrics, and change management. These frameworks should be extended to cover code and business processes as well. The goal is to make knowledge infrastructure a continuously evolving system rather than a one-time initiative.
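Part of such a standard can even be enforced mechanically. The sketch below checks catalog metadata against a required-fields rule; the specific fields (`purpose`, `owner`, `last_reviewed`) are an assumed minimal standard, and each organization would define its own.

```python
# Assumed minimal governance standard: every cataloged asset must carry
# these metadata fields. Real standards would be broader and versioned.
REQUIRED_FIELDS = {"purpose", "owner", "last_reviewed"}

def governance_violations(asset: dict) -> list[str]:
    """Return the metadata fields an asset is missing under the standard above."""
    return sorted(REQUIRED_FIELDS - asset.keys())

table_meta = {
    "name": "orders",
    "purpose": "confirmed customer orders",
    "owner": "commerce team",
}
print(governance_violations(table_meta))  # the asset has never been reviewed
# → ['last_reviewed']
```

Run as a CI check or a scheduled audit, a rule like this is what keeps the knowledge infrastructure from silently drifting out of date.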

Knowledge Infrastructure


Conclusion

Starting from scratch is often tempting. In personal projects, I frequently delete everything and rebuild when I dislike the result. In enterprise environments, however, this luxury rarely exists. Corporate systems contain years of accumulated knowledge, live customers, and operational dependencies where even minor disruptions can have significant costs.

If we aim to achieve real efficiency through AI, the first step is not adopting new tools but strengthening what we already have. We must uncover hidden knowledge within systems, document it, enrich it with semantic layers, and connect everything through sustainable governance standards.

The success of AI agents depends not only on the quality of the models but also on the quality of the knowledge infrastructure they operate on.

Building that foundation is our responsibility.

Thank you for reading.