When I began researching large language models (LLMs), my goal wasn’t to build a product. It was to understand whether enterprises could adopt AI responsibly — without losing control over cost, governance or transparency.
Like many architects experimenting with LLMs, I found that the deeper I looked, the more their limitations surfaced in enterprise environments. Compute costs often rose faster than business value. Latency challenges undermined real-time responsiveness. And explainability gaps made compliance and audit assurance difficult to sustain.
These weren’t edge cases. They were systemic signals that the enterprise stack needed a different architectural foundation for AI — one shaped by the same principles we apply to reliability, risk management and observability in other strategic systems.
The search for modularity
This led me to explore modular approaches emerging across the industry, including semantic-layer architectures that combine small language models (SLMs) with retrieval-augmented generation (RAG). Rather than expecting one massive model to understand and govern everything, this approach distributes intelligence across smaller, focused components. Each can reason over version-controlled, authoritative data and exchange results through structured governance layers.
Through independent architectural modeling and analysis, I found that this approach doesn’t eliminate complexity — it reframes it. Accountability becomes part of the architecture, not an afterthought.
The challenge with bigger models
One theme became clear early in my research: many assume that scaling AI means scaling model size. But in practice, the gap between model capability and operational reality grows wider when a single model is responsible for every function.
Industry examples and technical evaluations consistently point to three pressure points:
- Cost: Bigger models drive infrastructure decisions that can’t scale sustainably across domains. Even well-funded organizations are now pausing chatbot deployments until responsible foundations are in place.
- Performance: Large models strain latency budgets. When every operation must traverse billions of parameters in the cloud, user trust erodes — especially in high-volume systems.
- Governance: Auditing an opaque, centralized model is difficult enough once; it becomes unmanageable when dozens of workflows depend on it.
Across these observations, one conclusion stands out:
The problem isn’t the intelligence — it’s the architecture around it.
LLMs are remarkable, but they are not inherently aligned with enterprise control frameworks. Without a way to govern the reasoning and retrieval pathways, organizations place themselves at risk of unpredictable outputs — and unpredictable headlines.
Understanding SLMs and RAG
The modular approach I explored is built on two ideas: small language models and retrieval-augmented generation.
SLMs focus on specific domains rather than being trained to handle everything. Because they are compact and specialized, they can run on more common infrastructure and offer predictable performance. Instead of forcing one model to understand every topic in the enterprise, SLMs stay close to the context they are responsible for.
In practice, the shift to SLMs sharply reduces infrastructure requirements: enterprises report fine-tuning on a handful of GPUs, a hardware budget in the thousands of dollars, rather than the multi-million-dollar GPU clusters typically needed to train LLMs.
RAG complements this by grounding model outputs in trusted information sources.
When an agent responds to a query, it retrieves relevant policies, documents or records first — and uses that data to shape the result. This makes reasoning more transparent and helps ensure decisions reflect the most current knowledge. In one industry study, adding RAG improved answer accuracy by approximately 5 percentage points.
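The retrieve-then-generate flow described above can be sketched in a few lines. This is a minimal illustration only: the keyword-overlap scoring, the stubbed `generate` function and the in-memory document list stand in for a real vector index and an actual model call.

```python
# Minimal retrieval-augmented generation loop (illustrative only).
# Keyword overlap stands in for semantic retrieval, and `generate`
# is a stub where a real SLM call would go.

def score(query: str, doc: str) -> int:
    """Count query terms that appear in the document."""
    terms = set(query.lower().split())
    return sum(1 for t in terms if t in doc.lower())

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most relevant to the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

def generate(prompt: str) -> str:
    """Stub: a real SLM would produce an answer grounded in the prompt."""
    return f"[answer grounded in {prompt.count('SOURCE')} sources]"

def answer(query: str, docs: list[str]) -> str:
    context = retrieve(query, docs)
    # Ground the model by placing retrieved passages ahead of the question.
    prompt = "\n".join(f"SOURCE: {d}" for d in context) + f"\nQUESTION: {query}"
    return generate(prompt)

docs = [
    "Refund policy: purchases may be returned within 30 days.",
    "Shipping policy: orders ship within 2 business days.",
    "Privacy policy: customer data is retained for 12 months.",
]
print(answer("What is the refund window for returns?", docs))
```

The key property is that the retrieved sources, not the model's parametric memory, carry the facts, which is what makes the output auditable.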
Together, SLMs and RAG form a system where intelligence is both efficient and explainable. The model contributes language understanding, while retrieval ensures accuracy and alignment with business rules.
It’s an approach that favors control and clarity over brute-force scale — exactly what large organizations need when AI decisions must be defended, not just delivered.
A modular path forward
Distributed intelligence allows enterprises to scale differently: horizontally instead of vertically. Each new capability becomes a new component — not a new burden on the entire system.
At the heart of this approach is what I call a semantic layer: a coordination surface where AI agents reason only over the business context and data sources assigned to them. This layer defines three critical elements:
- What information an agent can access
- How its decisions are validated
- When it should escalate or defer to humans
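One way to make those three elements concrete is as a declarative policy attached to each agent. The sketch below is hypothetical: the field names, validator labels and threshold value are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass

@dataclass
class AgentPolicy:
    """Semantic-layer contract for one agent (field names are illustrative)."""
    name: str
    allowed_sources: set[str]          # what information the agent can access
    validators: list[str]              # how its decisions are validated
    escalation_threshold: float = 0.8  # below this confidence, defer to humans

    def can_access(self, source: str) -> bool:
        return source in self.allowed_sources

    def should_escalate(self, confidence: float) -> bool:
        return confidence < self.escalation_threshold

support = AgentPolicy(
    name="support-summary",
    allowed_sources={"product-docs", "faq"},
    validators=["schema-check", "pii-scan"],
)

print(support.can_access("compliance-exceptions"))  # False: stays in its lane
print(support.should_escalate(0.55))                # True: defers to a human
```

Expressing the contract as data rather than code means the governance layer can review, version and audit it like any other policy artifact.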
In this design, smaller language models are used where focus matters more than size. A customer-service summary agent doesn’t need to know about compliance exceptions. And a risk-scoring agent doesn’t need product marketing copy.
Each is grounded in the data that actually governs the decision it makes:
- Product documentation for a support agent
- Regulatory rules for a compliance agent
- Internal policies for a risk evaluator
When an agent reaches a boundary condition or uncertainty threshold, it doesn’t guess; it hands the decision to the next appropriate component through that semantic layer.
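That hand-off behavior can be sketched as a simple router: each agent returns an answer with a confidence score, and anything below the threshold falls through to human review. The agent names, stub answers and threshold here are hypothetical.

```python
# Illustrative hand-off through a semantic layer: an agent that falls
# below its confidence threshold defers rather than guessing.
from typing import Callable

Agent = Callable[[str], tuple[str, float]]  # query -> (answer, confidence)

def route(query: str, agents: list[tuple[str, Agent]],
          threshold: float = 0.8) -> tuple[str, str]:
    """Try each agent in order; return the first confident answer,
    otherwise escalate to human review."""
    for name, agent in agents:
        answer, confidence = agent(query)
        if confidence >= threshold:
            return name, answer
    return "human-review", "escalated: no agent met the confidence threshold"

# Stub agents returning (answer, confidence); real ones would call an SLM.
def faq_agent(q: str) -> tuple[str, float]:
    return ("See the FAQ entry on returns.", 0.92 if "return" in q else 0.3)

def policy_agent(q: str) -> tuple[str, float]:
    return ("Refer to the internal policy manual.", 0.5)

chain = [("faq", faq_agent), ("policy", policy_agent)]
print(route("How do I return an item?", chain))   # answered by the FAQ agent
print(route("Is this transaction compliant?", chain))  # escalates to humans
```

Because escalation is an explicit return path rather than a silent fallback, the boundary condition itself becomes an observable event.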
This makes failure behavior predictable, not mysterious. Growth becomes a structural property: new requirements add new agents, not new parameters to a monolith, and capabilities improve through local learning, not global retraining. It is an approach aligned with how enterprises already scale technology: with discrete responsibility, controlled expansion and traceability of change.
This direction aligns with industry reporting, including InfoQ’s 2025 Architecture & Design Trends Report, which highlights SLMs and RAG as emerging enterprise technologies.
Governance and clarity as architectural priorities
When AI takes on decision-making responsibility, understanding how those decisions were made becomes essential. Traditional software makes logic visible in code. Large models do not.
In modular designs, accountability is built into the system:
- Reasoning is grounded in retrieved, verifiable information.
- Disagreements or uncertainty prompt escalation — not silent guessing.
- Observability comes from clear signals: retrieval freshness, decision confidence, exception events and override activity.
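Those signals can be captured as one structured record per decision. The sketch below uses hypothetical field names and thresholds; a real deployment would define its own schema and route these records to its observability stack.

```python
from dataclasses import dataclass, asdict

@dataclass
class DecisionRecord:
    """One observability record per agent decision (illustrative schema)."""
    agent: str
    retrieval_age_hours: float   # freshness of the grounding data
    confidence: float            # the agent's decision confidence
    exception: bool              # did the agent hit a boundary condition?
    human_override: bool         # was the output overridden downstream?

    def needs_review(self, max_age_hours: float = 24.0,
                     min_confidence: float = 0.8) -> bool:
        """Flag records that should surface on a governance dashboard."""
        return (self.retrieval_age_hours > max_age_hours
                or self.confidence < min_confidence
                or self.exception
                or self.human_override)

rec = DecisionRecord(agent="risk-scorer", retrieval_age_hours=2.0,
                     confidence=0.91, exception=False, human_override=False)
print(rec.needs_review())   # False: a healthy, well-grounded decision
print(asdict(rec)["agent"])
```

The point is not the specific fields but that every decision leaves a reviewable trace, which is what the bullet list above asks of the architecture.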
This doesn’t produce perfection. But it does produce clarity — and clarity allows intelligence to grow responsibly, capability by capability.
The opportunity before us
More than anything, modular AI feels familiar. Not like a risky leap, but like the next evolution of enterprise systems. Progress isn’t defined by a single breakthrough moment. It emerges gradually as agents sharpen their expertise and as retrieval bases improve.
Stakeholders see value earlier. Adaptation becomes manageable. And intelligence can be woven into workflows without destabilizing them. In this sense, modular AI shifts the story from disruption to continuity. Innovation aligns with control.
Looking ahead
The direction is early but promising. Semantic-layer models could let organizations scale AI without surrendering oversight, while keeping adaptation aligned with business strategy. As models grow more specialized, the central question will increasingly become: How do we integrate intelligence into the systems we already trust?
AI will not remain a sidecar capability. It will become part of the architecture itself — observable, governable and improvable. And whether adoption moves cautiously or accelerates, a modular foundation ensures every new step strengthens transparency rather than stretching it.
That balance — ambition guided by structure — is what makes this approach worth exploring today. Not because it solves every challenge, but because it creates a path where intelligence can mature responsibly, one well-defined decision surface at a time.
Author’s note: This implementation is based on independent technical research and does not reflect the architecture of any specific organization.
This article is published as part of the Foundry Expert Contributor Network.

