Generative artificial intelligence (genAI) and in particular large language models (LLMs) are changing the way companies develop and deliver software. What began with chatbots and simple automation tools is developing into something far more powerful – AI systems that are deeply integrated into software architectures and influence everything from backend processes to user interfaces. An overview.
The chatbot wave: A short-term trend
Companies are currently focusing on developing chatbots and customized GPTs for various problems. These AI-based tools are particularly useful in two areas: making internal knowledge accessible and automating customer service. Chatbots are used to build response systems that give employees quick access to extensive internal knowledge bases, breaking down information silos.
While useful, these tools offer diminishing returns because they provide little innovation or differentiation. Moreover, a chatbot is often the wrong user interface, chosen simply because better alternatives for the problem at hand are not widely known.
The future will be characterized by more in-depth AI capabilities that are seamlessly woven into software products without being apparent to end users.
GenAI as ubiquitous technology
In the coming years, AI will evolve from an explicit tool with direct user interaction into a seamlessly integrated component of the feature set. GenAI will enable functions such as dynamic content creation, intelligent decision-making, and real-time personalization without users having to interact with the AI directly. This will fundamentally change both UI design and the way software is used. Instead of manually entering specific parameters, users will increasingly be able to describe their requirements in natural language.
A striking example of this can already be seen in tools such as Adobe Photoshop. The “Generative Fill” function no longer requires manual adjustment of multiple parameters. Instead, users simply describe what should fill a selected area of the image. This trend toward natural language input will spread across applications, making the UX more intuitive and less constrained by traditional UI elements.
The challenge in the future will not be scarcity, but abundance: identifying and prioritizing the most promising opportunities.
The commodity effect of LLMs over specialized ML models
One of the most notable transformations generative AI has brought to IT is the democratization of AI capabilities. Before LLMs and diffusion models, organizations had to invest significant time, effort, and resources into developing custom machine-learning models to solve difficult problems. That required specialized roles and teams to collect domain-specific data, engineer features, label data, and retrain and manage models across their entire lifecycle.
LLMs are now changing how companies approach problems that are difficult or impossible to solve algorithmically. The term “language” in Large Language Models is somewhat misleading: these autoregressive models can ultimately process anything that can be broken down into tokens, whether images, video, sound, or even proteins. Companies can enrich these versatile tools with their own data using the retrieval-augmented generation (RAG) architecture, making their wide range of capabilities usable for domain-specific tasks.
In many cases, this eliminates the need for specialized teams, extensive data labeling, and complex machine-learning pipelines. The extensive pre-trained knowledge of the LLMs enables them to effectively process and interpret even unstructured data.
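To make this concrete, here is a minimal RAG sketch assuming the OpenAI Python SDK; the documents, model names, and prompts are illustrative placeholders, not a definitive implementation:

```python
# pip install openai numpy
# Minimal RAG sketch: embed company documents once, retrieve the best match
# for a question, and pass it to the model as context.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

documents = [
    "Travel expenses must be submitted within 30 days of the trip.",
    "Receipts above 250 euros require cost center approval.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

doc_vectors = embed(documents)

def answer(question: str) -> str:
    q = embed([question])[0]
    # These embeddings are unit-normalized, so the dot product
    # is equivalent to cosine similarity.
    best = documents[int(np.argmax(doc_vectors @ q))]
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Answer using only this context: {best}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("When do I have to hand in my receipts?"))
```

A production system would use a vector database instead of an in-memory array, but the architecture is the same: retrieve, then generate.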
An important aspect of this democratization is the availability of LLMs via easy-to-use APIs. Almost every developer knows how to work with API-based services, which makes integrating these models into existing software ecosystems straightforward. Companies benefit from powerful models without having to worry about the underlying infrastructure. Alternatively, many models can be operated on-premises where specific security or data protection requirements demand it, though this sacrifices some of the advantages of the leading frontier models.
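Since many local runtimes expose OpenAI-compatible endpoints, the same client code can target an on-premises model. A sketch assuming an Ollama server running locally; the model name is a placeholder for whatever has been pulled:

```python
from openai import OpenAI

# Point the standard client at a local, OpenAI-compatible endpoint.
local = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")

resp = local.chat.completions.create(
    model="llama3.1",  # placeholder: any model served locally
    messages=[{"role": "user", "content": "Classify this expense: 'Hotel Adler, 2 nights, 240 EUR'"}],
)
print(resp.choices[0].message.content)
```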
Take, for example, an app for recording and managing travel expenses. Traditionally, such an application might have used a specially trained ML model to classify uploaded receipts into accounting categories, such as those used in DATEV. This required dedicated infrastructure and, ideally, a full MLOps pipeline to manage data collection, model training, deployment, and monitoring.
Today, such an ML model can easily be replaced by an LLM that applies its world knowledge through a well-crafted prompt for document categorization. The multimodal capabilities of LLMs also eliminate the need for optical character recognition (OCR) in many cases, significantly simplifying the technology stack. Do net and gross prices or tax rates also need to be extracted from the receipts? An LLM can do that too.
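As an illustration, a sketch of such a categorization call, assuming the OpenAI Python SDK; the categories, field names, and file path are invented for this example:

```python
# Sketch: categorize a receipt image with a multimodal LLM instead of a
# custom-trained classifier plus OCR pipeline.
import base64
import json
from openai import OpenAI

client = OpenAI()

with open("receipt.jpg", "rb") as f:  # hypothetical receipt image
    image_b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # force machine-readable output
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": (
                "Categorize this receipt (hotel, transport, meals, other) and "
                "extract net price, gross price, and tax rate. Reply as JSON "
                "with keys: category, net, gross, tax_rate."
            )},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)
print(json.loads(resp.choices[0].message.content))
```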
AI-powered features that were previously impossible
GenAI enables a variety of features that were previously too complex, too expensive, or completely out of reach for most organizations because they required investments in customized ML solutions or complex algorithms. Let’s look at some specific examples.
Mood- and context-based search: Beyond keywords
Mood- and context-based search (or “vibe-based search”) represents a significant advance over traditional keyword-based search systems. It allows users to express their intent in natural language, capturing not only specific terms but also the full context and “vibe” of their query.
For example:
- Traditional keyword search: “best restaurants in Berlin”
- Mood- and context-based search: “I am a discerning connoisseur and love wine bars that also serve food, preferably with regional ingredients. Recommend restaurants in Berlin Mitte and Kreuzberg. No dogmatic natural wine bars, please.”
In the case of mood- and context-based search, an LLM can understand and process the following:
- The self-description as a “discerning connoisseur”
- A preference for wine bars that also offer food
- A desire for regional ingredients
- Specific neighborhood preferences (Mitte and Kreuzberg)
- An explicit exclusion of “dogmatic natural wine bars”
This level of nuance and contextual understanding enables the search function to deliver highly personalized and relevant results, rather than just matching keywords.
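One possible implementation pattern is to let an LLM translate the free-form query into structured filters that a conventional search backend can then execute. A minimal sketch, assuming the OpenAI Python SDK and a hypothetical filter schema:

```python
# Sketch: turn a free-form "vibe" query into structured search filters.
# The filter schema (venue_type, districts, exclusions, ...) is invented.
import json
from openai import OpenAI

client = OpenAI()

query = (
    "I am a discerning connoisseur and love wine bars that also serve food, "
    "preferably with regional ingredients. Recommend restaurants in Berlin "
    "Mitte and Kreuzberg. No dogmatic natural wine bars, please."
)

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Extract search filters from this restaurant query as JSON "
                   "with keys: venue_type, cuisine_preferences, districts, "
                   "exclusions.\n\n" + query,
    }],
)
filters = json.loads(resp.choices[0].message.content)
# Expected shape (illustrative):
# {"venue_type": "wine bar with food",
#  "cuisine_preferences": ["regional ingredients"],
#  "districts": ["Mitte", "Kreuzberg"],
#  "exclusions": ["dogmatic natural wine bars"]}
```

The extracted filters can then feed an existing search index, so the LLM adds understanding without replacing the retrieval machinery.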
Implementing mood- and context-based search can significantly improve the user experience in a variety of applications:
- Internal knowledge bases: Employees can use natural language queries to find information that describes their specific situation or need.
- E-commerce platforms: Customers can describe products in their own words, even if they don’t know the exact terminology.
- Customer service systems: Users can describe their issues in detail. The system then offers them more precise solutions or forwards them to the appropriate support staff.
- Content management systems: Content editors can search for assets or content using descriptive language without relying on extensive tagging or metadata.
Intelligent data and content analysis
Sentiment analysis
Let’s look at a practical example: an internal system allows employees to post short status messages about their work. A manager wants to assess the general mood of the team during a specific week. In the past, implementing sentiment analysis of these posts with a customized ML model would have been challenging. With LLMs, this complexity is reduced to a simple API call.
The result does not even have to be output in human-readable language. It can be provided as structured JSON, which the system processes to display matching icons or graphics. Alternatively, the LLM could simply output emojis to represent the moods. Of course, such a feature would only be implemented with the consent of the employees.
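A sketch of what this could look like with the OpenAI Python SDK; the status messages and the output schema are invented for illustration:

```python
# Sketch: team-mood analysis as a single API call with structured JSON output.
import json
from openai import OpenAI

client = OpenAI()

status_updates = [
    "Finally closed the billing bug, great week!",
    "Blocked again by the staging environment, frustrating.",
    "Pairing with the new colleague went really well.",
]

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[{
        "role": "user",
        "content": "Rate the overall mood of these status updates. Reply as "
                   'JSON: {"mood": "positive|neutral|negative", '
                   '"score": 0-10, "emoji": "..."}\n\n'
                   + "\n".join(status_updates),
    }],
)
print(json.loads(resp.choices[0].message.content))
```

The JSON output can drive icons or dashboards directly; no model training, labeling, or deployment pipeline is involved.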
Gaining insights from complex data
Another example that illustrates the power of LLMs in analyzing complex data is intelligent alarm management for cooling systems.
Traditionally, these systems have focused on:
- A graphical alarm dashboard with real-time data and alerts
- Complex, filterable tabular representations of time series data
These features are useful but often require significant human interpretation to yield meaningful insights. This is where LLMs can extend the system’s capabilities by converting raw data into actionable insights on a zero-shot basis, with no specialized machine-learning models required:
- Automatic reporting: LLMs can analyze time series data and generate detailed reports in natural language. These can highlight trends, anomalies, and key performance indicators that are valuable to both technicians and managers. A report might, for example, summarize last week’s alarms, identify recurring problems, and suggest areas for improvement.
- In-depth analysis: LLMs can go beyond simple data presentation to identify and explain complex patterns in the data. For example, they can identify alarm sequences that indicate major system problems – insights that might be overlooked in a traditional tabular view or chart.
- Predictive insights: By analyzing historical data, LLMs can make predictions about future system states. This enables proactive maintenance and helps prevent potential failures.
- Structured outputs: In addition to reports in natural language, LLMs can also output structured data (such as JSON). This makes it possible to create dynamic, graphical user interfaces that visually represent complex information.
- Natural language queries: Engineers can ask the system questions in natural language, such as “Which devices are likely to switch to failover mode in the coming weeks?” and immediately receive relevant answers and visualizations. This significantly lowers the barriers to entry for data evaluation and interpretation. This functionality is now also available from OpenAI via a real-time API.
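To illustrate the pattern behind several of these points, here is a minimal sketch that answers a natural-language question over recent alarm records by placing them directly in the prompt. The alarm format is hypothetical, and a production system would pre-filter the data first, for example via RAG or a database query:

```python
# Sketch: natural-language queries over alarm data.
import json
from openai import OpenAI

client = OpenAI()

# Hypothetical pre-aggregated alarm records from the cooling system.
alarms = [
    {"device": "cooler-12", "type": "temp_high", "count_7d": 9},
    {"device": "cooler-07", "type": "compressor_restart", "count_7d": 2},
]

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You analyze cooling-system alarm data."},
        {"role": "user", "content": "Which devices look likely to fail soon, "
                                    "and why?\n\n" + json.dumps(alarms)},
    ],
)
print(resp.choices[0].message.content)
```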
The multimodal black box: Writing, speaking, seeing and hearing
Multimodality greatly expands the capabilities of LLMs. Models that can process text, images, sound, and speech enable complex feature combinations. One example would be an application that helps users digest complex visual content and renders it as text or speech.
The range of possible use cases is enormous: a video pan across a bookshelf fills a database with the recognized book titles; unfamiliar animals that appear in the surveillance video of the chicken coop are identified; a Scotswoman speaks street names into the navigation system of her rental car in Germany.
Technical restrictions and solutions
LLMs have certain technical limitations. One of the most significant is the context window – the amount of text (more precisely, the number of tokens) that a language model can process in a single pass.
Context window sizes vary considerably between models. GPT-4 Turbo’s context window is 128,000 tokens, for example, while Gemini 1.5 Pro can process up to 2,000,000 tokens. While this may seem considerable, it can quickly become a bottleneck when dealing with inputs such as entire books or long videos.
Fortunately, there are several strategies for getting around this limitation:
- Chunking (segmentation) and summarization: Large documents are split into smaller segments that fit into the context window. Each segment is processed separately and the results are merged afterwards (see the sketch after this list).
- Retrieval-augmented generation (RAG): Instead of relying solely on the model’s (extremely broad) knowledge, relevant information is retrieved from a separate data source and incorporated into the prompt.
- Domain adaptation: Combining careful prompt engineering with domain-specific knowledge bases allows subject matter expertise without limiting the model’s versatility.
- Sliding window technique: A sliding window can be used to analyze long text sequences, such as time series data or long documents. The model retains some context as it moves through the entire document.
- Multi-stage reasoning: Complex problems are broken down into a series of smaller steps. Each step uses the LLM within its context window limit, with the results of previous steps informing the subsequent ones.
- Hybrid approaches: Traditional information retrieval methods such as TF-IDF and BM25 can pre-filter relevant text passages. This significantly reduces the amount of data for subsequent LLM analysis, thus increasing the efficiency of the overall system.
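As an example of the first strategy, a minimal chunking-and-summarization sketch, assuming the OpenAI Python SDK; the character-based chunk size is a stand-in for proper token counting:

```python
# Sketch: summarize a document too long for one context window by
# summarizing fixed-size chunks, then summarizing the partial summaries.
from openai import OpenAI

client = OpenAI()

def summarize(text: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Summarize briefly:\n\n" + text}],
    )
    return resp.choices[0].message.content

def summarize_long(document: str, chunk_chars: int = 8000) -> str:
    # Naive character-based split; real code would count tokens
    # (e.g. with a tokenizer) and split on natural boundaries.
    chunks = [document[i:i + chunk_chars]
              for i in range(0, len(document), chunk_chars)]
    partial = [summarize(c) for c in chunks]
    # Merge step: one final pass over the concatenated partial summaries.
    return summarize("\n\n".join(partial))
```

The same map-then-merge structure underlies the sliding window and multi-stage reasoning approaches as well; only the splitting and recombination logic differs.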
GenAI as a standard component in enterprise software
Companies need to recognize generative AI for what it is: a general-purpose technology that touches everything. It will become part of the standard software development stack, as well as an integral enabler of new and existing features. Ensuring the future viability of your software development therefore requires not only adopting AI tools for software development but also preparing infrastructure, design patterns, and operations for the growing influence of AI.
As this happens, the role of software architects, developers, and product designers will also evolve. They will need to develop new skills and strategies for designing AI features, handling non-deterministic outputs, and integrating seamlessly with various enterprise systems. Soft skills and collaboration between technical and non-technical roles will become more important than ever, as pure hard skills become cheaper and more automatable.
Robert Glaser is Head of Data & AI at INNOQ. With roots in software engineering and a passion for creating user-friendly web applications, he now guides companies through the AI landscape, helping them develop strategies and products for challenging technical problems. Fascinated by practical uses of generative AI in software, he hosts the podcast “AI und jetzt,” discussing AI’s potential across industries. Robert bridges tech and business, advocating user-centric digitization.