Tiatra, LLC
Information Technology Solutions for Washington, DC Government Agencies

Inferencing holds the clues to AI puzzles

Inferencing has emerged as one of the most exciting aspects of generative AI (GenAI) large language models (LLMs).

A quick explainer: In AI inferencing, organizations take an LLM that has been pretrained to recognize relationships in large datasets and use it to generate new content from fresh input, such as text or images. Crunching mathematical calculations, the model makes predictions based on what it learned during training.
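
To make the train-then-infer split concrete, here is a deliberately tiny sketch. It assumes a toy bigram model as a stand-in for an LLM (real LLMs are neural networks with billions of parameters), and `train_bigrams` and `infer` are hypothetical names chosen for illustration. The point it shows is that `infer` never updates the model: it only applies what was learned during training to a new prompt, predicting one token at a time.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: list[str]) -> dict[str, Counter]:
    """Hypothetical 'pretraining': count which word follows which in a tiny corpus.

    These counts play the role of an LLM's frozen weights in this toy example.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def infer(model: dict[str, Counter], prompt: str, max_new_tokens: int = 5) -> str:
    """Inference: apply the frozen model to new input, one predicted token at a time."""
    tokens = prompt.lower().split()
    for _ in range(max_new_tokens):
        candidates = model.get(tokens[-1])
        if not candidates:
            break  # the model never saw this token during training, so it stops
        # Greedy decoding: append the most frequent next token.
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)


model = train_bigrams([
    "the detective gathers clues",
    "the detective gathers clues quickly",
    "the case is closed",
])
print(infer(model, "The", max_new_tokens=3))  # prints "the detective gathers clues"
```

Greedy decoding is used only to keep the example deterministic; production systems often sample from the predicted distribution instead.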

Inferencing crunches millions or even billions of data points, requiring a lot of computational horsepower. As with many data-hungry workloads, the instinct is to offload LLM applications to a public cloud, whose strengths include speedy time-to-market and scalability.

Yet the calculus may not be so simple once one considers the cost of operating there, as well as the fact that GenAI systems sometimes produce outputs that even data engineers, data scientists, and other data-obsessed individuals struggle to understand.

Inferencing and… Sherlock Holmes???

Data-obsessed individuals such as Sherlock Holmes knew full well the importance of inferencing in making predictions, or in his case, solving mysteries.

Holmes, the detective at the center of Sir Arthur Conan Doyle’s stories, which first appeared in the late 19th century, understood that inference depends on data. As he put it: “It is a capital mistake to theorize before one has data.” Without data, Holmes’ argument proceeds, one begins to twist facts to suit theories, instead of theories to suit facts.

Just as Holmes gathers clues, parses evidence, and presents deductions he believes are logical, inferencing uses data to make predictions that power critical applications, including chatbots, image recognition, and recommendation engines.

To understand how inferencing works in the real world, consider recommendation engines. As people frequent e-commerce or streaming platforms, the AI models track the interactions, “learning” what people prefer to purchase or watch. The engines use this information to recommend content based on users’ preference history.
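
One classic way to turn that interaction history into suggestions is user-based collaborative filtering: find users whose history overlaps with yours and recommend what they engaged with that you have not. The sketch below is a minimal illustration under that assumption; the `recommend` function and the sample viewing history are hypothetical, and real recommendation engines typically use far richer signals and learned models.

```python
from collections import Counter


def recommend(history: dict[str, set[str]], user: str, k: int = 3) -> list[str]:
    """User-based collaborative filtering with a simple overlap similarity.

    Items liked by users who share more of `user`'s history score higher.
    """
    seen = history.get(user, set())
    scores: Counter = Counter()
    for other, items in history.items():
        if other == user:
            continue
        overlap = len(seen & items)  # number of shared items acts as similarity
        if overlap == 0:
            continue  # no shared preferences, so this user tells us nothing
        for item in items - seen:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


history = {
    "ana": {"hound", "scandal", "study"},
    "ben": {"hound", "scandal", "sign"},
    "cy": {"hound", "valley"},
    "dee": {"cooking"},
}
print(recommend(history, "ana"))  # prints ['sign', 'valley']
```

Here "sign" outranks "valley" because Ben shares two titles with Ana while Cy shares only one.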

An LLM is only as strong as its inferencing capabilities. Ultimately, it takes a combination of the trained model and new inputs, working in near real time, to make decisions or predictions. Again, AI inferencing resembles Holmes: it uses its data magnifying glass to detect patterns and insights (the clues) hidden in datasets.

As practiced at solving mysteries as Holmes was, he often relied on a faithful sleuthing sidekick, Dr. Watson. Similarly, organizations may benefit from help refining their inferencing outputs with context-specific information.

One such assistant, a Dr. Watson of sorts, is retrieval-augmented generation (RAG), a technique that improves the accuracy of LLM inferencing by retrieving relevant information from corporate datasets, such as product specifications, and supplying it to the model along with the user’s prompt.

Inferencing funneled through RAG must be efficient, scalable, and optimized to make GenAI applications useful. This combination of inferencing and RAG can also help curb inaccurate information, biases, and other inconsistencies that prevent correct predictions, much as Holmes and Dr. Watson piece together clues to solve the mystery hidden in the data they collect.

Cost-effective GenAI, on premises

Of course, here’s something that may not be mysterious for IT leaders: building, training, and augmenting AI stacks can consume large chunks of budget.

Because LLMs consume more computational resources as their parameter counts grow, deciding where to run GenAI workloads is paramount.
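
A quick back-of-the-envelope calculation shows why parameter count drives resource needs. The sketch assumes two common rules of thumb: weight memory is roughly parameters times bytes per parameter (2 bytes at FP16), and generating one token takes roughly 2 floating-point operations per parameter. It ignores the KV cache, activations, and batching overhead, so real deployments need more than these figures; the function names are illustrative.

```python
# Rules of thumb (assumptions, not vendor figures); KV cache and activations excluded.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(params: float, precision: str = "fp16") -> float:
    """Memory needed just to hold the model weights, in GB (1 GB = 1e9 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 1e9


def flops_per_token(params: float) -> float:
    """Approximate floating-point operations to generate one token (~2 per parameter)."""
    return 2 * params


for size in (7e9, 70e9):
    print(f"{size / 1e9:.0f}B params: "
          f"{weight_memory_gb(size):.0f} GB of FP16 weights, "
          f"~{flops_per_token(size):.1e} FLOPs per token")
```

Running it prints 14 GB and about 1.4e+10 FLOPs per token for a 7-billion-parameter model, versus 140 GB and about 1.4e+11 for a 70-billion-parameter one: a tenfold jump on both axes, before any additional users are served.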

Because running LLMs in a public cloud can incur high compute, storage, and data transfer fees, the corporate datacenter has emerged as a sound option for controlling costs.

It turns out that LLM inferencing with RAG, running open-source models on premises, can be 38% to 75% more cost-effective than the public cloud, according to new research [1] from Enterprise Strategy Group commissioned by Dell Technologies. The percentage varies as the size of the model and the number of users grow.
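
Why would the gap depend on scale? One way to reason about it is that cloud costs grow with usage while on-premises costs are largely fixed once hardware is bought. The toy model below illustrates that structure only; it does not reproduce Enterprise Strategy Group’s methodology, every price in it is a hypothetical placeholder, and the class and function names are invented for illustration. The cost categories mirror the compute, storage, and data transfer fees mentioned above.

```python
from dataclasses import dataclass


@dataclass
class CloudCosts:
    # Hypothetical unit prices, not real quotes.
    gpu_hour_rate: float          # $ per GPU-hour
    egress_per_gb: float          # $ per GB of data transferred out
    storage_per_gb_month: float   # $ per GB stored per month


@dataclass
class OnPremCosts:
    # Hypothetical figures, not real quotes.
    hardware_capex: float         # up-front server cost in $
    amortization_months: int      # period over which the hardware is depreciated
    power_and_ops_month: float    # $ per month for power, cooling, and staff time


def cloud_monthly(c: CloudCosts, gpu_hours: float, egress_gb: float, storage_gb: float) -> float:
    """Usage-driven cloud bill: every term scales with how much the model is used."""
    return (c.gpu_hour_rate * gpu_hours
            + c.egress_per_gb * egress_gb
            + c.storage_per_gb_month * storage_gb)


def onprem_monthly(o: OnPremCosts) -> float:
    """Mostly fixed on-prem cost: amortized hardware plus operating expenses."""
    return o.hardware_capex / o.amortization_months + o.power_and_ops_month


def breakeven_gpu_hours(c: CloudCosts, o: OnPremCosts, egress_gb: float, storage_gb: float) -> float:
    """Monthly GPU-hours above which the fixed on-prem cost is the cheaper option."""
    other_cloud = c.egress_per_gb * egress_gb + c.storage_per_gb_month * storage_gb
    return max(0.0, (onprem_monthly(o) - other_cloud) / c.gpu_hour_rate)


cloud = CloudCosts(gpu_hour_rate=2.0, egress_per_gb=0.5, storage_per_gb_month=0.25)
onprem = OnPremCosts(hardware_capex=36_000.0, amortization_months=36, power_and_ops_month=1_000.0)
print(breakeven_gpu_hours(cloud, onprem, egress_gb=100, storage_gb=200))  # prints 950.0
```

Under these made-up inputs, any workload needing more than 950 GPU-hours a month costs less on premises, and the advantage widens as bigger models and more users push usage higher.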

Cost concerns aren’t the only reason to conduct inferencing on premises. IT leaders understand that controlling their sensitive IP is critical. Thus, the ability to run a model held closely in one’s own datacenter is an attractive value proposition for organizations that prioritize bringing AI to their data.

AI factories power next-gen LLMs

Many GenAI systems require significant compute and storage, as well as chips and hardware accelerators primed to handle AI workloads.

Servers equipped with multiple GPUs, which enable the parallel processing techniques behind large-scale inferencing, form the core of emerging AI factories: end-to-end solutions tailored to organizations’ unique requirements for AI.
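
Data parallelism is one common way multi-GPU servers scale inference: a copy of the model sits on each GPU, incoming requests are split into shards, the shards are processed concurrently, and the results are merged. The sketch below only simulates this with Python threads; `fake_model` is a hypothetical stand-in for a model replica pinned to one GPU, and real deployments would rely on a serving framework and actual hardware. Other strategies, such as tensor or pipeline parallelism, instead split a single large model across GPUs.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(batch: list[str], num_devices: int) -> list[list[str]]:
    """Split a batch into num_devices order-preserving chunks of near-equal size."""
    size, rem = divmod(len(batch), num_devices)
    shards, start = [], 0
    for i in range(num_devices):
        end = start + size + (1 if i < rem else 0)  # first `rem` shards get one extra item
        shards.append(batch[start:end])
        start = end
    return shards


def fake_model(device_id: int, prompts: list[str]) -> list[str]:
    # Hypothetical replica on GPU `device_id`; a real one would run the LLM here.
    return [f"gpu{device_id}: {p.upper()}" for p in prompts]


def data_parallel_infer(batch: list[str], num_devices: int = 4) -> list[str]:
    """Run every shard on its own replica concurrently, then merge in input order."""
    shards = shard(batch, num_devices)
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        # pool.map yields results in the order of its inputs, preserving request order.
        results = pool.map(fake_model, range(num_devices), shards)
    return [out for shard_out in results for out in shard_out]


print(data_parallel_infer(["a", "b", "c", "d", "e"], num_devices=2))
```

Because each replica works on its own slice, adding GPUs raises throughput roughly in proportion, which is why multi-GPU servers sit at the core of large-scale inferencing.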

Orchestrating the right balance of platforms and tools requires an ecosystem of trusted partners. Dell Technologies is working closely with NVIDIA, Meta, Hugging Face, and others to provide solutions, tools, and validated reference designs that span compute, storage, and networking gear, as well as client devices.

True, sometimes the conclusions GenAI models arrive at remain mysterious. But IT leaders shouldn’t have to pretend to be Sherlock Holmes to figure out how to run them cost-effectively while delivering the desired outcomes.

Learn more about Dell Generative AI.

[1] Enterprise Strategy Group, “Inferencing on-premises with Dell Technologies can be 75% more cost-effective than public clouds,” April 2024.

Artificial Intelligence


Read More from This Article: Inferencing holds the clues to AI puzzles
Source: News

Category: News
April 9, 2024
Tags: art

    Tiatra LLC.

    Tiatra, LLC, based in the Washington, DC metropolitan area, proudly serves federal government agencies, organizations that work with the government and other commercial businesses and organizations. Tiatra specializes in a broad range of information technology (IT) development and management services incorporating solid engineering, attention to client needs, and meeting or exceeding any security parameters required. Our small yet innovative company is structured with a full complement of the necessary technical experts, working with hands-on management, to provide a high level of service and competitive pricing for your systems and engineering requirements.


    Tiatra, LLC
    Copyright 2016. All rights reserved.