6 generative AI hazards IT leaders should avoid

OpenAI’s recent announcement of custom ChatGPT versions makes it easier for every organization to use generative AI in more ways, but sometimes it’s better not to. Two AI safety summits in as many weeks on both sides of the Atlantic raised questions about the potential dangers of AI, but neither the science-fictional threats of killer robots nor the oddly specific guidelines on exactly which AI models are likely to be regulated will seem particularly helpful to organizations grappling with how to take advantage of the flood of generative AI tools now available.

Some of the most vocal complaints about generative AI have come from authors and artists unhappy at having their work used to train large language models (LLMs) without permission. Settling the months-long writers’ and actors’ strikes in Hollywood in early November required the studios to make concessions limiting the ways AI tools can be used to replace human writers and performers. But even businesses outside creative industries need to be careful about using generative AI, which can backfire in embarrassing ways and potentially expose them to legal action.

Many businesses cite copyright as an area of concern (and the FTC appears to agree); submitting a business plan that looks like it was copy-pasted from your competition could be problematic. But that’s not the only problem you might run into.

Damage to reputation

Microsoft’s recent experience with the Guardian, who claimed the tech company caused “significant reputational damage” when its AI news system automatically generated an insensitive poll inviting readers to speculate about a woman’s death and inserted it in the middle of the news story, is a textbook example of what not to do with generative AI.

But this wasn’t the first time Bing’s AI news added dubious polls to sensitive news stories. The same AI tool has created other polls asking if refusing to kill a woman who was later shot dead was the right or wrong decision, whether human remains found in a national park were correctly identified, if people in an area where 50 houses had been lost to fire really needed to follow emergency evacuation advice, and whether readers might feel “hopeful and supportive” about the death of two children in a fire because of fundraising for the other child burn victims.

Ads in the AI-powered Bing Chat have also included links to malware, and Microsoft’s AI tools have suggested visitors to Ottawa eat at a food bank, spotlighted fake news from obscure websites about politicians including President Biden, and mixed up details of a news story so badly that it suggested an actress had assaulted a sports coach when actually it was the coach who was accused of mistreating a horse.

One difference from previous high-profile errors made by generative AI models like ChatGPT is that lawyers and medical professionals, for the most part, at least had chances to check results before proceeding. But these Guardian polls appear to have been published on Microsoft properties with millions of visitors by automated systems with no human approval required.

Microsoft called the poll an error and promised to investigate, but it already seems to clearly breach several of the company’s own principles of responsible AI usage, such as informing people that they’re interacting with an AI system, and guidelines for human-AI interaction. And the advice it offers Azure OpenAI customers cautions against producing “content on any topic” or using it in “scenarios where up-to-date, factually accurate information is crucial,” which presumably includes news sites.

Overuse of AI

More generally, the comprehensive transparency notes for Azure OpenAI helpfully warn that the service can produce inappropriate or offensive content as well as responses that are irrelevant, false, or misrepresent trusted data. They list several scenarios to avoid — political campaigns and highly sensitive events where use or misuse could be consequential to life opportunities or legal status — and others to be cautious about, such as high stakes areas in healthcare, education, finance and legal. But questions restricted to a specific domain are less likely to yield longer, problematic responses than open-ended, unconstrained ones.

Microsoft declined to identify any specific areas where it felt generative AI would be inappropriate, instead offering a list of areas where it says customers are finding success: creating content, summarizing or improving language, code generation, and semantic search. But a company spokesperson did say: “We’re in a world where our AI has become incredibly powerful, and it can do amazing things. However, it’s crucial to understand that this technology is a journey, with ample room for growth and development. This distinction is vital.”

Not all generative AI customers have got that message. Confusing and badly written content created by generative AI is already showing up in business contexts, with conference biographies, blog posts and slide decks that might sound impressive but make no sense being signed off by managers who should know better. There are increasing examples of professional writers and security educators who submit such content supplemented with AI images of, say, people with an impossible number of fingers. This kind of inane gibberish will probably become disturbingly common, and it’s incumbent on companies to be vigilant with reputations at stake.

Insensitive content is just as inappropriate internally, too. Polls and quizzes liven up long meetings and team chats and it’s tempting to have generative AI create them based on what people have been talking about. That could go badly wrong if, for example, someone shares details of a family illness or losing a pet.

“Generative AI is typically not suited for contexts where empathy, moral judgment, and deep understanding of human nuances are crucial,” notes Saurabh Daga, associate project manager of disruptive tech at industry intelligence platform GlobalData. His list of sensitive areas is similar to Microsoft’s guidelines: “High-stakes decision-making, where errors could have significant legal, financial, or health-related consequences, might not be appropriate for AI.”

Until there’s more work done on multi-modal models, it’s important to be cautious about generative AI that mixes text and images in any scenario since the wrong caption can turn a perfectly acceptable picture into something objectionable, and image generating models are very prone to assume that all nurses are women and all CEOs are men.

“Generative AI in particular is amplifying issues that previously existed but were not whole-heartedly addressed,” warns Matt Baker, SVP AI strategy for Dell Technologies (which offers services to help customers build AI systems with Microsoft 365 Copilot or open access models like Llama2). “Take for instance processes and workflows where algorithmic bias could become a factor such as in HR and hiring. Organizations need to have an honest look at their hygiene, priorities, and data sensitivities to ensure they’re plugging GenAI tools into the areas where they get maximum reward and minimized risk.”

Assume AI is always right

Impressive as they are, generative AI tools are inherently probabilistic. That means they’ll often be wrong and the danger is that what they produce can be inaccurate, unfair or offensive – but phrased in such confident and convincing language that it slips through.

The key is not to expect a result you can use straight away, and to stay alert to the ways generative AI can be usefully wrong. Treat it as a brainstorming discussion that stimulates new ideas rather than something that will produce the perfect idea for you fully baked.

That’s why Microsoft has adopted Copilot rather than autopilot for most of its generative AI tools. “It’s about putting humans in the loop and designing it in such a way that the human is always in control with a copilot that’s powerful and helping them with every task,” CEO Satya Nadella said at the Inspire conference this summer. Learning to experiment with prompts to get better results is a key part of adopting generative AI, so tools like Copilot Lab can help employees gain these skills.

Similarly, rather than attempting to automate processes, create workflows for your own generative AI tools that encourage staff to experiment and evaluate what the AI produces. Remember to account for what information the human reviewing the AI suggestions will have about the situation — and what incentive they have to vet the results and check any cited sources, rather than just save time by accepting the first option they’re given without making sure it’s accurate and appropriate.

Users need to understand the suggestions and decisions they accept from generative AI well enough to know what the consequences could be and justify them to someone else. “If your organization doesn’t explain AI-assisted decisions, it could face regulatory action, reputational damage and disengagement by the public,” warns the UK’s Information Commissioner’s Office.

Offering multiple alternatives every time and showing how to interpret suggestions can help, as well as using prompts that instruct an LLM to explain why it’s giving a particular response. And in addition to having generative AI cite the sources of key information, consider ways to highlight elements that are important to double check, like dates, statistics, policies, or precedents that are being relied on.
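
One way to highlight details that deserve a second look is to scan the AI output for them before a reviewer sees it. The following is a minimal Python sketch, not a production checker: the pattern set and the `flag_for_review` name are assumptions made for illustration, and it only catches a couple of date formats and percentage figures.

```python
import re

# Hypothetical patterns for details a human reviewer should verify by hand.
CHECK_PATTERNS = {
    "date": re.compile(
        r"\b(?:\d{4}-\d{2}-\d{2}|(?:January|February|March|April|May|June|July|"
        r"August|September|October|November|December)\s+\d{1,2},\s+\d{4})\b"
    ),
    "statistic": re.compile(r"\b\d+(?:\.\d+)?%"),
}


def flag_for_review(text):
    """Return (kind, matched_text) pairs in the order they appear in text."""
    hits = []
    for kind, pattern in CHECK_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((match.start(), kind, match.group()))
    hits.sort()  # order by position in the text
    return [(kind, matched) for _, kind, matched in hits]


if __name__ == "__main__":
    draft = "Revenue grew 12.5% after the policy took effect on March 3, 2023."
    for kind, matched in flag_for_review(draft):
        print(f"Check this {kind}: {matched}")
```

A review tool could use the returned pairs to underline those spans, so the reviewer's attention goes to the claims most likely to be wrong rather than to the fluent text around them.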

But ultimately, this is about building a culture where generative AI is seen as a useful tool that still needs to be verified, not a replacement for human creativity or judgement.

“Generative AI or any other form of AI should be used to augment human decision-making, not replace it in contexts where its limitations could cause harm,” Daga points out. “Human reviewers should be trained to critically assess AI output, not just accept it at face value.”

In addition to a process that includes human review and encourages experimentation and thorough evaluation of AI suggestions, guardrails need to be put in place to stop tasks from being fully automated when it’s not appropriate. “For instance, AI might generate company press briefings, but only a human editor can approve the sharing of content with selected journalists and publications,” he adds.
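
The press briefing example can be expressed as a simple gate in code. This is a hedged sketch under stated assumptions: `Draft`, `approve`, `publish`, and `ApprovalRequired` are hypothetical names invented here, and a real system would also need authentication and an audit trail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Draft:
    text: str
    ai_generated: bool
    approved_by: Optional[str] = None  # name of the human editor, if any


class ApprovalRequired(Exception):
    """Raised when AI-generated content has no human sign-off."""


def approve(draft, editor):
    """Record the human editor who reviewed the draft."""
    draft.approved_by = editor
    return draft


def publish(draft):
    """Release a draft, refusing AI-generated text that nobody approved."""
    if draft.ai_generated and not draft.approved_by:
        raise ApprovalRequired("AI-generated draft needs a human editor's sign-off")
    return f"PUBLISHED: {draft.text}"


if __name__ == "__main__":
    briefing = Draft(text="Quarterly press briefing", ai_generated=True)
    try:
        publish(briefing)
    except ApprovalRequired as err:
        print(f"Blocked: {err}")
    print(publish(approve(briefing, editor="duty editor")))
```

The point of the design is that the check lives in the publishing step itself, so skipping review is an error rather than a shortcut.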

Generative AI can certainly make developers more productive, too, whether exploring a new code base, filling in boilerplate code, autocompleting functions, or generating unit tests. You can take advantage of that extra productivity but still decide code won’t be released into a production environment without human review.

Businesses are accountable for the consequences of their choices, and that includes deploying AI in inappropriate areas, says Andi Mann, global CTO and founder of Colorado-based consultancy Sageable. “The customer will not let you off the hook for a data breach just because, ‘It was our AI’s fault.’”

Hide the AI

It’s crucial to ensure responsible use of the system, whether that’s by employees or customers, and transparency is a big part of that. An embarrassing number of publications use AI-generated content that’s easy to spot because of its poor quality, but you should be clear about when even good-quality content is being produced by an AI system, whether it’s an internal meeting summary, marketing message, or chatbot response. Provide an ‘off-ramp’ for automated systems like chatbots that allow users to escalate their question to a human.

“Customers should have the option to opt out of interactions with generative AI, particularly in sensitive areas,” says Daga.
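
An off-ramp and an AI disclosure label could look something like the sketch below. The phrase list, topic list, and function names are all assumptions for illustration, and the substring matching is deliberately crude; a real chatbot would need more robust intent detection.

```python
# Hypothetical trigger phrases and topics; tune these for your own service.
ESCALATION_PHRASES = ("human", "agent", "real person", "representative")
SENSITIVE_TOPICS = ("bereavement", "medical", "legal")


def route_message(message, topic=None):
    """Return "human" when the user asks for a person or the topic is sensitive."""
    text = message.lower()
    if any(phrase in text for phrase in ESCALATION_PHRASES):
        return "human"
    if topic in SENSITIVE_TOPICS:
        return "human"
    return "ai"


def label_reply(reply):
    """Mark a chatbot reply so users know it came from an AI system."""
    return f"[AI-generated] {reply}"


if __name__ == "__main__":
    print(route_message("Can I talk to a real person?"))
    print(route_message("What are your opening hours?"))
    print(label_reply("We open at 9am on weekdays."))
```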

Assume AI can solve every problem

As generative AI usage increases in business, so does the awareness that people need to use their own judgment on what the AI suggests. That’s what eight out of 10 IT staff said in last year’s State of DevOps Automation Report, rising to just over 90% in the 2023 State of DevOps Automation and AI study.

That caution is justified, says Mann, especially where there’s little domain-specific training data that can be used to generate predictable, desirable, and verifiable outputs, as in IT operations, because generative AI is prone to inaccurate results when training data is insufficient.

“GenAI will be less meaningful for any use case dealing with novel problems and unknown causes with missing or undocumented knowledge,” he warned. “Training an LLM is impossible if undisclosed human tribal knowledge is your only potential input.”

He does see opportunities, though, to use GenAI as a sidekick. “It can be an advisor or active expert by training an engine to learn what ‘known good’ IT operations look like across defined disciplines and knowledge stores, and recognize known problems, diagnose known causes, identify known inefficiencies, and respond with known remediations,” he says. But while some IT problems that may seem new can be tackled with familiar processes and solutions, it won’t be clear in advance which those are.

“We know gen AI almost never says it doesn’t know something, but instead will throw out misleading, spurious, wrong, and even malicious results when you try to get it to solve ‘unknown unknowns,’” says Mann.

Make more work for humans

Content produced by generative AI can be helpful, of course, but because it’s so easy to create, it can also end up making a lot more work for those who need to vet it and take action based on it.

Fiction magazines report receiving so many low-quality AI-written stories that it’s effectively a denial of service attack. Publishers have been experimenting with AI to copy edit manuscripts, but writers and editors alike report that suggested edits are frequently unhelpful, irrelevant, or just plain wrong — running into problems with technical terms, house style, complex sentence structures, and words used in correct but unusual ways, for starters. Be honest when you assess what areas generative AI is actually able to contribute to.

A key part of adopting any AI tool is having a process for dealing with errors beyond correcting them individually each time. Don’t assume generative AI learns from its mistakes, or that it’ll give you the same result every time. If that matters, you need to use prompt engineering and filters to constrain results in the most important areas.
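
As one illustration of a filter, the sketch below accepts a model's answer only if it matches an approved set of values and sends everything else to a person. The label set, the fallback string, and the `constrain_label` name are assumptions made for this example, not part of any particular product.

```python
# Hypothetical set of answers the workflow is prepared to act on automatically.
ALLOWED_LABELS = {"billing", "outage", "password reset"}


def constrain_label(raw_output, allowed=ALLOWED_LABELS, fallback="needs human review"):
    """Normalize the model output and accept it only if it is an allowed label."""
    candidate = raw_output.strip().strip(".").lower()
    return candidate if candidate in allowed else fallback


if __name__ == "__main__":
    print(constrain_label(" Outage. "))
    print(constrain_label("It might be a billing problem?"))
```

Because the same prompt can produce different wording on different runs, checking the output against a fixed list is more dependable than trusting the model to stay within bounds on its own.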

Also be prepared for generative AI use in areas and processes you hadn’t planned for, where it may be less accurate. Again, transparency is key. Staff need to know the company policy on when they can use generative AI and how to disclose they’re using it. You may also want to include generative AI usage in audits and eDiscovery the same way you do with enterprise chat systems.

Organizations may need to start setting these policies with more urgency. Out of a thousand US businesses surveyed by TECHnalysis Research in spring 2023, 88% were already using generative AI, but only 7% of those early adopters had formal policies.

And in a recent IDC study on AI opportunity, over a quarter of business leaders said lack of AI governance and risk management was a challenge for implementing and scaling the technology. Initial concerns have been about the confidentiality of enterprise data, but reputational damage should also be a priority. In addition, over half called a lack of skilled workers their biggest barrier, which usually refers to developers and data engineers. But less technical business users will also need the skills to carefully frame questions they put to an AI tool, and assess and verify the results.
