AI (or the promise of AI capabilities) has become “table stakes” as a software feature. That is, it’s expected to be integrated somehow, or the product risks being labeled “legacy” – which seems to be considered a bad word in the IT industry. That is why testing AI with AI matters so much.
Salesforce is no exception. They have woven artificial intelligence into their platform with tools like Einstein, introduced in 2016. Einstein is a collection of AI-driven features that spans their cloud SaaS offerings. The promise: smarter decisions, predictive analytics, and better automated workflows.
Since its launch, Salesforce has steadily expanded Einstein’s capabilities, and the end goal is to integrate Artificial Superintelligence (ASI) – AI that surpasses human intelligence across all domains – into the product. Whether that ever arrives, only time will tell. Either way, it raises a lot of questions and concerns for users and for those who implement the software for their companies.
AI Is a Moving Target
Taking a customer’s raw data and turning it into actionable insights is a great goal. The problem is that it can be a nightmare for testing professionals. Unlike the predictable buttons and forms of the past, AI doesn’t play by the same rules. It learns, adapts, and sometimes hallucinates (very confidently). And at the same time you have AI features in Salesforce, you also have AI features in the products that test Salesforce – like Tricentis Testim Salesforce. This is AI testing AI. Let’s see if we can understand it without losing our collective minds.
Salesforce drops three big releases a year, each packed with updates. Some features are now AI-driven, thanks to Einstein. For example, Einstein Prediction Builder crunches your data and spits out a score – say, the likelihood that a customer will discontinue your service. It sounds like a great metric, but the score isn’t set in stone. It shifts as the AI model retrains on fresh data or reacts to new inputs. Today’s “correct” prediction might not match tomorrow’s, even if nothing else changes.
Testing a traditional Salesforce workflow was fairly straightforward: click a button, then check whether the page loads with the expected results. With AI, you might not be validating a single outcome but a dynamic range of possible outcomes. It gets worse. Many Salesforce implementations are customized to the point of insanity, with unique workflows and all kinds of integrations tailored to a company’s particular needs. When an update rolls out, those customizations can break the AI functionality in ways that are not obvious. A churn prediction might stop working because a custom field got changed, or a chatbot might start spitting out gibberish. That’s not just a failed test; it can mean downtime, frustrated users, and the IT department scrambling to fix it quickly.
Tricentis Testim Salesforce is built to handle these kinds of Salesforce quirks, using AI to adapt tests when the UI shifts – like during those frequent releases. It’s about letting AI figure out what broke and how to fix it, and about testing something that evolves with a tool that also evolves. It can feel like trying to hit a moving target while riding a rollercoaster, but Testim for Salesforce makes it easier.
Challenges of Testing AI with AI
Below are some of the dragons that await you in the quest for better testing of AI-driven features:
- Outputs That Won’t Sit Still: The answer may not be the same every time. Take Einstein’s recommendations – say it suggests emailing a client today. Tomorrow, with new data, it might say to call them instead. Even if the test keeps running through UI changes, how will the test logic determine what’s “right” when “right” keeps changing? Pass/fail gets fuzzy fast. (A sketch of one way to assert on a range instead of an exact value follows this list.)
- Dynamic Everything: Salesforce Lightning is where a lot of the AI lives, and there are dynamic elements everywhere. Testim’s smart locators are great at keeping up when buttons move or IDs change. But what happens when the AI feature itself tweaks the UI? What if a personalized popup alert appears?
- Data Is a Wild Card: AI lives or dies by its data. If your test data is off – too clean, too fake, or just not enough of it – Salesforce’s AI might choke or hallucinate. Testim can run the workflow, but you also need to be able to react to bad data. How do you know whether your dataset is flaky?
- AI on AI Confusion: Testim’s AI can tweak a locator when Salesforce updates; Salesforce’s AI might learn something new and shift its outputs. When something breaks, is it Testim overcorrecting or Einstein drifting? You have two adaptive systems talking to one another, and you may not have complete insight into either of them. How do you account for that with traditional test automation tools? It is MUCH more difficult, if not impossible.
- What Is “Correct”?: Traditional testing tools can tell you whether the web form was saved. Testing tools for AI-driven workflows, however, must know whether the prediction on the page is reasonable, or whether the chatbot sounds human enough. How can a testing tool judge whether a 75% churn risk makes sense for a specific business?
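To make that range problem concrete, here is a minimal sketch in Apex. Next_Best_Action__c is a hypothetical custom field standing in for wherever an Einstein recommendation surfaces in your org; the idea is to assert membership in the set of valid outcomes rather than equality with one exact value.

```apex
// Next_Best_Action__c is a hypothetical custom field standing in for
// wherever an Einstein recommendation lands in your org.
Set<String> allowedActions = new Set<String>{ 'Email', 'Call', 'Schedule Meeting' };

Contact c = [
    SELECT Next_Best_Action__c
    FROM Contact
    WHERE Email = 'pat@example.com'
    LIMIT 1
];

// "Right" may change from day to day; what must not change is that the
// recommendation stays inside the set of actions the business supports.
System.assert(
    allowedActions.contains(c.Next_Best_Action__c),
    'Unexpected recommendation: ' + c.Next_Best_Action__c
);
```

The assertion survives the model changing its mind, but still fails loudly if the AI wanders outside what the business actually supports.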
When approaching these kinds of issues, think conceptually. It’s less about finding a perfect solution and more about wrestling the problem into something manageable. That is much easier to do with an AI-enabled testing tool.
Taming the Beast
It’s not about cracking the code. It’s about creating a strategy that just makes simple sense. Here’s a practical way to approach it, based on what actually works in the trenches:
- Stick to the Basics First: Find the non-negotiable stuff. Einstein might be fancy, but it’s got rules—like a score can’t be negative. Use Testim to automate those checks. Its AI can handle Salesforce’s UI dance while you make sure the core logic doesn’t collapse. It’s a lifeline when everything else feels slippery.
- Split the Work: Don’t try to test everything at once. Run one set of Testim tests for the boring stuff – page loads, clicks, forms. Then tackle the AI parts separately – predictions, recommendations. You might need custom scripts (think Apex) to check whether outputs fall in a sane range; a sketch of one such check follows this list. It’s clunky, but it keeps you focused.
- Get Ahead with Pre-Releases: Salesforce lets you peek at updates in sandbox environments before they hit production. Testim can run early tests there, catching clashes between your customizations and the new AI features. It won’t predict every data-driven quirk, but it will flag the big stuff – like a broken prediction model – before it goes live.
- Fake It ‘Til You Make It: Data is the fuel, so build test datasets that mimic real life—some messy, some sparse, some overloaded. Testim can automate the workflows, but you’ll need to eyeball the results. Did the AI freak out on the edge case? That’s not Testim’s job to decide—it’s yours.
- Watch It Over Time: One-and-done testing doesn’t cut it with AI. Use Testim to flag immediate breaks – like a UI glitch – but set up monitoring for the AI pieces. Track how predictions drift after a release (a sketch of a simple drift monitor also follows this list). Pair it with logs or analytics, because the AI keeps learning long after you sign off.
- Keep Humans in Charge: Testim’s AI is a workhorse—let it handle the grunt work like fixing tests when Lightning updates. But don’t trust it to judge AI outputs. That’s where you come in. Know your business, know what matters, and check if the AI’s doing its job. It’s a team effort—machine and human.
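To flesh out the “custom scripts (think Apex)” idea, here is a minimal sketch of a sane-range check you could run as anonymous Apex after a release. Churn_Risk__c is a hypothetical custom field that Einstein Prediction Builder writes its score into, and the 0–100 band is an assumed business rule – the invariant to protect is the range, not any particular score.

```apex
// Churn_Risk__c is a hypothetical field populated by Einstein Prediction
// Builder. Pull any records whose score violates the assumed 0-100 band.
List<Account> suspect = [
    SELECT Id, Name, Churn_Risk__c
    FROM Account
    WHERE Churn_Risk__c != null
      AND (Churn_Risk__c < 0 OR Churn_Risk__c > 100)
    LIMIT 50
];

// An empty list means the non-negotiable rule held; anything else is a
// real failure, regardless of what the "correct" score happens to be today.
System.assertEquals(0, suspect.size(),
    'Found ' + suspect.size() + ' records with out-of-band churn scores');
```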
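And for watching things over time, here is a rough sketch of a scheduled Apex job that tracks drift in the org-wide average score. Prediction_Baseline__c and its Average_Score__c field are hypothetical stand-ins for wherever your team stores baselines and raises alerts; the threshold is a starting point, not a law.

```apex
// A rough drift monitor. Churn_Risk__c, Prediction_Baseline__c, and
// Average_Score__c are hypothetical; swap in your org's real fields.
public class ChurnDriftMonitor implements Schedulable {
    // Flag drift if the org-wide average moves more than 10 points.
    private static final Decimal DRIFT_THRESHOLD = 10;

    public void execute(SchedulableContext ctx) {
        AggregateResult res = [SELECT AVG(Churn_Risk__c) avgScore FROM Account];
        Decimal currentAvg = (Decimal) res.get('avgScore');
        if (currentAvg == null) { return; } // no scored records yet

        List<Prediction_Baseline__c> baselines = [
            SELECT Average_Score__c
            FROM Prediction_Baseline__c
            ORDER BY CreatedDate DESC
            LIMIT 1
        ];

        if (!baselines.isEmpty() &&
            Math.abs(currentAvg - baselines[0].Average_Score__c) > DRIFT_THRESHOLD) {
            // Surface the drift however your team watches things: a Case,
            // a platform event, a webhook. Here we just log it.
            System.debug(LoggingLevel.WARN, 'Churn prediction average drifted from '
                + baselines[0].Average_Score__c + ' to ' + currentAvg);
        }

        // Record today's average so the next run compares against it.
        insert new Prediction_Baseline__c(Average_Score__c = currentAvg);
    }
}
```

Schedule it with System.schedule and a cron expression, and pair its output with whatever analytics you already watch after each release.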
This isn’t glamorous. Part of it is automation, and part of it is gut-checking the results. But it is doable, and it can keep you from drowning in the chaos.
The Honest Truth
Testim is great at keeping tests alive through Salesforce’s constant updates – those smart locators and self-healing tricks earn their keep. Salesforce’s AI, meanwhile, is brilliant but slippery, and as it continues to evolve it will challenge traditional testing tools at every turn. It’s better to have a testing tool where AI is part of the product strategy to keep up. The real challenge isn’t the tools; it’s the humans and the process behind them. We’ve got to rethink what “good enough” means for quality when the target keeps moving.
Wrapping Up
Testing Salesforce’s AI features with Tricentis Testim Salesforce can feel like wrestling a shadow – it’s there, but it’s hard to pin down. The promise of AI in CRM is huge, but the reality of testing it is messy, unpredictable, and humbling. By anchoring tests in business rules, splitting the workload, hitting pre-releases early, playing with data, watching trends, and staying hands-on, you can wrestle this beast into something manageable. It’s not about winning – it’s about not losing too badly. As Salesforce keeps pushing AI deeper into its platform, we’ll need to keep adapting. Tools like Testim are a start, but they’re not the whole answer. The rest? That’s on us to figure out, one test at a time.
Try out Testim Salesforce today with a free 14-day trial at this URL: https://app.testim.io/?utm_source=blog&utm_medium=organicsocial&utm_campaign=testim_testauto_testim.io-salesforce-blog-cta-_blog_ams_en_2025-05#/signup