Imagine the entire sum of human knowledge being hoovered up by GenAI models.

Well, we’re nearly there.

To date, AI models like ChatGPT have been trained using vast amounts of text mostly sourced from the web. But that real-world data is running out.

What’s more, publishers and media outlets have started banning AI firms from using their intellectual property for training purposes. As a result, companies like OpenAI are having to broker deals to buy content — a slow, multi-million-dollar workaround.

Basically, the free lunch is over. Which is why AI firms are turning to ‘synthetic’ data to train their models instead. And for marketers, that’s a turning point we need to talk about.

What is synthetic data and how is it used?

Synthetic data is any information that has been generated artificially, typically by an AI model rather than collected from the real world. Once real-world data is exhausted, firms like OpenAI and Google will have to rely entirely on this kind of data to train each successive AI model: one model training the next one, training the next one, ad infinitum.

There are lots of potential problems with this.

The snake eats its own tail.

An overreliance on synthetic data can lead to something called ‘model collapse.’ This is where the output of AI systems becomes more unreliable and generic with each passing generation, eventually to the point where they start producing total nonsense.

In one example, researchers at the University of Oxford fed a model some sample text about the building of a medieval church. Each new generation of the model was then trained on data created by the previous one. By the ninth generation, the model was talking about jackrabbits.
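If you want to see the mechanism in miniature, here’s a toy sketch (our own illustration, not the Oxford team’s method). The ‘model’ is just a mean and a standard deviation, and each generation is refitted only on samples drawn from the generation before it. The sample size and starting parameters are made up; the point is how estimation error compounds.

```python
# Toy illustration of model collapse: each generation of a simple
# Gaussian "model" is trained only on data sampled from the previous
# generation. Small estimation errors compound, the mean drifts, and
# the spread of the data tends to shrink over time.
import random
import statistics

mean, stdev = 0.0, 1.0        # generation 0: the real-world data
SAMPLES_PER_GENERATION = 50   # assumed (and deliberately finite) training set

for generation in range(1, 10):
    # Train on the previous generation's output only...
    data = [random.gauss(mean, stdev) for _ in range(SAMPLES_PER_GENERATION)]
    # ...by re-estimating the model's parameters from those samples.
    mean, stdev = statistics.mean(data), statistics.stdev(data)
    print(f"generation {generation}: mean={mean:+.3f}, stdev={stdev:.3f}")
```

Run it a few times: each generation looks plausible on its own, but the parameters wander further from the original data with every retraining. It’s the drift across generations that does the damage.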

For marketers, model collapse isn’t something to be overly concerned about: if an AI tool wrote you a nonsensical social post, you wouldn’t publish it. The problems lie in the incremental changes that are hard to notice — the factual errors and inconsistencies that creep into the output of successive, seemingly well-functioning GenAI models.

What are the risks for marketers that rely on synthetic data?

Echo chamber content.

There’s a reason AI-generated content often feels sterile and generic. If you ask ChatGPT to write you a blog post called, say, ‘Five Mistakes B2B Marketers Should Avoid,’ it’ll write something based on the average of its training data. It won’t be ‘thinking’ about the most interesting things to say; it’ll simply produce the most statistically likely answers.

So while that post might be serviceable enough (and even sound pretty polished), it’ll also inevitably feature cliches, tired phrasing, and well-worn themes.
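To see why, here’s an equally toy sketch (a made-up corpus and a crude frequency model, nothing like a real LLM). A generator that always picks the statistically most common next word can only ever reproduce its corpus’s most common phrasing:

```python
# A crude next-word "model": given the last two words, it always picks
# whichever word most often follows them in the training corpus.
from collections import Counter

# A tiny, made-up corpus standing in for the training data.
corpus = [
    "marketers should leverage data to drive growth",
    "marketers should leverage content to drive engagement",
    "marketers should leverage insights to drive results",
]

def most_likely_next(prefix, corpus):
    """Return the word that most often follows the given prefix."""
    followers = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i in range(len(words) - len(prefix)):
            if words[i:i + len(prefix)] == prefix:
                followers[words[i + len(prefix)]] += 1
    return followers.most_common(1)[0][0] if followers else None

sentence = ["marketers", "should"]
while (word := most_likely_next(sentence[-2:], corpus)) is not None:
    sentence.append(word)

print(" ".join(sentence))
# Prints the single most common path through the corpus:
# "marketers should leverage data to drive growth"
```

Real models are vastly more sophisticated and sample with some randomness, but the underlying pull toward the probable, not the interesting, is the same.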

But now, consider those same systems being recursively trained on their own data. It’s easy to imagine how that would lead to further creative narrowing, where a particular subset of ideas, expressions, biases, and factual errors slowly proliferate — without anybody really noticing.

Within this closed loop, marketing content would almost certainly take on a heightened sense of sameness, increasingly devoid of creative spontaneity.

Sowing seeds of doubt.

Does your content convince your customers? Or does it leave them feeling skeptical, questioning your authenticity, mistrusting your brand?

Increasingly, AI-generated content falls into that latter camp of doubt and negativity.

What does all this mean when synthetic data gets added to the marketing mix? It means campaigns may end up feeling even more ‘fake’ (especially once consumers realize how the sausage gets made), which in turn could erode brand trust over the long term.

Perception is everything.

Even if your content looks beautiful, what does it say to your audience if they know it’s been generated quickly, cheaply, synthetically?

In a recent talk at Nudgestock, ad exec Rory Sutherland made the point that the very act of creating an ad is actually where a lot of the value comes from, saying, “I think there’s a fundamental correlation between…the amount of time and effort that gets invested in the act of persuasion and how persuasive it is.”

In other words, when consumers can see that you’ve sweated a bit over the content you’ve created, it functions as a form of value signaling. It essentially says: we care about what we do, and we don’t take shortcuts.

It’s fair to say that any advert quickly whipped up by GenAI communicates the exact opposite sentiment.

Where is all this going?

Let’s be real for a second: AI-generated content isn’t going away. But as our industry becomes flooded with this kind of material, an opportunity emerges for brands to stand out by choosing to swim against the tide.

To be clear, the use of synthetic data is complex and there’s still a lot to learn on this topic. As a B2B marketing agency, our teams have found GenAI helpful in assisting with areas like strategy, audience research, and competitor analysis.

But for the reasons mentioned above, our view is that actual content creation is better with a human behind the wheel. If you’re interested in applying our expertise to your next B2B campaign, let’s chat.