The Register warns of an “AI model collapse” — a spiral in which models trained on AI-generated data degrade over time, losing fidelity and diversity. Its imagery of “polluted” data and models “eating their own tails” echoes the mythic ouroboros—but is the crisis as existential as suggested?
To be sure, legitimate scholarly work has identified two clear stages of degradation: an early loss of rare (tail) data from the original distribution, then a flattening into incoherent babble. Case studies reach alarming conclusions: training on successive generations of synthetic text can irreversibly warp the underlying distributions, much like a game of Chinese Whispers that ends in nonsense.
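The tail-loss stage can be illustrated with a toy experiment. This is only a sketch of the mechanism, not the setup used in the cited research: a "model" that simply refits a token distribution from a finite sample of its own output will, generation by generation, lose its rare tokens, and once a token draws zero samples it can never return.

```python
import random
from collections import Counter

random.seed(42)

def resample(dist, n=50):
    """One 'generation': draw n tokens from dist, then refit by counting."""
    tokens = list(dist)
    weights = [dist[t] for t in tokens]
    draws = random.choices(tokens, weights=weights, k=n)
    return {t: c / n for t, c in Counter(draws).items()}

# A vocabulary with a fat head and a thin tail of rare tokens.
dist = {"the": 0.5, "cat": 0.3, "sat": 0.17, "quokka": 0.02, "axolotl": 0.01}
for _ in range(200):
    dist = resample(dist)

print(sorted(dist))  # the rare tail tokens have almost certainly vanished
```

The head of the distribution survives for a long time, which is why everyday performance can look fine while diversity quietly drains away.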
Yet the picture isn’t uniformly dire. Recent analysis highlights that collapsing into gibberish requires extremely artificial conditions, such as excluding human-generated data entirely between training rounds. In realistic scenarios, where fresh, high-quality human content keeps flowing in, model collapse looks less like a systemic threat and more like a performance nuisance.
The Register cites Apple research claiming that flawed test design undermines some of these findings. This highlights a deeper question: are we facing inevitable decline, or merely operational missteps? Some argue that labelling AI-generated content and excluding it from future training (via watermarks or provenance tags) may guard against collapse, but widespread adoption remains uncertain, and such labelling would be almost impossible to enforce. As LLMs develop, it is our opinion that they will become more adept at discerning the quality of content, whether AI, human or hybrid, and will help improve output quality.
More a value-arithmetic question than a technical emergency
What’s often underreported is the positive offset AI brings — from medical diagnostics to climate modelling. The real question: do the benefits outweigh the risks of training drift? Illia Shumailov’s research warns us of accelerated entropy in AI data pools — but does that mean collapse is lurking, or just that models become less incisive on edge cases?
Think of it like this: an incumbent AI might stay “fit for purpose” for mainstream use (chat, summarisation), yet begin failing spectacularly at niche topics. It’s a divergence—not necessarily a freefall.
As content creators, what should we do?
As marketers, writers and content creators, it is our duty to ensure our content is high quality and performs well. What should we do to ensure our use of AI aligns with this?
1. Mix human and synthetic data wisely. Guardrails matter. Diverse, curated human data buffers against collapse, but also raises questions about what counts as “clean” data in an age where human output itself may be AI-influenced. See the stats below: nearly 74% of new web pages created in April 2025 contained some AI-generated content!
2. Audit and provenance. Watermarking, metadata, provenance tracking: these middle-ground strategies could help, but they demand coordination across platforms and jurisdictions, not just voluntary compliance, and may even prove impossible to enforce.
3. Embrace domain-specific models. Rather than endlessly scaling general LLMs, targeted models in medicine, education or science, built on curated, human-vetted corpora, offer “cleaner” applications.
4. Keep public horizons grounded. Sensationalist headlines—like “AI polluting the internet forever”—risk overreaction. What we need is steady, transparent oversight, not moral panic. We must define which collapses matter, and at what stage: weak collapse at the fringes could be negligible; strong collapse at scale urgently demands action.
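Point 2 above can be made concrete. One minimal form of provenance is a signed record pairing a content hash with a declared origin, so that a downstream crawler can filter or down-weight synthetic material. The sketch below is one hypothetical scheme (the key, field names and `tag_content` helper are our own illustration, not an existing standard):

```python
import hashlib
import hmac
import json

SECRET = b"publisher-signing-key"  # hypothetical key held by the publisher

def tag_content(text, origin):
    """Attach a provenance record with an HMAC so tampering is detectable."""
    record = {"origin": origin,
              "sha256": hashlib.sha256(text.encode()).hexdigest()}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify(text, record):
    """Check both the signature and that the text matches the stored hash."""
    body = {k: v for k, v in record.items() if k != "sig"}
    payload = json.dumps(body, sort_keys=True).encode()
    ok_sig = hmac.compare_digest(
        record["sig"], hmac.new(SECRET, payload, hashlib.sha256).hexdigest())
    ok_hash = hashlib.sha256(text.encode()).hexdigest() == body["sha256"]
    return ok_sig and ok_hash

rec = tag_content("Our quarterly report...", origin="human")
```

The hard part, as noted above, is not the cryptography but getting every platform to emit and honour such tags.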
The Register raises an essential alarm: unchecked self-feeding models risk degeneration. But framing it as inevitable doom obscures the nuance. With thoughtful data governance, layered provenance, and human curation, collapse isn’t destiny—it’s a manageable hazard. As AI matures, our stewardship counts. Not all apocalypse warnings come true—but some serve their purpose by waking us up.
📊 How much of the internet is AI-generated?
The rise of generative AI hasn’t just been noticeable—it’s seismic. Here are the most credible figures from recent studies:
- Nearly three-quarters (74%) of new web pages created in April 2025 contained some AI-generated content. Of 900,000 pages analysed, 2.5% were entirely AI‑written, and 71.7% blended human and machine output. Data from Technewsworld.com
- A 2025 paper estimates 30–40% of active web page text may now originate from AI sources, using markers common in generative models to support their conclusions. Data from arxiv.org
- Historical estimates paint a broader trend: by 2023, around 57% of web text was AI‑generated or translated by AI, though this includes large-scale machine translations. Data from Forbes
- On LinkedIn, over 50% of long-form English posts now bear hallmarks of AI assistance—driven by built-in platform tools. Data from Wired.com
- Meanwhile, content farms—websites that mass‑produce low-grade AI‑generated posts—are pumping out hundreds of articles daily, often with minimal oversight. (from arxiv.org)
- In academia and scholarly publishing: in 2023, 17–18% of new CS papers and 1% of scholarly articles were likely written with LLM assistance. In visual media, it’s estimated that 34 million AI images are created every day, totalling over 15 billion generated images since mid‑2023. Data from Wikipedia.org
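The arXiv estimate above relies on statistical markers of generative text. As a deliberately crude caricature of the idea (the phrase list and threshold are illustrative assumptions of ours, not the paper's method), one could count phrases reported to be over-represented in LLM output:

```python
# Caricature of marker-based detection: rate of "LLM-flavoured" phrases
# per word. The phrase list and threshold here are illustrative only.
MARKERS = ("delve into", "in today's fast-paced world", "it is important to note")

def marker_rate(text: str) -> float:
    words = max(len(text.split()), 1)
    hits = sum(text.lower().count(m) for m in MARKERS)
    return hits / words

def looks_synthetic(text: str, threshold: float = 0.01) -> bool:
    # Real detectors combine many calibrated features; a single-feature
    # screen like this produces plenty of false positives and negatives.
    return marker_rate(text) >= threshold
```

Estimates built on such markers are inherently noisy, which is one reason the figures in this list vary so widely.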
What this means for model collapse
When three-quarters of new content is AI‑generated, and a large proportion of all active pages bear synthetic markers, the internet increasingly feeds itself. This creates a high risk of models being trained on their own outputs—potentially leading to the “pollution” or “collapse” The Register describes.
Yet it’s also a call to action: without effective governance—such as watermarking, provenance tagging, and continued human curation—the cycle not only erodes quality, but intensifies model degradation from AI feeding itself.
In short: the data show we’re well into the self-reinforcing loop. The question now is whether we can step out before the spiral accelerates.
Follow Media-M for more insights into AI trends, model showdowns, and the future of generative tech.


