AI’s Biggest Risk: the Danger of Model Collapse

9 October 2024


The Model Collapse Theory highlights serious concerns about the future of generative AI and its reliance on high-quality human data. The chart below shows the number of papers found by PubMed that contain the word "delve." It does not take a background in statistics to notice the sharp increase since 2023, the same year that ChatGPT gained popularity. Not coincidentally, ChatGPT is known to use the word "delve" far more often than human writers do.

The chart demonstrates that AI-generated content has found its way into our everyday lives. The line between a response from ChatGPT and a piece written by a human is fading quickly. In 2023, an expert even estimated that 90% of online content could be generated by AI by 2025. This leads to a pressing question: can we still distinguish AI-generated content from human-created content?

The crux of the issue lies in how generative AI models are trained. Generative AI models require vast datasets of high-quality human content, which is more nuanced, creative, and reflective of the real world. However, as more AI-generated content is produced, it becomes increasingly difficult to keep that content out of the training datasets. Training generative AI models on AI-generated data has been shown to decrease both the quality and diversity of their output (Briesch et al., 2023). The result is a feedback loop in which the AI recycles its own patterns without innovation, leading to stagnation. In short: using AI to generate content has polluted the very data sources needed to train future generative AI models.
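The dynamic is easy to reproduce in miniature. The sketch below is my own toy illustration, not an experiment from the cited paper: the "model" is simply an empirical word-frequency distribution, re-estimated each generation from samples drawn from the previous generation's model. The vocabulary size and sample size are arbitrary assumptions chosen for illustration. Because a word that fails to appear in a sample gets probability zero and can never return, diversity can only shrink, a crude analogue of the self-consuming training loop described above.

```python
# Toy sketch of a self-consuming training loop (illustrative assumptions only).
import numpy as np

rng = np.random.default_rng(42)

vocab_size = 500      # hypothetical vocabulary of 500 distinct "words"
sample_size = 2_000   # size of each generation's "training set"

# Generation 0: human data follows a long-tailed (Zipf-like) word distribution.
probs = 1.0 / np.arange(1, vocab_size + 1)
probs /= probs.sum()

for generation in range(31):
    if generation % 5 == 0:
        print(f"generation {generation:2d}: distinct words remaining = "
              f"{np.count_nonzero(probs)}")
    # "Generate" a training set by sampling from the current model ...
    sample = rng.choice(vocab_size, size=sample_size, p=probs)
    # ... then "retrain" by re-estimating word frequencies from that sample.
    # A word that never appears gets probability zero and cannot come back.
    counts = np.bincount(sample, minlength=vocab_size)
    probs = counts / counts.sum()
```

Running the sketch shows the count of distinct words dropping generation after generation. The same one-way loss of rare patterns, at a vastly larger scale, is what degrades the quality and diversity of generative models trained on their own output.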

In response, generative AI companies have been looking for solutions. This has sparked a race to secure exclusive partnerships with organizations that can provide human-created data. These partnerships may postpone the issue, but they do not solve it. At some point, more data will be needed, and suitable data sources will become increasingly difficult to find. The only remaining option will be to use the content that is available, which includes AI-generated content.

Some experts warn that the influx of AI-generated content will slow the advancement of generative AI, or potentially even bring it to a halt. Without sufficient high-quality human data, we risk a future where generative AI models produce content that lacks depth and creativity.

How do you assess the risk of Model Collapse? Is Model Collapse a real risk, or will a solution be found?

Sources: 

Briesch, M., Sobania, D., & Rothlauf, F. (2023, November 28). Large language models suffer from their own output: An analysis of the self-consuming training loop. arXiv. https://arxiv.org/abs/2311.16822

Banner: https://www.thedigitalspeaker.com/content/images/2023/06/Danger-of-AI-Model-Collapse-Futurist-Speaker.jpg

Chart: https://www.linkedin.com/posts/marnimolina_the-rise-of-delve-in-scientific-literature-activity-7186362840360869888-Sz16
