Disaster Risks in Chatbot Training
Multiple whistleblowers say that people hired to hold high-quality conversations with AI models and to test them for training purposes are cheating by having chatbots such as ChatGPT do the work. The trend could jeopardize the future of artificial intelligence and may lead to a “breakdown” of sophisticated models, New Scientist reports.
Currently, many AI models are trained on data sourced from the internet. However, as the demand for larger datasets grows, AI companies are increasingly turning to gig workers to interact with and evaluate AI, hoping to produce high-quality data that improves future large language models (LLMs).
Typically, these gig workers are employed through third-party agencies, are often poorly paid and lack full-time contracts. Alice*, a worker in this field, says this precarious situation can push workers to take shortcuts, such as using chatbots to complete tasks quickly, despite company policies against it.
“It’s very common. Every company I’ve worked for has stringent policies in place to catch violations, yet I doubt they can fully eliminate the issue,” Alice explains.
Alice adds that she doesn’t feel guilty about using ChatGPT for training tasks, and says detection is easy to evade if the chatbot is told not to show obvious AI traits. “Only the most careless users get caught,” she suggests. “If you’re aware of an AI’s qualities, you can instruct it to modify its output. So, what’s the verdict?”
Alice argues, “For companies seeking quality data, offering better contracts is essential. Instead, they hire struggling individuals, keep them temporarily, and abruptly release them at project completion without notice.”
Bob*, another worker, started out on a training platform called Outlier, where he says AI was misused for training tasks. He later took on a leadership role in which he was responsible for catching others doing the same.
“Management fluctuated between mild acceptance and outright prohibition,” Bob notes. Outlier employees are monitored via a tool called Hubstaff, which captures screenshots of their screens at random intervals to ensure task compliance. Bob reviews these screenshots for indications of AI use.
“People often have AI applications like ChatGPT visible on their taskbars, either open in another tab or minimized,” Bob observes.
Outlier, which is owned by Scale AI, did not respond to inquiries. Scale AI says it provides services to major tech companies such as Meta and Cisco, but neither responded to New Scientist’s requests for comment. Bob said he had worked on a project for Google, which also declined to comment.
Carol*, who has worked across multiple platforms, explains that she began using AI as a way to check her work against task guidelines. Violating those guidelines could lead to removal from a project and loss of income.
“Initially, I feared losing my income source, but it became easier to complete tasks using LLMs,” says Carol. “Many of my current projects involve scenario creation, where I deploy one LLM to formulate scenarios and another to generate corresponding files. I feel guilty, but it was about minimizing initial errors.”
“I’m concerned this practice could deteriorate AI quality. I thought training with model outputs diminishes their value,” Carol says.
Mark Lee, a researcher at the University of Birmingham, UK, says research indicates that AI models may degrade when they are recursively trained on AI-generated content, a phenomenon sometimes referred to as AI cannibalism or AI inbreeding.
“That’s the worst-case scenario, and it likely isn’t the norm,” Lee remarks. “Humans still represent a minority in training data; having around 10% human-generated input can mitigate these risks and prevent model collapse.”
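The effect Lee describes can be illustrated with a toy simulation. The sketch below is not Lee's method and does not involve a real language model: it assumes the "model" is just a normal distribution that is refitted, generation after generation, to samples drawn mostly from its own previous version. The function name, the sample size and the number of generations are illustrative assumptions; the 10% human share is taken from Lee's remark.

```python
import math
import random


def simulate_recursive_training(generations, sample_size, human_fraction, seed=0):
    """Toy model of recursive training (an illustration, not a real LLM pipeline).

    Each generation fits a normal distribution to a dataset made of samples
    from the previous generation's fit ("AI-generated" data) plus a fixed
    share of samples from the true distribution ("human" data).
    Returns the fitted standard deviation after each generation.
    """
    rng = random.Random(seed)
    true_mu, true_sigma = 0.0, 1.0   # assumed "human" data distribution
    mu, sigma = true_mu, true_sigma  # generation 0 matches the human data
    n_human = round(sample_size * human_fraction)
    n_synthetic = sample_size - n_human
    history = []
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_synthetic)]
        human = [rng.gauss(true_mu, true_sigma) for _ in range(n_human)]
        data = synthetic + human
        # Refit the model: maximum-likelihood mean and standard deviation.
        mu = sum(data) / len(data)
        sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / len(data))
        history.append(sigma)
    return history


if __name__ == "__main__":
    only_ai = simulate_recursive_training(2000, 100, human_fraction=0.0, seed=1)
    mixed = simulate_recursive_training(2000, 100, human_fraction=0.1, seed=1)
    print("final spread, AI-only data:", only_ai[-1])
    print("final spread, 10% human data:", mixed[-1])
```

In this toy setting, the spread of the AI-only model shrinks towards zero over many generations, a simple analogue of model collapse, while a small share of fresh human data in every generation keeps the fitted spread close to that of the original data. Real language models are far more complex, so the numbers here should not be read as predictions.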
However, Lee notes that such misconduct by workers still does damage and can hurt business performance. “Instead of a catastrophic outcome, it reveals that AI struggles to replicate human-like tasks effectively, which is a significant concern,” he concludes.
*Names have been changed to protect privacy
Source: www.newscientist.com


