Saturday, April 13, 2024

Neural networks train themselves, the crisis of artificial intelligence is near



How do crowdsourcing workers cut corners, and why is that bad for the development of AI?

Most people regard chatbots and other AI systems built on neural networks as something close to magic. Many regular users of artificial intelligence have probably heard that such systems require lengthy training on public data, backed by enormous computing power.

However, few people know that large language models (LLMs) keep developing because more and more new data is gradually fed into them to improve their performance. And almost no one asks the obvious question: where does this data come from?

Most of the routine work of preparing data for training neural networks is done by people. And however prestigious and well paid working with neural networks may sound from the outside, the people doing this work earn mere pennies for monotonous labor.

Crowdsourcing services such as Amazon Mechanical Turk let employers pool hundreds or thousands of human minds to solve, quickly and efficiently, problems that no machine can handle. Or, as in our case, to produce a new data set for training a neural network and developing it further.

Researchers at the Swiss Federal Institute of Technology in Lausanne (EPFL) decided to run an interesting experiment. They hired 44 people on a popular crowdsourcing platform to summarize the abstracts of 16 medical research papers. The summaries were intended as training data for a neural network, teaching it new variations and ways of isolating the main idea of such literature.

In parallel, the scientists prepared a dedicated classifier to determine whether the workers would do the task themselves or cheat by using neural networks to finish faster. In addition to analyzing the submitted texts, the researchers recorded the workers’ keystrokes to confirm cheating with greater certainty.
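To illustrate how keystroke records can support such a check, here is a minimal sketch in Python. It is not the researchers’ classifier: the `Submission` fields, the `typed_ratio` and `looks_pasted` helpers, and the 0.5 threshold are all hypothetical and chosen only for illustration. The idea is simply that a summary which appears after a paste action, with very few characters actually typed, deserves a closer look.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record of one worker's summary and keystroke log."""
    worker_id: str
    text: str            # final summary submitted by the worker
    typed_chars: int     # printable keystrokes recorded in the text box
    paste_events: int    # number of paste actions recorded


def typed_ratio(sub: Submission) -> float:
    """Share of the final text that could have been typed by hand (capped at 1)."""
    if not sub.text:
        return 0.0
    return min(1.0, sub.typed_chars / len(sub.text))


def looks_pasted(sub: Submission, min_ratio: float = 0.5) -> bool:
    """Flag a submission when a paste occurred and little of the text was typed.

    The 0.5 threshold is an arbitrary assumption, not a value from the study.
    """
    return sub.paste_events > 0 and typed_ratio(sub) < min_ratio


# Made-up example data: w1 typed everything, w2 pasted almost everything.
submissions = [
    Submission("w1", "x" * 400, typed_chars=430, paste_events=0),
    Submission("w2", "x" * 400, typed_chars=12, paste_events=1),
]
flagged = [s.worker_id for s in submissions if looks_pasted(s)]
print(flagged)
```

A flag of this kind is only a signal, not proof: a worker may paste a draft written elsewhere by hand, which is why combining it with an analysis of the text itself, as the researchers did, makes sense.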

“We developed a very specific methodology that worked very well for detecting synthetic text in our scenario. While traditional methods try to detect synthetic text ‘in any context’, our approach is focused on detecting synthetic text in our specific scenario,” said Manoel Horta Ribeiro, study co-author and EPFL PhD student.

As it turned out, between 33% and 46% of the submitted texts were generated by a neural network. That is not a disaster, of course, but it points to a worrying trend. If large language models such as OpenAI’s GPT-4 are trained on their own output, the quality of their work may drop significantly, or their development may slow.

Big companies like OpenAI keep the details of how they train their language models a closely guarded secret, and it is unlikely that they use Mechanical Turk at all. However, many smaller companies may well rely on crowd workers to train their AI models. If they do not spell out the terms of the task clearly from the start and do not set up tools to monitor the work (trust alone will not get you far), the value of the resulting data will be highly doubtful.

“The responses generated by today’s AI models are usually pretty bland or trivial. They do not reflect the complexity and diversity of human creativity. Sometimes what we want to learn from crowdsourced data is exactly what humans are not good at,” explained Robert West, co-author of the paper and assistant professor at EPFL’s School of Computer and Communication Sciences.

As AI continues to improve, crowdsourcing is likely to change soon. The EPFL researchers suggest that large language models may eventually replace some workers on specific tasks. Paradoxically, however, human-generated data may soon become more valuable than ever.
