The Dark Side of ChatGPT: Dubious Labor

1 October 2023


ChatGPT is a large language model developed by OpenAI, and it has enjoyed major success since its launch in November 2022. It reached 100 million monthly users in just two months, making it the fastest-growing consumer application in history (Hu, 2023). But how did it get so good? GPT-3, the large language model underlying ChatGPT, already exhibited a strong grasp of human language at an early stage. A major requirement in the training process is to feed the model as many texts as possible, because it needs to extract patterns from them in order to 'understand' language, or at least exhibit a convincing form of understanding. This is possible thanks to the enormous amount of open data on the internet. You are probably aware, however, that much content on the internet is not particularly well thought out, and is sometimes downright factually wrong, morally objectionable, or biased. So, beyond acquiring a linguistic understanding, the model also has to be steered in its behavior to prevent it from talking like the morally dubious texts it was trained on.
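To get an intuition for what 'extracting patterns' means, here is a deliberately toy sketch. This is not how GPT-3 is actually trained (that involves neural networks at a vastly larger scale); it is a bigram model that counts which word tends to follow which and 'predicts' the next word from those counts. The corpus below is made up for illustration.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for "texts from the internet".
corpus = "the model reads text . the model finds patterns . the model predicts text"
tokens = corpus.split()

# Count bigrams: how often does each word follow each other word?
bigrams = defaultdict(Counter)
for current_word, next_word in zip(tokens, tokens[1:]):
    bigrams[current_word][next_word] += 1

# "Predict" the next word by picking the most frequent follower seen so far.
def predict_next(word: str) -> str:
    followers = bigrams[word]
    return followers.most_common(1)[0][0] if followers else "<unknown>"

print(predict_next("the"))  # -> 'model', since 'model' follows 'the' in every example
```

Real language models replace these raw counts with billions of learned parameters, but the underlying idea is the same: predict what comes next based on patterns in the data.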

So, how did OpenAI achieve this? Well, they took a leaf out of the playbook of social media companies. Experienced players like Meta, X, and Google have been battling "harmful" content for many years now. Since human moderators cannot come close to covering all the content uploaded to their enormous platforms, they rely on AI as well. Their strategy is to supply models with texts that humans have labeled as containing violence, sexual abuse, misinformation, and other forms of harmful content. Once the AI model has learned a representation of these forms of content, it can greatly support these tech companies in flagging and removing harmful material.
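To make the labeling-then-learning loop concrete, here is a minimal, hypothetical sketch using scikit-learn. The example texts and labels are invented for illustration; the actual models and datasets at these companies are far larger and proprietary.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny made-up dataset: texts paired with human-assigned labels
# (1 = harmful, 0 = benign). Real moderation datasets contain
# millions of labeled examples across many categories.
texts = [
    "I will hurt you if you come here again",
    "thanks for the lovely dinner last night",
    "detailed instructions for building a weapon",
    "the weather is great for a picnic today",
]
labels = [1, 0, 1, 0]

# Turn texts into numeric features, then fit a linear classifier on them.
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression())
classifier.fit(texts, labels)

# The trained model can now flag new, unseen content.
print(classifier.predict(["meet me for coffee tomorrow"]))  # likely [0]
```

The crucial point is the `labels` list: every 1 and 0 in it represents a judgment made by a human who had to read the text first.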

So, what is the issue? Before an AI model can learn what harmful text looks like, it must be supplied with a large number of texts labeled as such. And bigger deep learning models have more parameters (adjustable weights) that need to be calibrated, meaning that even more training data is needed. To get this labeling done, OpenAI resorted to cheap offshore labor. A revealing article by Perrigo (2023) exposed that the company hired Kenyan workers for the job, paying them between $1.32 and $2 per hour to label texts. What's more, the material in question was very dark: texts from the internet describing murder, torture, and other horrific acts.

These ethical concerns are rarely raised amidst the hype around ChatGPT and similar technologies, but they pose a serious dilemma. Labeled texts transformed the GPT model from something that merely understands language into a docile colleague for everyday use. This improvement was a key step in making the technology commercially viable, and thus in creating its big impact on business and our everyday lives. The question is: was it worth the method? That is a question we should answer as a collective, and people should at least be aware of the issue so that it can become part of the public discourse on AI.

Sources

Hu, K. (2023, February 2). ChatGPT sets record for fastest-growing user base – analyst note. Reuters. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/

Perrigo, B. (2023, January 18). Exclusive: The $2 per hour workers who made ChatGPT safer. Time. https://time.com/6247678/openai-chatgpt-kenya-workers/
