Your Profile Is Being Scraped

18 September 2020


Facial recognition has been gaining interest over the last few years. All around the internet, and also on this forum, more and more is being written about facial recognition itself, its positive and negative effects, and the underlying technologies. Major companies are competing to develop better algorithms and are selling the resulting technologies as cloud services. Easy-to-use APIs make it possible for any tech-savvy person to use those services within minutes. Still, the subject of facial recognition remains more theory than action. News items so far have mostly discussed a few local tests or the implementation of video tracking within law enforcement. The major steps in facial recognition are being made in China, where facial identification and payment by face are becoming more mainstream. But over the last year one company’s name has popped up several times, gaining the interest of several tech journalists: Clearview AI.

A lot of people nowadays have a social media profile, often with a public name, profile picture and some basic information. Of course it would be possible to visit every page one by one and collect that user information, but no one ever took the time to do this or saw the benefits of doing so, except the startup Clearview AI.

Scraping is the act of automatically extracting public data from the internet. Any website can be scraped, including all the data and text on this blog, for example. Clearview AI performed these scraping operations on a huge scale: they started scraping the public profiles on Facebook and saved this data in one big database. If your profile picture and name are public on one of your social media accounts, which is probably the case for most profiles, it is likely that they are included in the database of Clearview AI.

Wouldn't every law enforcement agency be interested in finding a suspect with just a few clicks? Robbers, fraudsters and cyberbullies are people too, most of the time with a personal social media account. This is exactly what Clearview AI had in mind when developing its business model: scraping all publicly available data, training huge neural networks on it, and selling the result worldwide to law enforcement agencies, bundled in a good-looking application. According to a graphic in the New York Times, this raises the number of photos the FBI can search from the 411 million in its own database to a staggering 3 billion photos included in the Clearview AI application, all supported by an impressive artificial intelligence model.

This brings up some important questions. Do we support facial recognition as a tool for law enforcement? Is it legal to scrape information from social networks? Does making your profile public also imply that you give permission for your data to be saved and used for AI training purposes?

Next to the negative sides of web scraping, there are also interesting possibilities. You could, for example, scrape this blog and analyze its word usage, or identify trends and topics of interest over time. Web scraping also enables innovations that aggregate data from multiple sources in creative ways, creating information that was not available before.
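To make the word-usage idea a bit more concrete, here is a minimal sketch in Python using only the standard library. It assumes the HTML of a post has already been downloaded; the SAMPLE_HTML string, the TextExtractor class and the word_frequencies function are all hypothetical names made up for this illustration, not part of any existing tool.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html, min_length=4):
    """Return a Counter of lowercase words with at least min_length letters."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = re.findall(r"[a-z']+", text)
    return Counter(w for w in words if len(w) >= min_length)


# Hypothetical stand-in for a downloaded blog post.
SAMPLE_HTML = """
<html><head><style>p { color: red; }</style></head>
<body>
  <h1>Scraping and privacy</h1>
  <p>Scraping public profiles raises privacy questions.</p>
  <script>var tracking = "ignored";</script>
</body></html>
"""

if __name__ == "__main__":
    for word, count in word_frequencies(SAMPLE_HTML).most_common(3):
        print(word, count)
```

Running the same function over many posts and grouping the counts by publication date would be one simple way to look at topics over time. Fetching the pages themselves is left out on purpose, since how and whether to do that depends on the site's terms.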

The New York Times has an article that goes into more depth on the background of Clearview AI (Hill, 2020, listed under Sources). Read the full article or listen to the accompanying podcast if you're interested.

I would love to hear your opinion on the subject of web scraping and the use of facial recognition. If you would like more technical background on how to implement web scraping techniques, please let me know in the comments.

 

Sources

Hill, K. (2020, January 18). The Secretive Company That Might End Privacy as We Know It. The New York Times. https://www.nytimes.com/2020/01/18/technology/clearview-privacy-facial-recognition.html

Matsakis, L. (2020, January 27). Scraping the Web Is a Powerful Tool. Clearview AI Abused It. Wired. https://www.wired.com/story/clearview-ai-scraping-web/

 

 
