Facial recognition has been gaining interest over the last few years. All around the internet, and also on this forum, more and more is being written about facial recognition itself, its positive and negative effects, and the underlying technologies. Major companies are competing to develop better algorithms and are selling their technologies as cloud services. Easy-to-use APIs make it possible for any tech-savvy person to use those services within minutes. Still, facial recognition remains more theory than action. Current news items often discuss a few local tests or the implementation of video tracking within law enforcement. The major steps in facial recognition are being made in China, where facial identification and payment are becoming more mainstream. But over the last year one company’s name has popped up several times, drawing the interest of several tech journalists: Clearview AI.
A lot of people nowadays have some kind of social media profile, often with a public name, a profile picture and some basic information. Of course it would be possible to visit every page and collect this user information, but no one ever took the time to do so or saw the benefit of doing so, except the startup Clearview AI.
Scraping is the act of automatically extracting public data from the internet. Practically any website can be scraped, including all the data and text on this blog, for example. Clearview AI performed these scraping operations on a huge scale: they started scraping all the public profiles on Facebook and saved this data in one big database. If your profile picture and name are public on one of your social media accounts, which is probably the case for most profiles, it is likely that they are included in the database of Clearview AI.
Would not every law enforcement agency be interested in the possibility of finding a suspect with a few clicks? Robbers, fraudsters and cyberbullies are also people, most of the time with a personal social media account. This is exactly what Clearview AI had in mind while developing their business model: scraping all publicly available data, training huge neural networks on it, and selling the result worldwide to law enforcement agencies, bundled in a good-looking application. According to a graph in the New York Times, this brings the number of photos the FBI can search from the 411 million photos in their own database to a staggering 3 billion photos included in the Clearview AI application, all supported by an impressive artificial intelligence model.
This brings up some important questions. Do we support facial recognition as a tool for law enforcement? Is it legal to scrape information from social networks? Does making your profile public also imply that you give permission for your data to be saved and used for AI training purposes?
Besides the negative sides of web scraping, these methods also offer interesting possibilities. You could, for example, scrape this blog and analyze the word usage, or identify trends and topics of interest over time. Web scraping also enables innovations that aggregate data from multiple sources in creative ways, creating information that was not available before.
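To make the word-usage idea a bit more concrete, here is a minimal sketch in Python that counts the most frequent words in a page, using only the standard library. The names TextExtractor, word_frequencies and SAMPLE_PAGE are made up for this illustration, and the sample page is invented text rather than content from this blog. Nothing is downloaded here; a real scraper would first have to fetch the pages and should check the site's terms and robots.txt before doing so.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a <script> or <style> element
        self.chunks = []       # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html, top_n=5):
    """Return the top_n most common words in the visible text of html."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Very rough tokenizer (assumption): lowercase ASCII letters and apostrophes.
    words = re.findall(r"[a-z']+", text)
    return Counter(words).most_common(top_n)


# Hypothetical page, invented for this example.
SAMPLE_PAGE = """
<html><head><style>p { color: red; }</style></head>
<body>
<h1>Scraping and facial recognition</h1>
<p>Scraping is the act of extracting public data.</p>
<p>Facial recognition relies on data, and scraping provides data.</p>
<script>var tracking = "scraping scraping";</script>
</body></html>
"""

if __name__ == "__main__":
    for word, count in word_frequencies(SAMPLE_PAGE, top_n=3):
        print(word, count)
```

Running the same function over many posts and grouping the counts by publication date would be one simple way to look at trends over time. A real analysis would probably also want to filter out common words such as "the" and "and".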
The New York Times has an article that goes into more depth on the background of Clearview AI. Click here to read the full article or listen to the accompanying podcast if you’re interested.
I would love to hear your opinion on web scraping and the use of facial recognition. If you would like more technical background on how to implement web scraping techniques, please let me know in the comments.
Sources
Hill, K. (2020, January 18). The Secretive Company That Might End Privacy as We Know It. The New York Times. https://www.nytimes.com/2020/01/18/technology/clearview-privacy-facial-recognition.html
Matsakis, L. (2020, January 27). Scraping the Web Is a Powerful Tool. Clearview AI Abused It. Wired. https://www.wired.com/story/clearview-ai-scraping-web/
Hey Roman, thank you for this very interesting post about web scraping. To be honest with you, Clearview’s business model sounds quite scary. Although Clearview’s intention to support law enforcement can be considered good, the lack of control we as web users have after publicly sharing personal data is very concerning. On the one hand, I think it is a personal responsibility to share only a reasonable level of personal information with the online world. On the other hand, we are dependent on local governments regulating companies’ abilities to use and sell our data. Please share more technical information on web scraping with us in the future! Thank you.
As I have done some simple web scraping with Python myself, I think this is an interesting topic. Although it may not be really ethical to scrape everyone’s information, it’s also not really illegal, I guess. The information is already out there, so you’re not stealing anything. The FBI could, for example, do the same thing by manually going over every single Facebook account and finding suspects that way. However, that is not feasible, of course. Software that scrapes the internet only makes things way easier. As long as people put their information out there without putting any restrictions on its usage, I think it is acceptable for people to do something with that information, even though it may not be the most ethical thing to do.
As soon as someone is stealing information that should be restricted from other users, like a hidden picture only visible to friends, I think it should be punishable, as that’s basically stealing. Since the internet, and especially such implications, are relatively new to ordinary people, I think a good public debate should be held to decide how such ‘problems’ should be tackled in the future. But as long as there hasn’t been any proper discussion, I think companies should be allowed to simplify gathering and using data, as long as that data is public.
Thanks, Jem, for your response to this article. I agree with you that people do choose to open up their information, and that scraping can be acceptable from that point of view. It is important to take into account that once data is scraped, it can be kept permanently. People uploading their data to Facebook, for example, are under the impression that this data can be modified, updated and deleted when necessary. Once data is scraped, it becomes impossible for users to delete the data they provided. I would argue that legislation should focus on banning web scraping of public profiles because of this problem.