Web Scraping: The Good, the Bad and the Ugly

19 September 2024


From Innovation to Privacy Risks, and How Websites Defend Against It

What once started as an experiment to measure the true size of the internet (Norman, 2020) has long since become an integral part of it. Web scraping is not a new topic; it first emerged and gained popularity in the early 90s. But what exactly is web scraping? In short, it is the extraction of data from a website for analysis or retrieval (Zhao, 2017). The current excitement around large language models (LLMs) like OpenAI’s GPT has renewed the importance of web scraping: these models rely on massive, diverse, and current datasets to improve their performance, and such datasets can be aggregated at scale using web scraping.
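In practice, a scraper is often just a short script that fetches a page and extracts the elements of interest. Below is a minimal sketch in Python using the requests and BeautifulSoup libraries; the URL, the User-Agent string, and the CSS selector are illustrative placeholders, not a real target.

    # Minimal scraping sketch: fetch a page and extract data for analysis.
    # The URL and CSS selector are illustrative placeholders.
    import requests
    from bs4 import BeautifulSoup

    response = requests.get(
        "https://example.com/articles",          # hypothetical target page
        headers={"User-Agent": "research-bot/0.1"},
        timeout=10,
    )
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")
    # Collect the text of every headline matching the (assumed) selector.
    headlines = [h.get_text(strip=True) for h in soup.select("h2.article-title")]
    print(headlines)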

But is web scraping more helpful or harmful, and what can websites do to prevent it?

The Good, the Bad and the Ugly

The Good

Web scraping can be a valuable tool for research and innovation. For instance, search engines rely on scraping to index websites and provide answers directly on search pages. Beyond this, scholars use web scraping to gather data that would otherwise be inaccessible. For example, monitoring Dark Web activity benefits fields like cybersecurity and social science (Bradley & James, 2019).

The Bad

However, scraping often disregards website terms and conditions, raising ethical and legal questions (Krotov et al., 2020). In the U.S., scraping has been challenged numerous times under laws like the Computer Fraud and Abuse Act (CFAA), with high-profile cases such as hiQ Labs v. LinkedIn. In Europe, scraping likewise carries legal risks, especially when performed without consent.

The Ugly

At its worst, scraping can lead to serious breaches of privacy. Scrapers can collect sensitive data, including login credentials and personal information. Worse still, LLMs trained on scraped data may unintentionally memorize and expose this information, creating privacy concerns in AI (Al-Kaswan & Izadi, 2023).

Defending Against Scraping

To protect against web scraping, websites employ various techniques. Common defenses include requiring users to log in, implementing CAPTCHA challenges, and restricting access to private content (Turk et al., 2020). For instance, some websites require registration before allowing access to certain information, while others require multi-factor authentication (MFA), which is intended to make automated logins harder. Additionally, rate limiting is used to block scrapers after a certain number of requests, and suspicious IP addresses are blacklisted and blocked.
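To make the rate-limiting idea concrete, here is a minimal sketch of how a site might track requests per client IP and blacklist offenders. It is an illustrative toy in Python, with thresholds and names chosen arbitrarily rather than taken from any real product.

    # Sliding-window rate limiter sketch: block an IP that exceeds a
    # request threshold within a time window. Thresholds are illustrative.
    import time
    from collections import defaultdict

    WINDOW_SECONDS = 60     # look-back window
    MAX_REQUESTS = 100      # allowed requests per window

    request_log = defaultdict(list)   # client IP -> recent request timestamps
    blacklist = set()

    def allow_request(ip: str) -> bool:
        if ip in blacklist:
            return False
        now = time.time()
        # Keep only the timestamps that still fall inside the window.
        request_log[ip] = [t for t in request_log[ip] if now - t < WINDOW_SECONDS]
        request_log[ip].append(now)
        if len(request_log[ip]) > MAX_REQUESTS:
            blacklist.add(ip)   # crude IP blacklisting, as described above
            return False
        return True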

However, these mechanisms are not foolproof. Scrapers, increasingly powered by AI, can now mimic human behavior such as typing delays and can even solve CAPTCHAs (Yu & Darling, 2019). In addition, proxy networks are used to circumvent rate limiting and IP bans.
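As an illustration of how simple such evasion can be, the sketch below randomizes the delay between requests and rotates across a proxy pool; the proxy addresses are hypothetical placeholders, and this is a toy under those assumptions, not a description of any particular scraper.

    # Sketch of the evasion tactics described above: human-like randomized
    # pacing plus rotation across a pool of proxies (placeholder addresses).
    import random
    import time
    import requests

    PROXIES = ["http://proxy1.example:8080", "http://proxy2.example:8080"]

    def fetch(url: str) -> str:
        time.sleep(random.uniform(2.0, 8.0))   # irregular, human-like pauses
        proxy = random.choice(PROXIES)         # spread requests over many IPs
        response = requests.get(url, proxies={"http": proxy, "https": proxy}, timeout=10)
        return response.text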

This back-and-forth between website hosts and scraping technologies has turned into an ongoing arms race, with AI being leveraged on both sides.

Fun fact: CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart (Google, n.d.).

References

Al-Kaswan, A., & Izadi, M. (2023). The (ab)use of Open Source Code to Train Large Language Models. https://api.semanticscholar.org/CorpusID:257219963

Bradley, A., & James, R. (2019). Web scraping using R. https://www.semanticscholar.org/paper/Web-Scraping-Using-R-Bradley-James/f5e8594d28f8425490a17e02b5697a26c5b54d03

Google. (n.d.). What is ReCAPTCHA? Google Support. Retrieved September 19, 2024, from https://support.google.com/recaptcha/?hl=en#:~:text=A%20%E2%80%9CCAPTCHA%E2%80%9D%20is%20a%20turing,users%20to%20enter%20with%20ease.

Krotov, V., Johnson, L., & Silva, L. (2020). Legality and ethics of web scraping. Communications of the Association for Information Systems, 47, 539–563. https://doi.org/10.17705/1cais.04724

Norman, J. (2020, September). Matthew Gray develops the world wide web wanderer. Is this the first web search engine? HistoryofInformation.com. Retrieved September 19, 2024, from https://historyofinformation.com/detail.php?id=1050

Turk, K., Pastrana, S., & Collier, B. (2020). A tight scrape: Methodological approaches to cybercrime research data collection in adversarial environments. 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). https://doi.org/10.1109/eurospw51379.2020.00064

Yu, N., & Darling, K. (2019). A low-cost approach to crack Python CAPTCHAs using AI-based chosen-plaintext attack. Applied Sciences, 9(10), 2010. https://doi.org/10.3390/app9102010

Zhao, B. (2017). Web scraping. In Springer eBooks (pp. 1–3). https://doi.org/10.1007/978-3-319-32001-4_483-1


Data Privacy and GenAI

16 September 2024


When ChatGPT launched at the end of 2022, most data protection professionals had never heard of generative AI and were certainly not aware of the potential dangers it could bring to data privacy (CEDPO AI Working Group, 2023). As AI platforms grow more sophisticated, so do the risks to our privacy; it is therefore important to discuss these risks and how to mitigate them as effectively as possible.

GenAI systems are built on vast datasets, often including sensitive personal and organizational data. When users interact with these platforms, they unknowingly share information that could be stored, analyzed, and even exposed to malicious actors (Torm, 2023). The AI itself could reveal confidential information learned from previous interactions, leading to privacy breaches. This could have major implications for the affected individuals or organizations if sensitive information is shared without proper anonymization or consent.

Continuing on the topic of consent: giving consent for generative AI platforms to use your data can be tricky, as platforms often provide vague and complex terms and conditions that are difficult for users to fully understand. These agreements often include legal jargon and technological terminology, making it hard to know exactly what data is being collected, how it’s being used, or who it’s being shared with. This lack of transparency puts users at a disadvantage, as they may unknowingly grant permission for their personal information to be stored, analyzed, or even shared without fully understanding the risks involved.

To reduce the potential dangers of GenAI platforms, several key measures must be implemented. First, transparency should be prioritized by simplifying terms and conditions, making it easier for users to understand what data is being collected and how it is being used. Clear consent mechanisms should be enforced, requiring explicit user approval for the collection and use of personal information. Additionally, data anonymization must be a standard practice to prevent sensitive information from being traced back to individuals. Furthermore, companies should limit the amount of data they collect and retain only what is necessary for the platform’s operation. Regular audits and compliance with privacy regulations like the GDPR or HIPAA are also crucial to ensure that data handling practices align with legal standards (Torm, 2023). Lastly, users should be educated on best practices for protecting their data when using GenAI, starting with being cautious about what they share on AI platforms.
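As a concrete illustration of the anonymization point, here is a minimal sketch of regex-based redaction that strips obvious identifiers from text before it reaches a GenAI platform. It assumes Python; the patterns are illustrative only, and a real system would need far more robust PII detection.

    # Naive anonymization sketch: replace obvious identifiers with
    # placeholders before text is sent to a GenAI platform.
    # These patterns are illustrative and miss many real-world forms of PII.
    import re

    PATTERNS = {
        "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
        "[PHONE]": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    }

    def redact(text: str) -> str:
        for placeholder, pattern in PATTERNS.items():
            text = pattern.sub(placeholder, text)
        return text

    print(redact("Contact Jane at jane.doe@example.com or +31 6 1234 5678."))
    # -> Contact Jane at [EMAIL] or [PHONE].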

In conclusion, while generative AI offers transformative potential, it also presents significant risks to data privacy. By implementing transparent consent practices, anonymizing sensitive data, and adhering to strict privacy regulations, we can minimize these dangers and ensure a safer, more responsible use of AI technologies. Both organizations and users must work together to strike a balance between innovation and security, creating a future where the benefits of GenAI are harnessed without compromising personal or organizational privacy.

References:


Twitch Data Leak – Are Platforms Doing Enough To Secure Our Data?

9 October 2021

Data Security: The recent Twitch data leak shows how confidential information can become accessible to anyone through data breaches. Are platforms doing enough to prevent this?


Three days ago, another platform and its users became victims of a data leak. This time it was Twitch, a highly popular (game-)streaming platform owned by Amazon with approximately 8.07 million active streamers just last month (Clement, 2021). The top streamers on the platform gather millions of viewers around the world and are subsequently paid by Twitch for providing their users with entertainment through streams. Last Wednesday, for the first time in Twitch history, confidential company information and streamers’ earnings were leaked, as it became clear how much the top streamers had earned in revenue. And it was not a small leak either: the BBC has reported a massive breach of over 100GB of data (Tidy & Molloy, 2021).

2021: A record-breaking number of data leaks?

Unfortunately, this data leak of a widely-used platform is not the first and certainly not the last. According to the Identity Theft Resource Center, the number of (publicly announced) data breaches so far this year has already surpassed the total number in 2020 by 17%, with nearly 281.5 million people affected by breaches in 2021. There have been 1,291 breaches so far, compared to 1,108 breaches last year. The report also states that we could be headed towards a record-breaking year in terms of the total number of data breaches, with the current all-time high of 1,529 breaches set in 2017 (Morris, 2021).

More data = more data security?

Whether or not this year sets a record for data breaches, it illustrates that data security is becoming increasingly important in order to prevent these breaches from happening. With the growth in data produced and collected by almost every business or organisation, the likelihood of the (increasingly valuable) data being leaked or systems being breached naturally increases. To put this growth into perspective: in 2010, the world created about 2 ZB (zettabytes) of digital information; last year, this had increased to a whopping 44 ZB in that year alone (Saha, 2020).

Needless to say, more data requires better data security. Especially considering the increase in breaches and leaks this year, companies should look to invest more in protecting their (users’) data. According to a cybersecurity market report, the global cybersecurity market size is projected to grow from 217.9 billion USD in 2021 to 345.4 billion USD by 2026 (MarketsAndMarkets, 2021). But even with the market growing, will it be enough to significantly reduce data leaks and breaches?

Data equals money

Not only does a data leak hurt a platform’s reputation and its users’ privacy, it can also cost the affected organization a lot of money. According to the annual Cost of a Data Breach Report, 2021 had the highest average cost in 17 years, as data breach costs rose from 3.86 million USD to 4.24 million USD: “the highest average total cost in the 17-year history of this report” (IBM, n.d.). In the case of Twitch, source code was leaked alongside the revenue information of top streamers. Its competitors (e.g. YouTube Gaming) therefore now have access to their rival’s source code and to revenue information about the platform’s most valuable asset: its content providers. Combined with the privacy aspect of the leak, this might result in a significant loss of competitive advantage, and thus of revenue, for Twitch.

Discussion: is it enough?

Now that you know how much is invested in cybersecurity and how much an average data leak actually costs, do you think companies should invest even more? In addition, do you think 2021 will go into the history books as the “least safe” year for online platforms so far? And do you think this particular breach will mark the end of Twitch’s dominant competitive position in its industry?

Let me know your thoughts and perspective.

References

Tidy, J. & Molloy, D. (2021). Twitch confirms massive data breach. Available at: https://www.bbc.com/news/technology-58817658

Clement, J. (2021). Active streamers on Twitch worldwide 2021. Available at: https://www.statista.com/statistics/746173/monthly-active-streamers-on-twitch/

Morris, C. (2021). The number of data breaches in 2021 has already surpassed last year’s total. Available at: https://fortune.com/2021/10/06/data-breach-2021-2020-total-hacks/

Saha, D. (2020). How The World Became Data-Driven, And What’s Next. Available at: https://www.forbes.com/sites/googlecloud/2020/05/20/how-the-world-became-data-driven-and-whats-next/?sh=2161cb1d57fc

MarketsAndMarkets. (2021). Cybersecurity Market with Covid-19 Impact Analysis by Component (Software, Hardware, and Services), Software (IAM, Encryption, APT, Firewall), Security Type, Deployment Mode, Organization Size, Vertical, and Region – Global Forecast to 2026. Available at: https://www.marketsandmarkets.com/Market-Reports/cyber-security-market-505.html#:%7E:text=global%20Cybersecurity%20market%3F-,In%20the%20post%2DCOVID%2D19%20scenario%2C%20the%20global%20cybersecurity,9.7%25%20from%202020%20to%202026.

IBM. (n.d.). How much does a data breach cost? Available at: https://www.ibm.com/nl-en/security/data-breach


Author: Roël van der Valk

MSc Business Information Management student at RSM Erasmus University - Student number: 483426 TA BM01BIM Information Strategy 2022

Google’s DeepMind facing data privacy lawsuit

5 October 2021


From data to app to lawsuit

2015: Alphabet Inc.’s British artificial intelligence subsidiary DeepMind obtains private health records of 1.6 million patients from the Royal Free London NHS Foundation Trust. 

This data was to be used to develop the ‘Streams’ app, which alerts doctors to and helps detect and diagnose acute kidney injury. The app was already being used by the Royal Free to great praise.

From DeepMind’s point of view, it is making use of valuable data in order to advance healthcare and save lives. From the Royal Free’s point of view, it is enabling this by sharing the data and then using the resulting app to treat patients. For some citizens, however, this seems like a breach of data privacy.

The British law firm Mishcon de Reya has filed a class-action lawsuit against DeepMind to represent Andrew Prismall and the other 1.6 million patients whose data was shared. 

Who is at fault?

Something I find quite interesting about this case is that DeepMind is accused of being at fault rather than the Royal Free, who shared the data in the first place. Although the Streams app was developed by DeepMind, the app was a collaboration between DeepMind and Royal Free and could not have succeeded without both of their inputs.

I believe that both players are to blame in this situation and that DeepMind cannot be put at fault alone. Who do you believe is at fault in this situation?

How can we prevent this in the future?

For such situations, a healthcare system with strong regulations regarding data privacy, and healthcare providers who abide by such regulations, would largely diminish the threat of major tech firms such as Alphabet. However, too many regulations can inhibit innovation in some situations. Finding a balance between innovation and safety is a challenge that many industries and regulators struggle with worldwide.

I believe that it is no easy task to find such a balance. There is a growing number of factors influencing a push for both regulation and free innovation as digital information becomes one of the most important assets for innovative development. Experts on data privacy and innovation must come together to form regulations that can foster safe innovation.

What do you think should be done to foster safe innovation in the information era?

References:

https://www.bbc.com/news/technology-40483202

https://www.bbc.com/news/technology-58761324

https://www.cnbc.com/2021/10/01/google-deepmind-face-lawsuit-over-data-deal-with-britains-nhs.html

https://deepmind.com/


Keep calm and Amazon

30 September 2021

Source: https://internetofbusiness.com/amazon-unveils-its-new-alexa-smart-home-car-devices/

These days, consumers are flooded with smart devices for the home. Alongside Google, Amazon is one of the companies that can certainly deliver good products in this space. Following this trend, Amazon is expanding the personalization features of its smart devices to make them more useful (Hautala, 2021). As a user, you can either be happy about this or not: both the Ring security cameras and the Alexa smart speakers are known, for example, for tracking users and their surroundings (Hautala, 2021). Amazon wants to change this by collecting less personal data.

Amazon recently revealed updates for both Ring and Echo products that incrementally improve user privacy (Hautala, 2021). The bottom line is that people will feel more comfortable when their personal information is processed on the security cameras and smart devices themselves instead of being stored in Amazon’s cloud.

Users of the smart devices have the option to store and process information locally, meaning that the data will remain on their devices (Hautala, 2021). However, the question is how trustworthy you consider the devices to be as a user. People who are very keen on privacy keep their distance from these types of devices, so the new features will not resolve concerns about devices that can always keep an eye on you (Hautala, 2021).

Personally, I believe that privacy is indeed a big factor when considering smart devices, but I do reflect on how prominently I want to place the privacy factor in my life. Using Amazon as an example, I am more likely to buy devices related to home security, such as Ring, than an Alexa speaker.

Now that Amazon is introducing the option to store data locally, while knowing that information is still being captured, I am curious to what extent you want to make your home smart, and whether privacy is a big factor in your decision to buy smart devices.

Reference

Hautala, L. (2021, September 29). Amazon unwraps privacy features as it tries to roll deeper into your home. CNET. https://www.cnet.com/home/smart-home/amazon-rolls-out-privacy-features-as-it-tries-to-get-deeper-into-your-house/


Retrospective Facial Recognition in Policing: 2021 or 1984?

29 September 2021


The Metropolitan Police (Met) plans to purchase and implement retrospective facial recognition (RFR) technology in London (Woodhams, 2021). This technology will enable the Met to process historic photographs from CCTV, social media, and many other sources in order to track down criminal suspects. These plans were made public when the Mayor of London accepted the Met’s proposal to increase its surveillance technology (MOPAC, 2021). The proposal revealed the Met’s plans for a four-year, £3 million deal with NEC, a multinational information technology and electronics corporation from Japan.

In the past, similar technologies like Live Facial Recognition (LFR) have seen heavy public criticism. LFR scans the faces of people that walk past a camera and compares these to a database of photos of people who are on a watchlist. Police use of LFR has already been scrutinized to the point where the United Nations High Commissioner for Human Rights has called for a moratorium on LFR use (Woodhams, 2021).

In order to protect the freedom and privacy of citizens, it is important that the public gains an understanding of both LFR and RFR, and of the police’s plans to implement them. As the complexity of policing technology continues to increase, I believe that citizens will have a harder time understanding these technologies and the implications of their use.

One interesting implication of RFR that I would like to shed light on with this article involves data consent. As mentioned previously, RFR uses historic photographs. In the past, when these photos were taken, citizens did not agree to their use in future RFR police investigations. At the time, many citizens did not even know that such use of these photographs could become a possibility. This raises my question to the readers of this article: should the police be allowed to use photographs you consented to in the past for new purposes, without new consent? Is the police acting in an immoral way?

References:

Woodhams, Samuel. (2021). London is buying heaps of facial recognition tech. Wired, Condé Nast Britain 2021. Retrieved from: https://www.wired.co.uk/article/met-police-facial-recognition-new

MOPAC. (2021). Retrospective Facial Recognition System. The Mayor’s Office for Policing And Crime. Retrieved from: https://www.london.gov.uk/sites/default/files/pcd_1008_retrospective_facial_recognition_system.pdf

Featured photo from:

Macon, K. (2021). London Police to rollout “Retrospective Facial Recognition,” scanning old footage with new invasive face recognition tech. Reclaim The Net. Retrieved from: https://reclaimthenet.org/london-police-to-rollout-retrospective-facial-recognition/


Down the YouTube Rabbit Hole

7 October 2020


Over the past few weeks, a lot has been said (including on this blog) about how social media has been impacting the offline world in a negative way. After watching “The Social Dilemma”, which launched on Netflix last September, we started to think about how these platforms are selling our attention as a commodity and leading to an increasingly polarized society, harming democracies around the world. Some people decided to take it one step further and deleted accounts, turned off notifications and stopped clicking on recommended content – just as suggested in the documentary by the whistleblowers who helped create these platforms. I was one of those people – until I wasn’t anymore!

Interestingly enough, shortly after watching the documentary I started to receive tons of recommendations for content that addressed the same issues, especially on YouTube and Facebook. Isn’t it funny how the algorithm can work against itself? In the beginning, I was determined not to click on any of the suggested videos even though the content seemed quite interesting. Instead, I decided to do my own research on topics such as data privacy, surveillance capitalism and ethical concerns in technology design. However, the more research I did, the more recommendations I got – unexpected, huh?

So, one lazy Sunday afternoon I gave in to temptation and clicked on a video that was recommended to me by YouTube – a really interesting Ted Talk by techno-sociologist Zeynep Tufekci, which dug a little deeper into some of the questions raised in “The Social Dilemma”. Needless to say, one hour later I had already watched five more Ted Talks – I admit it, I fell into the YouTube rabbit hole!

However, I cannot say that I regret my decision, as I gained really interesting insights from these recommendations. After all, that’s how this recommendation system is supposed to work, right? In particular, I was able to find some answers to a question that had been on my mind for a while: “What can we do to stop the negative effects of social media while still valuing freedom of speech as a pillar of the internet?”

Even though a lot has been said about the threats arising from the widespread use of social media, I hadn’t come across tangible solutions for this issue. Sure, we can turn notifications off, but that won’t tackle the problem at its core! But in two very enlightening Ted Talks by Claire Wardle (misinformation expert) and Yasmin Green (research director at a unit of Alphabet focused on solving global security challenges through technology) I was able to find some clarity. According to them, there are three areas we can act upon to create a better digital and physical world:

  • Tech Companies – first of all, if any advances are going to be made, we need technology platforms to be on board. As an eternal optimist, I do believe that tech leaders are aware of the challenges they face and are certainly trying to find solutions. As Yasmin Green explains, Google already successfully developed what it calls the “Redirect Method”, which targeted people who made searches related to joining terrorist groups. For example, when a Google search about extremist content was made, the first result would be an ad inviting the user to watch more moderate content. Furthermore, the targeting was based not on the user’s profile, but on the specific question that was asked. What if we could use the “Redirect Method” to stop the spread of conspiracy theories or misinformation about climate change? It would be great for society, although probably not so profitable for the tech giants.
  • Governments – although tech companies have their fair share of responsibility, at the moment they are “grading their own homework” and regulating themselves, making it impossible for us to know if interventions are working. That’s where governments come in. But a challenge this big doesn’t simply call for local or even national regulators. What we really need is a global response to regulate the information ecosystem. Or, as Brad Smith (Microsoft’s President) puts it, we need a “Digital Geneva Convention” that holds tech platforms accountable and prevents coordinated social attacks on democracy.
  • We the People – while we would love to place our hopes on governments to solve this situation for us, it is undeniable that most lawmakers are struggling to keep up with a rapidly changing digital world. From time to time, a US Senate Committee investigating tech companies will spawn a few memes as we see that lawmakers have a difficult time understanding what they’re talking about – I will leave you my favorite down below! That’s why we need to take the matter into our own hands, and one way to do it is, as Claire Wardle puts it, to “donate our social data to science”. Millions of data points on us are already collected by social media platforms anyway, but what if we could use them to develop a sort of centralized open repository of anonymized data, built on the basis of privacy and ethical concerns? This would create transparency and allow technologists, journalists, academics and society as a whole to better understand the implications of our digital lives.

Overall, I recognize that these solutions are not perfect or complete. But I do believe that they provide a starting point to “build technology as human as the problems we want to solve”.


Sources

Smith, B., 2017. The Need For A Digital Geneva Convention – Microsoft On The Issues. [online] Microsoft on the Issues. Available at: www.blogs.microsoft.com [Accessed 6 October 2020].

Shead, S., 2020. Netflix Documentary ‘The Social Dilemma’ Prompts Social Media Users to Rethink Facebook, Instagram And Others. [online] CNBC. Available at: www.cnbc.com [Accessed 6 October 2020].

Green, Y., 2018. Transcript Of “How technology can fight extremism and online harassment”. [online] Ted.com. Available at: www.ted.com [Accessed 6 October 2020].

Wardle, C., 2019. Transcript Of “How you can help transform the internet into a place of trust” [online] Ted.com. Available at: www.ted.com [Accessed 6 October 2020].

Tufekci, Z., 2017. Transcript Of “We’re building a dystopia just to make people click in ads” [online] Ted.com. Available at: www.ted.com [Accessed 6 October 2020].


The real price of DNA-testing kits

29 September 2019

The popularity of DNA tests is on the rise. In 2018, the number of people purchasing DNA tests was the same as that of all previous years combined (Regalado, 2019). Furthermore, by early 2019, 22 million people had already submitted their DNA to one of the main DNA testing companies (Regalado, 2019). If this trend continues, by the end of the year these firms could have data on the genetic material of 100 million people (Regalado, 2019). DNA testing can be an insightful and intriguing experience; through these tests, one can learn about one’s ancestry, medical predispositions and inherited risks. Nevertheless, many do not think about the potential consequences it can have for the privacy of one’s genetic data.

For instance, 23andMe, one of the biggest firms, has made the data of its customers available to the pharmaceutical company GlaxoSmithKline. Through this $300 million partnership, the pharmaceutical company can use the genetic information of the more than 5 million customers of 23andMe for drug research purposes (Rutherford, 2018). Since 23andMe’s customers agreed to having their data used for medical purposes in the terms and conditions (Ducharme, 2018), this is all perfectly legal. What’s more, customers were given the option to opt out of these initiatives; yet only roughly 1 million of the more than 5 million customers actually chose this option (Rutherford, 2018). It is likely that many were not fully aware of the future repercussions it could have on themselves and their blood relatives. In fact, that is what I personally find most worrying about this issue: it is not only one’s own privacy that is jeopardized, but also that of living and future family members who have not agreed to sharing their genetic data. The deal between GlaxoSmithKline and 23andMe is just one example; there are multiple cases of DNA testing companies sharing data with other pharmaceutical companies, public databases, research institutions and even law enforcement agencies.

Apart from the growing privacy concerns, there are also security threats to sharing DNA data. Many believe that the data is kept in secure databases. However, data breaches are not uncommon, and in the case of DNA, they could lead to people being discriminated against by employers, insurance providers or banks solely based on their genetics (Martin, 2018).

Another issue that can be called into question is that of the DNA data’s ownership. Companies like Ancestry claim that they do not own the rights to their customers’ DNA (which seems quite obvious, since they cannot take it away from a person) or the DNA samples they receive (Brown, 2017). However, their license system allows them to essentially treat the data as if they did have an ownership right over it (Brown, 2017). Therefore, by not fully reading the fine print, consumers are basically selling away the rights to their DNA samples. The problem is further compounded by the fact that there are not really any laws addressing these issues at the moment (Martin, 2018). If consumers do not start protecting their genetic data, and governments do not enforce laws to limit the power that these corporations have over their customers’ data, our genetic anonymity will be compromised.

Therefore, it can be asserted that the true price of DNA-testing kits is not the $50 to $200 that they can cost, but one’s genetic code and the potential negative repercussions sharing it may have. We must all keep in mind that once we share our data, what is done with it is beyond our control. Our DNA is the most valuable data we own (Baram, 2018) and, unlike a breached password, it is not something we can change.

What do you think? Should we be worried about the growing amount of genetic data that is being shared? Would you still buy a DNA-testing kit?

References:

Brown, K. (2017). ‘What DNA testing companies’ terrifying privacy policies actually mean’. Available at: https://gizmodo.com/what-dna-testing-companies-terrifying-privacy-policies-1819158337

Ducharme, J. (2018). ‘A major drug company now has access to 23andMe’s genetic data. Should you be concerned?’. Available at: https://time.com/5349896/23andme-glaxo-smith-kline/

Martin, N. (2018). ‘How DNA companies like Ancestry and 23andMe are using your genetic data’. Available at: https://www.forbes.com/sites/nicolemartin1/2018/12/05/how-dna-companies-like-ancestry-and-23andme-are-using-your-genetic-data/#256265a56189

Regalado, A. (2019). ‘More than 26 million people have taken an at-home ancestry test’. MIT Technology Review. Available at: https://www.technologyreview.com/s/612880/more-than-26-million-people-have-taken-an-at-home-ancestry-test/

Rutherford, A. (2018). ‘DNA ancestry tests may look cheap. But your data is the price’. The Guardian. Available at: https://www.theguardian.com/commentisfree/2018/aug/10/dna-ancestry-tests-cheap-data-price-companies-23andme


This is technological propaganda.

28 September 2019

The results of Brexit and Trump’s election were shocking but not surprising. A greater concern, however, emerged with them: the accidental or deliberate propagation of misinformation via social media.

44% of Americans get their news from Facebook (Solon, 2016). Many millions of people saw and believed fake reports that “the pope had endorsed Trump; Democrats had paid and bussed anti-Trump protesters; Hillary Clinton was under criminal investigation for sexually assaulting a minor” (Smith, 2016). If our democracy is built on reliable information, what is real?

The good, the bad and the ugly admission fee

During the Arab Spring, Facebook and Twitter were first politicized and used to inspire people as tools for democracy. With Brazil, Brexit, and the US, we saw the equilibrium shift to the other side. We seem to accept that there is an admission fee to pay before we are allowed into the connected world (Thompson, 2019). How many times a day have you been asked to agree to the terms on a website and clicked accept just to access the content behind them?

The recent Cambridge Analytica scandal exposes Facebook’s rather porous privacy policies and the company’s casual attitude to oversight. Using the platform, Cambridge Analytica, a British data mining firm, was able to extract the data of 270,000 people by conducting a survey. People agreed to share details about themselves – and unknowingly about their friends (Economist, 2018). This amounted to information on 50 million Facebook users in total, which the company happily shared with its customers, including the Trump campaign (Economist, 2019).

Full-service propaganda machine and Nazi Germany

In essence, companies like Cambridge Analytica can use Facebook to “target voters who show an interest in the same issues or have similar profiles, packaging them into what it calls ‘lookalike audiences’” (Economist, 2018). The practice effectively shaped voting results in several countries, such as Argentina, Kenya, Malaysia, and South Africa, even before the 2016 US presidential election (Thompson, 2019).

The practice of addressing certain lookalike audiences with feelings rather than facts, playing up a vision to create a fake emotional connection, is not new; Nazi Germany showed as much. What is new is the internet-driven efficiency (Smith, 2016).

Clickbait

Like the headline of this article, revenue-driven platforms such as Google and Facebook use news feeds that engage more people, essentially to expose them to more ads. Whether an article is reliable or not does not matter; the algorithm boosts sensational stories that reinforce prejudice in order to draw more clicks (Smith, 2016). As mentioned before, if we use this as our primary information source, how can we ensure that we are able to make informed decisions?

To conclude, platforms cannot stand on the sidelines making a profit while they are used as a stepping stone to the next political victory for the highest bidder. They should be held accountable. Now.


References:

Economist (2018) The Facebook scandal could change politics as well as the internet. Data privacy. Available at: https://www.economist.com/united-states/2018/03/22/the-facebook-scandal-could-change-politics-as-well-as-the-internet

Economist (2019) “The Great Hack” is a misinformed documentary about misinformation. The Facebook scandal. Available at: https://www.economist.com/prospero/2019/07/24/the-great-hack-is-a-misinformed-documentary-about-misinformation

Smith A. (2016) The pedlars of fake news are corroding democracy. The Guardian. Available at: https://www.theguardian.com/commentisfree/2016/nov/25/pedlars-fake-news-corroding-democracy-social-networks

Solon O. (2016). Facebook’s failure: did fake news and polarized politics get Trump elected?. The Guardian. Available at: https://www.theguardian.com/technology/2016/nov/10/facebook-fake-news-election-conspiracy-theories

Thompson A. (2019) The Great Hack terrified Sundance audiences, and then the documentary got even scarier. IndieWire. Available at: https://www.indiewire.com/2019/08/the-great-hack-documentary-oscar-cambridge-analytica-1202162430/

Photograph: Dado Ruvic/Reuters
