AI, your new GP?

10 October 2025


In the mainstream, AI has become synonymous with LLMs, giving students an easy way out of assignments or a tool for generating content. But can AI also deliver a tangible benefit? European scientists may have an answer.

A team at the European Molecular Biology Laboratory (EMBL) in Cambridge and the German Cancer Research Center introduced a new AI model for healthcare called Delphi-2M, which can predict more than 1,000 conditions a person might face in the future. Its creators hope that it could predict conditions like Alzheimer’s disease or cancer, which affect millions of people each year.

The authors took inspiration from large language models, such as Gemini, which are trained on enormous amounts of text scraped from the internet and learn to select the word most likely to come next in any given sentence. Similarly, Delphi-2M analyses data from 400,000 anonymous participants to predict future health conditions.
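
To make the analogy concrete, here is a minimal, hypothetical sketch of treating a patient's diagnosis history like a sentence whose "next word" is the next condition. The toy histories and the simple bigram counting are illustrative assumptions, not the actual Delphi-2M architecture.

```python
from collections import Counter, defaultdict

# Toy "medical histories": each patient is a sequence of diagnosis codes,
# just as an LLM sees a sentence as a sequence of words.
histories = [
    ["hypertension", "type2_diabetes", "retinopathy"],
    ["hypertension", "type2_diabetes", "kidney_disease"],
    ["asthma", "hypertension", "type2_diabetes"],
]

# Count which condition tends to follow which (a bigram model), the
# simplest possible stand-in for next-token prediction.
transitions = defaultdict(Counter)
for history in histories:
    for current, nxt in zip(history, history[1:]):
        transitions[current][nxt] += 1

def predict_next(condition):
    """Return the condition most often seen directly after `condition`."""
    if condition not in transitions:
        return None
    return transitions[condition].most_common(1)[0][0]

print(predict_next("type2_diabetes"))  # e.g. "retinopathy"
```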

The difference from typical LLMs lies in Delphi-2M's ability to account for the time between conditions and for patients' life events. Building this feature did not come without problems, as early versions sometimes predicted diagnoses for people who had already died.
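
To see why timing matters, the hypothetical snippet below stamps each event with an age, so time gaps become an explicit input, and stops predicting once a death event appears, the kind of safeguard the early versions lacked. Again, this is a sketch of the idea rather than the published model.

```python
# Hypothetical event sequence: (age_in_years, event_code) pairs.
history = [(52, "hypertension"), (58, "type2_diabetes"), (74, "death")]

def events_eligible_for_prediction(history):
    """Yield events up to and including death; never predict past it."""
    for age, event in history:
        yield (age, event)
        if event == "death":
            return  # without this check, a model could keep predicting post-mortem

def time_gaps(history):
    """Years between consecutive events, information a plain next-word model ignores."""
    return [later[0] - earlier[0] for earlier, later in zip(history, history[1:])]

print(list(events_eligible_for_prediction(history)))  # stops at the death event
print(time_gaps(history))  # [6, 16]
```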

The model was subsequently tested on data from 1.9 million Danes, yielding mixed results. Events that follow from a specific condition, such as diabetes, were predicted more accurately, while events driven by random external factors, such as a viral infection, were harder to predict.

It may take up to ten years before we see generative AI used in everyday healthcare check-ups. Nonetheless, the model has already proven valuable for research, as clustering conditions allows researchers to explore the relationships between diseases. AI is already present in hospitals, mostly assisting with the analysis of healthcare data. A well-known example, serving over 300,000 patients annually, is Powerful Medical, a start-up whose software interprets electrocardiograms, enabling early diagnosis of cardiovascular conditions.
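
As a hedged illustration of what "clustering conditions" could mean in practice, the sketch below groups made-up disease vectors with scikit-learn's KMeans; the embeddings, condition names, and library choice are assumptions for illustration, not details from the study.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical learned embeddings: one vector per condition. In a real
# model these would come from the trained network, not random numbers.
rng = np.random.default_rng(0)
conditions = ["type2_diabetes", "retinopathy", "kidney_disease", "asthma", "copd", "eczema"]
embeddings = rng.normal(size=(len(conditions), 8))

# Conditions whose vectors lie close together end up in the same cluster;
# co-clustered diseases are candidates for a shared underlying relationship.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(embeddings)
for condition, label in zip(conditions, labels):
    print(label, condition)
```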

However, there are downsides. A series of recent studies reported that AI models across the healthcare sector produce biased results for women and ethnic minorities. The problem lies in the datasets used for training, largely content from the internet, whose existing societal biases are reflected in the LLMs' responses. Researchers from MIT have suggested that one way to reduce bias in AI is to filter which data are used for training.
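
As a loose illustration of that filtering idea, and not MIT's actual method, one could check how demographic groups are represented in a training set and down-sample the over-represented ones before training:

```python
import random
from collections import Counter

# Toy training records with a demographic attribute attached.
records = [{"text": f"case {i}", "group": "A" if i % 4 else "B"} for i in range(1000)]

counts = Counter(r["group"] for r in records)
target = min(counts.values())  # size of the smallest group

# Keep a same-sized random sample from every group, so no single group
# dominates the data the model learns from.
random.seed(0)
balanced = []
for group in counts:
    group_records = [r for r in records if r["group"] == group]
    balanced.extend(random.sample(group_records, target))

print(Counter(r["group"] for r in balanced))  # equal counts per group
```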

References:

Heikkilä, M. (2025, September 19). AI medical tools downplay symptoms in women and ethnic minorities. Financial Times. https://www.ft.com/content/128ee880-acdb-42fb-8bc0-ea9b71ca11a8

Guardian News and Media. (2025, September 17). New AI tool can predict a person’s risk of more than 1,000 diseases, say experts. The Guardian. https://www.theguardian.com/science/2025/sep/17/new-ai-tool-can-predict-a-persons-risk-of-more-than-1000-diseases-say-experts

The Economist. (2025, September 17). A new AI model can forecast a person’s risk of diseases across their life. https://www.economist.com/science-and-technology/2025/09/17/a-new-ai-model-can-forecast-a-persons-risk-of-diseases-across-their-life


Rotterdam’s Digital Twin: Fighting Climate Change with AI

25 September 2025


Last week, Roland van der Heijden introduced us to Rotterdam’s Open Urban Platform (OUP) and its Digital Twin, a three-dimensional, real-time copy of the city. This platform is more than just a digital map. It is a shared system where data from sensors, companies, and citizens all come together. From traffic to air quality, everything can be shown in one common model.

But what if this digital version of Rotterdam could also help us prepare for climate change? The Netherlands is very vulnerable to rising sea levels and heavy rainfall. Imagine running flood simulations inside the Digital Twin: testing how storm surges move through neighbourhoods, where sewage systems might fail, and how well our infrastructure can cope.

The OUP already includes tools for flood analysis and heat-stress mapping. These tools can help city leaders explore worst-case scenarios before they happen. During a crisis, live data streams could update the model in real time, helping first responders and guiding evacuation plans.

This is where AI could make a big difference. Traditional flood simulations can take a lot of time, but AI can work like a “fast-forward” button. By learning from earlier simulations, AI can predict outcomes in seconds. This means decision-makers can test many more scenarios, explore different risks, and choose better responses.
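
Here is a minimal sketch of that "fast-forward" idea, assuming a surrogate-model setup: a regression model is trained on the inputs and outputs of earlier slow simulations and then approximates new runs almost instantly. The features, values, and library choice are illustrative and not part of the actual OUP.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Pretend we already ran 500 slow, physics-based flood simulations.
# Inputs: rainfall (mm/h), storm surge (m), drainage capacity (index).
rng = np.random.default_rng(42)
X = rng.uniform([0.0, 0.0, 0.2], [100.0, 4.0, 1.0], size=(500, 3))
# Output: simulated flood depth (m), here a made-up function of the inputs.
y = 0.02 * X[:, 0] + 0.8 * X[:, 1] - 1.5 * X[:, 2] + rng.normal(0, 0.1, 500)

# The surrogate learns the simulator's input-output mapping...
surrogate = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# ...and can then score new scenarios almost instantly.
scenarios = np.array([[80.0, 3.5, 0.4], [30.0, 1.0, 0.9]])
print(surrogate.predict(scenarios))  # approximate flood depths per scenario
```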

The real power comes from the platform logic we discussed in class: the value grows as more people and organizations join. If municipalities, universities, businesses, and citizens all share their data, the OUP becomes not only a better tool for simulations but also a training ground for AI models. The more diverse and continuous the data, the smarter and more reliable these models become.

So here’s my question: could the Digital Twin, combined with AI, become our most powerful defense against climate risks? Not only predicting where floods might hit, but helping us design a safer future city together?

Sources:

  • Municipality of Rotterdam. (2024). Rotterdam in Transformation: Vision on the Digital City 1.0. Open & Agile Smart Cities. https://oascities.org/wp-content/uploads/2024/11/Rotterdam-in-tranformation-vision-on-the-digital-city-1.0.pdf
  • Future Insight. (2025, January 16). The future of Rotterdam starts today: the Open Urban Platform has been launched. Future Insight. https://www.futureinsight.nl/post/the-future-of-rotterdam-starts-today-the-open-urban-platform-has-been-launched?lang=en
  • Bagheri, S., Brandt, T., & van Oosterhout, M. (2021). Digital City Rotterdam: Open Urban Platform — Teaching Case. Erasmus University Rotterdam / ECDA RSM Case. https://ecda.eur.nl/wp-content/uploads/2021/10/Urban-platform-teaching-case-final_.pdf
  • Van der Heijden, R. (2025, September 18). Rotterdam Citiverse & Open Urban Platform [Guest lecture]. Rotterdam School of Management, Erasmus University Rotterdam.


Web Scraping: The Good, the Bad and the Ugly

19 September 2024


From Innovation to Privacy Risks, and How Websites Defend Against It

What once started as an experiment to measure the true size of the internet (Norman, 2020) has long since become an integral part of it. Web scraping is not a new topic; it first emerged and gained popularity in the early 1990s. But what exactly is web scraping? In short, it is the extraction of data from a website for analysis or retrieval (Zhao, 2017). The current excitement around large language models (LLMs) like OpenAI’s GPT has renewed the importance of web scraping. These models rely on massive, diverse, and current datasets to improve their performance, which can be aggregated at scale using web scraping.
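
For readers who have never seen it in practice, the snippet below shows what "extracting data from a website" can look like in its simplest form, using Python's requests and BeautifulSoup; the URL and the chosen elements are placeholders.

```python
import requests
from bs4 import BeautifulSoup

# Fetch a page the same way a browser would (the URL is a placeholder).
response = requests.get("https://example.com", timeout=10)
response.raise_for_status()

# Parse the HTML and pull out the pieces we care about, here every heading.
soup = BeautifulSoup(response.text, "html.parser")
headings = [h.get_text(strip=True) for h in soup.find_all("h1")]
print(headings)
```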

But is web scraping more helpful or harmful, and what can websites do to prevent it?

The Good, the Bad and the Ugly

The Good

Web scraping can be a valuable tool for research and innovation. For instance, search engines rely on scraping to index websites and provide answers directly on search pages. Beyond this, scholars use web scraping to gather data that would otherwise be inaccessible. For example, monitoring Dark Web activity benefits fields like cybersecurity and social science (Bradley & James, 2019).

The Bad

However, scraping often disregards website terms and conditions, raising ethical and legal questions (Krotov et al., 2020). In the U.S., scraping has been challenged numerous times under laws like the Computer Fraud and Abuse Act (CFAA), with high-profile cases such as hiQ Labs v. LinkedIn. In Europe, scraping likewise carries legal risks, especially when performed without consent.

The Ugly

At its worst, scraping can lead to serious breaches of privacy. Scrapers can collect sensitive data, including login credentials and personal information. Worse still, LLMs trained on scraped data may unintentionally memorize and expose this information, creating privacy concerns in AI (Al-Kaswan & Izadi, 2023).

Defending Against Scraping

To protect against web scraping, websites employ various techniques. Common defenses include requiring users to log in, implementing CAPTCHA challenges, and restricting access to private content (Turk et al., 2020). For instance, some websites require registration before allowing access to certain information, while others require multi-factor authentication (MFA), which makes automated logins harder. Additionally, rate limiting is used to block scrapers after a certain number of requests, and blacklisted IP addresses can be detected and blocked.
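
To make one of these defenses concrete, here is a toy sketch of per-IP rate limiting with a sliding-window counter; the threshold and the in-memory storage are illustrative simplifications, not a production design.

```python
import time
from collections import defaultdict

WINDOW_SECONDS = 60
MAX_REQUESTS = 100  # requests allowed per IP within the window

_request_log = defaultdict(list)  # ip -> timestamps of recent requests

def allow_request(ip: str) -> bool:
    """Return True if this IP is still under its per-minute budget."""
    now = time.time()
    recent = [t for t in _request_log[ip] if now - t < WINDOW_SECONDS]
    if len(recent) >= MAX_REQUESTS:
        _request_log[ip] = recent
        return False  # likely a scraper: block, or serve a CAPTCHA instead
    recent.append(now)
    _request_log[ip] = recent
    return True

# Example: the 101st request inside one minute gets rejected.
for _ in range(101):
    allowed = allow_request("203.0.113.7")
print(allowed)  # False
```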

However, these mechanisms are not foolproof. Scrapers, which are increasingly powered by AI, can now mimic human actions such as typing delays and solving CAPTCHAs (Yu & Darling, 2019). Lastly, proxy networks are used to circumvent rate limiting and IP bans.
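
The snippet below sketches the simplest of these human-mimicking tactics, randomized pauses between requests; the URLs are placeholders and the example is illustrative only.

```python
import random
import time
import requests

urls = ["https://example.com/page/1", "https://example.com/page/2"]  # placeholders

for url in urls:
    # Wait a random, human-looking interval instead of hammering the server.
    time.sleep(random.uniform(2.0, 6.0))
    response = requests.get(url, timeout=10)
    print(url, response.status_code)
```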

This back-and-forth between website hosts and scraping technologies has turned into an ongoing arms race, with AI being leveraged on both sides.

Fun fact: CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart (Google, n.d.).

References

Al-Kaswan, A., & Izadi, M. (2023). The (ab)use of Open Source Code to Train Large Language Models. https://api.semanticscholar.org/CorpusID:257219963

Bradley, A., & James, R. (2019). Web scraping using R. https://www.semanticscholar.org/paper/Web-Scraping-Using-R-Bradley-James/f5e8594d28f8425490a17e02b5697a26c5b54d03

Google. (n.d.). What is ReCAPTCHA? Google Support. Retrieved September 19, 2024, from https://support.google.com/recaptcha/?hl=en#:~:text=A%20%E2%80%9CCAPTCHA%E2%80%9D%20is%20a%20turing,users%20to%20enter%20with%20ease.

Krotov, V., Johnson, L., & Silva, L. (2020). Legality and ethics of web scraping. Communications of the Association for Information Systems, 47, 539–563. https://doi.org/10.17705/1cais.04724

Norman, J. (2020, September). Matthew Gray develops the World Wide Web Wanderer. Is this the first web search engine? HistoryofInformation.com. Retrieved September 19, 2024, from https://historyofinformation.com/detail.php?id=1050

Turk, K., Pastrana, S., & Collier, B. (2020). A tight scrape: Methodological approaches to cybercrime research data collection in adversarial environments. 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). https://doi.org/10.1109/eurospw51379.2020.00064

Yu, N., & Darling, K. (2019). A low-cost approach to crack Python CAPTCHAs using AI-based chosen-plaintext attack. Applied Sciences, 9(10), 2010. https://doi.org/10.3390/app9102010

Zhao, B. (2017). Web scraping. In Springer eBooks (pp. 1–3). https://doi.org/10.1007/978-3-319-32001-4_483-1
