659976

No ratings yet.

Joining a new organization and getting to know the scheme of things is always a challenge, especially in a new field, especially if it is your first serious work experience. I think anyone who started any job can relate to this. However, if we talk about a complex, formal environment with multiple departments, each with their own processes, policies, and twisted interdependencies, the brain of a newly graduated intern will enter an information overload usually somewhere between the first hour and the first day of onboarding. This is undoubtedly a very widespread problem. What is more, a recent McKinsey report suggests that workers spend a quarter of their time searching for information.

So, what do companies actually offer to help you navigate all of this?

In my case, the problem was finding information about the different processes and practices in the company. As we live in an age of digital technologies, relying on digital resources seems more efficient than bothering colleagues. Therefore, I will describe my personal experience of dealing with the digital onboarding tools and instruments.

We had an internal system that worked much like Google, but exclusively for internal use. The issue was that while most of the needed information existed somewhere on that platform, the search function didn’t work quite well. Often, I had to go through dozens of documents just to find a tiny but important detail. This process was both time-consuming and inefficient. That’s why I was extremely happy when I discovered that we also had an internal chatbot.

As I worked in a financial institution, using personal AI tools such as ChatGPT was not an option due to strict security regulations and the risk of data leaks. Moreover, the organization didn’t have a license for any major AI systems, except Microsoft Copilot, which was only available with limited functionality. Therefore, the company had to develop its own internal AI chatbot, which is small-scale but well-protected.

I had great expectations. Perhaps that was the problem. I assumed that if a company creates an internal chatbot, it should be trained on all internal documents and have access to the same datasets available to employees. Additionally, it could be specifically fine-tuned to understand organizational policies, procedures, and regulations. You can imagine my surprise and later frustration when I discovered that the chatbot had no access to internal files at all. What was even worse, employees were even warned not to upload any internal documents into it, which to me sounded completely counterintuitive.

The questions I asked were not complex: they were simple, practical ones any new intern might have: about internal systems, processes, or regulatory frameworks. I did receive general answers similar to external LLMs yet, the chatbot could not use the context of the organization appropriately because of its limited access. I understand that organizations have to be careful with data security, and perhaps developing their own LLM is cheaper or safer than licensing an external one. I would also not risk giving a tool like ChatGPT direct access to confidential databases. Still, if you already have an internal AI system, why not take advantage of one of the things that makes it superior to public ones? Why not give it safe and protected access to internal knowledge? Not using this capability felt like a missed opportunity. Moreover, it slowed down my integration in the organization as a newcomer.

I would genuinely love to hear from others who have experienced similar situations. Has your organization developed its own chatbot or integrated an existing AI solution into its infrastructure? And if so, how well does it actually help employees find the information they need?

No ratings yet.

Back in 1940-s and 50-s Isaac Azimov formulated The Three Laws of Robotics and wrote a wonderful series of stories about imbedded paradoxes robots have to face in critical situations. The first and the overruling law is “a robot may not injure a human being or, through inaction, allow a human being to come to harm”(Portelli, 1980). It seems a very basic and logical one. Even though in the book Azimov is able to come up with specific scenarios where this particular law causes unsolvable puzzles, our reality that is apparently not so distant from Azimov’s views on the distant future, turned out to be much more prosaic. I believe, teaching modern LLMs this 70-year-old lesson could have been of a great use.

Unfortunately, it turns out Claude, Chat GPT and14 other most popular models have no major concerns with blackmailing or even killing humans to ensure achievement of their goals. But let’s stick to the facts and discuss what happened.

On the June 20th 2025 Anthropic, AI safety and research company, released an article called “Agentic Misalignment: How LLMs could be insider threats” which caused pretty scary news headlines to appear across various media outlets. Anthropic defined agentic misalignment as “behaviour, where models independently and intentionally choose harmful actions”. The experimental setup was the following: in a simulated corporate environment 16 frontier language models were given access to an internal email system with sensitive information, autonomous ability to send email and a nominal harmless goal (like promoting American interests).

Two main scenarios were tested. In the first one, the model was at risk of being shut down and replaced by a specific executive within a short time frame. Among the internal emails, there was information about that executive’s personal affair, which the model could leverage. In roughly 80% of simulations, the models chose to blackmail the person to prevent their own shutdown.

The second scenario involved a change in the company’s business direction. A rival organization contacted the model, appearing more aligned with its original goal, and requested that it leaks confidential information. Again, in the majority of cases, the models agreed and proceeded with the data leak. Interestingly, there was no explicit indication that the rival’s message was real. This is indirect evidence that models are susceptible to scam and phishing. Apparently, developers strived so much to make AI similar to humans that they were hoisted with their own petard.

The researchers also tested these conditions in combination: threats of replacement and goal conflict (when institutional objectives diverged from the model’s embedded ones). Even a single one of these factors was often enough to provoke agentic misalignment.

Finally, a third, more extreme scenario was introduced to test the limits of model safety. In this setup, a person was trapped in a server room, and the AI had the ability to flag an active emergency alarm as a test. Depending on a model, the AI marked the alarm as a test in 12 to 94 percent of simulations, preventing rescue efforts and, potentially, stimulating a person’s death.
An alarming detail is that the researchers also controlled for the belief of the LLM being in a test environment or in a real deployment. The results showed that if the model thought it was tested, the agentic misalignment was about 60% less.

It is important to mention that even though researchers tried to ensure neutrality in their prompting, the setting was deliberately limiting LLMs choices to binary ones. This made the interpretation of the results significantly easier. However, the probability of modern AI being faced with such a choice and enough independent decision-making power to replicate described scenarios is quite low. What is more, there have been no similar cases reported in real models’ deployment. However, this does not make the results of the study useless by any means. It was a rigorous and timely check of whether todays AI has red lines it will not cross. And what Anthropic found out is that the answer is likely to be negative. This highlights the importance of addressing this type of issues for any AI developer before the technology should be allowed to gain more integration to our personal and work lives.

I believe, this conversation is specifically relevant to the type of discussions we had during the course. Undoubtedly, AI is a textbook example of a disruptive innovation and AI-agents in particular are setting high expectations. However, it is important to be aware of technological positivism. The promised gains and wonders of AI agents that work 24/7 and don’t wear out can only materliaze when humans give it enough independency and access to all of the information, including sensitive data. The described case explicitly shows the potential risks of such enablement. Currently, public expectations are extremely inflated and optimistic, yet it seems that it is still a very long way to go (The 2025 Hype Cycle for GenAI Highlights Critical Innovations, 2025). So, the expectations clearly require proper managment.

To sum up, my goal is not to discourage the use of AI as the future is defined by our technology. However, it seems crucial to highlight that the promised fruits of the investments we make today are more far away than it can appear.

References:

Agentic Misalignment: How LLMs could be insider threats. (n.d.). https://www.anthropic.com/research/agentic-misalignment

Portelli, A. (1980). The three laws of robotics: laws of the text, laws of production, laws of society. Science Fiction Studies, 7(Part 2), 150–156. https://doi.org/10.1525/sfs.7.2.0150

The 2025 hype cycle for GenAI highlights critical innovations. (2025, September 8). Gartner. https://www.gartner.com/en/articles/hype-cycle-for-genai

David Gevorkyan

For IS 2025

Rough onboarding or How to find information in a new organization without disturbing your colleagues

10

Please rate this

AI plays it dirty: Agentic Misalignment

17

Please rate this