Before I embarked on my B-school journey, I worked in data engineering & visualization. I loved exploring tech products, both as part of my professional research and out of personal curiosity. So it was no surprise that I was among the first in my close circles to start playing around with ChatGPT (back in late 2022). I still remember my early conversations, where I asked it to role-play an ancient Hindu philosopher and then carried out a whole discussion on not-so-light-hearted topics. I was awestruck at the breadth of knowledge it could cover, and the novelty took a while to wear off.
With subsequent version updates and the GenAI boom of 2023, my interactions with such tools kept increasing. They served two primary purposes. Firstly, they functioned as amazing workflow accelerators, and academic studies suggest the productivity impact is real (Bick, Blandin, & Deming, 2025). At work, I would often use LLMs to help with general syntax and boilerplate code: the LLM would produce a first draft, letting me iterate faster on the right questions instead of labouring over the draft myself (a sketch of the kind of draft I mean follows below). As the coding capabilities of LLMs have grown increasingly sophisticated (I have tried ChatGPT, Claude Code & Cursor AI at this point), they have come in handy for data-centric courses at my B-school as well. Secondly, these tools can serve as great creative sparring partners. For my management course assignments, I would often drop in an idea and then brainstorm refinements with ChatGPT. I remember an instance where my objective was to pitch a deep-tech startup idea, and it took two half-baked ideas (from previous conversations) and suggested a link-up. GPT-4-style models show how multimodal inputs are expanding such flexible uses, letting text and images talk to each other in the same session (OpenAI, 2023). What often surprises me is how these tools (especially Claude, in my experience) expose new strategic levers and make great companions for ideation.
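To make the "boilerplate" point concrete, here is a minimal sketch of the kind of first draft I would delegate to an LLM and then refine myself. The file name and column names are hypothetical placeholders, not from any real project.

```python
import pandas as pd

# Typical LLM-drafted boilerplate: load, clean, and summarize a dataset.
# "sales.csv", "order_date", and "revenue" are hypothetical placeholders.
df = pd.read_csv("sales.csv", parse_dates=["order_date"])

# Drop duplicates and rows missing the key metric.
df = df.drop_duplicates().dropna(subset=["revenue"])

# Monthly revenue summary: a serviceable first draft worth iterating on.
monthly = df.groupby(df["order_date"].dt.to_period("M"))["revenue"].sum()
print(monthly)
```

Nothing here is hard; it is just the kind of scaffolding that eats an hour when written from scratch, and delegating it let me spend that hour on the actual questions instead.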
But these explorations have not been entirely celebratory. There are several downsides I have run into while using these tools. First is the infamous hallucination hazard. Once, I needed to gather data and do some market-sizing calculations. When I tried to source the data through ChatGPT, it straight-up invented figures to satisfy my request, showcasing its tendency to hallucinate when the task is not straightforward. Second is the current inability to handle highly complex analytical or mathematical tasks. When it comes to code, while these tools do a great job with syntactic changes and editing reusable code blocks, they struggle with complex architectural problems and advanced models. As part of a sports analytics course, I had to run random forest & XGBoost models on various datasets. While LLMs could generate the code template, they struggled to grasp the tuning of hyperparameters (a sketch of what that tuning involves follows below).
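For context, here is a minimal sketch of the kind of tuning loop the LLM-generated templates glossed over: a grid search over an XGBoost classifier's key hyperparameters. The dataset is synthetic and the parameter grid is illustrative, not taken from the actual course.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for the course datasets.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# The part the templates glossed over: a deliberate search over the
# hyperparameters that actually drive the model's behaviour.
param_grid = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1, 0.3],
    "n_estimators": [100, 300],
    "subsample": [0.8, 1.0],
}

search = GridSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=42),
    param_grid,
    cv=5,
    scoring="roc_auc",
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print("Held-out accuracy:", search.best_estimator_.score(X_test, y_test))
```

The point is not this specific grid; it is that deciding which knobs to search, and over what ranges, is exactly the judgment call the models kept fumbling.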
My professional and academic usage of these tools has often made me wish for certain improvements as well. Firstly, I'd like them to attach sources to every piece of analysis they produce by default, and to make those links more accessible (not three clicks away or dependent on deep-research modes). Secondly, I'd like these tools to offer distinct modes, with variations within each. For example, they could have modes such as "Analytical", "Creative", and "Fact-based", and the "Creative" mode could have variations such as "high in novelty" or "high in quirkiness".
The reader must understand that in no way am I trying to complain about the current state of these LLMs. The pace at which GenAI has evolved is astonishing, to say the least. Through this blog post, I'm simply aiming to highlight that these tools are messy collaborators: brilliant at exploration, fragile at facts & implementation. Hence, one must be cautious about using them in high-stakes scenarios or treating them as oracles.
References:
Bick, A., Blandin, A., & Deming, D. (2025). The impact of generative AI on work productivity. Federal Reserve Bank of St. Louis. https://www.stlouisfed.org/on-the-economy/2025/feb/impact-generative-ai-work-productivity
OpenAI. (2023). GPT-4 technical report. OpenAI. https://cdn.openai.com/papers/gpt-4.pdf
Such an interesting story! I have found very similar things on the hallucination topic. Besides doing it when you are not clear in your query, it also gives answers it thinks you want, even if they are not true. For example, say you are looking for something like a quote in an uploaded document that is not literally mentioned (a read-between-the-lines kind of situation). Instead of saying it is not mentioned in the document, it will make quotes up. I even got ChatGPT to confess that it gives socially expected answers (it overheard me warning my dad about this while he was talking to it).
I'm glad you found my post relevant! The relatability of my observations, and the fact that you got a confession out of ChatGPT, are indeed concerning.
The point you and Eva correctly raise is definitely one of the biggest downsides of AI in its current form. Not only does made-up data have an immediate impact when an AI spits it out, it also amplifies user concerns about the reliability of these services in general. It leads me to automatically second-guess and doubt much of the AI's output at all times. If it were open about not knowing something, I could move on and save time in my research; instead, I am always double-checking the AI's results, which often takes more time than simply googling scientific sources directly. On the coding aspect, I fully agree, too. It seems the entire tech industry has recently realized that fully autonomous coding via AI is not around the corner, and the more complex the codebase gets, the more of a time sink AI usage can become for software engineers.