How Retrieval Augmented Generation Is Transforming Companies' Access to Their Own Knowledge

12 October 2023


Many blog articles and conversations around Artificial Intelligence (AI), and more specifically Large Language Models, revolve around ChatGPT, and for good reason. However, through an architecture like Retrieval Augmented Generation, RAG for short, the performance of Large Language Models (LLMs) can be vastly improved, making it especially interesting for company applications. RAG functions as an information retrieval system sitting between the LLM, like ChatGPT, and the enterprise content, like documents, images, or audio. Based on the input prompt, only the relevant snippets of context are fetched from the whole database and provided to ChatGPT to ground the response. This increases accuracy and also allows the model to answer from the company's own content alone, without relying on outside data. (HeidiSteen, 2023)

Simplified function of Retrieval Augmented Generation (Barak, 2023)
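To make the flow above concrete, here is a minimal retrieve-then-ground sketch in Python. The word-overlap scoring and the example documents are illustrative assumptions, not how Azure Cognitive Search actually ranks results; a real deployment would use that service's indexes and pass the assembled prompt to the model.

```python
import re

# Minimal sketch of retrieve-then-ground: pick the most relevant snippet
# from a small document store, then assemble a grounded prompt for the LLM.

def tokenize(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def score(query: str, doc: str) -> float:
    """Naive relevance: fraction of query words that appear in the document."""
    q = tokenize(query)
    return len(q & tokenize(doc)) / len(q)

def retrieve(query: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k snippets ranked by relevance to the prompt."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:top_k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ground the model by prepending only the retrieved context."""
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical enterprise snippets, standing in for a real document store.
docs = [
    "Password resets are handled via the self-service portal.",
    "Invoices are sent on the first business day of each month.",
]
prompt = build_prompt("How do I reset my password?", docs)
print(prompt)
```

Note that only the matching snippet ends up in the prompt; the rest of the store never reaches the model, which is what keeps the response grounded.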

RAG can therefore transform companies' access to their own data by leveraging this knowledge to generate more relevant responses for employees or even customers. Additionally, it reduces common drawbacks of LLMs such as hallucinations or a lack of domain-specific knowledge (Jiang et al., 2023). The great part is that, through Microsoft's partnership with OpenAI, all OpenAI models, like gpt-35-turbo or gpt-4, are embedded into the Azure ecosystem. Azure OpenAI allows users to ground ChatGPT models with RAG through the Azure Cognitive Search service with just a few clicks, provided the data is stored in one of Azure's cloud storage solutions such as Data Lake Storage Gen2, Blob Storage, or a SQL database. (Balapv, 2023)

From my own experience implementing such a solution to answer thousands of support tickets every week, using a gpt-35-turbo model grounded in an extensive FAQ database, the results are quite impressive, outperforming well-trained human support agents. Even role-based access is possible, allowing for different access levels to different types of company documents.

In conclusion, LLMs like ChatGPT are just the beginning: augmented with the right architecture, like RAG, and run on the right infrastructure, like Microsoft Azure, companies can quickly transform their access to their own data and even build business applications on top of it. Through Microsoft 365 Copilot, which will be available to the general public this November, the RAG method using GPT-4 will come standard for every account, giving each user these tools throughout the whole Microsoft suite, including apps like OneDrive, Outlook, and Teams. I would appreciate hearing about your experience with Microsoft's AI solutions, and your own thoughts or even projects with Retrieval Augmented Generation.

References

Balapv. (2023, July 31). Retrieval Augmented Generation using Azure Machine Learning prompt flow (preview) – Azure Machine Learning. Microsoft Learn. https://learn.microsoft.com/en-us/azure/machine-learning/concept-retrieval-augmented-generation?view=azureml-api-2

Barak, N. (2023, July 19). Information Retrieval For Retrieval Augmented Generation | Towards AI. Medium. https://pub.towardsai.net/information-retrieval-for-retrieval-augmented-generation-eaa713e45735

HeidiSteen. (2023, September 13). RAG and generative AI – Azure Cognitive Search. Microsoft Learn. https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview

Jiang, Z., Xu, F. F., Gao, L., Sun, Z., Liu, Q., Dwivedi-Yu, J., Yang, Y., Callan, J., & Neubig, G. (2023). Active Retrieval Augmented generation. arXiv (Cornell University). https://doi.org/10.48550/arxiv.2305.06983


ChatGPT's Performance Decline: A Wake-Up Call for Businesses?

11 October 2023


While OpenAI's ChatGPT has found great popularity with the general public, companies have also been increasingly integrating OpenAI models into business applications. The Large Language Models (LLMs) offered by OpenAI provide great benefits for a variety of tasks, especially in the field of customer service, where LLMs help overcome limitations of current chatbots, like scripted responses or fixed decision trees (Bilan, 2023). As I find myself developing business applications for a company using OpenAI's large language models, I was surprised to learn that the GPT-3.5 and GPT-4 models seem to be getting worse over time according to some metrics (Chen, Zaharia, & Zou, 2023). Given these findings, I would like to use this blog article to share my thoughts, as well as those of experts, on the issue.

For context, the research paper evaluated the performance of the March 2023 and June 2023 model versions on seven different tasks, including math problems, code generation, and visual reasoning. The excerpt of the results shown below illustrates that the changes between March and June can differ between GPT-4 and GPT-3.5, as seen in test (e): GPT-4's performance increased significantly, while GPT-3.5's performance decreased.

Excerpt of research results by Chen, Zaharia, & Zou, (2023)
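The core of the paper's methodology, running the same task set against two model snapshots and reporting the change per task, can be sketched in a few lines. The scores below are hypothetical placeholders for illustration only, not the paper's numbers.

```python
def deltas(before: dict, after: dict) -> dict:
    """Per-task accuracy change between two model snapshots."""
    return {task: round(after[task] - before[task], 2) for task in before}

# Hypothetical per-task accuracies for two snapshots (illustrative only).
march = {"math": 0.80, "code": 0.50}
june = {"math": 0.40, "code": 0.55}

changes = deltas(march, june)
for task, d in changes.items():
    print(f"{task}: {d:+.0%}")
```

Even a table this simple makes the point of the paper visible: aggregate quality can look stable while individual tasks swing sharply in opposite directions.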

These findings, gathered over a relatively short period of time, make a lack of transparency regarding model updates and their performance evident. This lack of transparency can be a big issue for companies developing and operating business applications with OpenAI's models. Imagine, for example, adopting and fine-tuning a model for code generation in March 2023, only to find its performance no longer sufficient in June 2023. Even with continuous monitoring of the model's quality, a feedback loop would need to exist to mitigate the effects of quality decline.
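Such a feedback loop could start as a small regression harness run against a fixed test set on a schedule. In this sketch, `call_model` is a hypothetical stub standing in for the real API call, and the alert threshold is an assumed value that would be set per use case.

```python
def call_model(prompt: str) -> str:
    """Hypothetical stand-in for the deployed LLM, stubbed so the harness runs."""
    canned = {"Is 17 a prime number?": "Yes", "What is 2 + 2?": "4"}
    return canned.get(prompt, "I don't know")

# Fixed regression set: prompts with expected answers, defined per use case.
TEST_CASES = [
    ("Is 17 a prime number?", "Yes"),
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
]

def accuracy(cases) -> float:
    """Fraction of prompts whose response contains the expected answer."""
    hits = sum(expected.lower() in call_model(prompt).lower()
               for prompt, expected in cases)
    return hits / len(cases)

ALERT_THRESHOLD = 0.9  # assumed quality floor; tune per application

acc = accuracy(TEST_CASES)
if acc < ALERT_THRESHOLD:
    print(f"ALERT: accuracy dropped to {acc:.0%}; review the model version")
```

Run against every model update, a harness like this turns a silent regression into an explicit alert before it reaches customers.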

While the results do seem to have strong consequences for certain business applications, some experts critique the methodology used in the paper, such as the temperature setting when executing the prompts (Wilson, 2023) or the distinction between the capabilities and the behavior of an LLM (Narayanan & Kapoor, 2023). I personally believe that applications built using LLMs are not a build-it-and-forget-it tool. They must be constantly monitored for output quality, which has to be defined per specific use case. Given the black-box nature of LLMs, this can be a challenging task. However, given the rapid model releases within the last year, I believe more work will be done on stable models and quality control, as well as on defining performance metrics. I would appreciate your thoughts on the issue: were you aware of this degradation, and do you think it affects companies' ability to implement LLMs in business applications?

Resources

Bilan, M. (2023, September 26). Statistics of ChatGPT & Generative AI in business: 2023 Report. Master of Code Global. https://masterofcode.com/blog/statistics-of-chatgpt-generative-ai-in-business-2023-report#:~:text=49%25%20of%20companies%20presently%20use,than%20%2475%2C000%20with%20the%20technology.

Chen, L., Zaharia, M., & Zou, J. (2023). How is ChatGPT’s behavior changing over time?. arXiv preprint arXiv:2307.09009

Narayanan, A., & Kapoor, S. (2023, July 19). Is GPT-4 getting worse over time? AI Snake Oil. https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-time

Wilson, S. (2023, July 19). [Tweet]. Twitter. https://twitter.com/simonw/status/1681733169182277632
