GPT-4 and Other LLMs: What Large Language Models Bring to the Party

Published July 13, 2023

Continuing from our previous blog post, let’s dive into large language models (LLMs), their capabilities, and their limitations.

Large language models like GPT have revolutionized natural language processing in AI. With their remarkable reasoning capabilities, they surpass earlier models in comprehending complex questions. They also facilitate natural conversational interactions and contextual understanding, resembling human-analyst interactions. Moreover, the incorporation of extensions expands their utility. However, alongside their benefits, these models have limitations that restrict their effectiveness in certain domains, including Business Intelligence analysis. Recognizing these strengths and limitations is essential for harnessing the potential of LLMs while acknowledging the unique expertise of human analysts.

Reasoning Engine

For reasons that are not yet fully understood, large transformer-based language models demonstrate a high capacity for comprehension and reasoning. They go far beyond the simple parsing and tokenization capabilities of first-generation language models and can understand the intent and meaning behind a question to apply reasoning to answer novel questions that were not a part of their training dataset. This remarkable capability makes this technology particularly suitable for research questions where a nuanced and complex reasoning process is necessary to formulate a meaningful answer.

Natural Conversational Interactions

Newly developed large language models like GPT also possess the ability to conduct true conversations with a user in which follow-up questions are raised and answered. This allows the user to ask the model to refine the answer based on additional criteria not presented in the original question. This type of back-and-forth interaction dynamic mirrors how users interact with human analysts. They ask an initial question, the analyst responds, and then they often provide follow-ups.

Understanding of Context

The GPT experience is particularly compelling because of the creativity and thoughtfulness in the answers provided and because it can set the proper context for questions. A question can be asked broadly in a general fashion, or the model can be told to constrain its answer to a specific set of source materials or to present results in a certain voice.

Tools & Extensions

One of the most interesting new capabilities of GPT is its ability to incorporate extensions. Much as an app ecosystem significantly increases the utility of a mobile phone, the incorporation of extensions promises to increase the utility of the language model. An extension can give the model access to additional tools. For example, Wolfram Alpha has made an extensive computational library available as an extension, allowing GPT to incorporate complex computations in formulating its answers. Other extensions can be used to perform external actions such as sending an email or a Slack message.
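
To make the idea of extensions concrete, here is a minimal sketch of how an application might route a model's tool request to local code. It assumes the model emits a JSON object naming a tool and its arguments; that JSON shape, the tool names, and the functions `compute`, `send_message`, and `dispatch` are all hypothetical and do not represent any specific vendor's extension API.

```python
import json
import operator

# Hypothetical tools; a real extension would call an external service.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def compute(expression: str) -> str:
    """Stand-in for a computational extension; handles only 'a op b'."""
    left, op, right = expression.split()
    return str(OPS[op](float(left), float(right)))


def send_message(channel: str, text: str) -> str:
    """Stand-in for a messaging extension; nothing is actually sent."""
    return f"queued message to {channel}: {text}"


TOOLS = {"compute": compute, "send_message": send_message}


def dispatch(model_output: str) -> str:
    """Run the tool named in a JSON tool request and return its result as text."""
    request = json.loads(model_output)
    tool = TOOLS.get(request["tool"])
    if tool is None:
        return f"unknown tool: {request['tool']}"
    return tool(**request["arguments"])


# Assumed shape of a tool request produced by the model.
request = '{"tool": "compute", "arguments": {"expression": "12.5 * 8"}}'
result = dispatch(request)
print(result)
```

In a setup like this, the tool's result would be passed back to the model as additional context so it can fold the computed value into its answer.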

Limitations of Large Language Models

For all their many benefits, large language models also come with key limitations that significantly reduce their effectiveness as a BI analyst:

High-Volume Data Analysis

While large language models are trained on trillions of tokens of text and conversation drawn from across the internet, there are significant limits on how much additional information can be provided as context when asking a question. While the model can process smaller datasets, such as the content of a spreadsheet, as input, it is not possible to have the model directly reference the full dataset in a large table, let alone the data in an entire data warehouse, to answer a question.
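
A rough way to see the size constraint is to estimate how many tokens a table would occupy if pasted into a prompt. The sketch below is illustrative only: the four-characters-per-token ratio is a common rule of thumb rather than an exact tokenizer, and the 8,000-token budget and the table contents are assumptions chosen for the example.

```python
import csv
import io

CHARS_PER_TOKEN = 4            # rough rule of thumb, not an exact tokenizer
CONTEXT_BUDGET_TOKENS = 8_000  # hypothetical budget left for data in the prompt


def estimate_tokens(text: str) -> int:
    """Approximate the token count of a string, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def rows_to_csv(rows) -> str:
    """Serialize rows the way they might be pasted into a prompt."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def fits_in_context(rows, budget: int = CONTEXT_BUDGET_TOKENS) -> bool:
    """Check whether the serialized rows stay within the token budget."""
    return estimate_tokens(rows_to_csv(rows)) <= budget


header = ["order_id", "region", "amount"]
small_table = [header] + [[i, "EMEA", 100 + i] for i in range(50)]
large_table = [header] + [[i, "EMEA", 100 + i] for i in range(100_000)]

print(fits_in_context(small_table))  # spreadsheet-sized input
print(fits_in_context(large_table))  # warehouse-sized input
```

Under these assumptions the spreadsheet-sized table fits and the larger one does not, which is why large datasets typically have to be aggregated or queried outside the model first.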

Applying Data Security

Applying a security model so that users only receive results they are authorized to view is essential for any enterprise data solution. However, language models like GPT have no notion of user or role-based security for information access. As a result, data security must be applied to the information before it is passed to GPT for analysis. This places a further significant constraint on the amount of custom information (i.e., information not already part of the language model) that language models can use in formulating responses to queries.
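
One way to picture applying security before the model sees anything is a filter that runs ahead of prompt construction. This is a minimal sketch under stated assumptions: the `User` type, the region-based rule, and the sample `SALES` rows are hypothetical stand-ins for an organization's real security model and data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    name: str
    allowed_regions: frozenset  # hypothetical row-level security rule


def authorized_rows(user: User, rows: list) -> list:
    """Keep only the rows this user is permitted to see."""
    return [row for row in rows if row["region"] in user.allowed_regions]


def build_prompt(user: User, question: str, rows: list) -> str:
    """Filter first, then serialize, so unauthorized rows never reach the model."""
    visible = authorized_rows(user, rows)
    lines = [f"{row['region']},{row['revenue']}" for row in visible]
    return "Data (region,revenue):\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


# Illustrative data only.
SALES = [
    {"region": "EMEA", "revenue": 120},
    {"region": "APAC", "revenue": 95},
    {"region": "AMER", "revenue": 210},
]

analyst = User("dana", frozenset({"EMEA", "APAC"}))
prompt = build_prompt(analyst, "Which region had the highest revenue?", SALES)
print(prompt)
```

The filtering happens in ordinary application code, not inside the model, so the model can only reason over rows the user was already entitled to see.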

Accuracy & Consistency

While reasoning and creativity are remarkable emergent behaviors of large language models, it has not yet been possible to train the models to produce reliably accurate and consistent results. Results are non-deterministic: if you ask the model the same question multiple times, you are not guaranteed to get the same answer. Additionally, large language models continue to be plagued by hallucinations, a phenomenon in which the model makes up an answer to a question instead of grounding its response in fact. These are attributes that no business user wants in their BI analyst.
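
One common source of this variation is that models typically pick each next token by sampling from a probability distribution instead of always taking the most likely option. The toy sketch below illustrates that mechanism only; the candidate tokens, their scores, and the `sample_token` helper are made up for the example, and hosted models may vary between runs for other reasons as well.

```python
import math
import random


def sample_token(scores: dict, temperature: float, rng: random.Random) -> str:
    """Pick a token: greedy when temperature is 0, otherwise softmax sampling."""
    if temperature == 0:
        return max(scores, key=scores.get)
    tokens = list(scores)
    top = max(scores.values())  # subtract the max for numerical stability
    weights = [math.exp((scores[t] - top) / temperature) for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical scores for the next token in "Revenue grew by ...".
SCORES = {"12%": 2.0, "11%": 1.6, "15%": 0.5}

rng = random.Random(0)
greedy = {sample_token(SCORES, 0, rng) for _ in range(200)}
sampled = {sample_token(SCORES, 1.0, rng) for _ in range(200)}

print(greedy)   # a single distinct answer across 200 runs
print(sampled)  # the distinct answers seen across 200 sampled runs
```

In this toy setting, greedy selection always returns the same token, while sampling at a temperature of 1.0 can return different figures for the same question, which is exactly the behavior a BI user would find unacceptable.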

Intuition & Judgement

A good analyst develops an intuition for the data in his organization. He does not need to carefully regenerate answers to questions by running SQL statements or reviewing report results. Instead, he first uses his intuition to determine whether a generated response is consistent with prior known results. If it is inconsistent, he applies additional rigor to the analysis by investigating whether there are data quality issues with the data source or by confirming the results against a different data source. Exercising this type of rigor and judgment in formulating answers to queries is well outside the capability set of today’s language models.

Incorporation of Externalities

A competent human analyst will always consider externalities in forming a response to an inquiry. If a number seems particularly low or high, he might consult with the Data Ops team to see if there are any known issues with the data pipeline. If there are no known issues with the data, he will consider what recent changes in the business could affect the results. Finally, he might look at the competitive landscape to explain the results. This aspect of the analysis is crucial to providing meaningful answers to complex questions, and it is not possible with today’s large language models.

Conclusion

Large language models like GPT-4 have significantly advanced natural language processing by offering key capabilities. They possess a remarkable reasoning engine that goes beyond simple parsing, comprehends the intent behind questions, and applies reasoning to answer novel inquiries. These models can engage in natural conversational interactions, allowing users to refine answers through follow-up questions, resembling interactions with human analysts. Additionally, they demonstrate an understanding of context, providing creative and thoughtful responses while accommodating specific criteria or source materials. The incorporation of extensions further enhances their utility, granting access to additional tools and enabling actions like complex computations or communication. These advancements make large language models valuable for research, conversations, and improved decision-making.
