Large Language Models Do Not Constitute "Artificial Intelligence"

This article examines the misconception that Large Language Models (LLMs) are equivalent to artificial intelligence, arguing instead that these models are anthropomorphic: they exhibit human-like characteristics and limitations that set them apart from any notion of a singular, true AI.

Introduction

OpenAI's ChatGPT models, which popularized Large Language Models (LLMs) among end users, are commonly reduced to the label "artificial intelligence," and in some cases are even regarded as artificial intelligence itself. This is a fundamentally flawed view: Large Language Models, built on the relatively simple architecture known as the Transformer, represent merely one type of artificial intelligence. The misconception is understandable, however, because Large Language Models, designed with human-like characteristics, are rapidly replacing (and in some positions have already replaced) what we might call "legacy" artificial intelligences and automated processes. Large Language Models are anthropomorphic, and understanding them and integrating them into work processes is not so different from training a human being.

AGI Endeavors and Prevailing Misconceptions

Consequently, I find the contemporary efforts to invent Artificial General Intelligence (AGI), in essence an attempt to recreate God in the world, to be misguided. Regarding Large Language Models, or in popular parlance "Artificial Intelligence," an academic paper states: "Users have observed that Artificial Intelligence differs significantly from traditional search engines, with ChatGPT offering more than a chatbot by performing various functions and engaging with people for hours."1 The same mindset is visible here: the model is treated as a suddenly emergent, quasi-divine power whose invention provokes astonishment. Yet Large Language Models, being anthropomorphic, cannot attain divine status; they are flawed, like every human being. To understand this, we must first examine the historical development of Large Language Models.

Transformer Architecture and the Attention Mechanism

"In this work we propose the Transformer, a model architecture eschewing recurrence and instead relying entirely on an attention mechanism to draw global dependencies between input and output."2 The paper "Attention is All You Need," published in 2017, can be considered the genesis of modern Large Language Models, particularly when compared to inefficient previous attempts such as RNN (Recurrent Neural Networks) and LSTM (Long Short-Term Memory), through its undeniably effective utilization of the "Attention" mechanism and feature. Our first definitive understanding emerges here: when a human is born, they pay attention to their surroundings and learn; Large Language Models are not fundamentally different in this regard.

Completion Mechanism and Hallucination Phenomena

Examining this seminal paper, we arrive at our second definitive understanding about Large Language Models: they operate through a "completion" mechanism, much like an infant learning to speak. They identify relationships between input and output, and continue generating words and sentences toward a conclusion by predicting the next token (symbol) at each step. However, these inputs and outputs are not always logical. For instance, the paper "Language Models are Unsupervised Multitask Learners," which introduced the GPT-2 model, contains the following notable observations: "Investigating GPT-2's errors showed most predictions are valid continuations of the sentence, but are not valid final words."3 and "While GPT-2's performance is exciting for a system without any supervised training, some inspection of its answers and errors suggests GPT-2 often uses simple retrieval based heuristics such as answer with a name from the document in response to a who question."4 As these statements show, Large Language Models suffer from what we term hallucination: the fabrication of information, whereby a model lacking sufficient information produces output that looks "logical" but is incorrect. The effect resembles how a child's brain learns and processes information: as learning occurs, as reinforcement takes place, and as the quality and diversity of input improve, hallucination decreases, but it never disappears entirely.
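
To see the completion mechanism in action, the sketch below runs GPT-2, the model the quoted paper introduced, in a greedy next-token loop via the HuggingFace Transformers library (assuming `transformers` and `torch` are installed). The prompt and the choice of greedy decoding are illustrative assumptions, not details from the paper.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Illustrative prompt; the model simply "completes" it one token at a time.
ids = tokenizer("The capital of France is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(5):
        logits = model(ids).logits            # a score for every vocabulary token
        next_id = logits[0, -1].argmax()      # greedy: take the single most likely token
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tokenizer.decode(ids[0]))
```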

GPT-3 and Fine-Tuning Methodologies

This picture was reinforced by the paper "Language Models are Few-Shot Learners," which introduced GPT-3: "Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning." The quoted passage mentions "fine-tuning." Fine-tuning, though it sounds like a technical term, can be compared directly to a human's ability to construct and synthesize logical sentences of their own volition, and to comprehend meaning. DialoGPT (based on GPT-2) was one of the first publicly available examples of this, but the models of contemporary significance are the proprietary InstructGPT and the GPT-3.5 model behind ChatGPT.
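
What "few-shot, without any gradient updates" means in practice is that the task is demonstrated entirely inside the prompt. The sketch below echoes the English-to-French format illustrated in the GPT-3 paper; the surrounding comments are my own gloss.

```python
# Few-shot prompting: the model infers the task from in-context examples alone;
# its weights are never updated. Format follows the GPT-3 paper's illustration.
few_shot_prompt = """Translate English to French:
sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>"""
# Sent to an autoregressive model, the expected completion is "fromage":
# the pattern is picked up from the prompt itself, not from fine-tuning.
```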

Addressing Misinformation and Establishing Accurate Chronology

Returning to the paper we initially referenced and critiqued: "The introduction of ChatGPT's 3.5 version as a free service in 2023 enabled this concept to rapidly gain prominence both in our country and globally."5 This information is both incorrect and misleading. First, GPT-3.5 was made available to end users in late 2022, when ChatGPT launched as a free research preview. Before that, fields such as NLP (Natural Language Processing), along with tools such as the HuggingFace Transformers library (named after the Transformer architecture introduced in 2017), were already established, though experimental, technologies that had not yet reached the maturity required for widespread adoption.

The Educational Development Analogy

Since the early 2020s, Large Language Models have been developed through fine-tuning, much like a human beginning school. Consider the following quotations: "To make our models safer, more helpful, and more aligned, we use an existing technique called reinforcement learning from human feedback (RLHF). On prompts submitted by our customers to the API, our labelers provide demonstrations of the desired model behavior, and rank several outputs from our models. We then use this data to fine-tune GPT-3."6 "Supervised fine-tuning (SFT). We fine-tune GPT-3 on our labeler demonstrations using supervised learning. We trained for 16 epochs, using a cosine learning rate decay, and residual dropout of 0.2. [...] Despite overfitting after 1 epoch, additional training improves both RM scores and human preference ratings."7 InstructGPT's fine-tuning regime aligns neatly with the "beginning school" analogy, as the sketch below illustrates. It should be noted that InstructGPT was not the only Large Language Model of its time; others such as BLOOM (classmates, to extend the analogy) emerged during the same period, and development began to gain momentum and diversity. Today that diversification is best known through LLaMA and Mistral.
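
As a rough sketch of what this supervised fine-tuning (SFT) step looks like in code, the snippet below fine-tunes GPT-2 as a small, public stand-in for GPT-3, reusing the hyperparameters named in the quotation (16 epochs, cosine learning-rate decay, residual dropout of 0.2). The toy demonstrations, optimizer, and learning rate are illustrative assumptions, not details from the report.

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2", resid_pdrop=0.2)  # residual dropout 0.2

# Hypothetical stand-ins for labeler demonstrations (prompt + desired behavior).
demos = ["Q: What is 2 + 2?\nA: 4", "Q: Name a primary color.\nA: Red"]
batches = [tokenizer(d, return_tensors="pt").input_ids for d in demos]

epochs = 16                                                  # as in the quotation
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)   # illustrative learning rate
scheduler = CosineAnnealingLR(optimizer, T_max=epochs * len(batches))  # cosine decay

model.train()
for _ in range(epochs):
    for ids in batches:
        loss = model(ids, labels=ids).loss    # next-token cross-entropy on the demonstration
        loss.backward()
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```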

Achieving Doctoral-Level Performance

Quoting from a contemporary blog post: "In our tests, the next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions."8 Within our analogy that "Large Language Models are anthropomorphic," they have now reached the level of a doctoral student, and the comparison is made by OpenAI itself.
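
OpenAI credits this leap to the model spending more time reasoning before it answers, which builds on the Chain of Thought idea mentioned in the conclusion below. A minimal, hypothetical illustration of chain-of-thought prompting, not drawn from any cited source:

```python
# Chain-of-thought prompting: ask the model to reason step by step before
# answering. The question and wording here are purely illustrative.
cot_prompt = """Q: A library had 120 books, lent out 45, and received 30 new ones.
How many books does it have now?
A: Let's think step by step."""
# A typical completion first works through the intermediate arithmetic
# (120 - 45 = 75, then 75 + 30 = 105) and only then states the answer: 105.
```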

Conclusion

Today, true to their anthropomorphic origins, Large Language Models perform their tasks through techniques such as Fine-Tuning, Retrieval Augmented Generation (RAG), and Chain of Thought: imperfectly, yet with considerable success. However, there is not yet an AGI model based on or evolved from Large Language Models, nor a singular "ARTIFICIAL INTELLIGENCE," because Large Language Models are anthropomorphic. They are not divine. Consequently, Large Language Models do not constitute artificial intelligence, because they are flawed, like human beings.

References

  1. Aksu, Ferhat. "The Future of Professions with Artificial Intelligence: ChatGPT Artificial Intelligence Chatbot Example." Anatolia Science and Technology Journal, 2024, p. 25.
  2. Vaswani, Ashish, et al. "Attention Is All You Need." arXiv, 2017, https://arxiv.org/abs/1706.03762.
  3. Radford, Alec, et al. "Language Models are Unsupervised Multitask Learners." OpenAI, 2019.
  4. Radford, Alec, et al. "Language Models are Unsupervised Multitask Learners." OpenAI, 2019.
  5. Aksu, Ferhat. "The Future of Professions with Artificial Intelligence: ChatGPT Artificial Intelligence Chatbot Example." Anatolia Science and Technology Journal, 2024, p. 28.
  6. OpenAI. "Training Language Models to Follow Instructions with Human Feedback." OpenAI, 2022.
  7. OpenAI. "Training Language Models to Follow Instructions with Human Feedback." OpenAI, 2022.
  8. OpenAI. "Introducing OpenAI o1-preview." OpenAI, 2024.