Large language models (LLMs) such as OpenAI’s ChatGPT continue to revolutionize our relationship with technology as their capacity to generate digital content for a variety of user applications expands. Quite famously, however, they often struggle with math problems, hallucinating verifiably false answers with unwavering confidence. This seems like odd behavior to the many lay users who (rightly) understand that computers operate on deterministic principles, just as arithmetic does. If that’s true, why does this problem continue to plague even the newest models? The answer is that treating every string of text (even one that looks like a math problem) as a deterministic mathematical calculation is at odds with how the stochastic machinery underpinning modern LLM architecture actually works.
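To make that contrast concrete, here is a purely illustrative Python sketch. The candidate tokens and their probabilities are invented for the example, not taken from any real model; the point is only that a calculator evaluates the expression deterministically, while a language model samples its “answer” from a probability distribution over next tokens.

```python
import random

# Purely illustrative: a toy next-token distribution a model might assign
# after a prompt like "What is 17 * 24? The answer is". The tokens and
# probabilities below are invented, not drawn from any real model.
candidate_tokens = {
    "408": 0.46,   # the correct product
    "418": 0.21,   # a plausible-looking near miss
    "398": 0.18,
    "428": 0.15,
}

def sample_next_token(distribution):
    """Sample one token in proportion to its probability, as an LLM decoder does."""
    tokens, weights = zip(*distribution.items())
    return random.choices(tokens, weights=weights, k=1)[0]

# Deterministic arithmetic: the same answer every time.
print(17 * 24)                               # always 408

# Stochastic generation: usually "408", but sometimes a confident wrong answer.
print(sample_next_token(candidate_tokens))
```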
This series will trace how we got from deterministic to probabilistic approaches (part 1), how those probabilistic approaches lead back to the problem of hallucinated math answers (part 2), and will end by sketching some of the ways contemporary AI research has tried to address these issues (part 3).
To begin, let’s look at how AI research shifted from deterministic, traditional algorithmic methods (“symbolic” AI) to the probabilistic, stochastic approach that underlies modern LLMs like ChatGPT. This shift was driven both by the move from purely symbolic to statistical, “subsymbolic” approaches and by the resulting cycles of declining investment and waning academic interest, periods that have retrospectively been dubbed “AI winters.”
AI Winters: A Brief History of Symbolic Approaches to AI
The term "AI winter" refers to a period of reduced funding and interest in artificial intelligence research. Despite its more recent appearance in the limelight, the history of AI research has been marked by extended periods of stagnation. The first “AI Winters” followed the boom in funding and research that surrounded work using rule-based, traditional algorithmic approaches that were intended to produce knowledge about relatively well-constrained and specific domains, with the ultimate goal of reverse engineering the knowledge available from human experts.
This line of work began in the late 1950s and early 1960s and culminated in the broad adoption across various industries of so-called “expert systems” developed in the 1970s and 1980s. These systems were designed to take “knowledge bases” (consisting largely of if-then conditionals reducible to first-order logic) and then use “inference engines” to deduce further conclusions from them. Notably, the earliest chatbots (like 1966’s ELIZA) were built using this symbolic approach.
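Here is a minimal sketch of that pattern in Python, assuming a toy set of invented facts and if-then rules (they come from no real expert system): a small knowledge base plus a forward-chaining inference engine that keeps applying rules until no new conclusions follow.

```python
# Toy "knowledge base": known facts plus if-then rules.
# Both are invented for illustration, not drawn from a real expert system.
facts = {"has_fever", "has_rash"}

# Each rule: if every condition in the set holds, conclude the consequent.
rules = [
    ({"has_fever", "has_rash"}, "suspect_measles"),
    ({"suspect_measles"}, "recommend_specialist"),
]

def forward_chain(facts, rules):
    """A tiny inference engine: deduce everything the rules entail from the facts."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

print(forward_chain(facts, rules))
# {'has_fever', 'has_rash', 'suspect_measles', 'recommend_specialist'}
```

The key property of this architecture is that every conclusion follows deductively from the rules, which is exactly why it works well in narrow domains and breaks down when the rules required to cover a domain become impractically numerous.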
Through the ’70s and ’80s, such systems were bought and deployed by a number of companies quite successfully, but they became increasingly expensive and limited as personal computers grew more popular and affordable, eliminating the need for the single-purpose hardware and software these expert systems required. In addition, researchers increasingly ran up against hardware limits as knowledge bases grew larger and demanded more compute. Finally, growing acknowledgement of the commonsense knowledge problem compounded concerns about computational complexity: it became clear that many problems (computer vision, robotics) required far too much data to be modeled with purely rules-based systems, or that the problem domains themselves were too poorly understood to be modeled in the top-down way the symbolic approach demands.
The cumulative effect of these developments was disillusionment among investors, no doubt also fueled by the overly optimistic predictions that had proliferated since the ’60s and failed to materialize, including a number of pronouncements from Marvin Minsky himself. By the mid-’70s, funding for the DARPA-backed AI research Minsky had spearheaded had been cut, following the passage of the Mansfield Amendments of 1969 and 1973, which required military research funding, previously available with few restrictions, to be explicitly tied to short-term, specific military applications. Combined with the devastating publication of the Lighthill Report in the UK, these and related developments stalled funding as well as public and private interest in AI research, bringing on the first “AI winter” in the mid-’70s.
A second “winter” followed the widespread adoption of the aforementioned expert systems, arriving in the late ’80s and early ’90s as issues with purely symbolic approaches piled up again. These included the threat of combinatorial explosion (exacerbated by growing recognition of the commonsense knowledge problem), challenges from research in robotics and embodied cognition (such as that of Rodney Brooks and Hans Moravec), and the increasing feasibility of running business applications on general-purpose personal computers rather than specialized hardware like Lisp machines.
Attacks on the symbolic approach were not limited to its public and institutional failures, either. Technical successes grounded in newer theories from cognitive science, such as Yann LeCun’s application of neural networks to handwritten digit recognition, heralded a broader shift in how machine intelligence was understood, one that abandoned the older assumption equating intelligence with agents engaged purely in rules-based symbolic manipulation yielding deductive certainty. That shift would be largely complete by 2017, following the landmark publication of “Attention Is All You Need.”
For more of the story of how we’ve gotten to where things are now, stay tuned for part 2 of this series.