
Resisting the urge to be impressed, knowing what we talk about when we talk about AI

01 June 2022 Hi-network.com

The barrage of new AI models released by the likes of DeepMind, Google, Meta and OpenAI is intensifying. Each of them is different in some way, each of them renewing the conversation about their achievements, applications, and implications.

Imagen, DALL-E 2, Gato, GPT-3 and the other AI models that came before them are all impressive, but maybe not for the reasons you think. Here's a brief account of where we are in the AI race, and what we have learned so far.

The strengths and weaknesses of large language models

At this pace, it's getting harder to even keep track of releases, let alone analyze them. Let's start this timeline of sorts with GPT-3. We choose GPT-3 as the baseline and the starting point for this timeline for a number of reasons.

OpenAI's creation was announced in May 2020, which already looks like a lifetime ago. That is enough time for OpenAI to have created a commercial service around GPT-3, exposing it as an API via a partnership with Microsoft.

By now, there is a growing number of applications that utilize GPT-3 under the hood to offer services to end-users. Some of these applications are not much more than glorified marketing copy generators -- thin wrappers around GPT-3's API. Others, like Viable, have customized GPT-3 to tailor it to their use and bypass its flaws.

GPT-3 is a Large Language Model (LLM), with "Large" referring to the number of parameters the model features. The current consensus among AI experts seems to be that the larger the model, i.e. the more parameters it has, the better it will perform. As a point of reference, GPT-3 has 175 billion parameters, while BERT, the iconic LLM that Google released in 2018 and uses to power its search engine today, had 110 million parameters.
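To put those two figures side by side, here is a back-of-the-envelope calculation in Python. It uses the parameter counts quoted above and assumes, purely for illustration, that every weight is stored as a 16-bit float (2 bytes); it ignores activations, optimizer state and everything else a real deployment needs.

```python
# Parameter counts as quoted in the article.
GPT3_PARAMS = 175_000_000_000  # GPT-3 (OpenAI, 2020)
BERT_PARAMS = 110_000_000      # BERT (Google, 2018); this is the BERT-base size

# Assumption for illustration only: weights stored as 16-bit floats (2 bytes each).
BYTES_PER_PARAM_FP16 = 2


def size_ratio(large: int, small: int) -> float:
    """How many times more parameters `large` has than `small`."""
    return large / small


def weights_gigabytes(params: int, bytes_per_param: int = BYTES_PER_PARAM_FP16) -> float:
    """Approximate memory needed just to hold the weights, in GB (10**9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    print(f"GPT-3 / BERT parameter ratio: {size_ratio(GPT3_PARAMS, BERT_PARAMS):,.0f}x")
    print(f"GPT-3 weights at fp16: {weights_gigabytes(GPT3_PARAMS):,.0f} GB")
    print(f"BERT weights at fp16:  {weights_gigabytes(BERT_PARAMS):,.2f} GB")
```

Under these assumptions GPT-3 has roughly 1,600 times as many parameters as BERT, and its weights alone come to about 350 GB, versus well under 1 GB for BERT.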

The idea behind LLMs is simple: use massive datasets of human-produced knowledge to train machine learning algorithms, with the goal of producing models that simulate how humans use language. The fact that GPT-3 has been made accessible to a broader audience, and is also used commercially, has made it the target of both praise and criticism.

As Steven Johnson wrote in The New York Times, GPT-3 can "write original prose with mind-boggling fluency". That seems to tempt people, Johnson included, to wonder whether there actually is a "ghost in the shell". GPT-3 seems to be manipulating higher-order concepts and putting them into new combinations, rather than just mimicking patterns of text, Johnson writes. The keyword here, however, is "seems".

Critics like Gary Marcus, Gary N. Smith and Emily Bender, some of whom Johnson also quotes, have pointed out GPT-3's fundamental flaws on the most basic level. To use the words that Bender and her co-authors chose as the title of the now famous research paper that got Timnit Gebru and Margaret Mitchell ousted from Google, LLMs are "stochastic parrots".

The mechanism by which LLMs predict word after word to derive their prose is essentially regurgitation, writes Marcus, citing his exchanges with acclaimed linguist Noam Chomsky. Such systems, Marcus elaborates, are trained on literally billions of words of digital text; their gift is in finding patterns that match what they have been trained on. This is a superlative feat of statistics, but not one that means, for example, that the system knows what the words that it uses as predictive tools mean.
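The "predict word after word" mechanism can be illustrated with a deliberately crude toy: a bigram model that only counts which word followed which in its training text, then always emits the most frequent continuation. This is a sketch under heavy simplifying assumptions, not how GPT-3 works (GPT-3 is a Transformer network over subword tokens with billions of learned parameters), but it shows the same basic move of producing text purely from statistics over what the model has seen.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict[str, Counter]:
    """Count, for each word, which words followed it in the training text."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: dict[str, Counter], start: str, max_words: int = 8) -> list[str]:
    """Greedily append the most frequent follower of the last word emitted."""
    out = [start]
    for _ in range(max_words - 1):
        followers = counts.get(out[-1])  # .get does not create new defaultdict entries
        if not followers:
            break  # this word was never followed by anything in training: stop
        out.append(followers.most_common(1)[0][0])
    return out


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
    model = train_bigram(corpus)
    print(" ".join(generate(model, "the", max_words=6)))  # → the cat sat on the cat
```

The output is fluent-looking only because it recombines fragments of the training sentences; the model has no notion of what a cat is, only of which words tended to come next.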

(Image caption: Can the frequency of language, and qualities such as polysemy, affect whether a neural network can suddenly solve tasks for which it was not specifically developed, known as "few-shot learning"? DeepMind says yes. Credit: Tiernan Ray for ZDNet)

Another strand of criticism aimed at GPT-3 and other LLMs is that the results they produce often display toxicity and reproduce ethnic, racial, and other biases. This really comes as no surprise, keeping in mind where the data used to train LLMs comes from: it is all generated by people, and to a large extent it has been collected from the web. Unless corrective action is taken, it's entirely predictable that LLMs will produce such output.

Last but not least, LLMs take lots of resources to train and operate. Chomsky's aphorism about GPT-3 is that "its only achievement is to use up a lot of California's energy". But Chomsky is not alone in pointing this out. In 2022, DeepMind published a paper, "Training Compute-Optimal Large Language Models," in which the authors claim that LLMs have so far been trained with a deeply suboptimal use of compute.
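To make the compute argument concrete, a common rule of thumb from the scaling-law literature (an approximation, not something stated in this article) puts training compute at roughly C ≈ 6ND floating-point operations, where N is the number of parameters and D the number of training tokens. The token counts below are the ones reported in the respective papers: 300 billion for GPT-3 and Gopher, 1.4 trillion for Chinchilla.

```python
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute: C ≈ 6 * N * D floating-point operations."""
    return 6 * params * tokens


# (parameters, training tokens) as reported in each model's paper.
MODELS = {
    "GPT-3": (175e9, 300e9),
    "Gopher": (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

if __name__ == "__main__":
    for name, (n, d) in MODELS.items():
        # Tokens per parameter is the ratio the DeepMind paper argues was too low.
        print(f"{name:<11} {training_flops(n, d):.2e} FLOPs, {d / n:5.1f} tokens/param")
```

The last column carries the paper's point: Chinchilla spends a budget comparable to Gopher's, but on a model a quarter of the size trained on far more data, at about 20 tokens per parameter versus fewer than 2 for GPT-3 and Gopher.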

All that said, GPT-3 is old news, in a way. The last few months have seen a number of new LLMs being announced. In October 2021, Microsoft and Nvidia announced Megatron-Turing NLG with 530 billion parameters. In December 2021, DeepMind announced Gopher with 280 billion parameters, and Google announced GLaM with 1.2 trillion parameters.

In January 2022, Google announced LaMDA with 137 billion parameters. In April 2022, DeepMind announced Chinchilla with 70 billion parameters, and Google announced PaLM with 540 billion parameters. In May 2022, Meta announced OPT-175B with 175 billion parameters.

Whether it's size, performance, efficiency, transparency, training dataset composition, or novelty, each of these LLMs is remarkable and unique in some ways. While most of these LLMs remain inaccessible to the general public, insiders have occasionally waxed lyrical about the purported ability of those models to "understand" language. Such claims, however, seem rather exaggerated.

Pushing the limits of AI beyond language

While LLMs have come a long way in terms of their ability to scale, and the quality of the results they produce, their basic premises remain the same. As a result, their fundamental weaknesses remain the same, too. However, LLMs are not the only game in town when it comes to the cutting edge in AI.

While LLMs focus on processing text data, there are other AI models that focus on visual and audio data. These are used in applications such as computer vision and speech recognition. However, the last few years have seen a blurring of the boundaries between AI model modalities.

So-called multimodal learning is about consolidating independent data from various sources into a single AI model. The hope of developing multimodal AI models is to be able to process multiple datasets, using learning-based methods to generate more intelligent insights.
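The simplest way to picture consolidating data from several sources into a single model is early fusion: encode each modality separately, concatenate the resulting feature vectors, and feed the combined vector to one model. The sketch below shows only that structure; the two encoders are hypothetical stand-ins (a character histogram for text, brightness statistics for a grayscale image), and this is not how OpenAI's models combine text and images.

```python
def encode_text(text: str, dim: int = 4) -> list[float]:
    """Hypothetical text encoder: normalized character histogram in `dim` buckets."""
    vec = [0.0] * dim
    for ch in text.lower():
        vec[ord(ch) % dim] += 1.0
    total = sum(vec) or 1.0  # avoid dividing by zero on empty text
    return [v / total for v in vec]


def encode_image(pixels: list[list[float]]) -> list[float]:
    """Hypothetical image encoder: mean, min and max of grayscale pixel values."""
    flat = [p for row in pixels for p in row]
    return [sum(flat) / len(flat), min(flat), max(flat)]


def fuse(text_vec: list[float], image_vec: list[float]) -> list[float]:
    """Early fusion: concatenate per-modality features into one input vector."""
    return text_vec + image_vec


def score(features: list[float], weights: list[float], bias: float = 0.0) -> float:
    """A single linear model that sees both modalities at once."""
    if len(features) != len(weights):
        raise ValueError("feature/weight length mismatch")
    return sum(f * w for f, w in zip(features, weights)) + bias


if __name__ == "__main__":
    t = encode_text("a red apple")
    i = encode_image([[0.1, 0.9], [0.4, 0.6]])
    x = fuse(t, i)
    print(len(x), score(x, [1.0] * len(x)))
```

In real multimodal systems the encoders are learned neural networks and fusion can also happen later in the model, but the structural idea of one model consuming features from several modalities is the same.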

OpenAI identifies multimodality as a long-term objective in AI and has been very active in this field. In its latest research announcements, OpenAI presents two models that it claims bring this goal closer.

The first AI model, DALL
