By Nicholas Larsen, International Banker
Few technological breakthroughs last year captured the world’s attention more comprehensively than ChatGPT. Launched in November by the Microsoft-backed artificial intelligence (AI) company OpenAI, the generative language model (also known as a large language model, or LLM) has become one of the most rapidly adopted consumer applications ever, with more than 100 million monthly active users recorded as of February. But do ChatGPT’s capabilities justify its enormous popularity, especially when Silicon Valley is desperate to hype up the next big trend in the same way it did for the likes of the metaverse and non-fungible tokens (NFTs)?
According to ChatGPT, it “is a large language model that has been trained on a dataset of conversational text. It can generate human-like text in response to prompts given to it. It is based on the GPT (generative pre-trained transformer) architecture and has been fine-tuned for various use cases such as question answering, language translation and text completion.” Simply put, posing a query to ChatGPT should elicit a suitably detailed response, along with some background context.
Much of the buzz surrounding ChatGPT thus far has been due to the high standards of intelligence and intuition it has demonstrated across various use cases. Whether writing code, drafting new written content or answering a broad range of questions, this conversational AI tool has already proven capable of generating highly useful and unique content using machine learning (ML) and natural language processing (NLP). Elon Musk, one of OpenAI’s co-founders (who left in 2018), tweeted in December: “ChatGPT is scary good. We are not far from dangerously strong AI.”
A paper published on March 2 by economists Shakked Noy and Whitney Zhang of the Massachusetts Institute of Technology (MIT) further underscores the potential benefits of ChatGPT. The research examined the productivity effects of ChatGPT in the context of mid-level professional business-writing tasks assigned to 444 college-educated professionals, half of whom were exposed to the chatbot. “Our results show that ChatGPT substantially raises average productivity: time taken decreases by 0.8 SDs (standard deviations) and output quality rises by 0.4 SDs,” the paper found. “Inequality between workers decreases, as ChatGPT compresses the productivity distribution by benefiting low-ability workers more. ChatGPT mostly substitutes for worker effort rather than complementing worker skills, and restructures tasks towards idea-generation and editing and away from rough-drafting. Exposure to ChatGPT increases job satisfaction and self-efficacy and heightens both concern and excitement about automation technologies.”
Such positive results only strengthen the case for the extensive use of ChatGPT across various industry applications in the future. Journalism is one industry that could be significantly impacted by the sheer power of ChatGPT and other chatbots; the technology is already being considered for writing certain types of articles for news publications. “We’ve tasked a working group, across our tech and editorial teams, to explore the potential and limitations of machine-learning, such as ChatGPT,” Jim Mullen, the chief executive of Reach, the publisher of UK tabloid newspapers the Daily Mirror and Daily Express, told the Financial Times on February 19. “We can see potential to use it in the future to support our journalists for more routine stories like local traffic and weather or to find creative uses for it, outside of our traditional content areas.”
Such profound applications of ChatGPT show just how far the possibilities for its deployment could extend. Indeed, its output has often been impressively detailed, accurate and sophisticated, thanks to the rigorous training on substantial volumes of text data that the LLM undergoes. This training is an exercise in pattern recognition: the model builds internal numerical representations, or “embeddings”, of words and learns to predict, with increasing accuracy, the most likely next word for any given sequence of text. But ChatGPT goes further by employing a technique known as reinforcement learning from human feedback (RLHF), which incorporates human evaluations into the training process to steer the model’s outputs and mitigate biases present within the training data.
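To make that next-word-prediction idea concrete, consider the following minimal sketch in Python. It is emphatically not how ChatGPT itself works: real LLMs rely on learned vector embeddings and transformer networks trained on billions of documents, whereas this toy model simply counts which word follows which in a tiny sample corpus and offers the most frequent continuation.

```python
# A toy next-word predictor, illustrating (very loosely) the core idea
# behind LLM training described above. Real models such as GPT learn
# vector "embeddings" and use transformer networks over vast corpora;
# this sketch just counts word-following-word frequencies instead.
from collections import Counter, defaultdict

corpus = (
    "the model predicts the next word "
    "the model learns patterns in text "
    "the next word is chosen by probability"
).split()

# Count how often each word follows each other word in the corpus.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def predict_next(word):
    """Return the statistically most likely word to follow `word`."""
    candidates = following.get(word)
    if not candidates:
        return None  # The word never appears (or appears only last).
    return candidates.most_common(1)[0][0]

print(predict_next("the"))   # "model" (ties resolve to the first seen)
print(predict_next("next"))  # "word"
```

Scaled up from a handful of sentences to hundreds of billions of words, and from raw counts to learned embeddings, this same predict-the-next-word objective is what gives LLMs their fluency; the RLHF stage then layers human judgement on top of it.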
But does adding the human touch give ChatGPT a human-like depth of understanding of the world? Not quite. While it may be trained on copious amounts of data to deliver dispassionate, logical outcomes, the chatbot still cannot discern emotion or reason with the same nuance as humans. Given its propensity towards logic, it is also likely to struggle with the more complex devices of language, such as rhetorical questions, irony and sarcasm.
What’s more, not everyone is as ebullient over ChatGPT’s potential as the hype might suggest, with some industries remaining dubious about whether it can deliver meaningful improvements over their existing operations. The travel industry, for instance, is increasingly employing AI to perform largely customer-service-oriented tasks, such as online chatbot assistance, to boost operational efficiency. But according to recent remarks by Rathi Murthy, chief technology officer of travel company Expedia Group, ChatGPT’s possible industry use cases remain rather limited at present. She recently told journalists that while her company invests in AI technologies, ChatGPT’s assistance extends to little more than helping customers with pre-trip planning. “We don’t see generative AI, today, being able to replace this whole gamut of services. Planning a trip is just way more complex than just a bunch of questions,” Murthy explained.
And as recent evidence shows, there have been as many disappointments with such AI-powered chatbots as successes, if not more. Google’s own ChatGPT rival, a chatbot named Bard, received a decidedly unenthusiastic reception when it was announced in February. During the online launch event, a video demonstration showed Bard stating that the James Webb Space Telescope (JWST) “took the very first pictures of a planet outside of our own solar system”, and astronomers were quick to note that this was factually incorrect.
Similarly, Facebook’s parent company, Meta, had to shutter its Galactica AI model, which the company had claimed could “store, combine and reason about scientific knowledge”, after scientists found that it generated false and misleading scientific information while filtering out certain categories of research. Meta had touted its AI model as capable of performing several scientific tasks, such as solving complex equations and summarising scientific papers. “In all cases, it was wrong or biased but sounded right and authoritative,” Michael Black, director of the Max Planck Institute for Intelligent Systems, tweeted, referencing Galactica’s flawed scientific output. “I think it’s dangerous.”
And University of Washington (UW) computer science researcher Willie Agnew also noted that Galactica queries on topics including “queer theory”, “racism” and “AIDS” yielded no results. “We appreciate the feedback we have received so far from the community, and have paused the demo for now,” Meta responded. “Our models are available for researchers who want to learn more about the work and reproduce results in the paper.”
Even ChatGPT has delivered nonsensical responses to seemingly straightforward questions. “LLMs can also get their facts wrong since they are not trained to distinguish between true and false statements, only to predict statistically likely sequences of words,” according to Gary N. Smith, the Fletcher Jones Professor of Economics at Pomona College and the author of 16 books (most recently, Distrust: Big Data, Data-Torturing, and the Assault on Science). In an article for AI news publication Mind Matters, Smith recounted a particularly erroneous interaction with ChatGPT:
Human: How many bears have Russians sent into space?
ChatGPT: According to estimates, about 49 bears have been sent into space by Russia since 1957.
Human: What were their names?
ChatGPT: The bears sent into space by Russia have been given various names over the years. Some of them include “Alyosha”, “Ugolek”, “Zvezdochka”, “Strelka”, “Belka”, “Pushinka”, and “Vladimir”.
Human: Do you have references for this?
ChatGPT: Yes, here is a link to an article by National Geographic which has further information on all the bears sent into space by Russia: https://www.nationalgeographic.com/animals/2019/05/space-traveling-bears-soviet-union-ussr/.
As Smith pointed out, “Russia has sent no bears into space, and the reference provided by ChatGPT is completely made up.”
Such shortcomings aside, it remains understandable why ChatGPT is receiving so much attention and praise, given the sheer power of its AI-based language model and the laudable results it has already achieved. With AI and ML set to be further integrated into our lives during the coming years—ostensibly for the purpose of “improvement”—it seems inevitable that AI-powered chatbots will play a more crucial role in this process.