Google’s PaLM 2 model is trained on nearly five times more data than its predecessor

By James, May 19, 2023

The PaLM 2 model was trained on 3.6 trillion tokens, compared with 780 billion tokens for the previous version.

Google’s new large language model (LLM), PaLM 2, unveiled by the tech giant last week, uses nearly five times more training data than its 2022 predecessor, which helps it perform more advanced coding, math, and creative writing tasks.

According to internal documents seen by CNBC, the PaLM 2 model introduced at the Google I/O conference was trained on 3.6 trillion tokens. Tokens, which are words or fragments of words, are a key ingredient in training LLMs, because they teach the model to predict the next word in a sequence.

The previous version of Google’s PaLM, which stands for Pathways Language Model, was released in 2022 and was trained on 780 billion tokens.
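To check the "nearly five times" figure, here is a quick calculation using only the two token counts reported above. The constant and function names are illustrative, not taken from any Google source.

```python
# Training-data sizes reported in the article, in tokens.
PALM_TOKENS = 780e9     # PaLM (2022)
PALM2_TOKENS = 3.6e12   # PaLM 2 (2023)

def growth_factor(new: float, old: float) -> float:
    """Return how many times larger `new` is than `old`."""
    return new / old

ratio = growth_factor(PALM2_TOKENS, PALM_TOKENS)
print(f"PaLM 2 training data is {ratio:.2f}x that of PaLM")
# prints "PaLM 2 training data is 4.62x that of PaLM"
```

A factor of about 4.6 is what the article rounds to "nearly five times."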

While Google is eager to show off the power of its AI technology and how it is embedded in search, email, word processing, and spreadsheets, it has been reluctant to disclose the size or other details of its training data. OpenAI, the creator of ChatGPT, has also kept the specifications of its latest LLM, GPT-4, under wraps.

The main reason for withholding this information is competition. Google and OpenAI are already racing to win over users who may prefer to search for information with conversational chatbots rather than traditional search engines.

When announcing PaLM 2, Google said the new model is smaller than its predecessor, meaning the search giant’s technology has become more efficient while handling more complex tasks. According to the internal documents, PaLM 2 has 340 billion parameters, an indication of the model’s complexity. The original PaLM had 540 billion parameters.
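Putting the reported parameter and token counts side by side shows how the balance shifted: PaLM 2 is smaller but saw far more data per parameter. This is a rough back-of-the-envelope sketch using only the figures in this article; the dictionary layout and the `tokens_per_parameter` helper are hypothetical and do not describe how Google actually sizes its models.

```python
# Figures reported in the article (from internal documents seen by CNBC).
models = {
    "PaLM":   {"params": 540e9, "tokens": 780e9},
    "PaLM 2": {"params": 340e9, "tokens": 3.6e12},
}

def tokens_per_parameter(model: dict) -> float:
    """Hypothetical helper: training tokens divided by parameter count."""
    return model["tokens"] / model["params"]

for name, m in models.items():
    print(f"{name}: {m['params'] / 1e9:.0f}B params, "
          f"{tokens_per_parameter(m):.1f} tokens per parameter")

# Relative model size: PaLM 2 parameters as a share of PaLM parameters.
size_ratio = models["PaLM 2"]["params"] / models["PaLM"]["params"]
print(f"PaLM 2 has {size_ratio:.0%} as many parameters as PaLM")
```

By these numbers PaLM 2 has roughly 63% of its predecessor's parameters yet was trained on about 10.6 tokens per parameter, versus about 1.4 for the original PaLM.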

Google also said in its blog post about PaLM 2 that the model uses a “new technique” called “compute-optimal scaling,” which makes the LLM more efficient with better overall performance, including faster inference, fewer parameters to serve, and a lower serving cost.