Compare the performance of artificial intelligence chatbots with this platform

By James On Jun 23, 2023

The sheer number of chatbots now available makes it hard to choose the best one, and this platform simplifies the comparison.

Since ChatGPT surged in popularity last November, many competing chatbots have been launched. These chatbots differ in their underlying LLM (large language model), price, user interface, internet access, and more. To make them easier to compare, the Large Model Systems Organization (LMSYS Org), a research organization founded by students and professors at the University of California, Berkeley, has unveiled Chatbot Arena.

Chatbot Arena is a benchmarking platform for large language models that lets users compare chatbots head to head. To run a test, a user submits a prompt, two randomly selected models each provide an answer, and the user picks the better answer without knowing which model produced it.

After the user selects the better of the two answers, the names of the models behind them are displayed.

For example, in one test, two chatbots were asked to write a leave request letter. After they returned different answers and the preferred one was selected, the platform revealed that one of the chatbots was koala-13b and the other was vicuna-13b.

The leaderboard page then shows the ranking of all LLMs. The ranking is driven largely by the results of these user tests and uses the Elo rating system, which is widely used to rate the skill of players in chess and other competitive games. According to this page, OpenAI’s GPT-4 is currently the top-ranked LLM, with an Elo score of 1227. Claude-v1, developed by Anthropic, is in second place.
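
To give a rough idea of how an Elo-style ranking can be computed from such head-to-head votes, here is a minimal Python sketch. It uses the textbook Elo update and is not necessarily the exact procedure Chatbot Arena runs. The starting rating of 1000, the K-factor of 32, and the sample votes are all assumptions made for illustration, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def update_elo(rating_a, rating_b, outcome_a, k=32.0):
    """Return the new (rating_a, rating_b) after one battle.

    outcome_a is 1.0 if A won, 0.0 if B won, and 0.5 for a tie.
    k=32 is an assumed K-factor, not a value taken from the article.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta


def rate_battles(battles, initial=1000.0, k=32.0):
    """Process (model_a, model_b, winner) votes in order.

    winner is "a", "b", or "tie". initial=1000 is an assumed starting rating.
    """
    outcome_for = {"a": 1.0, "b": 0.0, "tie": 0.5}
    ratings = {}
    for model_a, model_b, winner in battles:
        ra = ratings.get(model_a, initial)
        rb = ratings.get(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(
            ra, rb, outcome_for[winner], k
        )
    return ratings


if __name__ == "__main__":
    # Hypothetical votes, invented for this example only.
    votes = [
        ("vicuna-13b", "koala-13b", "a"),
        ("koala-13b", "vicuna-13b", "tie"),
        ("vicuna-13b", "koala-13b", "a"),
    ]
    ranked = sorted(rate_battles(votes).items(), key=lambda kv: kv[1], reverse=True)
    for name, rating in ranked:
        print(f"{name}: {rating:.1f}")
```

Because each update adds to one model exactly what it subtracts from the other, the total number of rating points stays constant. An upset win over a higher-rated model moves the ratings more than an expected win does.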

In eighth place on the list is PaLM-Chat-Bison-001, a variant of PaLM 2, the large language model behind Google’s Bard chatbot.

The Chatbot Arena website also has a section where you can test a single chatbot of your choice or compare two specific models side by side, which is useful when you already have a particular LLM in mind.