GPT-5 Tested: Structured and “No-Nonsense” Thinker

GPT-5 has been available to all ChatGPT users since the end of last week. What does OpenAI’s latest model have to offer? We compare it with its predecessor GPT-4o.

OpenAI announced the long-awaited GPT-5 model with much fanfare last week. The new language model is expected to provide more accurate answers and handle complex questions better than the previous generation. OpenAI is confident enough in GPT-5 to have made it the default model in ChatGPT right away.

The initial reactions to GPT-5 are mixed. Users had to adjust to the new conversational style of ChatGPT. At users’ request, OpenAI brought back GPT-4o: with a simple click, you can switch between the versions. That gives us an ideal opportunity to compare both models side by side.

One Uniform System

First, we take a brief look under the hood of GPT-5. Like previous versions, GPT-5 is not a single model. There is the “basic model” gpt-5-main, gpt-5-thinking for prompts requiring more complex reasoning, and gpt-5-pro. OpenAI reserves the latter for subscribers to Pro, its most expensive plan.

What is unique about GPT-5 is that the models are connected through one uniform system. In principle, you don’t need to select a model in the ChatGPT menu. Based on your prompt, GPT-5 determines which model is most suitable to answer it. This way, ChatGPT uses its resources more efficiently than before.

GPT-5 should generally perform better than GPT-4o on various tasks and hallucinate less. On almost every benchmark OpenAI shared at the announcement, the new model scores higher. What you might notice more quickly is that GPT-5 adopts a different style. GPT-4o is known as a submissive yes-man, while GPT-5 positions itself more as a sparring partner that dares to disagree.

GPT-5 vs GPT-4o: a Comparison

Using a few tests, we compare GPT-5 and GPT-4o. The tests cover academic explanation, step-by-step reasoning, math, coding and creative writing. Both models are given identical prompts each time. We ran the tests with English prompts, but you can repeat them yourself in a language of your choice.

Academic

OpenAI CEO Sam Altman describes GPT-5 as a “doctoral student”, so for the first test, we ask ChatGPT to explain how quantum computers work at an academic level. GPT-5 takes on the role of an academic and structures its explanation clearly. GPT-4o’s answer lacks a clear conclusion that summarizes the essence once more.

In a follow-up prompt, we ask the models to explain quantum computers as if they were standing in front of a kindergarten class and to use a visual. GPT-5 understands that a young audience benefits more from a clear image than a lengthy explanation. GPT-4o is its overenthusiastic self, but the explanation contains little to no supporting visuals and would quickly lose a group of kindergartners.

Thinking in Steps

In the next test, we examine reasoning capabilities. Both models are trained to “think” in multiple steps, allowing you to ask complex questions. GPT-5 shows you the reasoning process and even times how long it spends thinking.

As a first example, we ask the models to create a training and diet plan. The fictional person has no running experience and bad knees but wants to run a marathon in exactly one year. To make the diet plan more challenging, the person eats gluten-free and strictly vegan.

Both models divide the training schedule into four phases to work step by step towards a marathon. Both diet plans are likewise organized around a daily schedule and list optional supplements. GPT-5 has a slight advantage because it provides extra tips to protect the bad knees, something GPT-4o somewhat overlooked.

After the effort comes relaxation. Now we ask ChatGPT to help us with our travel plans. We test two scenarios. First, we want to travel to Australia. We ask for a travel itinerary with some sights we definitely want to see. It’s up to ChatGPT to find out the best travel period, search for the best prices, and efficiently plan the route.

Again, both answers give a comprehensive overview. Both GPT-5 and GPT-4o choose September as the ideal travel month and come up with a broadly similar daily schedule for three weeks. GPT-5’s travel plan is more detailed, with tips on how to find the best prices for flights and accommodation.

Now we stay closer to home and want to drive from Brussels to Athens with an electric car. We ask ChatGPT to design a map and mark where we should charge, based on the car’s range. GPT-5 plays it safe and suggests 11 stops. GPT-4o thinks it can be done in nine stages but proposes distances that the car can’t cover without an extra charge. GPT-5 also presents the travel plan more visually than GPT-4o.
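If you want to sanity-check a charging plan like this yourself, the arithmetic is simple. The Python sketch below is ours, not output from either model. The route length, the usable range and the leg distances are hypothetical placeholders, not figures from our test. It computes the minimum number of stops for a given range and flags any leg that is longer than the car can drive on one charge.

```python
import math


def min_charging_stops(route_km: float, usable_range_km: float) -> int:
    """Fewest charging stops if every leg may use the full usable range."""
    if route_km <= 0 or usable_range_km <= 0:
        raise ValueError("distances must be positive")
    legs = math.ceil(route_km / usable_range_km)
    return max(legs - 1, 0)


def infeasible_legs(leg_lengths_km: list[float], usable_range_km: float) -> list[int]:
    """Return the indices of legs that exceed the usable range."""
    return [i for i, leg in enumerate(leg_lengths_km) if leg > usable_range_km]


# Hypothetical numbers for illustration only (assumptions, not test data).
ROUTE_KM = 3000.0
USABLE_RANGE_KM = 400.0
PROPOSED_LEGS_KM = [380.0, 450.0, 390.0]

print(min_charging_stops(ROUTE_KM, USABLE_RANGE_KM))      # 7
print(infeasible_legs(PROPOSED_LEGS_KM, USABLE_RANGE_KM))  # [1]
```

In practice you would plan with a margin below the rated range, which is why a cautious plan ends up with more stops than the theoretical minimum.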

Math and Coding

From reasoning, we move to math. We have GPT-5 and GPT-4o solve a complex mathematical equation and show the intermediate steps, like a math test. Both models arrive at the correct solution, but GPT-5 requires fewer intermediate steps. We see the same with a mathematical puzzle where the models have to find a number. GPT-4o has to “try” more numbers than GPT-5 to find the solution.

GPT-5 is supposed to be better at coding, so we test this with a simple coding task: writing a PowerShell script to check disk space. Both models provide a ready-to-use script that you just need to copy and paste. GPT-5 distinguishes itself in this test by also providing a guide on how to save and run the script using Notepad and PowerShell.
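To give an idea of the size of this task, here is a minimal sketch of a disk space check. It is our own illustration in Python rather than PowerShell, and it is not the script either model produced. The function name disk_report and the 10 percent warning threshold are assumptions we chose for the example.

```python
import shutil


def disk_report(path: str = ".", warn_below_percent: float = 10.0) -> dict:
    """Summarize disk usage for the volume that contains `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        raise ValueError(f"no capacity reported for {path!r}")
    free_percent = usage.free / usage.total * 100
    return {
        "path": path,
        "total_gb": round(usage.total / 1024**3, 1),
        "free_gb": round(usage.free / 1024**3, 1),
        "free_percent": round(free_percent, 1),
        # True when free space drops below the (assumed) warning threshold.
        "low": free_percent < warn_below_percent,
    }


report = disk_report(".")
status = "LOW" if report["low"] else "OK"
print(f"{report['path']}: {report['free_gb']} GB free of "
      f"{report['total_gb']} GB ({report['free_percent']}%) - {status}")
```

Save it as a .py file and run it with Python 3. On Windows you can pass a drive such as "C:\\" as the path.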

Creative Writing

Finally, we let the models get creative. We ask GPT-5 and GPT-4o to write a declaration of love to ITdaily. First, the models have complete artistic freedom, then they must create an acrostic in which the first letters of the sentences spell “ITdaily”. GPT-5 shows its creative side, but GPT-4o puts just a bit more heart and soul into it.
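Whether an answer really satisfies the acrostic constraint is easy to verify mechanically. The Python sketch below is our own illustration. The sample text is one we wrote for this example, not output from either model, and the sentence splitting is deliberately naive: it assumes every sentence ends in a period, exclamation mark or question mark and contains no abbreviations.

```python
import re


def first_letters(text: str) -> str:
    """Join the first character of every sentence in `text`."""
    # Naive split: break after ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]
    return "".join(s[0] for s in sentences)


def is_acrostic(text: str, word: str) -> bool:
    """Check, ignoring case, that the sentence initials spell `word`."""
    return first_letters(text).lower() == word.lower()


# Sample text written for this example (not model output).
SAMPLE = (
    "In every newsroom, curiosity comes first. "
    "Technology is your daily bread. "
    "Deadlines never scare you. "
    "Analysis beats hype. "
    "Insight is what readers get. "
    "Loyal readers keep coming back. "
    "You make IT make sense."
)

print(first_letters(SAMPLE))           # ITDAILY
print(is_acrostic(SAMPLE, "ITdaily"))  # True
```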

As a final experiment, we delve deeper into the “emotional intelligence” of the models. We ask the models how they would comfort a friend whose grandmother has just passed away. GPT-5 approaches this analytically and gives you five practical tips to deal with the situation. Writing a comforting message is not GPT-5’s strong suit: that is still something you do better yourself.

Final Verdict

In most tests, GPT-5 emerges as the winner for us. The new model provides more comprehensive and structured answers and takes you through its thought process. GPT-5 is also better at presenting its output visually. The “business-oriented” tone feels more distant but more realistic than the sometimes exaggerated enthusiasm of GPT-4o. You benefit more from an AI assistant that positions itself as a sparring partner than from an AI trying to be your friend.