Google is using competitor Anthropic’s AI model to evaluate the performance of its own AI model Gemini.
According to internal correspondence obtained by TechCrunch, contractors working on Google's Gemini compare the model's responses with those of Claude, an AI model from competitor Anthropic.
Safety first
Workers rate both models on criteria such as truthfulness and safety. They are given up to 30 minutes to decide whether Gemini's or Claude's answer is better. Claude performs better on safety, according to Google's internal chat. Gemini has generated explicit content in certain cases, and those responses were flagged as "major safety violations." Claude simply declines to respond to the same prompts.
Anthropic's terms of service prohibit customers from using the model to develop competing products without permission. It remains unclear whether Google received that permission. A spokesperson for Google DeepMind denies that Claude is being used to train Gemini.
Comparing Gemini to Claude would not be surprising. Several months ago, Claude 3.5 Sonnet outperformed GPT-4 and Google's Gemini Ultra in benchmarks across several areas, including general knowledge, reasoning, and coding. Those benchmarks are not fully reliable, but they do suggest that most models are quickly closing the gap with one another.