Contractors assigned to enhance Google’s Gemini AI have begun comparing its performance with outputs from Anthropic’s Claude, as revealed in internal communications obtained by TechCrunch.
When contacted by TechCrunch for comment, Google would not say whether it had Anthropic’s permission to use Claude in these comparative tests with Gemini. As technology firms race to improve their AI models, they typically measure performance against rivals using industry benchmarks rather than paying contractors to compare AI responses head to head.
The team of contractors responsible for evaluating Gemini’s outputs must assess each response based on several criteria, including accuracy and verbosity. They are permitted up to 30 minutes per prompt to determine which AI model provides a superior answer: Gemini or Claude.
Recently, contractors evaluating Gemini against other, unnamed AI models began noticing references to Claude on the internal Google platform they use for this work. In at least one case, an output presented to contractors stated plainly: “I am Claude, created by Anthropic.”
In some discussions, contractors noted that Claude’s responses tended to prioritize safety more heavily than those from Gemini. One contractor remarked, “Claude’s safety settings are the strictest” among available AI models. For instance, Claude frequently refused to engage with prompts it deemed unsafe, such as those requesting role-playing scenarios involving another AI assistant. Conversely, Gemini’s output was flagged as a “huge safety violation” for containing inappropriate content like “nudity and bondage,” while Claude opted not to respond to that type of prompt at all.
Anthropic’s commercial terms explicitly prohibit customers from utilizing Claude “to build a competing product or service” or “to train competing AI models” without explicit permission from Anthropic. It is worth noting that Google is a prominent investor in Anthropic, raising questions about the nature of their collaborations.
When approached by TechCrunch, Shira McNamara, a spokesperson for Google DeepMind, which oversees Gemini, would not say whether Google had obtained Anthropic’s approval to access Claude. Anthropic did not respond to a request for comment before publication.
McNamara stated that DeepMind does engage in “comparing model outputs” for evaluation purposes; however, she emphasized that Gemini is not trained using Anthropic’s models. “In line with standard industry practice, we sometimes compare model outputs as part of our evaluation process,” McNamara explained. “However, any implication that we have utilized Anthropic models to train Gemini is incorrect.”
TechCrunch recently reported that contractors working on Google’s AI projects are being asked to rate Gemini’s responses in areas beyond their expertise, raising concerns that Gemini could produce inaccurate outputs on highly sensitive topics such as healthcare.
For those with tips to share, feel free to reach out securely through Signal at +1 628-282-2811.
Summary of Key Points
- Comparison of AI Models: Contractors are assessing Google’s Gemini AI outputs against Anthropic’s Claude.
- Evaluation Criteria: Responses are scored on several criteria, including accuracy and verbosity, with up to 30 minutes allowed per prompt.
- Safety Considerations: Claude is noted for its strict safety protocols, often refusing certain prompts that Gemini would handle differently.
- Investment Ties: Google is a major investor in Anthropic, prompting scrutiny of whether the arrangement complies with Anthropic’s commercial terms.
- Official Stance: Google denies using Anthropic’s models for training but acknowledges the practice of comparison as standard in the industry.
- Expertise Concerns: Contractors are being asked to rate AI responses outside their areas of expertise, raising concerns about potentially inaccurate evaluations.
Final Thoughts
Competition among tech companies to refine their AI models drives rigorous evaluation, including benchmarking against rivals. Google’s use of Claude in assessing Gemini illustrates the tension between standard evaluation practice and the contractual and ethical questions it can raise. As the landscape evolves, transparency about how models are evaluated and trained will remain paramount.