Meta's AI benchmarking practices under scrutiny

May 23, 2025, Hi-network.com

Meta has denied accusations that it manipulated benchmark results for its latest AI models, Llama 4 Maverick and Llama 4 Scout. The controversy began after a social media post alleged that the company had trained on test sets and deployed an unreleased model to score better on benchmarks.

Ahmad Al-Dahle, Meta's VP of generative AI, called the claims 'simply not true' and acknowledged inconsistent model performance due to differing cloud implementations. He stated that the models were released as they became available and are still being adjusted.

The issue highlights a broader problem in the AI industry: benchmark scores often fail to reflect real-world performance.

Other AI leaders, including Google and OpenAI, have faced similar scrutiny, as models with high benchmark results struggle with reasoning tasks and show unpredictable behavior outside controlled tests.

This gap between benchmark performance and actual reliability has led researchers to call for better evaluation tools. Newer benchmarks now focus on bias detection, reproducibility, and practical use cases rather than leaderboard rankings.

Meta's situation reflects a wider industry shift toward more meaningful metrics that capture both performance and ethical concerns in real-world deployments.
