Micro1 is building the evaluation layer for AI agents, providing contextual, human-led tests that decide when models are ready ...
Every Indian AI model is graded on benchmarks built in San Francisco. GPT-5 scores below 40% on Indian cultural reasoning.
Enter large language model (LLM) evaluation: the practice of analyzing and refining GenAI outputs to improve their accuracy and reliability while avoiding bias. The evaluation process ...
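At its simplest, an evaluation run scores a model's outputs against reference answers. The sketch below illustrates the idea with a hypothetical `stub_model` standing in for a real LLM API call and exact-match accuracy as the metric; production harnesses layer on rubric scoring, human review, and bias checks.

```python
# Minimal sketch of an LLM evaluation loop. `stub_model` and the tiny
# dataset are illustrative assumptions, not a real model or benchmark.

def exact_match_accuracy(model, dataset):
    """Score a model by exact-match accuracy over (prompt, reference) pairs."""
    correct = 0
    for prompt, reference in dataset:
        answer = model(prompt).strip().lower()
        correct += answer == reference.strip().lower()
    return correct / len(dataset)

def stub_model(prompt):
    # Stand-in for a real LLM call; returns a canned answer or "unknown".
    return {"Capital of India?": "New Delhi"}.get(prompt, "unknown")

dataset = [("Capital of India?", "new delhi"), ("2+2?", "4")]
print(exact_match_accuracy(stub_model, dataset))  # → 0.5
```

Exact match is deliberately crude; real evaluation suites trade it for semantic similarity or model-graded rubrics, which is where disagreements between benchmarks come from.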
Anthropic and OpenAI ran their own tests on each other's models and published the findings in separate reports. The goal was to identify gaps in order to build better and safer models. The AI ...
The rapid emergence of Large Language Models (LLMs) and generative AI is reshaping how people and organizations access, synthesize, and apply knowledge.