Destination
OpenAI says its latest models outperform doctors in medical benchmark

OpenAI has released a new benchmark for testing AI systems in healthcare. Called HealthBench, it's designed to evaluate how well language models handle realistic medical conversations. According to OpenAI, its latest models outperform doctors on the test.<br /> The article OpenAI says its latest models outperform doctors in medical benchmark appeared first on THE DECODER. [...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat
Corti's new Symphony for Speech-to-Text model beats OpenAI at medical terminology accuracy, highlighting the value of specialized AI

Today, Copenhagen-based healthcare AI Corti is launching Symphony for Speech-to-Text, a new generation of clinical-grade speech recognition models engineered specifically for real-time dictation, conv [...]

Match Score: 98.44

venturebeat
Microsoft and OpenAI gut their exclusive deal, freeing OpenAI to sell on AWS and Google Cloud

Microsoft and OpenAI on Monday announced a sweeping overhaul of the partnership that has defined the commercial AI era, dismantling key pillars of exclusivity and revenue-sharing that bound the two co [...]

Match Score: 94.15

venturebeat
Anthropic brings Mythos to the masses with Claude Fable 5, its most powerful generally available model ever

Anthropic today launched two new AI models — Claude Fable 5 and Claude Mythos 5 — marking the company’s first broad release of the powerful “Mythos-class” AI capabilities it previously kept [...]

Match Score: 76.36

venturebeat
OpenAI launches GPT-5.4 with native computer use mode, financial plugins for Microsoft Excel, Google Sheets

The AI updates aren't slowing down. Literally two days after OpenAI launched a new underlying AI model for ChatGPT called GPT-5.3 Instant, the company has unveiled another, even more massive upgr [...]

Match Score: 69.93

venturebeat
OpenAI deploys Cerebras chips for 15x faster code generation in first major move beyond Nvidia

OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company's first significant inference partnership outsi [...]

Match Score: 68.34

venturebeat
DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole

For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI's GPT-5 family, Anthropic's Claude [...]

Match Score: 66.44

venturebeat
Frontier models are failing one in three production attempts — and getting harder to audit

AI agents are now embedded in real enterprise workflows, and they're still failing roughly one in three attempts on structured benchmarks. That gap between capability and reliability is the defin [...]

Match Score: 62.01

venturebeat
Why Weibo’s tiny VibeThinker-3B has the AI world arguing over benchmarks again

On Sunday, a team of nine researchers at Sina Weibo — the Chinese social media giant better known for its microblogging platform than for cutting-edge artificial intelligence — quietly posted a 14 [...]

Match Score: 58.79

venturebeat
OpenAI’s GPT-5.3-Codex drops as Anthropic upgrades Claude — AI coding wars heat up ahead of Super Bowl ads

OpenAI on Wednesday released GPT-5.3-Codex, which the company calls its most capable coding agent to date, in an announcement timed to land at the exact same moment Anthropic unveiled its own flagship [...]

Match Score: 58.07