Destination

2025-11-19

Gemini 3 Pro tops new AI reliability benchmark, but hallucination rates remain high


A new benchmark from Artificial Analysis reveals alarming weaknesses in the factual reliability of large language models. Of the 40 models tested, only four achieved a positive score, with Google's Gemini 3 Pro clearly in the lead.


We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-12-17

Gemini 3 Flash arrives with reduced costs and latency — a powerful combo for enterprises

Enterprises can now harness a large language model whose power approaches that of Google’s state-of-the-art Gemini 3 Pro, but at a fraction of the cost and with increased speed, thanks to [...]

Match Score: 139.47

venturebeat

2025-11-18

Google unveils Gemini 3 claiming the lead in math, science, multimodal and agentic AI benchmarks

After more than a month of rumors and feverish speculation — including Polymarket wagering on the release date — Google today unveiled Gemini 3, its newest proprietary frontier model family and th [...]

Match Score: 132.38

venturebeat

2025-11-20

Grok 4.1 Fast's compelling dev access and Agent Tools API overshadowed by Musk glazing

Elon Musk's frontier generative AI startup xAI formally opened developer access to its Grok 4.1 Fast models last night and introduced a new Agent Tools API—but the technical milestones were imm [...]

Match Score: 106.99

venturebeat

2025-11-18

Musk's xAI launches Grok 4.1 with lower hallucination rate on the web and apps — no API access (for now)

In what appeared to be a bid to soak up some of Google's limelight prior to the launch of its new Gemini 3 flagship AI model — now recorded as the most powerful LLM in the world by multiple ind [...]

Match Score: 91.97

venturebeat

2025-12-10

The 70% factuality ceiling: why Google’s new ‘FACTS’ benchmark is a wake-up call for enterprise AI

There's no shortage of generative AI benchmarks designed to measure the performance and accuracy of a given model on completing various helpful enterprise tasks — from coding to instruction fol [...]

Match Score: 89.23

Destination

2025-11-18

Google's new Gemini 3 model arrives in AI Mode and the Gemini app

A few weeks short of Gemini 2's first birthday, Google has announced Gemini 3 Pro. Naturally, the company claims the new system is its most intelligent AI model yet, offering state-of-the-art rea [...]

Match Score: 71.00

venturebeat

2025-11-20

Google's upgraded Nano Banana Pro AI image model hailed as 'absolutely bonkers' for enterprises and users

Infographics rendered without a single spelling error. Complex diagrams one-shotted from paragraph prompts. Logos restored from fragments. And visual outputs so sharp with so much text density and acc [...]

Match Score: 63.41

venturebeat

2025-10-07

Google's AI can now surf the web for you, click on buttons, and fill out forms with Gemini 2.5 Computer Use

Some of the largest providers of large language models (LLMs) have sought to move beyond multimodal chatbots — extending their models out into "agents" that can actually take more actions [...]

Match Score: 60.19

venturebeat

2025-11-13

Upwork study shows AI agents excel with human partners but fail independently

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking re [...]

Match Score: 60.01