Destination

2025-07-02

SciArena lets scientists compare LLMs on real research questions


A new open platform called SciArena is now available for evaluating large language models (LLMs) on scientific literature tasks based on human preferences. Early results reveal clear performance gaps between different models.


The article SciArena lets scientists compare LLMs on real research questions appeared first on THE DECODER.

[...]

We have found tools similar to what you are looking for. Check out our suggestions below.

Destination

2025-02-15

Perplexity has its own ‘Deep Research’ tool now too

In a blog post on Friday, Perplexity introduced a new tool called Deep Research that it says can conduct “in-depth research and analysis” to deliver detailed reports in response to your questions, [...]

Match Score: 43.55

blogspot

2024-11-08

Ahrefs vs SEMrush: Which SEO Tool Should You Use?

SEMrush and Ahrefs are among the most popular tools in the SEO industry. Both companies have been in business for years and have thousands of customers per month. [...]

Match Score: 41.76

Destination

2025-02-03

ChatGPT's Deep Research tool can create reports from hundreds of online sources

There’s no two ways about it: there’s a newfound sense of urgency at OpenAI. Two days after releasing o3-mini to the world, the company made a surprise announcement on Sunday evening, revealing De [...]

Match Score: 35.56

Destination

2025-06-02

How Good Are AI Agents at Real Research? Inside the Deep Research Bench Report

As large language models (LLMs) rapidly evolve, so does their promise as powerful research assistants. Increasingly, they’re not just answering simple factual questions—they’re tackling “deep [...]

Match Score: 35.33

Destination

2025-03-13

Google's Gemini Deep Research is now available to everyone

After being one of the first companies to roll out a Deep Research feature at the end of last year, Google is now making that same tool available to everyone. Starting today, Gemini users can try Deep [...]

Match Score: 34.83

Destination

2025-05-11

Scientists find lead really can be turned into gold (with help from the Large Hadron Collider)

One of the ultimate goals of medieval alchemy has been realized, but only for a fraction of a second. Scientists with the European Organization for Nuclear Research, better known as CERN, were able to [...]

Match Score: 34.20

Destination

2025-05-05

Research Suggests LLMs Willing to Assist in Malicious ‘Vibe Coding’

Over the past few years, large language models (LLMs) have drawn scrutiny for their potential misuse in offensive cybersecurity, particularly in generating software exploits. The recent trend towards [...]

Match Score: 30.93

Destination

2025-03-23

Microsoft Research has developed a new way to feed knowledge into LLMs

Microsoft Research has developed a more efficient way to incorporate external knowledge into language models. The new system, called Knowledge Base-Augmented Language Models (KBLaM), takes a plug-and- [...]

Match Score: 30.36

Destination

2025-05-20

Google I/O 2025 recap: AI updates, Android XR, Google Beam and everything else announced at the annual keynote

Today is one of the most important days on the tech calendar as Google kicked off its I/O developer event with its annual keynote. As ever, the company had many updates for a wide range of products to [...]

Match Score: 30.25