Destination

2025-10-07

TOUCAN is the largest open training dataset for AI agents


A research team from MIT, IBM, and the University of Washington has released TOUCAN, the largest open dataset to date for training AI agents. The dataset contains 1.5 million real tool interactions, aiming to help open models handle external tools more effectively.


The article TOUCAN is the largest open training dataset for AI agents appeared first on THE DECODER.

[...]

Rating

Innovation

Pricing

Technology

Usability

We have discovered similar tools to what you are looking for. Check out our suggestions for similar AI tools.

venturebeat

2025-11-10

Baseten takes on hyperscalers with new AI training platform that lets you own your model weights

Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]

Match Score: 148.37

venturebeat

2025-11-17

Phi-4 proves that a 'data-first' SFT methodology is the new differentiator

AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]

Match Score: 115.57

venturebeat

2025-10-17

World's largest open-source multimodal dataset delivers 17x training efficiency, unlocking enterprise AI that connects documents, audio and video

AI models are only as good as the data they're trained on. That data generally needs to be labeled, curated and organized before models can learn from it in an effective way.One of the big missin [...]

Match Score: 111.12

venturebeat

2025-11-19

Meta’s DreamGym framework trains AI agents in a simulated world to cut reinforcement learning costs

Researchers at Meta, the University of Chicago, and UC Berkeley have developed a new framework that addresses the high costs, infrastructure complexity, and unreliable feedback associated with using r [...]

Match Score: 95.40

venturebeat

2025-12-02

Arcee aims to reboot U.S. open source AI with new Trinity models released under Apache 2.0

For much of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou.Chinese research labs including Alibaba's Qwen, [...]

Match Score: 91.80

venturebeat

2025-11-13

Upwork study shows AI agents excel with human partners but fail independently

Artificial intelligence agents powered by the world's most advanced language models routinely fail to complete even straightforward professional tasks on their own, according to groundbreaking re [...]

Match Score: 89.60

venturebeat

2025-12-02

New training method boosts AI multimodal reasoning with smaller, smarter datasets

Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner, a new training framework that improves the capabilities of language models in multimodal reasoning.The framewo [...]

Match Score: 88.58

venturebeat

2025-11-10

Meta returns to open source AI with Omnilingual ASR models that can transcribe 1,600+ languages natively

Meta has just released a new multilingual automatic speech recognition (ASR) system supporting 1,600+ languages — dwarfing OpenAI’s open source Whisper model, which supports just 99. Is architectu [...]

Match Score: 88.02

Destination

2025-04-17

Wikipedia offers AI developers a training dataset to maybe get scraper bots off its back

Wikipedia has been struggling with the impact that AI crawlers — bots that are scraping text and multimedia from the encyclopedia to train generative artificial intelligence models — have been hav [...]

Match Score: 84.54