2025-10-07
A research team from MIT, IBM, and the University of Washington has released TOUCAN, the largest open dataset to date for training AI agents. The dataset contains 1.5 million real tool interactions, aiming to help open models handle external tools more effectively.
The article TOUCAN is the largest open training dataset for AI agents appeared first on THE DECODER.
[...]2025-04-17
Wikipedia has been struggling with the impact that AI crawlers — bots that are scraping text and multimedia from the encyclopedia to train generative artificial intelligence models — have been hav [...]
2025-10-02
A new study by Shanghai Jiao Tong University and SII Generative AI Research Lab (GAIR) shows that training large language models (LLMs) for complex, autonomous tasks does not require massive datasets. [...]
2025-06-06
The Common Pile is the first large-scale text dataset built entirely from openly licensed sources, offering an alternative to web data restricted by copyright.<br /> The article Researchers buil [...]
2025-10-01
Thinking Machines, the AI startup founded earlier this year by former OpenAI CTO Mira Murati, has launched its first product: Tinker, a Python-based API designed to make large language model (LLM) fin [...]
2025-10-01
It’s been a rough time for Peloton. Last year was marred by deep staff cuts, a change of CEO and a reckoning of where the home fitness company belonged, post-Pandemic boom. The answer is, unfortunat [...]
2025-10-02
IBM today announced the release of Granite 4.0, the newest generation of its homemade family of open source large language models (LLMs) designed to balance high performance with lower memory and cost [...]
2025-10-07
For many enterprises, there continue to be barriers to fully adopting and benefiting from agentic AI.IBM is betting the blocker isn't building AI agents but governing them in production.At its Te [...]