The developers of Terminal-Bench, a benchmark suite for evaluating the performance of autonomous AI agents on real-world terminal-based tasks, have released version 2.0 alongside Harbor, a new framewo [...]
As LLMs have continued to improve, there has been some discussion in the industry about the continued need for standalone data labeling tools, as LLMs are increasingly able to work with all types of d [...]
Model providers want to prove the security and robustness of their models, releasing system cards and conducting red-team exercises with each new release. But it can be difficult for enterprises to pa [...]
Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]
Mistral AI on Monday launched Forge, an enterprise model training platform that allows organizations to build, customize, and continuously improve AI models using their own proprietary data — a move [...]
Patronus AI, the artificial intelligence evaluation startup backed by $20 million from investors including Lightspeed Venture Partners and Datadog, unveiled a new training architecture Tuesday that it [...]
Two days after releasing what analysts call the most powerful open-source AI model ever created, researchers from China's Moonshot AI logged onto Reddit to face a restless audience. The Beijing-b [...]
Nous Research, the open-source artificial intelligence startup backed by crypto venture firm Paradigm, released a new competitive programming model on Monday that it says matches or exceeds several la [...]