Test-time scaling (TTS) has emerged as a proven method to improve the performance of large language models in real-world applications by giving them extra compute cycles at inference time. However, TTS strategies have historically been handcrafted, relying heavily on human intuition to dictate the rules of the model’s reasoning. To address this bottleneck, researchers from Meta, Google, and several universities have introduced AutoTTS, a framework that automatically discovers optimal TTS strategies. This automated approach allows enterprise organizations to dynamically optimize compute allocation without manually tuning heuristics. By implementing the optimal strategies discovered by AutoTTS, organizations can directly reduce the token usage and operational costs of deploying advanced [...]
Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]
DeepSeek’s announcement over the weekend that it has made its 75% price cut permanent on its flagship V4 Pro model is a disruptive assault on the capital-heavy business models of Silicon Valley’s [...]
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]
When Anthropic quietly released Claude Design in April as a "research preview," it generated the kind of instant traction most product teams dream about: more than one million users in its f [...]
Deploying AI agents for repository-scale tasks like bug detection, patch verification, and code review requires overcoming significant technical hurdles. One major bottleneck: the need to set up dynam [...]
Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner, a new training framework that improves the capabilities of language models in multimodal reasoning.The framewo [...]
Training a foundation LLM from scratch costs millions and requires internet-scale data — which is why most enterprises don't bother. Sapient thinks it has a cheaper path.To overcome this brute- [...]
Researchers at Meta FAIR and the University of Edinburgh have developed a new technique that can predict the correctness of a large language model's (LLM) reasoning and even intervene to fix its [...]
As agentic AI workflows multiply the cost and latency of long reasoning chains, a team from the University of Maryland, Lawrence Livermore National Labs, Columbia University and TogetherAI has found a [...]