Training AI reasoning models demands resources that most enterprise teams do not have. Engineering teams are often forced to choose between distilling knowledge from large, expensive models and relying on reinforcement learning techniques that provide sparse feedback. Researchers at JD.com and several academic institutions recently introduced a new training paradigm that sidesteps this dilemma. The technique, called Reinforcement Learning with Verifiable Rewards with Self-Distillation (RLSD), combines the reliable performance tracking of reinforcement learning with the granular feedback of self-distillation. Experiments indicate that models trained with RLSD outperform those built on classic distillation and reinforcement learning algorithms. For enterprise teams, this approach lowers the t [...]
Microsoft on Tuesday released Phi-4-reasoning-vision-15B, a compact open-weight multimodal AI model that the company says matches or exceeds the performance of systems many times its size — while co [...]
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and better-focused models has accelerated. The Phi-4 fine-tuning methodology [...]
Deploying AI agents for repository-scale tasks like bug detection, patch verification, and code review requires overcoming significant technical hurdles. One major bottleneck: the need to set up dynam [...]
OpenAI introduced a new paradigm and product today that is likely to have huge implications for enterprises seeking to adopt and control fleets of AI agent workers. Called "Workspace Agents," [...]
Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner, a new training framework that improves the capabilities of language models in multimodal reasoning. The framewo [...]
The standard guidelines for building large language models (LLMs) optimize only for training costs and ignore inference costs. This poses a challenge for real-world applications that use inference-tim [...]
Enterprise teams building multi-agent AI systems may be paying a compute premium for gains that don't hold up under equal-budget conditions. New Stanford University research finds that single-age [...]
Jensen Huang walked onto the GTC stage Monday wearing his trademark leather jacket and carrying, as it turned out, the blueprints for a new kind of monopoly. The Nvidia CEO unveiled the Agent Toolkit, [...]
Even as concern and skepticism grow over U.S. AI startup OpenAI's buildout strategy and high spending commitments, Chinese open source AI providers are escalating their competition and one has e [...]