2025-10-09
Researchers at Nvidia have developed a new technique that flips the script on how large language models (LLMs) learn to reason.
The method, called reinforcement learning pre-training (RLP), integrates RL into the initial training phase rather than saving it for the end.
This approach encourages the model to “think for itself before predicting what comes next, thus teaching an independent thinking behavior earlier in the pretraining,” the researchers state in their paper.
Because RLP learns to reason on plain text without external verifiers, models trained with it show significant gains on complex downstream reasoning tasks, hinting at a future of more capable and adaptable AI fo [...]
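The article describes the mechanism only at a high level, but one natural reading is that the model samples a short chain-of-thought before each next-token prediction and is rewarded by how much that thought improves the prediction relative to a no-think baseline. Below is a minimal PyTorch sketch under that assumption; the function names, the EMA-baseline detail, and the toy model are illustrative stand-ins, not NVIDIA's actual implementation.

```python
import torch
import torch.nn.functional as F

def log_prob_next_token(model, input_ids, next_token_id):
    # Logits at the final position; `model` maps (1, T) token ids -> (1, T, vocab).
    logits = model(input_ids.unsqueeze(0))[:, -1, :]
    return F.log_softmax(logits, dim=-1)[0, next_token_id]

def rlp_style_reward(model, no_think_baseline, context_ids, thought_ids, next_token_id):
    # Log-likelihood of the true next token after "thinking":
    # the sampled chain-of-thought is spliced in before the prediction.
    lp_think = log_prob_next_token(
        model, torch.cat([context_ids, thought_ids]), next_token_id)
    # Log-likelihood without any thought (e.g. an EMA copy of the model).
    lp_plain = log_prob_next_token(no_think_baseline, context_ids, next_token_id)
    # Dense, verifier-free reward on ordinary pretraining text:
    # positive only when the thought actually helped prediction.
    return (lp_think - lp_plain).detach()

# Toy usage: any callable mapping (1, T) token ids to (1, T, vocab) logits
# can stand in for the language model here.
vocab_size = 32
toy_lm = lambda ids: torch.randn(1, ids.shape[1], vocab_size)
reward = rlp_style_reward(
    toy_lm, toy_lm,
    context_ids=torch.tensor([3, 1, 4]),
    thought_ids=torch.tensor([9, 2, 6]),
    next_token_id=5,
)
```

The appeal of a reward shaped this way is that it needs no labeled answers or external verifier, which is what lets the technique run during pretraining on plain text rather than in a separate post-training phase.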
2025-06-12
Proton VPN stands out for two main reasons: it's one of the few virtual private networks (VPNs) to include a free plan with no data limits, and it's among the few services majority-owned b [...]
2025-11-17
AI engineers often chase performance by scaling up LLM parameters and data, but the trend toward smaller, more efficient, and more narrowly focused models has accelerated. The Phi-4 fine-tuning methodology [...]
2025-11-14
Researchers at Google Cloud and UCLA have proposed a new reinforcement learning framework that significantly improves the ability of language models to learn very challenging multi-step reasoning task [...]
2025-11-10
Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]
2025-10-30
Researchers at Meta FAIR and the University of Edinburgh have developed a new technique that can predict the correctness of a large language model's (LLM) reasoning and even intervene to fix its [...]