venturebeat
TTT-Discover optimizes GPU kernels 2x faster than human experts — by training during inference

Researchers from Stanford, Nvidia, and Together AI have developed a new technique that can discover novel solutions to very complex problems. For example, they managed to optimize a critical GPU kernel to run 2x faster than the previous state of the art written by human experts.

Their technique, called “Test-Time Training to Discover” (TTT-Discover), challenges the current paradigm of letting models “think longer” on reasoning problems. Instead, TTT-Discover allows the model to continue training during the inference process, updating its weights for the problem at hand.

The limits of 'frozen' reasoning

Current enterprise AI strategies often rely on "frozen" models. Whether you use a closed or open reasoning model, the model's parameters are static. When you prompt thes [...]
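The core idea — updating a model's weights on the test problem itself instead of serving frozen parameters — can be illustrated with a minimal sketch. The linear model, data, and squared-error loss below are toy stand-ins chosen for illustration, not the actual TTT-Discover setup or objective:

```python
# Minimal sketch of test-time training: take a few gradient steps on an
# objective defined by the test problem before producing an answer.
# Everything here (model, data, loss) is a toy assumption for illustration.
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" weights of a linear model y = X @ w.
w = rng.normal(size=3)

# The test problem arrives with its own feedback signal (standing in for,
# say, benchmark scores of candidate GPU kernels).
X_test = rng.normal(size=(8, 3))
y_test = X_test @ np.array([1.0, -2.0, 0.5])

def loss(w):
    pred = X_test @ w
    return float(np.mean((pred - y_test) ** 2))

initial = loss(w)

# Test-time training: specialize the weights to this one problem
# with plain gradient descent on the mean-squared error.
lr = 0.05
for _ in range(100):
    grad = 2 * X_test.T @ (X_test @ w - y_test) / len(y_test)
    w -= lr * grad

adapted = loss(w)
print(adapted < initial)  # the adapted weights fit the test problem better
```

The contrast with a frozen model is the update loop: a static deployment would evaluate `loss(w)` once and answer, while test-time training spends extra inference-time compute to reduce that loss for the single problem at hand.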

Related articles

New ‘Test-Time Training’ method lets AI keep learning without exploding inference costs

A new study from researchers at Stanford University and Nvidia proposes a way for AI models to keep learning after deployment — without increasing inference costs. For enterprise agents that have to [...]

Baseten takes on hyperscalers with new AI training platform that lets you own your model weights

Baseten, the AI infrastructure company recently valued at $2.15 billion, is making its most significant product pivot yet: a full-scale push into model training that could reshape how enterprises wean [...]

5% GPU utilization: The $401 billion AI infrastructure problem enterprises can't keep ignoring

For the last 24 months, one narrative justified every over-provisioned data center and bloated IT budget: the GPU scramble. Silicon was the new oil, and H100s traded like contraband. Reserve capacity [...]

FOMO is why enterprises pay for GPUs they don't use — and why prices keep climbing

Enterprises can't fix their GPU waste problem because the fix makes the problem worse. Releasing idle capacity would improve utilization, but the same shortage driving GPU prices up is exactly wh [...]

The team behind continuous batching says your idle GPUs should be running inference, not sitting dark

Every GPU cluster has dead time. Training jobs finish, workloads shift and hardware sits dark while power and cooling costs keep running. For neocloud operators, those empty cycles are lost margin.The [...]

Attention ISN'T all you need?! New Qwen3 variant Brumby-14B-Base leverages Power Retention technique

When the transformer architecture was introduced in 2017 in the now seminal Google paper "Attention Is All You Need," it became an instant cornerstone of modern artificial intelligence. Ever [...]

Together AI's ATLAS adaptive speculator delivers 400% inference speedup by learning from workloads in real-time

Enterprises expanding AI deployments are hitting an invisible performance wall. The culprit? Static speculators that can't keep up with shifting workloads.Speculators are smaller AI models that w [...]

Train-to-Test scaling explained: How to optimize your end-to-end AI compute budget for inference

The standard guidelines for building large language models (LLMs) optimize only for training costs and ignore inference costs. This poses a challenge for real-world applications that use inference-tim [...]

AI inference costs dropped up to 10x on Nvidia's Blackwell — but hardware is only half the equation

Lowering the cost of inference is typically a combination of hardware and software. A new analysis released Thursday by Nvidia details how four leading inference providers are reporting 4x to 10x redu [...]
