NVDA and the Nasdaq are down sharply today after news broke over the weekend of software (rather than hardware) improvements in AI technology. A Chinese AI firm, DeepSeek, put out its models publicly, and they are 1) really good, comparable to the best, and 2) apparently trained using about 2% of the usual hardware resources.
Good article here:
Excerpts:
Perhaps the most extraordinary thing about this [revolutionary Chain-of-Thought (“COT”) model] approach, beyond the fact that it works at all, is that the more logic/COT tokens you use, the better it works. Suddenly, you now have an additional dial you can turn: as you increase the amount of COT reasoning tokens (which uses a lot more inference compute, both in terms of FLOPS and memory), the higher the probability that you will give a correct response— code that runs the first time without errors, or a solution to a logic problem without an obviously wrong deductive step.
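The “dial” described above can be sketched as a toy simulation. This is not any lab’s actual method; the probabilities `p_base` and `p_catch` are made-up assumptions. The idea it illustrates is just the one in the excerpt: each extra reasoning step is another chance to catch and fix a mistake before the final answer, so accuracy rises with the reasoning-token budget.

```python
import random

def solve_with_cot(cot_steps, p_base=0.4, p_catch=0.3, rng=random):
    """Toy model of test-time compute scaling (illustrative only):
    the first-pass answer is correct with probability p_base, and each
    chain-of-thought step independently catches and fixes a wrong
    answer with probability p_catch."""
    correct = rng.random() < p_base
    for _ in range(cot_steps):
        if not correct and rng.random() < p_catch:
            correct = True  # a reasoning step caught the mistake
    return correct

def accuracy(cot_steps, trials=20_000, seed=0):
    """Monte Carlo estimate of final-answer accuracy for a given budget."""
    rng = random.Random(seed)
    return sum(solve_with_cot(cot_steps, rng=rng) for _ in range(trials)) / trials

# Analytically, accuracy = 1 - (1 - p_base) * (1 - p_catch) ** cot_steps,
# so turning the dial up strictly increases the chance of a correct answer.
```

Under this toy model, going from 0 to 16 reasoning steps lifts accuracy from about 40% to over 99%, at the cost of 16x more "inference compute" per query, which is the trade-off the excerpt describes.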
The first time I tried the O1 model from OpenAI was like a revelation: I was amazed at how often the [Python programming] code would be perfect the very first time. And that’s because the COT process automatically finds and fixes problems before they ever make it to a final response.
The number of researchers showing up at the big research conferences like NeurIPS and ICML went up very, very dramatically. All the smart students who might have previously studied financial derivatives were instead studying Transformers, and $1mm+ compensation packages for non-executive engineering roles (i.e., for individual contributors not managing a team) became the norm at the leading AI labs.
a small Chinese startup called DeepSeek released two new models that have basically world-competitive performance levels on par with the best models from OpenAI and Anthropic… this model is absolutely legit. DeepSeek has made profound advancements not just in model quality but more importantly in model training and inference efficiency.

Another major breakthrough is their multi-token prediction system. Most Transformer-based LLM models do inference by predicting the next token— one token at a time. DeepSeek figured out how to predict multiple tokens while maintaining the quality you’d get from single-token prediction…

One very strong indicator that it’s true is the cost of DeepSeek’s API: despite this nearly best-in-class model performance, DeepSeek charges something like 95% less money for inference requests via its API than comparable models from OpenAI and Anthropic.

Using pure reinforcement learning with carefully crafted reward functions, they managed to get models to develop sophisticated reasoning capabilities completely autonomously. This wasn’t just about solving problems— the model organically learned to generate long chains of thought, self-verify its work, and allocate more computation time to harder problems.