In 2025 the contour of AI research shifted from raw scale to structural intelligence: engineers and researchers moved from "making models larger" to "making models smarter", concentrating breakthroughs on fluid reasoning, long-term memory, spatial intelligence and meta-learning. According to a report by 36Kr, the year marked the end of what it calls the "brute force aesthetics" era and a return to basic research aimed at closing the gap between knowledge and cognitive ability. [1]

The most visible advance was the emergence of Test‑Time Compute (TTC) as a practical paradigm for fluid reasoning. Researchers demonstrated that by trading latency for iterative internal computation, effectively allowing models to "think slowly", large language models could markedly improve on tasks that demand multi‑step deduction. Microsoft Research's work on "Thinking‑Optimal Scaling" framed how much reasoning effort should be allocated to problems of differing difficulty, while other studies documented both gains and novel failure modes from lengthening chains of thought, underscoring that more compute at test time is powerful but must be applied selectively. These findings mirror 36Kr's account of a year in which reinforcement learning and post‑training strategies were central to improving fluid reasoning. [1][4][5]
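
As a rough illustration of the TTC idea, the sketch below spends extra inference‑time compute by sampling several reasoning attempts and keeping the best‑scoring one. The names `generate_reasoning` and `verifier_score` are hypothetical placeholders for a model's sampling and answer‑checking interfaces, not any specific API from the cited work.

```python
# Minimal sketch of test-time compute: trade latency for quality by sampling
# several reasoning paths and keeping the best-scoring answer.
from typing import Callable, List, Tuple

def best_of_n_answer(
    prompt: str,
    generate_reasoning: Callable[[str], Tuple[str, str]],  # returns (chain_of_thought, answer)
    verifier_score: Callable[[str, str], float],            # scores an answer for the prompt
    n_samples: int = 8,
) -> str:
    """Sample several deliberate reasoning attempts and return the best answer."""
    candidates: List[Tuple[float, str]] = []
    for _ in range(n_samples):
        _reasoning, answer = generate_reasoning(prompt)  # slower, deliberate "thinking"
        score = verifier_score(prompt, answer)           # verifiable or learned reward
        candidates.append((score, answer))
    # More samples cost more compute but raise the odds of a correct path;
    # allocating fewer samples to easy prompts is what "selective" use means here.
    return max(candidates, key=lambda c: c[0])[1]
```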

That debate about how reasoning improvements arise also sharpened around reinforcement learning. Industry practice in 2025 emphasised sampling strategies, verifiable reward signals and new update algorithms: RL from verifiable rewards (RLVR) and sparse outcome rewards (ORM) proved especially effective in domains with objective correctness, such as mathematics and code, and the GRPO family of algorithms emerged as a cost‑effective alternative to PPO by replacing an explicit critic with group‑relative scoring of sampled responses. At the same time, academic analyses argued that RL often amplifies reasoning trajectories already present in base models rather than inventing wholly new cognitive primitives, although deep RL can chain asymmetric skills into novel problem‑solving behaviours when pushed far enough. 36Kr summarised these tensions and the pragmatic engineering practices that nonetheless produced measurable benchmark gains. [1]
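
The group‑relative scoring that distinguishes GRPO‑style updates from PPO can be sketched in a few lines: each sampled response is scored against the mean and spread of its own group, so no separate learned critic is needed. The snippet below is a minimal illustration with toy reward values; the policy‑gradient and KL‑regularisation terms of the full algorithm are omitted.

```python
# Group-relative advantages: score each response against its own sampling group.
from statistics import mean, pstdev
from typing import List

def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Advantage of each response relative to its own group (no learned critic)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: four responses to one maths prompt, scored by a verifiable reward
# (1.0 if the final answer checks out, 0.0 otherwise).
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # correct answers get positive advantage
```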

Parallel to the reasoning gains, 2025 saw substantial progress on the memory problem that has long constrained continual learning and personalised agents. Google Research's Titans architecture introduced a neural long‑term memory module that updates its own parameters during inference, allowing models to store and retrieve historical context far beyond fixed transformer windows while preserving accuracy across millions of tokens. Complementary work on Nested Learning reframed architecture and optimisation as nested, interacting problems, aiming to mitigate catastrophic forgetting by unifying model structure and learning algorithms into a self‑improving system. Both advances challenge the transformer assumption of statelessness and point toward models that accumulate persistent, usable memory. [2][3][1]

The technical design choices behind these memory systems matter for deployment and efficiency. Titans uses a "surprise" metric to decide what to store, updating its neural memory where gradients indicate novelty or importance; Nested Learning proposes nested optimisation loops to stabilise parameter updates and reduce destructive interference. These approaches convert external retrieval buffers into internalised, differentiable memory that can be read and written during reasoning, a move that, 36Kr argues, gives models an emergent "hippocampus" and a pathway to cure "goldfish memory". Practical constraints remain: online updates require careful engineering to control compute and stability, but the scientific direction is clear. [2][3][1]
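
A deliberately simplified paraphrase of that surprise‑driven write path is sketched below: a toy associative memory updates its own parameters at inference time only when the prediction error on an incoming key/value pair is large. The class, the thresholding rule and the delta‑rule update are illustrative assumptions, not the Titans paper's exact formulation.

```python
# Toy surprise-gated memory: parameters are written at inference time only when
# the prediction error ("surprise") on a new key/value association is large.
import numpy as np

class ToyNeuralMemory:
    def __init__(self, dim: int, lr: float = 1.0, surprise_threshold: float = 0.5):
        self.W = np.zeros((dim, dim))       # memory parameters, writable at inference time
        self.lr = lr
        self.surprise_threshold = surprise_threshold

    def read(self, key: np.ndarray) -> np.ndarray:
        return self.W @ key                 # retrieval is a simple linear readout

    def observe(self, key: np.ndarray, value: np.ndarray) -> None:
        """Write only when the memory is 'surprised' by the new association."""
        error = value - self.read(key)      # prediction error acts as the surprise signal
        if np.linalg.norm(error) > self.surprise_threshold:
            # Gradient step on the squared-error loss ||value - W @ key||^2
            self.W += self.lr * np.outer(error, key)

# Usage: stream key/value pairs; only novel or important ones change the memory.
mem = ToyNeuralMemory(dim=4)
key, value = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])
mem.observe(key, value)
print(mem.read(key))   # -> [0. 1. 0. 0.]; the association now lives in the parameters
```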

Spatial intelligence and embodied world modelling also advanced beyond pixel‑stacking. Video generation systems in 2025 increasingly incorporate physical priors and temporal coherence, moving towards generative models that capture dynamics and physical plausibility rather than only per‑frame fidelity. Hardware and systems efforts echoed this trend: Nvidia's Rubin CPX and disaggregated inference designs target inference throughput and bandwidth for long‑context and video workloads, signalling industry preparation for persistent, context‑heavy agentic applications. Independent work modelling hierarchical, multi‑timescale brain‑like processing reported improved reasoning efficiency, suggesting that biologically inspired architectures can outperform parameter‑heavy LLMs on selected benchmarks. These threads together point to a practical convergence of improved model algorithms and specialised inference hardware. [6][7][1]

Despite rapid progress, several papers flagged practical limits. Empirical studies show that indiscriminate scaling of test‑time compute can produce inverse scaling, in which longer reasoning degrades accuracy through failure modes such as distraction by irrelevant context and overfitting to problem framings, and meta‑analyses indicate that RL improvements follow a sigmoid rather than an unbounded power law, implying ceilings on what post‑training alone can extract from a base model. The consensus in 2025 became one of calibrated optimism: TTC, memory modules and RL engineering can unlock large gains today, but sustaining the trajectory toward AGI will require continued base‑model and architectural innovation. [5][4][1]
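
Schematically, the distinction matters because the two curve families behave very differently at scale; the forms below are illustrative only, not the cited papers' fitted equations.

```latex
% Gains as a function of post-training compute C: a power law keeps growing,
% while a sigmoid saturates at a ceiling L (illustrative functional forms only).
\[
  f_{\text{power}}(C) = a\,C^{b} \quad (\text{unbounded as } C \to \infty)
  \qquad
  f_{\text{sigmoid}}(C) = \frac{L}{1 + e^{-k(\log C - c_{0})}} \quad (\text{approaches the ceiling } L)
\]
```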

Looking ahead, the architecture and optimisation advances of 2025 set a new baseline for capable, contextual and persistent AI systems. The year demonstrated that engineering ingenuity, in the form of smarter reward scoring, group‑based policy updates, surprise‑driven memory and differentiated hardware, can compensate for diminishing returns from parameter scale. As 36Kr framed it, the field has moved from brute force to reconstruction: the near term will be defined by integrating fluid reasoning, living memory and spatially aware models into deployed systems, and by confronting the practical trade‑offs of compute, robustness and verifiability that those systems entail. [1]

📌 Reference Map:

  • [1] (36Kr) - Paragraph 1, Paragraph 2, Paragraph 3, Paragraph 4, Paragraph 6, Paragraph 7
  • [2] (Google Research - Titans paper) - Paragraph 4, Paragraph 5
  • [3] (Google Research - Nested Learning paper) - Paragraph 4, Paragraph 5
  • [4] (Microsoft Research) - Paragraph 2, Paragraph 6
  • [5] (arXiv paper on inverse scaling) - Paragraph 2, Paragraph 6
  • [6] (Tom's Hardware on Nvidia Rubin CPX) - Paragraph 6
  • [7] (LiveScience reporting on Sapient HRM) - Paragraph 6

Source: Noah Wire Services