Orthogonalized Policy Optimization: Policy Optimization as Orthogonal Projection in Hilbert Space
Wang Zixian · Jan 18, 2026
Citations: 0
Automatic Metrics · Long Horizon · Math · Law
- Experiments on MATH benchmarks show that the Hilbert projection formulation prevents gradient saturation typical of KL-constrained methods.