Foundational Lemmas for Bellman Optimality and Anti-Optimality Operators

Written by anchoring | Published 2025/01/15
Tech Story Tags: reinforcement-learning | dynamic-programming | nesterov-acceleration | machine-learning-optimization | value-iteration | value-iteration-convergence | bellman-error | rl-convergence-lemmas

TL;DR: These foundational lemmas establish key properties of Bellman optimality and anti-optimality operators, with insights into their fixed points and convergence in reinforcement learning.

Authors:

(1) Jongmin Lee, Department of Mathematical Sciences, Seoul National University;

(2) Ernest K. Ryu, Department of Mathematical Sciences, Seoul National University and Interdisciplinary Program in Artificial Intelligence, Seoul National University.

Abstract and 1 Introduction

1.1 Notations and preliminaries

1.2 Prior works

2 Anchored Value Iteration

2.1 Accelerated rate for Bellman consistency operator

2.2 Accelerated rate for Bellman optimality operator

3 Convergence when γ = 1

4 Complexity lower bound

5 Approximate Anchored Value Iteration

6 Gauss–Seidel Anchored Value Iteration

7 Conclusion, Acknowledgments and Disclosure of Funding, and References

A Preliminaries

B Omitted proofs in Section 2

C Omitted proofs in Section 3

D Omitted proofs in Section 4

E Omitted proofs in Section 5

F Omitted proofs in Section 6

G Broader Impacts

H Limitations

A Preliminaries

For notational uniformity, we use the symbol U in statements that apply to both the state-value function V and the action-value function Q.
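To make the shared notation concrete, the following is a minimal recap of the Bellman optimality operator in the two cases U = V and U = Q. This is a sketch assuming the standard finite discounted-MDP setup (reward r, transition kernel P, discount factor γ) of Section 1.1, not a quotation of the paper's exact definitions:

\[
(T^{\star} V)(s) = \max_{a}\Big[\, r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big],
\qquad
(T^{\star} Q)(s,a) = r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, \max_{a'} Q(s',a').
\]

The Bellman anti-optimality operator is obtained by replacing the max with a min in each display. Like \(T^{\star}\), it is a \(\gamma\)-contraction in the sup-norm, so for \(\gamma < 1\) each operator has a unique fixed point, and the lemmas that follow describe how iterates approach those fixed points.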

This paper is available on arXiv under a CC BY 4.0 license.

