This post introduces the Gittins index for the n-armed (multi-armed) bandit problem and an off-line algorithm for computing it.
The complexity of solving the MAB (multi-armed bandit) problem with Markov decision theory increases exponentially with the number of bandit processes: the joint MDP has state space $\prod_{i=1}^{n} \chi_i$, so, for example, $n$ arms with 10 states each already give $10^{n}$ joint states. Instead of solving this $n$-dimensional MDP, the optimal solution (the Gittins index policy) is obtained by solving $n$ one-dimensional optimization problems, one per bandit process.
The index of bandit $i$ in state $x \in \chi_i$ is given as
$$\nu_i(x) = \sup_{\tau \ge 1} \frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r_i\big(x_i(t)\big) \,\middle|\, x_i(0) = x\right]}{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_i(0) = x\right]},$$
where $\beta \in (0,1)$ is the discount factor and the supremum is taken over stopping times $\tau$ of the process $\{x_i(t)\}$: the index is the largest achievable ratio of expected discounted reward to expected discounted time.
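Since, for a finite-state Markov chain, the supremum is attained by the first exit time from some continuation set containing the state, the index of a single state can be computed by brute force on small examples. The following NumPy sketch is only an illustration of the definition (the function name and arguments are mine, not from the original post); it assumes a transition matrix `P`, reward vector `r`, and discount factor `beta`.

```python
from itertools import combinations

import numpy as np


def gittins_brute_force(P, r, beta, state):
    """Gittins index of `state`, maximizing the reward/time ratio over
    all continuation sets C that contain `state` (brute force)."""
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    m = len(r)
    others = [s for s in range(m) if s != state]
    best = -np.inf
    for k in range(m):
        for rest in combinations(others, k):
            C = [state, *rest]                          # continuation set, C[0] == state
            M = np.linalg.inv(np.eye(len(C)) - beta * P[np.ix_(C, C)])
            d = M @ r[C]                                # expected discounted reward until exit from C
            b = M @ np.ones(len(C))                     # expected discounted time until exit from C
            best = max(best, d[0] / b[0])               # ratio achieved by this continuation set
    return best
```

This enumeration is exponential in the number of states and is meant only as a sanity check; the off-line algorithm below computes all the indices in polynomial time.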
Off-Line Algorithm for Computing the Gittins Index
1. Largest-Remaining-Index Algorithm
Consider a single bandit process with finite state space $\chi$, $|\chi| = m$, reward $r_a$ in state $a$, transition matrix $P$, and discount factor $\beta$.

Initialization: identify the state $\alpha_1$ with the highest Gittins index. Choose
$$\alpha_1 = \arg\max_{a \in \chi} r_a.$$
The corresponding Gittins index is $\nu(\alpha_1) = r(\alpha_1) = r_{\alpha_1}$, with $S(\alpha_1) = \chi$.
Recursion step:
Define the $m \times m$ matrix $P^{(k)}$ by, for all $a, b \in \chi$,
$$P^{(k)}_{ab} = \begin{cases} P_{ab}, & \text{if } b \in \{\alpha_1, \dots, \alpha_k\}, \\ 0, & \text{otherwise}, \end{cases}$$
and define the $m \times 1$ vectors
$$d^{(k)} = \big[I - \beta P^{(k)}\big]^{-1} r, \qquad b^{(k)} = \big[I - \beta P^{(k)}\big]^{-1} \mathbf{1}.$$
The entries $d^{(k)}_a$ and $b^{(k)}_a$ are the expected discounted reward and the expected discounted time accumulated when the process starts in $a$ and is continued only while it remains in $\{\alpha_1, \dots, \alpha_k\}$. The next state in the index ordering and its Gittins index are then
$$\alpha_{k+1} = \arg\max_{a \notin \{\alpha_1, \dots, \alpha_k\}} \frac{d^{(k)}_a}{b^{(k)}_a}, \qquad \nu(\alpha_{k+1}) = \frac{d^{(k)}_{\alpha_{k+1}}}{b^{(k)}_{\alpha_{k+1}}}.$$
Repeating this step for $k = 1, \dots, m-1$ yields the Gittins index of every state in $\chi$.
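The recursion above translates directly into code. The following NumPy sketch is my own illustrative implementation of the largest-remaining-index recursion as described here (the function name and example numbers are hypothetical, not from the original post); it assumes a single bandit process with transition matrix `P`, reward vector `r`, and discount factor `beta`.

```python
import numpy as np


def largest_remaining_index(P, r, beta):
    """Gittins index of every state of one bandit process, computed with
    the largest-remaining-index recursion described above.

    P    : (m, m) transition matrix
    r    : (m,)   reward vector
    beta : discount factor in (0, 1)
    Returns an array nu with nu[a] = Gittins index of state a.
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    m = len(r)
    nu = np.empty(m)

    # Initialization: the state with the largest reward has the largest index.
    order = [int(np.argmax(r))]
    nu[order[0]] = r[order[0]]

    # Recursion: given the k highest-index states {alpha_1, ..., alpha_k},
    # pick the remaining state with the largest ratio d / b.
    for _ in range(m - 1):
        Pk = np.zeros((m, m))
        Pk[:, order] = P[:, order]      # keep only transitions into already-identified states
        M = np.linalg.inv(np.eye(m) - beta * Pk)
        d = M @ r                       # expected discounted reward until leaving {alpha_1..alpha_k}
        b = M @ np.ones(m)              # expected discounted time until leaving {alpha_1..alpha_k}
        remaining = [a for a in range(m) if a not in order]
        ratios = d[remaining] / b[remaining]
        best = int(np.argmax(ratios))
        nu[remaining[best]] = ratios[best]
        order.append(remaining[best])

    return nu


# Hypothetical 3-state example (numbers chosen only for illustration).
P = [[0.1, 0.9, 0.0],
     [0.0, 0.2, 0.8],
     [0.7, 0.0, 0.3]]
r = [1.0, 5.0, 0.5]
print(largest_remaining_index(P, r, beta=0.9))
```

Once the per-arm indices have been computed this way, the optimal MAB policy is simply to play, at every decision time, the arm whose current state has the largest Gittins index; on small examples the output can also be cross-checked against the brute-force computation sketched earlier.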