This post introduces combining a DQN with active learning; I hope it provides a useful reference for developers interested in the topic.
For details on editing the Markdown formulas used below, refer to the linked blog post.
Initialize replay memory $M$ to a fixed capacity
Initialize action-value function $Q$ with random weights $\theta$
for each episode do
  $D^l \leftarrow \emptyset$ and shuffle $D$
  model $\phi \leftarrow$ Random
  for $i = 1, \dots, |D|$ do
    Construct the state $s_i$ using $x_i$
    With probability $\epsilon$ select a random action $a_i$,
    otherwise select $a_i = \arg\max_a Q^{\pi}(s_i, a; \theta)$
    if $a_i = 1$ then
      Obtain the annotation $y_i$
      $D^l \leftarrow D^l + (x_i, y_i)$
      Update the model $\phi$ based on $D^l$
    end if
    Receive a reward $r_i$ from the test data
    if $|D^l| = B$ then
      Store the transition in $M$
      break
    end if
    Construct the new state $s_{i+1}$
    Store the transition $(s_i, a_i, r_i, s_{i+1})$ in $M$
    Sample a random minibatch of transitions from $M$ and perform a gradient descent step on $L(\theta)$
    Update the policy $\pi$ with $\theta$
  end for
end for
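The pseudocode leaves $L(\theta)$ undefined; in standard DQN it is the squared temporal-difference error over minibatches drawn from the replay memory:

$$L(\theta) = \mathbb{E}_{(s_i, a_i, r_i, s_{i+1}) \sim M}\Big[\big(r_i + \gamma \max_{a'} Q(s_{i+1}, a'; \theta) - Q(s_i, a_i; \theta)\big)^2\Big]$$

Below is a minimal runnable sketch of the loop above in Python. Everything concrete in it is an assumption made for illustration: the synthetic data, using the raw features $x_i$ directly as the state $s_i$, a least-squares stand-in for the task model $\phi$, a linear Q-function in place of a neural network, and a reward $r_i$ defined as the change in test accuracy after each step. It is meant only to make the control flow executable, not to reproduce any particular paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a pool D to annotate and a held-out test set for rewards.
w_true = np.array([1.0, -2.0, 0.5, 1.5])
D = rng.normal(size=(200, 4))
y_true = (D @ w_true > 0).astype(int)
X_test = rng.normal(size=(100, 4))
y_test = (X_test @ w_true > 0).astype(int)

GAMMA, EPSILON, ALPHA = 0.9, 0.1, 0.01    # assumed hyperparameters
BUDGET, CAPACITY, BATCH = 20, 1000, 16    # B, capacity of M, minibatch size

theta = rng.normal(scale=0.1, size=(2, 4))  # linear Q(s, a; theta) = theta[a] @ s
memory = []                                 # replay memory M

def fit_model(Xl, yl):
    """Least-squares stand-in for updating the task model phi on D^l."""
    w, *_ = np.linalg.lstsq(Xl, 2.0 * yl - 1.0, rcond=None)
    return w

def test_accuracy(w):
    return float(np.mean((X_test @ w > 0).astype(int) == y_test))

for episode in range(10):
    order = rng.permutation(len(D))          # shuffle D
    Xl, yl = [], []                          # labeled set D^l
    phi = rng.normal(scale=0.1, size=4)      # phi <- Random
    prev_acc = test_accuracy(phi)
    for t, i in enumerate(order):
        s = D[i]                             # state s_i (assumption: s_i = x_i)
        if rng.random() < EPSILON:           # epsilon-greedy action selection
            a = int(rng.integers(2))
        else:
            a = int(np.argmax(theta @ s))
        if a == 1:                           # annotate: obtain y_i, grow D^l, refit phi
            Xl.append(D[i]); yl.append(y_true[i])
            phi = fit_model(np.array(Xl), np.array(yl, dtype=float))
        acc = test_accuracy(phi)
        r = acc - prev_acc                   # reward from test data (assumed: accuracy gain)
        prev_acc = acc
        done = len(Xl) >= BUDGET             # exhausting the budget B ends the episode
        s_next = D[order[t + 1]] if t + 1 < len(order) else s
        memory.append((s, a, r, s_next, done))
        if len(memory) > CAPACITY:           # keep M at fixed capacity
            memory.pop(0)
        if done:
            break
        if len(memory) >= BATCH:             # gradient descent step on L(theta)
            for j in rng.choice(len(memory), size=BATCH, replace=False):
                sb, ab, rb, snb, db = memory[j]
                target = rb + (0.0 if db else GAMMA * float(np.max(theta @ snb)))
                td = target - theta[ab] @ sb
                theta[ab] += ALPHA * td * sb  # SGD on the squared TD error
    print(f"episode {episode}: test accuracy {test_accuracy(phi):.2f} with {len(Xl)} labels")
```

In a realistic setup the state would also encode the current model's uncertainty about $x_i$, and $Q$ would be a neural network trained against a separate target network; the linear versions here just keep the sketch dependency-free.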
That concludes this article on DQN + Active Learning; I hope it proves helpful!