Q-learning and Sarsa

1. Q-learning

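Q-learning aims to learn the value of taking a specific action in a specific state. It maintains a Q-table with states as rows and actions as columns. After each step, it updates the entry for the action just taken using the reward received plus the discounted value of the best action available in the next state.
It is an off-policy method: the behavior policy that chooses the actions actually taken (typically ε-greedy) is not the same as the target policy being evaluated (greedy with respect to the Q-table).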
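A minimal sketch of a Q-table agent with the `choose_action(s)` / `learn(s, a, r, s_)` interface that the training loop calls on `RL`. It assumes string state keys, integer action indices 0..n_actions-1, and ε-greedy exploration; the class name `QLearningTable`, its parameter names, and the optional `done` flag are illustrative assumptions rather than the author's code.

```python
import random
from collections import defaultdict


class QLearningTable:
    """Hypothetical tabular Q-learning agent (illustrative sketch).

    q_table maps a state key (row) to a list of action values (columns).
    """

    def __init__(self, n_actions, learning_rate=0.1, reward_decay=0.9, e_greedy=0.9):
        self.n_actions = n_actions
        self.lr = learning_rate    # step size alpha
        self.gamma = reward_decay  # discount factor gamma
        self.epsilon = e_greedy    # probability of taking the greedy action
        # A state seen for the first time gets a row of zeros.
        self.q_table = defaultdict(lambda: [0.0] * n_actions)

    def choose_action(self, s):
        """Behavior policy: epsilon-greedy over the row for state s."""
        row = self.q_table[s]
        if random.random() < self.epsilon:
            best = max(row)
            # Break ties randomly among equally good actions.
            return random.choice([a for a, q in enumerate(row) if q == best])
        return random.randrange(self.n_actions)

    def learn(self, s, a, r, s_, done=False):
        """Off-policy update: bootstrap from the greedy value max_a' Q(s_, a')."""
        q_predict = self.q_table[s][a]
        q_target = r if done else r + self.gamma * max(self.q_table[s_])
        self.q_table[s][a] = q_predict + self.lr * (q_target - q_predict)


# One hand-checkable update: Q(s0, 1) <- 0 + 0.5 * (1 + 0.9 * 2 - 0) = 1.4
rl = QLearningTable(n_actions=2, learning_rate=0.5, reward_decay=0.9)
rl.q_table["s1"] = [0.0, 2.0]
rl.learn("s0", 1, 1.0, "s1")
print(round(rl.q_table["s0"][1], 6))  # 1.4
```

With `done=True` the target reduces to the reward alone, a common way to handle terminal transitions; a four-argument call like the one in the loop uses the default `done=False`.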

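def update():
    # Q-learning training loop (fragment): `env` and `RL` are assumed to be
    # created elsewhere before update() is called.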
    for episode in range(100):
        # initial observation
        observation = env.reset()
        while True:
            # fresh env
            env.render()
            # RL choose action based on observation
            action = RL.choose_action(str(observation))
            # RL take action and get next observation and reward
            observation_, reward, done = env.step(action)
            # RL learn from this transition
            RL.learn(str(observation), action, reward, str(observation_))
            # swap observation
            observation = observation_
            # break while loop when end of this episode
            if done:
                break
    # end of game
    print('game over')
    env.destroy()
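The loop only needs `env` to provide `reset()`, `render()`, `step(action)` returning `(observation_, reward, done)`, and `destroy()`. As a hedged illustration, here is a hypothetical text-only stand-in, `Corridor` (not part of the original article), followed by one episode driven by a random policy in place of `RL.choose_action`.

```python
import random


class Corridor:
    """Hypothetical 1-D corridor environment (illustrative, not from the article).

    Cells 0..length-1; the agent starts at 0 and the episode ends at the right end.
    Implements only the methods the update() loop uses.
    """

    def __init__(self, length=5):
        self.length = length
        self.pos = 0

    def reset(self):
        self.pos = 0
        return self.pos

    def render(self):
        # Text stand-in for a GUI: 'o' marks the agent, 'T' the goal.
        cells = ["-"] * self.length
        cells[-1] = "T"
        cells[self.pos] = "o"
        print("".join(cells))

    def step(self, action):
        # Action 0 moves left (the wall at cell 0 blocks it), action 1 moves right.
        if action == 1:
            self.pos = min(self.pos + 1, self.length - 1)
        else:
            self.pos = max(self.pos - 1, 0)
        done = self.pos == self.length - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done

    def destroy(self):
        pass  # nothing to clean up for a text environment


# One episode with a random policy standing in for RL.choose_action.
env = Corridor()
observation = env.reset()
steps = 0
while True:
    action = random.randrange(2)
    observation_, reward, done = env.step(action)
    observation = observation_
    steps += 1
    if done:
        break
print("reached the goal after", steps, "steps")
```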


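def update():
    # Sarsa training loop (fragment): the next action action_ is chosen before
    # learning, and that same action is executed on the following step.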
    for episode in range(100):
        # initial observation
        observation = env.reset()
        # RL choose action based on observation
        action = RL.choose_action(str(observation))
        while True:
            # fresh env
            env.render()
            # RL take action and get next observation and reward
            observation_, reward, done = env.step(action)
            # RL choose action based on next observation
            action_ = RL.choose_action(str(observation_))
            # RL learn from this transition (s, a, r, s_, a_) ==> Sarsa
            RL.learn(str(observation), action, reward, str(observation_), action_)
            # swap observation and action
            observation = observation_
            action = action_
            # break while loop when end of this episode
            if done:
                break
    # end of game
    print('game over')
    env.destroy()
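A minimal sketch of a Sarsa agent matching the five-argument `RL.learn(s, a, r, s_, a_)` call in the loop, assuming string states, integer actions, and ε-greedy selection. `SarsaTable` and the `done` flag are hypothetical names; the substantive difference from tabular Q-learning is only the bootstrap term in `learn`.

```python
import random
from collections import defaultdict


class SarsaTable:
    """Hypothetical tabular Sarsa agent (illustrative sketch)."""

    def __init__(self, n_actions, learning_rate=0.1, reward_decay=0.9, e_greedy=0.9):
        self.n_actions = n_actions
        self.lr = learning_rate    # step size alpha
        self.gamma = reward_decay  # discount factor gamma
        self.epsilon = e_greedy    # probability of taking the greedy action
        self.q_table = defaultdict(lambda: [0.0] * n_actions)

    def choose_action(self, s):
        """Epsilon-greedy; this same policy both acts and is evaluated (on-policy)."""
        row = self.q_table[s]
        if random.random() < self.epsilon:
            best = max(row)
            return random.choice([a for a, q in enumerate(row) if q == best])
        return random.randrange(self.n_actions)

    def learn(self, s, a, r, s_, a_, done=False):
        """On-policy update: bootstrap from Q(s_, a_), the action actually chosen next."""
        q_predict = self.q_table[s][a]
        q_target = r if done else r + self.gamma * self.q_table[s_][a_]
        self.q_table[s][a] = q_predict + self.lr * (q_target - q_predict)


# Next action a_ = 1 has value 0.0, so the target is 1 + 0.9 * 0.0 = 1.0,
# even though action 0 in s1 is worth 2.0.
rl = SarsaTable(n_actions=2, learning_rate=0.5, reward_decay=0.9)
rl.q_table["s1"] = [2.0, 0.0]
rl.learn("s0", 0, 1.0, "s1", 1)
print(rl.q_table["s0"][0])  # 0.5
```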

2. SARSA

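SARSA is on-policy: its behavior policy and target policy are the same. It first chooses the next action and then updates, and the action used in the update is exactly the action it executes next.
Q-learning, by contrast, updates the value function as if the next step took the action with the highest Q-value, while the action it actually executes is chosen separately by an exploratory (e.g. ε-greedy) policy and may differ from that greedy action.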
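To make the contrast concrete, a small sketch (hypothetical helper names and numbers) computing both targets for the same transition, where ε-greedy happened to explore action 0 in the next state even though action 1 has the higher value.

```python
def q_learning_target(r, q_next_row, gamma):
    """Q-learning target: assumes the greedy (max-value) action in s'."""
    return r + gamma * max(q_next_row)


def sarsa_target(r, q_next_row, a_next, gamma):
    """Sarsa target: uses the action a' that the behavior policy actually chose in s'."""
    return r + gamma * q_next_row[a_next]


q_next = [1.0, 3.0]  # hypothetical Q(s', .) values
# Q-learning: 0 + 0.9 * max(1.0, 3.0) = 2.7
print(q_learning_target(0.0, q_next, 0.9))
# Sarsa with exploratory a' = 0: 0 + 0.9 * 1.0 = 0.9
print(sarsa_target(0.0, q_next, 0, 0.9))
```

When the behavior policy picks the greedy action, the two targets coincide; they diverge only on exploratory steps, which is why Sarsa's values reflect the cost of exploring while Q-learning's do not.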

Reference: https://www.zhihu.com/column/c_1291396094732595200


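Copyright notice: This is an original article by m0_51607165, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.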