Reinforcement Learning - Understanding reward concepts

In the earlier post we discussed some key terms and definitions of reinforcement learning. In this article you will get an idea of the reward concepts in reinforcement learning.

Consider an agent in an environment at time t. The current state is S(t), and R(t) is the reward the agent received on reaching S(t). To proceed to the next state the agent must take an action: A(t) is the action it takes, and as a result it receives a reward R(t+1) and reaches a new state S(t+1).
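
To make the loop of state, action, reward and next state concrete, here is a minimal sketch. The `CorridorEnv` class, its reward of 1.0 at the last cell and the random choice of action are all assumptions made up for illustration; they are not part of any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # this plays the role of R(t+1)
        return self.state, reward, done


def run_episode(env, rng, max_steps=1000):
    """Run one episode and return a list of (S(t), A(t), R(t+1), S(t+1)) tuples."""
    state = env.reset()
    history = []
    for _ in range(max_steps):
        action = rng.choice([-1, 1])  # A(t), picked at random in this sketch
        next_state, reward, done = env.step(action)
        history.append((state, action, reward, next_state))
        state = next_state
        if done:
            break
    return history


history = run_episode(CorridorEnv(length=5), random.Random(0))
print(len(history), "steps, last transition:", history[-1])
```

Each tuple in `history` is one pass through the cycle described above.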

This process continues until the agent reaches the target. As we discussed in the previous article, the main aim of the agent is to maximize the reward. The cumulative reward at a given time is the sum of all the rewards the agent expects to receive in the future, and can be written as

G_t = R_{t+1} + R_{t+2} + R_{t+3} + \dots

More compactly, it can be written as

G_t = \sum_{k=0}^{\infty} R_{t+k+1}

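But simply adding up the rewards like this is not always sensible. In a task that never ends the sum can grow without bound, and depending on the problem we may care a lot about future rewards or hardly at all. So we modify the equation to weight each future reward:

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots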

This is equal to

G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}

As the equation suggests, each future reward is multiplied by gamma raised to a power that grows with the time step. Gamma is called the discount factor, and its value lies in [0,1]. This means that we are discounting the future rewards: as the time steps proceed, the weight given to a reward decreases. Note that this does not mean future rewards are irrelevant; they simply count for less the further away they are.

Let's analyse how the gamma value affects the cumulative reward and the agent.

If gamma is larger (close to 1), future rewards are discounted only slightly and the agent focuses on long term rewards. If gamma is smaller (close to 0), the discount is heavier and the agent focuses more on the immediate reward.
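
The effect of gamma is easy to check numerically. The sketch below computes the discounted cumulative reward for a finite list of rewards; the function name `discounted_return` and the reward values are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**k * rewards[k], where rewards[k] stands for R_{t+k+1}."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Work backwards, using G_t = R_{t+1} + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


rewards = [1.0, 1.0, 1.0, 10.0]  # made-up rewards; the large one arrives last
for gamma in (0.0, 0.5, 0.9, 1.0):
    print(gamma, discounted_return(rewards, gamma))
```

With gamma = 0 only the immediate reward of 1 counts, while with gamma = 1 the late reward of 10 is counted in full, so the total grows as gamma increases.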

Approaches to Reinforcement Learning

The three main approaches to reinforcement learning are value based, policy based and model based.

Value Based Approach

We know that the value of a state is the total expected future reward starting from that state. The agent uses this value function to select its next state: at each step it moves to the reachable state with the highest value.
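
As a rough illustration of picking the next state by value, the sketch below uses a hand-written table of state values and a hand-written map of reachable states. Both tables and the function names are hypothetical; in practice the values are learned rather than written down.

```python
# Hypothetical state values; in a real agent these would be learned
state_values = {"A": 0.2, "B": 0.7, "C": 0.5, "goal": 1.0}

# Hypothetical map of which states can be reached from each state
neighbours = {"A": ["B", "C"], "B": ["goal", "A"], "C": ["A"]}


def next_state(current):
    """Pick the reachable state with the highest value."""
    return max(neighbours[current], key=state_values.get)


def greedy_path(start, max_steps=10):
    """Follow the highest-value neighbour until a state with no neighbours."""
    path = [start]
    while path[-1] in neighbours and len(path) <= max_steps:
        path.append(next_state(path[-1]))
    return path


print(greedy_path("A"))
```

From "A" the agent prefers "B" (0.7) over "C" (0.5), and from "B" it prefers "goal" (1.0) over going back to "A".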

Policy Based Approach

In this approach we look for the optimal policy, the one that achieves the maximum expected future reward. There are two types of policy. One is a deterministic policy, which always chooses the same action for a given state. The other is a stochastic policy, which chooses its action in a probabilistic manner. The agent learns a policy function that tells it how to act in each state.
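
The difference between the two types of policy can be sketched in a few lines. The states (plain integers), the two actions and the probabilities below are all invented for illustration; a learned policy would get them from training.

```python
import random

ACTIONS = ["left", "right"]


def deterministic_policy(state):
    """Always returns the same action for a given state."""
    return "right" if state < 3 else "left"


def stochastic_policy(state):
    """Returns a probability for each action in this state."""
    p_right = 0.8 if state < 3 else 0.2
    return {"left": 1.0 - p_right, "right": p_right}


def sample_action(state, rng):
    """Draw one action according to the stochastic policy's probabilities."""
    probs = stochastic_policy(state)
    return rng.choices(ACTIONS, weights=[probs[a] for a in ACTIONS])[0]


rng = random.Random(0)
print(deterministic_policy(0))                   # same answer every time
print([sample_action(0, rng) for _ in range(5)])  # may differ from call to call
```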

Model Based Approach

In this approach we build a model of how the environment behaves, and the agent uses that model to learn how to act. Each environment needs its own model, so the agent learns to perform in that specific environment.

Tasks in reinforcement learning

There are two kinds of reinforcement learning task: episodic and continuous.

Episodic task

In an episodic task the agent starts from a particular starting point and, as time proceeds, reaches a point where it stops.

For example, if an agent plays a game of chess there is a starting point and an ending point. The ending point is where the agent wins, loses or draws the game.

Continuous Task

In a continuous task the agent continues its interaction with the environment until we decide to stop it.

For example, an agent doing stock market prediction.

Exploration and Exploitation

At the start the agent does not have full information about the environment. So the agent first explores the environment to learn about it: it tries new actions instead of repeating the ones it has already tried. This is called exploration.

After a certain interval of time, when the agent has explored almost all the states, it starts to exploit the actions it has already explored and found rewarding. This is called exploitation.
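
One common way to move from exploration to exploitation is an epsilon-greedy rule with a shrinking epsilon. This post does not name that technique, so treat the sketch below as one possible illustration; the function names and the decay numbers are assumptions.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """With probability epsilon explore, otherwise exploit the best known action."""
    if rng.random() < epsilon:
        # Explore: pick any action index at random
        return rng.randrange(len(action_values))
    # Exploit: pick the index of the highest estimated value
    return max(range(len(action_values)), key=action_values.__getitem__)


def decayed_epsilon(step, start=1.0, end=0.05, decay=0.99):
    """Epsilon shrinks as steps go by, so the agent explores less over time."""
    return max(end, start * decay ** step)


rng = random.Random(0)
estimates = [0.1, 0.9, 0.4]  # made-up value estimates for three actions
for step in (0, 100, 1000):
    eps = decayed_epsilon(step)
    print(step, round(eps, 3), epsilon_greedy(estimates, eps, rng))
```

Early on epsilon is close to 1 and almost every choice is random; later it settles at a small value and the agent mostly picks the action it currently believes is best.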

Reinforcement learning is all about taking suitable actions at the right times, interacting with the environment, learning from it, collecting rewards and maximizing them by taking better actions. Reinforcement learning might be something new for you, but it can be very interesting once you get the ideas. Do not forget to read our previous post on RL.

Keep reading!
