Reinforcement Learning "RL" | Learn Beneficial

In the name of Allah, most gracious and most merciful,

1. Definition

Given:

A set of actions to choose from (ex: Move up, down, right, or left)
A virtual or real environment in which to perform these actions
An agent who performs these actions
A reward on performing any sequence of actions that leads to a desired output

In simple words, Reinforcement Learning is a branch of machine learning that teaches an agent how to choose a suitable sequence of actions, from a given set of actions, in a virtual or real environment, so that this sequence of actions leads to a final result. When the result is a desired one, the agent is rewarded to encourage that sequence of actions.

By being rewarded, the agent comes to know which sequences of actions are good and which are not. The agent learns through repeated trials in which it tries to maximize its reward over time, using both exploration and exploitation.

In addition to this, Reinforcement Learning “RL” is used in computational neuroscience as a framework for modeling the decision-making process.

Clearer definitions of the terms:

Agent: It is the program you train to take specific actions or to do specific jobs.
Environment: The real or virtual environment in which the agent performs actions.
Action: A move made by the agent that changes the state of the environment.
Reward: The positive or negative measurement or evaluation of the action.
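
The four terms above can be put together in a very small sketch. Everything in it is an assumption made for illustration only: the corridor world, the class names CorridorEnvironment and RandomAgent, and the reward values (+1 at the goal, -0.1 otherwise) are hypothetical and do not come from any library. The agent here picks actions at random, so it shows the interaction loop and not learning yet.

```python
import random

# Hypothetical 1-D corridor: positions 0..4, the goal is position 4.
ACTIONS = {"left": -1, "right": +1}
GOAL = 4


class CorridorEnvironment:
    """The environment: holds the state and reacts to actions."""

    def __init__(self):
        self.position = 0

    def step(self, action):
        # An action changes the state of the environment.
        move = ACTIONS[action]
        self.position = max(0, min(GOAL, self.position + move))
        done = self.position == GOAL
        # Reward: +1 for reaching the goal, a small penalty otherwise.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


class RandomAgent:
    """The agent: here it only picks actions at random (no learning yet)."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(list(ACTIONS))


def run_episode(env, agent, max_steps=100):
    """Let the agent act in the environment and add up the rewards."""
    state, total_reward = env.position, 0.0
    for _ in range(max_steps):
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return state, total_reward


final_state, total = run_episode(CorridorEnvironment(), RandomAgent())
print(final_state, round(total, 2))
```

A learning agent would replace RandomAgent.act with a rule that uses the rewards it has seen so far.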

1.1 Important Distinctions between Reinforcement Learning and other Machine Learning subfields

It does not learn from a prepared dataset as supervised and unsupervised learning do. It learns by cause and effect, as explained in the definition, so its only reference for learning is the reward it gets.
It could learn better strategies than humans, since there is no training data that it is trying to imitate. Therefore, it could beat humans in games, for instance, by balancing exploitation and exploration. Exploitation means using the reward sources it has already found, and exploration means searching the environment for new ways of getting rewards.
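
One common way to balance exploration and exploitation is the epsilon-greedy rule: with a small probability epsilon the agent tries a random action, otherwise it takes the action that has paid best so far. The following is a minimal sketch on a multi-armed bandit. The function name epsilon_greedy_bandit, the arm means, and the noise level are assumptions chosen for illustration and are not part of the original post.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms   # running average reward per arm
    counts = [0] * n_arms        # how often each arm was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random arm to discover new reward sources.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: use the arm that has paid best so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Noisy reward drawn around the arm's hidden true mean.
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the running average for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.2, 0.5, 1.5])
print(counts)
```

With epsilon set to 0 the agent would only exploit and could stay stuck on the first arm that happened to pay; with epsilon set to 1 it would only explore and never use what it has learned.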

1.2 Approaches for Implementation

Value-Based
Policy-Based
Model-Based
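
To give a feel for the value-based approach, here is a minimal sketch of tabular Q-learning, a well-known value-based algorithm chosen here purely as an illustration (it is not named in this post). The corridor world, the step function, and all hyperparameter values are hypothetical. The agent learns a table of values and its behavior is then read off that table. A policy-based method would instead adjust the action-choosing rule directly, and a model-based method would first learn or use a model of how the environment responds.

```python
import random

N_STATES = 5          # states 0..4; state 4 is terminal (hypothetical corridor)
ACTIONS = [-1, +1]    # move left or move right


def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + action))
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    # Value-based: learn a table q[state][action_index] of expected returns.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)                       # explore or break ties
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q = q_learning()
# The policy is derived from the values: pick the action with the higher value.
greedy_policy = ["left" if left > right else "right" for left, right in q[:-1]]
print(greedy_policy)
```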

2. Topics you Expect to Learn (Foundations of Deep Reinforcement Learning Table of Contents)

Policy-Based and Value-Based Algorithms
1. REINFORCE
2. SARSA
3. Deep Q-Networks (DQN)
4. Improving DQN
Combined Methods
1. Advantage Actor-Critic (A2C)
2. Proximal Policy Optimization (PPO)
3. Parallelization Methods
Practical Details
1. Getting Deep RL to Work
2. SLM Lab
3. Network Architectures
4. Hardware
Environment Design
1. States
2. Actions
3. Rewards
4. Transition Function

3. When the Reinforcement Learning “RL” Term was Coined

The earliest computational investigations of trial-and-error learning were perhaps those by Minsky and by Farley and Clark, both in 1954. The terms “Reinforcement” and “Reinforcement Learning” were first used in the 1960s (e.g., Waltz and Fu, 1965; Mendel, 1966; Fu, 1970; Mendel and McClaren, 1970).

4. Foundations or Origins of Reinforcement Learning

Animal Psychology
Control Theory
RL has a very close relationship with psychology, biology, and neuroscience. (Note that this is not necessarily a foundation or origin, but maybe just an intuition.)

5. Applications or Examples of Problems solved by Reinforcement Learning

“If one of the goals that we work for here is AI then it is at the core of that. Reinforcement Learning is a very general framework for learning sequential decision making tasks. And Deep Learning, on the other hand, is of course the best set of algorithms we have to learn representations. And combinations of these two different models is the best answer so far we have in terms of learning very good state representations of very challenging tasks that are not just for solving toy domains but actually to solve challenging real world problems.”
Koray Kavukcuoglu, the director of research at DeepMind

Self-driving Cars
Industry Automation
NLP “Natural Language Processing”: Text Summarization, Question Answering, and Machine Translation
Healthcare: patients could receive treatment based on policies learned by RL systems
Resource Management in Computer Clusters
Traffic Light Control
Robotics: robotic manipulation, and mapping raw video images to robot actions
Web System Configuration
Personalized Recommendations
Games (where RL agents sometimes surpass human performance)

When to apply Reinforcement Learning

Basically, your problem should have the following characteristics (these are only some of the characteristics, not all of them):

It can be solved by trial and error, using feedback from the environment.
It allows a delayed reward, that is, a reward that arrives only after a sequence of actions.
It can be modeled as a Markov Decision Process “MDP”, a discrete-time stochastic control process that provides a mathematical framework for modeling decision-making in situations where outcomes are partly controlled by a decision-maker and partly random.
It is a control problem.
It can be run in a simulated environment, which prevents the dangerous outcomes that could happen while the agent is still learning in the real environment.
RL could also be used when it is difficult to define the task explicitly for a machine, such as how to walk.
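
The MDP and delayed-reward characteristics above can be sketched in a few lines. The two-state MDP below, its probabilities and rewards, and the helper names sample_step and discounted_return are all hypothetical and chosen only for illustration. The agent controls which action is taken, chance decides which outcome follows, and a discount factor gamma (an assumed value of 0.9 here) is one common way to weigh rewards that arrive later.

```python
import random

# Hypothetical two-state MDP.
# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "start": {
        "safe":  [(1.0, "start", 0.0)],
        "risky": [(0.7, "goal", 1.0), (0.3, "start", -0.5)],
    },
    "goal": {},  # terminal state: no actions available
}


def sample_step(state, action, rng):
    """The agent controls `action`; chance decides which outcome occurs."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def discounted_return(rewards, gamma=0.9):
    """Sum of rewards, where a reward k steps ahead is scaled by gamma**k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


rng = random.Random(0)
state, rewards = "start", []
while state != "goal" and len(rewards) < 50:
    state, reward = sample_step(state, "risky", rng)
    rewards.append(reward)
print(rewards, discounted_return(rewards))
```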

6. Reinforcement Learning Engineer Skills

After searching, I did not find job titles for a Reinforcement Learning Engineer, or specific skills needed to become one, maybe because Reinforcement Learning is already a branch of machine learning. Therefore, I think most of the machine learning skills I have mentioned in this post also apply to Reinforcement Learning, except that Reinforcement Learning itself becomes a must, and not just preferable, if you want to specialize in it. If you really want to know whether there are any missing or especially important skills for Reinforcement Learning, it is better to ask someone already working in that field, especially an expert. You could also check this post: 3 skills to master before reinforcement learning (RL).

Finally

Thank you. I hope this post has been beneficial to you. I would appreciate any comments from anyone who needs more clarification or who has seen something wrong in what I have written, so that I can correct it, and I would also appreciate any possible enhancements or suggestions. We are humans, and mistakes are expected from us, but we can minimize those mistakes by learning from them and by seeking to improve what we do and how we do it.

Allah bless our master Muhammad and his family.

References

https://medium.com/ai³-theory-practice-business/reinforcement-learning-part-1-a-brief-introduction-a53a849771cf

https://www.datacamp.com/community/tutorials/introduction-reinforcement-learning

http://incompleteideas.net/book/first/ebook/node12.html

https://www.youtube.com/watch?v=e3Jy2vShroE

https://towardsdatascience.com/applications-of-reinforcement-learning-in-real-world-1a94955bcd12

https://en.wikipedia.org/wiki/Markov_decision_process

https://neptune.ai/blog/reinforcement-learning-applications

https://www.youtube.com/watch?v=nIgIv4IfJ6s&t=541s

https://www.amazon.com/Deep-Reinforcement-Learning-Python-Hands/dp/0135172381

