In the name of Allah, most gracious and most merciful,
- A set of actions to choose from (ex: Move up, down, right, or left)
- A virtual or real environment in which to perform this set of actions
- An agent who performs these actions
- A reward for performing any sequence of actions that leads to a desired outcome
In simple words, Reinforcement Learning is a branch of machine learning that teaches an agent how to choose a suitable sequence of actions, from a given set of actions, in a virtual or real environment so that the sequence leads to a final result. When that result is a desired one, the agent is rewarded, which encourages the actions that produced it.
By being rewarded, the agent comes to know which actions are good and which are not. The agent learns through repeated trials in which it tries to maximize its reward over time, using both exploration and exploitation.
In addition to this, Reinforcement Learning “RL” is also used in computational neuroscience as a framework for modeling the decision-making process.
Clearer definitions of the terminology:
- Agent: It is the program you train to take specific actions or to do specific jobs.
- Environment: The real or virtual environment in which the agent performs actions.
- Action: A move made by the agent that changes the state of the environment.
- Reward: The positive or negative evaluation of an action, given to the agent as feedback.
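To make these four terms more concrete, here is a minimal sketch of the interaction loop between an agent and an environment. Everything in it is an assumption made for illustration and does not come from any library: `CorridorEnv` is a hypothetical toy environment (a corridor whose last cell is the goal), `RandomAgent` is an agent that does not learn yet and only picks random actions, and `run_episode` is a helper that connects the two.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: a corridor of `length` cells, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Put the agent back at the first cell and return the starting state.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (move left) or +1 (move right); walls clamp the position.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0  # rewarded only when the goal is reached
        return self.position, reward, done


class RandomAgent:
    """Hypothetical agent that ignores the state and acts at random (no learning)."""

    def __init__(self, actions, seed=0):
        self.actions = actions
        self.rng = random.Random(seed)

    def act(self, state):
        return self.rng.choice(self.actions)


def run_episode(env, agent, max_steps=1000):
    # The basic loop: observe the state, choose an action, receive a reward.
    state = env.reset()
    total_reward = 0.0
    for step in range(1, max_steps + 1):
        action = agent.act(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            return total_reward, step
    return total_reward, max_steps


total_reward, steps = run_episode(CorridorEnv(), RandomAgent([-1, 1]))
print(f"total reward: {total_reward}, steps taken: {steps}")
```

A learning agent would keep the same loop and additionally use each reward to change how `act` chooses its next action.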
1.1 Important Distinctions between Reinforcement Learning and other Machine Learning subfields
- It does not learn from a prepared dataset the way supervised and unsupervised learning do. It learns from cause and effect, as explained in the definition: the agent acts and observes the result. Therefore, its only reference for learning is the reward it gets.
- It could learn better strategies than humans, since there is no training data it is trying to imitate. Therefore, it could beat humans in games, for instance, by balancing exploitation and exploration. Exploitation means using the reward sources it has already found, and exploration means looking for new ways of getting rewards by exploring the environment.
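One simple and common way to balance exploitation and exploration is the epsilon-greedy rule: with a small probability epsilon the agent tries a random action, and otherwise it picks the action with the best reward estimate so far. The sketch below applies it to a made-up multi-armed bandit problem. The function name `epsilon_greedy_bandit`, the reward means, and the Gaussian noise are all assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=5000, seed=0):
    """Sketch of epsilon-greedy on a bandit whose arms pay noisy rewards."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # the agent's current estimate of each arm's reward
    counts = [0] * n_arms       # how many times each arm has been pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random arm to look for new reward sources.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: use the arm that looks best so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # Assumed reward model: the arm's true mean plus unit Gaussian noise.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental average: move the estimate toward the observed reward.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward


estimates, counts, total = epsilon_greedy_bandit([0.2, 0.5, 0.9])
print("estimated means:", [round(e, 2) for e in estimates])
print("pull counts:", counts)
```

With `epsilon=0` the agent only exploits and may stay with the first arm that happened to pay well, while with `epsilon=1` it only explores and never uses what it has learned.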
1.2 Approaches for Implementation
2. Topics you Expect to Learn (Foundations of Deep Reinforcement Learning Table of Contents)
- Policy-Based and Value-Based Algorithms
- Deep Q-Networks (DQN)
- Improving DQN
- Combined Methods
- Advantage Actor-Critic (A2C)
- Proximal Policy Optimization (PPO)
- Parallelization Methods
- Practical Details
- Getting Deep RL to Work
- SLM Lab
- Network Architectures
- Environment Design
- Transition Function
3. When the Reinforcement Learning “RL” Term was Coined
The earliest computational investigations of trial-and-error learning were perhaps those by Minsky and by Farley and Clark, both in 1954. The terms “Reinforcement” and “Reinforcement Learning” were first used in the 1960s (e.g., Waltz and Fu, 1965; Mendel, 1966; Fu, 1970; Mendel and McClaren, 1970).
4. Foundations or Origins of Reinforcement Learning
- Animal Psychology
- Control Theory
- RL has a very close relationship with psychology, biology, and neuroscience. (Note that this is not necessarily a foundation or origin, but maybe just an intuition.)
5. Applications or Examples of Problems solved by Reinforcement Learning
“If one of the goals that we work for here is AI then it is at the core of that. Reinforcement Learning is a very general framework for learning sequential decision making tasks. And Deep Learning, on the other hand, is of course the best set of algorithms we have to learn representations. And combinations of these two different models is the best answer so far we have in terms of learning very good state representations of very challenging tasks that are not just for solving toy domains but actually to solve challenging real world problems.” Koray Kavukcuoglu, the director of research at DeepMind
- Self-driving Cars
- Industry Automation
- NLP “Natural Language Processing”: Text Summarization, Question Answering, and Machine Translation
- Healthcare: patients could receive treatment based on policies learned by RL systems
- Computer Cluster Resource Management
- Traffic Light Control
- Robotics: robotic manipulation, and mapping raw video images to a robot’s actions
- Web System Configuration
- Personalized Recommendations
- Games (where RL agents sometimes surpass human performance)
When to apply Reinforcement Learning
Basically, your problem should have the following characteristics (these are just some of the characteristics, not all of them):
- It could be solved by trial and error, using feedback from the environment.
- You could define a reward for it, even one that is delayed until several actions after the one that earned it.
- You could model your problem as a Markov Decision Process “MDP”, which is a discrete-time stochastic control process that provides a mathematical framework for modeling decision-making in situations where outcomes are partly controlled by a decision-maker and partly random.
- It is a control problem.
- You could have a simulated environment, to avoid the dangerous outcomes that could happen while the agent is still learning in a real environment.
- RL could also be used when it is difficult to define for a machine how a task should be performed, such as how to walk.
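To show what an MDP with a delayed reward might look like in practice, here is a small sketch. The three states, the two actions, the probabilities, and the discount factor `GAMMA` are all made up for illustration. The outcome is partly controlled (the agent chooses `wait` or `advance`) and partly random (`advance` only succeeds 80% of the time), and the only reward comes at the very end. Note that the sketch solves the MDP with value iteration, which assumes the transition probabilities are already known. An RL agent usually does not know them and has to find good actions by trial and error instead.

```python
GAMMA = 0.9  # assumed discount factor: a reward received later is worth less

# TRANSITIONS[state][action] = list of (probability, next_state, reward).
# Hypothetical three-state chain; "goal" is terminal and has no actions.
TRANSITIONS = {
    "far": {
        "wait": [(1.0, "far", 0.0)],
        "advance": [(0.8, "near", 0.0), (0.2, "far", 0.0)],
    },
    "near": {
        "wait": [(1.0, "near", 0.0)],
        "advance": [(0.8, "goal", 1.0), (0.2, "near", 0.0)],
    },
    "goal": {},
}


def action_value(state, action, values):
    # Expected immediate reward plus the discounted value of the next state.
    return sum(
        prob * (reward + GAMMA * values[next_state])
        for prob, next_state, reward in TRANSITIONS[state][action]
    )


def value_iteration(tolerance=1e-10):
    values = {state: 0.0 for state in TRANSITIONS}
    while True:
        biggest_change = 0.0
        for state, actions in TRANSITIONS.items():
            if not actions:
                continue  # terminal state keeps the value 0
            best = max(action_value(state, a, values) for a in actions)
            biggest_change = max(biggest_change, abs(best - values[state]))
            values[state] = best
        if biggest_change < tolerance:
            break
    # The greedy policy picks the action with the highest value in each state.
    policy = {
        state: max(actions, key=lambda a: action_value(state, a, values))
        for state, actions in TRANSITIONS.items()
        if actions
    }
    return values, policy


values, policy = value_iteration()
print("state values:", {s: round(v, 3) for s, v in values.items()})
print("policy:", policy)
```

The state `far` gets a positive value even though no action taken there is rewarded immediately, which is how a delayed reward spreads back to the earlier decisions that lead to it.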
6. Reinforcement Learning Engineer Skills
After searching, I didn’t find job titles for a Reinforcement Learning Engineer or a specific list of skills for becoming one, maybe because Reinforcement Learning is already a branch of Machine Learning. Therefore, I think most of the machine learning skills I have mentioned in this post are the same for Reinforcement Learning, except that Reinforcement Learning itself is now a must-have skill rather than just a preferable one if you want to specialize in it. But if you really want to know whether there are any missing or especially important skills for Reinforcement Learning, you had better ask someone already working in that field, especially an expert. You could also check this post: 3 skills to master before reinforcement learning (RL).
Thank you. I hope this post has been beneficial to you. I would appreciate any comments if anyone needs more clarification or has seen something wrong in what I have written, so that I can correct it, and I would also appreciate any possible enhancements or suggestions. We are humans, and mistakes are expected from us, but we could also minimize those mistakes by learning from them and by seeking to improve what we do and how we do it.
Allah bless our master Muhammad and his family.