
Posts

Showing posts from August 2024

[AI] Reinforcement learning: Find Optimal Policy

Two methods for finding the optimal policy: 👉 Model-based (uses the model dynamics) 👉 Model-free

[AI] Fundamental concepts of Reinforcement Learning

Agent: The agent is the software program that learns to make intelligent decisions, for example a program that plays chess.
Environment: The environment is the world the agent acts in. In the chess example, the chessboard is the environment.
State: A state is a position or a moment in the environment that the agent can be in. For example, every position on the chessboard is a state.
Action: The agent interacts with the environment by performing an action, which moves it from one state to another. For example, the moves made by the chess pieces are actions.
Reward: A reward is a numerical value that the agent receives based on its action. Think of a reward as a point: for instance, the agent receives +1 for a good action and -1 for a bad action.
Action space: The set of all possible actions in the environment is called the action space. The action space is called a discrete action space when our action
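The concepts above can be sketched as a small interaction loop. This is only an illustrative sketch, not code from the post: the `CorridorEnv` environment (positions 0 to 4 with the goal at 4), the `run_episode` helper, and both policies are hypothetical names invented for the example. The reward follows the +1 / -1 convention mentioned above.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: the agent walks along positions 0..4."""

    # Discrete action space: a finite set of possible actions.
    action_space = ("left", "right")
    goal = 4

    def __init__(self):
        self.state = 0

    def reset(self):
        """Put the agent back at the start and return the initial state."""
        self.state = 0
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        move = 1 if action == "right" else -1
        next_state = min(self.goal, max(0, self.state + move))
        # Assumed reward rule: +1 for getting closer to the goal, -1 otherwise.
        reward = 1 if next_state > self.state else -1
        self.state = next_state
        return next_state, reward, next_state == self.goal


def run_episode(env, policy, max_steps=100):
    """Let the agent act until the goal is reached or max_steps is hit."""
    state = env.reset()
    total_reward = 0
    for _ in range(max_steps):
        action = policy(state)                   # agent chooses an action
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    """A fixed policy that ignores the state and always moves right."""
    return "right"


def random_policy(state):
    """A policy that picks uniformly from the action space."""
    return random.choice(CorridorEnv.action_space)
```

For example, `run_episode(CorridorEnv(), always_right)` collects one +1 reward per step on the way to the goal, while `random_policy` will usually collect less because some of its moves lead away from it.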

[AI] Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs):
- A type of deep neural network architecture that uses unsupervised machine learning.
- Made up of a generator network and a discriminator network. The two networks train each other while simultaneously trying to outwit each other.

Generator network
- Generates new data from a randomly generated vector of numbers (a latent vector, sampled from the latent space).

Discriminator network
- Tries to differentiate between the real data and the generated data.
- Can perform either multi-class classification or binary classification.

Important concepts related to GANs
- Divergences (KL divergence, JS divergence...) are important measures of model quality.
- Nash equilibrium: the state we try to reach during training.
- Objective functions: measure the similarity between the generated data and the real data.
- Scoring algorithms: used to evaluate the quality of a GAN. Some scoring algorithms: Mode Score...

Problems with training GANs:
- Mode collapse: gen
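The KL and JS divergences mentioned above can be illustrated for discrete distributions. The following is a minimal sketch using only the standard library and natural logarithms. The function names `kl_divergence` and `js_divergence` are chosen for this example and are not taken from the post. It assumes both inputs are probability lists of equal length, and that `q` is non-zero wherever `p` is non-zero when computing KL.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing. Assumes q_i > 0 wherever p_i > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: a symmetric, bounded variant of KL."""
    # m is the pointwise average of the two distributions.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


real = [0.5, 0.3, 0.2]       # made-up "real data" distribution
generated = [0.4, 0.4, 0.2]  # made-up "generated data" distribution

print(kl_divergence(real, generated))
print(kl_divergence(generated, real))  # KL is not symmetric
print(js_divergence(real, generated))  # JS is symmetric
```

KL divergence is zero when the two distributions are identical and is not symmetric in its arguments. JS divergence is symmetric and, with natural logarithms, is bounded above by ln 2, which it reaches when the two distributions do not overlap at all.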