# Improvement of Decision Tree Learning in Hierarchical Reinforcement Learning Using Fourier Series Function Approximation

State discretization is a common approach for solving Reinforcement Learning (RL) problems with continuous state spaces: it allows conventional tabular learning methods to be applied. Although tabular methods perform well on small problems, they scale poorly, and their learning time grows exponentially with the dimensionality of the state space. Heterogeneous discretization tries to reduce the number of states by using a coarse discretization overall and a finer discretization only where it is needed. Decision trees are a common way to implement heterogeneous discretization, but in RL they cannot easily be used because of RL's interactive nature and continually improving policy. Current methods store samples from the agent's transitions in the environment and assign a value to each sample; they then split the samples according to differences in the distributions of those values. These methods have two problems. First, learning the true value of a sample requires a lot of learning, and the estimated value also depends on the previous splits. Second, the split criteria are based on the values of the samples, so at some point there will be a mismatch between what the tree structure describes and the actual distribution of values; this causes further splits to correct the description. In other words, we get more splits, and more accuracy, in the wrong places, because the values of the samples are not yet mature.
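To make the splitting step concrete, the following is an illustrative sketch, not the thesis's exact criterion, of how stored samples can be split along a state dimension where the two sides' value distributions differ most; here the "distribution difference" is simplified to a difference of means, and the data are made up.

```python
import numpy as np

def best_split(states, values, dim):
    """Try midpoints along one dimension; return the threshold that
    maximizes the absolute difference of mean values on either side."""
    xs = np.sort(np.unique(states[:, dim]))
    best_t, best_gap = None, 0.0
    for t in (xs[:-1] + xs[1:]) / 2:  # candidate thresholds between samples
        left = values[states[:, dim] <= t]
        right = values[states[:, dim] > t]
        gap = abs(left.mean() - right.mean())
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t, best_gap

# Hypothetical samples: values jump around x = 0.5,
# so the split should land there.
states = np.array([[0.1], [0.2], [0.8], [0.9]])
values = np.array([0.0, 0.1, 1.0, 1.1])
t, gap = best_split(states, values, dim=0)  # t = 0.5, gap = 1.0
```

A real method would use a statistical test on the two value distributions rather than a raw mean gap, but the search over candidate thresholds has the same shape.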
In this thesis, we first improve the speed of learning and the amount of generalization in the learning process by employing linear function approximation with a Fourier basis. We then propose a heterogeneous discretization method based on the learned values. Conventional linear function approximation methods require predefined degrees of freedom, which determine how complex the approximated function can be. In our approach we overcome this by identifying an area of the state space with a complex value function and splitting it into two less complex sub-spaces; we then place a simple linear function approximator with few degrees of freedom in each sub-space. We repeat this process for each sub-space until the value function in the sub-space is no more complex than its corresponding simple linear function approximator.
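A minimal sketch of the Fourier basis for linear value-function approximation, in the style of Konidaris et al.: each feature is cos(pi * c . x) for an integer coefficient vector c, with the state scaled to the unit cube. The basis order and state bounds below are illustrative assumptions, not values taken from the thesis.

```python
import itertools
import numpy as np

def fourier_features(state, low, high, order):
    """Map a continuous state to cos(pi * c . x) features, with x
    scaled to [0, 1]^d and c ranging over {0, ..., order}^d."""
    x = (np.asarray(state, dtype=float) - low) / (high - low)  # normalize
    coeffs = np.array(
        list(itertools.product(range(order + 1), repeat=len(x)))
    )
    return np.cos(np.pi * coeffs @ x)

# Linear approximation: V(s) ~= w . phi(s).
# Bounds chosen to resemble mountain car's (position, velocity) ranges.
low = np.array([-1.2, -0.07])
high = np.array([0.6, 0.07])
phi = fourier_features([0.0, 0.0], low, high, order=3)  # 4^2 = 16 features
w = np.zeros_like(phi)  # weights would be learned, e.g. by Sarsa(lambda)
value = w @ phi
```

Raising `order` adds degrees of freedom to a single approximator; the thesis's approach instead keeps the order small and splits the state space where this small basis cannot fit the value function.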
Using this idea, we hierarchically split a large and complex state space into small and simple sub-spaces organized in a decision tree. The method is tested on the "mountain car" problem, where it outperforms conventional methods in terms of accumulated reward, reward per episode, and computation time.

## Author: Amin Niazi (امین نیازی), email: amin.niazi@gmail.com

## Supervisor: Masoud Asadpour (مسعود اسدپور), email: asadpour [AT] ut.ac.ir
