Abstract:
The primary goal of Demand Response (DR) is to lower the system’s maximum demand. The introduction of the smart grid and bidirectional communication makes its implementation easier. A common approach to cost minimization is shifting loads from peak hours to off-peak hours. Reinforcement Learning (RL) is used for solving various optimization problems. Since the power system is stochastic in nature, RL techniques are well suited to implementing DR. Here, a scenario of scheduling residential loads with flexible devices is considered, with the aim of minimizing energy consumption while causing minimal discomfort to the consumers. Q-learning, a variant of RL, is used to implement the scheduling.
The main concern is to find a balance between exploration and exploitation. One of the traditional RL methods used for balancing exploration and exploitation is the ϵ-greedy algorithm. The main challenge in implementing the ϵ-greedy algorithm is obtaining the cooling schedule that balances exploration and exploitation. In this project, we propose an efficient algorithm for action selection, namely the pursuit algorithm. Here, the performance of ϵ-greedy is analyzed for various cooling schedules. The performance of the RL algorithm using ϵ-greedy and the pursuit algorithm is compared. The only parameter on which the performance of the pursuit algorithm depends is the convergence rate β. As the pursuit algorithm depends less on hyperparameters and requires no predefined number of episodes, its convergence is faster than that of ϵ-greedy. The performance of the algorithm is also analyzed under various tariff structures.
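For illustration, the sketch below contrasts the two action-selection rules for a single Q-learning state, assuming an exponential cooling schedule for ϵ and a fixed convergence rate β; the function names and parameter values are illustrative assumptions, not the project's actual code.

import numpy as np

rng = np.random.default_rng(0)

def epsilon_greedy_action(q_values, epsilon):
    """Pick a random action with probability epsilon, else the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def pursuit_update(probs, q_values, beta):
    """Pursuit method: shrink every selection probability by (1 - beta),
    then push the greedy action's probability toward 1 at rate beta."""
    greedy = int(np.argmax(q_values))
    probs = (1.0 - beta) * probs
    probs[greedy] += beta
    return probs

# Example: 4 candidate schedules (actions) for one decision stage.
q_values = np.array([0.1, 0.4, 0.2, 0.3])

# epsilon-greedy with an assumed exponential cooling schedule.
epsilon, decay = 1.0, 0.99
for episode in range(5):
    a = epsilon_greedy_action(q_values, epsilon)
    epsilon *= decay  # the cooling schedule itself must be tuned by hand

# Pursuit: only beta has to be chosen; the selection probabilities
# converge toward the greedy action as learning proceeds.
probs = np.full(len(q_values), 1.0 / len(q_values))
beta = 0.05
for episode in range(5):
    a = rng.choice(len(q_values), p=probs)
    probs = pursuit_update(probs, q_values, beta)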