Abstract:
Traffic accidents are the eighth most common cause of death overall and the leading cause
of death for people aged 5-29. Automated vehicles and driver-assistance systems have emerged
as a promising way to reduce the number of traffic fatalities and to provide
a safer and more effective transportation system. Accurate driver maneuver prediction in a
dynamic traffic scene remains a difficult problem because of the complexity of such scenes,
despite considerable attention from researchers and industry. Advanced driver-assistance
systems (ADAS) use driver maneuver prediction to give the driver early warnings and
assistance. For instance, the system can warn a driver who is about to change lanes
without signaling. This project deals with accurately predicting the maneuver of
the driver with the help of a deep learning model that fuses features from several data
sources. The dataset is generated with the Car Learning to Act (CARLA) simulator, from which
driver face video, road video, and data from 12 vehicle sensors are logged at 20 Hz. The
model incorporates a U-shaped encoder-decoder network (U-Net) trained on the
DR(eye)VE dataset to estimate the driver's points of attention, which are then used to
extract the points of interest from the road video. The model also incorporates a face
landmark model to retrieve essential features from the driver face video. The three feature
sets (road, face, and sensor) are fused and fed into a long short-term memory (LSTM) model
for contextual learning,
and finally the maneuver at a given time ahead is classified. Experimental results
show that the feature fusion model obtained an accuracy of 80.19%, an overall precision of
87.93%, an overall recall of 87.95%, and an F1 score of 87.80%.
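
The fusion and look-ahead labelling described above can be illustrated with a minimal sketch. It is not the project's implementation: the abstract does not state how the features are fused, how long the input window is, or how far ahead the maneuver is predicted, so per-frame concatenation, the function names `fuse_frame` and `make_windows`, and the `window_s` and `horizon_s` parameters are all assumptions made for illustration. Only the 20 Hz logging rate comes from the text.

```python
from typing import List, Sequence, Tuple

SAMPLE_RATE_HZ = 20  # logging rate stated in the abstract


def fuse_frame(
    gaze: Sequence[float], face: Sequence[float], sensors: Sequence[float]
) -> List[float]:
    """Fuse one time step by concatenating the three per-frame feature vectors.

    Concatenation is an assumed fusion strategy, chosen here for simplicity.
    """
    return list(gaze) + list(face) + list(sensors)


def make_windows(
    fused: Sequence[Sequence[float]],
    labels: Sequence[int],
    window_s: float,
    horizon_s: float,
) -> List[Tuple[List[List[float]], int]]:
    """Pair each window of fused frames with the maneuver label horizon_s later.

    window_s and horizon_s are hypothetical parameters, given in seconds.
    """
    if len(fused) != len(labels):
        raise ValueError("fused frames and labels must have the same length")
    window = round(window_s * SAMPLE_RATE_HZ)    # frames per input window
    horizon = round(horizon_s * SAMPLE_RATE_HZ)  # frames of look-ahead
    samples = []
    # A window covers frames [start, end); its target is the label that lies
    # `horizon` frames after the last frame of the window.
    for start in range(len(fused) - window - horizon + 1):
        end = start + window
        target = labels[end - 1 + horizon]
        samples.append(([list(frame) for frame in fused[start:end]], target))
    return samples
```

Each `(window, label)` pair produced this way is the kind of input a sequence classifier such as an LSTM would consume: a short history of fused features, paired with the maneuver that occurs a fixed time later.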