Customised pearlmutter propagation: A hardware architecture for trust region policy optimisation

File         Description       Size       Format
fpl17ss.pdf  Accepted version  221.39 kB  Adobe PDF
Title: Customised pearlmutter propagation: A hardware architecture for trust region policy optimisation
Authors: Shao, S
Luk, W
Item Type: Conference Paper
Abstract: Reinforcement Learning (RL) is an area of machine learning in which an agent interacts with the environment by making sequential decisions. The agent receives reward from the environment and uses it to find an optimal policy that maximises the reward. Trust Region Policy Optimisation (TRPO) is a recent policy optimisation algorithm that achieves superior results in various RL benchmarks, but is computationally expensive. This paper proposes Customised Pearlmutter Propagation (CPP), a novel hardware architecture that accelerates TRPO on an FPGA. We use the Pearlmutter Algorithm to address the key computational bottleneck of TRPO in a hardware-efficient manner, avoiding symbolic differentiation with change of variables. Experimental evaluation using robotic locomotion benchmarks demonstrates that the proposed CPP architecture implemented on a Stratix-V FPGA can achieve up to 20 times speed-up against a 6-threaded Keras deep learning library with a Theano backend running on a Core i7-5930K CPU.
Issue Date: 5-Oct-2017
Date of Acceptance: 4-Sep-2017
URI: http://hdl.handle.net/10044/1/56419
DOI: https://dx.doi.org/10.23919/FPL.2017.8056789
ISBN: 9789090304281
ISSN: 1946-1488
Publisher: IEEE
Journal / Book Title: Field Programmable Logic and Applications (FPL), 2017 27th International Conference on
Copyright Statement: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor/Funder: Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (E
Commission of the European Communities
Engineering & Physical Science Research Council (E
Funder's Grant Number: EP/I012036/1
PO 1553380
516075101 (EP/N031768/1)
Conference Name: Field Programmable Logic and Applications (FPL), 2017
Publication Status: Published
Start Date: 2017-09-04
Finish Date: 2017-09-08
Conference Place: Ghent, Belgium
Appears in Collections: Faculty of Engineering