Show simple item record

dc.contributor.author
Li, Yueshan
dc.contributor.supervisor
Tsiamis, Anastasios
dc.contributor.supervisor
Karapetyan, Aren
dc.contributor.supervisor
Balta, Efe C.
dc.contributor.supervisor
Lygeros, John
dc.date.accessioned
2023-10-20T07:30:26Z
dc.date.available
2023-10-18T05:56:11Z
dc.date.available
2023-10-18T09:11:23Z
dc.date.available
2023-10-20T07:30:26Z
dc.date.issued
2023-08
dc.identifier.uri
http://hdl.handle.net/20.500.11850/637248
dc.identifier.doi
10.3929/ethz-b-000637248
dc.description.abstract
In the optimal control of unknown systems, offline approaches such as reinforcement learning or system identification can be helpful in a number of scenarios and have proven themselves over decades. These, however, hinge on several crucial assumptions that often fail to hold in practice, most importantly the availability of a reliable simulator and the possibility of offline learning/identification. When these do not hold, or hold only partially, the need for online, 'on-the-go' algorithms becomes apparent; these control the system while aiming to stay as close as possible to a performance objective that is revealed only sequentially. The tracking of an unknown reference signal is an example of such a problem. It is a challenging task, yet it appears frequently in practice, for example when tracking a flock of wild animals or pursuing malicious agents. To achieve close tracking of an unknown/adversarial target, it is critical for the controller to learn online from the data collected during operation and to adapt quickly to changes. We consider the linear quadratic tracking case and propose an online algorithm, RLS-MPC, that uses recursive least squares to learn the time-varying dynamic model of the target and solves for the optimal policy in a receding horizon control framework. We show that its dynamic regret scales with the rate of change of the target dynamics, as opposed to the rate of change of the target states as in previous works. We prove that for slowly changing target dynamics, such as periodic targets, the dynamic regret of RLS-MPC is bounded by O(log T). For general targets, the algorithm achieves a bound of O(1 + √(VT)), where V is the path length of the target dynamics. We implement the proposed controller on a quadrotor model and validate it both in simulation and on a real mini-quadrotor.
en_US
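The recursive least squares step at the core of RLS-MPC can be illustrated with a minimal sketch. The thesis itself provides no code; the function name `rls_update`, the forgetting factor `lam`, and the prior scale on `P` below are illustrative assumptions, showing only a standard multi-output RLS update with exponential forgetting that could track a slowly varying target model y_t ≈ A_t x_t.

```python
import numpy as np

def rls_update(theta, P, x, y, lam=0.98):
    """One recursive least-squares step with forgetting factor lam.

    theta : (n, n) current estimate of the target dynamics matrix A_t
    P     : (n, n) regressor covariance (shared across output rows)
    x     : (n,)   regressor (current target state)
    y     : (n,)   observation (next target state)
    """
    Px = P @ x
    k = Px / (lam + x @ Px)          # RLS gain vector
    e = y - theta @ x                # prediction error for each output row
    theta = theta + np.outer(e, k)   # rank-one correction of the estimate
    P = (P - np.outer(k, Px)) / lam  # discount old data via 1/lam
    return theta, P

# Usage sketch: recover a fixed A from noiseless state pairs.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
theta, P = np.zeros((2, 2)), 1e3 * np.eye(2)
for _ in range(200):
    x = rng.standard_normal(2)
    theta, P = rls_update(theta, P, x, A @ x)
```

With lam < 1 older samples are exponentially discounted, which is what lets the estimate follow dynamics that drift over time rather than averaging over the whole history.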
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
ETH Zurich
en_US
dc.rights.uri
http://rightsstatements.org/page/InC-NC/1.0/
dc.subject
Online Control
en_US
dc.title
Online Learning and Control for Tracking Unknown Targets
en_US
dc.type
Master Thesis
dc.rights.license
In Copyright - Non-Commercial Use Permitted
ethz.size
49 p.
en_US
ethz.publication.place
Zurich
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02140 - Dep. Inf.technologie und Elektrotechnik / Dep. of Inform.Technol. Electrical Eng.::02650 - Institut für Automatik / Automatic Control Laboratory::03751 - Lygeros, John / Lygeros, John
en_US
ethz.date.deposited
2023-10-18T05:56:11Z
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2023-10-20T07:30:27Z
ethz.rosetta.lastUpdated
2023-10-20T07:30:27Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Online%20Learning%20and%20Control%20for%20Tracking%20Unknown%20Targets&rft.date=2023-08&rft.au=Li,%20Yueshan&rft.genre=unknown&rft.btitle=Online%20Learning%20and%20Control%20for%20Tracking%20Unknown%20Targets