Files
ZiweiGuan_Dissertation.pdf (3.71 MB)
Theoretical Analysis of Online Learning with Limited Feedback
Author Info
Guan, Ziwei
ORCID® Identifier
http://orcid.org/0000-0003-1967-7663
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419
Year and Degree
2024, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
Abstract
Online learning encompasses a range of practical applications in which data arrive and learning proceeds sequentially, such as online optimization, bandit learning, and reinforcement learning. The core challenge in these problems is the efficient use of the accumulated data. In this thesis, we study several online learning problems with different limitations on the feedback: only a limited number of oracle queries are available per iteration, the feedback is perturbed by an adversary, the reward signal is unavailable, or on-policy feedback is unavailable. For each problem, we design efficient algorithms and theoretically characterize their performance. In particular, this thesis makes the following contributions.

First, we explore online nonconvex optimization under different forms of oracle feedback. We begin with settings in which only a limited number of oracle queries are available per time step. We establish lower bounds on the window-smoothed local regret for the class of linear-span algorithms and present algorithms that achieve (near-)optimal regret. We then study a more challenging setting in which only a single oracle query is allowed per time step, and we adopt the local regret of the original (i.e., unsmoothed) objective functions as the performance metric. For this setting, we derive lower bounds on the local regret for linear-span algorithms with a single gradient oracle and present (near-)optimal algorithms.

Second, we investigate bandit learning under unbounded probabilistic adversarial attacks. We propose two robust algorithms, median-based $\epsilon$-greedy (med-$\epsilon$-greedy) and median-based exploration-aided UCB (med-E-UCB), and prove that both achieve logarithmic pseudo-regret, matching the regret lower bound for the attack-free setting. Our theoretical findings are further validated through simulations and experiments.

Third, we study imitation learning, in which the reward signal is absent and the learner relies on expert demonstrations. In particular, we focus on the generative adversarial imitation learning (GAIL) algorithm for general MDPs with nonlinear reward function classes. We establish global convergence at a sublinear rate for a broad range of commonly used policy gradient algorithms, including projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL, trust region policy optimization (TRPO)-GAIL, and natural policy gradient (NPG)-GAIL. This provides the first systematic analysis of GAIL for general MDPs.

Finally, we consider value function estimation in reinforcement learning when only off-policy samples are available. Without on-policy data, we introduce PER-ETD (PEriodically Restarted Emphatic Temporal-Difference), which restarts the follow-on trace and updates it only over a finite period in each iteration of the evaluation parameter. Moreover, PER-ETD increases the restart period logarithmically with the number of iterations, which attains the best trade-off between variance and bias and keeps both vanishing sublinearly. We show that PER-ETD improves upon emphatic temporal-difference (ETD) learning both theoretically and experimentally.
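For context, the window-smoothed local regret referenced in the first contribution is commonly defined in the online nonconvex optimization literature as follows (the thesis's exact definition may differ in details):

$$F_{t,w}(x) \;\triangleq\; \frac{1}{w}\sum_{i=0}^{w-1} f_{t-i}(x), \qquad \mathfrak{R}_w(T) \;\triangleq\; \sum_{t=1}^{T} \big\|\nabla F_{t,w}(x_t)\big\|^2,$$

where $f_t$ is the loss revealed at time $t$ (with $f_t \equiv 0$ for $t \le 0$), $w$ is the window size, and $x_t$ is the learner's iterate; the unsmoothed local regret of the single-oracle setting corresponds to $w = 1$.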
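To illustrate the idea behind med-$\epsilon$-greedy, the following minimal Python sketch replaces the empirical mean used by standard $\epsilon$-greedy with the empirical median, which resists unbounded reward corruptions. The exploration schedule, attack model, and all constants below are illustrative assumptions, not the thesis's exact specification.

    import numpy as np

    rng = np.random.default_rng(0)

    def med_eps_greedy(true_means, T, attack_prob=0.2, attack_scale=50.0, c=5.0):
        # Pick the arm with the highest empirical *median* reward; the median
        # is insensitive to the unbounded outliers injected by the attacker.
        K = len(true_means)
        observed = [[] for _ in range(K)]
        chosen_mean_sum = 0.0
        for t in range(1, T + 1):
            eps = min(1.0, c * K / t)             # assumed decaying schedule
            if rng.random() < eps or any(len(o) == 0 for o in observed):
                arm = int(rng.integers(K))        # explore
            else:
                arm = int(np.argmax([np.median(o) for o in observed]))
            reward = rng.normal(true_means[arm], 1.0)
            if rng.random() < attack_prob:        # probabilistic unbounded attack
                reward -= attack_scale * abs(rng.normal())
            observed[arm].append(reward)
            chosen_mean_sum += true_means[arm]
        return T * max(true_means) - chosen_mean_sum   # pseudo-regret

    print("pseudo-regret:", round(med_eps_greedy([0.2, 0.5, 0.8], T=5000), 1))

Because the median of each arm's observations is unaffected by a minority of arbitrarily large outliers, the corrupted samples do not prevent the best arm from being identified.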
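The GAIL result concerns alternating reward and policy updates. The sketch below runs an NPG-GAIL-style alternation on a hypothetical two-state MDP with a tabular reward class and exact occupancy and Q-value computations; the thesis analyzes general MDPs with nonlinear reward classes and sample-based updates, so this is only a schematic of the alternating structure.

    import numpy as np

    # Hypothetical 2-state, 2-action MDP; dynamics and constants are
    # illustrative assumptions, not taken from the thesis.
    S, A, gamma = 2, 2, 0.9
    P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                  [[0.7, 0.3], [0.1, 0.9]]])      # P[s, a, s']
    mu0 = np.array([0.5, 0.5])                    # initial state distribution

    def occupancy(pi):
        # Discounted state-action occupancy d(s, a) of policy pi[s, a].
        P_pi = np.einsum('sap,sa->sp', P, pi)
        d_s = np.linalg.solve(np.eye(S) - gamma * P_pi.T, (1 - gamma) * mu0)
        return d_s[:, None] * pi

    def q_values(pi, r):
        # Exact Q^pi for reward table r(s, a) via the Bellman equation.
        P_pi = np.einsum('sap,sa->sp', P, pi)
        v = np.linalg.solve(np.eye(S) - gamma * P_pi, (pi * r).sum(axis=1))
        return r + gamma * P @ v

    expert_pi = np.array([[0.9, 0.1], [0.1, 0.9]])  # assumed expert policy
    d_expert = occupancy(expert_pi)

    w = np.zeros((S, A))                          # tabular reward parameters
    pi = np.full((S, A), 1.0 / A)                 # uniform initial policy
    eta_w, eta_pi, lam = 0.1, 0.1, 0.1

    for _ in range(500):
        d_pi = occupancy(pi)
        # Reward ascent on E_expert[r_w] - E_pi[r_w] - (lam/2) * ||w||^2.
        w += eta_w * (d_expert - d_pi - lam * w)
        # NPG step for softmax policies: multiplicative update with exp(eta * Q).
        q = q_values(pi, w)
        pi = pi * np.exp(eta_pi * (q - q.max(axis=1, keepdims=True)))
        pi /= pi.sum(axis=1, keepdims=True)

    print("learned policy:\n", np.round(pi, 3))
    print("expert policy:\n", expert_pi)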
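The following sketch illustrates the mechanism behind PER-ETD on a hypothetical tabular MDP: the follow-on trace is restarted in each iteration, rolled forward for a period that grows logarithmically with the iteration count, and a single emphatic TD update is applied at the end of the period. The MDP, step sizes, and the exact placement of the update are assumptions made for illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical 3-state, 2-action MDP (for illustration only).
    S, A, gamma = 3, 2, 0.9
    P = rng.dirichlet(np.ones(S), size=(S, A))    # P[s, a, s']
    R = rng.random((S, A))                        # reward table r(s, a)
    target = rng.dirichlet(np.ones(A), size=S)    # target policy pi(a|s)
    behavior = np.full((S, A), 1.0 / A)           # uniform behavior policy

    # Exact value function of the target policy, for reference.
    P_pi = np.einsum('sap,sa->sp', P, target)
    v_true = np.linalg.solve(np.eye(S) - gamma * P_pi, (target * R).sum(axis=1))

    theta = np.zeros(S)                           # tabular value estimate
    alpha, s = 0.1, 0
    for k in range(1, 10001):
        b = max(1, int(np.ceil(2 * np.log(k + 1))))  # assumed log restart period
        F = 1.0                                   # restart the follow-on trace
        for j in range(b):
            a = rng.choice(A, p=behavior[s])
            rho = target[s, a] / behavior[s, a]   # importance-sampling ratio
            s_next = rng.choice(S, p=P[s, a])
            if j == b - 1:                        # one emphatic TD update per period
                delta = R[s, a] + gamma * theta[s_next] - theta[s]
                theta[s] += alpha * rho * F * delta
            else:
                F = gamma * rho * F + 1.0         # F_{t+1} = gamma * rho_t * F_t + 1
            s = s_next

    print("PER-ETD estimate:", np.round(theta, 3))
    print("true value:      ", np.round(v_true, 3))

Truncating the trace bounds its variance (the untruncated off-policy trace can have unbounded variance), while the logarithmically growing period drives the truncation bias to zero, which is the trade-off the abstract describes.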
Committee
Yingbin Liang (Advisor)
Atilla Eryilmaz (Committee Member)
Kiryung Lee (Committee Member)
Zhihui Zhu (Committee Member)
Pages
274 p.
Subject Headings
Computer Engineering; Electrical Engineering
Keywords
online learning, regret analysis, convergence rate analysis, limited feedback
Recommended Citations
APA Style (7th edition)
Guan, Z. (2024). Theoretical Analysis of Online Learning with Limited Feedback [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419

MLA Style (8th edition)
Guan, Ziwei. Theoretical Analysis of Online Learning with Limited Feedback. 2024. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419.

Chicago Manual of Style (17th edition)
Guan, Ziwei. "Theoretical Analysis of Online Learning with Limited Feedback." Doctoral dissertation, Ohio State University, 2024. http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419
Document number:
osu1701995430583419
Download Count:
32
Copyright Info
© 2023, all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.