Theoretical Analysis of Online Learning with Limited Feedback

Abstract Details

2024, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
Online learning encompasses a variety of practical settings in which data arrive and learning proceeds sequentially, such as online optimization, bandit learning, and reinforcement learning. The core challenge in these problems is the efficient use of the accumulated data. In this thesis, we study several online learning problems with different limitations on the feedback, e.g., the number of oracle queries per iteration is limited, the feedback is perturbed by an adversary, the reward signal is unavailable, or on-policy feedback is unavailable. For each problem, we design efficient algorithms and theoretically characterize their performance. In particular, this thesis makes the following contributions.

First, we explore online nonconvex optimization under different oracle feedback models. We begin with settings where only a limited number of oracle calls are available. We establish lower bounds on the window-smoothed local regret for the class of linear-span algorithms and present algorithms that achieve (near-)optimal regret. We then study a more challenging setting in which only a single oracle call is allowed per time step, and take the local regret of the original (i.e., unsmoothed) objective functions as the performance metric. Specifically, we derive lower bounds on this local regret for linear-span algorithms when a single gradient oracle is available and present (near-)optimal algorithms.

Second, we investigate bandit learning under unbounded probabilistic adversarial attacks and propose two robust algorithms, median-based $\epsilon$-greedy (med-$\epsilon$-greedy) and median-based exploration-aided UCB (med-E-UCB). We prove that both achieve logarithmic pseudo-regret, matching the regret lower bound for the attack-free setting. Our theoretical findings are further validated through computational simulations and experimental results.
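The median-based ε-greedy idea above can be sketched as follows. This is a minimal illustration, not the thesis' exact algorithm: the function name, the fixed ε, and the bandit interface are assumptions made for the sketch. The key point it shows is replacing the per-arm sample mean with the sample median, which stays stable when reward feedback is corrupted by unbounded perturbations.

```python
import random
import statistics

def med_eps_greedy(arms, T, eps=0.1):
    """Sketch of a median-based epsilon-greedy bandit algorithm.
    `arms` is a list of zero-argument reward samplers; rewards may be
    corrupted by heavy-tailed / unbounded noise, so each arm is scored
    by the MEDIAN of its observed rewards rather than the mean."""
    rewards = [[] for _ in arms]   # observed rewards per arm
    total = 0.0
    for t in range(T):
        if random.random() < eps or any(len(r) == 0 for r in rewards):
            a = random.randrange(len(arms))   # explore (or fill empty arms)
        else:
            # exploit: pick the arm with the largest median reward
            a = max(range(len(arms)),
                    key=lambda i: statistics.median(rewards[i]))
        r = arms[a]()
        rewards[a].append(r)
        total += r
    return total
```

With clean, deterministic arms the algorithm quickly locks onto the best arm; under outlier corruption, the median score is what keeps the exploit step from being dragged toward a bad arm.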
Third, we study imitation learning, where the reward signal is absent and algorithms learn mainly from expert demonstrations. In particular, our focus is the generative adversarial imitation learning (GAIL) algorithm under general MDPs with nonlinear reward function classes. We characterize global convergence at a sublinear rate for a broad range of commonly used policy gradient algorithms, including projected policy gradient (PPG)-GAIL, Frank-Wolfe policy gradient (FWPG)-GAIL, trust region policy optimization (TRPO)-GAIL, and natural policy gradient (NPG)-GAIL. This provides the first systematic analysis of GAIL under general MDPs.

Last, we focus on the function estimation problem in reinforcement learning, where only off-policy samples are available. Without on-policy data, we introduce PER-ETD (PEriodically Restarted Emphatic Temporal-Difference), which restarts the follow-on trace and updates it only over a finite period in each iteration of the evaluation parameter. Further, PER-ETD increases the restart period logarithmically with the number of iterations, which guarantees the best trade-off between variance and bias and keeps both vanishing sublinearly. We show that PER-ETD improves over Emphatic Temporal-Difference (ETD) both theoretically and experimentally.
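The periodic-restart mechanism of PER-ETD can be sketched as below. This is an illustrative toy (scalar linear function approximation, assumed names and a simplified schedule), not the thesis' exact notation or analysis: per parameter iteration, the follow-on trace is restarted, rolled forward only for a logarithmically growing number of steps, and then one emphasis-weighted TD update is applied. Truncating the trace bounds its variance, while the growing period lets the truncation bias vanish.

```python
import math

def per_etd(env_step, phi, theta, gamma, alpha, num_iters, rho):
    """Sketch of PER-ETD(0) with scalar linear value approximation.
    env_step(s) -> (s_next, reward); phi(s) -> scalar feature;
    rho(s) -> importance-sampling ratio (target policy / behavior policy)."""
    s = 0  # initial state index
    for k in range(1, num_iters + 1):
        b = max(1, int(math.log(k + 1)))   # logarithmically growing period
        F = 0.0                            # restart the follow-on trace
        for _ in range(b):                 # accumulate trace for b steps only
            s_prev = s
            s, r = env_step(s_prev)
            F = 1.0 + gamma * rho(s_prev) * F
        # one emphasis-weighted TD(0) update per iteration
        delta = r + gamma * theta * phi(s) - theta * phi(s_prev)  # TD error
        theta += alpha * rho(s_prev) * F * delta * phi(s_prev)
    return theta
```

On a one-state self-loop with reward 1 and discount 0.9 (true value 10), the sketch converges to the fixed point, illustrating that the truncated trace still drives the estimate to the right answer.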
Yingbin Liang (Advisor)
Atilla Eryilmaz (Committee Member)
Kiryung Lee (Committee Member)
Zhihui Zhu (Committee Member)
274 p.

Recommended Citations


  • Guan, Z. (2024). Theoretical Analysis of Online Learning with Limited Feedback [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419

    APA Style (7th edition)

  • Guan, Ziwei. Theoretical Analysis of Online Learning with Limited Feedback. 2024. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419.

    MLA Style (8th edition)

  • Guan, Ziwei. "Theoretical Analysis of Online Learning with Limited Feedback." Doctoral dissertation, Ohio State University, 2024. http://rave.ohiolink.edu/etdc/view?acc_num=osu1701995430583419

    Chicago Manual of Style (17th edition)