Search Results (1 - 15 of 15 Results)

Ewing, Gabriel. Knowledge Transfer from Expert Demonstrations in Continuous State-Action Spaces
Master of Sciences, Case Western Reserve University, 2018, EECS - Computer and Information Sciences
In this thesis, we address the task of reinforcement learning in continuous state and action spaces. Specifically, we consider multi-task reinforcement learning, where a sequence of reinforcement learning tasks has to be solved, and inverse reinforcement learning, where a reward function has to be learned from expert demonstrations. We also use trees to represent models, rewards, and value functions in our domains. First, we design an algorithm to learn from demonstration in the presence of a non-smooth reward function. Second, we design another algorithm to perform offline reinforcement learning in the same scenario. This allows us to re-use experiences to help with new tasks. Third, we introduce a method to incorporate weak knowledge about policies with online learning in policy gradient algorithms. These contributions allow us to create a pipeline that efficiently learns and transfers knowledge across a sequence of tasks. We demonstrate our approaches on the task of learning control of a prosthetic arm from expert demonstrations under various scenarios in simulation.
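As a rough illustration of the third contribution (combining weak prior knowledge about policies with online policy-gradient learning), the sketch below regularizes a REINFORCE-style update for a linear-Gaussian policy toward a hand-specified prior parameter vector. The names, the Gaussian policy form, and the regularizer are assumptions for illustration, not the thesis's algorithm.

import numpy as np

def policy_gradient_step(theta, states, actions, returns, prior_theta,
                         lr=0.01, prior_weight=0.1, sigma=0.1):
    # REINFORCE gradient for a 1-D Gaussian policy with mean theta . s,
    # plus a pull toward a weak prior parameter vector (hypothetical sketch).
    grad = np.zeros_like(theta)
    for s, a, g in zip(states, actions, returns):
        mean = theta @ s
        grad += g * ((a - mean) / sigma ** 2) * s   # grad of log N(a; mean, sigma^2) times return
    grad /= len(states)
    grad -= prior_weight * (theta - prior_theta)     # bias learning toward the prior policy
    return theta + lr * grad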

Committee:

Soumya Ray, Dr. (Advisor); Michael Fu, Dr. (Committee Member); M. Cenk Cavusoglu, Dr. (Committee Member); Michael Lewicki, Dr. (Committee Member)

Subjects:

Computer Science

Keywords:

Machine learning; reinforcement learning; continuous actions; knowledge transfer; prostheses

Fung, Sham. Stochastic Game Theory Applications for Power Management in Cognitive Networks
MDS, Kent State University, 2014, School of Digital Sciences
Power allocation is a challenging issue in wireless networks and mobile communication systems, mainly because the available spectrum is scarce while the frequency reuse factor must be maximized. Power allocation is a decision-making process in which a device's choices can severely degrade the data rates of other devices within its transmission range. Game theory is a mathematical tool that can be used to solve such multi-user decision-making problems. In this thesis, we consider a cognitive network in which primary users have a fixed data rate and can use various power levels, while secondary users can adjust their data rate and power level accordingly to maximize spectrum utilization. The contribution of this thesis is threefold. First, we propose a stochastic game-theoretic framework in which the capture behavior of secondary users is offset by choosing appropriate power levels. Second, we prove that an equilibrium state is reached as transmissions proceed. Third, a distributed power management algorithm, based on the value iteration method, is derived to solve the stochastic game and yield an optimal policy for each secondary user. Finally, we have developed a simulation model to test the algorithm. The simulation results show how total power consumption evolves in the presence of non-cooperative (misbehaving) users. The results also show how the algorithm allows users to self-adapt to changes and to converge quickly in low-mobility environments.
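For orientation, a minimal value-iteration solver of the kind the distributed algorithm builds on, here for a single secondary user choosing among discrete power levels with a known transition model and reward (e.g., achieved data rate minus a power cost). The names and the reward shape are illustrative, not the thesis's formulation.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    # P[a][s, s'] : transition probabilities under power level a
    # R[s, a]     : immediate reward, e.g. data rate minus a power cost
    n_states, n_actions = R.shape
    V = np.zeros(n_states)
    while True:
        Q = np.array([R[:, a] + gamma * P[a] @ V for a in range(n_actions)]).T
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)   # optimal values and greedy power policy
        V = V_new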

Committee:

Hassan Peyravi (Advisor)

Subjects:

Computer Science

Keywords:

wireless; cognitive; reinforcement learning; game theory

Goeringer, Tyler. Massively Parallel Reinforcement Learning With an Application to Video Games
Master of Sciences, Case Western Reserve University, 2013, EECS - Computer and Information Sciences
We propose a framework for periodic policy updates of computer controlled agents in an interactive scenario. We use the graphics processing unit (GPU) to accelerate an offline reinforcement learning algorithm which periodically updates an online agent's policy. The main contributions of this work are the use of GPU acceleration combined with a periodic update model to provide reinforcement learning in a performance constrained environment. We show empirically that given certain environment properties, the GPU accelerated implementation provides better performance than a traditional implementation utilizing the central processing unit (CPU). In addition, we show that while an online machine learning algorithm can function in some performance constrained environments, an offline algorithm reduces the performance constraints allowing for application to a wider variety of environments. Finally, we demonstrate combining these techniques to control an agent in the world of Quake III Arena, resulting in a computer controlled agent capable of adapting to different opponents.
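A schematic of the periodic-update pattern described above, assuming hypothetical env, policy, and batch_solver interfaces: the agent acts online with a fixed policy while experience accumulates, and an offline batch solve (the part that can be offloaded to a GPU, e.g. an LSPI-style solver) periodically produces a new policy.

def run_periodic_agent(env, policy, batch_solver, update_every=10_000, max_steps=1_000_000):
    # policy(state) -> action; batch_solver(samples) -> new policy callable.
    samples = []
    state = env.reset()
    for step in range(max_steps):
        action = policy(state)
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward, next_state, done))
        state = env.reset() if done else next_state
        if (step + 1) % update_every == 0:
            policy = batch_solver(samples)   # offline, possibly GPU-accelerated batch solve
    return policy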

Committee:

Soumya Ray (Advisor); Swarup Bhunia (Committee Member); Michael Lewicki (Committee Member); Frank Merat (Committee Member)

Subjects:

Computer Science

Keywords:

gpu; artificial intelligence; parallel programming; reinforcement learning; machine learning; least-squares policy iteration; lspi; quake iii

James, James Pence. Effectiveness of learned reinforcement as a function of primary reward percentage and number of primary pairings
Doctor of Philosophy, The Ohio State University, 1966, Graduate School

Committee:

Not Provided (Other)

Subjects:

Psychology

Keywords:

Reinforcement; Learning

Thomas, Philip S. A Reinforcement Learning Controller for Functional Electrical Stimulation of a Human Arm
Master of Sciences, Case Western Reserve University, 2009, EECS - Computer and Information Sciences
This thesis demonstrates the feasibility of using reinforcement learning (RL) for functional electrical stimulation (FES) control of a human arm as an improvement over (i) previous closed-loop controllers for upper extremities that are unable to adapt to changing system dynamics and (ii) previous RL controllers that required thousands of arm movements to learn. We describe the relevance of the control task and how it can be applied to help people with spinal cord injuries. We also provide simulations that show previous closed-loop controllers are insufficient. We provide background on possible RL techniques for control, focusing on a continuous actor-critic architecture that uses function approximators for its mappings. We test various function approximators, including Artificial Neural Networks (ANNs) and Locally Weighted Regression (LWR) for this purpose. Next, we introduce a novel function approximator, Incremental Locally Weighted Regression (ILWR), which is particularly suited for use in our RL architecture. We then design, implement, and perform clinically relevant tests using ANNs for the two mappings in the continuous actor-critic. During these trials, unexpected behavior is observed and eventually used to create a hybrid controller (that switches among different learning parameter sets) that can both adapt to changes in arm dynamics in 200 to 300 arm movements and remain stable in the long-term. A non-switching controller with similar performance is achieved using ILWR in place of an ANN for the controller's critic mapping.
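Locally weighted regression, which the thesis extends into an incremental variant (ILWR) for its actor-critic, fits a small weighted linear model around each query point. Below is a generic batch LWR predictor for orientation only; the incremental version and its use inside the controller are specific to the thesis.

import numpy as np

def lwr_predict(query, X, y, bandwidth=1.0):
    # Weight training points by a Gaussian kernel around the query,
    # fit a weighted linear model, and return its prediction at the query.
    w = np.exp(-np.sum((X - query) ** 2, axis=1) / (2 * bandwidth ** 2))
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a bias column
    beta = np.linalg.pinv(Xb.T @ (w[:, None] * Xb)) @ Xb.T @ (w * y)
    return np.append(query, 1.0) @ beta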

Committee:

Michael Branicky, PhD (Advisor); Antonie van den Bogert, PhD (Committee Member); Soumya Ray, PhD (Committee Member)

Subjects:

Computer Science

Keywords:

reinforcement learning; locally weighted regression; incremental locally weighted regression; LWR; ILWR; ANN; actor-critic; continuous actor-critic; functional electrical stimulation; FES; adaptive control

Rumberger, Jessica L. The Effects of Interspersal and Reinforcement on Math Fact Accuracy and Learning Rate
PhD, University of Cincinnati, 2013, Education, Criminal Justice, and Human Services: School Psychology
Mathematics skill acquisition is a crucial component of education, and ongoing research is needed to determine quality instructional techniques. A ubiquitous instructional question is how to manage time. This study investigated several flashcard presentation methods to determine the one that would provide the most learning in a set amount of time. A multi-element design was planned for a study comparing the effects of the interspersal (IS) and traditional drill (TD) flashcard methods, with and without additional contingent reinforcement, on learning rate and accuracy of unknown math facts. Participants included 5 third-grade students selected based on teacher concern with their math fact knowledge. While the areas of IS and positive reinforcement are well researched, less research exists on combining and comparing the two variables for math skill acquisition. The reinforcement conditions (TD+R and IS+R) consisted of access to a prize box based on improvement of learning rate. The TD and TD+R conditions consisted of the presentation of seven unknown facts per trial, while the IS and IS+R conditions added known facts after every two unknown facts, for a total of seven facts per trial (i.e., K, U, U, K, U, U, K). A consistent total number of facts was used to control for instructional time and determine the technique that would yield the best results in that time frame. Accuracy and learning rate data were visually analyzed by looking at trend and level of the data across the four conditions. The traditional drill + reinforcement (TD+R) condition led to the highest accuracy and learning rate levels for 4 out of 5 students. Practical implications, limitations, and suggestions for future research are discussed.
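A small sketch of the interspersal trial construction described above, purely for illustration: a known fact is folded in after every two unknown facts, for seven cards per trial.

def interspersal_trial(unknown_facts, known_facts):
    # Build one seven-card trial in the K, U, U, K, U, U, K pattern.
    u, k = iter(unknown_facts), iter(known_facts)
    return [next(k), next(u), next(u), next(k), next(u), next(u), next(k)]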

Committee:

David Barnett, Ph.D. (Committee Chair); Renee Oliver Hawkins, Ph.D. (Committee Member); Todd Haydon, Ph.D. (Committee Member)

Subjects:

Education

Keywords:

Traditional Drill; Interspersal; Flashcards; Positive Reinforcement; Learning Rate; Accuracy

Schwab, Devin. Hierarchical Sampling for Least-Squares Policy Iteration
Master of Sciences, Case Western Reserve University, 2016, EECS - Computer and Information Sciences
For large Sequential Decision Making tasks, an agent may need to make many exploratory interactions with the environment in order to learn the optimal policy. Large amounts of exploration can be costly in terms of computation, time for interactions, and physical resources. This thesis studies approaches that incorporate prior knowledge to reduce the amount of exploration. Specifically, I propose an approach that uses a hierarchical decomposition of the Markov Decision Process to guide an agent's sampling process, in which the hierarchy is treated as a set of constraints on the sampling process. I show theoretically that, in terms of distributions of state-action pairs sampled with respect to hierarchical states, variants of my approach have good convergence properties. Next, I perform an extensive empirical validation of my approach by comparing my methods to baselines which do not use the prior information during the sampling process. I show that with my approach, not only are irrelevant state-action pairs avoided while sampling, but the agent can also learn a hierarchically optimal policy with far fewer samples than the baseline techniques.
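A minimal sketch of the constrained sampling idea: the hierarchy is treated as a function that restricts which actions may be sampled in each state, so irrelevant state-action pairs are never generated. The interfaces (env, policy, allowed_actions) are assumptions for illustration, not the thesis's implementation.

def collect_constrained_samples(env, policy, allowed_actions, n_samples):
    # allowed_actions(state) -> actions the task hierarchy permits in this state.
    samples = []
    state = env.reset()
    while len(samples) < n_samples:
        legal = allowed_actions(state)
        action = policy(state, legal)                 # sample only from permitted actions
        next_state, reward, done = env.step(action)
        samples.append((state, action, reward, next_state))
        state = env.reset() if done else next_state
    return samples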

Committee:

Soumya Ray (Advisor); Cenk Cavusoglu (Committee Member); Michael Lewicki (Committee Member); Harold Connamacher (Committee Member)

Subjects:

Computer Science

Keywords:

reinforcement learning; MaxQ; LSPI; Least-Squares Policy Iteration

Riemer, Helmut Herbert. Relationship of cognitive style and reinforcement learning in counseling
Doctor of Philosophy, The Ohio State University, 1967, Graduate School

Committee:

Not Provided (Other)

Subjects:

Psychology

Keywords:

Counseling; Reinforcement learning

Krothapalli, Ujwal Karthik. Dynamic Programming: An Optimization Tool Applied to Mobile Robot Navigation and Resource Allocation for Wildfire Fighting
MS, University of Cincinnati, 2010, Engineering and Applied Science: Mechanical Engineering

This thesis employs the classical theory of dynamic programming, especially reinforcement learning, to solve two research problems: (i) mobile robot navigation in an unknown environment and (ii) resource allocation in wildfire fighting.

The first part of this thesis deals with mobile robot navigation in an unknown environment. Robot navigation in an unknown environment is a very challenging problem. To navigate optimally to a desired goal position in a given environment, the robot needs to interact with the environment, remember the results of such interactions, and understand the environment. Reinforcement learning is a classical way to solve this problem. This thesis focuses on improving the reinforcement learning approach applied to robot navigation using a variable grid size strategy. We also focus on a novel way to map and understand the obstacles of the environment. The variable grid size algorithm is an extension of reinforcement learning. We describe both the uniform (fixed) and the variable resolution grid size based reinforcement learning methods. A test bed is used to evaluate the real-world scenario, and a variety of simulation environments are used to evaluate the improved algorithm.
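For reference, a fixed-resolution tabular Q-learning loop of the kind the variable-grid strategy builds on; the step interface and parameters are illustrative, not the thesis's implementation.

import random
import numpy as np

def q_learning(n_states, n_actions, step, episodes=500,
               alpha=0.1, gamma=0.99, epsilon=0.1):
    # step(s, a) -> (next_state, reward, done); returns the learned Q-table.
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            a = random.randrange(n_actions) if random.random() < epsilon else int(Q[s].argmax())
            s2, r, done = step(s, a)
            Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s2].max()) - Q[s, a])
            s = s2
    return Q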

The second part of the thesis is focused on resource allocation in fighting forest wildfires. Wildfires are known to wreak havoc, and their ever-increasing intensity has forced a rethinking of firefighting strategies. Many decision support systems have been deployed in the recent past. However, most of them lack the ability to provide strategies that optimally contain the fire, and their ability to adapt to dynamic conditions in the case of multiple wildfire sites in the same vicinity is limited. A dynamic programming based approach for optimal resource allocation to minimize the total burnt area is employed in this thesis. For the sake of simplicity, only homogeneous scenarios are considered. We subject the proposed algorithm to Monte Carlo simulations to obtain optimal strategies applicable when dealing with multiple fire sites. We conclude by discussing the results obtained in both applications and other possible applications of the methodology discussed above.

Committee:

Manish Kumar, PhD (Committee Chair); David Thompson, PhD (Committee Member); Masoud Ghaffari, PhD (Committee Member)

Subjects:

Robots

Keywords:

Dynamic Programming; Reinforcement Learning; Mobile Robot Navigation; Path Planning; Wildfire Fighting

Konovalov, Arkady. Essays in Behavioral Economics
Doctor of Philosophy, The Ohio State University, 2017, Economics
In this thesis, I show how non-choice data (response times and gaze data) can be used in economics, specifically in the individual preferences domain, strategic settings, and value learning. In particular, the first two chapters demonstrate how response times (RTs) can be used to infer individual preferences and thus can be considered and/or manipulated in strategic settings. In the last chapter, I use gaze data to identify attention effects in reinforcement learning. In the first chapter of the dissertation, “Revealed Indifference: Using Response Times to Infer Preferences”, response time data are used to estimate individual utility functions. Revealed preference is the dominant approach for inferring preferences, but it relies on discrete, stochastic choices. The choice process also produces response times, which are continuous and can often be observed in the absence of informative choice outcomes. Moreover, there is a consistent relationship between RTs and strength of preference, namely that people make slower decisions as they approach indifference. This relationship arises from optimal solutions to sequential information sampling problems. We investigate several ways in which this relationship can be used to infer preferences when choice outcomes are uninformative or unavailable. We show that RTs from a single binary-choice problem are enough to usefully rank people according to their risk preferences. The second chapter, titled “On The Strategic Use of Response Times”, further investigates the role of response times in strategic settings. We designed a laboratory experiment around a two-period bargaining game in which a seller with zero marginal costs makes two price offers to a buyer with a value randomly drawn from a uniform distribution, and profits are discounted if a deal is made in the second period. We found that the RTs were negatively correlated with the buyers' values: low-value buyers were faster to say “no” in the first round. In the second part of the experiment, participants were able to correctly infer the values and earn higher profits. In the final part of the experiment, subjects in the role of buyers were more likely to pick an offer that was made in response to a faster RT, as if they were manipulating their RTs to get better offers. The last chapter, “Attention Effects in Model-Based and Model-Free Reinforcement Learning”, uses eye-tracking to study the learning strategies known as model-free and model-based value learning; the former is mere reinforcement of previously rewarded actions, and the latter is a forward-looking strategy that involves evaluation of action-state transition probabilities. Prior work has used neural data to argue that both model-based and model-free learners implement a value comparison process at trial onset, but model-based learners assign more weight to forward-looking computations. Using eye-tracking, we report evidence for a different interpretation of prior results: model-based subjects make their choices prior to trial onset. In contrast, model-free subjects tend to ignore model-based aspects of the task and instead seem to treat the decision problem as a simple comparison process.
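A toy contrast between the two learning strategies discussed in the last chapter, assuming a simple two-stage task: the model-free learner reinforces the previously rewarded first-stage action directly, while the model-based learner evaluates first-stage actions by looking ahead through known transition probabilities. Names and structure are illustrative only.

def model_free_update(q, action, reward, alpha=0.2):
    # Reinforce the chosen first-stage action toward the obtained reward.
    q[action] += alpha * (reward - q[action])
    return q

def model_based_values(transitions, stage2_values):
    # value(a) = sum over second-stage states s of P(s | a) * V(s)
    return {a: sum(p * stage2_values[s] for s, p in probs.items())
            for a, probs in transitions.items()}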

Committee:

Ian Krajbich (Advisor); Paul Healy (Committee Member); John Kagel (Committee Member)

Subjects:

Economics

Keywords:

economics; bargaining; response times; game theory; learning; model-based learning; reinforcement learning

Graham, James T. Development of Functional Requirements for Cognitive Motivated Machines
Doctor of Philosophy (PhD), Ohio University, 2016, Electrical Engineering & Computer Science (Engineering and Technology)
Machine Intelligence, and all of its associated fields and specialties, is a wide and complex area actively researched in laboratories around the world. This work aims to address some of the critical problems inherent in such research, from the most basic neural network structures, to the handling of information, to higher-level cognitive processes. All of these components and more are needed to construct a functioning intelligent machine. However, creating and implementing machine intelligence is easier said than done, especially when working from the ground up as many researchers have attempted. Instead, it is proposed that the problem be approached from both bottom-up and top-down design paradigms, so that the two approaches will benefit from and support one another. To clarify, my research looks at both low-level learning and high-level cognitive models and attempts to work toward a middle ground where the two approaches are combined into a single cognitive system. Specifically, this work covers the development of the Motivated Learning Embodied Cognition (MLECOG) model and the associated components required for it to function. These consist of the Motivated Learning approach, various types of memory, action monitoring, visual and mental saccades, focus of attention, attention switching, planning, etc. Additionally, some elements needed for processing sensory data are briefly examined because they are relevant to the eventual creation of a full cognitive model with proper sensory/motor I/O. The development of the Motivated Learning cognitive architecture is covered from its initial beginnings as a simple Motivated Learning algorithm to its advancement to a more complex architecture and eventually the proposed MLECOG model. The objective of this research is to show that a cognitive architecture that uses motivated learning principles is feasible, and to provide a path toward its development.

Committee:

Janusz Starzyk (Advisor); Mehmet Celenk (Committee Member); Savas Kaya (Committee Member); Jeff Dill (Committee Member); Jeff Vancouver (Committee Member); Annie Shen (Committee Member)

Subjects:

Cognitive Psychology; Computer Science; Electrical Engineering

Keywords:

artificial intelligence; motivated learning; reinforcement learning; cognitive model; embodied intelligence; cognitive architecture

Tennety, Srinivas. Mobile robot navigation in hilly terrains
MS, University of Cincinnati, 2011, Engineering and Applied Science: Computer Science
Mobile robot navigation in hilly terrains is challenging since the environment is unstructured, ill-conditioned, and complex. The features of the terrain cannot be easily classified as traversable or non-traversable, and therefore identifying paths that pose minimum danger to the robot becomes difficult. One approach to navigation in hilly terrain is based on unsupervised learning, where the robot learns by interacting with the environment through trial and error. This method can be implemented using reinforcement learning. However, this approach is not applicable to real-world applications, as the robot might incur unrecoverable damage while interacting with the environment. Another approach is to use human expert knowledge. Humans learn from their past experiences and display an uncanny ability to identify safe paths even in the presence of uncertainties. Therefore, it is beneficial to use human expert knowledge, when available, to aid the navigation of robots in complex terrains. This thesis presents a framework in which human expert assistance is used to guide the robot to the goal through reinforcement learning techniques. When prior knowledge of the terrain, such as a low-resolution aerial view, is available, a human expert can identify one or more paths from start to goal that are relatively safe to traverse. These expert paths are used to approximate a value matrix that steers the robot from any start position in the terrain to the goal while avoiding unsafe regions that pose obvious danger. This approach aids in global path planning but does not take into consideration local terrain information that might not be available to the expert. To facilitate the incorporation of local terrain information, a fuzzy logic controller is designed which can be used to update the value matrix based on local sensor data. Experiments have been carried out in simulated hilly terrains with and without the expert paths to show the effectiveness of the approach. Different scenarios are discussed to demonstrate the advantages of specifying multiple expert paths over a few, and also the integration of the fuzzy logic controller.
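A rough sketch of how expert paths can seed a value matrix that steers the robot toward the goal: the goal is given the highest value, expert-path cells receive a small bonus, unsafe cells are left at zero, and values are propagated to neighboring cells with discounting. The grid encoding and constants are assumptions for illustration, not the thesis's construction.

import numpy as np

def build_value_matrix(grid_shape, goal, expert_cells, unsafe_cells=(), gamma=0.95, sweeps=200):
    # Fixed-point sweeps: V(cell) = expert bonus + gamma * max over neighbor values.
    V = np.zeros(grid_shape)
    V[goal] = 1.0
    for _ in range(sweeps):
        for r in range(grid_shape[0]):
            for c in range(grid_shape[1]):
                if (r, c) == goal or (r, c) in unsafe_cells:
                    continue
                neighbors = [(r + dr, c + dc)
                             for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= r + dr < grid_shape[0] and 0 <= c + dc < grid_shape[1]]
                bonus = 0.05 if (r, c) in expert_cells else 0.0
                V[r, c] = bonus + gamma * max(V[n] for n in neighbors)
    return V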

Committee:

Raj Bhatnagar, PhD (Committee Chair); Ernest Hall, PhD (Committee Member); Manish Kumar, PhD (Committee Member)

Subjects:

Robots

Keywords:

Mobile robot; Reinforcement learning; Hilly terrains; Autonomous Navigation

Ernsberger, Timothy S. Integrating Deterministic Planning and Reinforcement Learning for Complex Sequential Decision Making
Master of Sciences (Engineering), Case Western Reserve University, 2013, EECS - Computer and Information Sciences
This thesis presents a novel approach to solving decision-making problems in discrete, stochastic domains. The method for solving these problems is often dictated by the availability of information about how the environment responds to actions taken by the agent. When the agent is given a model of the environment, it can plan out its actions beforehand, whereas an agent without a model must learn to differentiate good and bad decisions through direct experience. Until now, little attention has been paid to situations in which a model is only available for a part of the environment. We propose an algorithm which combines automated planning with hierarchical reinforcement learning in order to take advantage of the model when it is available and sample from the environment when it is not. We prove that the same guarantees of optimality that apply to hierarchical reinforcement learning also apply to this approach. Using experiments performed in two different domains, we demonstrate that hierarchically integrated planning and reinforcement learning outperforms pure RL and pure planning hierarchies, and that this approach can scale to larger problems than are reasonably computable by other approaches.
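A minimal sketch of the modeled/unmodeled split the approach exploits: when a model of the current subtask is available the agent plans, otherwise it acts from learned values. The interfaces and the epsilon-greedy choice are assumptions for illustration, not the thesis's algorithm.

import random

def choose_action(state, subtask, models, planner, q_values, actions, epsilon=0.1):
    # models: dict of subtask -> environment model; q_values: dict of (state, action) -> value.
    if subtask in models:
        return planner(models[subtask], state)        # modeled part: deterministic planning
    if random.random() < epsilon:
        return random.choice(actions)                 # unmodeled part: explore
    return max(actions, key=lambda a: q_values.get((state, a), 0.0))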

Committee:

Soumya Ray (Advisor); Andy Podgurski (Committee Member); Michael Lewicki (Committee Member)

Subjects:

Artificial Intelligence

Keywords:

reinforcement learning; automated planning; Markov decision process

Jagodnik, Kathleen M. Reinforcement Learning and Feedback Control for High-Level Upper-Extremity Neuroprostheses
Doctor of Philosophy, Case Western Reserve University, 2014, Biomedical Engineering
High-level spinal cord injury causes paralysis below the level of the neck. Functional Electrical Stimulation (FES) is a technology that restores voluntary movement via application of electrical current to nerves and muscles. Our work aims to restore movement in the paralyzed upper limb. When implementing FES systems, effective controllers are needed to translate the current and desired arm positions into a pattern of muscle stimulations that achieve the target position accurately and efficiently. Although a range of upper-extremity neuroprosthesis controllers exist, none is capable of restoring accurate, natural arm movement in a clinical setting. For the purpose of advancing upper-extremity FES control technology, we explore reinforcement learning (RL), a control strategy that uses delayed reward and a trial-and-error search to develop its action policy. A potential advantage of RL control for upper-extremity FES systems is that human user preferences can be incorporated into controller training through the use of user-generated rewards of the controller actions. To date, RL control has been minimally explored for FES systems, and human rewards have never been incorporated for this application. An RL controller was implemented for a planar 2 degree of freedom biomechanical arm model, and this project explored the feasibility of using human rewards to train the RL controller. Simulation experiments were performed using pseudo-human, computer generated rewards that simulate the rewards that a human would be likely to assign. A range of experiments was performed to examine the learning properties of RL control using human-like rewards, and it was determined that RL controller learning occurs over a measurable time frame. Subsequently, human rewards were introduced to train the RL controller. Ten human subjects viewed animations of arm reaching movements, and assigned rewards to train the RL controller based on the quality of each movement. The RL controllers trained by humans learned well, although pseudo-human reward training was found to be equally effective. We discuss the potential benefits of using pseudo-human rewards for initial RL controller training, with subsequent fine-tuning training using human rewards. Reinforcement learning is a promising control strategy to restore natural arm movement to individuals with high-level paralysis.
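A sketch of what a pseudo-human (computer-generated) reward for one reaching movement might look like: the rating falls off with the final distance to the target and with movement jerkiness. The functional form and weights are assumptions for illustration, not the rewards used in the thesis.

import numpy as np

def pseudo_human_reward(final_pos, target_pos, jerkiness, max_reward=10.0):
    # Higher reward for ending close to the target with a smooth movement.
    distance = np.linalg.norm(np.asarray(final_pos) - np.asarray(target_pos))
    score = max_reward - 5.0 * distance - 1.0 * jerkiness
    return float(np.clip(score, 0.0, max_reward))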

Committee:

Robert Kirsch, Ph.D. (Advisor); Antonie van den Bogert, Ph.D. (Committee Member); Dawn Taylor, Ph.D. (Committee Member); Kenneth Gustafson, Ph.D. (Committee Member)

Subjects:

Artificial Intelligence; Behavioral Psychology; Behavioral Sciences; Biomedical Engineering; Computer Engineering; Computer Science; Engineering; Health Care; Psychology; Rehabilitation

Keywords:

Functional Electrical Stimulation; Control; Upper Extremity; Arm; Reinforcement Learning; High-Level Paralysis; Spinal Cord Injury; Feedback Control; Proportional-Derivative; Simulation; Human Experiments; Musculoskeletal Model

Skelly, Margaret Mary. Hierarchical Reinforcement Learning with Function Approximation for Adaptive Control
Doctor of Philosophy, Case Western Reserve University, 2004, Systems and Control Engineering
This dissertation investigates the incorporation of function approximation and hierarchy into reinforcement learning for use in an adaptive control setting through empirical studies. Reinforcement learning is an artificial intelligence technique whereby an agent discovers which actions lead to optimal task performance through interaction with its environment. Although reinforcement learning is usually employed to find optimal problem solutions in unchanging environments, a reinforcement learning agent can be modified to continually explore and adapt in a dynamic environment, carrying out a form of direct adaptive control. In the adaptive control setting, the reinforcement learning agent must be able to learn and adapt quickly enough to compensate for the dynamics of the environment. Since reinforcement learning is known to converge slowly to optimality in stationary environments, the use of abstraction and changes in task representation are examined as a means to accelerate reinforcement learning. Various levels of abstraction and task representations are incorporated into reinforcement learning agents through the use of function approximation and hierarchical task decomposition. The effectiveness of this approach is tested in simulations of representative reinforcement learning tasks. The comparison of the learning and adaptation times for differing levels of abstraction and competing task representations provides insight into the suitability of these techniques to accelerate learning and adaptation. The level of abstraction is examined in experiments where the reinforcement learning agent uses function approximation to store its learned information. The function approximation method chosen provides local generalization, which provides for a controlled diffusion of information throughout the task space. As a consequence, the experiments conducted with function approximation demonstrate how greater levels of abstraction, as determined by the amount of information diffusion, can accelerate learning in tasks where similar states call for similar actions. Hierarchical task decomposition provides a means of representing a task as a set of related subtasks, which introduces modularity into the task’s representation not possible in a monolithic representation. One effect of the hierarchy’s modularity is to contain certain environment changes within the smaller space of a subtask. Therefore, the experiments comparing hierarchical and monolithic representations of a task demonstrate that the hierarchical representation can accelerate adaptation in response to certain isolated environment changes.
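For orientation, a minimal one-dimensional tile-coding (CMAC) sketch of the local-generalization idea: each tiling is slightly offset, each maps the state to one active tile, and a linear value estimate sums the weights of the active tiles. The coder here is illustrative; the dissertation's tasks and coder differ.

import numpy as np

def active_tiles(x, n_tilings=8, tiles_per_dim=10, low=0.0, high=1.0):
    # One active tile index per tiling; tilings are offset by a fraction of a tile width.
    scaled = (x - low) / (high - low) * tiles_per_dim
    return [int(np.clip(scaled + t / n_tilings, 0, tiles_per_dim - 1)) + t * tiles_per_dim
            for t in range(n_tilings)]

def q_estimate(weights, x, action):
    # Linear Q estimate: sum of the weights of this action's active tiles.
    return sum(weights[action][i] for i in active_tiles(x))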

Committee:

Michael Branicky (Advisor)

Keywords:

reinforcement learning; hierarchy; function approximation; adaptive control; tile coding; CMAC; supervisory-based hierarchy; hopper