In this thesis, we address reinforcement learning in continuous state and action spaces. Specifically, we consider multi-task reinforcement learning, where a sequence of reinforcement learning tasks has to be solved, and inverse reinforcement learning, where a reward function must be learned from expert demonstrations. Throughout, we use tree-based representations for models, rewards, and value functions. First, we design an algorithm for learning from demonstration in the presence of a non-smooth reward function. Second, we design an algorithm for offline reinforcement learning in the same setting, which allows us to re-use past experiences to accelerate learning on new tasks. Third, we introduce a method that incorporates weak prior knowledge about policies into online learning with policy gradient algorithms. Together, these contributions form a pipeline that efficiently learns and transfers knowledge across a sequence of tasks. We demonstrate our approaches on learning to control a simulated prosthetic arm from expert demonstrations under various scenarios.