diff --git a/phd-thesis-signed-submitted.pdf b/phd-thesis-signed-submitted.pdf index f3df4c0..96a221d 100644 Binary files a/phd-thesis-signed-submitted.pdf and b/phd-thesis-signed-submitted.pdf differ diff --git a/phd-thesis.org b/phd-thesis.org index d3075bd..b8f4b4d 100644 --- a/phd-thesis.org +++ b/phd-thesis.org @@ -13459,15 +13459,15 @@ cref:tab-geometry-control-comparison offers a succinct comparison of the three a #+LATEX: \caption[Comparison of mode remaining trajectory optimisation algorithms]{Comparison of the \acrfull{ig} method from \cref{sec-traj-opt-collocation}, the \acrfull{dre} method from \cref{sec-traj-opt-energy} and the \acrfull{cai} method from \cref{chap-traj-opt-inference}.} #+LATEX: \label{tab-geometry-control-comparison} #+ATTR_LATEX: :center t :placement [!t] :align lccc -| | \acrshort{cai} | \acrshort{ig} | \acrshort{dre} | -|------------------------------------------------------+----------------+---------------+----------------| -| Dynamics constraints guaranteed? | \checkmark | \times | \checkmark | -| Considers /epistemic uncertainty/ in dynamics? | \checkmark | \times | \checkmark | -| Considers /epistemic uncertainty/ in gating network? | \checkmark | \checkmark | \checkmark | -| Can remain in *multiple* modes? | \checkmark | \times | \times | -| Boundary conditions guaranteed? | \times | \checkmark | \times | -| $\delta-\text{mode remaining}$? | \checkmark | \times | \checkmark | -| Continuous-time trajectory? | \times | \checkmark | \times | +| | \acrshort{ig} | \acrshort{dre} | \acrshort{cai} | +|------------------------------------------------------+---------------+----------------+----------------| +| Dynamics constraints guaranteed? | \times | \checkmark | \checkmark | +| Considers /epistemic uncertainty/ in dynamics? | \times | \checkmark | \checkmark | +| Considers /epistemic uncertainty/ in gating network? | \checkmark | \checkmark | \checkmark | +| Can remain in *multiple* modes? 
| \times | \times | \checkmark | +| Boundary conditions guaranteed? | \checkmark | \times | \times | +| $\delta-\text{mode remaining}$? | \times | \checkmark | \checkmark | +| Continuous-time trajectory? | \checkmark | \times | \times | # | Decouples goals? | \checkmark | \checkmark | #+end_table diff --git a/phd-thesis.tex b/phd-thesis.tex index 720f0e3..5ba7a15 100644 --- a/phd-thesis.tex +++ b/phd-thesis.tex @@ -1,4 +1,4 @@ -% Created 2022-05-12 Thu 12:58 +% Created 2022-05-12 Thu 13:19 % Intended LaTeX compiler: pdflatex \documentclass{mimosis-class/mimosis} \usepackage{bm} @@ -377,7 +377,7 @@ \end{titlepage} \addchap{Abstract} -\label{sec:org4a0367a} +\label{sec:org2936d84} \setcounter{page}{1} \begin{singlespace} Over the last decade, \textit{learning-based control} has become a popular paradigm for controlling dynamical systems. @@ -405,7 +405,7 @@ \end{singlespace} \addchap{Covid-19 Statement} -\label{sec:orgf152a89} +\label{sec:org301c705} \begin{singlespace} To mitigate risk due to Covid-19 lab closures, many of the methods in this thesis are validated in simulated experiments and not in real-world experiments. @@ -443,7 +443,7 @@ \end{singlespace} \addchap{Declaration} -\label{sec:org3af1889} +\label{sec:org8ab2a8e} \begin{singlespace} \begin{quote} \initial{I} declare that the work in this thesis was carried out in accordance with the requirements of the University's Regulations and Code of Practice for Research Degree Programmes and that it has not been submitted for any other academic award. Except where indicated by specific reference in the text, the work is the candidate's own work. Work done in collaboration with, or with the assistance of, others, is indicated as such. Any views expressed in the thesis are those of the author. @@ -455,7 +455,7 @@ \end{singlespace} \addchap{Acknowledgements} -\label{sec:org4fa4965} +\label{sec:orgf9a2b9f} \begin{singlespace} %Arthur, I have learned so much under your supervision. 
\initial{I} am deeply grateful to my two supervisors, Arthur Richards and Carl Henrik Ek. @@ -780,7 +780,7 @@ \newcommand{\gatingKL}{\ensuremath{\text{KL}\left( \gatingInducingVariational \mid\mid \gatingInducingPrior \right)}} \newcommand{\gatingsKL}{\ensuremath{\sum_{\modeInd=1}^\ModeInd \text{KL}\left( \gatingInducingVariational \mid\mid \gatingInducingPrior \right)}} \chapter{Introduction} -\label{sec:orgdad7c1d} +\label{sec:org7c879a3} \newcommand{\targetState}{\ensuremath{\state_f}} %\newcommand{\stateDomain}{\ensuremath{\mathcal{X}}} %\renewcommand{\stateDomain}{\ensuremath{\mathcal{S}}} @@ -999,7 +999,7 @@ \chapter{Introduction} Once the agent has explored enough, how can this learned dynamics model be exploited to plan risk-averse trajectories that remain in the desired dynamics mode? \section{Illustrative Example \label{illustrative_example}} -\label{sec:org1a6aab6} +\label{sec:org8072555} The methods developed throughout this thesis are motivated by a 2D quadcopter navigation example. See \cref{fig-problem-statement} for a schematic of the environment and details of the problem. The goal is to fly the quadcopter from an initial state \(\state_0\), to a target state \(\state_{f}\). @@ -1031,7 +1031,7 @@ \section{Illustrative Example \label{illustrative_example}} and the controls consist of the speed in each direction, given by \(\control = (\velocity_x, \velocity_y)\). \section{Contributions} -\label{sec:orgf96ba24} +\label{sec:org29d3960} This thesis explores methods for mode remaining control in multimodal dynamical systems that explicitly reason about the uncertainties arising during learning and control. 
The primary contributions of this thesis are as follows: @@ -1060,14 +1060,14 @@ \section{Contributions} \end{itemize} \section{Associated Publications} -\label{sec:orge83c60e} +\label{sec:org1e6a94a} The first trajectory optimisation algorithm presented in \cref{sec-traj-opt-collocation} and an initial version of the approach for learning multimodal dynamical systems in \cref{chap-dynamics}, are published in: {\color{BrickRed}\fullcite{scannellTrajectory2021}} \chapter{Background and Related Work} -\label{sec:orgdcbdad4} +\label{sec:org80a26aa} \newcommand{\gpDomain}{\ensuremath{\mathcal{X}}} \newcommand{\dynamicsModel}{\ensuremath{p_{\theta}}} \newcommand{\constraintFunc}{\ensuremath{c}} @@ -1081,7 +1081,7 @@ \chapter{Background and Related Work} \end{displayquote} This chapter formally defines this mode remaining navigation problem and reviews the relevant literature. \section{Problem Statement \label{problem-statement-main}} -\label{sec:orgf9f5a4f} +\label{sec:org81ac93a} Dynamical systems describe the behaviour of a system over time \(t\) and are a key component of both control theory and \acrshort{rl}. At any given time \(t\), a dynamical system has a state, @@ -1189,7 +1189,7 @@ \section{Problem Statement \label{problem-statement-main}} The novelty of this work arises from remaining in the desired dynamics mode \(\desiredMode\). \section{Optimal Control \label{sec-optimal-control}} -\label{sec:org20f4577} +\label{sec:org5f70880} \begin{figure}[!t] \centering \begin{tikzpicture}[ @@ -1231,7 +1231,7 @@ \section{Optimal Control \label{sec-optimal-control}} dynamic programming. \subsection{Dynamic Programming} -\label{sec:org129fcc4} +\label{sec:org9ad6c9e} Dynamic programming \citep{bellmanDynamic1956} encompasses a large class of algorithms that can be used to find optimal controllers given a model of the environment as an \acrshort{mdp}. 
However, classical dynamic programming algorithms are of limited use as they rely on @@ -1258,7 +1258,7 @@ \subsection{Dynamic Programming} However, these methods are out of the scope of this thesis. \subsection{Reinforcement Learning} -\label{sec:org4f19814} +\label{sec:org5ef3925} There are multiple approaches to finding controllers \(\pi\) that minimise the expected cost in \cref{eq-optimal-control-objective} subject to the stochastic dynamics in \cref{eq-dynamics-main}. However, a central assumption of many methods is that both the system dynamics and cost function are \emph{known a priori}. @@ -1299,7 +1299,7 @@ \subsection{Reinforcement Learning} \end{algorithm} \subsubsection{Model-based Reinforcement Learning} -\label{sec:orgfe6cc12} +\label{sec:org74dc157} This thesis is interested in a subset of \acrshort{rl} known as \acrfull{mbrl}. It solves the optimal control problem in \cref{eq-optimal-control-objective} by first learning a dynamics model and then using this learned model with model-based control techniques. @@ -1325,7 +1325,7 @@ \subsubsection{Model-based Reinforcement Learning} Alternatively, the controller may take the form of a model-based control algorithm such as \acrfull{mpc}. \subsection{Model-based Control and Planning} -\label{sec:org5702bc3} +\label{sec:org89fd03b} This section reviews model-based control methods that leverage learned dynamics models in the \acrshort{mbrl} setting. In the \acrshort{rl} literature, model-based control is often referred to as planning. The work in this thesis is primarily focused on model-based control techniques, @@ -1398,7 +1398,7 @@ \subsection{Model-based Control and Planning} using guided policy search \citep{levineGuided2013}. \subsection{Constrained Control} -\label{sec:org9049c7d} +\label{sec:orgcd2240d} This work aims to control multimodal dynamical systems subject to the mode remaining constraint in \cref{eq-main-problem}. 
However, neither the underlying dynamics modes nor how the system switches between them is \emph{known a priori}. @@ -1479,7 +1479,7 @@ \subsection{Constrained Control} i.e. they are applicable with latent constraints. \section{Learning Dynamical Systems for Control} -\label{sec:org412c905} +\label{sec:orgf73b422} This section reviews methods for learning representations of dynamical systems for control. When learning representations of dynamical systems from observations, it is important to consider the different forms of uncertainty. @@ -1489,7 +1489,7 @@ \section{Learning Dynamical Systems for Control} \cref{sec-unc-exploration} discusses exploration strategies that leverage well-calibrated uncertainty estimates. \subsection{Sources of Uncertainty} -\label{sec:orgbc90fd5} +\label{sec:org9636e78} This section characterises the uncertainty that arises in \acrshort{rl}. \textbf{Aleatoric uncertainty} @@ -1524,7 +1524,7 @@ \subsection{Sources of Uncertainty} high \emph{aleatoric uncertainty} because they may result in poor performance or even catastrophic failure. \subsection{Learning Single-Step Dynamics Models} -\label{sec:org099a13b} +\label{sec:orgb2b6e3b} This work considers single-step dynamics models with the delta state formulation, which regularises the predictive distribution, given by \(\state_{\timeInd+1} = \state_\timeInd + \dynamicsFunc(\state_\timeInd, \control_\timeInd) + \bm\epsilon\). @@ -1566,7 +1566,7 @@ \subsection{Learning Single-Step Dynamics Models} \citep{kollerLearningBased2018,hewingCautious2020,hewingLearningBased2020,kamtheDataEfficient2018}. \subsection{Gaussian Processes} -\label{sec:orga0a84a4} +\label{sec:org77c2aab} The mathematical machinery underpinning \acrshort{gps} is now detailed. \textbf{Multivariate Gaussian identities} @@ -1722,7 +1722,7 @@ \subsection{Gaussian Processes} of dynamical systems.
\subsection{Learning Multimodal Dynamical Systems \label{to_motivation}} -\label{sec:org61216cb} +\label{sec:orge4a4e5c} \begin{figure}[t!] \centering \begin{minipage}{0.49\textwidth} @@ -1809,7 +1809,7 @@ \subsection{Learning Multimodal Dynamical Systems \label{to_motivation}} that are rich with information regarding how a system switches between its underlying dynamics modes. \section{Uncertainty-based Exploration Strategies \label{sec-unc-exploration}} -\label{sec:orgedca64e} +\label{sec:org11460d5} \acrfull{rl} agents face a trade-off between \emph{exploration}, where they seek to explore the environment and improve their models, and \emph{exploitation}, where they make decisions which are optimal for the data observed so far. There are many approaches from the literature used to tackle the exploration-exploitation trade-off. @@ -1937,7 +1937,7 @@ \section{Uncertainty-based Exploration Strategies \label{sec-unc-exploration}} \cite{buisson-fenetActively2020} considers the entropy over a horizon. \chapter{Probabilistic Inference for Learning Multimodal Dynamical Systems \label{chap-dynamics}} -\label{sec:org4a7541c} +\label{sec:org688ebee} \epigraph{All models are wrong, but some are useful.}{\textit{George Box}} \newcommand{\stateDomain}{\ensuremath{\mathcal{X}}} %\renewcommand{\stateDomain}{\ensuremath{\mathcal{S}}} @@ -2164,7 +2164,7 @@ \chapter{Probabilistic Inference for Learning Multimodal Dynamical Systems \labe It is then tested on a real-world quadcopter data set representing the illustrative example detailed in \cref{illustrative_example}. \section{Problem Statement} -\label{sec:org0659f82} +\label{sec:orgfc9acf4} This work considers learning representations of \emph{unknown} or \emph{partially unknown}, stochastic, multimodal, nonlinear dynamical systems. That is, it seeks to learn a representation of the dynamics from the problem statement in \cref{problem-statement-main}. 
@@ -2235,7 +2235,7 @@ \section{Problem Statement} \end{align} \section{Preliminaries} -\label{sec:org23773c9} +\label{sec:org294bb83} \acrfull{gps} are the state-of-the-art approach for Bayesian nonparametric regression and they provide a powerful mechanism for encoding expert domain knowledge. They are flexible enough to model arbitrary smooth functions with the simplicity of only requiring @@ -2465,7 +2465,7 @@ \section{Preliminaries} i.e. \(\expertPrior = p\left( \mode{f}(\allInputK) \mid \allInputK, \expertParamsK \right)\). \section{Identifiable Mixtures of Gaussian Process Experts} -\label{sec:orgb111216} +\label{sec:org4eb9ff2} Motivated by improving identifiability and learning latent spaces for control, this work adopts a \acrshort{gp}-based gating network resembling a \acrshort{gp} classification model, similar to that used in the original \acrshort{mogpe} model \citep{trespMixtures2000a}. @@ -2565,7 +2565,7 @@ \section{Identifiable Mixtures of Gaussian Process Experts} where \(\mu_h\) and \(\sigma^2_{h}\) represent the mean and variance of the gating \acrshort{gp} at \(\singleInput\) respectively. \section{Approximate Inference \label{sec-inference}} -\label{sec:org4436114} +\label{sec:org0a56fcb} \epigraph{Nature laughs at the difficulties of integration.}{\textit{Pierre-Simon Laplace}} Performing Bayesian inference involves finding the posterior over the latent variables, \begin{align} \label{eq-} @@ -2772,7 +2772,7 @@ \section{Approximate Inference \label{sec-inference}} be optimised with stochastic gradient methods, whilst still capturing the complex dependencies between the gating network and experts. 
\subsection{Evidence Lower Bounds} -\label{sec:org9bad51e} +\label{sec:orgc9123bf} Instead of collapsing the inducing variables as seen in \cite{titsiasVariational2009}, they can be explicitly represented as variational distributions, \((\expertsInducingVariational, \gatingsInducingVariational)\) @@ -2895,7 +2895,7 @@ \subsection{Evidence Lower Bounds} The performance of these bounds is evaluated in \cref{sec-mcycle-results}. \subsection{Optimisation} -\label{sec:org1a9c75d} +\label{sec:orgcc3b799} \renewcommand{\expertSampleInd}{\ensuremath{s}} \renewcommand{\ExpertSampleInd}{\ensuremath{S}} \renewcommand{\gatingSampleInd}{\ensuremath{\hat{s}}} @@ -2967,7 +2967,7 @@ \subsection{Optimisation} has complexity \(\mathcal{O}(\NumInducing^2)\). \subsection{Predictions} -\label{sec:org920cfcc} +\label{sec:org3e2d38c} \renewcommand{\testInput}{\ensuremath{\mathbf{X}^*}} \renewcommand{\testOutput}{\ensuremath{\mathbf{y}^*}} \renewcommand{\NumTest}{\ensuremath{\NumData^*}} @@ -3100,7 +3100,7 @@ \subsection{Predictions} \(\gatingVariationalPosteriorBernoulli\) at \(\singleTestInput\). \section{Evaluation of Model and Approximate Inference} -\label{sec:org3db4fc4} +\label{sec:org13c9a41} As a \acrfull{moe} method, our model aims to improve on standard \acrshort{gp} regression with the ability to model non-stationary functions and multimodal distributions over the output variable. With this in mind, the model and approximate inference scheme are evaluated on two data sets. @@ -3111,7 +3111,7 @@ \section{Evaluation of Model and Approximate Inference} Secondly, they are tested on the illustrative example from \cref{illustrative_example}. That is, a data set collected onboard a DJI Tello quadcopter flying in an environment subject to two dynamics modes. 
\subsection{Experiments} -\label{sec:org3a6e566} +\label{sec:orgd3ef43f} \newcommand{\numTest}{\ensuremath{n}} \newcommand{\NumTest}{\ensuremath{N}} %\newcommand{\testSingleInput}{\ensuremath{\x_{\numTest}}} @@ -3143,7 +3143,7 @@ \subsection{Experiments} Note that all figures in this section show models that were trained on the full data set, i.e. no test/train split. \subsection{Evaluation on Motorcycle Data Set \label{sec-mcycle-results}} -\label{sec:org44f68f9} +\label{sec:orga79ab22} The Motorcycle data set (discussed in \cite{Silverman1985}) contains 133 data points (\(\allInput \in \R^{133 \times 1}\) and \(\allOutput \in \R^{133 \times 1}\)) @@ -3299,7 +3299,7 @@ \subsection{Evaluation on Motorcycle Data Set \label{sec-mcycle-results}} \end{figure} \subsubsection{Two Experts} -\label{sec:org9a3323b} +\label{sec:org97ef5da} The two further lower bounds (\(\furtherBound\) and \(\furtherBoundTwo\)), derived in \cref{sec-inference}, are compared by training each instantiation of the model using the same model and training parameters. @@ -3509,7 +3509,7 @@ \subsubsection{Two Experts} \end{figure} \subsubsection{Three Experts} -\label{sec:org42f5ccd} +\label{sec:org90012fb} The model was then instantiated with three experts (\(\ModeInd=3\)) and trained following the same procedure as the two-expert experiments. \cref{tab-params-motorcycle-three} shows the initial values for all of the trainable parameters in the model. @@ -3597,7 +3597,7 @@ \subsubsection{Three Experts} \clearpage \subsubsection{Summary} -\label{sec:orgb2efaf2} +\label{sec:orgb23cb14} The tight lower bound \(\tightBound\) and further lower bound \(\furtherBound\) recovered similar results in all experiments. This indicates that \(\furtherBound\) does not loosen the bound to a point where it loses valuable information.
@@ -3611,7 +3611,7 @@ \subsubsection{Summary} \newpage \subsection{Evaluation on Velocity Controlled Quadcopter \label{sec-brl-experiment}} -\label{sec:orgcfc274e} +\label{sec:orga96f4c9} As this work is motivated by learning representations of real-world dynamical systems, it was tested on a real-world quadcopter data set following the illustrative example detailed in \cref{illustrative_example}. @@ -3704,7 +3704,7 @@ \subsection{Evaluation on Velocity Controlled Quadcopter \label{sec-brl-experime \end{table} \subsubsection{Results} -\label{sec:org49e6e28} +\label{sec:org056e938} The model was instantiated with two experts, with the goal of each expert learning a separate dynamics mode and the gating network learning a representation of how the underlying dynamics modes vary over the state space. The model was trained using the model and training parameters in \cref{tab-params-quadcopter}. @@ -3817,7 +3817,7 @@ \subsubsection{Results} combined with our gating network and variational inference scheme, is capable of modelling the assignment of observations to experts via the inducing points. \section{Discussion and Future Work} -\label{sec:org494e8a3} +\label{sec:org2d8810a} \textbf{Implicit data assignment} It is worth noting that in contrast to other \acrshort{mogpe} methods, this model does not directly assign observations to experts. @@ -3887,7 +3887,7 @@ \section{Discussion and Future Work} its \acrshort{gps} are used to develop an information-based exploration strategy in \cref{chap-active-learning}. \section{Conclusion} -\label{sec:orgab4257f} +\label{sec:org8fadb28} This chapter has presented a method for learning representations of multimodal dynamical systems using a \acrshort{mogpe} method. Motivated by correctly identifying the underlying dynamics modes and inferring latent structure that can @@ -3926,7 +3926,7 @@ \section{Conclusion} it can successfully learn a factorised representation of a real-world, multimodal, robotic system. 
\chapter{Mode Remaining Trajectory Optimisation \label{chap-traj-opt-control}} -\label{sec:org0c1f548} +\label{sec:org8827a34} \newcommand{\nominalStateTraj}{\ensuremath{\stateTraj_*}} \newcommand{\nominalControlTraj}{\ensuremath{\controlTraj_*}} \newcommand{\fixedControl}{\ensuremath{\control_{*}}} @@ -4068,7 +4068,7 @@ \chapter{Mode Remaining Trajectory Optimisation \label{chap-traj-opt-control}} An initial version of the \acrfull{ig} method presented in \cref{sec-traj-opt-collocation} is published in \cite{scannellTrajectory2021}. \section{Problem Statement \label{sec-problem-statement}} -\label{sec:orgfa9db56} +\label{sec:org35715f2} The goal of this chapter is to solve the mode remaining navigation problem in \cref{problem-statement-main}. Due to the novelty of this problem, the work in this chapter considers trajectory optimisation algorithms rather than state feedback (closed-loop) controllers. @@ -4162,7 +4162,7 @@ \section{Problem Statement \label{sec-problem-statement}} It is desirable to avoid entering these regions as it may result in the system leaving the desired dynamics mode. \section{Mode Remaining Control via Latent Geometry \label{chap-traj-opt-geometry}} -\label{sec:orgdcb33cf} +\label{sec:org0269e13} This section introduces two different approaches to performing mode remaining trajectory optimisation. They both exploit concepts from Riemannian geometry -- extended to probabilistic manifolds -- to encode mode remaining behaviour. @@ -4175,7 +4175,7 @@ \section{Mode Remaining Control via Latent Geometry \label{chap-traj-opt-geometr with the mode remaining behaviour encoded via a geometric objective function. We name this approach \acrfull{dre}. \subsection{Concepts from Riemannian Geometry \label{sec-geometry-recap}} -\label{sec:org19ede00} +\label{sec:org14e4af7} \begin{figure}[h!] 
\centering \begin{minipage}[r]{\columnwidth} @@ -4325,7 +4325,7 @@ \subsection{Concepts from Riemannian Geometry \label{sec-geometry-recap}} of length minimising trajectories to probabilistic manifolds. \subsubsection{Probabilistic Geometries \label{sec-prob-geo}} -\label{sec:org37f3d6e} +\label{sec:org1a5528e} \newline Following \cite{tosiMetrics2014} we formulate a metric tensor that captures the variance in the manifold @@ -4421,7 +4421,7 @@ \subsubsection{Probabilistic Geometries \label{sec-prob-geo}} of the dynamics with high \emph{epistemic uncertainty}. \subsubsection{Extension to Sparse Variational Gaussian Processes} -\label{sec:orgcaea62f} +\label{sec:org3859237} The model in Chapter \ref{chap-dynamics} is built upon sparse \acrshort{gp} approximations, so the Jacobian in \cref{eq-predictive-jacobian-dist} must be extended for such approximations. \begin{myquote} @@ -4485,7 +4485,7 @@ \subsubsection{Extension to Sparse Variational Gaussian Processes} \end{myquote} \subsection{\acrfull{ig} \label{sec-traj-opt-collocation}} -\label{sec:orgf602bd4} +\label{sec:orgd215523} This section presents a trajectory optimisation algorithm that exploits the fact that length minimising trajectories on the manifold endowed with the expected metric from \cref{eq-expected-metric} encode all of the goals. @@ -4532,7 +4532,7 @@ \subsection{\acrfull{ig} \label{sec-traj-opt-collocation}} This is a boundary value problem (BVP) with a smooth solution, so it can be solved using any BVP solver, e.g. (multiple) shooting or collocation methods. \subsubsection{Implicit Trajectory Optimisation} -\label{sec:orge717a28} +\label{sec:orgafc2cb1} Solving the 2\textsuperscript{nd} order \acrshort{ode} in \cref{eq-2ode} with the expected metric from \cref{eq-expected-metric-weighting} is equivalent to solving our trajectory optimisation problem subject to the same boundary conditions.
This resembles an indirect optimal control method as it is based on an observation that the @@ -4654,7 +4654,7 @@ \subsubsection{Implicit Trajectory Optimisation} Ignores much of the stochasticity inherent in the problem. \end{remark} \subsection{\acrfull{dre} \label{sec-traj-opt-energy}} -\label{sec:org9e036be} +\label{sec:org4ab0c8c} This section details a direct optimal control approach which embeds the mode remaining behaviour directly into the \acrfull{soc} problem, via a geometric objective function. In contrast to the previous approach, this method: @@ -4724,7 +4724,7 @@ \subsection{\acrfull{dre} \label{sec-traj-opt-energy}} due to the dependence of metric \(\metricTensor\) on the state. The second stage approximates the calculation of the expected Riemannian energy under normally distributed states. \subsubsection{Approximate Inference for Dynamics Predictions \label{sec-dynamics-predictions}} -\label{sec:orgcffb98f} +\label{sec:org5215bf9} \renewcommand{\singleInput}{\ensuremath{\state_\timeInd, \control_\timeInd}} \renewcommand{\singleInput}{\ensuremath{\hat{\state}_\timeInd}} \renewcommand{\singleOutput}{\ensuremath{\Delta\state_{\timeInd+1}}} @@ -4816,7 +4816,7 @@ \subsubsection{Approximate Inference for Dynamics Predictions \label{sec-dynamic gating functions from \cref{eq-predictive-gating}. \subsubsection{Approximate Riemannian Energy} -\label{sec:org8f5b90b} +\label{sec:orga8f1031} Given this approach for simulating the \acrshort{mosvgpe} dynamics model, the state at each time step is normally distributed. Unlike the terminal and control cost terms in \cref{eq-mode-soc-problem-geometry-cost}, the expected Riemannian energy, @@ -4893,7 +4893,7 @@ \subsubsection{Approximate Riemannian Energy} by constraining the system to be \(\delta-\text{mode remaining}\) (\cref{def-delta-mode-remaining}). 
\subsubsection{Practical Implementation} -\label{sec:org0dc3690} +\label{sec:orgc1d0f79} An alternative approach to obtaining mode remaining behaviour is to optimise subject to the chance constraints in \cref{eq-mode-chance-constraint} alone, i.e. without the Riemannian energy cost term. However, this constrained optimisation often fails to converge in practice. @@ -4909,7 +4909,7 @@ \subsubsection{Practical Implementation} In most experiments, this strategy was far superior to constraining the optimisation at every iteration. \section{Mode Remaining Control as Probabilistic Inference \label{chap-traj-opt-inference}} -\label{sec:org657c078} +\label{sec:org9e5514b} \newcommand{\startStateDist}{\ensuremath{p(\state_{1})}} \newcommand{\transitionDist}{\ensuremath{p(\state_{\timeInd+1} \mid \state_\timeInd, \control_\timeInd, \modeVar_{\timeInd}=\desiredMode)}} %\newcommand{\trajectoryDist}{\ensuremath{p(\stateControlTraj)}} @@ -4955,7 +4955,7 @@ \section{Mode Remaining Control as Probabilistic Inference \label{chap-traj-opt- \cref{sec-inference-background} recaps the necessary background and related work, and \cref{sec-traj-opt-inference} then details the trajectory optimisation algorithm. \subsection{Background and Related Work \label{sec-inference-background}} -\label{sec:orgd93a2ca} +\label{sec:org1b189a2} This section first recaps the \acrfull{cai_unimodal} framework. To formulate optimal control as probabilistic inference, it is first embedded into a graphical model (see Figure \ref{fig-basic-control-graphical-model}). @@ -5074,7 +5074,7 @@ \subsection{Background and Related Work \label{sec-inference-background}} \end{figure} \subsubsection{Inference of Sequential Latent Variables} -\label{sec:org4054ca6} +\label{sec:orgda5fdcd} \newline The joint probability for an optimal trajectory (i.e.
@@ -5186,7 +5186,7 @@ \subsubsection{Inference of Sequential Latent Variables} \citep{shumwayAPPROACH1982,ghahramaniLearning1999,schonSystem2011}, with input estimation \citep{watsonStochastic2021}. \subsection{Mode Remaining Control as Inference \label{sec-traj-opt-inference}} -\label{sec:org34d7520} +\label{sec:orgeafa5fc} %\renewcommand{\optimalProb}{\ensuremath{\Pr(\optimalVar_\timeInd = 1, \modeVar_\timeInd=\desiredMode \mid \state_\timeInd, \control_\timeInd)}} %\renewcommand{\modeProb}{\ensuremath{\Pr(\modeVar_\timeInd=\desiredMode \mid \state_\timeInd, \control_\timeInd)}} %\renewcommand{\optimalVarTraj}{\ensuremath{\optimalVar_{0:\TimeInd}=1}} @@ -5345,7 +5345,7 @@ \subsection{Mode Remaining Control as Inference \label{sec-traj-opt-inference}} single-step dynamics model, i.e. making long-term predictions. \subsubsection{Approximate Inference for Dynamics Predictions} -\label{sec:org04a3b59} +\label{sec:org8c31d6c} As this method is using a learned representation of the transition dynamics, it suffices to assume that the dynamics are given by the desired mode's learned dynamics. @@ -5383,7 +5383,7 @@ \subsubsection{Approximate Inference for Dynamics Predictions} Note that the state and control at each time step are normally distributed. \subsubsection{Variational Inference for Sequential Latent Variables} -\label{sec:org9ebbdf1} +\label{sec:orgec1be34} Variational inference seeks to optimise \(q(\controlTraj)\) w.r.t. the \acrfull{elbo}. In this setup, the evidence is that \(\optimalVar_{\timeInd}=1\) and \(\modeVar_{\timeInd}=\desiredMode\) for @@ -5475,7 +5475,7 @@ \subsubsection{Variational Inference for Sequential Latent Variables} \end{align} \section{Conclusion} -\label{sec:org73012a3} +\label{sec:orgaf82dee} This chapter has presented three \emph{mode remaining} trajectory optimisation algorithms. 
The first two have shown how the geometry of the \acrshort{mosvgpe} gating network infers valuable information regarding how a multimodal dynamical system switches between its underlying dynamics modes. @@ -5498,7 +5498,7 @@ \section{Conclusion} perform significantly better than the \acrfull{ig} method. \chapter{Quadcopter Experiments - Mode Remaining Trajectory Optimisation \label{chap-traj-opt-results}} -\label{sec:org36c687a} +\label{sec:org540d902} %\newcommand{\ig}{IG\xspace} %\newcommand{\dre}{DRE\xspace} %\newcommand{\cai}{CaI\xspace} @@ -5563,7 +5563,7 @@ \chapter{Quadcopter Experiments - Mode Remaining Trajectory Optimisation \label{ To aid comparison with the real-world experiments, the layout of the first simulated environment (Environment 1) was kept consistent with the real-world experiments. \section{Real-World Quadcopter Experiments \label{sec-traj-opt-results-brl}} -\label{sec:orgd825b1e} +\label{sec:org401d7d9} The \acrfull{ig} method presented in \cref{sec-traj-opt-collocation} was evaluated using data from the real-world quadcopter navigation problem detailed in \cref{sec-brl-experiment}. However, a different subset of the environment was not observed @@ -5598,7 +5598,7 @@ \section{Real-World Quadcopter Experiments \label{sec-traj-opt-results-brl}} \end{figure} \subsection{Model Learning} -\label{sec:orgffc2d8f} +\label{sec:orgd96a551} The model from \cref{chap-dynamics} was instantiated with \(\ModeInd=2\) experts and trained on the data collected from the velocity controlled quadcopter experiment. Each mode's dynamics \acrshort{gp} used a Squared Exponential kernel with \acrfull{ard} and a constant mean function. @@ -5619,7 +5619,7 @@ \subsection{Model Learning} i.e. where the model is uncertain which mode governs the dynamics. 
\subsection{Trajectory Optimisation using Indirect Optimal Control via Latent Geodesics} -\label{sec:org8832681} +\label{sec:orgf1ebc3d} The initial (cyan) trajectory in \cref{fig-geometric-traj-opt} was initialised as a straight line with \(10\) collocation points, indicated by the crosses. The collocation solver guarantees that trajectories end at the target state. @@ -5698,12 +5698,12 @@ \subsection{Trajectory Optimisation using Indirect Optimal Control via Latent Ge large omitted any mode remaining behaviour. \section{Simulated Quadcopter Experiments \label{sec-traj-opt-results-simulated}} -\label{sec:orgc016f9f} +\label{sec:org7d50b32} All of the control methods are now tested in two simulated environments so that they can be compared. This section first details the two simulation environments. \subsection{Simulator Setup} -\label{sec:org53a952d} +\label{sec:orgecc07c8} The simulated environments have two dynamics modes, \(\modeVar \in \{1, 2\}\), whose transition dynamics are given by a single integrator discretised using the forward Euler method (velocity times time). @@ -5758,7 +5758,7 @@ \subsection{Simulator Setup} \label{fig-dataset-scenario-both} \end{figure} \subsection{Model Learning} -\label{sec:orgaa23944} +\label{sec:org9ff90db} Following the experiments in \cref{sec-traj-opt-results-brl}, the model from \cref{chap-dynamics} was instantiated with \(\ModeInd=2\) experts, one to represent the desired dynamics mode and one to represent the turbulent dynamics mode. 
@@ -5820,7 +5820,7 @@ \subsection{Model Learning} \label{fig-traj-opt-gating-network-5} \end{figure} \subsection{Performance Indicators} -\label{sec:org2fe0dc6} +\label{sec:org8c55aaf} Before evaluating the three trajectory optimisation algorithms in the simulated environments, let us restate the goals from \cref{chap-traj-opt-control}: \begin{description} @@ -5908,7 +5908,7 @@ \subsection{Performance Indicators} \end{table} \subsection{Results} -\label{sec:org25d3c54} +\label{sec:org8e6230e} Three settings of the tunable \(\lambda\) parameter were tested for both of the geometry-based methods in each environment. The \acrfull{cai} method from \cref{sec-traj-opt-inference} was tested with @@ -5977,7 +5977,7 @@ \subsection{Results} \end{figure} \subsubsection{Goal 1 - Navigate to the Target State} -\label{sec:orgdd7f43c} +\label{sec:org71055b6} The methods are first evaluated on their ability to navigate to the target state, that is, achieve Goal 1. \textbf{\acrfull{ig}} @@ -6066,7 +6066,7 @@ \subsubsection{Goal 1 - Navigate to the Target State} \end{figure} \subsubsection{Goal 2 - Remain in the Desired Mode} -\label{sec:orgfb64f4a} +\label{sec:orgf3e453d} The methods are now evaluated on their ability to remain in the desired dynamics mode, that is, their ability to achieve Goal 2. \newline @@ -6171,7 +6171,7 @@ \subsubsection{Goal 2 - Remain in the Desired Mode} In contrast, the clearance resulting from the maximum entropy control term alleviates this issue. \subsubsection{Goal 3 - Avoid Regions of High Epistemic Uncertainty} -\label{sec:org8a35932} +\label{sec:orga84b808} Finally, the methods are evaluated on their ability to avoid regions of the dynamics with high \emph{epistemic uncertainty}. \cref{tab-results-sim-envs} shows the state variance and the gating function variance accumulated over each trajectory. @@ -6304,7 +6304,7 @@ \subsubsection{Goal 3 - Avoid Regions of High Epistemic Uncertainty} method.
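The three goals evaluated above (reaching the target, remaining \(\delta\)-mode remaining, and accumulating little epistemic uncertainty) can each be scored from a rolled-out trajectory. A hypothetical sketch of such indicator functions, with illustrative names and thresholds:

```python
import numpy as np

def reached_target(final_state, target_state, tol=0.1):
    """Goal 1 indicator: did the trajectory end within `tol` of the target?"""
    return bool(np.linalg.norm(np.asarray(final_state) - np.asarray(target_state)) <= tol)

def delta_mode_remaining(mode_probs, delta=0.1):
    """Goal 2 indicator: the desired mode's mixing probability must stay
    >= 1 - delta at every step (the delta-mode-remaining condition)."""
    return bool(np.all(np.asarray(mode_probs) >= 1.0 - delta))

def accumulated_variance(variances):
    """Goal 3 indicator: total predictive variance accumulated along the
    trajectory; lower totals indicate the trajectory avoided regions of
    high epistemic uncertainty."""
    return float(np.sum(variances))
```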
\section{Conclusion} -\label{sec:orgc3b1991} +\label{sec:org485a27f} This section details the main findings from the experiments, compares the methods and discusses directions for future work. @@ -6369,7 +6369,7 @@ \section{Conclusion} \acrshort{cai} (gauss) methods are competitive approaches for finding mode remaining trajectories. \subsection{Discussion \& Future Work} -\label{sec:orgc929598} +\label{sec:org19a69a2} This section compares the three control methods presented in \cref{sec-traj-opt-collocation,sec-traj-opt-energy,chap-traj-opt-inference} and suggests future work to address their limitations. @@ -6381,15 +6381,15 @@ \subsection{Discussion \& Future Work} \label{tab-geometry-control-comparison} \begin{center} \begin{tabular}{lccc} - & \acrshort{cai} & \acrshort{ig} & \acrshort{dre}\\ + & \acrshort{ig} & \acrshort{dre} & \acrshort{cai}\\ \hline -Dynamics constraints guaranteed? & \(\checkmark\) & \texttimes{} & \(\checkmark\)\\ -Considers \emph{epistemic uncertainty} in dynamics? & \(\checkmark\) & \texttimes{} & \(\checkmark\)\\ +Dynamics constraints guaranteed? & \texttimes{} & \(\checkmark\) & \(\checkmark\)\\ +Considers \emph{epistemic uncertainty} in dynamics? & \texttimes{} & \(\checkmark\) & \(\checkmark\)\\ Considers \emph{epistemic uncertainty} in gating network? & \(\checkmark\) & \(\checkmark\) & \(\checkmark\)\\ -Can remain in \textbf{multiple} modes? & \(\checkmark\) & \texttimes{} & \texttimes{}\\ -Boundary conditions guaranteed? & \texttimes{} & \(\checkmark\) & \texttimes{}\\ -\(\delta-\text{mode remaining}\)? & \(\checkmark\) & \texttimes{} & \(\checkmark\)\\ -Continuous-time trajectory? & \texttimes{} & \(\checkmark\) & \texttimes{}\\ +Can remain in \textbf{multiple} modes? & \texttimes{} & \texttimes{} & \(\checkmark\)\\ +Boundary conditions guaranteed? & \(\checkmark\) & \texttimes{} & \texttimes{}\\ +\(\delta-\text{mode remaining}\)? & \texttimes{} & \(\checkmark\) & \(\checkmark\)\\ +Continuous-time trajectory? & \(\checkmark\) & \texttimes{} & \texttimes{}\\
\end{tabular} \end{center} \end{table} @@ -6527,7 +6527,7 @@ \subsection{Discussion \& Future Work} multimodal dynamical systems, whilst attempting to remain in the desired dynamics mode. \subsection{Summary} -\label{sec:org962dda2} +\label{sec:orgb79c762} This chapter has evaluated and compared the \acrfull{ig} method from \cref{sec-traj-opt-collocation}, the \acrfull{dre} method from \cref{sec-traj-opt-energy} and the \acrfull{cai} method from \cref{chap-traj-opt-inference}. @@ -6552,7 +6552,7 @@ \subsection{Summary} the mode chance constraints. \chapter{Mode Remaining Exploration for Model-Based Reinforcement Learning \label{chap-active-learning}} -\label{sec:org4478ccc} +\label{sec:org15ac2cd} \epigraph{“Real knowledge is to know the extent of one’s ignorance.”}{\textit{Confucius (Philosopher, 551–479BC).}} %\renewcommand{\acrshort{modeopt}}{ModeOpt\xspace} \renewcommand{\explorativeController}{\ensuremath{\pi_{\text{explore}}}} @@ -6613,7 +6613,7 @@ \chapter{Mode Remaining Exploration for Model-Based Reinforcement Learning \labe quadcopter environment and \cref{sec-future-work-explore} discusses \acrshort{modeopt} and details directions for future work. \section{Problem Statement \label{problem-statement-explore}} -\label{sec:org29c86e4} +\label{sec:orgd22d966} Similarly to \cref{chap-traj-opt-control}, the goal of this chapter is to solve the mode remaining navigation problem in \cref{problem-statement-main}. That is, to navigate from an initial state \(\state_0\) -- in the desired dynamics mode \(\desiredMode\) @@ -6716,7 +6716,7 @@ \section{Problem Statement \label{problem-statement-explore}} the model having high \emph{epistemic uncertainty}. \section{Mode Optimisation \label{sec-mode-optimisation}} -\label{sec:orge20ccb5} +\label{sec:org63ef8f3} This section details our method for solving the mode remaining navigation problem by consolidating all of the work in this thesis.
The method is named \acrfull{modeopt}. At its core, \acrshort{modeopt} learns a single-step dynamics model \(\dynamicsModel\) using the \acrshort{mosvgpe} @@ -6771,7 +6771,7 @@ \section{Mode Optimisation \label{sec-mode-optimisation}} desired dynamics mode with high probability, i.e. satisfy \cref{eq-mode-remaining-def-explore}. \subsection{Mode Remaining Exploration \label{sec-exploration}} -\label{sec:org5a6b94f} +\label{sec:orged4228f} The performance of \acrshort{modeopt} depends on its ability to explore the environment. The exploration strategy is primarily interested in exploring a single desired dynamics mode whilst avoiding entering any of the other modes. @@ -6892,7 +6892,7 @@ \subsection{Mode Remaining Exploration \label{sec-exploration}} \(q(\desiredGatingFunction(\gatingInducingInput)) = \mathcal{N}\left( \desiredGatingFunction(\gatingInducingInput) \mid \hat{\mathbf{m}}_{\desiredMode}, \hat{\mathbf{S}}_{\desiredMode} \right)\). \subsection{Mode Remaining Model-based Reinforcement Learning \label{sec-modeopt}} -\label{sec:orgae41dcb} +\label{sec:org35dc382} This section details how this exploration strategy is embedded into a \acrshort{mbrl} loop to solve the mode remaining navigation problem in \cref{eq-mode-explore-problem}. The algorithm is named \acrshort{modeopt} and is detailed in \cref{alg-mode-opt}. @@ -6953,7 +6953,7 @@ \subsection{Mode Remaining Model-based Reinforcement Learning \label{sec-modeopt Extending the method in \cref{sec-exploration} to feedback controllers is left for future work. \section{Preliminary Results \label{sec-preliminary-results}} -\label{sec:org34920e8} +\label{sec:org6b2f5e1} This section presents initial results solving the illustrative quadcopter navigation problem in \cref{illustrative_example} using the exploration strategy from \cref{sec-exploration}.
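The embedding of the explorative controller into an MBRL loop described above alternates between model fitting, trajectory optimisation and rollout. A schematic sketch of such a loop; all class and function names here are hypothetical stand-ins, not the thesis implementation:

```python
def mode_opt_loop(env, model, plan, dataset, n_iterations=10):
    """Schematic ModeOpt-style MBRL loop: (i) train the dynamics model on
    all data collected so far, (ii) plan an explorative mode-remaining
    trajectory under the current model, (iii) execute it in the
    environment and aggregate the new transitions."""
    for _ in range(n_iterations):
        model.train(dataset)                 # fit dynamics + gating network
        controls = plan(model)               # explorative trajectory optimisation
        transitions = env.rollout(controls)  # open-loop execution of the plan
        dataset.extend(transitions)          # dataset aggregation for next round
    return model, dataset
```

The open-loop `rollout` mirrors the text's note that the planned controls are executed directly, with feedback controllers left for future work.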
@@ -6971,7 +6971,7 @@ \section{Preliminary Results \label{sec-preliminary-results}} Intuitively, \acrshort{modeopt} seeks to expand the \(1-\delta\) contour until the target state \(\targetState\) lies within the contour. \subsection{Experiment Configuration} -\label{sec:orge543e16} +\label{sec:org4f8f2a0} This section details how the experiments were configured. \textbf{Initial data set \(\dataset_0\)} @@ -7001,7 +7001,7 @@ \subsection{Experiment Configuration} \end{align} \subsection{Comparison of Exploration Terms} -\label{sec:org116c083} +\label{sec:org453cc0d} This section evaluates the different terms in the explorative controller's objective from \cref{eq-explorative-traj-opt}. In particular, it motivates why the entropy-based term was combined with the state difference term. It then shows why the joint gating entropy over a trajectory was used, @@ -7100,7 +7100,7 @@ \subsection{Comparison of Exploration Terms} These results confirm that it is important to consider the information gain over the entire trajectory. \subsection{Exploration in Environment 1} -\label{sec:org6dc45e7} +\label{sec:orgf954a1f} This section presents preliminary results of \acrshort{modeopt}'s exploration strategy. The results show the strategy successfully exploring the simulated Environment 1 from \cref{sec-traj-opt-results-simulated}. The results presented here show how the desired mode's mixing probability and the gating function's variance @@ -7344,7 +7344,7 @@ \subsection{Exploration in Environment 1} However, further analysis is left for future work. \section{Discussion \& Future Work \label{sec-future-work-explore}} -\label{sec:orgebc679c} +\label{sec:org9b0792c} This section discusses \acrshort{modeopt} and proposes some directions for future work. \newline @@ -7525,7 +7525,7 @@ \section{Discussion \& Future Work \label{sec-future-work-explore}} environment, but instead, is inferred by the probabilistic dynamics model. 
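The joint gating entropy over a trajectory discussed above is the differential entropy of the gating function's joint Gaussian across all trajectory states. A numpy sketch showing why the joint covariance matters: correlated states contribute less than the sum of their marginal entropies, so revisiting nearby states yields little extra information gain:

```python
import numpy as np

def joint_gating_entropy(cov):
    """Differential entropy of a multivariate Gaussian,
    h = 0.5 * logdet(2*pi*e*Cov), evaluated on the gating function's joint
    covariance over the trajectory. Unlike a sum of per-state marginal
    entropies, this discounts information shared between correlated states."""
    cov = np.atleast_2d(np.asarray(cov, dtype=float))
    sign, logdet = np.linalg.slogdet(2.0 * np.pi * np.e * cov)
    if sign <= 0:
        raise ValueError("covariance must be positive definite")
    return 0.5 * logdet
```

For strongly correlated states the joint entropy is strictly smaller than the sum of marginal entropies, which is exactly the effect the text appeals to when arguing for the joint term over independent per-state terms.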
\section{Conclusion} -\label{sec:orga9fdbf7} +\label{sec:orgb92787c} This chapter has presented a novel strategy for exploring multimodal dynamical systems whilst remaining in the desired dynamics mode with high probability. Moreover, it has proposed how this exploration strategy can be combined with the dynamics model from \cref{chap-dynamics}, @@ -7544,7 +7544,7 @@ \section{Conclusion} Further testing and analysis of \acrshort{modeopt} are left for future work. \chapter{Conclusion} -\label{sec:orgc7ec965} +\label{sec:orgab27ac5} The main objective of this thesis was to solve the mode remaining navigation problem in \cref{eq-main-problem}. That is, to control a multimodal dynamical system -- where neither the underlying dynamics modes, nor how the system switches between them, are \emph{known a priori} -- @@ -7596,7 +7596,7 @@ \chapter{Conclusion} Evaluating its performance on real-world problems is left for future work. \section{Future Work} -\label{sec:org00b28fd} +\label{sec:org0bd7107} There are many promising directions for future work. Some are extensions of the work in this thesis, whilst others are alternative approaches to solving the mode remaining navigation problem in \cref{eq-main-problem}.