
Ipddp #249

Open

wants to merge 82 commits into base: main
Conversation


@wang-chen wang-chen commented Jun 4, 2023

ipddp by @ntu-caokun

New features:

- Two base classes, module.Cost and module.Constraint, and their derived classes module.QuadCost and module.LinCon. (In addition to module.System, stagewise costs and constraints are frequently used in optimal control problem settings.)
- An interior-point differential dynamic programming (IPDDP) solver and its differentiable version. (A variant of the traditional DDP/iLQR solver that can handle stagewise constraints.)
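For illustration, a minimal sketch of the stagewise quadratic cost such classes represent; the function name and signature here are hypothetical, not the actual module.QuadCost API.

```python
import torch

def quad_cost(x, u, Q, R):
    # Stagewise quadratic cost: 0.5 * x^T Q x + 0.5 * u^T R u,
    # with x: (ns,), u: (nc,), Q: (ns, ns), R: (nc, nc).
    return 0.5 * x @ Q @ x + 0.5 * u @ R @ u

Q = torch.eye(2)
R = torch.eye(1)
x = torch.tensor([1.0, 2.0])
u = torch.tensor([3.0])
c = quad_cost(x, u, Q, R)  # 0.5*(1 + 4) + 0.5*9 = 7.0
```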

Kun and others added 30 commits November 12, 2022 16:21
corrected literally against the IPDDP MATLAB code; can reach optimality, but its correctness still needs to be verified on more systems
PyTorch uses float32 by default, so given 0.02 it will store something like 0.0199999953
strange bug in fp.filter, line 418; numerically more stable if relaxed a bit
Encountered the RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

difficult to debug;
the computational graph should be constructed only on the last iteration;
use no_grad() for the previous iterations

todo: vmap bug
two strange bug fixes:
1. disabled 'vectorize', otherwise there is a vmap error; did not use batch, but still got this error;
2. disabled the sin function; SinBackward error
torch.set_default_dtype(torch.float64) is important for solving; seems related to the tol set in the inequality.

add plot file
seems that the coefficients of the last LQR iteration should be detached, as they are fixed values computed from the solved trajectory
1. Remove the file docs/requirements.txt.
2. Change requirements/dev.txt back to the latest one.
3. Remove pypose/module/minimal_test.py.
4. Remove all code below if __name__ == "__main__": in all pypose/module/ files, and move it to the test folder.
5. Use "pyramid style" for all imports.
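The float32 issue in the commit notes above can be reproduced directly: PyTorch tensors default to float32, so a literal like 0.02 is stored inexactly, which is why the commits switch the default to float64 when the solver tolerance is around 1e-7.

```python
import torch

# Default float32 stores 0.02 inexactly; float64 matches Python's own
# double-precision literal. (The commits use
# torch.set_default_dtype(torch.float64) to make float64 the default.)
a = torch.tensor(0.02)                       # float32 by default
b = torch.tensor(0.02, dtype=torch.float64)  # float64

err32 = abs(float(a) - 0.02)
err64 = abs(float(b) - 0.02)
```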
@ntu-caokun

The auto-tests have passed, please review @aerogjy

@aerogjy aerogjy self-requested a review July 12, 2023 18:59
Contributor

@aerogjy aerogjy left a comment

I can successfully run all the examples and tests. Several major issues:
1. The API design should be consistent with lqr or mpc.
2. The infeasible-start trajectory is missing from the docs.
3. The constraint type definition is a bit unclear.
4. Not sure the current organization of costs and constraints in the modules is proper @wang-chen.
5. Some of the code in the core IPDDP class is overwhelming and needs cleanup; try to avoid long multi-variable assignments. (See the detailed comments inline in the code.)

>>> traj_opt[batch_id] = solver.optimizer()

'''
def __init__(self, sys=None, stage_cost=None, terminal_cost=None, cons=None, n_cons=0, init_traj=None):
Contributor

The API design should be similar to lqr and mpc, which consist of the system, the time horizon, and the cost-related terms. Here, I think cons and n_cons could somehow be merged together? Also, init_traj shouldn't be necessary.


deleted n_cons;
time_horizon may not be necessary; if time-varying LQR is considered, then the Q matrix should be of shape [ns, ns, T], so T is already encoded in Q;
in my case, init_traj seems to be necessary, as I need it to extract ns and nc and to initialize many matrices, while lqr initializes these matrices in lqr_backward, which is not suitable for our case where backward will be called many times.
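A minimal sketch of the dimension extraction described above, assuming a hypothetical trajectory layout of (B, T+1, ns) states and (B, T, nc) inputs:

```python
import torch

# Hypothetical initial-trajectory layout: a batch of 1, horizon T=5,
# state dimension ns=4, and input dimension nc=2.
init_traj = {'state': torch.zeros(1, 6, 4), 'input': torch.zeros(1, 5, 2)}

ns = init_traj['state'].size(-1)   # state dimension
nc = init_traj['input'].size(-1)   # input dimension
T  = init_traj['input'].size(-2)   # time horizon
```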

#-----------------------------------------------
return self

def forward(self, fp_list):
Contributor

I'm kind of confused about solver and forward. It seems the example uses forward, while the test case uses solver. I suggest we keep consistency with lqr and mpc by using a forward. Also, try to keep a similar argument list for the forward function.


The test only cares about solving the trajectory and does not create a computational graph for learning; the training example first calls solver() to solve the trajectory, then uses forward to create the computational graph for autodiff.

Contributor

I think we can find a way to unify them. Just because we are doing testing doesn't mean we have to write a separate function. How about we keep one of them and add a flag/switch to turn the graph-computation part on and off, so that we always use forward and there is no solver() anymore?
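As one hedged sketch of the flag/switch idea (the function and argument names here are hypothetical, not the PR's API), the same entry point can either record a computational graph or run detached:

```python
import torch

def forward(x, weight, build_graph=False):
    # One entry point: with build_graph=True the result stays on the
    # autograd graph for learning; otherwise it runs detached, which
    # replaces a separate solver() call.
    if build_graph:
        return weight * x
    with torch.no_grad():
        return weight * x

w = torch.tensor(2.0, requires_grad=True)
x = torch.tensor(3.0)
y_solve = forward(x, w)                    # detached: no graph is built
y_learn = forward(x, w, build_graph=True)  # graph kept for backprop
```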


The IPDDP process can be summarised as iterative backward and forward recursions.

- The backward recursion.
Contributor

It seems two scenarios are implemented: one starts from a feasible trajectory and the other from an infeasible trajectory, but only one scenario is documented. That's why there is no update related to the slack variable y, although it is defined.


fixed this.

ns, nc, ncons, self.T = self.x.size(-1), self.u.size(-1), n_cons, self.u.size(-2)

# algorithm parameter
self.mu, self.maxiter, self.tol, self.infeas = 1.0, 50, torch.tensor([1.0e-7], dtype=self.x.dtype, device=self.x.device), False
Contributor

For self.infeas, I didn't see anywhere that checks the start trajectory and sets it to true or false.


added
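A sketch of what such a feasibility check could look like, assuming the convention that constraint values c = g(x, u) must satisfy c <= 0 for feasibility (names hypothetical):

```python
import torch

# Stagewise constraint values of shape (T, ncons); under the assumed
# convention g(x, u) <= 0, the start trajectory is feasible only if
# every entry is non-positive.
c = torch.tensor([[-0.5, -0.1],
                  [ 0.2, -0.3]])

infeas = bool((c > 0).any())  # one positive entry, so the start is infeasible
```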

# Potential performance loss here - self.A and self.B involves jacobian eval
return self._ref_c - pp.bmv(self.gx, self._ref_state) - pp.bmv(self.gu, self._ref_input)

class LinCon(Constraint):
Contributor

Is this an inequality constraint? I didn't see anywhere in the docs that defines that, or whether it's >0, <0, >=0, or <=0.


Added to the docs in constraint.py
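For illustration, a sketch of a linear inequality constraint under the assumed convention g(x, u) = gx @ x + gu @ u - b <= 0; the function name and the b vector here are hypothetical, while gx and the state reuse values from the quoted test snippet:

```python
import torch

def lincon(x, u, gx, gu, b):
    # Linear inequality constraint, assumed convention:
    # g(x, u) = gx @ x + gu @ u - b <= 0 when satisfied.
    return gx @ x + gu @ u - b

gx = torch.tensor([[-1.0, 0.0],
                   [-2.5, 1.0]])
gu = torch.zeros(2, 1)
b = torch.tensor([1.0, 1.0])  # hypothetical bound
x = torch.tensor([-2.0, 0.0])
u = torch.tensor([0.0])
g = lincon(x, u, gx, gu, b)  # tensor([1., 4.]): both entries violate g <= 0
```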

[-1., 0.],
[-2.5, 1.]],
device=device)
state = torch.tensor([[-2.,0.],
Contributor

Why define state twice?


fixed it.

init_traj_sample = {'state': init_traj['state'][batch_id:batch_id+1],
'input': init_traj['input'][batch_id:batch_id+1]}
ipddp = IPDDP(sys, stage_cost, terminal_cost, lincon, gx.shape[-2], init_traj_sample)
traj_opt[batch_id] = ipddp.solver()
Contributor

Here solver is used, but the training example uses forward.


The test only cares about solving the trajectory and does not create a computational graph for learning; the training example first calls solver() to solve the trajectory, then uses forward to create the computational graph for autodiff.


# -------- derivatives --------------------------
# terms related with system dynamics
self.fx = torch.zeros(B + (self.T, ns, ns), dtype=self.x.dtype, device=self.x.device)
Contributor

Lines 208 to 237 are a bit overwhelming. I think there should be a better way than defining all of these variables at the beginning. Maybe @wang-chen has some suggestions.

self.opterr, self.reg, self.bp_failed, self.recovery = 0., 0., False, 0

def getDerivatives(self):
self.p_fn.set_refpoint(self.x[...,-1,:], self.u[...,-1,:])
Contributor

Same thing here. Can we just call the derivatives directly where they are used, since you've already defined separate classes for the cost and constraints?


that would make backwardpasscompact() overwhelming. Notice that in LQR the cost-related terms are passed in instead of being computed inside the loop, and there are no second-order dynamics terms or constraint terms, which makes the LQR code much more elegant.

c_err, r_err, qu_err = torch.zeros(B, dtype=self.x.dtype, device=self.x.device), torch.zeros(B, dtype=self.x.dtype, device=self.x.device), torch.zeros(B, dtype=self.x.dtype, device=self.x.device)

# set regularization parameter
if (self.fp_failed or self.bp_failed):
Contributor

Not recommended. @wang-chen, how do we avoid these long multi-variable assignments?

@ntu-caokun

Fixed issues 2 and 3;
issues 1, 4, and 5 need discussion.

@ntu-caokun

Please use da9aaf6 for review, as the latest merge cannot pass the tests.
In the new commits, as per our discussion, I have unified the init function interface with MPC, while still adding the B argument for batch size, and
tidied up the code.

@aerogjy aerogjy self-requested a review September 11, 2023 02:58
Contributor

@aerogjy aerogjy left a comment

Thanks for addressing most of my comments last time. I made a few minor comments on the docs and some other things:
1. I guess we could try to unify forward and solver().
2. LTI/LTV can use the system matrices to find the system dimension.
3. Put InvPend in the examples as a general example for broad users.
I will hand it over to @wang-chen after this.

stage_cost (:obj:`instance`): Stage cost of the optimal control problem.
terminal_cost (:obj:`instance`): Terminal cost of the optimal control problem.
cons (:obj:`instance`): Constraints of the optimal control problem.
n_cons (:obj:`int`): Dimension of constraints.
Contributor

Needs to be updated.

filter, line search in the forwardpass.

From the learning perspective, this can be interpreted as a module with unknown parameters in
:math:`\begin{Bmatrix} \mathbf{f}, \mathbf{c}, q, p \end{Bmatrix}`,
Contributor

Do we still have q and p in this general constraint?


q and p here denote the stage and terminal costs, respectively.

which can be integrated into a larger end-to-end learning system.

Note:
The implementation is based on paper `Interior Point Differential Dynamic Programming
Contributor

Let's try to add the full citation, with authors, journal name, etc. :)

#-----------------------------------------------
return self

def forward(self, fp_list):
Contributor

I think we can find a way to unify them. Just because we are doing testing doesn't mean we have to write a separate function. How about we keep one of them and add a flag/switch to turn the graph-computation part on and off, so that we always use forward and there is no solver() anymore?

Linear/linearized constraint term on state

.. math::
\mathbf{g}_{\mathbf_{x}} = \left. \frac{\partial \mathbf{g}}{\partial \mathbf{x}} \right|_{\chi^*}
Contributor

gx and gu cannot be rendered correctly in the compiled document.


merged solver and forward;
previously, solver was run for each sample, then the samples were collected into a batch and forward was called to build the computational graph;

now each sample uses its own instance, i.e., each does its solve and builds its computational graph separately.

i, traj_loss.item(), model_loss.item()))

print( args.save)
os.system('python .\plot.py "{}" &'.format(args.save))
Contributor

This line always pops up an error for not finding plot.py. I suggest fixing the file path.
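One possible fix, sketched with a hypothetical helper that builds the command portably instead of hard-coding the Windows-style '.\plot.py' path:

```python
import os
import sys

def plot_command(script_dir, save_path):
    # Resolve plot.py relative to a known directory and invoke it with
    # the current interpreter, avoiding shell-specific path separators.
    script = os.path.join(script_dir, 'plot.py')
    return [sys.executable, script, save_path]

cmd = plot_command('examples', 'run1.pkl')
# subprocess.Popen(cmd) would then launch the plot on any platform
```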

def __init__(self, A, B, C, D, c1=None, c2=None):
super().__init__()

def __init__(self, A, B, C, D, c1=None, c2=None, xdim = None, udim = None, ydim = None):
Contributor

I thought in LTI/LTV we can find the system dimensions from the A, B, C, D matrices?
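A sketch of the inference the comment suggests, assuming the conventional shapes A: (ns, ns), B: (ns, nu), C: (ny, ns):

```python
import torch

# For an LTI system x' = A x + B u, y = C x + D u, the dimensions
# are recoverable from the matrices themselves.
A = torch.zeros(3, 3)
B = torch.zeros(3, 2)
C = torch.zeros(1, 3)

xdim = A.size(-1)  # state dimension, from A (ns x ns)
udim = B.size(-1)  # input dimension, from B (ns x nu)
ydim = C.size(-2)  # output dimension, from C (ny x ns)
```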

import pickle as pkl
import torch.optim as optim
from pypose.module.ipddp import IPDDP
from tests.module.test_ipddp import InvPend
Contributor

Could we make InvPend a dynamics example in pypose/examples, rather than in test_ipddp? That would make our code more general and modular for other users.
