-
-
Notifications
You must be signed in to change notification settings - Fork 102
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ipddp #249
base: main
Are you sure you want to change the base?
Conversation
…ension unification
correct literally to the ipddp matlab code; can reach optimality, but still need to verify its correctness using more systems
pytorch use float32 by default, so given 0.02, it will return something like 0.199999953
strange bug in fp.filter, line 418, numerical stability if relax a bit
Encounter the RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation difficult to debug
should be constructed only on the last iteration.
use no_grad() for previous iterations; todo: vmap bug
two strange bug fix: 1. disable 'vectorize', otherwise, vmap error; did not use batch, still get this error; 2. disable sin function; sinbackward error
find the explanation of sin function here, https://blog.csdn.net/weixin_39679367/article/details/122754199 fixed it.
torch.set_default_dtype(torch.float64) important for solving, seems related to the tol set in the inequality. add plot file
some tuning
seems that the coefficient of last LQR iteration should be detached, as it is fixed values computed from the solved trajectory
1. Remove file docs/requirements.txt 2. change requirements/dev.txt back to latest one 3. Remove pypose/module/minimal_test.py 4. Remove all codes below if __name__ == "__main__": in all pypose/module/ files, and move them to test folder. 5. Use "pyramid style" for all imports.
address some conflicts in init file, dynamics; also upgrade env to torch2.0
…train_ipddp_cartpole; import InvPend from tests
auto-tests has passed, plz review @aerogjy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can successfully run all the examples and tests. Several major issues: 1. The API design should be consistent with lqr or mpc. 2. Infeasible start traj is missing in the docs. 3. The constraints type definition is a bit unclear. 4. Not sure the current organization of cost and constraints in the modules are proper @wang-chen 5. Some of the code in the core ipddp class is overwhelming, need to clean up, and try to avoid long variable discussion. (See the detailed comments inline the code.)
pypose/module/ipddp.py
Outdated
>>> traj_opt[batch_id] = solver.optimizer() | ||
|
||
''' | ||
def __init__(self, sys=None, stage_cost=None, terminal_cost=None, cons=None, n_cons=0, init_traj=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The API design should be similar to lqr and mpc, which consists of the system, time horizon and the cost related terms. Here, I think the cons and n_cons could be somehow merged together? Also the init_traj shouldn't be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted n_cons;
time_horizon maybe not be necessary; if time-varying LQR is considered, then Q matrix should be of shape [ns,ns,T], T is included in Q;
in my case, init_traj seems to be necessary, as I need it to extract ns, nc, and initialize many matrices; while lqr initializes these matrices in lqr_backward, which is not suitable for our case where backward will be called many times.
pypose/module/ipddp.py
Outdated
#----------------------------------------------- | ||
return self | ||
|
||
def forward(self, fp_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kind of confused about solver and forward. It seems in the example, forward is used. In the test case, solver is used. I suggest we keep the consistency with the lqr and mpc with a forward would be great. Also, try to keep similar arg list of the forward function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test, it only cares about solving the traj, and does not create a computational graph for learning; in the training example, it first calls solver() to solve the trajectory, then use forward to create the computational graph for autodiff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can find a way to unified them? Not because we are doing testing so we have to write a separate function. How about we keep one of them, and make a flag/switch so that you can turn on/off the graph computation part? So that we always use the forward, no solver() anymore?
|
||
The IPDDP process can be summarised as iterative backward and forward recursions. | ||
|
||
- The backward recursion. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems two scenarios are implemented with one starts from a feasible trajectory and the other from an infeasible trajectory. But only one scenario's doc is provided? That's why there is no updated related to the slack variable y, although it's defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed this.
pypose/module/ipddp.py
Outdated
ns, nc, ncons, self.T = self.x.size(-1), self.u.size(-1), n_cons, self.u.size(-2) | ||
|
||
# algorithm parameter | ||
self.mu, self.maxiter, self.tol, self.infeas = 1.0, 50, torch.tensor([1.0e-7], dtype=self.x.dtype, device=self.x.device), False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For self.infeas, didn't see anywhere check the start trajectory and set it to true or false.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
# Potential performance loss here - self.A and self.B involves jacobian eval | ||
return self._ref_c - pp.bmv(self.gx, self._ref_state) - pp.bmv(self.gu, self._ref_input) | ||
|
||
class LinCon(Constraint): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this constraint inequality constraint? Didn't see anywhere in the doc defines that and whether it's >0, or <0, or >=0, or <=0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added to the docs in constraint.py
[-1., 0.], | ||
[-2.5, 1.]], | ||
device=device) | ||
state = torch.tensor([[-2.,0.], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why define state twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed it.
tests/module/test_ipddp.py
Outdated
init_traj_sample = {'state': init_traj['state'][batch_id:batch_id+1], | ||
'input': init_traj['input'][batch_id:batch_id+1]} | ||
ipddp = IPDDP(sys, stage_cost, terminal_cost, lincon, gx.shape[-2], init_traj_sample) | ||
traj_opt[batch_id] = ipddp.solver() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, used solver, but in the training example, used forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the test, it only cares about solving the traj, and does not create a computational graph for learning; in the training example, it first calls solver() to solve the trajectory, then use forward to create the computational graph for autodiff.
pypose/module/ipddp.py
Outdated
|
||
# -------- derivatives -------------------------- | ||
# terms related with system dynamics | ||
self.fx = torch.zeros(B + (self.T, ns, ns), dtype=self.x.dtype, device=self.x.device) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 208 to 237 are a bit overwhelm. I think there should be a better way without defining all of the variables at the beginning. Maybe @wang-chen can have some suggestion.
self.opterr, self.reg, self.bp_failed, self.recovery = 0., 0., False, 0 | ||
|
||
def getDerivatives(self): | ||
self.p_fn.set_refpoint(self.x[...,-1,:], self.u[...,-1,:]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same thing here. Can we just directly call the derivative when you are using them? Since you've already defined separate class for the cost and constraints.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that will make the backwardpasscompact() overwhelm. Notice that in LQR, the cost-related terms are passed in instead of being computed inside the loop, and there are no second-order dynamics terms and constraint terms, which makes the LQR code much more elegant.
c_err, r_err, qu_err = torch.zeros(B, dtype=self.x.dtype, device=self.x.device), torch.zeros(B, dtype=self.x.dtype, device=self.x.device), torch.zeros(B, dtype=self.x.dtype, device=self.x.device) | ||
|
||
# set regularization parameter | ||
if (self.fp_failed or self.bp_failed): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not recommend. @wang-chen how to avoid those long parameter discussion?
fixed issues 2, 3; |
Please use the da9aaf6 to review, as the latest merge cannot pass the tests; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing most of my comments last time. I made a few minor comments on the doc and some other things. 1. I guess we could try to unify forward and solver(); 2. LTI LTV can use system matrices to find the system dimension 3. Put InvPend in the example as a general example for the broad users. I will hand it over to @wang-chen after this.
pypose/module/ipddp.py
Outdated
stage_cost (:obj:`instance`): Stage cost of the optimal control problem. | ||
terminal_cost (:obj:`instance`): Terminal cost of the optimal control problem. | ||
cons (:obj:`instance`): Constraints of the optimal control problem. | ||
n_cons (:obj:`int`): Dimension of constraints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to be updated
filter, line search in the forwardpass. | ||
|
||
From the learning perspective, this can be interpreted as a module with unknown parameters in | ||
:math:`\begin{Bmatrix} \mathbf{f}, \mathbf{c}, q, p \end{Bmatrix}`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we still have q, p in this general constraint?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q, p here denote the stage and terminal costs, respectively.
pypose/module/ipddp.py
Outdated
which can be integrated into a larger end-to-end learning system. | ||
|
||
Note: | ||
The implementation is based on paper `Interior Point Differential Dynamic Programming |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Le't try to add the full citation, with authors and journal name, etc. :)
pypose/module/ipddp.py
Outdated
#----------------------------------------------- | ||
return self | ||
|
||
def forward(self, fp_list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can find a way to unified them? Not because we are doing testing so we have to write a separate function. How about we keep one of them, and make a flag/switch so that you can turn on/off the graph computation part? So that we always use the forward, no solver() anymore?
pypose/module/constraint.py
Outdated
Linear/linearized constraint term on state | ||
|
||
.. math:: | ||
\mathbf{g}_{\mathbf_{x}} = \left. \frac{\partial \mathbf{g}}{\partial \mathbf{x}} \right|_{\chi^*} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gx and gu cannot show up correctly in the compiled document
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merged solver and forward;
previously solver is done for each sample, and then collect them together in batch and do forward to get computational graph;
now each sample use their own instance, i.e., do their solve and computational graph separately.
i, traj_loss.item(), model_loss.item())) | ||
|
||
print( args.save) | ||
os.system('python .\plot.py "{}" &'.format(args.save)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line always pop up error for not finding the plot.py. I suggest fixing the file path.
pypose/module/dynamics.py
Outdated
def __init__(self, A, B, C, D, c1=None, c2=None): | ||
super().__init__() | ||
|
||
def __init__(self, A, B, C, D, c1=None, c2=None, xdim = None, udim = None, ydim = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought in LTI, LTV, we can find the system dimension from the ABCD matrices?
import pickle as pkl | ||
import torch.optim as optim | ||
from pypose.module.ipddp import IPDDP | ||
from tests.module.test_ipddp import InvPend |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we make a InvPend as a dynamic example in the pypose/examples, rather than in the test_ipddp? That will make our code more general and modular to be used by other users.
ipddp by @ntu-caokun
New features:
two base classes: module.Cost and module.Constraint; their derivatives: module.QuadCost and module.LinCon. (In addition to module.System, stagewise cost and constraint are frequently used in optimal control problem setting.)
interior-point differential dynamic programming solver, and its differentiable version. (A variant of traditional DDP/iLQR solver, which can deal with stagewise constraint.)