
Unknown term in sample_permutation for sinkhorn #1063

Open
MasterSkepticista opened this issue May 13, 2024 · 3 comments

@MasterSkepticista

MasterSkepticista commented May 13, 2024

Hi, I am trying to use the sinkhorn matcher for batch sizes > 8 (per device), as in the code. It (understandably) fails at #L62, since bs * dim exceeds the available sampling range of 10 * dim.

ValueError: Cannot take a larger sample than population when 'replace=False'

v = jax.random.choice(key, 10 * dim, shape=(bs, dim), replace=False)
v = jnp.sort(v, axis=-1) * 10.
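
A standalone reproduction of the failure (bs and dim here are just example values, not the matcher's defaults):

import jax

key = jax.random.PRNGKey(0)
bs, dim = 16, 100  # bs > 10, so bs * dim exceeds the 10 * dim population
# Raises the ValueError above: the population has fewer elements than the
# bs * dim values requested without replacement.
v = jax.random.choice(key, 10 * dim, shape=(bs, dim), replace=False)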

I could replace 10 with bs or higher to get it to work. I have a few questions though:

  • Is 10 arbitrary?
  • Why is the sorted vector multiplied by 10 in #L63? Could you point me to the relevant literature (including for the previous question) if this is a specific choice?
  • If 10 is arbitrary, can I replace 10 -> bs in both #L62 and #L63 while maintaining algorithmic correctness?
@xdotproduct

The choice of 10 in #L62 and #L63 of the Sinkhorn matcher implementation you're referring to isn't arbitrary; it's a design choice made to balance computational efficiency and numerical stability.

In #L62, 10 * dim sets the size of the population the random sample is drawn from. The idea is to sample from more values than strictly necessary so that the drawn points are well spread out; this helps avoid degenerate cases and contributes to numerical stability. The value of 10 was likely chosen empirically to strike a balance between efficiency (not using an unnecessarily large population) and robustness (keeping the drawn points diverse).

After sampling, in #L63, the sampled values are sorted and then multiplied by 10 to scale them up. Scaling is a common technique in Sinkhorn normalization: spreading the values over a larger range helps avoid numerical underflow or overflow during the Sinkhorn iterations.

If you're dealing with batch sizes larger than 8, it's reasonable to consider adjusting these parameters for better performance and stability. You can try replacing the value of 10 with bs or a higher value in both #L62 and #L63. However, when making such changes, it's important to validate the performance and stability of the algorithm empirically, as altering these parameters can affect the behavior of the Sinkhorn algorithm.
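
A hedged sketch of such an adjustment (the helper name and the max(10, bs) choice are mine, not from the repo): grow the sampling population with the batch size while keeping the original behaviour for batch sizes up to 10.

import jax
import jax.numpy as jnp

def sample_sorted_values(key, bs, dim):
  # Use a population of max(10, bs) * dim so that sampling bs * dim values
  # without replacement is always possible.
  scale = max(10, bs)
  v = jax.random.choice(key, scale * dim, shape=(bs, dim), replace=False)
  v = jnp.sort(v, axis=-1) * scale
  return v

Any scale with scale >= bs works; whether to use the same factor in the final multiplication (as asked above) is the part worth validating empirically.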

@MasterSkepticista (Author)

Could you please cite the source? This otherwise feels like an AI-generated answer.

@xdotproduct

xdotproduct commented May 14, 2024

Computational view: for increased numerical stability, the Sinkhorn algorithm is reformulated in the log domain. Sampling permutations reduces the size of the matrices involved: instead of considering all possible permutations, which is prohibitively expensive, you randomly select a subset to compute, which still gives a reasonable approximation of the optimal transport plan at reduced computational cost.
Here are some reference implementations:

  • Notebook: https://uvadlc-notebooks.readthedocs.io/en/latest/tutorial_notebooks/DL2/sampling/permutations.html
  • Jasper Snoek's original code implementation: https://github.com/google/gumbel_sinkhorn
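
For a rough illustration of what the log-domain formulation looks like, here is a generic sketch in JAX (not the matcher's exact code; the function name and iteration count are mine):

import jax.numpy as jnp
from jax.scipy.special import logsumexp

def log_sinkhorn(log_alpha, n_iters=20):
  # Alternate row and column normalization in log space via log-sum-exp,
  # which avoids the underflow/overflow of normalizing raw exponentials.
  for _ in range(n_iters):
    log_alpha = log_alpha - logsumexp(log_alpha, axis=-1, keepdims=True)
    log_alpha = log_alpha - logsumexp(log_alpha, axis=-2, keepdims=True)
  return jnp.exp(log_alpha)  # approximately doubly stochastic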

References:

  • Learning Latent Permutations with Gumbel-Sinkhorn Networks
  • Stabilized Sparse Scaling Algorithms for Entropy Regularized Transport Problems
  • Sinkhorn Networks: Using Optimal Transport Techniques to Learn Permutations

You can read Jasper Snoek's work from Google Brain for a more detailed understanding of how the Sinkhorn algorithm works and its applications to optimal transport problems.

Mathematical view: Convex Relaxations for Permutation Problems
