Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AdaptiveVector isNonZero is broken when sparseval == Monoid[T].zero #583

Open
sritchie-stripe opened this issue Nov 28, 2016 · 7 comments
Open

Comments

@sritchie-stripe
Copy link
Contributor

Here's an example of the breakage:

import com.twitter.algebird._
val a = SparseVector(Map.empty, 0, 5)
val b = AdaptiveVector.fromVector(Vector(1), 0)
Monoid.zeroEquiv[AdaptiveVector[Int]].equiv(Monoid.plus(a, b), b)
// => false

Monoid.plus returns SparseVector(1, 0, 0, 0, 0), extending the first vector. We want to be able to add zero-only vectors of any length and treat them as identity.

@sritchie-stripe sritchie-stripe changed the title AdaptiveVector isNonZero is broken AdaptiveVector isNonZero is broken when sparseval == Monoid[T].zero Nov 28, 2016
@sritchie-stripe
Copy link
Contributor Author

sritchie-stripe commented Nov 28, 2016

This is only a problem when the sparse value is the same as the zero of the underlying monoid. Once this is fixed, we'll want to remove the custom equiv implicits from

  • property("AdaptiveVector[String] has a monoid")
  • property("AdaptiveVector[Int] has a monoid")

@johnynek
Copy link
Collaborator

this is still flakey:

AdaptiveVector[Int] has a monoid

@mio-stripe
Copy link

mio-stripe commented Feb 23, 2017

Hi all, two questions

  1. Could we add two more cases to AVSemigroup.plus to check if either argument isZero
def plus(left: AdaptiveVector[V], right: AdaptiveVector[V]) = {
...
  (left, right) match {
    case _ if isZero(left) => right
    case _ if isZero(right) => left
    case (DenseVector(lv, ls, ld), DenseVector(rv, rs, rd)) => ...
}

This bug is happening because the size of the result is getting set to the larger of the two arguments which is the right thing to do as long as neither argument isZero.

  1. While we're at it, could we simplify AVMonoid.isZero to
def isZero(v: AdaptiveVector[V]) = (v.size == 0) || v.denseIterator.isEmpty

@mio-stripe
Copy link

mio-stripe commented Feb 23, 2017

Whoops! Scratch (2), I get it now: isZero should return true on val c = SparseVector(Map(0 -> 0, 1 -> 0, 2 -> 0), 2, 3) where the sparseValue is not Monoid.zero, but all of its values are Monoid.zero.

Okay, so I maintain that isZero and isNonzero are working correctly; the problem is that plus does not make sure that neither of its arguments is zero.

@mio-stripe
Copy link

mio-stripe commented Apr 4, 2017

Now that I've poked around, a couple qs about what the semantics should be.

  1. Is denseEquiv--having the same non-sparse values at the same indices, but being of possibly different lengths--the right notion of equivalence? This makes sense algebraically so I argue: yes. If you are doing machine learning, though, different-length vectors mean different things--they have different features--but given that we already allow adding vectors of different lengths here, if the user doesn't want vectors of different lengths to interact, they should make sure they don't in advance.

  2. Currently, for sparseValue != monoid.zero, plus adds sparse values

val v = AdaptiveVector.fill[Int](10)(2)
// => v: com.twitter.algebird.AdaptiveVector[Int] = SparseVector(2, 2, 2, 2, 2, 2, 2, 2, 2, 2)
val twoV = Semigroup.plus(v,v)
// => twoV: com.twitter.algebird.AdaptiveVector[Int] = DenseVector(4, 4, 4, 4, 4, 4, 4, 4, 4, 4)

I think we should only be adding non-sparse values so this looks like a bug to me, but want to make sure this is not, in fact, desired behavior.

@johnynek
Copy link
Collaborator

johnynek commented Apr 4, 2017

I think the only use of AdaptiveVector is for sketchmap. I think we never change the size of a densevector.

I'm happy adding some better constraints and saying that two densevectors are only equal if they have the same length. I think we should do the more complex thing for sparse vs dense.

@mio-stripe
Copy link

Same length for dense, not-necessarily the same length for sparse makes perfect sense so I'll do that and address the adding sparse values issue, too.

Okay, I see, AdaptiveVector is used in AdaptiveMatrix (and related things), which then gets used in SketchMap and that's about it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants