Improve environment performance #8846

Merged
merged 22 commits into from May 16, 2024

Conversation

kyri-petrou
Contributor

@kyri-petrou kyri-petrou commented May 12, 2024

/claim #8828
/closes #8828

@jdegoes I'm leaving this as a draft until I finish the following, but I would like some feedback on the approach in the meantime:

  • Figure out what to do with MiMa. Currently it complains about a synthetic method missing
  • Add tests for the UpdateOrderLinkedMap

Main changes

This PR pretty much rewrites the underlying implementation of ZEnvironment to optimize it in the following 2 ways:

  1. Avoid cache busting when adding services that are commonly added to the ZEnvironment. Currently this is only for Scope, but we could potentially add more in the future.
  2. Reimplement ZEnvironment using a map which preserves ordering based on when an entry was last modified. Besides improved performance, this also allows us to remove all the code that was previously needed to keep track of update ordering

Why are we caching?

If you're familiar with the internals of the ZEnvironment, you can skip to the next section

The ZEnvironment is designed to use the Tag of an object as the key to the map. When we add a new service to the environment, we simply add it using the tag as the key to the map. This will replace the existing service if it exists in the map. However, since the environment is covariant, we may come across this issue:

sealed trait Service
case class Foo(value: String) extends Service

val fooAsService: Service = Foo("foo1")
val fooAsFoo: Foo = Foo("foo2")
val env = ZEnvironment.empty.add(fooAsService).add(fooAsFoo)

env.get[Service] // This must equal `fooAsFoo`!

In the example above, env now contains 2 entries: one with Service as the key and another with Foo as the key. A naive approach to accessing Service afterwards would be to just use Service as the key, but this would return the wrong value (fooAsService). It is wrong because ZEnvironment is covariant, which means the most up-to-date value for Service is stored under the Foo key!

This creates the need to iterate through all of the services in the map, find the ones whose tags are subtypes of Service, and return the most recently modified one. This is very computationally expensive, and that's where the cache comes in: tags that we've previously seen and matched are put in a secondary internal cache so that we don't have to do this all over again. This works well except when we add new services to the map, as the underlying cache is then busted.
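A minimal sketch of the lookup-plus-cache behaviour described above, not the actual ZEnvironment code. It assumes runtime classes as keys instead of Tags, and NaiveEnv and its members are hypothetical names introduced only for illustration:

```scala
import scala.collection.mutable

object CovariantLookupSketch {
  sealed trait Service
  final case class Foo(value: String) extends Service

  // Toy stand-in for ZEnvironment (assumption): keys are runtime classes,
  // and each entry remembers the index at which it was last written.
  final class NaiveEnv private (
    entries: Map[Class[_], (Any, Int)],
    nextIndex: Int
  ) {
    // The cache belongs to this instance only, so every `add` (which returns
    // a new instance) starts again with an empty cache. Not thread-safe.
    private val cache = mutable.HashMap.empty[Class[_], Any]

    def add[A](key: Class[A], value: A): NaiveEnv =
      new NaiveEnv(entries.updated(key, (value, nextIndex)), nextIndex + 1)

    // Throws if no entry matches; a real implementation would handle that.
    def get[A](key: Class[A]): A =
      cache.getOrElseUpdate(key, {
        // Full scan: keep every entry whose key is a subtype of the
        // requested key, then pick the most recently written one.
        entries.iterator.collect {
          case (k, (v, idx)) if key.isAssignableFrom(k) => (idx, v)
        }.maxBy(_._1)._2
      }).asInstanceOf[A]
  }

  object NaiveEnv {
    val empty: NaiveEnv = new NaiveEnv(Map.empty, 0)
  }
}
```

With this sketch, adding Foo("foo1") under classOf[Service] and then Foo("foo2") under classOf[Foo] makes get(classOf[Service]) return Foo("foo2"), at the cost of scanning every entry on the first lookup.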

Improving performance of uncached keys

The approach taken in this PR is to use an update-ordered linked Map implementation, where iterator / reverseIterator return the elements in the order in which they were last updated. For reverseIterator, that means the first key is the one that was updated most recently. As a result, when a key is uncached, we don't need to iterate through the entire map; we can stop at the first subtype tag match, so a lookup is O(n) only in the worst case. In addition, since we're logically more likely to access the most recently added services, we're more likely to be closer to O(1) than O(n).
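The early exit can be sketched as follows. This is an illustration only: it assumes runtime classes as keys instead of Tags, and lookup is a hypothetical helper that receives the entries in the order reverseIterator would yield them (most recently updated first):

```scala
object EarlyExitSketch {
  // Returns the first entry whose key is a subtype of the requested key.
  // Because the iterator is consumed lazily, nothing after the first match
  // is inspected.
  def lookup[A](newestFirst: Iterator[(Class[_], Any)], key: Class[A]): Option[A] =
    newestFirst.collectFirst {
      case (k, v) if key.isAssignableFrom(k) => v.asInstanceOf[A]
    }
}
```

If the requested service was the last one updated, only a single entry is inspected, which is the "closer to O(1)" case.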

Implementation of UpdateOrderLinkedMap

The implementation used in this PR took the implementation of Scala's VectorMap (2.13+) and adapted it to suit the specific needs of the ZEnvironment; primarily:

  1. VectorMap is insertion ordered. This means to achieve our goal we would have to do map.remove(key).updated(key, value), which is less efficient than doing a single update
  2. VectorMap doesn't provide an efficient reverseIterator implementation, which is something we really need

The implementation works by placing Tombstones at the locations of entries that were later updated (and thus moved to the end). This approach allows us to create iterators that don't need to traverse the entirety of the underlying Vector.
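A simplified sketch of the tombstone idea, assuming a Vector of slots plus a key-to-position index. The names UpdateOrderedMap, Slot, and Live are hypothetical, and this version skips tombstones one slot at a time, which the real UpdateOrderLinkedMap may well handle more cleverly:

```scala
object TombstoneMapSketch {
  // A slot either holds a live entry or marks where an entry used to be.
  sealed trait Slot[+K, +V]
  final case class Live[+K, +V](key: K, value: V) extends Slot[K, V]
  case object Tombstone extends Slot[Nothing, Nothing]

  final class UpdateOrderedMap[K, V] private (
    slots: Vector[Slot[K, V]],
    positions: Map[K, Int]
  ) {
    def size: Int = positions.size

    def get(key: K): Option[V] =
      positions.get(key).map(i => slots(i)).collect { case Live(_, v) => v }

    // Tombstone the old slot (if any) and append the new entry, rather than
    // doing a separate remove followed by an insert.
    def updated(key: K, value: V): UpdateOrderedMap[K, V] = {
      val cleared: Vector[Slot[K, V]] = positions.get(key) match {
        case Some(i) => slots.updated(i, Tombstone)
        case None    => slots
      }
      new UpdateOrderedMap(cleared :+ Live(key, value), positions.updated(key, cleared.length))
    }

    // Most recently updated entries first; tombstones are skipped.
    def reverseIterator: Iterator[(K, V)] =
      slots.reverseIterator.collect { case Live(k, v) => (k, v) }
  }

  object UpdateOrderedMap {
    def empty[K, V]: UpdateOrderedMap[K, V] =
      new UpdateOrderedMap(Vector.empty, Map.empty)
  }
}
```

Updating "a", then "b", then "a" again leaves a tombstone in the first slot and yields "a" before "b" from reverseIterator.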

In addition, the iterators are cached using a poor-man's version of a LazyList. This means that whenever we create a new iterator, we cache the entries as we keep iterating through it. The implementation in this PR is equivalent to doing the following with a LazyList, but it also works on Scala 2.12 and is more performant because it targets a very specialized use case:

private val lz = LazyList.from(iterator0)
def iterator = lz.iterator
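For readers on Scala 2.12, a rough sketch of what such a hand-rolled cache could look like. CachedIterable is a hypothetical name, the sketch is not thread-safe, and it is not the code used in this PR:

```scala
import scala.collection.mutable.ArrayBuffer

object CachedIteratorSketch {
  // Wraps a one-shot source iterator so that it can be traversed many times
  // while each source element is pulled at most once.
  final class CachedIterable[A](source: Iterator[A]) {
    private val seen = ArrayBuffer.empty[A]

    def iterator: Iterator[A] = new Iterator[A] {
      private var i = 0

      def hasNext: Boolean = i < seen.length || source.hasNext

      def next(): A = {
        // Pull from the source only once this traversal has gone past
        // everything that earlier traversals already cached.
        if (i >= seen.length) seen += source.next()
        val a = seen(i)
        i += 1
        a
      }
    }
  }
}
```

A second traversal replays the cached prefix and only touches the source for elements that no earlier traversal reached.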

Using an update-ordered map also allows us to delete a lot of code that was previously there to keep track of the order that entries were updated!

Other changes

  • Added ZEnvironment#unsafe.addScope method to bypass type-checking wherever possible
  • Improved performance of prune and unionAll
  • Improved performance of ZEnvironment.Patch.diff (important when joining fibers!!)

Benchmark results

One of the most difficult things I had to do in this PR was come up with sensible benchmarks. Please let me know if you can think of any case that wasn't included. I used a ZEnvironment with 50 entries in the benchmark. For full transparency, very small ZEnvironments don't get much benefit out of these changes since the cost of checking the entire map was quite small.

The good:

  • get on cached entries is roughly the same
  • get on uncached entries is significantly faster
  • get[Scope] after ZIO.scoped is significantly faster
  • get[A] after ZIO.scoped is significantly faster
  • prune is significantly faster
  • unionAll is significantly faster

The bad:

  • add is slower

Full results

series/2.x

[info] Benchmark                                         Mode  Cnt        Score        Error  Units
[info] ZEnvironmentBenchmark.access                     thrpt    6  9158946.335 ±  34393.354  ops/s
[info] ZEnvironmentBenchmark.accessAfterScoped          thrpt    6   493354.160 ±  64372.574  ops/s
[info] ZEnvironmentBenchmark.accessAfterScopedUncached  thrpt    6   547822.610 ± 126364.567  ops/s
[info] ZEnvironmentBenchmark.accessScope                thrpt    6   514460.772 ±   5835.740  ops/s
[info] ZEnvironmentBenchmark.add                        thrpt    6  3148615.620 ±  28126.447  ops/s
[info] ZEnvironmentBenchmark.addGetMulti                thrpt    6    56656.684 ±   4983.456  ops/s
[info] ZEnvironmentBenchmark.addGetOne                  thrpt    6   907429.244 ±  92090.403  ops/s
[info] ZEnvironmentBenchmark.addGetRepeat               thrpt    6   639540.635 ±  15187.636  ops/s
[info] ZEnvironmentBenchmark.addGetRepeatBaseline       thrpt    6   633620.575 ±   3673.705  ops/s
[info] ZEnvironmentBenchmark.prune                      thrpt    6    79619.724 ±   1407.312  ops/s
[info] ZEnvironmentBenchmark.union                      thrpt    6   998003.551 ±  43563.744  ops/s

PR:

[info] Benchmark                                         Mode  Cnt        Score        Error  Units
[info] ZEnvironmentBenchmark.access                     thrpt    6  9350561.124 ±  74526.515  ops/s
[info] ZEnvironmentBenchmark.accessAfterScoped          thrpt    6  1959546.633 ±  26073.619  ops/s
[info] ZEnvironmentBenchmark.accessAfterScopedUncached  thrpt    6   832135.422 ±  31198.798  ops/s
[info] ZEnvironmentBenchmark.accessScope                thrpt    6  1904231.684 ±  19471.317  ops/s
[info] ZEnvironmentBenchmark.add                        thrpt    6  1828685.550 ±  12715.251  ops/s
[info] ZEnvironmentBenchmark.addGetMulti                thrpt    6   476091.751 ±  21265.381  ops/s
[info] ZEnvironmentBenchmark.addGetOne                  thrpt    6  8632006.222 ± 333867.899  ops/s
[info] ZEnvironmentBenchmark.addGetRepeat               thrpt    6   459445.983 ±  15382.794  ops/s
[info] ZEnvironmentBenchmark.addGetRepeatBaseline       thrpt    6   560367.659 ±  18064.408  ops/s
[info] ZEnvironmentBenchmark.prune                      thrpt    6   238747.880 ±   5288.523  ops/s
[info] ZEnvironmentBenchmark.union                      thrpt    6  2658024.830 ±  53314.151  ops/s
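
To make the comparison easier to read, the throughput ratios can be derived directly from the two tables. The snippet below only restates the scores listed above (error margins are ignored) and BenchmarkRatios is a name introduced here for illustration:

```scala
object BenchmarkRatios {
  // (benchmark, series/2.x ops/s, PR ops/s), copied from the tables above.
  val scores: List[(String, Double, Double)] = List(
    ("accessAfterScoped", 493354.160, 1959546.633),
    ("accessScope", 514460.772, 1904231.684),
    ("add", 3148615.620, 1828685.550),
    ("addGetMulti", 56656.684, 476091.751),
    ("addGetOne", 907429.244, 8632006.222),
    ("addGetRepeat", 639540.635, 459445.983),
    ("prune", 79619.724, 238747.880),
    ("union", 998003.551, 2658024.830)
  )

  // A ratio above 1 means the PR is faster; below 1 means it is slower.
  val ratios: Map[String, Double] =
    scores.map { case (name, before, after) => name -> after / before }.toMap
}
```

Printing each ratio with, for example, `BenchmarkRatios.ratios.foreach { case (n, r) => println(f"$n%-20s $r%.2fx") }` shows which benchmarks improved and which regressed.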


def isEmpty: Boolean = size == 0

def updated[V1 >: V](key: K, value: V1): UpdateOrderLinkedMap[K, V1] = {
Contributor Author

TODO: Need to add a hard limit on how big the underlying Vector can get before rebuilding it from scratch. It's very unlikely that we'll ever have a case that keeps adding services forever, but it's not impossible.
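
One possible shape for such a rebuild, sketched under assumptions: None stands in for a tombstone, and needsCompaction, compact, and maxWasteFactor are hypothetical names and a hypothetical policy, not something this PR defines:

```scala
object CompactionSketch {
  // Rebuild once the slot vector is more than `maxWasteFactor` times larger
  // than the number of live entries (the threshold is an assumption).
  def needsCompaction(slotCount: Int, liveCount: Int, maxWasteFactor: Int): Boolean =
    slotCount > liveCount * maxWasteFactor

  // Drops tombstones while preserving update order, and recomputes the
  // key-to-position index for the compacted vector.
  def compact[K, V](slots: Vector[Option[(K, V)]]): (Vector[Option[(K, V)]], Map[K, Int]) = {
    val live = slots.filter(_.isDefined)
    val positions = live.iterator.zipWithIndex.collect {
      case (Some((k, _)), i) => k -> i
    }.toMap
    (live, positions)
  }
}
```

The check would run inside updated, so the cost of a rebuild is paid only occasionally rather than on every write.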

@jdegoes
Member

jdegoes commented May 12, 2024

@kyri-petrou Thanks for your work on this! The fast-path for Scope is clearly working.

I am concerned that there are a number of benchmarks that seem to get slower after the PR. Is this still a work-in-progress or are you looking for more final feedback here?

@kyri-petrou
Contributor Author

@jdegoes I'm looking for some final feedback. The work that's left is adding tests and getting MiMa happy.

I am concerned that there are a number of benchmarks that seem to get slower after the PR

There are only 2 benchmarks that become slower after this PR: one that only benchmarks add (the old implementation was ~2x faster), and another that has a 10:1 add-to-get ratio, where the old implementation was only 20% faster.

On the other hand, for a 1:10 add-to-get ratio (which I think is a more representative usage pattern), we get a ~10x improvement with the new implementation.

Previously, the add method was extremely simple (just an updated on the immutable map), which meant that a lot of the complexity was delegated to get. Personally, I think it's OK to sacrifice some write performance to get a much bigger improvement in read performance, but at the same time I do understand your reservations. If you think the write performance loss is too much, I can try optimizing it further or look into alternative implementations.

@kyri-petrou
Contributor Author

@jdegoes in case you've already started reviewing the code: I just realised I hadn't pushed my latest commit. I've now pushed it, with some minor fixes to UpdateOrderLinkedMap.Builder.

@kyri-petrou
Contributor Author

@jdegoes marking this as ready for review as I've added the tests that I wanted to.

There's still 1 MiMa failure, complaining about a missing synthetic method. This synthetic method was generated for the parameter that had a default value. However, given that the constructor of ZEnvironment is private, I think it's relatively safe to add that as a MiMa exclusion?

Let me know what you think

@jdegoes
Member

jdegoes commented May 15, 2024

This synthetic method was due to the parameter that had a default value. However, given that the constructor of ZEnvironment is private, I think it's relatively safe to add that as a MiMa exclusion?

@kyri-petrou Yes, for sure!

jdegoes
jdegoes previously approved these changes May 15, 2024
@kyri-petrou kyri-petrou marked this pull request as ready for review May 15, 2024 19:37
@kyri-petrou
Contributor Author

@jdegoes done 👍 I also deleted one of the tests as it didn't work on JS/Native and I could just see it end up being flaky

@jdegoes jdegoes merged commit ae497b8 into zio:series/2.x May 16, 2024
20 of 21 checks passed
@kyri-petrou kyri-petrou deleted the improve-environment-performance branch May 17, 2024 09:02
Development

Successfully merging this pull request may close these issues.

Optimize ZEnvironment
2 participants