Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split Commons Compress into multiple artifacts #490

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

ppkarwasz
Copy link

This is proof-of-concept of the possibility of splitting Commons Compress into artifacts without optional dependencies, while maintaining backward compatibility.

The Commons Compress code is split into:

  • commons-compress-brotli: an artifact that includes Brotli support and with a non-optional dependency on org.brotli:dec,
  • commons-compress-core: the old Commons Compress with one entire package and the dependency on org.brotli:dec removed,
  • common-compress: an almost empty artifact, whose purpose is to provide backward compatibility. It depends on the other artifacts and contains a module descriptor that allows JPMS users that declares requires org.apache.commons.compress; to also read the other modules:
    module org.apache.commons.compress {
      requires transitive org.apache.commons.compress.brotli;
      requires transitive org.apache.commons.compress.core;
    }
  • commons-compress-bom: a BOM artifact to help users prevent having multiple copies of the same classes on the classpath (or module path). Without using a BOM for dependency management, users could end up having commons-compress-core version 1.27.0 and an older commons-compress as transitive dependency. This would lead to a duplication of classes on the classpath.
  • commons-compress-parent: a technical POM to manage dependency versions and plugin configurations in a single place. We might use Maven Flatten Plugin to remove this artifact from the published ones.

The PoC is fully functional, although the POMs might need some small adjustments.

@ebourg
Copy link
Member

ebourg commented Feb 29, 2024

I like this idea. If we follow this path I'd push further and fully modularize commons-compress, with a separate module for each compression and archive format. That may be easier to split the component in two groups with new names, for example commons-archive-* and commons-compression-*.

@ppkarwasz ppkarwasz marked this pull request as ready for review February 29, 2024 10:19
<module>compress</module>
<module>compress-brotli</module>
<module>compress-core</module>
<module>compress-parent</module>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we simply have this pom as parent which looks more obvious in the tree :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean moving compress-parent/pom.xml here and this file to compress-bom/pom.xml, without changing the inheritance tree (commons-compress-bom -> commons-compress-parent -> commons-compress-core? It should probably work.

In Log4j we structure our projects this way, which probably has to do with the fact that we use ${revision} as version number and the definition of the revision property is in the root pom.xml. @vy, are the other reasons to put the BOM at the root of the project?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean why not having a structure as follows?

  1. /commons-bom/pom.xml (commons-bom module)
  2. /pom.xml (commons-parent module parented by commons-bom)
  3. /commons-core/pom.xml (commons-core moduleparented bycommons-parent`)

This setup has the following shortcoming:

Imagine you have a, say, maven-enforcer-plugin configuration that applies to all pom.xmls, including /commons-bom/pom.xml. If you run ./mvnw enforcer:enforce at the project root directory, it will only run on commons-parent and modules defined in /pom.xml. That is, it won't run for /commons-bom/pom.xml. You can add more examples: apache-rat-plugin won't check the license of /commons-bom/pom.xml, etc.

@ppkarwasz
Copy link
Author

[To give some context to this PR, it was motivated by this discussion on dev@commons.]

@ebourg,

Yes, it would be nice to split Commons Compress into a small API and many modules. Right now we use commons-compress as an optional dependency in log4j-core. It is used by a big rolling file appender component (10%, i.e. about 140 KiB of compressed classes). I plan to remove the optional dependency and move its code to a separate artifact. However if there was a commons-compress-archiver-api artifact that weights 10-20 KiB, I would rather link it directly from log4j-core.

Copy link
Contributor

@elharo elharo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is just a PoC, it would help to convert it to a draft PR.

As written, this does not work and cannot be shipped. The fundamental problem is that it moves classes from the commons-compress artifact to the commons-compress-core artifact without changing the Java package or class names.

This breaks existing code, even of projects that do not depend directly on commons-compress, and will cause a lot of needless pain in the Java community.

Adding a BOM is a workaround for a problem that we shouldn't cause users in the first place.

@ppkarwasz ppkarwasz marked this pull request as draft March 1, 2024 13:10
@garydgregory
Copy link
Member

garydgregory commented Mar 1, 2024

From my POV, if we do anything, and as a first cut, I would follow the route we are taking in Commons VFS: Move to new modules, format-specific code that is already in their own package that depend on a currently optional library, which is: brotli, zstd, and xz. There is no breakage of BC, just new artifacts. If you don't want the new artifacts, stick to the old one. Simple.

For the implementation, the root POM needs to no longer generate a jar obviously, so the current commons-compress artifact can move to a module dir and the root can be renamed commons-compress-parent. Having a BOM POM is nice and easy to add.

Alternatively, the root POM can keep the same name, and the current code can move to a commons-compress-core module. I think I like the first option better.

@vy
Copy link
Member

vy commented Mar 1, 2024

@elharo, I am confused with your remarks given @ppkarwasz stated that it works and it is backward compatible.

I see your objections in the mailing list due to JPMS issues:

A classpath MUST NOT have the same fully package qualified name in
more than one JAR. This is a hard requirement in Java 9+ with JPMS.
The JVM will not run a program that violates this. It is a very
important requirement in Java 8.

But as @ppkarwasz responded

JPMS does not really care which module has which package. As long as
they are disjoint it will be fine

AFAIK about JPMS, Piotr is right.

As written, this does not work and cannot be shipped. The fundamental problem is that it moves classes from the commons-compress artifact to the commons-compress-core artifact without changing the Java package or class names.

I am not able to see the problem here. Could you give a concrete example where this change will break the application?

This breaks existing code, even of projects that do not depend directly on commons-compress, and will cause a lot of needless pain in the Java community.

Could you elaborate on your reasoning a bit, please? It is really difficult to come up with a fix when you only say "this breaks existing code".

@elharo
Copy link
Contributor

elharo commented Mar 1, 2024

I believe @ppkarwasz is simply wrong. This change is not backwards compatible, and will break existing projects. It works in the very simple case where a project only has a direct dependency on commons-compress and has no transitive dependencies on commons-compress. However, in then much more common case where the dependency on commons-compress lives deep in the transitive dependencies this does not work. It's another instance of diamond dependencies and split packages.

The basic problem is that a project can have the old commons-compress deep in the transitive dependency tree.

One then updates some other library to a new version that now has common-compress-core deep in its transitive dependency tree.

The classpath now contains both commons-compress and commons-compress-core, even though the developer of the project might never have heard of commons-compress.

The project breaks in more or less clear ways, depending on Java version and the dependency graph, because a class is found in both commons-compress and commons-compress-core.

See https://jlbp.dev/JLBP-5 and https://jlbp.dev/JLBP-6

@vy
Copy link
Member

vy commented Mar 4, 2024

The basic problem is that a project can have the old commons-compress deep in the transitive dependency tree.

One then updates some other library to a new version that now has common-compress-core deep in its transitive dependency tree.

The classpath now contains both commons-compress and commons-compress-core, even though the developer of the project might never have heard of commons-compress.

This is a crystal clear argument @elharo, thanks so much for the elaborate explanation. @ppkarwasz indeed needs to address this.

@ppkarwasz
Copy link
Author

The basic problem is that a project can have the old commons-compress deep in the transitive dependency tree.

One then updates some other library to a new version that now has common-compress-core deep in its transitive dependency tree.

@elharo, I see your point.

I might modify the PR, so that most of the code remains in commons-compress and only the Brotli specific package is moved to commons-compress-brotli (and the package is renamed).

However I am not sure if the pro/con balance of such a solution would be positive:

  • it gets rid of one optional dependency (for me it is a +1),
  • this will still break users code that directly use the Brotli input stream (-1). This code can very well be in a transitive dependency the user knows nothing about (another -1).

So it appears that the only way to make breaking changes in Java is to break everything by repackaging all the code. It is a pity, because it means that any attempt to make smaller artifacts practically has the opposite effect for users and requires Jakarta-like code changes.

@garydgregory
Copy link
Member

The problem of deep dependencies can be solved if Compress provides a BOM POM and my app depends on this BOM POM (commons-compress-bom). The refactored Compress can contain an empty commons-compress artifact/jar. Or, the refactored Compress keeps core code in commons-compress and introduces a commons-compress-parent POM (this is not the BOM POM).
This is the path VFS is on.

@ppkarwasz
Copy link
Author

The problem of deep dependencies can be solved if Compress provides a BOM POM and my app depends on this BOM POM (commons-compress-bom).

I think the point that @elharo tried to make is that many developers will not notice that commons-compress was upgraded and that they will not know that they need to upgrade their dependency management:

The classpath now contains both commons-compress and commons-compress-core, even though the developer of the project might never have heard of commons-compress.

I am sure that Spring Boot will rapidly replace commons-compress with commons-compress-bom in their dependency management, but many users will not know about this requirement and will consider it an issue that Commons Compress needs to solve.

Anyway: if you know about an Apache-wide BOM that features coherent library versions, my only question is: where do I sign up?

@elharo
Copy link
Contributor

elharo commented Mar 4, 2024

Yes, a good BOM is a workaround; but it's still only a workaround. and one that operates post facto. That is, first the developer sees their project break. Then they spend an hour to a day or more debugging it. How long it takes depends on how familiar they are with Maven dependency trees and whether they immediately recognize the symptoms. If you've encountered it before, you curse and say, "oh yeah, that again" and then start inspecting with mvn dependency:tree and google searches until you figure out exactly which exclusion or BOM you need to add where. If you haven't debugged one of these problems before, though, it's going to take a lot longer to figure it out.

The kindest thing we can do for developers is to change package names and Maven coordinates at the same time, or not at all.

As to an Apache-wide BOM that features coherent library versions, it's not a bad idea. A cross-functional team I was part of spent a couple of years creating such a thing for the Google Cloud Platform. The amount of cat herding involved was significant. The majority of the effort was not in writing the BOM, but simply synchronizing the dozens of projects so that a consistent, compatible set of library versions existed and could be released at the same time. Some of the tooling we built for that effort could be reused here. However, that was within one company and mostly within a single product area. Apache is far more diverse and much less well funded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants