Deprecate COSINE VectorSimilarity function #13308

Closed
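The PR description did not load on this page, so the rationale for the deprecation is not shown here. One relevant fact is that Lucene's `VectorSimilarityFunction` also provides `DOT_PRODUCT`, which expects unit-length vectors, and for unit-length vectors the dot product equals the cosine similarity. The sketch below illustrates that equivalence in plain Java. The class and method names are illustrative assumptions and are not Lucene APIs.

```java
public class CosineViaDotProduct {

  /** Returns a unit-length copy of v. Rejects the zero vector, which has no direction. */
  static float[] normalize(float[] v) {
    double sumOfSquares = 0.0;
    for (float x : v) {
      sumOfSquares += (double) x * x;
    }
    if (sumOfSquares == 0.0) {
      throw new IllegalArgumentException("cannot normalize a zero-length vector");
    }
    double invNorm = 1.0 / Math.sqrt(sumOfSquares);
    float[] unit = new float[v.length];
    for (int i = 0; i < v.length; i++) {
      unit[i] = (float) (v[i] * invNorm);
    }
    return unit;
  }

  /** Plain dot product; both vectors must have the same dimension. */
  static float dotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      sum += (double) a[i] * b[i];
    }
    return (float) sum;
  }

  /** Cosine similarity computed directly from the raw (unnormalized) vectors. */
  static float cosine(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += (double) a[i] * b[i];
      normA += (double) a[i] * a[i];
      normB += (double) b[i] * b[i];
    }
    return (float) (dot / Math.sqrt(normA * normB));
  }

  public static void main(String[] args) {
    float[] a = {3f, 4f};
    float[] b = {4f, 3f};
    // Both lines should print the same value up to floating-point rounding.
    System.out.println("cosine on raw vectors   = " + cosine(a, b));
    System.out.println("dot on unit-length ones = " + dotProduct(normalize(a), normalize(b)));
  }
}
```

Normalizing once at index time and once per query moves the norm computation out of every pairwise comparison, which is the usual argument for preferring a dot product over cosine.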
45 commits
589d1e2
Mark COSINE Vector Similarity Function as Deprecated
Pulkitg64 May 6, 2024
3f23885
tidy fixes
Pulkitg64 May 6, 2024
0c0c35b
Added entry to CHANGES.txt
Pulkitg64 May 8, 2024
821f775
Mark VectorUtil cosine function as deprecated
Pulkitg64 May 8, 2024
6ca2e43
Move change from 10.0 to 9.11.0
Pulkitg64 May 13, 2024
a644bea
Merge branch 'main' into cosine-deprecate
Pulkitg64 May 28, 2024
54d3ff6
hunspell: speed up "compress"; minimize the number of the generated e…
donnerpeter May 28, 2024
ea0646d
Fixes failing test case for TestOrdinalMap.testRamBytesUsed (#13421)
pseudo-nymous May 28, 2024
b3dc915
Allow users to retrieve counts from taxo association facets (#13414)
stefanvodita May 29, 2024
9a3dbd5
Add next minor version 9.12.0
benwtrent May 29, 2024
750a7c4
Fix TestIndexWriterOnError.testIOError failure. (#13436)
jpountz May 29, 2024
f3c2b91
SimpleText[Float|Byte]VectorValues::scorer should return null when th…
ChrisHegarty May 31, 2024
a540027
Add new dynamic confidence interval configuration to scalar quantized…
benwtrent Jun 1, 2024
a6f920d
Fix test failure on TestPoint#testEqualsAndHashCode (#13433)
easyice Jun 3, 2024
edd7747
Add prefetching support to stored fields. (#13424)
jpountz Jun 3, 2024
c132e95
mention KnnVectorsFormat in o.a.l.codecs package javadocs (#13448)
msokolov Jun 3, 2024
801b822
Avoid unnecessary memory allocation in PackedLongValues#Iterator (#13…
easyice Jun 4, 2024
e868b82
Rewrite newSlowRangeQuery to MatchNoDocsQuery when upper > lower (#13…
ioanatia Jun 4, 2024
846aa2f
Use `ReadAdvice#NORMAL` on files that have a forward-only access patt…
jpountz Jun 5, 2024
05b4639
Add prefetching for doc values and norms. (#13411)
jpountz Jun 5, 2024
fe50e86
Implement Weight#count for vector values in the FieldExistsQuery (#13…
bugmakerrrrrr Jun 5, 2024
9a4caa9
Update Gradle wrapper to 8.8 - supports Java 22 (#13453)
ChrisHegarty Jun 6, 2024
868897e
Update WrapperDownloader to accept java 22 and correct deprecated new…
dweiss Jun 6, 2024
14782a2
Add a github workflow that checks common (and less common) gradle tas…
dweiss Jun 6, 2024
b85c99d
Java 22 has been released, so drop -ea from smoketester gh workflow m…
dweiss Jun 6, 2024
d5aa88b
Add test for ghost fields to BaseKnnVectorQueryTestCase. (#13455)
jpountz Jun 6, 2024
d0d2aa2
Removed Scorer#getWeight (#13440)
iamsanjay Jun 6, 2024
61a6abd
DOAP changes for release 9.11.0
benwtrent Jun 6, 2024
58ab5b7
Merge related HashMaps in FieldInfos#FieldNumbers into one map (#13460)
iverase Jun 6, 2024
51d8d72
Sync CHANGES for 9.11.0
benwtrent Jun 6, 2024
39a7eab
Add back-compat indices for 9.11.0
benwtrent Jun 6, 2024
512ff4a
MultiTermQuery return null for ScoreSupplier (#13454)
mayya-sharipova Jun 6, 2024
9f8e886
Move entry in CHANGES.txt
iverase Jun 7, 2024
c7a7d48
Reduce the heap use of BKDReader instances (#13464)
original-brownbear Jun 7, 2024
a5b4b8c
Document how to make tests run faster in IntelliJ (#13466)
msokolov Jun 7, 2024
2d62faa
Add int8_hnsw backcompat index creawtion to dev tools scripts (#13465)
benwtrent Jun 7, 2024
262341b
on README.md, make links to CONTRIBUTING.md more prominent, and demot…
Jun 7, 2024
71a9aed
fix fumble-finger
Jun 7, 2024
0699117
clarify that IntelliJ UI varies across platforms
Jun 7, 2024
00a8704
Mark COSINE Vector Similarity Function as Deprecated
Pulkitg64 May 6, 2024
be3527a
tidy fixes
Pulkitg64 May 6, 2024
0e91af0
Move changes from 9.11.0 to 9.12.0
Pulkitg64 Jun 9, 2024
1163749
Mark VectorUtil cosine function as deprecated
Pulkitg64 May 8, 2024
ec6a037
Move change from 10.0 to 9.11.0
Pulkitg64 May 13, 2024
2a52ebf
Merge remote-tracking branch 'origin/cosine-deprecate' into cosine-de…
Pulkitg64 Jun 9, 2024
88 changes: 88 additions & 0 deletions .github/workflows/run-checks-gradle-upgrade.yml
@@ -0,0 +1,88 @@
name: "Run checks: gradle upgrade"

on:
workflow_dispatch:

pull_request:
branches:
- 'main'
- 'branch_9x'
paths:
- '.github/workflows/run-checks-gradle-upgrade.yml'
- 'gradle/wrapper/**'

push:
branches:
- 'main'
- 'branch_9x'
paths:
- '.github/workflows/run-checks-gradle-upgrade.yml'
- 'gradle/wrapper/**'

env:
GRADLE_ENTERPRISE_ACCESS_KEY: ${{ secrets.GE_ACCESS_TOKEN }}

jobs:
gradleSanityCheck:
name: "Run tasks (java: ${{ matrix.java-version }}, alt-java: ${{ matrix.uses-alt-java }})"
timeout-minutes: 30

strategy:
matrix:
os: [ ubuntu-latest ]
java-version: [ '22' ]
uses-alt-java: [ true, false ]

runs-on: ${{ matrix.os }}

env:
ALT_JAVA_DIR: /tmp/alt-java

steps:
- uses: actions/checkout@v4
- uses: ./.github/actions/prepare-for-build
with:
java-version: ${{ matrix.java-version }}

- name: Set up RUNTIME_JAVA_HOME variable
if: ${{ matrix.uses-alt-java }}
run: |
echo "All installed JDKs:"
set | grep "JAVA"

echo "Gradle's 'RUNTIME_JAVA_HOME' JDK:"
RUNTIME_JAVA_HOME_VAR=JAVA_HOME_`echo ${{ matrix.java-version }} | egrep --only "[0-9]+"`_X64
echo ${RUNTIME_JAVA_HOME_VAR} points at ${!RUNTIME_JAVA_HOME_VAR}

# Copy the JDK from its default location to /tmp so that it appears different to gradle.
rsync -av ${!RUNTIME_JAVA_HOME_VAR}/ ${{ env.ALT_JAVA_DIR }}/

# This sets the environment variable and makes it available for subsequent job steps.
# https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-environment-variable
echo "RUNTIME_JAVA_HOME=${{ env.ALT_JAVA_DIR }}" >> "$GITHUB_ENV"

- run: ./gradlew -p lucene/core check -x test

- name: ./gradlew regenerate
run: |
# add this package for generateEmojiTokenizationTestChecksumLoad.
sudo apt-get install libwww-perl
./gradlew regenerate -x generateUAX29URLEmailTokenizerInternal --rerun-tasks
if [ ! -z "$(git status --porcelain)" ]; then
echo ":warning: **regenerate left local checkout in modified state**" >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY
git status --porcelain >> $GITHUB_STEP_SUMMARY
echo '```' >> $GITHUB_STEP_SUMMARY
git reset --hard && git clean -xfd .
fi

- run: ./gradlew testOpts
- run: ./gradlew helpWorkflow
- run: ./gradlew licenses updateLicenses
- run: ./gradlew tidy
- run: ./gradlew check -x test
- run: ./gradlew assembleRelease mavenToLocal

# Conserve resources: only run these in non-alt-java mode.
- run: ./gradlew getGeoNames
if: ${{ !matrix.uses-alt-java }}
2 changes: 1 addition & 1 deletion .github/workflows/run-nightly-smoketester.yml
@@ -18,7 +18,7 @@ jobs:
strategy:
matrix:
os: [ ubuntu-latest ]
java-version: [ '21', '22-ea' ]
java-version: [ '21', '22' ]

runs-on: ${{ matrix.os }}

2 changes: 1 addition & 1 deletion CONTRIBUTING.md
@@ -58,7 +58,7 @@ In case your contribution fixes a bug, please create a new test case that fails

### IDE support

- *IntelliJ* - IntelliJ idea can import and build gradle-based projects out of the box.
- *IntelliJ* - IntelliJ IDEA can import and build gradle-based projects out of the box. However, please note that it defaults to running tests by calling the gradle wrapper; this works, but it is quite slow. Instead, we recommend configuring IntelliJ to use its own built-in test runner. You can change this setting (in the 2024 version) by navigating to Build, Execution, Deployment/Build Tools/Gradle in the settings (under the File/Settings menu on some platforms) and selecting "Build and Run using: IntelliJ IDEA" and "Run Tests using: IntelliJ IDEA".
- *Eclipse* - Basic support ([help/IDEs.txt](https://github.com/apache/lucene/blob/main/help/IDEs.txt#L7)).
- *Netbeans* - Not tested.

6 changes: 3 additions & 3 deletions README.md
@@ -31,8 +31,8 @@ comprehensive documentation, visit:

- Latest Releases: <https://lucene.apache.org/core/documentation.html>
- Nightly: <https://ci-builds.apache.org/job/Lucene/job/Lucene-Artifacts-main/javadoc/>
- New contributors should start by reading [Contributing Guide](./CONTRIBUTING.md)
- Build System Documentation: [help/](./help/)
- Developer Documentation: [dev-docs/](./dev-docs/)
- Migration Guide: [lucene/MIGRATE.md](./lucene/MIGRATE.md)

## Building
@@ -45,15 +45,15 @@ comprehensive documentation, visit:

We'll assume that you know how to get and set up the JDK - if you don't, then we suggest starting at https://jdk.java.net/ and learning more about Java, before returning to this README.

See [Contributing Guide](./CONTRIBUTING.md) for details.

## Contributing

Bug fixes, improvements and new features are always welcome!
Please review the [Contributing to Lucene
Guide](./CONTRIBUTING.md) for information on
contributing.

- Additional Developer Documentation: [dev-docs/](./dev-docs/)

## Discussion and Support

- [Users Mailing List](https://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg)
@@ -21,6 +21,7 @@
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
@@ -55,8 +56,8 @@ public static void main(String[] args) {

public static void checkVersion() {
int major = Runtime.version().feature();
if (major != 21) {
throw new IllegalStateException("java version must be 21, your version: " + major);
if (major != 21 && major != 22) {
throw new IllegalStateException("java version must be 21 or 22, your version: " + major);
}
}

@@ -86,7 +87,7 @@ public void run(Path destination) throws IOException, NoSuchAlgorithmException {
}
}

URL url = new URL("https://raw.githubusercontent.com/gradle/gradle/v" + wrapperVersion + "/gradle/wrapper/gradle-wrapper.jar");
URL url = URI.create("https://raw.githubusercontent.com/gradle/gradle/v" + wrapperVersion + "/gradle/wrapper/gradle-wrapper.jar").toURL();
System.err.println("Downloading gradle-wrapper.jar from " + url);

// Zero-copy save the jar to a temp file
7 changes: 7 additions & 0 deletions dev-tools/doap/lucene.rdf
@@ -67,6 +67,13 @@
</maintainer>

<!-- NOTE: please insert releases in numeric order, NOT chronologically. -->
<release>
<Version>
<name>lucene-9.11.0</name>
<created>2024-06-06</created>
<revision>9.11.0</revision>
</Version>
</release>
<release>
<Version>
<name>lucene-9.10.0</name>
3 changes: 3 additions & 0 deletions dev-tools/scripts/addBackcompatIndexes.py
@@ -40,6 +40,7 @@ def create_and_add_index(source, indextype, index_version, current_version, temp
'cfs': 'index',
'nocfs': 'index',
'sorted': 'sorted',
'int8_hnsw': 'int8_hnsw',
'moreterms': 'moreterms',
'dvupdates': 'dvupdates',
'emptyIndex': 'empty'
@@ -60,6 +61,7 @@ def create_and_add_index(source, indextype, index_version, current_version, temp
'cfs': 'testCreateCFS',
'nocfs': 'testCreateNoCFS',
'sorted': 'testCreateSortedIndex',
'int8_hnsw': 'testCreateInt8HNSWIndices',
'moreterms': 'testCreateMoreTermsIndex',
'dvupdates': 'testCreateIndexWithDocValuesUpdates',
'emptyIndex': 'testCreateEmptyIndex'
@@ -204,6 +206,7 @@ def main():
current_version = scriptutil.Version.parse(scriptutil.find_current_version())
create_and_add_index(source, 'cfs', c.version, current_version, c.temp_dir)
create_and_add_index(source, 'nocfs', c.version, current_version, c.temp_dir)
create_and_add_index(source, 'int8_hnsw', c.version, current_version, c.temp_dir)
should_make_sorted = current_version.is_back_compat_with(c.version) \
and (c.version.major > 6 or (c.version.major == 6 and c.version.minor >= 2))
if should_make_sorted:
4 changes: 2 additions & 2 deletions gradle/testing/alternative-jdk-support.gradle
@@ -50,7 +50,7 @@ if (jvmGradle != jvmCurrent) {
doFirst {

def jvmInfo = { JavaInfo javaInfo ->
JvmInstallationMetadata jvmMetadata = jvmDetector.getMetadata(new InstallationLocation(javaInfo.javaHome, "specific path"))
JvmInstallationMetadata jvmMetadata = jvmDetector.getMetadata(InstallationLocation.userDefined(javaInfo.javaHome, "specific path"))
return "${jvmMetadata.languageVersion} (${jvmMetadata.displayName} ${jvmMetadata.runtimeVersion}, home at: ${jvmMetadata.javaHome})"
}

@@ -88,6 +88,6 @@ if (jvmGradle != jvmCurrent) {
// Set up root project's properties.
rootProject.ext.runtimeJavaExecutable = jvmCurrent.javaExecutable
rootProject.ext.runtimeJavaHome = jvmCurrent.javaHome
rootProject.ext.runtimeJavaVersion = jvmDetector.getMetadata(new InstallationLocation(jvmCurrent.javaHome, "specific path")).getLanguageVersion()
rootProject.ext.runtimeJavaVersion = jvmDetector.getMetadata(InstallationLocation.userDefined(jvmCurrent.javaHome, "specific path")).getLanguageVersion()
rootProject.ext.usesAltJvm = (jvmGradle != jvmCurrent);

2 changes: 1 addition & 1 deletion gradle/validation/check-environment.gradle
@@ -22,7 +22,7 @@ import org.gradle.util.GradleVersion

configure(rootProject) {
ext {
expectedGradleVersion = '8.4'
expectedGradleVersion = '8.8'
hasJavaFlightRecorder = ModuleLayer.boot().findModule('jdk.jfr').map(this.class.module::canRead).orElse(false)
}

2 changes: 1 addition & 1 deletion gradle/wrapper/gradle-wrapper.jar.sha256
@@ -1 +1 @@
0336f591bc0ec9aa0c9988929b93ecc916b3c1d52aed202c7381db144aa0ef15
cb0da6751c2b753a16ac168bb354870ebb1e162e9083f116729cec9c781156b8
2 changes: 1 addition & 1 deletion gradle/wrapper/gradle-wrapper.jar.version
@@ -1 +1 @@
8.4.0
8.8.0
2 changes: 1 addition & 1 deletion gradle/wrapper/gradle-wrapper.properties
@@ -1,6 +1,6 @@
distributionBase=GRADLE_USER_HOME
distributionPath=wrapper/dists
distributionUrl=https\://services.gradle.org/distributions/gradle-8.4-bin.zip
distributionUrl=https\://services.gradle.org/distributions/gradle-8.8-bin.zip
networkTimeout=10000
validateDistributionUrl=true
zipStoreBase=GRADLE_USER_HOME
60 changes: 58 additions & 2 deletions lucene/CHANGES.txt
@@ -110,6 +110,8 @@ API Changes
I/O for top-level disjunctions. Weight#bulkScorer() still exists for compatibility, but delegates
to ScorerSupplier#bulkScorer(). (Adrien Grand)

* GITHUB#13410: Removed Scorer#getWeight (Sanjay Dutt, Adrien Grand)

New Features
---------------------

@@ -202,6 +204,10 @@ Changes in Backwards Compatibility Policy
Other
---------------------

* GITHUB#13459: Merges all immutable attributes in FieldInfos.FieldNumbers into one HashMap, saving
memory when writing big indices. Fixes an exotic bug where calling clear did not clear all
attributes. (Ignacio Vera)

* LUCENE-10376: Roll up the loop in VInt/VLong in DataInput. (Guo Feng)

* LUCENE-10253: The @BadApple annotation has been removed from the test
@@ -233,13 +239,51 @@ Other

* GITHUB#13332: Improve MissingDoclet linter to check records correctly. (Uwe Schindler)

======================== Lucene 9.12.0 =======================

API Changes
---------------------

* GITHUB#13281: Mark COSINE VectorSimilarityFunction as deprecated. (Pulkit Gupta)

New Features
---------------------
(No changes)

Improvements
---------------------
(No changes)

Optimizations
---------------------

* GITHUB#13439: Avoid unnecessary memory allocation in PackedLongValues#Iterator. (Zhang Chao)

* GITHUB#13425: Rewrite SortedNumericDocValuesRangeQuery to MatchNoDocsQuery when the upper bound is smaller than the
lower bound. (Ioana Tagirta)

* GITHUB#13322: Implement Weight#count for vector values in the FieldExistsQuery. (Pan Guixin)

* GITHUB#13454: MultiTermQuery returns null ScoreSupplier in cases where
no query terms are present in the index segment (Mayya Sharipova)

Bug Fixes
---------------------
(No changes)

Other
--------------------
(No changes)

======================== Lucene 9.11.0 =======================

API Changes
---------------------

* GITHUB#13145: Deprecate ByteBufferIndexInput as it will be removed in Lucene 10.0. (Uwe Schindler)

* GITHUB#13281: Mark COSINE VectorSimilarityFunction as deprecated. (Pulkit Gupta)

* GITHUB#13422: an explicit dependency on the HPPC library is removed in favor of an internal repackaged copy in
oal.internal.hppc. If you relied on HPPC as a transitive dependency, you'll have to add it to your project explicitly.
The HPPC classes now bundled in Lucene core are internal and will have restricted access in future releases, please do
@@ -274,6 +318,12 @@ New Features
* GITHUB#13181: Add new VectorScorer interface to vector value iterators. This allows for vector codecs to supply
simpler and more optimized vector scoring when iterating vector values directly. (Ben Trent)

* GITHUB#13414: Counts are always available in the result when using taxonomy facets. (Stefan Vodita)

* GITHUB#13445: Add new option when calculating scalar quantiles. The new option of setting `confidenceInterval` to
`0` will now dynamically determine the quantiles through a grid search over multiple quantiles calculated
by multiple intervals. (Ben Trent)

Improvements
---------------------

@@ -309,6 +359,8 @@ Improvements

* GITHUB#13276: UnifiedHighlighter: new 'passageSortComparator' option to allow sorting other than offset order. (Seunghan Jung)

* GITHUB#13429: Hunspell: speed up "compress"; minimize the number of the generated entries; don't even consider "forbidden" entries anymore (Peter Gromov)

Optimizations
---------------------

@@ -355,16 +407,18 @@ Optimizations

* GITHUB#13327: Reduce memory usage of field maps in FieldInfos and BlockTree TermsReader. (Bruno Roustant, David Smiley)

* GITHUB#13339: Add a MemorySegment Vector scorer - for scoring without copying on-heap (Chris Hegarty)

* GITHUB#13368: Replace Map<Integer, Object> by primitive IntObjectHashMap. (Bruno Roustant)

* GITHUB#13392: Replace Map<Long, Object> by primitive LongObjectHashMap. (Bruno Roustant)

* GITHUB#13339: Add a MemorySegment Vector scorer - for scoring without copying on-heap (Chris Hegarty)

* GITHUB#13400: Replace Set<Integer> by IntHashSet and Set<Long> by LongHashSet. (Bruno Roustant)

* GITHUB#13406: Replace List<Integer> by IntArrayList and List<Long> by LongArrayList. (Bruno Roustant)

* GITHUB#13420: Replace Map<Character> by CharObjectHashMap and Set<Character> by CharHashSet. (Bruno Roustant)

Bug Fixes
---------------------

@@ -400,6 +454,8 @@ Bug Fixes

* GITHUB#13376: Fix integer overflow exception in postings encoding as group-varint. (Zhang Chao, Guo Feng)

* GITHUB#13421: Fixes TestOrdinalMap.testRamBytesUsed for multiple default PackedInts.NullReader instances. (Amir Raza)

Build
---------------------

14 changes: 14 additions & 0 deletions lucene/MIGRATE.md
@@ -779,3 +779,17 @@ to manage the indexed data on their own and create new `Facet` implementations t
The `Weight#scorerSupplier` method is now declared abstract, compelling child classes to implement the ScorerSupplier
interface. Additionally, `Weight#scorer` is now declared final, with its implementation being delegated to
`Weight#scorerSupplier` for the scorer.

### Reference to `weight` is removed from Scorer (GITHUB#13410)

The `weight` reference has been removed from the `Scorer` class. Consequently, the constructor `Scorer(Weight)` and the
getter `Scorer#getWeight` have also been removed. References to `weight` have also been removed from nearly all
subclasses of `Scorer`, including `ConstantScoreScorer`, `TermScorer`, and others.

Additionally, several APIs have been modified to remove the weight reference, as it is no longer necessary.
Specifically, the method `FunctionValues#getScorer(Weight weight, LeafReaderContext readerContext)` has been updated to
`FunctionValues#getScorer(LeafReaderContext readerContext)`.

Callers must now keep track of the Weight instance that created the Scorer if they need it, instead of relying on
Scorer.
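
The caller-side bookkeeping described above can be sketched as follows. This is a minimal illustration, not Lucene code: `Weight` and `Scorer` here are hypothetical stand-ins defined only so the example compiles on its own, and `ScorerAndWeight` and `createScorer` are assumed helper names.

```java
public class TrackWeightExample {

  // Hypothetical stand-in for Lucene's Weight; defined here only to keep the sketch self-contained.
  static final class Weight {
    final String query;

    Weight(String query) {
      this.query = query;
    }
  }

  // Hypothetical stand-in for a Scorer after the change: no Weight argument, no getWeight().
  static final class Scorer {
    float score() {
      return 1.0f;
    }
  }

  // Assumed helper: the caller keeps the Weight next to the Scorer it produced.
  static final class ScorerAndWeight {
    final Scorer scorer;
    final Weight weight;

    ScorerAndWeight(Scorer scorer, Weight weight) {
      this.scorer = scorer;
      this.weight = weight;
    }
  }

  // Creates a scorer and remembers which weight it came from.
  static ScorerAndWeight createScorer(Weight weight) {
    return new ScorerAndWeight(new Scorer(), weight);
  }

  public static void main(String[] args) {
    Weight weight = new Weight("title:lucene");
    ScorerAndWeight pair = createScorer(weight);
    // The Weight is read from the caller's own pairing, not from the Scorer.
    System.out.println(pair.weight.query + " -> " + pair.scorer.score());
  }
}
```

A small holder like this is one option; code that already has the `Weight` in scope where the `Scorer` is used may not need any extra structure.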