Releases: jina-ai/finetuner

v0.7.0

18 Jan 09:02
6478f78

Release Note Finetuner 0.7.0

This release covers Finetuner version 0.7.0, including dependencies finetuner-api 0.4.10 and finetuner-core 0.12.3.

This release contains 4 new features, 4 refactorings, 4 bug fixes, and 4 documentation improvements.

🆕 Features

Allow Fine-Tuning for PointNet++ Models (#638)

We have added a new embedding model based on the PointNet++ model. You can use this model for 3D-mesh applications. To fine-tune it, set the model parameter of the fit function to pointnet++:

import finetuner

finetuner.login()

run = finetuner.fit(
    model='pointnet++',
    train_data='finetuner/modelnet40-train',
    epochs=10,
    batch_size=64,
    learning_rate=5e-4,
    loss='TripletMarginLoss',
    device='cuda',
)

We have also prepared a tutorial with detailed information about how to use the model and prepare 3D mesh data for Finetuner.

Add val_split to fit interface (#624)

To make it easier to evaluate models, we have added the val_split parameter to the fit function. Using this parameter automatically splits the training data into a training dataset and a validation dataset, used to calculate the validation loss. For example, the following call will automatically hold out 20% of the data for validation:

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    val_split=0.2,
)

Add recall and F1 score to evaluation metrics

If you are using the evaluation callback, evaluation results now include two new metrics: recall_at_k and f1_score_at_k.
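
For example, attaching the evaluation callback as usual will surface the new metrics in the run logs. A minimal sketch (train_data, query_data, and index_data stand in for your own datasets):

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    callbacks=[
        EvaluationCallback(
            query_data=query_data,
            index_data=index_data,
        )
    ],
)
# The run logs now report recall_at_k and f1_score_at_k alongside
# precision_at_k and the other retrieval metrics.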

Evaluation callback evaluates zero-shot models and reports this in the logs

Previously, the evaluation callback only evaluated models after each fine-tuning epoch. Now, evaluation metrics are also calculated before any fine-tuning, and the logs display the metrics after each epoch together with the pre-fine-tuning (i.e. zero-shot) evaluation results:

DEBUG    Finetuning took 0 days, 0 hours 5 minutes and 39 seconds                                         __main__.py:197
INFO     Metric: 'pointnet++_precision_at_k' before fine-tuning:  0.56533 after fine-tuning: 0.81100      __main__.py:210
INFO     Metric: 'pointnet++_recall_at_k' before fine-tuning:  0.15467 after fine-tuning: 0.24175         __main__.py:210
INFO     Metric: 'pointnet++_f1_score_at_k' before fine-tuning:  0.23209 after fine-tuning: 0.34774       __main__.py:210
INFO     Metric: 'pointnet++_hit_at_k' before fine-tuning:  0.95667 after fine-tuning: 0.95333            __main__.py:210
INFO     Metric: 'pointnet++_average_precision' before fine-tuning:  0.71027 after fine-tuning: 0.85515   __main__.py:210
INFO     Metric: 'pointnet++_reciprocal_rank' before fine-tuning:  0.79103 after fine-tuning: 0.89103     __main__.py:210
INFO     Metric: 'pointnet++_dcg_at_k' before fine-tuning:  4.71826 after fine-tuning: 6.41999 

⚙ Refactoring

Drop support for Python version 3.7

Due to a new dependency in the stubs package of finetuner-core that is not supported by Python 3.7, finetuner now requires Python 3.8 or higher.

Change default experiment_name from current working dir to default (#637)

Previously, if no name was provided, Finetuner named experiments after the current working directory. Now, the generic name default is used instead. Please note that you cannot create two runs with the same name in the same experiment; we recommend always giving experiments explicit names.
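
For example, to keep runs organized and avoid name collisions in the shared default experiment, you can name both explicitly. A minimal sketch, assuming the experiment_name and run_name arguments of fit described in the documentation:

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    experiment_name='my-experiment',  # explicit name instead of the generic 'default'
    run_name='my-first-run',          # run names must be unique within an experiment
)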

Add page and size parameters to list_runs and list_experiments functions (#637)

We have added two new optional arguments to the list_runs and list_experiments functions: page and size. Use size to cap the number of runs or experiments returned per call, and increase page to retrieve further results. Together, these parameters let you paginate the list of runs and experiments, which can become quite long.
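
A minimal sketch (we assume page numbering starts at 1; see the developer reference for the exact defaults):

import finetuner

finetuner.login()

first_page = finetuner.list_runs(page=1, size=50)   # at most 50 runs
second_page = finetuner.list_runs(page=2, size=50)  # the next 50 runs
experiments = finetuner.list_experiments(page=1, size=10)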

Deprecate cpu parameter and notebook_login function (#631)

The cpu parameter of the fit function is deprecated. Instead, use the device parameter which can be set to cuda or cpu:

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
-   cpu=False,
+   device='cuda',
)

Additionally, we have removed the notebook_login() function. It existed specifically to support Jupyter notebooks, but login() now detects notebook environments and works correctly by itself. You must update any code that still calls notebook_login().

🐞 Bug Fixes

Fix build_encoding_dataset (#623)

Previously, when fine-tuning a CLIP model, passing Finetuner a plain list of strings to encode would fail, because Finetuner could not determine whether it was a list of texts or a list of URIs pointing to images or other objects; you had to state the type explicitly in a DocumentArray object. Finetuner can now detect the data type automatically and handle it correctly.

embeddings = finetuner.encode(model=clip_text_model, data=['some text to encode'])

Adjust num_items_per_class if necessary

The finetuner.fit function has a num_items_per_class parameter, which determines how many items per class are put into each batch during training. Unfortunately, not every value of this parameter is compatible with every batch_size and dataset, which could lead to errors during training. Finetuner now automatically adjusts num_items_per_class if the value you provide is not compatible with the rest of the configuration.

Finetuner will try to find a parameter value close to the one you provided. You will only receive a warning that the parameter has been adjusted, and training will continue.
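
For illustration, a configuration like the following (the values are chosen purely as an example) no longer aborts:

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    batch_size=128,
    num_items_per_class=5,  # 128 is not divisible by 5
)
# Finetuner logs a warning and adjusts num_items_per_class to a nearby
# compatible value; training then proceeds as normal.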

Set default freeze to False

Until now, the build_model function would automatically set the parameter freeze=True when constructing a CNN model, and Finetuner would also add a projection head to the model. To avoid this, freeze is now set to False by default.
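
For example ('resnet50' is used here only as an illustrative backbone name):

import finetuner

# The backbone is now built with its weights unfrozen and without a
# projection head, since freeze defaults to False.
model = finetuner.build_model('resnet50')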

Log messages in evaluation callback

Previously, some logging messages from the evaluation callback were overwritten by progress bars in some cases. This should no longer occur.

📗 Documentation Improvements

Re-write the README (#638, #643)

We have re-written the README file to be more concise and to include results from PointNet++ fine-tuning.

Rewrite the M-CLIP notebook to use the German Fashion12k dataset (#643)

Our M-CLIP tutorial now uses the German Fashion12k dataset, which represents a more realistic application scenario.

Add before and after examples in the tutorials (#622)

In our tutorials, we now include some examples that let you compare results before and after fine-tuning.

Restructuring and enriching our documentation as a whole (#643)

We have performed a substantial re-write of our documentation pages. This includes new advanced topics like "Negative Mining" and more comprehensive information about inference in the Walkthrough section. We also improved the developer reference.

🤟 Contributors

We would like to thank all contributors to this release:

v0.6.7

25 Nov 11:31
44d2ed6

Release Note Finetuner 0.6.7

This release contains 4 new features.

🆕 Features

Add support for cross-modal evaluation in the EvaluationCallback (#615)

In previous versions of Finetuner, when using the EvaluationCallback to calculate IR metrics, you could only use a single model to encode both the query and the index data.
This meant that when training multiple models at the same time, as in CLIP fine-tuning, you could only use one of the encoders for evaluation.
It is now possible to do cross-modal evaluation, where you use one model for encoding the query data and a second model for encoding the index data.
This is useful in multi-modal tasks like text-to-image.

To perform cross-modal evaluation, simply specify the model and index_model arguments in the EvaluationCallback, like so:

import finetuner
from finetuner.callback import EvaluationCallback

run = finetuner.fit(
    model='openai/clip-vit-base-patch32',
    train_data=train_data,
    eval_data=eval_data,
    loss='CLIPLoss',
    callbacks=[
        EvaluationCallback(
            query_data=query_data,
            index_data=index_data,
            model='clip-text',
            index_model='clip-vision'
        )
    ]
)

See the EvaluationCallback section of the Finetuner documentation for details on using this callback.
See also the sections Text-to-Image Search via CLIP and Multilingual Text-to-Image search with MultilingualCLIP for concrete examples of cross-modal evaluation.

Add support for Multilingual CLIP (#611)

Finetuner now supports a Multilingual CLIP model from the OpenCLIP project.
Multilingual CLIP models are trained on large text and image datasets from different languages using the CLIP contrastive learning approach.

They are a good fit for text-to-image applications where texts are in languages other than English.

The currently supported Multilingual CLIP model - xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k - uses a ViT Base32 image encoder and an XLM Roberta Base text encoder.

You can find details on how to fine-tune this specific model in the Multilingual Text-to-Image search with MultilingualCLIP section of the documentation.

import finetuner
run = finetuner.fit(
    model='xlm-roberta-base-ViT-B-32::laion5b_s13b_b90k',
    train_data=train_data,
    eval_data=eval_data,
    epochs=5,
    learning_rate=1e-6,
    loss='CLIPLoss',
    device='cuda',
)

Filter models by task in finetuner.describe_models() (#610)

The finetuner.describe_models() function, which provides an overview of supported model backbones, now accepts an optional task argument that filters the models by task.

To display all models you can omit the argument.

import finetuner
finetuner.describe_models()

To filter based on task, you need to provide a valid task name. For example:

finetuner.describe_models(task='image-to-image')

or

finetuner.describe_models(task='text-to-image')

Currently valid task names are text-to-text, text-to-image and image-to-image.

Configure the num_items_per_class argument in finetuner.fit() (#614)

The finetuner.fit() method now includes a new argument num_items_per_class that allows you to set the number of items per label that will be included in each batch.
This gives you the ability to further tailor batch construction to your needs. If not set, this argument defaults to 4, consistent with previous versions of Finetuner.

You can easily set this when calling finetuner.fit():

import finetuner
run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    eval_data=eval_data,
    batch_size=128,
    num_items_per_class=8,
)

⚠️ The batch size needs to be a multiple of the number of items per class, in other words batch_size % num_items_per_class == 0.
Otherwise Finetuner cannot respect the given num_items_per_class and throws an error.

🤟 Contributors

We would like to thank all contributors to this release:

v0.6.5

11 Nov 13:28
c66babe

Release Note Finetuner 0.6.5

This release contains 6 new features, 1 bug fix, 2 refactorings, and 2 documentation improvements.

🆕 Features

Support loading training data and evaluation data from CSV files (#592)

We now support CSV files in the finetuner.fit() method. This simplifies training because it is no longer necessary to construct a DocumentArray object to hold the training data. Instead, you can use a CSV file that contains the training data or pointers (i.e. URIs) to the relevant data objects.

train_data = '/path/to/data.csv'

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
)

See the Finetuner documentation page for preparing CSV files for more information.

You can also provide CSV files for evaluation data, as well as for query and index data when using EvaluationCallback. See the EvaluationCallback page in the Finetuner documentation for more information.

import finetuner
from finetuner.callback import EvaluationCallback

finetuner.fit(
    model='efficientnet_b0',
    train_data='/path/to/train.csv',
    eval_data='/path/to/eval.csv',
    callbacks=[
        EvaluationCallback(
            query_data='/path/to/query.csv',
            index_data='/path/to/index.csv',
        )
    ]
)

Support for data in lists when encoding (#598)

The finetuner.encode() method now takes lists of texts or image URIs as well as DocumentArray objects as inputs. This simplifies encoding because it is no longer necessary to construct a DocumentArray object to contain data.

model = finetuner.get_model('/path/to/YOUR-MODEL.zip')

texts = ['some text to encode']

embeddings = finetuner.encode(model=model, data=texts)

See the Finetuner documentation page for encoding documents for more information.

Artifact sharing (#602)

Users can now share their model artifacts with anyone who has access to Jina and has the artifact ID by adding the public=True flag to finetuner.fit(). By default, artifacts are set to private, equivalent to public=False.

finetuner.fit(
    model=model_name,
    train_data=data,
    public=True,
)

See the Finetuner documentation for advanced job options for more information.

Allow access_paths for FinetunerExecutor

The FinetunerExecutor now takes an optional argument access_paths that allows users to specify a traversal path through an array of nested Document instances. The executor only processes those document chunks specified by the traversal path.

See the FinetunerExecutor documentation and the DocArray documentation for information on constructing document paths.
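
A minimal sketch of configuring this in a Jina Flow (the '@c' access path selects each Document's chunks; the executor reference is illustrative):

from jina import Flow

f = Flow().add(
    uses='jinahub+docker://FinetunerExecutor',
    uses_with={'access_paths': '@c'},  # only process document chunks
)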

Allow logger callback for Weights & Biases during Finetuner runs

You can now use the Weights & Biases logger callback, in anonymous mode, to track metrics for your Finetuner runs. After a run finishes, you receive a URL in the logs that points to a Weights & Biases web page with the tracked metrics of the run. This page is temporary (automatically deleted after seven days if unclaimed), and you can claim it by logging in with your Weights & Biases account credentials.

wandb: Currently logged in as: anony-mouse-279369. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.5
wandb: Run data is saved locally in [YOUR-PATH]
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run cool-wildflower-2
wandb:  View project at https://wandb.ai/anony-mouse-279369/[YOUR-PROJECT-URL]
wandb:  View run at https://wandb.ai/anony-mouse-279369/[YOUR-RUN-URL]

See the Finetuner documentation page on callbacks for more information.
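
Enabling the callback is a one-liner. A minimal sketch, assuming the callback class is exposed as WandBLogger in finetuner.callback, analogous to the other callbacks:

import finetuner
from finetuner.callback import WandBLogger

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    callbacks=[WandBLogger()],  # streams metrics to Weights & Biases anonymously
)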

Support for image blobs

We now support DocumentArray image blobs in Finetuner. It is no longer necessary to directly convert images into tensors before sending them to the cloud.

You can convert image filepaths or URIs to blobs with the Document.load_uri_to_blob() method.

This saves a lot of memory and bandwidth since blobs are stored in their native, typically compressed format. Blobs are usually as small as 10% of the size of their corresponding tensor.

d = Document(uri='tests/resources/lena.png')
d.load_uri_to_blob()

If you use CSV to input local image files to Finetuner, this conversion happens automatically by default.

⚙ Refactoring

Bump Hubble SDK version to 0.23.3 (#594)

We have updated Finetuner to the latest version of Hubble, which improves functionality in general and access from code running in notebooks in particular.

We will deprecate the method finetuner.notebook_login() starting from version 0.7 of Finetuner. Inside notebooks, finetuner.login() will now detect the environment automatically.

Remove connect function (#596)

We have removed the finetuner.connect() method, since Finetuner no longer requires you to log in to Jina again if you are already logged in.

🐞 Bug Fixes

Fix executor _finetuner import

This bug caused the Finetuner executor to fail to start, and we have fixed the underlying issue.

📗 Documentation Improvements

Document the force argument to finetuner.login() (#596)

We documented the force parameter to finetuner.login(), which forces users to log in to Jina again, even if already logged in.
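
For example:

import finetuner

finetuner.login(force=True)  # re-authenticate even if a session already exists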

Update Image-to-Image example (#599)

We have changed the configuration and training sets in the examples in the Image-to-Image Search via ResNet50 documentation page.

🤟 Contributors

We would like to thank all contributors to this release:

v0.6.4

27 Oct 13:02
6e1a692

Release Note Finetuner 0.6.4

This release contains 6 new features, 1 bug fix and 1 documentation improvement.

🆕 Features

User-friendly login from Python notebooks (#576)

We've added the method finetuner.notebook_login(), a more user-friendly way to log in from notebooks like Jupyter.

Change device specification argument in finetuner.fit() (#577)

We've deprecated the cpu argument to the finetuner.fit() method, replacing it with the device argument.

For a GPU run, use device='cuda' instead of cpu=False; for a CPU run, use device='cpu' instead of cpu=True.

The default is equivalent to device='cuda'. Unless you're certain that your Finetuner job will run quickly on a CPU,
you should use the default argument.

We expect to remove the cpu argument entirely in version 0.7, which will break any old code still using it.
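
A quick sketch of the new style:

run = finetuner.fit(
    model='efficientnet_b0',
    train_data=train_data,
    device='cuda',  # the default; set device='cpu' only for jobs that run quickly on CPU
)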

Validate Finetuner run arguments on the client side (#579)

The Finetuner client now checks that the arguments to Finetuner runs are coherent and at least partially valid, before
transmitting them to the cloud infrastructure. Not all arguments can be validated on the client-side, but the Finetuner
client now checks all the ones that can.

Update names of OpenCLIP models (#580)

We have changed the names of open-access CLIP models available via Finetuner to be compatible with
CLIP-as-Service. For example, the model previously referenced as ViT-B-16#openai
is now ViT-B-16::openai.

Add method finetuner.build_model() to load pre-trained models without fine-tuning (#584)

Previously, it was not possible to load a pre-trained model via Finetuner without performing some retraining or
'fine-tuning' on it. Now it is possible to get a pre-trained model, as is, and use it via Finetuner immediately.

For example, to use a BERT model in Finetuner without any fine-tuning:

import finetuner
from docarray import Document, DocumentArray

model = finetuner.build_model('bert-base-cased') # load pre-trained model
documents = DocumentArray([Document(text='example text 1'), Document(text='example text 2')])
finetuner.encode(model=model, data=documents) # encode texts without having done any fine-tuning
assert documents.embeddings.shape == (2, 768)

Show progress while encoding documents (#586)

You will now see a progress bar when using finetuner.encode().

🐞 Bug Fixes

Fix GPU-availability issues

We have observed some problems with GPU availability in Finetuner's use of Jina AI's cloud infrastructure. We've fully
analyzed and repaired these issues.

📗 Documentation Improvements

Add Colab links to Finetuning Tasks pages (#583)

We have added runnable Google Colab notebooks for the examples in the Finetuning Tasks documentation pages:
Text-to-Text,
Image-to-Image,
and Text-to-Image.

🤟 Contributors

We would like to thank all contributors to this release:

v0.6.3

13 Oct 12:48
ef54aae

Release Note Finetuner 0.6.3

This release contains 2 new features, 2 bug fixes, and 1 documentation improvement.

🆕 Features

Allocate more GPU memory in GPU environments

Previously, the run scheduler was allocating 16GB of VRAM for GPU runs. Now, it allocates 24GB.

Users can now fine-tune significantly larger models and use larger batch sizes.

Add WiSE-FT to CLIP finetuning (#571)

WiSE-FT is a recent development that has proven to be an effective way to fine-tune
models with a strong zero-shot capability, such as CLIP. We have added it to Finetuner
along with documentation on its use.

Finetuner allows you to apply WiSE-FT easily using the WiSEFTCallback. Finetuner triggers the callback when the fine-tuning job finishes and merges the weights of the pre-trained model and the fine-tuned model:

from finetuner.callback import WiSEFTCallback

run = finetuner.fit(
    model='ViT-B-32#openai',
    ...,
    loss='CLIPLoss',
    callbacks=[WiSEFTCallback(alpha=0.5)],
)

See the documentation for advice on how to set alpha.

🐞 Bug Fixes

Fix Image Normalization for CLIP Models (#569)

  • Finetuner's image processing was not identical to that used by OpenAI for training CLIP, potentially leading to inconsistent results.
  • The new version fixes the bug and matches OpenAI's preprocessing.

Add open_clip to FinetunerExecutor requirements

The previous version of FinetunerExecutor failed to include the open_clip package in its requirements, forcing users to add it
manually to their executors. This has now been repaired.

📗 Documentation Improvements

Add callbacks documentation (#564)

There is now full documentation for using callbacks with the Finetuner.

🤟 Contributors

We would like to thank all contributors to this release:

v0.6.2

29 Sep 14:19
4fa84d8

Release Note Finetuner 0.6.2

Finetuner makes neural network fine-tuning easier and faster by streamlining the workflow and handling all the complexity
and infrastructure requirements in the cloud. With Finetuner, one can easily enhance the performance of pre-trained models and make them production-ready without expensive hardware.

What's in this Release?

This release covers Finetuner version 0.6.2, including dependencies finetuner-api 0.4.1 and finetuner-core 0.10.2.

It contains 3 new features and 1 bug fix.

🆕 Features

Finetuner can now produce PyTorch models

Previously, Finetuner only produced ONNX models. Users can now choose between ONNX and
PyTorch models.

⚠️ PyTorch is now the default format for Finetuner output.

To select ONNX, add the to_onnx flag to calls to finetuner.fit():

run = finetuner.fit(
    ...,
    to_onnx=True,
)

To use an ONNX model directly with DocArray, you must also pass the is_onnx flag to finetuner.get_model():

model = finetuner.get_model(..., is_onnx=True)

To use an ONNX model inside a Jina Flow:

f = Flow().add(uses='jinahub+docker://FinetunerExecutor/v0.10.2', uses_with={'is_onnx': True})

Resubmit jobs automatically

Previously, when submitting a request for Finetuner to use cloud computing resources, if
the request failed, the job would fail and the user would have to resubmit it. Now, the
job will be resubmitted automatically up to five times, before failing completely.

Concise and more readable log messages

We have improved the logging in Finetuner to provide fewer and more readable messages for users.

🐞 Bug Fixes

Require ONNX runtime version > 1.11.1

  • This bug was causing version incompatibility errors for users of Python 3.10.
  • The new version fixes the bug and makes Finetuner fully compatible with the latest Python releases.

🤟 Contributors

We would like to thank all contributors to this release:

v0.6.1

27 Sep 15:41
237034e

[0.6.1] - 2022-09-27

Added

  • Add finetuner_version equal to the stubs version in the create run request. (#552)

Changed

  • Bump hubble client version. (#546)

Fixed

  • Preserve request headers in redirects to the same domain. (#552)

Docs

  • Improve example and install documentation. (#534)

  • Update finetuner executor version in docs. (#543)

v0.6.0

09 Sep 14:26
8829ec7

[0.6.0] - 2022-09-09

Added

  • Add get_model and encode methods for encoding DocumentArrays. (#522)

  • Add connect function to package. (#532)

Changed

  • Incorporate commons and stubs to use shared components. (#522)

  • Improve usability of stream_logs. (#522)

  • Improve describe_models with open-clip models. (#528)

  • Use stream logging in the README example. (#532)

Fixed

  • Print logs before run status is STARTED. (#531)

Docs

  • Add inference section to examples. (#529)

v0.5.2

31 Aug 13:04
e324312

[0.5.2] - 2022-08-31

Added

  • Enable wandb callback. (#494)

  • Support log streaming in finetuner client. (#504)

  • Support optimizer and miner options. (#517)

Changed

  • Mark fit as login required. (#494)

Fixed

  • Replace dots with dashes in artifact names. (#519)

Docs

  • Fix Google Analytics ID for docs. (#499)

  • Update sphinx-markdown-table to v0.0.16 to get this fix. (#499)

  • Make install instructions more prominent in the documentation. (#518)

v0.5.1

15 Jul 12:53
1710dfa

[0.5.1] - 2022-07-15

Added

  • Add artifact id and token interface to improve usability. (#485)

Changed

  • save_artifact should show progress while downloading. (#483)

  • Give more flexibility on dependency versions. (#483)

  • Bump jina-hubble-sdk to 0.8.1. (#488)

  • Improve integration section in documentation. (#492)

  • Bump docarray to 0.13.31. (#492)

Fixed

  • Use uri to represent image content in the training-data creation code snippet in the documentation. (#484)

  • Remove outdated CLIP-specific documentation. (#491)