Releases: ruanchaves/hashformers

Hashformers v2.0.0: Enhanced Compatibility with Broad Range of Transformer Models including Large Language Models (LLMs)

03 Jun 15:52
144b8f8

hashformers v2.0.0 is a significant upgrade to our hashtag segmentation library. Earlier versions were compatible only with BERT and GPT-2 models. Now, hashformers v2.0.0 allows you to segment hashtags with nearly any transformer model, including Large Language Models (LLMs) like Dolly, GPT-J, and Alpaca-LoRA. We expect this broader model support to yield results that surpass our earlier benchmarks.

Key improvements in this release include:

  • Expanded Support: Accommodates various Seq2Seq models, Masked Language Models, and autoregressive Transformer models available on the Hugging Face Model Hub, including, but not limited to, FLAN-T5, DeBERTa, and XLNet.

  • Greater Language Coverage: Thanks to the increase in supported models, we can now segment hashtags in a broader range of languages. This includes improved support for large models pretrained in low-resource languages.

  • Enhanced Documentation: We have updated our documentation and wiki, providing in-depth explanations of each library feature. This will enable users to customize their usage of our library more effectively according to their specific needs.

v1.3.0: Deprecating Features and Enhancing the Documentation

22 May 03:45

In this new version, we're refining our library by narrowing our focus to the essential components. Here are the key updates in this release:

  • Deprecation of Experimental Features: To sharpen our focus on the core functionality, we've decided to discontinue support for the experimental features, including Word Segmenter Cascades and the Unigram word segmenter. This change will not impact any functions illustrated in our Google Colab tutorial.

  • Enhanced Code Documentation: We've upgraded our documentation with comprehensive docstrings to ensure the clarity and readability of the code. Given our small codebase, we have removed external documentation from the repository.

  • Disclaimer Notice Addition: A disclaimer notice has been included in the Google Colab notebook due to the discontinued support for mxnet-cu110. For future releases, we plan to eliminate this dependency to facilitate easier library updates.

v1.2.2: Word Segmenter Cascades, Unigram word segmenter

12 Feb 10:15
a49e144

Features:

  • Introduces word segmenter cascades that allow us to chain rerankers (ad infinitum).
  • Replaces ekphrasis with a unigram segmenter based on wordfreq. It can run on all languages supported by the wordfreq library.
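
To make the idea of a unigram segmenter concrete, here is a minimal, self-contained sketch of how such a segmenter can work in principle. It is not the library's implementation: the `TOY_COUNTS` table, the unknown-word penalty, and the function names are all hypothetical stand-ins, and a real segmenter would take its word frequencies from a resource such as wordfreq rather than from a hand-written dictionary.

```python
import math

# Hypothetical toy counts; a real segmenter would look word frequencies up
# in a resource such as wordfreq instead of this hand-written table.
TOY_COUNTS = {
    "we": 500, "need": 200, "a": 900, "national": 50, "park": 60,
    "ice": 80, "cold": 70, "nation": 40,
}
TOTAL = sum(TOY_COUNTS.values())


def word_logprob(word):
    """Log-probability of a word; unknown words get a length-based penalty."""
    count = TOY_COUNTS.get(word)
    if count is None:
        # Assumed penalty: longer unknown strings are exponentially less likely.
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(hashtag, max_word_len=20):
    """Return the most probable split of a hashtag under the unigram model."""
    text = hashtag.lstrip("#").lower()
    # best[i] holds (score, words) for the best segmentation of text[:i].
    best = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates = []
        for start in range(max(0, end - max_word_len), end):
            score, words = best[start]
            word = text[start:end]
            candidates.append((score + word_logprob(word), words + [word]))
        best.append(max(candidates, key=lambda c: c[0]))
    return " ".join(best[-1][1])


print(segment("#icecold"))
print(segment("#weneedanationalpark"))
```

Because each word is scored independently of its neighbors, this kind of segmenter needs only a frequency list for the target language, which is why it can extend to any language with such a list.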

Breaking changes:

  • WordSegmenter has been renamed to TransformerWordSegmenter.

v1.1.0: Bug fixes, unit tests, extra segmenters

06 Feb 23:12
2dc158d

Features added to prepare for integration with pysentimiento.

  • GPT-2 batch size bug fix
  • More word segmenters (regex, ekphrasis)
  • A tweet segmenter
  • Unit tests
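
As an illustration of what a regex-based word segmenter can do, the sketch below splits a hashtag at casing and letter/digit boundaries. The rules and the `regex_segment` name are assumptions made for this example; these notes do not state which patterns the library's regex segmenter actually uses.

```python
import re

# Hypothetical boundary rules, chosen for illustration only.
_BOUNDARY = re.compile(
    r"(?<=[a-z])(?=[A-Z])"        # lowercase followed by uppercase: ice|Cold
    r"|(?<=[A-Z])(?=[A-Z][a-z])"  # acronym followed by a word: NASA|Launch
    r"|(?<=[A-Za-z])(?=[0-9])"    # letter followed by digit: Launch|2020
    r"|(?<=[0-9])(?=[A-Za-z])"    # digit followed by letter: 2020|Vision
)


def regex_segment(hashtag):
    """Insert spaces at casing and letter/digit boundaries of a hashtag."""
    return _BOUNDARY.sub(" ", hashtag.lstrip("#"))


print(regex_segment("#IceCold"))
print(regex_segment("#NASALaunch2020"))
print(regex_segment("#icecold"))
```

A rule-based splitter like this is fast and needs no model, but it leaves all-lowercase hashtags such as `#icecold` unsegmented, which is where a language-model-based segmenter is needed.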

v1.0.0: Clean-up, tutorial, packaging

04 Feb 08:51

First release of the hashformers library.

  • General clean-up of the codebase.
  • Step-by-step tutorial for usage, evaluation, and speed optimization.
  • PyPI package.