
Releases: laekov/fastmoe

v1.1.0

08 Oct 02:54
5b60285

Performance

  • The smart schedule of FasterMoE now uses correct stream management and runs faster.

Testing

  • All unit tests have been checked and now run correctly.

Adaptation

  • Megatron-LM 3.2 supported.

Documentation

v1.0.1

30 May 06:16
c9ccc0e

Compatibility

  • PyTorch 2.0 supported.
  • Megatron-LM 2.5 supported.

Documentation

Performance related

  • FasterMoE's schedule is generalized to n_expert > 1, along with further bug fixes.
  • Reduced synchronization, thanks to @Fragile-azalea.

v1.0.0

02 Apr 03:11
59bcec8

FasterMoE

  • The new performance-boosting features from the PPoPP'22 paper FasterMoE, detailed in the documentation (see the sketch after this list):
    • Expert Shadowing.
    • Smart Scheduling.
    • Topology-aware gate.
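
These features are switched on at run time. Below is a minimal sketch, assuming the environment-variable switches described in the FasterMoE document; the exact variable names are assumptions and may differ from what doc/fastermoe in the repository specifies.

    # Minimal sketch: enable FasterMoE features via environment variables before
    # the MoE layers run. The variable names are assumptions; check doc/fastermoe
    # in the repository for the exact switches.
    import os

    os.environ["FMOE_FASTER_SCHEDULE_ENABLE"] = "1"  # smart scheduling
    os.environ["FMOE_FASTER_SHADOW_ENABLE"] = "1"    # expert shadowing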

Bug fixes

  • Transformer-XL examples.
  • Compatibility with PyTorch versions.
  • Megatron-LM documentation.
  • GShardGate.

v0.3.0

08 Nov 09:37
acf8bec

FMoE core

  • The previous mp_group is renamed to slice_group, indicating that all workers in the group receive the same input batch and each processes a slice of it. mp_group will be deprecated in the next release (see the sketch after this list).
  • ROCm supported.
  • FMoELinear is moved to a stand-alone file.
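
For illustration, below is a minimal sketch of passing a slice group to an FMoE layer; the constructor arguments shown (num_expert, d_model, world_size, slice_group) are assumptions based on this note, so check fmoe.FMoE for the exact signature.

    # Minimal sketch (constructor arguments assumed): all workers in `slice_group`
    # receive the same input batch and each processes a slice of it.
    import torch.distributed as dist
    from fmoe import FMoE

    dist.init_process_group(backend="nccl")
    slice_group = dist.new_group(ranks=[0, 1])   # workers sharing one input batch

    moe = FMoE(
        num_expert=4,                      # experts hosted on each worker
        d_model=1024,
        world_size=dist.get_world_size(),
        slice_group=slice_group,           # replaces the deprecated mp_group
    )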

Grouped data parallel

  • Support arbitrary group names, matching parameters to process groups by their tag names (see the sketch below).
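
Below is a minimal sketch of the tag-based synchronization, assuming a parameter's dp_comm attribute is matched against a process group passed as <tag>_group; both names are assumptions.

    # Minimal sketch (attribute and keyword names assumed): parameters tagged
    # with dp_comm="my" are synchronized within the group passed as my_group.
    import torch
    import torch.distributed as dist
    from fmoe.distributed import DistributedGroupedDataParallel

    dist.init_process_group(backend="nccl")
    my_group = dist.new_group(ranks=[0, 1])

    model = torch.nn.Linear(16, 16).cuda()
    for p in model.parameters():
        p.dp_comm = "my"   # tag selecting which group all-reduces this parameter

    model = DistributedGroupedDataParallel(model, my_group=my_group)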

Load balancing

  • A brand-new balancing strategy, SWIPE, contributed by the authors of a (currently unpublished) paper.
  • A has_loss property is added to each gate to indicate whether a balance loss should be collected (see the sketch after this list).
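
Below is a minimal sketch of using has_loss when assembling the training loss; the get_loss() accessor name and the balance-loss weight are assumptions.

    # Minimal sketch: add the gate's balance loss to the task loss only when the
    # gate reports that it produces one. get_loss() is an assumed accessor name.
    def total_loss(task_loss, moe_layer, balance_weight=1e-2):
        gate = moe_layer.gate
        if getattr(gate, "has_loss", False):   # property added in this release
            return task_loss + balance_weight * gate.get_loss()
        return task_loss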

Megatron-LM support

  • Experts are partitioned by tensor model parallelism in mp_group, instead of expert parallelism.
  • Support arbitrary customized gates in MegatronMLP (see the sketch after this list).
  • Move the patches to a stand-alone file.
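
Below is a minimal sketch of patching a Megatron-LM model with a customized gate; fmoefy is FastMoE's documented entry point for Megatron-LM, but the keyword names for the expert count and the gate are assumptions.

    # Minimal sketch: replace Megatron-LM MLPs with MoE layers that use a custom
    # gate. The keyword names (num_experts, gate) are assumptions; check
    # fmoe.megatron.fmoefy for the exact signature.
    from fmoe.gates import GShardGate
    from fmoe.megatron import fmoefy

    def moefy(megatron_model, experts_per_worker=4):
        return fmoefy(megatron_model, num_experts=experts_per_worker, gate=GShardGate)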

Tests

  • Move util functions into test_ddp.py.

v0.2.1

23 Aug 08:28
d2392de

Load balancing

  • Fix gradient for balance loss.

Misc

  • Typos.
  • Update benchmark interface.
  • Remove some redundant code for performance improvement.
  • Enable USE_NCCL by default.
  • Compatibility with both PyTorch <1.8.0 and >=1.8.0.

Megatron adaptation

  • Patch for numerical correctness of gradient clipping.
  • Support for pipeline parallelism.

v0.2.0

31 May 08:27
c96f886

Load balancing

  • A brand-new gate module with capacity-related utilities.
  • GShard's and Switch Transformer's balance strategies are implemented as integrated gates (see the sketch after this list).
  • Balance loss is enabled.
  • Balance monitor is provided.
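
Below is a minimal sketch of selecting one of the integrated balance gates in a Transformer-style MoE FFN; the FMoETransformerMLP constructor arguments shown are assumptions.

    # Minimal sketch (constructor arguments assumed): a Transformer FFN whose
    # experts are balanced by GShard's capacity-based gate from this release.
    from fmoe import FMoETransformerMLP
    from fmoe.gates import GShardGate

    ffn = FMoETransformerMLP(
        num_expert=4,      # experts per worker
        d_model=1024,      # Transformer hidden size
        d_hidden=4096,     # expert FFN inner size
        gate=GShardGate,   # integrated balancing gate
    )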

Checkpointing

  • MoE models can be loaded and saved by fmoe's checkpointing module.

Performance

  • FP16 training performance is improved.

Misc

  • The CUDA code directory is restructured.
  • More tests are added.

v0.1.2

13 Mar 10:17

Compilation

  • Remove dependency on the CUDA examples repository.

Distributed

  • Fix a bug related to PyTorch v1.8.0; FastMoE can now run on multiple GPUs across multiple nodes with PyTorch v1.8.0.

Misc

  • Fix tons of typos.
  • Format the code.

v0.1.1

01 Mar 06:51

First public release with basic distributed MoE functions, tested with Megatron-LM and Transformer-XL.