Skip to content
This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[Development] MXNet 2.0 Update #18931

Open
szha opened this issue Aug 15, 2020 · 19 comments
Open

[Development] MXNet 2.0 Update #18931

szha opened this issue Aug 15, 2020 · 19 comments
Labels
RFC Post requesting for comments

Comments

@szha
Copy link
Member

szha commented Aug 15, 2020

Overview

As MXNet development approaches its 2.0 major milestone, we would like to update our community on roadmap status, and highlight new and upcoming features.

Motivation

The deep learning community has largely evolved independently from the data science and machine learning(ML) user base in NumPy. While most deep learning frameworks now implement NumPy-like math and array libraries, they differ in the definition of the APIs which creates confusion and a steeper learning curve of deep learning for ML practitioners and data scientists. This creates a barrier not only in the skillsets of the two different communities, but also hinders the knowledge sharing and code interoperability. MXNet 2.0 seeks to unify the deep learning and machine learning ecosystems.

What's new in version 2.0?

MXNet 2.0 is a major version upgrade of MXNet that provides NumPy-like programming interface, and is integrated with the new, easy-to-use Gluon 2.0 interface. Under the hood, we provide an enhanced DL implementation in NumPy. As a result, NumPy users can easily adopt MXNet. Version 2.0 incorporated accumulative learnings from MXNet 1.x and focuses on usability, extensibility, and developer experiences.

What's coming next?

We plan to make a series of beta releases of MXNet 2.0 in lockstep with downstream projects migration schedule. The first release is tracked in #19139. Also, subscribe to [email protected] for additional announcements.

How do I get started?

As a developer of MXNet, you can check out our main 2.0 branch. MXNet 2.0 nightly builds are available for download.

How can I help?

There are many ways you can contribute:

  • By submitting bug reports, you can help us identify issues and fix them.
  • If there are issues you would like to help with, let us know in the issue comments and one of the committers will help provide suggestions and pointers.
  • If you have a project that you would like to build on top of MXNet 2.0, post an RFC and let the MXNet developers know.
  • Looking for ideas to get started with developing MXNet? Check out the good-first-issues labels for Python developers and C++ developers

Highlights

Below are the highlights of new features that are available now in the MXNet 2.0 nightly build.

NumPy-compatible Array and Math Library

NumPy has long been established as the standard array and math library in Python and the MXNet community recognizes significant benefits in bridging the existing NumPy machine learning community and the growing deep learning community. In #14253, the MXNet community reached consensus on moving towards a NumPy-compatible programming experience, and committed to a major effort on providing NumPy compatible array library and operators.

To see what the new programming experience is like, check out Dive into Deep Learning book, the most comprehensive interactive deep learning book with code+math+forum. The latest version has an MXNet implementation with the new MXNet np, the NumPy-compatible math and array interface.

Gluon 2.0

Since the introduction of the Gluon API in MXNet 1.x, it has superseded other MXNet API for model development such as symbolic, module, and model APIs. Conceptually, Gluon was the first attempt in the deep learning community to unify the flexibility of imperative programming with the performance benefits of symbolic programming, through just-in-time compilation.

In Gluon 2.0, we are extending support to MXNet np with simplified interface and new functionalities:

  • Simplified hybridization with deferred compute and tracing: Deferred compute allows the imperative execution to be used for graph construction, which allows us to unify the historic divergence of NDArray and Symbol. Hybridization now works in simplified hybrid forward interface; users only need to specify the computation through imperative programming. Hybridization also works through tracing.
  • Data 2.0: The new design for data loading in Gluon allows hybridizing and deploy data processing pipeline in the same way as model hybridization. The new C++ data loader improves data loading efficiency on CIFAR 10 by 50%.
  • Distributed 2.0: The new distributed-training design in Gluon 2.0 provides a unified distributed data parallel interface across native Parameter Server, BytePS, and Horovod, and is extensible for supporting custom distributed training libraries.
  • Gluon Probability: parameterizable probability distributions and sampling functions to facilitate more areas of research such as Baysian methods and AutoML.
  • Gluon Metrics and Optimizers: refactored with MXNet np interface and addressed legacy issues.

3rdparty Plugin Support

Extensibility is important for both academia and industry users who want to develop new, and customized capabilities. In MXNet 2.0, we added the following support for plugging in 3rdparty functionality at runtime.

Developer Experiences

In MXNet 2.0, we are making development process more efficient in MXNet.

  • New CMake build system: improved CMake build system for compiling the most performant MXNet backend library based on the available environment, as well as cross-compilation support.
  • Memory profiler: the goal is to provide visibility and insight into the memory consumption of the MXNet backend.
  • Pythonic exception type in backend: updated error reporting in MXNet backend that allows directly defining exception types with Python exception classes to enable Pythonic error handling.

Documentation for Developers

We are improving the documentation for MXNet and deep learning developers.

  • CWiki for developers: reorganized and improved the development section in MXNet CWiki.
  • Developer Guide: new developer guides on how to develop and improve deep learning application with MXNet.

Ecosystem: GluonNLP NumPy

We are refactoring GluonNLP with NumPy interface for the next generation of GluonNLP. The initial version is available on dmlc/gluon-nlp master branch:

  • NLP models with NumPy: we support a large number of state-of-the-art backbone networks in GluonNLP including
    BERT, ALBERT, ELECTRA, MobileBERT, RoBERTa, XLMR, Transformer, Transformer XL
  • New Data Processing CLI: Consolidated data processing scripts into one CLI.

API Deprecation

As described in #17676, we are taking this major version upgrade as an opportunity to address the legacy issues in MXNet 1.x. Most notably, we are deprecating the following API:

  • Model, Module, Symbol: we are deprecating the legacy modeling and graph construction API in favor of automated graph tracing through deferred compute and Gluon.
  • mx.rnn: we are deprecating the symbolic RNN API in favor of the Gluon RNN API.
  • NDArray: we are deprecating NDArray and the old nd API in favor of the NumPy-compatible np and npx. The NDArray operators will be provided as an optional feature potentially in a separate repo. This will enable existing users who rely on MXNet 1.x for inference to have an easy upgrade path as old models will continue to work.
  • Caffe converter and Torch plugin: both extensions see low usage nowadays. We are extending support in DLPack to better support interoperability with PyTorch and Tensorflow instead.

Related Projects

Below is a list of project trackers for MXNet 2.0.

@apache/mxnet-committers feel free to comment or directly edit this post for updates in additional areas.

@szha szha added the RFC Post requesting for comments label Aug 15, 2020
@szha szha pinned this issue Aug 15, 2020
@szha

This comment has been minimized.

@szha

This comment has been minimized.

@qqaatw

This comment has been minimized.

@szha

This comment has been minimized.

@szha

This comment has been minimized.

@qqaatw

This comment has been minimized.

@analog-cbarber

This comment has been minimized.

@sxjscience

This comment has been minimized.

@szha
Copy link
Member Author

szha commented Aug 19, 2020

@qqaatw this thread is for 2.0 discussion on the technical roadmap. Community engagement is not part of this topic. As mentioned above, I shared the same goal and sentiment as you. And I also treat this issue seriously and by no means am dismissing you. But, unless the community can have a proper discussion on this, others have no way of working together with you on this.

@StevenJokes here is how we intended the discussion channels to be used: https://mxnet.apache.org/versions/1.6/community/contribute.html
And here's the code of conduct of this project: http://www.mxnet.incubator.apache.org/foundation/policies/conduct.html
And very much contrary to what you said, I've been on this project for the past three years and I care deeply about having a healthy community, and opinions on making it better like yours.

@qqaatw @StevenJokes I started a thread for this on your behalf here: #18963, hope that's ok.

@sxjscience

This comment has been minimized.

@lanking520
Copy link
Member

@StevenJokes Thanks for your effort working on MXNet project and D2L part. It's really an awesome work you have done. We should definitely raise our attention on active contributors. Before we drive this thread into cat-fight, I would recommend you open another issue like Sheng mentioned in the issue. We can discuss over there. This one is just the roadmap for 2.0 features and updates. If you have concerns to MXNet community, please feel free to reach out to [email protected]. For the no-official reply part, we all just open-source and voluntarily contribute to the community, so there is absolute no gauranteed 3-day respond mechanism. We tried our best to maintain this and we will keep doing that.

If possible, I would recommend to withdraw the extreme comment to this thread and put it to the community conversation channel. Thanks for your support!

@ehsanmok
Copy link
Contributor

@StevenJokes Thanks for your comments! Couple of things:

  • Your comments are violating the Code of Conduct. Rude and harassing comments, no matter how useful they are, will not be tolerated and listened to. We are a big community here, please be respectful.
  • Pretty much all of your peers are aware of the forum issues so there is dedicated D2L forum for your specific issues.

@pengzhao-intel
Copy link
Contributor

What's plan of the validation and user tutorial?
Does the example still work?

@szha
Copy link
Member Author

szha commented Aug 21, 2020

What's plan of the validation and user tutorial?

I think the tutorials (in the form of markdown, shown on the website) are currently validated in the website build pipeline. Examples will be maintained in apache/incubator-mxnet-examples with CI enforcement.

@hmf
Copy link

hmf commented Sep 4, 2020

@szha Just a small comment and question on the issue of the tutorials asked by @pengzhao-intel. I am starting to look at this framework and find that I need to hunt down the dependencies so that the project examples work.

Case in point the MNIST example has a link to the setup. However, we still need information on the dependencies for the ai.djl.basicdataset.Mnist import. That important information is found here. I have not found a link to this page from the documentation.

I would also like to know how are you validate the tutorials (I haven't checked so they may be in perfect working order).

I am interested in this because I have other projects in Scala (example) wherein I generate a site. In one case I use for example Laika to process the Markdown. To ensure the code is working I use either Tut or Mdoc, which supersedes it to preprocess the Markdown sources. Note that all the code are in the Markdown files and are checked when they are compiled and execute (output can be placed in the source Markdown). Thus code and documentation are always in sink. Is there a way to do this in Java?

EDIT: just realized that the checks above won't catch any missing references to dependencies.

@szha
Copy link
Member Author

szha commented Sep 4, 2020

@hmf thanks for pointing out the issue on dependencies.

I would also like to know how are you validate the tutorials (I haven't checked so they may be in perfect working order).

For tutorials on mxnet.apache.org, they are jupyter notebooks in markdown format (processed by notedown) that are executed at the time of building the documentation. for the examples folder, we plan to move them to the new repo gradually, guarded by regular CI checks to make sure they are working.

Since the example issues you pointed out belong to DJL, @lanking520 will likely be of best help in resolving them.

@hmf
Copy link

hmf commented Sep 5, 2020

@szha Thanks for the answers regarding the checks. Thanks for link also.

Since the example issues you pointed out belong to DJL, @lanking520 will likely be of best help in resolving them.

Oops, my apologies. Scratch that.

@jens-maus
Copy link

@szha Sorry for being late, but can someone please comment on R-package support in MXNet 2.0? I looked over the master branch and the 2.0.0.beta0.rc0 tagged pre-release and the whole R-package directory is missing? Is R support supposed to be dropped with the upcoming MXNet 2.0 version?

@szha
Copy link
Member Author

szha commented Dec 10, 2021

@jens-maus while we don't explicitly plan on dropping, for R-package maintenance we will need community members who maintain R package to come up with 2.0 support plan. At the moment this hasn't happened yet.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
RFC Post requesting for comments
Projects
None yet
Development

No branches or pull requests

10 participants