[Development] MXNet 2.0 Update #18931

szha · 2020-08-15T00:51:07Z

Overview

As MXNet development approaches its 2.0 major milestone, we would like to update our community on roadmap status, and highlight new and upcoming features.

Motivation

The deep learning community has largely evolved independently from the data science and machine learning (ML) user base in NumPy. While most deep learning frameworks now implement NumPy-like math and array libraries, they differ in how they define the APIs, which creates confusion and a steeper deep learning curve for ML practitioners and data scientists. This not only creates a barrier between the skillsets of the two communities, but also hinders knowledge sharing and code interoperability. MXNet 2.0 seeks to unify the deep learning and machine learning ecosystems.

What's new in version 2.0?

MXNet 2.0 is a major version upgrade of MXNet that provides a NumPy-like programming interface and is integrated with the new, easy-to-use Gluon 2.0 interface. Under the hood, we provide an enhanced NumPy-compatible implementation for deep learning. As a result, NumPy users can easily adopt MXNet. Version 2.0 incorporates the accumulated learnings from MXNet 1.x and focuses on usability, extensibility, and developer experience.

What's coming next?

We plan to make a series of beta releases of MXNet 2.0 in lockstep with the migration schedules of downstream projects. The first release is tracked in #19139. Also, subscribe to [email protected] for additional announcements.

How do I get started?

As a developer of MXNet, you can check out our main 2.0 branch. MXNet 2.0 nightly builds are available for download.

How can I help?

There are many ways you can contribute:

By submitting bug reports, you can help us identify issues and fix them.
If there are issues you would like to help with, let us know in the issue comments and one of the committers will help provide suggestions and pointers.
If you have a project that you would like to build on top of MXNet 2.0, post an RFC and let the MXNet developers know.
Looking for ideas to get started with developing MXNet? Check out the good-first-issues labels for Python developers and C++ developers.

Highlights

Below are the highlights of new features that are available now in the MXNet 2.0 nightly build.

NumPy-compatible Array and Math Library

NumPy has long been established as the standard array and math library in Python, and the MXNet community recognizes significant benefits in bridging the existing NumPy machine learning community and the growing deep learning community. In #14253, the MXNet community reached consensus on moving towards a NumPy-compatible programming experience, and committed to a major effort on providing a NumPy-compatible array library and operators.

To see what the new programming experience is like, check out the Dive into Deep Learning book, the most comprehensive interactive deep learning book with code+math+forum. The latest version has an MXNet implementation with the new MXNet np, the NumPy-compatible math and array interface.

Gluon 2.0

Since the introduction of the Gluon API in MXNet 1.x, it has superseded other MXNet APIs for model development such as the symbolic, module, and model APIs. Conceptually, Gluon was the first attempt in the deep learning community to unify the flexibility of imperative programming with the performance benefits of symbolic programming, through just-in-time compilation.

In Gluon 2.0, we are extending support to MXNet np with a simplified interface and new functionality:

Simplified hybridization with deferred compute and tracing: Deferred compute allows imperative execution to be used for graph construction, which allows us to unify the historic divergence of NDArray and Symbol. Hybridization now works through a simplified hybrid forward interface; users only need to specify the computation through imperative programming. Hybridization also works through tracing.
Data 2.0: The new design for data loading in Gluon allows hybridizing and deploying data processing pipelines in the same way as model hybridization. The new C++ data loader improves data loading efficiency on CIFAR-10 by 50%.
Distributed 2.0: The new distributed-training design in Gluon 2.0 provides a unified distributed data parallel interface across native Parameter Server, BytePS, and Horovod, and is extensible for supporting custom distributed training libraries.
Gluon Probability: parameterizable probability distributions and sampling functions to facilitate more areas of research such as Bayesian methods and AutoML.
Gluon Metrics and Optimizers: refactored on the MXNet np interface, with legacy issues addressed.
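
To make the idea behind deferred compute and tracing more concrete, here is a toy sketch in plain Python. It is not MXNet's implementation and uses no MXNet API: the `Traced` class, the `trace` helper, and the node names (`x0`, `t0`, ...) are all hypothetical, chosen only for illustration. The point it shows is that an ordinary imperative function can be run once on wrapper values that record each operation, so the same code yields both a result and a graph.

```python
# Toy illustration of graph construction by tracing imperative code.
# Everything here is hypothetical and unrelated to MXNet internals.

class Traced:
    """A value that records every arithmetic operation applied to it."""

    def __init__(self, value, name, tape):
        self.value = value
        self.name = name
        self.tape = tape  # shared list of (op, lhs, rhs, out) records

    def _binary(self, other, op, fn):
        # Constants are recorded by their repr, traced values by their name.
        if isinstance(other, Traced):
            rhs_value, rhs_name = other.value, other.name
        else:
            rhs_value, rhs_name = other, repr(other)
        out = Traced(fn(self.value, rhs_value), f"t{len(self.tape)}", self.tape)
        self.tape.append((op, self.name, rhs_name, out.name))
        return out

    def __add__(self, other):
        return self._binary(other, "add", lambda a, b: a + b)

    def __mul__(self, other):
        return self._binary(other, "mul", lambda a, b: a * b)


def trace(fn, *args):
    """Run `fn` imperatively and return (result, recorded graph)."""
    tape = []
    inputs = [Traced(a, f"x{i}", tape) for i, a in enumerate(args)]
    out = fn(*inputs)
    return out.value, tape


def forward(x, w):
    # Written as plain imperative code; no symbolic API is needed.
    return x * w + 1.0


value, graph = trace(forward, 3.0, 2.0)
print(value)
for node in graph:
    print(node)
```

A real system would record tensor operators rather than scalar arithmetic and would compile the recorded graph for reuse, but the user-facing shape is the same: only the imperative `forward` has to be written.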

3rdparty Plugin Support

Extensibility is important for both academia and industry users who want to develop new and customized capabilities. In MXNet 2.0, we added the following support for plugging in 3rdparty functionality at runtime.

C++ custom operators: enable operators to be implemented in separate libraries and loaded at runtime without recompiling MXNet or maintaining an MXNet fork.
Custom subgraph property for 3rdparty acceleration libraries: enable dispatching subgraphs to 3rdparty acceleration libraries that are plugged in at runtime.
Custom graph passes for 3rdparty acceleration libraries: enable custom graph modification in C++ for fusing params, replacing operators, and other custom optimizations.
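
As a rough picture of what a graph pass does, the sketch below rewrites a tiny graph in plain Python. The real extension point is in C++, so this is only a conceptual illustration: the `Node` type, the `fuse_mul_add` function, and the `fma` operator name are all hypothetical and not part of any MXNet API.

```python
# Toy graph pass: replace add(mul(a, b), c) with one fused node.
# Hypothetical data structures; not the MXNet C++ extension API.
from typing import NamedTuple, Tuple


class Node(NamedTuple):
    name: str
    op: str
    inputs: Tuple[str, ...]


def fuse_mul_add(nodes):
    """Return a new node list where `add(mul(a, b), c)` becomes `fma(a, b, c)`.

    Only a `mul` feeding the first operand of `add` is considered, and only
    when no other node consumes the `mul` result. The sketch assumes the
    `mul` result is not itself a graph output.
    """
    by_name = {n.name: n for n in nodes}
    uses = {}
    for n in nodes:
        for inp in n.inputs:
            uses[inp] = uses.get(inp, 0) + 1

    rewritten = []
    absorbed = set()
    for n in nodes:
        if n.op == "add":
            producer = by_name.get(n.inputs[0])
            if producer is not None and producer.op == "mul" and uses[producer.name] == 1:
                rewritten.append(Node(n.name, "fma", producer.inputs + (n.inputs[1],)))
                absorbed.add(producer.name)
                continue
        rewritten.append(n)
    # Drop the mul nodes that were folded into a fused node.
    return [n for n in rewritten if n.name not in absorbed]


graph = [
    Node("t0", "mul", ("x", "w")),
    Node("t1", "add", ("t0", "b")),
    Node("t2", "relu", ("t1",)),
]
optimized = fuse_mul_add(graph)
for node in optimized:
    print(node)
```

The fused node keeps the name of the `add` it replaces, so downstream consumers such as `t2` need no changes.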

Developer Experiences

In MXNet 2.0, we are making the MXNet development process more efficient.

New CMake build system: improved CMake build system for compiling the most performant MXNet backend library based on the available environment, as well as cross-compilation support.
Memory profiler: the goal is to provide visibility and insight into the memory consumption of the MXNet backend.
Pythonic exception type in backend: updated error reporting in MXNet backend that allows directly defining exception types with Python exception classes to enable Pythonic error handling.
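
One way to picture Pythonic error handling for backend errors is a small registry that maps an error-type prefix in the backend message to a Python exception class. The sketch below is a simplified illustration under assumptions: the `TypeName: message` format, the `BackendError` fallback, and the `to_python_error` helper are hypothetical and should not be read as MXNet's actual mechanism.

```python
# Toy mapping from backend error strings to Python exception classes.
# The message format and all names below are assumptions for illustration.

class BackendError(RuntimeError):
    """Fallback for backend errors without a recognised type prefix."""


_ERROR_TYPES = {
    "ValueError": ValueError,
    "TypeError": TypeError,
    "IndexError": IndexError,
}


def to_python_error(message):
    """Build the Python exception that corresponds to a backend message."""
    prefix, sep, detail = message.partition(": ")
    cls = _ERROR_TYPES.get(prefix) if sep else None
    if cls is None:
        # Unknown or missing prefix: keep the full message in a generic error.
        return BackendError(message)
    return cls(detail)


try:
    raise to_python_error("ValueError: shape mismatch")
except ValueError as err:
    caught = str(err)

print(caught)
```

With a mapping like this, callers can write `except ValueError:` instead of parsing the text of a generic backend error.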

Documentation for Developers

We are improving the documentation for MXNet and deep learning developers.

CWiki for developers: reorganized and improved the development section in MXNet CWiki.
Developer Guide: new developer guides on how to develop and improve deep learning applications with MXNet.

Ecosystem: GluonNLP NumPy

We are refactoring GluonNLP with the NumPy interface for the next generation of GluonNLP. The initial version is available on the master branch of dmlc/gluon-nlp:

NLP models with NumPy: we support a large number of state-of-the-art backbone networks in GluonNLP, including BERT, ALBERT, ELECTRA, MobileBERT, RoBERTa, XLMR, Transformer, and Transformer XL.
New Data Processing CLI: Consolidated data processing scripts into one CLI.

API Deprecation

As described in #17676, we are taking this major version upgrade as an opportunity to address the legacy issues in MXNet 1.x. Most notably, we are deprecating the following APIs:

Model, Module, Symbol: we are deprecating the legacy modeling and graph construction API in favor of automated graph tracing through deferred compute and Gluon.
mx.rnn: we are deprecating the symbolic RNN API in favor of the Gluon RNN API.
NDArray: we are deprecating NDArray and the old nd API in favor of the NumPy-compatible np and npx. The NDArray operators will be provided as an optional feature potentially in a separate repo. This will enable existing users who rely on MXNet 1.x for inference to have an easy upgrade path as old models will continue to work.
Caffe converter and Torch plugin: both extensions see low usage nowadays. We are instead extending DLPack support to improve interoperability with PyTorch and TensorFlow.

Related Projects

Below is a list of project trackers for MXNet 2.0.

MXNet 2.0: the tracking project for MXNet 2.0.
MXNet NumPy API: the goal is to provide full feature coverage for NumPy in MXNet with auto-differentiation and GPU support.
MXNet Website 2.0: the revamped MXNet official website for better browsing experiences.
np interface bug fixes: the goal is to address technical debt and performance issues in np and npx operators.
CI & CD ops and developer experience improvement: reduce the development overhead by upgrading the CI/CD infrastructure and toolchain to improve the stability of CI/CD and developer efficiency.
MXNet 2.0 JVM language binding redesign ([RFC] MXNet 2.0 JVM Language development #17783)

@apache/mxnet-committers feel free to comment or directly edit this post for updates in additional areas.

szha · 2020-08-19T18:44:27Z

@qqaatw this thread is for 2.0 discussion on the technical roadmap. Community engagement is not part of this topic. As mentioned above, I shared the same goal and sentiment as you. And I also treat this issue seriously and by no means am dismissing you. But, unless the community can have a proper discussion on this, others have no way of working together with you on this.

@StevenJokes here is how we intended the discussion channels to be used: https://mxnet.apache.org/versions/1.6/community/contribute.html
And here's the code of conduct of this project: http://www.mxnet.incubator.apache.org/foundation/policies/conduct.html
And very much contrary to what you said, I've been on this project for the past three years and I care deeply about having a healthy community, and opinions on making it better like yours.

@qqaatw @StevenJokes I started a thread for this on your behalf here: #18963, hope that's ok.

lanking520 · 2020-08-19T18:46:16Z

@StevenJokes Thanks for your effort on the MXNet project and the D2L part. It's really awesome work. We should definitely pay more attention to active contributors. Before this thread turns into a cat-fight, I would recommend you open another issue, as Sheng mentioned. We can discuss over there. This one is just the roadmap for 2.0 features and updates. If you have concerns about the MXNet community, please feel free to reach out to [email protected]. As for the lack of an official reply: we are all open-source volunteers contributing to the community, so there is absolutely no guaranteed 3-day response mechanism. We try our best to maintain this and we will keep doing that.

If possible, I would recommend withdrawing the extreme comment from this thread and moving it to the community conversation channel. Thanks for your support!

ehsanmok · 2020-08-19T18:51:53Z

@StevenJokes Thanks for your comments! A couple of things:

Your comments are violating the Code of Conduct. Rude and harassing comments, no matter how useful they are, will not be tolerated or listened to. We are a big community here, please be respectful.
Pretty much all of your peers are aware of the forum issues, so there is a dedicated D2L forum for your specific issues.

pengzhao-intel · 2020-08-21T02:13:39Z

What's the plan for validation and user tutorials?
Do the examples still work?

szha · 2020-08-21T02:55:33Z

> What's the plan for validation and user tutorials?

I think the tutorials (in the form of markdown, shown on the website) are currently validated in the website build pipeline. Examples will be maintained in apache/incubator-mxnet-examples with CI enforcement.

hmf · 2020-09-04T14:43:02Z

@szha Just a small comment and question on the issue of the tutorials asked by @pengzhao-intel. I am starting to look at this framework and find that I need to hunt down the dependencies so that the project examples work.

Case in point: the MNIST example has a link to the setup. However, we still need information on the dependencies for the ai.djl.basicdataset.Mnist import. That important information is found here. I have not found a link to this page from the documentation.

I would also like to know how you validate the tutorials (I haven't checked, so they may be in perfect working order).

I am interested in this because I have other projects in Scala (example) wherein I generate a site. In one case I use, for example, Laika to process the Markdown. To ensure the code is working I use either Tut or Mdoc (which supersedes it) to preprocess the Markdown sources. Note that all the code is in the Markdown files and is checked when the files are compiled and executed (output can be placed in the source Markdown). Thus code and documentation are always in sync. Is there a way to do this in Java?

EDIT: just realized that the checks above won't catch any missing references to dependencies.

szha · 2020-09-04T17:54:42Z

@hmf thanks for pointing out the issue on dependencies.

> I would also like to know how you validate the tutorials (I haven't checked, so they may be in perfect working order).

For tutorials on mxnet.apache.org, they are Jupyter notebooks in markdown format (processed by notedown) that are executed at the time of building the documentation. For the examples folder, we plan to move them to the new repo gradually, guarded by regular CI checks to make sure they are working.
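
As a rough picture of what "executed at build time" means, the sketch below pulls the Python code blocks out of a markdown document and runs them in order in one shared namespace, failing loudly if any block raises. It is a stdlib-only illustration, not the actual website pipeline and not how notedown works internally; the `run_tutorial` helper and the sample tutorial text are hypothetical.

```python
# Toy validator: execute every python code block of a markdown tutorial.
# Hypothetical helper; the real docs build uses notedown and Jupyter.
import re

FENCE = "`" * 3  # built programmatically to avoid a literal nested fence
BLOCK_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)


def run_tutorial(markdown_text):
    """Run all python blocks in order and return the shared namespace."""
    namespace = {}
    for index, match in enumerate(BLOCK_RE.finditer(markdown_text)):
        source = match.group(1)
        try:
            exec(compile(source, f"<block {index}>", "exec"), namespace)
        except Exception as err:
            raise RuntimeError(f"tutorial block {index} failed: {err!r}") from err
    return namespace


tutorial = "\n".join([
    "# Tiny tutorial",
    FENCE + "python",
    "total = sum(range(5))",
    FENCE,
    "Some prose between the cells.",
    FENCE + "python",
    "doubled = total * 2",
    FENCE,
])

ns = run_tutorial(tutorial)
print(ns["total"], ns["doubled"])
```

A check like this catches code that no longer runs, but it does not catch dependency information that is missing from the documentation.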

Since the example issues you pointed out belong to DJL, @lanking520 will likely be of best help in resolving them.

hmf · 2020-09-05T09:24:56Z

@szha Thanks for the answers regarding the checks. Thanks for the link also.

> Since the example issues you pointed out belong to DJL, @lanking520 will likely be of best help in resolving them.

Oops, my apologies. Scratch that.

jens-maus · 2021-12-09T17:09:14Z

@szha Sorry for being late, but can someone please comment on R-package support in MXNet 2.0? I looked over the master branch and the 2.0.0.beta0.rc0 tagged pre-release and the whole R-package directory is missing? Is R support supposed to be dropped with the upcoming MXNet 2.0 version?

szha · 2021-12-10T17:37:13Z

@jens-maus while we don't explicitly plan on dropping R support, maintaining the R package will require the community members who maintain it to come up with a 2.0 support plan. So far this hasn't happened.

