
ConvAcc: Accelerating convolution using numba, cupy and xnor in python.

Numba is a just-in-time, type-specializing function compiler for accelerating numerically-focused Python. It is typically enabled by applying a decorator to a Python function and can compile your code for the CPU or GPU; under the hood, it uses LLVM to compile Python functions just-in-time. CuPy is a NumPy-like library accelerated with CUDA. Its syntax closely mirrors NumPy's, and in most cases you can directly replace the numpy import with cupy. It also allows us to write custom kernels in CUDA and can easily be used together with Numba CUDA functions. The deep learning library Chainer uses CuPy as its backend.

In XNOR convolution, both the filters and the inputs to the convolutional layers are binary. By approximating the convolution operation with XNOR and bit-counting operations, we can obtain large speed-ups and memory savings. Even though this seems straightforward in theory, in practice efficient implementations of bitpacking, approximation techniques and training mechanisms are required to achieve sufficient accuracy and speed on conventional hardware platforms. The AI startup XNOR.AI (recently acquired by Apple) popularized these techniques in their 2016 paper, XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks.
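The core trick can be sketched for a single dot product (the inner loop of a convolution). Assuming vectors binarized to {-1, +1} and packed one bit per element, the dot product becomes XNOR plus a popcount; the helper names below (`pack_bits`, `xnor_dot`) are illustrative, not from the repository:

```python
import numpy as np

def pack_bits(v):
    """Pack a {-1, +1} vector into a uint8 bitfield, 8 elements per byte."""
    bits = (v > 0).astype(np.uint8)   # map +1 -> 1, -1 -> 0
    return np.packbits(bits)          # MSB-first packing

def xnor_dot(a_packed, b_packed, n):
    """Dot product of two packed {-1, +1} vectors: dot = 2*matches - n."""
    xnor = ~(a_packed ^ b_packed)     # bit is 1 where the inputs agree
    matches = 0
    for i, word in enumerate(xnor):
        valid = min(8, n - 8 * i)     # ignore padding bits in the last byte
        if valid <= 0:
            break
        mask = np.uint8((0xFF << (8 - valid)) & 0xFF)
        matches += bin(word & mask).count("1")
    return 2 * matches - n

a = np.array([1, -1, 1, 1, -1], dtype=np.int8)
b = np.array([1, 1, 1, -1, -1], dtype=np.int8)
assert xnor_dot(pack_bits(a), pack_bits(b), len(a)) == int(a @ b)
```

A real implementation would pack into 64-bit words and use a hardware popcount instruction; the Python loop here only demonstrates the arithmetic identity.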

In the IPython notebook, we implement a basic convolution in pure Python and subsequently improve its speed using Numba and other optimization techniques. Finally, we compare and benchmark the various CPU and GPU implementations in terms of execution speed. The notebook can be run directly on Google Colaboratory using a GPU runtime, without any additional installation of libraries or packages.

Note: The benchmarks depend heavily on the hardware and library versions used for experimentation.

Dependencies

  • Python, Numba
  • Cupy, Numpy
  • CUDA

References