
Potential bottleneck found while pruning a model with the Distiller framework #548

Open
franec94 opened this issue Dec 19, 2020 · 0 comments

Comments


franec94 commented Dec 19, 2020

Hi to everyone involved in the development of Distiller, such an interesting DNN compression framework.

I have currently branched the root project to develop on my own, as part of my thesis at the Polytechnic of Turin. The thesis investigates whether Siren-based neural networks are suitable for network compression while still maintaining good performance when used to predict, as output, an implicit representation of the input image to be compressed. Here you can find the details and rationale behind the Siren architecture, while here you can find my current project. More precisely, my active branch is named siren-support and can be seen, investigated and explored from here.
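For context, this is a minimal sketch of the kind of Siren layer I am working with, following the initialization described in the Siren paper; the layer sizes and the w0=30 value below are illustrative defaults, not my exact configuration:

```python
import math
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """One Siren layer: y = sin(w0 * (Wx + b)), following Sitzmann et al. (2020)."""
    def __init__(self, in_features, out_features, w0=30.0, is_first=False):
        super().__init__()
        self.w0 = w0
        self.linear = nn.Linear(in_features, out_features)
        # Siren-specific weight initialization from the paper
        with torch.no_grad():
            if is_first:
                bound = 1.0 / in_features
            else:
                bound = math.sqrt(6.0 / in_features) / w0
            self.linear.weight.uniform_(-bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

# A small Siren MLP mapping 2-D pixel coordinates to RGB values.
model = nn.Sequential(
    SineLayer(2, 256, is_first=True),
    SineLayer(256, 256),
    SineLayer(256, 256),
    nn.Linear(256, 3),
)
```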

So, while developing a variant of image_classifier.py (visible here) that is better suited to regression tasks instead of classification, which I have renamed and updated for my own purposes as siren_image_regressor.py (visible here), I have run into source code that is responsible for low performance and low GPU usage. It looks like the code does not let me take proper advantage of the GPU's computational power, even though GPUs, with their many cores, are well known for speeding up the training phase through parallelism.
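For illustration, a minimal sketch of what the regression training step looks like in my variant, with MSE replacing the classification loss and accuracy metrics; the function and variable names here are hypothetical and not the actual code in siren_image_regressor.py:

```python
import torch
import torch.nn.functional as F

def train_epoch_regression(model, loader, optimizer, device):
    """One training epoch for the regression variant: MSE against pixel values
    replaces cross-entropy and top-1/top-5 accuracy."""
    model.train()
    running_mse = 0.0
    for coords, pixels in loader:          # (coordinates, target pixel values)
        coords, pixels = coords.to(device), pixels.to(device)
        optimizer.zero_grad()
        output = model(coords)
        loss = F.mse_loss(output, pixels)  # regression objective
        loss.backward()
        optimizer.step()
        running_mse += loss.item()
    return running_mse / len(loader)
```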

What I mean is that, due to the intrinsic nature of Siren models, we have to train them for more than 1e+5 epochs, often in the range of 2e+5 up to 5e+5 epochs. I therefore had to rework the code to reduce logging to a proper frequency. However, when pruning a model, for instance with the AGP technique, while monitoring volatile GPU usage, I found utilization between 20 and 27%, against 68 to 80% without the Distiller framework when using the code from the developers of the official Siren project on top of plain PyTorch.
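For reference, this is roughly how I sample volatile GPU utilization while a run is in progress; it is a simple nvidia-smi polling sketch, not part of Distiller or of my branch:

```python
import subprocess
import time

def sample_gpu_utilization(interval_s=1.0, samples=60):
    """Poll nvidia-smi for volatile GPU utilization; useful for spotting
    CPU-bound phases (utilization dropping toward 0%) during pruning runs."""
    readings = []
    for _ in range(samples):
        out = subprocess.check_output(
            ["nvidia-smi",
             "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"])
        readings.append(int(out.decode().strip().splitlines()[0]))
        time.sleep(interval_s)
    return sum(readings) / len(readings)
```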

The 20~27% interval was itself only reached after I adapted the code wherever I was able to detect potential bottlenecks. For example, the SparsityAccuracyTracker object (Distiller root-branch implementation here), which in my modified code becomes SparsityMSETracker (siren-support branch implementation here), initially performed a sort over all of the data collected while the metrics are computed and stored. This dramatically wastes CPU time and leaves no room for GPU usage, since the process is busy sorting data. For that reason I decided to recode the SparsityMSETracker class to behave more like a top-k tracker, which preserved at least some GPU usage; otherwise, with an epoch count as huge as Siren-based networks require, GPU utilization dropped to essentially 0%, because the standard behaviour of a class like SparsityAccuracyTracker devotes computational resources to sorting data without any real benefit.
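As a rough sketch of what I mean by a top-k tracker (an illustrative reimplementation, not the exact code in my siren-support branch; the scoring rule is hypothetical):

```python
import heapq

class TopKMSETracker:
    """Illustrative top-k tracker: instead of appending every epoch's record to
    a list and re-sorting it (O(n log n) per epoch, n = epochs seen so far),
    keep only the k best records in a heap (O(log k) per epoch)."""
    def __init__(self, k=5):
        self.k = k
        self._heap = []      # min-heap on score: the worst of the kept-best sits on top
        self._counter = 0    # tie-breaker so the record dicts are never compared

    def step(self, epoch, mse, sparsity):
        # Higher score is better: low MSE and high sparsity.
        score = -mse + sparsity  # hypothetical ranking; adapt to your own criterion
        entry = (score, self._counter,
                 {"epoch": epoch, "mse": mse, "sparsity": sparsity})
        self._counter += 1
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, entry)
        elif entry[0] > self._heap[0][0]:
            heapq.heapreplace(self._heap, entry)

    def best_scores(self):
        # Only k elements to sort here, independent of the number of epochs.
        return [rec for _, _, rec in sorted(self._heap, reverse=True)]
```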

  • So, is it reasonable to at least warn users that SparsityAccuracyTracker might become a bottleneck when a huge number of epochs is configured for pruning?
  • The rationale behind this question, as already mentioned, is that collecting items into a plain list, where each element might be an ordered dictionary with details about the current run, and then sorting them wastes CPU time at run-time: the Python script partly spends its time ordering all of the data recorded during pruning and training, instead of simply behaving as a top-k tracker, which would reduce memory usage and improve resource utilization at run-time (see the timing sketch after this list).
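To give a feeling for the cost difference, here is an illustrative micro-benchmark comparing a full sort of a long score history against extracting only the top 5 entries with a heap; the 5e+5 figure mirrors the epoch counts mentioned above, and the absolute timings will of course vary by machine:

```python
import heapq
import random
import timeit

history = [random.random() for _ in range(500_000)]  # e.g. 5e+5 epochs of scores

# Cost of re-sorting the full history once (the per-epoch pattern compounds this).
full_sort = timeit.timeit(lambda: sorted(history, reverse=True)[:5], number=1)

# Cost of extracting only the top 5 entries with a heap.
top_k = timeit.timeit(lambda: heapq.nlargest(5, history), number=1)

print(f"full sort: {full_sort:.3f}s  top-5 via heap: {top_k:.3f}s")
```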

That is my question.

Furthermore, I think the overall ecosystem of calls into Distiller functions seems to waste training time, because the model is not allowed to use the GPU in a way that keeps the training phase within a reasonable time interval, short enough to justify using the Distiller framework.

So, what I am asking is the following: are the calls to the functions required by the Distiller framework placed at the right level, so that they do not waste GPU time while training for pruning purposes? A rough way to measure this is sketched below.
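One way I could measure this is to attribute wall-clock time to the framework callbacks versus the forward/backward pass; the callback names in the comments below are illustrative, not necessarily Distiller's exact API:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

import torch

timings = defaultdict(float)

@contextmanager
def timed(label):
    """Attribute wall-clock time to a label; synchronize so that asynchronous
    CUDA kernels are not mis-credited to the next (CPU-side) section."""
    torch.cuda.synchronize()
    start = time.perf_counter()
    yield
    torch.cuda.synchronize()
    timings[label] += time.perf_counter() - start

# Inside the training loop (illustrative usage):
#   with timed("scheduler_callbacks"):
#       compression_scheduler.on_minibatch_begin(epoch, batch_idx, num_batches)
#   with timed("forward_backward"):
#       loss = criterion(model(inputs), targets); loss.backward()
# After a few epochs, print(dict(timings)) shows where the wall-clock time goes.
```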

Some machine and OS specifications for my runs:

  • CUDA version: 10.0
  • CUDNN version: 7603
  • Kernel: 4.15.0-121-generic
  • OS: Ubuntu 18.04.1 LTS
  • Python: 3.6.9 (default, Oct 8 2020, 12:12:24) [GCC 8.4.0]
  • GPU: GeForce GTX 1080 Ti