Problem
H2O Deep Learning can hit an internal limitation of H2O on the maximum size of a single object in the distributed key-value (K-V) store that is the core of H2O. The limit is 256 MB, and once the Deep Learning model reaches that size, this error occurs. The reason is that the model is currently stored as one large object rather than being split into smaller pieces. Cutting it into one piece per hidden layer would not solve the issue either; a single weight matrix would have to be cut into multiple pieces, which is somewhat cumbersome to implement. That said, a model of that size would also take a long time to train.
Note: The memory limit has nothing to do with the number of rows in the training data (only the number of columns matters, since that determines the size of the first hidden layer's weight matrix), nor with the available RAM or the maximum allowed heap memory (which is checked separately). It also has nothing to do with the number of nodes, threads, etc. It is purely a function of the model complexity; see the next section.
What affects the model size?
It is mainly the total number of weights and biases, multiplied by an overhead factor of 1x, 2x, or 3x: 1x when momentum_start == 0 && momentum_stable == 0, 2x when momentum > 0, and 3x when an adaptive learning rate (ADADELTA) is used. On top of that, there is a small overhead for model metrics, statistics, counters, etc.
The total number of weights is determined by the sizes of the fully connected layers, i.e. by:
The number of input columns (after automatic one-hot encoding of categoricals)
The size of the hidden layers
The number of output neurons (#classes)
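Putting these together, the model size can be estimated from the layer sizes before training. The sketch below (plain Python, not part of H2O; the layer sizes are illustrative) counts the weights and biases of a fully connected network, applies the 1x/2x/3x overhead factor described above, and assumes 4 bytes per float:

```python
# Rough estimate of an H2O Deep Learning model's size in the K-V store.
# Assumptions: 4-byte floats; layers = [input_cols, hidden..., output_neurons].
KV_LIMIT = 256 * 1024 * 1024  # 256 MB object-size limit

def estimate_model_bytes(layers, overhead=3, bytes_per_float=4):
    """overhead: 1 = no momentum, 2 = momentum > 0, 3 = adaptive rate (ADADELTA)."""
    weights = sum(a * b for a, b in zip(layers, layers[1:]))  # one matrix per layer pair
    biases = sum(layers[1:])                                  # one bias per non-input neuron
    return (weights + biases) * overhead * bytes_per_float

# e.g. 5000 input columns, one 5000-neuron hidden layer, 10 classes -> ~25M floats
layers = [5000, 5000, 10]
print(estimate_model_bytes(layers, overhead=3) > KV_LIMIT)  # True: x3 for ADADELTA exceeds 256 MB
print(estimate_model_bytes(layers, overhead=1) > KV_LIMIT)  # False: x1 (plain SGD) fits
```

This matches the failing and working examples below: the same ~25M-float network exceeds the limit with ADADELTA's 3x overhead but fits without it.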
Failing example (~25M floats * 3 for ADADELTA > 256MB)
Working example (~25M floats * 1 without ADADELTA and no momentum < 256MB)
Output:
java.lang.IllegalArgumentException: Model is too large
For more information visit: http://jira.h2o.ai/browse/TN-5
at hex.deeplearning.DeepLearningModel.&lt;init&gt;(DeepLearningModel.java:424)
at hex.deeplearning.DeepLearning$DeepLearningDriver.buildModel(DeepLearning.java:201)
at hex.deeplearning.DeepLearning$DeepLearningDriver.compute2(DeepLearning.java:171)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1005)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:429)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
barrier onExCompletion for hex.deeplearning.DeepLearning$DeepLearningDriver@5205f0fd
Solution
The current workaround is to reduce the number of hidden neurons, or to reduce the number of (especially categorical) input features.
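A quick back-of-the-envelope comparison (plain Python, illustrative numbers) shows why either reduction helps: halving a hidden layer's width shrinks its two adjacent weight matrices roughly in half, and dropping high-cardinality categorical columns (which one-hot encode into many input columns) shrinks the first matrix proportionally:

```python
# Compare estimated model sizes before and after the two workarounds.
# Assumptions: 4-byte floats, x3 overhead for an adaptive learning rate.
LIMIT = 256 * 1024 * 1024  # 256 MB K-V store object-size limit

def model_bytes(layers, overhead=3, bytes_per_float=4):
    weights = sum(a * b for a, b in zip(layers, layers[1:]))
    biases = sum(layers[1:])
    return (weights + biases) * overhead * bytes_per_float

big   = model_bytes([5000, 5000, 10])  # ~300 MB: over the limit
small = model_bytes([5000, 2500, 10])  # halved hidden layer: ~150 MB, fits
fewer = model_bytes([1000, 5000, 10])  # fewer one-hot input columns: ~60 MB, fits
print(big > LIMIT, small < LIMIT, fewer < LIMIT)  # True True True
```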
JIRA Issue Migration Info
Jira Issue: TN-5
Assignee: Arno Candel
Reporter: Arno Candel
State: Closed
Relates to: #13925