
Working with larger, custom datasets #133

Open
pugantsov opened this issue Feb 21, 2019 · 3 comments

pugantsov commented Feb 21, 2019

I am currently using TensorRec for my master's project and have been following the MovieLens guide on getting started with the library.

My dataset is a CSV file in which each row represents a tweet made by a user, along with item metadata that could be useful for a content-based system. The issue is that a single week's worth of data is roughly 48,000 entries, with a one-to-one interaction between a tweet's author and the tweet itself.

I initially trained the model on a month's worth of data, which caused Python to crash with an out-of-memory error on my 16GB-RAM machine. I narrowed this down to a single week for the purposes of building a baseline and managed to train a CF model, with the poor results I expected given the lack of interactions. To improve the precision and recall scores, I tried adding item metadata using scikit-learn's MultiLabelBinarizer, as the guide describes, but this seems to crash when I train the model.

I was wondering whether there are any optimisations to prevent the crash, or whether it is possible to fit the model from an iterable instead of holding everything in memory? I can't afford to shrink the dataset any further without losing valuable information.
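In case it's useful, this is roughly how I'm building the item features (a simplified sketch; the file and column names here are placeholders, not my real ones). I'm also wondering whether sparse_output=True on the binarizer is the right way to keep everything sparse:

# Simplified sketch of my item-feature construction (file/column names are placeholders).
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

tweets = pd.read_csv('tweets_one_week.csv')  # one row per (user, tweet) with metadata

# Binarize the per-tweet metadata tags into an item-feature matrix.
# sparse_output=True keeps the result as a scipy.sparse matrix rather than a dense
# array, which I assume matters at ~48,000 items.
binarizer = MultiLabelBinarizer(sparse_output=True)
item_features = binarizer.fit_transform(tweets['metadata_tags'].str.split('|'))

print(item_features.shape, item_features.nnz)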

2019-02-21 14:48:22.918219: W tensorflow/core/framework/allocator.cc:122] Allocation of 9571537584 exceeds 10% of system memory.
2019-02-21 14:48:22.924032: W tensorflow/core/framework/allocator.cc:122] Allocation of 9571537584 exceeds 10% of system memory.
Killed

These are the errors I get when running any larger model.

@pugantsov pugantsov changed the title Fitting larger models, local datasets Working with larger, custom datasets Feb 21, 2019
jfkirk (Owner) commented Feb 26, 2019

Hey @ajhepburn -- to help debug this, can you give me the following information:
- Shape of the interactions [rows, cols, #values]
- Shape of the user features [rows, cols, #values]
- Shape of the item features [rows, cols, #values]
- Are you using the user_batch_size parameter for the fit() method?
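Assuming your interactions and features are scipy.sparse matrices, something like this will print the numbers I'm after (just a sketch, adjust the names to yours):

# Sketch of a helper to report the shapes above; assumes scipy.sparse inputs,
# where .shape gives rows/cols and .nnz gives the number of stored values.
def describe(name, mat):
    print(name, mat.shape[0], mat.shape[1], mat.nnz)

# describe('interactions', interactions)
# describe('user_features', user_features)
# describe('item_features', item_features)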

@jfkirk jfkirk self-assigned this Feb 26, 2019
pugantsov (Author) commented Feb 26, 2019

@jfkirk Hi, I have a feeling this may be on my end.

I've switched to running your keras Book Crossing example as-is, and about 5 epochs in the OS kills my process. The TensorFlow backend and GPU usage seem fine, so I have no idea why there appears to be a memory leak; I was wondering if you've come across anything similar.

Here's what is dumped when I run it:

Using TensorFlow backend.
/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/eval.py:111: RuntimeWarning: divide by zero encountered in true_divide
  ndcg = dcg/idcg
/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/eval.py:111: RuntimeWarning: invalid value encountered in true_divide
  ndcg = dcg/idcg
INFO:root:UserRepr Graph                          ItemRepr Graph                Rec. In-Sample      Rec. Out-sample     Prec. In-Sample     Prec. Out-sample    NDCG In-Sample      NDCG Out-sample
INFO:root:RANDOM BASELINE                                                     : 0.0089              0.0085              0.0003              0.0003              0.0028              0.0026
2019-02-26 15:23:48.176917: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-26 15:23:48.272303: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-26 15:23:48.272773: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: 
name: GeForce GTX 1060 with Max-Q Design major: 6 minor: 1 memoryClockRate(GHz): 1.3415
pciBusID: 0000:01:00.0
totalMemory: 5.94GiB freeMemory: 3.93GiB
2019-02-26 15:23:48.272789: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-02-26 15:23:48.840133: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-02-26 15:23:48.840165: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988]      0 
2019-02-26 15:23:48.840171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0:   N 
2019-02-26 15:23:48.840343: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3660 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1060 with Max-Q Design, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:root:Processing interaction and feature data
/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/ops/gradients_impl.py:112: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:root:Beginning fitting
INFO:root:EPOCH 0 BATCH 0 loss = 6.32940673828125, weight_reg_l2_loss = 0.11993500000000001, mean_pred = -0.1165490597486496
INFO:root:EPOCH 1 BATCH 0 loss = 6.247504711151123, weight_reg_l2_loss = 0.11817267578125001, mean_pred = -0.022028077393770218
INFO:root:EPOCH 2 BATCH 0 loss = 6.161670207977295, weight_reg_l2_loss = 0.11675623046875001, mean_pred = 0.06877429783344269
INFO:root:EPOCH 3 BATCH 0 loss = 6.072144508361816, weight_reg_l2_loss = 0.11563564453125001, mean_pred = 0.1571342647075653
INFO:root:EPOCH 4 BATCH 0 loss = 5.981313228607178, weight_reg_l2_loss = 0.11476899414062501, mean_pred = 0.24712017178535461
INFO:root:EPOCH 5 BATCH 0 loss = 5.8869147300720215, weight_reg_l2_loss = 0.114135625, mean_pred = 0.3380657136440277
Killed

EDIT: I must've completely missed the user_batch_size option. I set it to 32 and everything is fine now. Thanks!
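For anyone else who misses it, the change was just passing user_batch_size through to fit, roughly like this (the other arguments are placeholders for what the Book Crossing example already passes):

# Sketch of the fit call after the fix; the only change on my side was user_batch_size.
# The other arguments are placeholders for whatever the example script already uses.
model.fit(interactions=train_interactions,
          user_features=user_features,
          item_features=item_features,
          epochs=5,
          user_batch_size=32,
          verbose=True)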

As for the original issue, I think I wasn't forming my tensors correctly: I was trying to work with a dataset that doesn't have explicit ratings, so I must have messed up somewhere. I'm going to try your keras implementation and hopefully won't run into many issues.
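For reference, this is roughly how I'll build the implicit interaction matrix this time around (simplified; the tweets frame, column names, and id-to-index mappings are placeholders):

# Sketch: implicit-feedback interactions, one entry of 1.0 per observed (user, tweet) pair.
# user_to_index / item_to_index are placeholder dicts mapping raw ids to contiguous indices.
import numpy as np
from scipy import sparse

user_idx = tweets['user_id'].map(user_to_index).to_numpy()
item_idx = tweets['tweet_id'].map(item_to_index).to_numpy()

interactions = sparse.coo_matrix(
    (np.ones(len(tweets), dtype=np.float32), (user_idx, item_idx)),
    shape=(len(user_to_index), len(item_to_index)))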

pugantsov (Author) commented Feb 26, 2019

Spoke too soon.

I trained with a user_batch_size of 128 for 10 epochs and got the following:

2019-02-26 16:26:20.490686: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.60GiB.  Current allocation summary follows.
2019-02-26 16:26:20.490740: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256): 	Total Chunks: 17, Chunks in use: 17. 4.2KiB allocated for chunks. 4.2KiB in use in bin. 84B client-requested in use in bin.
2019-02-26 16:26:20.490754: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490772: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024): 	Total Chunks: 1, Chunks in use: 1. 1.2KiB allocated for chunks. 1.2KiB in use in bin. 1.0KiB client-requested in use in bin.
2019-02-26 16:26:20.490789: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490805: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490822: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490836: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490847: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490856: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490871: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490889: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144): 	Total Chunks: 4, Chunks in use: 4. 1.33MiB allocated for chunks. 1.33MiB in use in bin. 1.33MiB client-requested in use in bin.
2019-02-26 16:26:20.490906: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490920: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576): 	Total Chunks: 4, Chunks in use: 4. 5.99MiB allocated for chunks. 5.99MiB in use in bin. 5.99MiB client-requested in use in bin.
2019-02-26 16:26:20.490934: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490951: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490968: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608): 	Total Chunks: 1, Chunks in use: 0. 10.71MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.490985: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.491003: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.491022: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.491034: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-02-26 16:26:20.491045: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456): 	Total Chunks: 3, Chunks in use: 2. 3.54GiB allocated for chunks. 3.21GiB in use in bin. 3.21GiB client-requested in use in bin.
2019-02-26 16:26:20.491056: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 1.60GiB was 256.00MiB, Chunk State: 
2019-02-26 16:26:20.491078: I tensorflow/core/common_runtime/bfc_allocator.cc:619]   Size: 337.86MiB | Requested Size: 842.4KiB | in_use: 0, prev:   Size: 1.60GiB | Requested Size: 1.60GiB | in_use: 1
2019-02-26 16:26:20.491096: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294800000 of size 256
2019-02-26 16:26:20.491110: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294800100 of size 256
2019-02-26 16:26:20.491125: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294800200 of size 1571328
2019-02-26 16:26:20.491141: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f929497fc00 of size 347904
2019-02-26 16:26:20.491154: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f92949d4b00 of size 256
2019-02-26 16:26:20.491169: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f92949d4c00 of size 1280
2019-02-26 16:26:20.491182: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f92949d5100 of size 256
2019-02-26 16:26:20.491197: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f92949d5200 of size 256
2019-02-26 16:26:20.491211: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f92949d5300 of size 1571328
2019-02-26 16:26:20.491226: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294b54d00 of size 1571328
2019-02-26 16:26:20.491240: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294cd4700 of size 1571328
2019-02-26 16:26:20.491252: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294e54100 of size 347904
2019-02-26 16:26:20.491266: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294ea9000 of size 347904
2019-02-26 16:26:20.491275: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294efdf00 of size 347904
2019-02-26 16:26:20.491282: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f52e00 of size 256
2019-02-26 16:26:20.491289: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f52f00 of size 256
2019-02-26 16:26:20.491295: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53000 of size 256
2019-02-26 16:26:20.491302: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53100 of size 256
2019-02-26 16:26:20.491308: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53200 of size 256
2019-02-26 16:26:20.491318: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53300 of size 256
2019-02-26 16:26:20.491331: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53400 of size 256
2019-02-26 16:26:20.491340: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53500 of size 256
2019-02-26 16:26:20.491347: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53600 of size 256
2019-02-26 16:26:20.491353: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53700 of size 256
2019-02-26 16:26:20.491360: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53800 of size 256
2019-02-26 16:26:20.491366: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9294f53900 of size 256
2019-02-26 16:26:20.491373: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7f9294f53a00 of size 11225600
2019-02-26 16:26:20.491384: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f9295a08400 of size 1721096448
2019-02-26 16:26:20.491397: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7f92fc365d00 of size 1721096448
2019-02-26 16:26:20.491411: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7f9362cc3600 of size 354273792
2019-02-26 16:26:20.491425: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
2019-02-26 16:26:20.491442: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 17 Chunks of size 256 totalling 4.2KiB
2019-02-26 16:26:20.491460: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1280 totalling 1.2KiB
2019-02-26 16:26:20.491476: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 347904 totalling 1.33MiB
2019-02-26 16:26:20.491493: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 1571328 totalling 5.99MiB
2019-02-26 16:26:20.491510: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 1721096448 totalling 3.21GiB
2019-02-26 16:26:20.491525: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 3.21GiB
2019-02-26 16:26:20.491544: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                  3815374848
InUse:                  3449875456
MaxInUse:               3449875456
NumAllocs:                  375475
MaxAllocSize:           1721096448

2019-02-26 16:26:20.491574: W tensorflow/core/common_runtime/bfc_allocator.cc:271] *******************************************************************************************_________
2019-02-26 16:26:20.519361: W tensorflow/core/framework/op_kernel.cc:1273] OP_REQUIRES failed at topk_op.cc:83 : Resource exhausted: OOM when allocating tensor with shape[39903,10783] and type int32 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[39903,10783] and type int32 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node TopKV2}} = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Max, strided_slice_4)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "model_keras_bc.py", line 85, in <module>
    recall_k=100, precision_k=100, ndcg_k=100)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/eval.py", line 156, in fit_and_eval
    predicted_ranks = model.predict_rank(user_features=user_features, item_features=item_features)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/tensorrec.py", line 731, in predict_rank
    rankings = self.tf_rankings.eval(session=get_session())
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 713, in eval
    return _eval_using_default_session(self, feed_dict, self.graph, session)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5157, in _eval_using_default_session
    return session.run(tensors, feed_dict)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[39903,10783] and type int32 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node TopKV2 (defined at /home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/recommendation_graphs.py:81)  = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Max, strided_slice_4)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.


Caused by op 'TopKV2', defined at:
  File "model_keras_bc.py", line 85, in <module>
    recall_k=100, precision_k=100, ndcg_k=100)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/eval.py", line 155, in fit_and_eval
    interactions=train_interactions, **fit_kwargs)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/tensorrec.py", line 537, in fit
    n_sampled_items=n_sampled_items)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/tensorrec.py", line 605, in fit_partial
    self._build_tf_graph(n_user_features=n_user_features, n_item_features=n_item_features)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/tensorrec.py", line 454, in _build_tf_graph
    self.tf_rankings = rank_predictions(tf_prediction=self.tf_prediction)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/recommendation_graphs.py", line 81, in rank_predictions
    tf_indices_of_ranks = tf.nn.top_k(tf_prediction, k=tf_prediction_item_size)[1]
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2359, in top_k
    return gen_nn_ops.top_kv2(input, k=k, sorted=sorted, name=name)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 7701, in top_kv2
    "TopKV2", input=input, k=k, sorted=sorted, name=name)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[39903,10783] and type int32 on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node TopKV2 (defined at /home/alex/anaconda3/envs/recsys/lib/python3.6/site-packages/tensorrec/recommendation_graphs.py:81)  = TopKV2[T=DT_FLOAT, sorted=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](Max, strided_slice_4)]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

This is still with the Book Crossing dataset, and still on the GeForce GTX 1060 with 6GB of GPU memory.
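Back-of-the-envelope (my own arithmetic, nothing from the docs), the failed allocation is just the size of the full ranking matrix that predict_rank builds over every user/item pair:

# The OOM tensor is shape [39903, 10783] int32 (from the error above).
n_users, n_items = 39903, 10783
gib = n_users * n_items * 4 / 2**30   # 4 bytes per int32
print(round(gib, 2))                   # ~1.6 GiB, matching the 1.60GiB allocation request

With two chunks of that size already in use (the 3.21GiB shown in the allocator summary) against the ~3.6GiB the allocator has available on this card, the TopKV2 call has nowhere to go, regardless of the fit-time user_batch_size.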
