Why the JupyterNoteBook cell is still running after the progress bars have been finished? #248

Open
thatmee opened this issue Aug 31, 2023 · 3 comments

Comments

@thatmee

thatmee commented Aug 31, 2023

(Screenshot: Snipaste_2023-08-31_17-36-43)
Here is a screenshot of the cell. All progress bars finished in about 3 minutes, but the cell kept running and only stopped after 8 minutes. Why does this happen? Thank you!

@thatmee thatmee changed the title Why the JupyterNoteBookcell is still running after the progress bars have been finished. Why the JupyterNoteBook cell is still running after the progress bars have been finished? Aug 31, 2023
@tmvfb

tmvfb commented Oct 7, 2023

I second this. In my case, the dataset has 100+ million rows, and the delay is much longer than 8 minutes.
As this seems to be a relatively new issue, it may be connected with using Jupyter Notebook 7 or Pandas 2.0+.

UPD: Allocating more RAM to WSL, hiding the progress bar, setting use_memory_fs=True, and running the script in a new notebook significantly sped up the process, so the problem may be related to RAM rather than to the library.
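
For readers who want to try the same settings, here is a minimal sketch of the two pandarallel options mentioned above. `progress_bar` and `use_memory_fs` are keyword arguments of `pandarallel.initialize`; the `PANDARALLEL_SETTINGS` dict and the `init_pandarallel` helper are hypothetical names introduced only for this example. Note that `use_memory_fs=True` may raise an error on systems where no memory file system (`/dev/shm`) is available.

```python
# Settings reported above as helpful; the values are taken from the comment.
PANDARALLEL_SETTINGS = {
    "progress_bar": False,  # hide the progress bars
    "use_memory_fs": True,  # exchange data through the memory file system
}


def init_pandarallel(**overrides):
    """Initialize pandarallel with the settings above (hypothetical helper)."""
    # Imported lazily so this sketch can be loaded without pandarallel installed.
    from pandarallel import pandarallel  # third-party: pip install pandarallel

    settings = {**PANDARALLEL_SETTINGS, **overrides}
    pandarallel.initialize(**settings)
    return settings
```

After calling `init_pandarallel()` once in a fresh notebook, a call such as `df.timestamp.parallel_apply(to_correct_format)` is used unchanged. Allocating more RAM to WSL happens outside Python and is not covered by this sketch.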

Some details:
Operating System: Ubuntu 22.04.2 LTS (WSL 2)
Python version: 3.10.12
Pandas version: 2.1.1
Pandarallel version: 1.6.5
Jupyter Notebook version: 7.0.2

This is what I get if I interrupt the cell execution once all workers are at 100%:

Process ForkPoolWorker-14:
Process ForkPoolWorker-15:
Process ForkPoolWorker-16:
Process ForkPoolWorker-17:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 365, in get
    res = self._reader.recv_bytes()
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
    with self._rlock:
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 414, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.10/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt
KeyboardInterrupt

During handling of the above exception, another exception occurred:


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
    with self._rlock:
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 364, in get
    with self._rlock:
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
  File "/usr/lib/python3.10/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
  File "/usr/lib/python3.10/multiprocessing/queues.py", line 365, in get
    res = self._reader.recv_bytes()
KeyboardInterrupt
KeyboardInterrupt
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
KeyboardInterrupt

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
KeyboardInterrupt
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.10/multiprocessing/pool.py", line 114, in worker
    task = get()
KeyboardInterrupt
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
    [... skipping hidden 1 frame]

Cell In[22], line 1
----> 1 convert_timestamp(train)

File ~/sleep/lib/parallel.py:31, in convert_timestamp(df)
     30 gc.collect()
---> 31 df["timestamp"] = df.timestamp.parallel_apply(to_correct_format)

File ~/.local/lib/python3.10/site-packages/pandarallel/core.py:444, in parallelize_with_pipe.<locals>.closure(data, user_defined_function, *user_defined_function_args, **user_defined_function_kwargs)
    442         progress_bars.set_error(worker_index)
--> 444 results = results_promise.get()
    446 return data_type.reduce(results, reduce_extra)

File /usr/lib/python3.10/multiprocessing/pool.py:768, in ApplyResult.get(self, timeout)
    767 def get(self, timeout=None):
--> 768     self.wait(timeout)
    769     if not self.ready():

File /usr/lib/python3.10/multiprocessing/pool.py:765, in ApplyResult.wait(self, timeout)
    764 def wait(self, timeout=None):
--> 765     self._event.wait(timeout)

File /usr/lib/python3.10/threading.py:607, in Event.wait(self, timeout)
    606 if not signaled:
--> 607     signaled = self._cond.wait(timeout)
    608 return signaled

File /usr/lib/python3.10/threading.py:320, in Condition.wait(self, timeout)
    319 if timeout is None:
--> 320     waiter.acquire()
    321     gotit = True

KeyboardInterrupt: 

During handling of the above exception, another exception occurred:

KeyboardInterrupt                         Traceback (most recent call last)
KeyboardInterrupt: 
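
The last frames above show the notebook process blocked in `ApplyResult.get()` with no timeout, waiting for the pool to deliver all results. The following standard-library sketch illustrates that waiting pattern only; it does not reproduce the delay or use pandarallel. It assumes a hypothetical `slow_square` task and uses `ThreadPool` in place of the fork-based process pool seen in the traceback, so that it runs anywhere. Polling with a timeout is one way to observe how long the parent keeps waiting.

```python
import time
from multiprocessing import TimeoutError
from multiprocessing.pool import ThreadPool


def slow_square(x):
    """Hypothetical stand-in for the real per-row function."""
    time.sleep(0.2)
    return x * x


def run_with_polling(values, poll_seconds=0.05):
    """Submit work asynchronously and poll the result instead of blocking forever."""
    with ThreadPool(processes=2) as pool:
        promise = pool.map_async(slow_square, values)
        polls = 0
        while True:
            try:
                # Same call as in the traceback, but with a timeout.
                results = promise.get(timeout=poll_seconds)
                return results, polls
            except TimeoutError:
                polls += 1  # still waiting for the pool to finish
```

Calling `run_with_polling([1, 2, 3])` returns the squared values together with the number of timeouts that elapsed while the parent was waiting.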

@nalepae
Owner

nalepae commented Jan 23, 2024

Pandaral·lel is looking for a maintainer!
If you are interested, please open a GitHub issue.

@shermansiu

@thatmee You mentioned that the problem is likely related to the amount of RAM available and not a problem with pandarallel itself. Do you have any other problems?

I'm tempted to close this issue otherwise (or if there isn't a reply in a while).
