Using requests inside the mapped function causes troubles. #225

Open
zenodallavalle opened this issue Feb 12, 2023 · 1 comment

zenodallavalle commented Feb 12, 2023

General

  • OS: macOS Ventura 13.1
  • Python: 3.10.9
  • Pandas: 1.5.2
  • Pandarallel: 1.6.4

Acknowledgement

  • My issue is NOT present when using pandas alone (without pandarallel)

Bug description

Calling requests inside a function passed to parallel_map causes problems. The following error is reported:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.

Observed behavior

The function gets stuck.

Expected behavior

The function runs fine.

Minimal but working code sample to ease bug fix for pandarallel team

Create a separate file called test_fns.py to hold the test functions:

class Downloader:
    def __init__(self, url) -> None:
        self.url = url

    def download(self) -> str:
        import requests

        r = requests.get(self.url)
        assert r.status_code == 200
        return r.text[:10]


def docs_example(x) -> float:
    import math
    
    return math.sin(x**2) + math.sin(x**2)


def request(x) -> str:
    return Downloader('https://www.google.com').download()

Create another file that contains the actual script:

import test_fns
import pandas as pd

from pandarallel import pandarallel

pandarallel.initialize(progress_bar=True)

if __name__ == '__main__':
    source = pd.Series(range(10))
    source.parallel_map(test_fns.docs_example)
    source.parallel_map(test_fns.request)

Other considerations:

Starting the worker processes in spawn mode solves the problem. Spawn mode can be forced by overriding the multiprocessing context before initializing pandarallel:

import pandarallel
pandarallel.core.CONTEXT = pandarallel.core.multiprocessing.get_context('spawn')
pandarallel.pandarallel.initialize(progress_bar=True)
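For readers unfamiliar with start methods, here is a minimal standard-library sketch of the same idea, independent of pandarallel. It assumes the hang comes from forked children touching CoreFoundation, as the error message suggests. The helper choose_start_method and the worker square are hypothetical names used only for illustration; they are not part of pandarallel.

```python
import multiprocessing
import sys


def choose_start_method(platform: str = sys.platform) -> str:
    """Hypothetical helper: prefer 'spawn' on macOS, keep the default elsewhere."""
    if platform == "darwin":
        # A spawned child starts a fresh interpreter instead of inheriting
        # the parent's forked state.
        return "spawn"
    # On other platforms, return whatever start method is already the default.
    return multiprocessing.get_start_method()


def square(x: int) -> int:
    """Trivial worker; it must live at module level so spawned children can import it."""
    return x * x


if __name__ == "__main__":
    # get_context returns a context object bound to one start method,
    # without changing the global default.
    ctx = multiprocessing.get_context(choose_start_method())
    with ctx.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))
```

With spawn, the mapped function and its arguments must be picklable and importable from the child, which is why the worker is defined at module level and the pool is created under the __main__ guard.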

Another strange aspect: if the function is called once in the main process (for example test_fns.request('')) before parallel_map is applied, or if any request is made beforehand (for instance requests.get('http://github.com')), everything runs fine.

Owner

nalepae commented Jan 23, 2024

Pandaral·lel is looking for a maintainer!
If you are interested, please open a GitHub issue.
