
DOC: AsyncChromiumLoader instructions do not work in Windows Jupyter notebook #21246

Open
2 tasks done
mieslep opened this issue May 3, 2024 · 1 comment
Labels
🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder


@mieslep
Contributor

mieslep commented May 3, 2024

Checklist

  • I added a very descriptive title to this issue.
  • I included a link to the documentation page I am referring to (if applicable).

Issue with current documentation:

On this page:

https://python.langchain.com/docs/integrations/document_loaders/async_chromium/

using this modified notebook cell:

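from langchain_community.document_loaders import AsyncChromiumLoader
import nest_asyncio

# Allow nested event loops so the loader's synchronous load() can run
# inside Jupyter's already-running event loop.
nest_asyncio.apply()

urls = ["https://www.wsj.com"]
loader = AsyncChromiumLoader(urls)
docs = loader.load()
# Show the first 100 characters of the first document.
docs[0].page_content[0:100]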

I get this stacktrace:

Task exception was never retrieved
future: <Task finished name='Task-19' coro=<Connection.run() done, defined at c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_connection.py:265> exception=NotImplementedError()>
Traceback (most recent call last):
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\tasks.py", line 277, in __step
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_connection.py", line 272, in run
    await self._transport.connect()
  File "c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_transport.py", line 133, in connect
    raise exc
  File "c:\Users\phil\git\graphvec\.venv\Lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
    self._proc = await asyncio.create_subprocess_exec(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\subprocess.py", line 223, in create_subprocess_exec
    transport, protocol = await loop.subprocess_exec(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 1708, in subprocess_exec
    transport = await self._make_subprocess_transport(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\phil\AppData\Local\Programs\Python\Python311\Lib\asyncio\base_events.py", line 503, in _make_subprocess_transport
    raise NotImplementedError
NotImplementedError

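From some searching online, this seems to be specific to Windows. The traceback ends in asyncio's base_events._make_subprocess_transport, which raises NotImplementedError when the event loop does not support subprocesses, and on Windows the selector-based event loop does not support them.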
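One way to check the Windows hypothesis is to try spawning a trivial subprocess under each event loop type and see which one raises NotImplementedError, the same error as in the traceback. This is a minimal sketch; loop_can_spawn and _spawn_trivial_subprocess are hypothetical helpers, not part of LangChain or Playwright. Run it as a plain .py script, because calling run_until_complete from a notebook cell fails while the kernel's own loop is running. Per the asyncio documentation, SelectorEventLoop on Windows does not support subprocesses, so it would be expected to report False there, while ProactorEventLoop reports True.

```python
import asyncio
import sys
from typing import Callable


async def _spawn_trivial_subprocess() -> None:
    # Start the current Python interpreter with a no-op script and wait for it.
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", "pass")
    await proc.wait()


def loop_can_spawn(loop_factory: Callable[[], asyncio.AbstractEventLoop]) -> bool:
    """Return False if this loop type raises NotImplementedError for subprocesses."""
    loop = loop_factory()
    try:
        loop.run_until_complete(_spawn_trivial_subprocess())
        return True
    except NotImplementedError:
        # Same failure mode as base_events._make_subprocess_transport above.
        return False
    finally:
        loop.close()


if __name__ == "__main__":
    if sys.platform == "win32":
        print("selector:", loop_can_spawn(asyncio.SelectorEventLoop))
        print("proactor:", loop_can_spawn(asyncio.ProactorEventLoop))
    else:
        print("default:", loop_can_spawn(asyncio.new_event_loop))
```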

If I put the code into a .py file and run it directly, it works, so the environment is installed correctly; the problem is specific to running it in Jupyter.

Idea or request for content:

  • If this is not supported on Windows, the documentation should say so.
  • If there is a Windows-specific workaround, it should be documented.
  • Ideally, of course, the example would work as copy-pasted on all platforms.
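A possible Windows workaround, sketched here as an untested assumption rather than a documented fix: run the scraping coroutine on a brand-new event loop in a worker thread, explicitly constructing a ProactorEventLoop on Windows (the loop type that supports subprocesses) instead of relying on the global event loop policy, which the Jupyter kernel may have switched to the selector loop. The helper names run_in_fresh_loop and _run_on_new_loop are hypothetical. Running on a separate thread also avoids the "loop already running" conflict, so nest_asyncio is not needed.

```python
import asyncio
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def _run_on_new_loop(coro_factory: Callable[[], Awaitable[T]]) -> T:
    # On Windows, build a ProactorEventLoop directly (it supports subprocesses)
    # rather than asking the global policy, which may select the selector loop.
    # On other platforms the default loop already supports subprocesses.
    if sys.platform == "win32":
        loop = asyncio.ProactorEventLoop()
    else:
        loop = asyncio.new_event_loop()
    try:
        asyncio.set_event_loop(loop)
        return loop.run_until_complete(coro_factory())
    finally:
        asyncio.set_event_loop(None)
        loop.close()


def run_in_fresh_loop(coro_factory: Callable[[], Awaitable[T]]) -> T:
    """Run a coroutine on its own event loop in a worker thread and return its result.

    The caller blocks until the coroutine finishes; exceptions raised inside the
    coroutine are re-raised here by Future.result().
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(_run_on_new_loop, coro_factory).result()
```

In the notebook this might look like html = run_in_fresh_loop(lambda: loader.ascrape_playwright(urls[0])), assuming the installed langchain_community version exposes an ascrape_playwright coroutine on AsyncChromiumLoader; check your version before relying on it. Calling loader.load() from the worker thread instead would likely not help, since its internal loop would still come from the global policy.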
@dosubot dosubot bot added the 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder label May 3, 2024
@eyurtsev
Collaborator

eyurtsev commented May 3, 2024

The Chromium loader likely needs to be rewritten to support true async. It currently relies on nest_asyncio, which may not support Windows (and appears to be an archived project now).
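To illustrate what an async-first loader could look like without nest_asyncio, here is a hypothetical sketch (SketchAsyncLoader, ascrape and aload are illustrative names, not LangChain's API, and ascrape is a placeholder with no real browser). The async method does the work on the caller's loop, and the synchronous load() only starts a loop via asyncio.run when none is running; inside an already-running loop it asks the caller to await aload() instead of nesting loops.

```python
import asyncio
from typing import List


class SketchAsyncLoader:
    """Hypothetical loader showing an async-first design without nest_asyncio."""

    def __init__(self, urls: List[str]) -> None:
        self.urls = urls

    async def ascrape(self, url: str) -> str:
        # Placeholder for real browser work (e.g. a Playwright async session).
        await asyncio.sleep(0)
        return f"<html>{url}</html>"

    async def aload(self) -> List[str]:
        # Scrape all URLs concurrently on the caller's event loop.
        return list(await asyncio.gather(*(self.ascrape(u) for u in self.urls)))

    def load(self) -> List[str]:
        # The sync entry point starts a loop only when none is running.
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            return asyncio.run(self.aload())
        raise RuntimeError(
            "An event loop is already running (e.g. in Jupyter); "
            "use 'await loader.aload()' instead."
        )
```

In a notebook cell this design would be used as docs = await loader.aload(), relying on IPython's top-level await. It removes the nest_asyncio dependency, but on its own it would not change which event loop type the kernel runs, so the Windows subprocess limitation could still apply.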
