Feature request: selectively remove a particular cell when converting from .md to .ipynb #1235

Open
agriyakhetarpal opened this issue May 13, 2024 · 2 comments

Comments

@agriyakhetarpal

agriyakhetarpal commented May 13, 2024

Description

Hi there! Thank you for Jupytext and its CLI; it has been quite useful to me recently for converting Jupytext-formatted Markdown files to IPyNB format. What I am going to describe might not be the best approach to what I am trying to do. At PyWavelets/pywt#741, I am trying to display Markdown notebooks with MyST-NB, a tool similar to nbsphinx (#119) for Sphinx-based documentation websites, and then convert them to IPyNB format when the documentation pages are built. That way I can use these notebooks with JupyterLite via jupyterlite-sphinx on a Sphinx-based webpage, because JupyterLite does not seem to support Markdown-based notebooks yet (jupyterlite/jupyterlite#731, #1225).

Essentially, I have a .md file that gets converted to .ipynb, then executed to produce its outputs and rendered by MyST-NB via Sphinx. A preliminary deployment can be viewed here: https://pywavelets--741.org.readthedocs.build/en/741/regression/wavelet.html. There, upon expanding the dropdown and viewing the notebook(s) (or downloading them), this big cell at the top is revealed (it is present because it existed in the .md file prior to conversion to .ipynb):

```{eval-rst}
.. currentmodule:: pywt

.. dropdown:: 🧑‍🔬 This notebook can be executed online. Click this section to try it out! ✨
    :color: success

    .. notebooklite:: dwt-idwt.ipynb
      :width: 100%
      :height: 600px
      :prompt: Open notebook

.. dropdown:: Download this notebook
    :color: info
    :open:

    Please use the following links to download this notebook in various formats:

    1. :download:`Download IPyNB (IPython Notebook) <dwt-idwt.ipynb>`
    2. :download:`Download Markdown Notebook (Jupytext) <dwt-idwt.md>`
```

Proposition

I was wondering if it is possible to hide this code block in particular when I perform this conversion, i.e., by letting Jupytext parse the cell to find some metadata of the form

```{eval-rst}
---
tags: [ignore]
---

This cell should not be present after conversion
```

or with Markdown cells, another way could be:

+++ {"tags": ["ignore"]}

This cell should not be present after conversion

+++

that can be added to this cell (and for other cells that I want not to be parsed). This is sort of related to #220, but it is also a fundamentally different issue and possibly unique ask since I wish to ignore a particular cell entirely during conversion from one file format to another (in this case Markdown to IPyNB, while #220 is most likely talking about other formats ➡️ Markdown and its flavours).
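To make the proposal concrete, here is a rough sketch of what "ignore cells tagged in the Markdown" could mean at the text level. It is not how Jupytext parses MyST Markdown; `split_markdown_cells` and `drop_tagged` are hypothetical helpers written for illustration, they assume the metadata after `+++` is valid JSON such as `{"tags": ["ignore"]}`, and they do not look inside fenced code blocks.

```python
import json


def split_markdown_cells(text):
    """Split Markdown text on `+++` block breaks (hypothetical helper).

    Returns a list of (metadata, source) pairs. Simplified on purpose:
    a `+++` line inside a fenced code block would be split as well.
    """
    cells = []
    metadata, lines = {}, []
    for line in text.splitlines():
        if line.startswith("+++"):
            # Close the current chunk and start a new one.
            cells.append((metadata, "\n".join(lines).strip()))
            payload = line[3:].strip()
            metadata = json.loads(payload) if payload else {}
            lines = []
        else:
            lines.append(line)
    cells.append((metadata, "\n".join(lines).strip()))
    # Drop chunks that contain no text at all.
    return [(meta, source) for meta, source in cells if source]


def drop_tagged(cells, tag="ignore"):
    """Keep only the cells whose `tags` metadata does not contain `tag`."""
    return [(meta, source) for meta, source in cells if tag not in meta.get("tags", [])]


text = """+++ {"tags": ["ignore"]}

This cell should not be present after conversion

+++

# Real content starts here
"""

cells = split_markdown_cells(text)
kept = drop_tagged(cells)
print([source for _, source in kept])
```

Only the heading cell survives the filter in this example; the tagged cell is dropped before any conversion would take place.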

Additional context

This is how it ends up looking on the above link when I render the notebook with the NotebookLite directive to be able to run it inside the browser:

[Screenshot: NotebookLite standard notebook UI containing a page from the Usage Examples section of the PyWavelets documentation]

Therefore, this is a cell that I don't wish to retain after conversion. The notebook would be cleaner if it started directly with the relevant contents (in this case, the Markdown heading of the notebook) and skipped this cell.


Any help or suggestions would be greatly appreciated, especially if the --pipe option or something similar could be put to use here. Thanks!

@mwouts
Owner

mwouts commented May 14, 2024

Hi @agriyakhetarpal, thank you for opening this issue and sharing this use case in such detail!

I will be happy to look further into this, but right now I only have time to give a few general comments. This is what came to my mind:

  1. Thanks for the links, I'll take more time to explore them. Indeed, JupyterLite is promising and it's really great to see it loading so fast (but I don't know yet how to make it work with Jupytext)
  2. Is it possible that the code that you want to filter from the notebook would be better placed somewhere else, e.g. in a Sphinx or a Jupyter Book plugin (if such a concept exists, I am not directly familiar with Sphinx)?
  3. In Jupytext we put a strong emphasis on the round trip conversion so there is no option to remove any cell inputs, but...
  4. I would be happy to help you write a short Python script that removes some specific cells (see below)
  5. Maybe you could also share your use case with the MyST-NB developers (see also 2. above)

Now re a script that would remove the cells with an "ignore" tag:

```python
import nbformat

notebook = nbformat.read('notebook_with_an_ignore_tag_in_some_cells.ipynb', as_version=4)
notebook.cells = [cell for cell in notebook.cells if 'ignore' not in cell.metadata.get('tags', [])]
nbformat.write(notebook, 'filtered_notebook.ipynb')
```
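For readers who want to try the same filtering without installing `nbformat`, a minimal sketch using only the standard library follows. It relies on the fact that an .ipynb file is a JSON document with a top-level `cells` list; `drop_cells_with_tag` and `filter_notebook_file` are hypothetical names, and unlike `nbformat` this sketch does no schema validation.

```python
import json
from pathlib import Path


def drop_cells_with_tag(notebook, tag="ignore"):
    """Return a copy of a notebook dict without the cells carrying `tag`."""
    kept = [
        cell
        for cell in notebook.get("cells", [])
        if tag not in cell.get("metadata", {}).get("tags", [])
    ]
    return {**notebook, "cells": kept}


def filter_notebook_file(src, dst, tag="ignore"):
    """Read an .ipynb file as JSON, drop the tagged cells, write the result."""
    notebook = json.loads(Path(src).read_text(encoding="utf-8"))
    filtered = drop_cells_with_tag(notebook, tag)
    Path(dst).write_text(json.dumps(filtered, indent=1) + "\n", encoding="utf-8")


# In-memory example: one tagged Markdown cell, one untagged code cell.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {"tags": ["ignore"]}, "source": "directives"},
        {"cell_type": "code", "metadata": {}, "source": "import pywt",
         "outputs": [], "execution_count": None},
    ],
}

filtered = drop_cells_with_tag(example)
print(len(example["cells"]), len(filtered["cells"]))
```

The original dict is left untouched, so the same notebook can be filtered with several tags in turn.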

@agriyakhetarpal
Author

agriyakhetarpal commented May 17, 2024

> Hi @agriyakhetarpal, thank you for opening this issue and sharing this use case in such detail!

Thank you for a warm-hearted response to my query, @mwouts!

> Thanks for the links, I'll take more time to explore them. Indeed, JupyterLite is promising and it's really great to see it loading so fast (but I don't know yet how to make it work with Jupytext)

Ah, JupyterLite is most likely embedding the interface for JupyterLab in a web browser, which already supports Jupytext notebooks. The kernel is WASM-powered; that is the difference more or less (I might not be completely correct about this, however).

> Is it possible that the code that you want to filter from the notebook would be better placed somewhere else, e.g. in a Sphinx or a Jupyter Book plugin (if such a concept exists, I am not directly familiar with Sphinx)?

> Maybe you could also share your use case with the MyST-NB developers

It should be possible to include the notebook from a different location or pair up the notebook from elsewhere. However, that would most likely require duplicating the contents of the notebook, and then inserting a cell that contains the directives for it to be able to host the notebook onto the webpage – I do not think that would be trivial. Interacting with the MyST-NB developers would be a good idea, sure!

> In Jupytext we put a strong emphasis on the round trip conversion so there is no option to remove any cell inputs, but...

I understand, I didn't think about that aspect at all when writing this even when I knew it – based on that, this feature request could also be a bit out of scope, considering that this is a pretty unique one, haha! Please feel free to close this if you feel so, or keep it open for visibility out of the chance that someone else might need this feature someday.

> I would be happy to help you write a short Python script that removes some specific cells (see below)

> Now re a script that would remove the cells with an "ignore" tag:

I managed to load this short snippet (with a minor change/fix) as a very minimal Sphinx extension coupled with a subprocess. Here's a brief version in case it is of interest to other readers:

```python
from sphinx.application import Sphinx
from pathlib import Path

# Directory containing this extension and the Markdown notebooks.
HERE = Path(__file__).parent


def preprocess_notebooks(app: Sphinx, *args, **kwargs):
    """Preprocess notebooks to convert them to IPyNB and remove Sphinx directives."""
    import subprocess
    import sys

    import nbformat

    print("Converting Markdown files to IPyNB...")
    # No shell is involved, so the "*.md" pattern reaches Jupytext unexpanded.
    subprocess.check_call(
        [
            sys.executable,
            "-m",
            "jupytext",
            "--to",
            "ipynb",
            f"{HERE / '*.md'}",
        ]
    )

    for notebook in HERE.glob('*.ipynb'):
        print(f"Removing Sphinx directives from {notebook}...")
        converted_notebook = nbformat.read(notebook, as_version=4)
        # Keep a cell unless its "ignore" metadata entry contains "true".
        converted_notebook.cells = [
            cell for cell in converted_notebook.cells
            if "true" not in cell.metadata.get("ignore", [])
        ]
        # Overwrite the converted notebook in place, then report success.
        nbformat.write(converted_notebook, notebook)
        print(f"Removed Sphinx directives from {notebook}.")


def setup(app):
    # Run the preprocessing once the builder has been initialised.
    app.connect("builder-inited", preprocess_notebooks)
```

and it works perfectly (edit: there is another error, which is unrelated to this process and is more about how the notebooks are copied to the built docs). Thank you!
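One caveat for anyone reusing the extension: the check `"true" not in cell.metadata.get("ignore", [])` expects the `ignore` entry to be a string or a list. If the metadata holds a boolean `True` instead, the `in` test raises a `TypeError`. A slightly more tolerant predicate could look like the sketch below; `is_ignored` is a hypothetical helper, and the three metadata spellings it accepts are assumptions about how cells might be marked, not conventions defined by Jupytext.

```python
def is_ignored(metadata):
    """Return True if a cell's metadata marks the cell for removal.

    Accepts three spellings (all assumptions made for illustration):
    {"ignore": "true"}, {"ignore": True}, and {"tags": ["ignore"]}.
    """
    flag = metadata.get("ignore", False)
    if isinstance(flag, str):
        flagged = flag.strip().lower() == "true"
    else:
        flagged = flag is True
    return flagged or "ignore" in metadata.get("tags", [])


# Plain dicts stand in for notebook cells here; nbformat's cell metadata
# is dict-like, so the same `.get` calls apply to it.
cells = [
    {"source": "a", "metadata": {"ignore": "true"}},
    {"source": "b", "metadata": {"ignore": True}},
    {"source": "c", "metadata": {"tags": ["ignore"]}},
    {"source": "d", "metadata": {}},
]

kept = [cell["source"] for cell in cells if not is_ignored(cell["metadata"])]
print(kept)
```

In the extension above, the list comprehension could then read `if not is_ignored(cell.metadata)`.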
