Populate home dir if it is empty #1478

Open
benz0li opened this issue Oct 12, 2021 · 26 comments · Fixed by #1512
Labels
type:Enhancement A proposed enhancement to the docker images

Comments

@benz0li
Contributor

benz0li commented Oct 12, 2021

Which Docker images is this feature applicable to?

jupyter/base-notebook

What changes do you propose?

Populate home dir if it is empty.

How will this change affect users?

If a user bind mounts a directory to /home/{raw_username}, it will be populated from /home/jovyan on initial startup.
ℹ️ NB_USER is set to {raw_username}, the unescaped username (e.g. from JupyterHub).

benz0li added the type:Enhancement label on Oct 12, 2021
@benz0li
Contributor Author

benz0li commented Oct 12, 2021

diff --git a/base-notebook/start.sh b/base-notebook/start.sh
index 05c1037..9e59ab2 100644
--- a/base-notebook/start.sh
+++ b/base-notebook/start.sh
@@ -123,6 +123,15 @@ if [ "$(id -u)" == 0 ] ; then
                     exit 1
                 fi
             fi
+        # The home directory could be bind mounted. Populate it if it is empty
+        elif [[ "$(ls -A "/home/${NB_USER}" 2> /dev/null)" == "" ]]; then
+            _log "Populating home dir /home/${NB_USER}..."
+            if cp -a /home/jovyan/. "/home/${NB_USER}/"; then
+                _log "Success!"
+            else
+                _log "Failed to copy data from /home/jovyan to /home/${NB_USER}!"
+                exit 1
+            fi
         fi
         # Ensure the current working directory is updated to the new path
         if [[ "${PWD}/" == "/home/jovyan/"* ]]; then

|| ln -s /home/jovyan "/home/${NB_USER}" is obsolete, because the home dir is copied [and not moved] now.

@benz0li
Contributor Author

benz0li commented Oct 12, 2021

In my custom JupyterLab docker stack a duplicate of /home/jovyan is kept at /var/tmp/jovyan. If a user bind mounts to /home/jovyan itself and NB_USER is set to jovyan, the home dir is pre-populated by the following script:

/usr/local/bin/start-notebook.d/populate.sh:

#!/bin/bash

set -e

if [[ "$(ls -A "/home/jovyan" 2> /dev/null)" == "" ]]; then
    cp -a /var/tmp/jovyan/. /home/jovyan
fi

@maresb
Contributor

maresb commented Nov 8, 2021

Which are the files here that you care about? Is your home directory totally empty, like missing .bashrc? Or is it the work folder? I'm trying to understand what test cases we should write.

@benz0li
Contributor Author

benz0li commented Nov 8, 2021

Yes, the home directory is entirely empty if it is bind mounted to /home/{raw_username}. Because the mount point already exists, the check ! -e "/home/${NB_USER}" evaluates to false, so nothing is copied from /home/jovyan/ to /home/${NB_USER}.
ℹ️ NB_USER is set to {raw_username}

It's the whole content of /home/jovyan/ I care about.
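The difference between "exists" and "is empty" can be sketched as follows. This is only an illustration: the helper name is_empty_dir and the temporary directory standing in for a bind mount are assumptions, not code from start.sh. The emptiness test mirrors the ls -A check in the diff above.

```sh
# Hypothetical helper (not part of start.sh): succeeds when a directory
# exists but has no entries at all, dotfiles included.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1" 2> /dev/null)" ]
}

# A throwaway directory stands in for a freshly bind mounted home dir.
demo_home="$(mktemp -d)"

if [ -e "${demo_home}" ]; then
    echo "exists, so a '! -e' check skips it"
fi
if is_empty_dir "${demo_home}"; then
    echo "empty, so it still needs to be populated"
fi

rm -rf "${demo_home}"
```

Both messages are printed for an empty mount point, which is the case the proposed elif branch is meant to catch.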

@benz0li
Contributor Author

benz0li commented Nov 8, 2021

For Jupyter's base-notebook, this would take care of:

-rw-rw-r-- 1 jovyan users  220 Feb 25  2020 .bash_logout
-rw-rw-r-- 1 jovyan users 3823 Nov  8 05:07 .bashrc
drwsrwsr-x 2 jovyan users 4096 Nov  8 05:09 .cache
drwsrwsr-x 1 jovyan users 4096 Nov  8 05:08 .conda
drwsrws--- 3 jovyan users 4096 Nov  8 05:09 .config
drwsrws--- 2 jovyan users 4096 Nov  8 05:09 .jupyter
-rw-rw-r-- 1 jovyan users  807 Feb 25  2020 .profile
-rw-rw-r-- 1 jovyan users  227 Nov  8 05:07 .wget-hsts
drwsrwsr-x 2 jovyan users 4096 Nov  8 05:07 work

For my custom JupyterLab image, this would take care of:

-rw-r--r--  1 jovyan users  220 Aug  4 20:25 .bash_logout
-rw-r--r--  1 jovyan users 3768 Nov  5 10:50 .bashrc
drwxr-xr-x  3 jovyan users 4096 Nov  5 10:50 .local
drwxr-xr-x 12 jovyan users 4096 Nov  5 10:50 .oh-my-zsh
-rw-r--r--  1 jovyan users  807 Aug  4 20:25 .profile
-rw-r--r--  1 jovyan users 4219 Nov  5 10:50 .zshrc

This applies only on the first startup, and only if the mounted home directory at /home/${NB_USER} is empty.

@maresb
Contributor

maresb commented Nov 10, 2021

@consideRatio the merge seems to have unintentionally caused this issue to be closed prematurely. (Probably I shouldn't have had "close #1478" in my description! 😆)

@benz0li
Contributor Author

benz0li commented Mar 9, 2022

> Or is it the work folder?

I find the suggestion of -v "${PWD}":/home/jovyan/work at https://github.com/jupyter/docker-stacks#quick-start > 'Example 2' rather odd, because this does not preserve dotfiles (user-specific application configuration) - e.g. ~/.local, ~/.config, etc.

That's one of many reasons why I'm building my own Jupyter docker stack incorporating the changes listed above and mounting the entire home directory.

@benz0li
Contributor Author

benz0li commented Apr 12, 2022

Closing due to inactivity.

@mathbunnyru
Member

mathbunnyru commented Aug 23, 2023

@benz0li I do like this proposal (sorry I didn't answer 2 years ago 🙂).

I think the best way for us would be to do the following:

  1. Back up /home/${NB_USER} as the last stage of the Dockerfile.
  2. In the start.sh script, restore these files to /home/{raw_username} (and we'll need to fix permissions in some cases).
    We'll only restore a file or directory if it doesn't already exist.
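A minimal sketch of the restore step described above, assuming a backup directory created at image build time. The function name restore_missing and both arguments are hypothetical, and the permission fixes mentioned in step 2 are left out.

```sh
# Hypothetical helper: copy each top-level entry of a backup directory into
# the home directory, skipping anything that already exists there.
restore_missing() (
    backup_dir="$1"
    home_dir="$2"
    # The three patterns cover regular names, dotfiles, and names that start
    # with two dots; "." and ".." themselves are never matched.
    for entry in "${backup_dir}"/* "${backup_dir}"/.[!.]* "${backup_dir}"/..?*; do
        # A pattern that matched nothing is left as a literal string; skip it.
        [ -e "${entry}" ] || [ -L "${entry}" ] || continue
        target="${home_dir}/${entry##*/}"
        # Never overwrite what the user already has.
        if [ -e "${target}" ] || [ -L "${target}" ]; then
            continue
        fi
        cp -a "${entry}" "${target}"
    done
)
```

It would be called with the backup location and /home/${NB_USER}. Because the check is per top-level entry, an existing directory such as .config is left untouched even if the backup holds newer files inside it.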

I see 2 current issues, which will be solved by such a behaviour:

  1. New conda environment issue with another username #1792
  2. Colorize terminal through /etc/bash.bashrc as /home/jovyan/.bashrc tends to be wiped #815

> > Or is it the work folder?
>
> I find the suggestion of -v "${PWD}":/home/jovyan/work at https://github.com/jupyter/docker-stacks#quick-start > 'Example 2' rather odd, because this does not preserve dotfiles (user-specific application configuration) - e.g. ~/.local, ~/.config, etc.
>
> That's one of many reasons why I'm building my own Jupyter docker stack incorporating the changes listed above and mounting the entire home directory.

I checked that the example works fine though, because:

  1. We mount work subdir
  2. We don't change the NB_USER.
    In such a case we have default .bashrc and other files from the image.

@maresb
Contributor

maresb commented Aug 23, 2023

A fresh thought on this... Linux has a default template in /etc/skel for templating home directories of new users. I'm not sure whether or not it would make sense to use this in this case as the "backup" of /home/${NB_USER}.

@benz0li
Contributor Author

benz0li commented Aug 23, 2023

@mathbunnyru Check the difference from https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/scripts/usr/local/bin/start.sh to the current file of this repository.
ℹ️ There will be some additional stuff because I use Zsh as default shell and have code-server installed.

I populate with https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/scripts/usr/local/bin/start-notebook.d/populate.sh using a start-notebook.d hook.

@mathbunnyru
Member

Thanks @maresb, that's a good point.

We have some files that are more specific to our images, like the .jupyter subdir, which might create environments added by users.
Is it a good idea to put such files/dirs in /etc/skel as well?

Unfortunately, I don't know if we can actually use /etc/skel.
If a user has a custom NB_USER and mounts the homedir, what is the behaviour of /etc/skel?
Is it gonna ignore existing files/dirs, break, or overwrite?

@mathbunnyru
Member

mathbunnyru commented Aug 23, 2023

Thanks @benz0li!

Could you please tell me how you back up the /home/jovyan files?
Manually in each image (adding newly created or changed files), or automatically by archiving the whole directory (using some backup.sh-like script)?

@mathbunnyru
Member

mathbunnyru commented Aug 23, 2023

It's nice to see someone using our start-notebook.d startup hook, because it is not tested at all 😆

@maresb
Contributor

maresb commented Aug 23, 2023

Using /etc/skel may be a bad idea. It will depend on the use case. Also, changing /etc/skel could interfere with the behavior of existing images. I just wanted to point out its existence, but unfortunately I don't have the headspace at the moment to evaluate the merits in this instance.

@benz0li
Contributor Author

benz0li commented Aug 23, 2023

> Could you please tell me how you back up the /home/jovyan files?

https://github.com/b-data/jupyterlab-python-docker-stack/blob/e0295b86406246873f73d5bc5763866729d9c9d6/base/latest.Dockerfile#L270-L271

> Automatically in each image (adding newly-created or changed files) or just by archiving the whole directory?

I originally wanted this done only in the so-called base images of my JupyterLab docker stacks.

There is one exception, though: https://github.com/b-data/jupyterlab-r-docker-stack/blob/3e81912c6763d0f198901ac0c19a5ce027cfa03f/qgisprocess/latest.Dockerfile#L249-L250

@mathbunnyru
Member

I actually changed my mind - I think it's better to manually back up the files we want to preserve (like .bashrc), rather than backing up the whole dir each time:

  1. It's gonna be more explicit.
  2. It's easy to see which image a file was created in.
  3. We have more control, and we probably don't want to copy something like .lesshst or the wget history file and so on.
  4. I think we'll have most of the copying in the docker-stacks-foundation image, and other images won't change at all.
  5. Downside: if someone changes a file/dir location, then the image will break, which is obviously rare and not such a big deal.
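The explicit approach in the list above could look roughly like this. The function name backup_selected, its argument order, and the assumption that every name is a top-level entry of the home directory are illustrative choices, not something taken from the docker-stacks repository.

```sh
# Hypothetical helper: back up an explicit list of top-level entries from a
# home directory. A missing entry is an error, so a moved or renamed file is
# noticed at build time instead of being silently dropped from the backup.
backup_selected() (
    src_dir="$1"
    backup_dir="$2"
    shift 2
    mkdir -p "${backup_dir}"
    for name in "$@"; do
        if [ ! -e "${src_dir}/${name}" ]; then
            echo "backup_selected: ${src_dir}/${name} not found" >&2
            exit 1
        fi
        cp -a "${src_dir}/${name}" "${backup_dir}/${name}"
    done
)
```

Files such as .lesshst are never copied unless they are named explicitly, and the failure on a missing entry corresponds to the downside in point 5.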

@mathbunnyru
Member

Thanks @benz0li!
I see you're copying the whole HOME in base, and one specific dir in some inherited image.

@benz0li
Contributor Author

benz0li commented Aug 23, 2023

> If a user has a custom NB_USER and mounts the homedir, what is the behaviour of /etc/skel?
> Is it gonna ignore existing files/dirs, break or overwrite?

If I remember correctly, the bind mounted home directory is not populated [with /etc/skel].

I am quite sure that was the reason I copied to /var/backups/skel.

@mathbunnyru
Member

mathbunnyru commented Aug 23, 2023

> If I remember correctly, the bind mounted home directory is not populated.

Yes, you're right, but we're calling usermod/useradd-like commands on the mounted dir, which might or might not copy/overwrite files from /etc/skel in the new userdir.
So, @maresb was suggesting a slightly different implementation (which would essentially have the same goal and result).

@mathbunnyru
Member

@benz0li Also, if you can share issues (if you had any) or something we need to be aware of (when using such a backup approach), that would be great.
I appreciate your feedback and ideas.

As far as I understand, you don't manually change the ownership of the backup to the new user (if NB_UID was manually set, for example), which might not work, I suppose (unless the user runs with CHOWN_HOME)?

@mathbunnyru
Member

mathbunnyru commented Aug 23, 2023

So, my plan is:

  1. Move run-hooks to a separate file
  2. Test this file
  3. Manually add some files to a backup, and add a populate.sh script.
  4. Also, add tests for the many possible use cases:
    • NB_USER set or not
    • NB_UID set or not
    • CHOWN_HOME set or not
    • New ${HOME} is mounted or not (where some files might be already present).
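To get a feel for the size of that test matrix, the combinations can be enumerated as below. This is only a sketch: the function name list_test_cases and the labels (default, custom, unset, yes, none, empty, prepopulated) are made up for illustration, and the mount dimension is split into three states to cover the "some files might be already present" case.

```sh
# Hypothetical enumeration of the scenarios in the plan above; each printed
# line could become one parametrized test case.
list_test_cases() (
    for nb_user in default custom; do
        for nb_uid in default custom; do
            for chown_home in unset yes; do
                for home_mount in none empty prepopulated; do
                    echo "NB_USER=${nb_user} NB_UID=${nb_uid} CHOWN_HOME=${chown_home} mount=${home_mount}"
                done
            done
        done
    done
)

list_test_cases | wc -l
```

With two states for each of the first three settings and three for the mount, this yields 2 × 2 × 2 × 3 = 24 cases.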

@benz0li
Contributor Author

benz0li commented Aug 23, 2023

> @benz0li Also, if you can share issues (if you had any) or something we need to be aware of (when using such a backup approach), that would be great. I appreciate your feedback and ideas.

Bind mounting a home directory is quite a delicate matter. My JupyterLab docker stacks allow bind mounting the same home directory by any image, so the init.sh scripts (before-notebook.d hook) were far trickier. E.g.

> As far as I understand, you don't manually change ownership of backup to new user (if NB_UID was manually set for example), which might not work, I suppose? (unless user runs with CHOWN_HOME).

Correct. See also https://github.com/b-data/jupyterlab-python-docker-stack#create-home-directory


Use case: https://demo.jupyter.b-data.ch

@benz0li
Contributor Author

benz0li commented Aug 23, 2023

> I checked that the example works fine though, because:
>
>   1. We mount work subdir
>   2. We don't change the NB_USER.
>     In such a case we have default .bashrc and other files from the image.

Correct. Exactly this example is the only exception.

mathbunnyru changed the title from "Populate home dir if it is empty" to "[ENH] - Populate home dir if it is empty" on Aug 28, 2023
mathbunnyru changed the title from "[ENH] - Populate home dir if it is empty" to "Populate home dir if it is empty" on Sep 10, 2023
@benz0li
Contributor Author

benz0li commented Feb 23, 2024

@mathbunnyru ℹ️ I found a way to enable bind mounting a subfolder of the home directory for arbitrary $NB_USERs and thus resolve b-data/jupyterlab-python-docker-stack#1.

Users can now choose whether to (bind) mount the entire home directory or just a subfolder within it.

@mathbunnyru
Member

@benz0li that's great, I see the issue this resolves.

To be honest, though, I don't understand everything about your implementation (I am not great at shell scripting).
I wonder if a Python implementation would make it better or worse.

If anyone wants to implement something similar in this repo, that would be really nice (but we will have to add extensive testing).


4 participants