This repository has been archived by the owner on Dec 14, 2023. It is now read-only.

Fix disk space "leak" in extract-and-vector, limit disk usage on all services #809

pypt opened this issue Sep 24, 2021 · 2 comments

pypt commented Sep 24, 2021

extract-and-vector workers tend to fill up /var/tmp with gigabytes of pretty much identical files, each either 0 or 3,332,489 bytes in size:

$ docker exec -it 689b33c92426 bash
mediacloud@689b33c92426:/var/tmp$ ls -la
total 3314808
drwxrwxrwt 1 root       root         36864 Sep 24 16:07 .
drwxr-xr-x 1 root       root          4096 Jul 23 13:38 ..
-rw------- 1 root       root       3332489 Aug 31 06:34 jieba.cache
<...>
-rw------- 1 mediacloud mediacloud 3332489 Sep 13 03:20 tmp0fductxs
-rw------- 1 mediacloud mediacloud       0 Sep 23 11:17 tmp0gvvyssa
-rw------- 1 mediacloud mediacloud 3332489 Sep  9 19:46 tmp0hnsk5dl
<...>
-rw------- 1 mediacloud mediacloud 3332489 Sep 22 00:18 tmp0u38habl
-rw------- 1 mediacloud mediacloud       0 Sep 24 03:20 tmp0uaqfvu8
-rw------- 1 mediacloud mediacloud 3332489 Sep 12 05:47 tmp0uu31qqo
<...>
-rw------- 1 mediacloud mediacloud 3332489 Sep  4 08:31 tmp15uwsawk
-rw------- 1 mediacloud mediacloud       0 Sep 24 04:45 tmp163pb8nu
-rw------- 1 mediacloud mediacloud 3332489 Sep 16 20:46 tmp16nra4na
<...>
-rw------- 1 mediacloud mediacloud 3332489 Sep 12 08:41 tmp1toho273
-rw------- 1 mediacloud mediacloud       0 Sep 22 21:48 tmp1uc_jdij

It took me a while to notice that a temporary file with a random name and a temporary file with a not-so-random name have identical file sizes:

-rw------- 1 root       root       3332489 Aug 31 06:34 jieba.cache
<...>
-rw------- 1 mediacloud mediacloud 3332489 Sep 13 03:20 tmp0fductxs

Jieba is a Python library that does Chinese language tokenization for us. Since it relies on a dictionary, it has to pre-build a dictionary cache, which we do at image build time:

# Prebuild Jieba dictionary cache
COPY bin/build_jieba_dict_cache.py /
RUN \
/build_jieba_dict_cache.py && \
rm /build_jieba_dict_cache.py && \
true

but the resulting /var/tmp/jieba.cache never becomes accessible to the workers: the file gets created with root:root ownership and 600 permissions, while the workers run as mediacloud:mediacloud. Jieba therefore resorts to rebuilding that cache file on every call, leaving a fresh temporary file behind each time.
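The failure mode boils down to a simple reusability check. This is a hypothetical sketch (not Jieba's actual code) of the kind of test a library performs before falling back to rebuilding its cache into a fresh temporary file:

```python
import os

def cache_usable(path: str) -> bool:
    # A prebuilt cache can only be reused if it exists and the current
    # process can read it; a root:root file with 0600 permissions fails
    # this check for the non-root mediacloud worker user.
    return os.path.isfile(path) and os.access(path, os.R_OK)
```

In the container above, `cache_usable("/var/tmp/jieba.cache")` is False for the mediacloud user, so every worker invocation rebuilds the cache into a new `tmpXXXXXXXX` file.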

@jtotoole, could you:

  1. Fix jieba.cache's file permissions at build time so that the Jieba library can access it; you probably just need to run the cache creation script as a different user in the Dockerfile
  2. Limit the storage used by all service containers in production's docker-compose.yml where appropriate - you'll probably need storage_opt for that
    • We try to put at least a liberal cap on every service's resources so that a rogue service's CPU / RAM usage doesn't impact the host machine or crash other services. It now turns out that we can run out of disk space too. Systems monitoring would be a good (but reactive) way to deal with that; we also need to be proactive and not let containers burn through the host machine's root partition (or, if they do, the damage should stay isolated to the container).
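For item 1, one possible build-time fix, sketched against the Dockerfile excerpt above (it assumes the cache lands at /var/tmp/jieba.cache as in the listing, and that making it world-readable is acceptable inside the container):

```dockerfile
# Prebuild Jieba dictionary cache and make it readable by the
# mediacloud worker user so that Jieba reuses it instead of
# rebuilding it on every call
COPY bin/build_jieba_dict_cache.py /
RUN \
    /build_jieba_dict_cache.py && \
    chmod 644 /var/tmp/jieba.cache && \
    rm /build_jieba_dict_cache.py && \
    true
```

Alternatively, the `RUN` step could be executed under `USER mediacloud` so the cache is owned by the worker user from the start.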

pypt commented Oct 14, 2021

> what do you think is a sensible upper bound for storage_opt?

Dunno, as I don't really know what exactly it limits :) Is it a cap on the container image's own files? Or on the files that the container creates while running? Could you test it out for me? Also, what happens if the container exceeds that limit? Does it get killed, or does it just fail to write anything more?

Generally containers aren't supposed to do much writing to their own root filesystems while running (only to their volumes), and our containers don't write much anywhere. Some exceptions:

  • That Jieba thing; the cache file was supposed to be created just once at build time, so any repeated writing is a bug;
  • Elasticsearch (elasticsearch-base) specifies a JVM temporary directory (-Djava.io.tmpdir=/var/tmp), but I'm not sure what it writes there (if anything); could you SSH into servers that run images based on elasticsearch-base (only elk-elasticsearch for now), docker exec into a running container, and see what's in /var/tmp?
  • Python and Perl code use temporary files (via tempfile and File::Temp) here and there; could you grep for uses of those? Some users I remember are copy_from and copy_to, which accommodate PostgreSQL's COPY, so the CSVs being copied from / to can get quite large at times; we still want to ensure that there aren't too many of them around at once
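As a hint for that grep: the Python leak pattern to look for is a `tempfile.mkstemp()` (or `NamedTemporaryFile(delete=False)`) whose path is never unlinked, while the context-manager form cleans up automatically. A minimal illustration:

```python
import os
import tempfile

# Leaky pattern: mkstemp() hands back an open fd and a path, and the
# caller is responsible for deleting the file -- forget the unlink and
# the file stays in the temp dir forever (like the tmpXXXXXXXX files
# in the listing above; "tmp" is mkstemp()'s default prefix).
fd, path = tempfile.mkstemp()
os.close(fd)
assert os.path.exists(path)
os.unlink(path)  # the cleanup step that a leak omits

# Safe pattern: the context manager deletes the file on exit, even if
# an exception is raised in the body.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"scratch CSV data")
    scratch_path = f.name
assert not os.path.exists(scratch_path)
```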
  • elk-filebeat probably stores logs that it has collected somewhere. Could you check its usage too?

If, say, storage_opt limits the amount of data that gets written to a running container, maybe a good liberal upper cap could be 5 GB or something like that? Or 10 GB?

If you can, find out what exactly storage_opt does (and whether it works at all), report back here, and then we'll figure out what we can do with it.

> also, do you think the right place to put that is as part of the x-common-configuration section of apps/docker-compose.dist.yml, or would it be better to add it for only certain apps (e.g. extract-and-vector, our original problem child here)?

All apps can decide to write things, so we'd be looking into adding storage caps to all apps, I'd think.

x-common-configuration sets common environment variables on (most) services; I think storage_opt would have to be set somewhere else.
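If storage_opt does turn out to be usable, it's a per-service key in the Compose file format, so it would sit alongside each service's definition rather than in x-common-configuration. A sketch (hypothetical size value):

```yaml
services:
  extract-and-vector:
    storage_opt:
      size: "5G"
```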

@jtotoole
Contributor

> If you can, find out what exactly storage_opt does (and whether it works at all), report back here, and then we'll figure out what we can do with it.

Looks like this is for setting the container's rootfs size at creation time: https://docs.docker.com/engine/reference/commandline/run/#set-storage-driver-options-per-container

From the docs:

This option is only available for the devicemapper, btrfs, overlay2, windowsfilter and zfs graph drivers. For the devicemapper, btrfs, windowsfilter and zfs graph drivers, user cannot pass a size less than the Default BaseFS Size. For the overlay2 storage driver, the size option is only available if the backing fs is xfs and mounted with the pquota mount option. Under these conditions, user can pass any size less than the backing fs size.

The problem is that it only works for overlay2 over xfs, and in our case we use ext4, so this option isn't compatible with our setup. Per our discussion earlier, I'm just gonna go ahead and fix the jieba cache issue and call it a day.

@jtotoole jtotoole removed their assignment Dec 1, 2021