Azure blob secondary storage #1152

Open
ahojnnes opened this issue Aug 29, 2022 · 17 comments
Labels
feature New or improved feature

Comments

@ahojnnes

ahojnnes commented Aug 29, 2022

Is there interest in accepting a pull request that adds Azure Blob Storage as a secondary backend? The idea would be to use the existing httplib third-party library and implement the necessary put/get/delete calls, given an Azure connection string and container name. An alternative could be to use the official azure-sdk-for-cpp as a new dependency, but that would be quite a heavy dependency to pull in for just a few simple REST calls. Before embarking on this journey, I would like to get some positive feedback from the maintainers of this repository that such a feature would be merged (if adhering to the coding standards, etc.).

@ahojnnes ahojnnes added the feature New or improved feature label Aug 29, 2022
@jrosdahl
Member

Thanks for asking. Yes, an implementation that doesn't introduce new dependencies will most likely be welcome.

@afbjorklund
Contributor

I think you can also use "blobfuse" (and FileStorage) for this ?

@ahojnnes
Author

ahojnnes commented Sep 5, 2022

Thanks for the blobfuse suggestion (tried it using blobfuse2). This seems to work well so far!

For anybody's future reference, I integrated this into our cmake project using a custom wrapper around ccache:

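#!/bin/bash
# Local (primary) cache directory.
export CCACHE_DIR="path/to/cmake/source/dir/ccache/local"
# Base directory, so cached results can be shared across different checkout locations.
export CCACHE_BASEDIR="path/to/cmake/source/dir"
# Shared secondary storage directory (e.g. a blobfuse2 mount point).
export CCACHE_SECONDARY_STORAGE="file://path/to/cmake/source/dir/ccache/shared"
# Forward all arguments (the compiler command line) to ccache.
/usr/local/bin/ccache "$@"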

Note that you should set CCACHE_BASEDIR if you want to share the cache across machines that might check out your code into different directories.
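
The wrapper above assumes the shared directory is already backed by the blob container. A minimal sketch of that step follows, assuming blobfuse2 is installed and configured through a YAML file. The mount path, config file location, and variable names are hypothetical, and the `blobfuse2 mount <dir> --config-file=<file>` form should be checked against the blobfuse2 documentation for your version.

```shell
#!/bin/sh
# Hypothetical paths; neither comes from the thread.
MOUNT_DIR="${MOUNT_DIR:-$HOME/ccache-shared}"
BLOBFUSE_CONFIG="${BLOBFUSE_CONFIG:-$HOME/blobfuse2.yaml}"

# Mount only when the tool and its config file are actually present.
if command -v blobfuse2 >/dev/null 2>&1 && [ -f "$BLOBFUSE_CONFIG" ]; then
  mkdir -p "$MOUNT_DIR"
  blobfuse2 mount "$MOUNT_DIR" --config-file="$BLOBFUSE_CONFIG"
else
  echo "skipping mount: blobfuse2 or $BLOBFUSE_CONFIG not found" >&2
fi

# Point ccache file storage at the mount. MOUNT_DIR is absolute,
# so the resulting URL has the form file:///...
export CCACHE_SECONDARY_STORAGE="file://$MOUNT_DIR"
```

The guard keeps the script harmless on machines without blobfuse2; ccache then simply sees an ordinary directory as its shared storage.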

@afbjorklund
Contributor

Normally one uses CMAKE_C_COMPILER_LAUNCHER and CMAKE_CXX_COMPILER_LAUNCHER

But you can of course use a custom shell wrapper, to set up the environment variables. Or put cache inside "source" :-)

@ahojnnes
Author

ahojnnes commented Sep 6, 2022

I appreciate the feedback. We already call this wrapper script through the suggested compiler launcher mechanism. However, I couldn't come up with a better way to keep repository-specific ccache settings from affecting every other repository on the same machine. The compiler launcher mechanism does not seem to support passing custom arguments to ccache. Happy to learn about alternatives. Thanks.
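
For reference, hooking a wrapper like this into CMake through the launcher variables might look roughly as follows. This is a sketch only: the wrapper path and the `build` directory are hypothetical, and the command is printed rather than executed so that it has no side effects.

```shell
#!/bin/sh
# Hypothetical wrapper location; adjust to wherever the script lives.
WRAPPER="${WRAPPER:-$PWD/tools/ccache-wrapper.sh}"

# The launcher variables make CMake prefix each compiler call with the wrapper.
CONFIGURE_CMD="cmake -G Ninja -S . -B build -DCMAKE_C_COMPILER_LAUNCHER=$WRAPPER -DCMAKE_CXX_COMPILER_LAUNCHER=$WRAPPER"

# Printed instead of run; paste the output into a shell to configure the build.
echo "$CONFIGURE_CMD"
```

Because the launcher is a script, all repository-specific ccache settings stay inside it and nothing needs to be exported in the calling shell.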

@afbjorklund
Contributor

I guess you would normally set them in the environment, before calling ninja ?

At least the basics, and then use the ccache.conf to tweak the detailed settings.

@ahojnnes
Author

ahojnnes commented Sep 6, 2022

I might be wrong but I could not find a way to set environment variables inside cmake that are passed through to ninja/make during build time. I could create a custom target that everything else depends on and set the environment there, but that seems more invasive than a wrapper script. Setting options in a global ccache config is not an option because it affects other repos. Ccache does not seem to search for a local config file in the current folder hierarchy.

@igrr

igrr commented Sep 6, 2022

@ahojnnes perhaps you can consider using the direnv tool. You can then place an .envrc file into the project directory and define project-specific environment variables there (such as CCACHE_CONFIGPATH). The tool will automatically source the .envrc file when you change into the project directory.
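
A minimal sketch of such an .envrc, assuming direnv evaluates it with the project directory as the working directory. The `.ccache/` layout is a hypothetical choice for illustration, not something from this thread.

```shell
# .envrc, sourced by direnv on entering the project directory.
# The .ccache/ layout below is an assumption for illustration.
export CCACHE_CONFIGPATH="$PWD/.ccache/ccache.conf"
export CCACHE_BASEDIR="$PWD"
```

direnv requires a one-time `direnv allow` in the directory before it loads the file. The detailed settings can then live in the project-local ccache.conf that CCACHE_CONFIGPATH points to.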

@afbjorklund
Contributor

afbjorklund commented Sep 8, 2022

@ahojnnes maybe you can add a short section about it to https://github.com/ccache/ccache/wiki/File-storage ?

Link: https://docs.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux (v2 is in Preview)

Would also be interesting* to know how using this (Azure Blob) compares to a regular NFS disk (Azure Files) ?

https://docs.microsoft.com/en-us/azure/storage/files/storage-files-quick-create-use-linux (e.g. performance vs cost)

But that is probably better off as a blog post or something ?

Don't want to keep lots of external information in ccache.


* If you feel up to it, Azure Cache would also be interesting...

as in https://azure.microsoft.com/en-us/services/cache/

@afbjorklund
Contributor

afbjorklund commented Sep 8, 2022

I might be wrong but I could not find a way to set environment variables inside cmake that are passed through to ninja/make during build time.

I just meant something like:

cmake -G Ninja ..
CCACHE_DIR=/foo ninja

But maybe you have a more complex setup. As far as I know, cmake only needs to add the "ccache " prefix to the compiler...

@ahojnnes
Author

ahojnnes commented Sep 8, 2022

  • After running this setup for a little bit, it turns out that blobfuse2/fuse consumes a lot of CPU cycles in the background, even if no build has been running for a long while. I am going to invest some time and see if the NFS option does not suffer from this issue.
  • I tried Azure Cache for Redis. It works very well and was very fast in my experiments, but suffers from the issue of not being persistent, unless you go for the pricier tiers.
  • Implementing direct Azure blob storage support should be quite easy if we rely on the azure-sdk-for-cpp. I know that this would be quite performant, as sccache supports Azure blob storage (though sccache has other limitations that rule it out as an option for us). azure-sdk-for-cpp could be an optional dependency (enabled through a cmake option and compiler definitions), similar to how Redis is also optional and relies on external code. I am still wondering whether there is an appetite for accepting such a change?
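
For the Redis experiment mentioned in the list above, pointing ccache at a Redis server comes down to a `redis://` URL in the secondary storage setting. The sketch below uses placeholder host, port, and password values that are not from the thread. Whether a given Azure Cache for Redis instance accepts this kind of connection (for example, with respect to TLS) is not covered here.

```shell
#!/bin/sh
# Placeholder values; none of these come from the thread.
REDIS_HOST="${REDIS_HOST:-example.redis.cache.windows.net}"
REDIS_PORT="${REDIS_PORT:-6379}"
REDIS_PASSWORD="${REDIS_PASSWORD:-changeme}"

# A password containing URL-reserved characters would need percent-encoding first.
export CCACHE_SECONDARY_STORAGE="redis://${REDIS_PASSWORD}@${REDIS_HOST}:${REDIS_PORT}"
```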

@afbjorklund
Contributor

Another alternative would be to set up an nginx proxy, similar to the server described at https://github.com/ccache/ccache/wiki/HTTP-storage

Using REST directly typically doesn't work, because of possible HTTPS requirements and because of the auth/tls overhead.
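
With such a proxy in place, the ccache side would only need a plain `http://` storage URL, leaving authentication and TLS towards Azure to the proxy. A sketch under that assumption follows; the host, port, and path are hypothetical.

```shell
#!/bin/sh
# Hypothetical address of a local nginx proxy in front of the blob container.
PROXY_URL="${PROXY_URL:-http://localhost:8080/ccache/}"

# ccache talks plain HTTP to the proxy; the proxy handles the upstream connection.
export CCACHE_SECONDARY_STORAGE="$PROXY_URL"
```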

@afbjorklund
Contributor

I think it will need a plugin system, before it can depend on external libraries like SSL or SDK (without being turned OFF)

@ahojnnes
Author

After playing some more with this, using NFS also doesn't provide good performance. The plugin system sounds like a good solution, even though I don't know about all the internals of ccache. I am still wondering why azure-sdk-for-cpp could not be linked as an optional dependency (hidden behind a cmake option)? So, it would only be compiled and linked into ccache for folks that are actually interested in using azure blob storage as a backend?

For completeness of the discussion, sccache was mainly ruled out as an alternative due to its lack of support for compiling from different build/source folders and its significantly lower cache hit ratio.

@afbjorklund

This comment was marked as duplicate.

@jrosdahl
Member

@ahojnnes wrote:

I am still wondering why azure-sdk-for-cpp could not be linked as an optional dependency (hidden behind a cmake option)? So, it would only be compiled and linked into ccache for folks that are actually interested in using azure blob storage as a backend?

There is no technical reason why it couldn't be done that way. It has more to do with project maintainability, documentation and distribution. I have tried to summarize some of my thoughts on this in #1214. Also, since #1214 is what I'm currently aiming for, I'm not keen on adding more backends at the moment.

Regarding why I think that it's OK to optionally depend on Hiredis:

  • Hiredis is prepackaged in almost all environments (Linux distributions, etc.). This means that Redis support will be enabled in practice when packaging ccache for an OS distribution, so the documentation can describe Redis support as being included and end users won't be confused.
  • Hiredis is small and fast to load, so it does not measurably make things slower if enabled at compile time.

In contrast, Azure SDK for C++ is new and evolving, so to use it I assume that one has to compile it separately. I'm also assuming that Azure SDK for C++ depends on cURL, OpenSSL and friends, so it will be too heavyweight to enable for a default package in an OS distribution.

Yes, this is a Linux-centric view, but that's the ccache project's main target.

@ahojnnes
Author

ahojnnes commented Dec 4, 2022

Fair enough, thanks for taking the time to respond. Just trying to understand the rationale, as somebody who is not deeply familiar with ccache's architecture and maintenance considerations. Thanks.

I am not going to invest time on this until the plugin system or another mechanism is in place. I'd appreciate a ping on this thread here. Thanks. FYI, we are only interested in getting this to run on Linux from our side.


4 participants