Directory-specific configuration file #1393

Open
jrosdahl opened this issue Feb 3, 2024 · 19 comments
Labels
feature New or improved feature

Comments

@jrosdahl
Member

jrosdahl commented Feb 3, 2024

There are currently four ways to specify ccache configuration:

  • on the command line
  • with environment variables
  • in a cache-specific configuration file (overridable with CCACHE_CONFIGPATH)
  • in a system configuration file (if CCACHE_CONFIGPATH is not set)

What's missing is an easy way to set configuration for a directory (project), either in a static file or in a generated file that can be read in addition to the other configuration files. See also discussion in #747.

One solution would be for ccache to search in the current working directory and its parents for a file with a known name, e.g. .ccache.conf. It would however not be a good idea to enable such a search by default. This is because ccache then (if no directory-specific configuration file is found) will search in all parent directories, including e.g. /home/.ccache.conf on Unix systems, and such stat calls can be very slow if the directory is an autofs mount point or similar.

Here are my current best ideas:

  • Add support for @file at the command line before the compiler, extending the ccache [KEY=VALUE …​] compiler [compiler options] form to ccache [@file ...] [KEY=VALUE …​] compiler [compiler options]. Example:
    ccache @example.conf gcc -c example.c
    
    This is similar in spirit to the @file syntax understood by many compilers. file could be an absolute path or a simple filename, in which case parent directories are searched.
  • Add a CCACHE_DIRCONFIG/dir_config configuration option to get a similar effect to @file.

jrosdahl added the feature (New or improved feature) label Feb 3, 2024

@icarus-sparry

For the @file syntax, if the filename is absolute, how does

   ccache @/some/file/path gcc -c example.c

differ from

     CCACHE_CONFIGPATH=/some/file/path ccache gcc -c example.c

Using process substitution you can (on unix like systems) handle the simple filename case as well with a simple while loop.
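
A minimal sketch of such a loop, assuming a POSIX shell. The helper name find_up is hypothetical and not part of ccache, and the sketch uses command substitution rather than process substitution to keep it short:

```shell
# Hypothetical helper (not part of ccache): print the path of the first file
# named $1 found in the current directory or any of its parents.
find_up() {
    name=$1
    dir=$(pwd -P)
    while :; do
        # ${dir%/} turns "/" into "" so the root case yields "/name", not "//name".
        if [ -f "${dir%/}/$name" ]; then
            printf '%s\n' "${dir%/}/$name"
            return 0
        fi
        # Stop once the root directory has been checked.
        [ "$dir" = / ] && return 1
        dir=$(dirname "$dir")
    done
}

# Example use (commented out so the sketch has no side effects):
#   conf=$(find_up example.conf) && CCACHE_CONFIGPATH=$conf ccache gcc -c example.c
```

The function returns a non-zero status when nothing is found, so the caller can fall back to running ccache without CCACHE_CONFIGPATH.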

@jrosdahl
Member Author

jrosdahl commented Feb 11, 2024

One difference is that CCACHE_CONFIGPATH specifies the only configuration file to use. That is: If set, neither the standard cache-specific config nor the system config is read. @file would specify a file to read in addition to the standard ones. Another difference is that CCACHE_CONFIGPATH does not look up relative paths, which is what @file is primarily meant for.

Also, it's possible to specify a CMake compiler launcher with e.g. cmake -D CMAKE_C_COMPILER_LAUNCHER="ccache;@/some/file/path", but CCACHE_CONFIGPATH can't be used that way portably.

@BenPortner

Hi @jrosdahl,

Thanks for opening the issue. This is a feature that I would really like to see in ccache and I think it would make ccache a lot easier to configure, if implemented right. Here are my thoughts:

One solution would be for ccache to search in the current working directory and its parents for a file with a known name, e.g. .ccache.conf. It would however not be a good idea to enable such a search by default. This is because (...) such stat calls can be very slow if the directory is an autofs mount point or similar.

I don't know anything about autofs or its performance. However, if it's that slow, how probable is it that someone would try to compile from an autofs mount in the first place? The idea is that the ccache.conf would be in the same directory as the source code, after all? Assuming that it is probable and hence undesirable: Maybe we can learn from other tools that offer project-specific configuration options, e.g. git?

Add support for @file at the command line

For my specific use case (using ccache with cmake), passing command line options to ccache is difficult. In fact, it would require a hack similar to the one mentioned in #747. The same goes for the second proposal, introducing a new environment variable. I am afraid the "default search" (option 1) is the only one that would actually make ccache easier to configure when using cmake.

Since I am not the only one using ccache, I would really like to hear other people's opinions on this. @srohmen already worked on something similar. Maybe he has something to add?

@jrosdahl
Member Author

jrosdahl commented Feb 19, 2024

I don't know anything about autofs or its performance. However, if it's that slow, how probable is it that someone would try to compile from an autofs mount in the first place?

The issue is not that building on a network filesystem is slow (well, it is, but with OK latency) – it is that queries to automount a file system can be much slower.

Let's say that the current working directory is /home/user/project/build. When ccache is executed, it would probe /home/user/project/build/.ccache.conf, /home/user/project/.ccache.conf, /home/user/.ccache.conf, /home/.ccache.conf and /.ccache.conf (and stop at the first one found, if any). If directories under /home are automounted, then trying to open /home/.ccache.conf can be very slow since it can trigger queries to other systems to see if there is a user called ".ccache.conf", etc. Not always, depending on caches, but sometimes. (Yes, similar things have caused me trouble at work for tools that behave like this.)
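
The probe order described above can be sketched as follows, assuming a POSIX shell. The function probe_paths is a hypothetical illustration, not ccache code; it only manipulates path strings and never touches the file system:

```shell
# Hypothetical illustration: print the paths that an upward search would
# probe for working directory $1 and file name $2, innermost first.
probe_paths() {
    dir=$1
    name=$2
    while :; do
        # ${dir%/} turns "/" into "" so the last candidate is "/name".
        printf '%s\n' "${dir%/}/$name"
        [ "$dir" = / ] && break
        dir=$(dirname "$dir")
    done
}

probe_paths /home/user/project/build .ccache.conf
```

For the example directory this prints the five candidates in the order given above, ending with /home/.ccache.conf and /.ccache.conf. When no file exists, every one of them is probed, including the ones outside the project.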

To clarify, the problem is not if .ccache.conf is located in the project but if there is no .ccache.conf in parent directories, which will be the case for most users.

Assuming that it is probable and hence undesirable: Maybe we can learn from other tools that offer project-specific configuration options, e.g. git?

Git does not have the same problem since there is always a .git directory to look for, so the search can stop before reaching outside the project.

The same goes for the second proposal, introducing a new environment variable. I am afraid the "default search" (option 1) is the only one that would actually make ccache easier to configure when using cmake.

How do you specify the location of the cache directory and its maximum size?

@BenPortner

BenPortner commented Mar 4, 2024

Hi @jrosdahl,

thanks for the clarification! I trust your experience with autofs and the likes and agree that this could turn into a problem. Nevertheless, I am confident it can be solved.

Git does not have the same problem since there is always a .git directory to look for, so the search can stop before reaching outside the project.

Not technically true.

/home/ben $mkdir test
/home/ben $cd test
/home/ben/test $git init
Initialized empty Git repository in /home/ben/test/.git
/home/ben/test $mkdir subdir
/home/ben/test $cd subdir
/home/ben/test/subdir $git status
On branch main

No commits yet

nothing to commit (create/copy files and use "git add" to track)

Git does look for .git folders in parent directories. So maybe they have a good strategy how to deal with autofs, too?

How do you specify the location of the cache directory and its maximum size?

In the user options under /home/ben/.config/ccache. I don't need these to be project-specific (yet).

@BenPortner

BenPortner commented Mar 5, 2024

Just found this in the git documentation:

GIT_CEILING_DIRECTORIES controls the behavior of searching for a .git directory. If you access directories that are slow to load (such as those on a tape drive, or across a slow network connection), you may want to have Git stop trying earlier than it might otherwise, especially if Git is invoked when building your shell prompt.
https://git-scm.com/book/en/v2/Git-Internals-Environment-Variables

So another environment variable that would be hard to set when using cmake...

Another thing I was wondering about: Would this "config searching" be triggered on every invocation of ccache, i.e. on every object that is built? Or is there some sort of "config caching" or similar going on?

@jrosdahl
Member Author

jrosdahl commented Mar 6, 2024

Nevertheless, I am confident it can be solved.

I agree that it would be great if things just work! Unfortunately I don't see a good way in this case.

Git does look for .git folders in parent directories. So maybe they have a good strategy how to deal with autofs, too?

Yes, I didn't mean to imply otherwise. With "there is always a .git directory to look for" I meant that there is always a .git directory in one of the parent directories of a Git project that will be found before accessing the autofs mountpoint.

No, Git doesn't deal with the problem and it doesn't have to. Even if Git would try to access, say, /home/.git (which it will do if you run git outside a git repo), it doesn't matter much since it's done one or a few times. The problem with doing it as part of compilation is that it will be done once per ccache invocation ≈ once per compiler invocation. That's a problem if you build a project with lots of compilation units.

Another thing I was wondering about: Would this "config searching" be triggered on every invocation of ccache, i.e. on every object that is built? Or is there some sort of "config caching" or similar going on?

Yes, on every invocation.

In the user options under /home/ben/.config/ccache. I don't need these to be project-specific (yet).

OK. How do you configure CMake to use ccache?

@BenPortner

BenPortner commented Mar 9, 2024

The problem with doing it as part of compilation is that it will be done once per ccache invocation ≈ once per compiler invocation. That's a problem if you build a project with lots of compilation units.

Okay, I see the problem now.

OK. How do you configure CMake to use ccache?

With an ugly hack:

set(CMAKE_C_COMPILER_LAUNCHER ${CMAKE_COMMAND} -E env ${ccacheEnv} ${CCACHE_PROGRAM})

More details on the hack here.

The problem boils down to this: Environment variables that are set in cmake are used during the setup stage but do not persist during the build stage. Hence, setting up ccache with environment variables is hacky. Adding command line options for ccache would be similarly hacky. Instead, it would be nice to have a way to say to ccache: "For all the work you do in this folder and the directories below, use this config". Furthermore, we want to prevent ccache from looking for the config on each invocation, as this process can be very slow. Instead, it would be beneficial to set up ccache once and have it apply the same configuration throughout the build process. The challenge is to make sure that ccache does not apply that configuration to other projects. Perhaps this could be prevented by checking the path of the passed source code file? In that case, one could say to ccache "Hey, if the file path starts with /home/ben/myproject1, use config 1. If it starts with /home/ben/myotherproject, use config 2". What do you think @jrosdahl ?

@jrosdahl
Member Author

Yes, I agree that the ${CMAKE_COMMAND} -E env ${ccacheEnv} part is hacky (which is what led to this GitHub issue in the first place). I guess you would like it to look like this?

set(CMAKE_C_COMPILER_LAUNCHER ${CCACHE_PROGRAM})

With my proposal, it would look like this:

set(CMAKE_C_COMPILER_LAUNCHER ${CCACHE_PROGRAM} @ccache.conf)

Do you mean that you think that this is too hacky?

I don't think that anybody has suggested setting environment variables inside CMake scripts. Setting CCACHE_DIRCONFIG would be an option for people who configure ccache the traditional way with other CCACHE_* variables outside the build system.

Furthermore, we want to prevent ccache from looking for the config on each invocation, as this process can be very slow. Instead, it would be beneficial to set up ccache once and have it apply the same configuration throughout the build process. The challenge is to make sure that ccache does not apply that configuration to other projects. Perhaps this could be prevented by checking the path of the passed source code file? In that case, one could say to ccache "Hey, if the file path starts with /home/ben/myproject1, use config 1. If it starts with /home/ben/myotherproject, use config 2". What do you think @jrosdahl ?

Sorry, I don't understand your idea. I think you'll need to describe it in more detail to get comments.

@BenPortner

BenPortner commented Mar 11, 2024

set(CMAKE_C_COMPILER_LAUNCHER ${CCACHE_PROGRAM} @ccache.conf)

This is not bad. What about the overhead for searching the config file in this case? In a project like this:

my_project/
│
├── CMakeLists.txt
├── ccache.conf
│
├── src_a/
│   └── main_a.cpp
│
├── src_b/
│   └── main_b.cpp
│

We would descend into src_a and src_b, so ccache would have to look in the parent directories for the config file. If none exists, we would run into the autofs/performance-issues again, wouldn't we?

Sorry, I don't understand your idea. I think you'll need to describe it in more detail to get comments.

My suggestion is to have a mapping of source-file paths to config paths in the user-specific config file, e.g. like this:

#/home/ben/.config/ccache/ccache.conf

cache_dir = /path/to/cache/dir
max_size = 10.0G
config_dir = {
    /home/ben/my_project: /home/ben/my_project/ccache.conf
    /home/ben/my_project2: /some/other/path/ccache.conf
    *: /etc/ccache/ccache.conf
}

I know that this is also not ideal. I'm just brainstorming here.
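
To make the lookup concrete, here is a rough sketch in POSIX shell of how such a prefix mapping might be resolved. The function config_for is hypothetical, the paths are the ones from the example above, and none of this is existing ccache behavior:

```shell
# Hypothetical sketch: choose a config file from the prefix of the source path.
# The trailing "/" in each pattern keeps /home/ben/my_project2/... from
# matching the shorter /home/ben/my_project prefix.
config_for() {
    case $1 in
        /home/ben/my_project/*)  printf '%s\n' /home/ben/my_project/ccache.conf ;;
        /home/ben/my_project2/*) printf '%s\n' /some/other/path/ccache.conf ;;
        *)                       printf '%s\n' /etc/ccache/ccache.conf ;;
    esac
}

config_for /home/ben/my_project/src_a/main_a.cpp
```

The lookup is pure string matching on the source path, so it would not add any file system probes per invocation.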

@jrosdahl
Member Author

This is not bad. What about the overhead for searching the config file in this case? In a project like this:
[...]
We would descend into src_a and src_b, so ccache would have to look in the parent directories for the config file.

If @ccache.conf is specified as a ccache argument, the search would be started from the current working directory, not the directory of the source file. (I guess it's not unreasonable to want the lookup to start from the source file, but the source file location is not known before parsing the argument list, and the configuration is read before that happens.) So if the build directory is my_project/build, there will be a file access for my_project/build/ccache.conf and then my_project/ccache.conf will be found. That's not slow, so it's nothing to worry about.

If none exists, we would run into the autofs/performance-issues again, wouldn't we?

No, because if there is no ccache.conf file in your project then you wouldn't pass @ccache.conf to ccache in the first place, right?

Note that you would also have the option to specify exactly the file you want inside the CMake script, thus not relying on searching for the file:

set(CMAKE_C_COMPILER_LAUNCHER ${CCACHE_PROGRAM} @${CMAKE_SOURCE_DIR}/ccache.conf)

My suggestion is to have a mapping of source-file paths to config paths in the user-specific config file, e.g. like this:
[...]

Thanks for the clarification and providing ideas.

Although it would not be possible to add syntax similar to your example since it would be backward incompatible, it could indeed be valuable to be able to have project-specific settings in the main configuration file. It would be possible with another syntax, though. That said, I think that such a feature would be a complement rather than an alternative to the @file functionality.

@BenPortner

Hello @jrosdahl,

Thanks for your detailed explanation, as always. I now see that your solution avoids the performance issues of my initial solution. I also agree that

set(CMAKE_C_COMPILER_LAUNCHER ${CCACHE_PROGRAM} @ccache.conf)

is much more elegant than

set(ccacheEnv
    CCACHE_BASEDIR="${PARENT_DIR}"
    CCACHE_NOHASHDIR=true
    ...
)
set(CMAKE_C_COMPILER_LAUNCHER ${CMAKE_COMMAND} -E env ${ccacheEnv} ${CCACHE_PROGRAM})

I think this solution is really good. The only thing that still bothers me is that it requires the user to adapt the CMake configuration file. If you have any patience left (I hope), here is another idea:

How about a strongly limited search for project-specific config files? It could work like this: Ccache looks for a ccache.conf in the CWD. If it doesn't find one, it stops there and uses the user- and system-specific confs by default. However, the user has the option to adapt the search depth by setting an environment variable, e.g. CCACHE_CONFIGSEARCHDEPTH. Setting the variable to 2 would mean that ccache looks for the project-specific config in the CWD (0), the parent directories (1) and the parent's parent directories (2).
This solution offers the comfort of project-specific configs without significant performance impact and without the need for tampering with the CMake files. The search depth can be set in the user-specific config and it is left to the user to ensure that autofs and the likes are not used before dialing it up. What do you think?
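
A minimal sketch of the depth-limited search, assuming a POSIX shell. CCACHE_CONFIGSEARCHDEPTH is only the name proposed above, find_up_limited is a hypothetical helper, and ccache does not implement either:

```shell
# Hypothetical sketch of the proposed depth-limited search.
# $1 is the file name, $2 the maximum number of parent levels to climb:
# depth 0 checks only the current directory, depth 1 also its parent, etc.
find_up_limited() {
    name=$1
    depth=$2
    dir=$(pwd -P)
    while :; do
        if [ -f "${dir%/}/$name" ]; then
            printf '%s\n' "${dir%/}/$name"
            return 0
        fi
        # Give up when the depth budget is used up or the root was checked.
        if [ "$depth" -le 0 ] || [ "$dir" = / ]; then
            return 1
        fi
        depth=$((depth - 1))
        dir=$(dirname "$dir")
    done
}

# Example use (commented out so the sketch has no side effects):
#   find_up_limited ccache.conf "${CCACHE_CONFIGSEARCHDEPTH:-0}"
```

With a depth of 0 the search never leaves the current directory, so the number of probes per invocation is bounded by the chosen depth plus one.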

@jrosdahl
Member Author

jrosdahl commented May 1, 2024

Hi,

strongly limited search for project-specific config files

If we continue using CMake as the example build system (generator) which you want to configure to use ccache, you create a temporary build directory and start the build there. For ccache to find a configuration file in CWD the file must be located in the temporary build directory, so a configuration file bundled with the project won't be found. So the user must do something for ccache to find project-specific configuration, either:

  1. Set CCACHE_CONFIGSEARCHDEPTH to make ccache search in the parent directories, or
  2. Modify the CMake scripts to point ccache to the configuration, or
  3. Specify -D CMAKE_C_COMPILER_LAUNCHER="ccache;@ccache.conf" to CMake to override the CMake script's setting (if possible), or
  4. Copy the configuration file to the build directory.

The first one does not seem easier than setting CCACHE_DIRCONFIG=ccache.conf. The second and third ones would be no different with your suggestion.

The last one would indeed be possible with your suggestion. Is that what you had in mind? It doesn't feel more ergonomic to me than the other options.

To sum it up, I'm afraid I don't understand which actual problem this would solve.

There is also a security aspect to this that I haven't mentioned before: the ccache configuration includes several settings that can modify which compilation command is invoked. Let's assume that ccache by default would look for configuration in CWD. If the user has set up ccache to masquerade as the compiler, then it would suddenly be potentially dangerous to run e.g. gcc --version from a source code directory that carries a malicious or erroneous ccache configuration (which could execute an arbitrary script that sends the user's secrets somewhere or destroys something, etc.). This is a bit similar to the problem described in CVE-2022-24765 for Git.

@srohmen
Contributor

srohmen commented May 3, 2024

Unfortunately, I could not solve my use-case by using CMAKE_CXX_COMPILER_LAUNCHER, as this variable is completely disregarded for the CMake Visual Studio generator. msbuild (or whatever executes the build) will by default execute a compiler named cl.exe (aka MSVC). Every attempt to overwrite the compiler executable and command line in that environment failed (sometimes just strange race conditions in the build system, long painful story...). Thus, my solution is to use the compiler impersonation feature provided by ccache. So I create a symlink to ccache.exe in the build folder, named cl.exe.

Now, I have project-specific settings (e.g. the base_dir configuration) and my modified ccache version picks up a ccache-extra.conf file from the current working directory implicitly. So far this works quite well, but I understand that there might be a better solution.

I am afraid that it is not possible to use the @<config-file> approach for cl.exe, as MSVC uses the same syntax to pass response files (*.rsp) to the compiler. These response files usually contain parts of the compile line if the command line becomes too long:
https://learn.microsoft.com/en-us/cpp/build/reference/at-specify-a-compiler-response-file?view=msvc-170

I thought it would be possible to tell ccache in the impersonation mode, still to "consume the following parameter as ccache parameter" and not pass it down to the real compiler. But I cannot find it in the documentation anymore. Maybe I just dreamed it or I am blind. However, the @<config-file> syntax without it would collide with the cl.exe command line interface.

EDIT:
Seems like I mixed it up with --ccache-skip: https://ccache.dev/manual/4.9.1.html#_extra_options
But this reads as if it is exactly the opposite of what I would require.

@BenPortner

BenPortner commented May 4, 2024

If we continue using CMake as the example build system (generator) which you want to configure to use ccache, you create a temporary build directory and start the build there.

Well, CMake doesn't really enforce out-of-source builds but you are right, it is the standard to do so. So a more reasonable default for the search depth would be 1 (instead of 0). This would still be manageable performance-wise, I assume?

The first one does not seem easier than setting CCACHE_DIRCONFIG=ccache.conf.

The main difference is that this config will complement the other configs, right? Also, I would argue that my idea is still simpler, because CCACHE_DIRCONFIG would have to be adapted for each project, whereas the default search depth can be set once in the user config.

There is also a security aspect to this that I haven't mentioned before

This is indeed a problem. But then again: If a code repository contains malicious code, it is a bad idea to compile it in any case?

@jrosdahl
Member Author

@srohmen wrote:

I am afraid that it is not possible to use the @<config-file> approach for cl.exe, as MSVC uses the same syntax to pass response files (*.rsp) to the compiler.

Right, the @file syntax is only for ccache @<config-file> <compiler> <compiler-options> ("Add support for @file at the command line before the compiler"), not <ccache-masquerading-as-the-compiler> <compiler-options>, so there is no syntax collision in that case. The @file syntax to load configuration from a file was chosen precisely because it is similar to what compilers use to load options from a file.

The masquerading mode has never been able to parse compiler parameters as ccache parameters.

Do I understand correctly that you have not been able to make the output from the CMake Visual Studio generator run ccache.exe <compiler> <compiler-options>, but it would be possible to let it run <compiler> <extra-options> <compiler-options>? If so, we could consider adding support for something like --ccache-config @ccache.conf in the <compiler-options> part even in masquerading mode.

@jrosdahl
Member Author

@BenPortner wrote:

Well, CMake doesn't really enforce out-of-source builds but you are right, it is the standard to do so. So a more reasonable default for the search depth would be 1 (instead of 0). This would still be manageable performance-wise, I assume?

I agree that 1 would be a better default.

The first one does not seem easier than setting CCACHE_DIRCONFIG=ccache.conf

The main difference is that this config will complement the other configs, right? Also, I would argue that my idea is still simpler, because CCACHE_DIRCONFIG would have to be adapted for each project, whereas the default search depth can be set once in the user config.

I guess we're talking about two separate things at once now:

  1. How to tell ccache to search for a directory-specific config file.
  2. When found, should this config file complement the ordinary config files or should the processing stop here?

Regarding the first: I don't understand why CCACHE_DIRCONFIG would have to be adapted for each project. Perhaps you could expand on why you think that would be the case? Can't you set CCACHE_DIRCONFIG/dir_config once in the user config just like you could set CCACHE_CONFIGSEARCHDEPTH/config_search_depth in the user config once?

Regarding the second: I think that skipping the ordinary configuration files does not sound like a good idea. For example, it would mean that cache-specific configuration like maximum cache size suddenly wouldn't be applied if there is a directory config in some project.

If a code repository contains malicious code, it is a bad idea to compile it in any case?

If a user executes a build script (or any script or bundled program for that matter) found in an unknown repository then that script could be malicious, and it's easy to realize that this could be a problem and take appropriate actions. What would not be easy to realize is that merely running a system command inside an unknown repository could execute arbitrary code from that repository. That's why I gave gcc --version as the example, not building code in the repository.

@srohmen
Contributor

srohmen commented May 24, 2024

@jrosdahl

Do I understand correctly that you have not been able to make the output from the CMake Visual Studio generator run ccache.exe <compiler> <compiler-options>, but it would be possible to let it run <compiler> <extra-options> <compiler-options>? If so, we could consider adding support for something like --ccache-config @ccache.conf in the <compiler-options> part even in masquerading mode.

Yes, as long as --ccache-config @ccache.conf is consumed by the masquerading ccache (and not passed to the real compiler) that should work. That would allow passing --ccache-config @ccache.conf as an additional compiler parameter via CMake using add_compile_options(--ccache-config @ccache.conf) or similar.

@rkapl123

rkapl123 commented Jun 2, 2024

Unfortunately, I could not solve my use-case by using CMAKE_CXX_COMPILER_LAUNCHER, as this variable is completely disregarded for the CMake Visual Studio generator. msbuild (or whatever executes the build) will by default execute a compiler named cl.exe (aka MSVC). Every attempt to overwrite the compiler executable and command line in that environment failed (sometimes just strange race conditions in the build system, long painful story...). Thus, my solution is to use the compiler impersonation feature provided by ccache. So I create a symlink to ccache.exe in the build folder, named cl.exe.

Now, I have project-specific settings (e.g. the base_dir configuration) and my modified ccache version picks up a ccache-extra.conf file from the current working directory implicitly. So far this works quite well, but I understand that there might be a better solution.

I am afraid that it is not possible to use the @<config-file> approach for cl.exe, as MSVC uses the same syntax to pass response files (*.rsp) to the compiler. These response files usually contain parts of the compile line if the command line becomes too long: https://learn.microsoft.com/en-us/cpp/build/reference/at-specify-a-compiler-response-file?view=msvc-170

I thought it would be possible to tell ccache in the impersonation mode, still to "consume the following parameter as ccache parameter" and not pass it down to the real compiler. But I cannot find it in the documentation anymore. Maybe I just dreamed it or I am blind. However, the @<config-file> syntax without it would collide with the cl.exe command line interface.

Hi, I'd like to add my use-case here, which is much simpler (although I tried the suggested config-file route as well). I only need to pass the base_dir or hash_dir option to ccache in order to facilitate caching across different source trees (typically when you pull different tags/branches and you like to work/test on them quickly. When dealing with large codebases this "quickly" becomes "painfully slow").

I successfully integrated ccache into cmake/Visual Studio (using ninja as the build system) using the suggested way here: https://github.com/ccache/ccache/wiki/MS-Visual-Studio#usage-with-cmake.

However I never could get the two options to pass through to the cl.exe (ccache)...

-regards,
Roland


5 participants