Using bioconductor-data-packages behind proxy #47347

Open
votti opened this issue Apr 19, 2024 · 0 comments

votti commented Apr 19, 2024

Use case

Because we work with sensitive data, we are restricted to an environment with limited internet access that relies on conda/R/Bioconductor repository mirrors.

Issue

Although we have access to a proxied conda repository and a custom Bioconductor repository, installing bioconda packages that depend on bioconductor-data-packages fails with the error:

$ mamba create -n test_bc -c bioconda -c conda-forge bioconductor-genomeinfodbdata
...
Executing transaction:
ERROR conda.core.link:_execute(740): An error occurred while installing package 'bioconda::bioconductor-genomeinfodbdata-1.2.11-r43hdfd78af_1'.                                                
Rolling back transaction: done                                                                                                                                                                 
class: LinkError                                                                                                                                                                               
message:                                                                                                                                                                                       
post-link script failed for package bioconda::bioconductor-genomeinfodbdata-1.2.11-r43hdfd78af_1                                                                                               
location of failed script: /project/home/vizano/mambaforge/envs/test/bin/.bioconductor-genomeinfodbdata-post-link.sh                                                                           
==> script messages <==                                                                                                                                                                        
<None>                                                                                                                                                                                         
==> script output <== 
...                                                                                                                                                                         
stdout:                                                                                                                 
+ curl -L https://bioconductor.org/packages/3.18/data/annotation/src/contrib/GenomeInfoDbData_1.2.11.tar.gz                                                                                    
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current                                                                                                                
                                 Dload  Upload   Total   Spent    Left  Speed                                                                                                                  
  0     0    0     0    0     0      0      0 --:--:--  0:05:00 --:--:--     0                                                                                                                 
curl: (28) Failed to connect to bioconductor.org port 443 after 300878 ms: Timeout was reached                                                                                                 
                                                                                                                                                                                               
return code: 28   
...

Diagnosis

The error occurs because the URLs used to download the data packages from the Bioconductor repository are hardcoded in the dataURLs.json file of the bioconductor-data-packages recipe.

In our environment we only have access to our custom bioconductor repository, so the public one is not accessible.
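
To see which hosts an installation would need to reach (or mirror), the distinct URL prefixes in the cached dataURLs.json can be tallied. This is a minimal sketch that assumes only that the file contains plain http(s) URLs as text; `list_url_hosts` is a hypothetical helper name, not part of bioconda.

```shell
# Hypothetical helper: count how often each scheme://host prefix
# appears in a dataURLs.json file, most frequent first.
list_url_hosts() {
    # $1: path to a dataURLs.json file
    # -o prints only the matched part, -E enables extended regex
    grep -oE 'https?://[^/"]+' "$1" | sort | uniq -c | sort -rn
}
```

It could be called as `list_url_hosts dataURLs.json` from inside the share/bioconductor-data-packages directory of the cached package.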

Proposed solution

It would be great if there were a way to customize dataURLs.json, e.g. by providing additional URLs via an environment variable.
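
As an illustration of what such an override might look like, here is a small shell sketch. It is not how the current install script works: the variable name `BIOC_MIRROR_URL` and the function `rewrite_bioc_url` are hypothetical and only show one way a download URL could be redirected to a mirror before it is passed to curl.

```shell
# Hypothetical: BIOC_MIRROR_URL is NOT an existing bioconda setting.
# If it is set, URLs under https://bioconductor.org/ are rewritten to
# point at the mirror; all other URLs are returned unchanged.
rewrite_bioc_url() {
    url="$1"
    default_prefix="https://bioconductor.org/"
    if [ -n "${BIOC_MIRROR_URL:-}" ]; then
        case "$url" in
            "$default_prefix"*)
                # Strip a trailing slash from the mirror and the default
                # prefix from the URL, then join the two parts.
                printf '%s/%s\n' "${BIOC_MIRROR_URL%/}" "${url#"$default_prefix"}"
                return 0
                ;;
        esac
    fi
    printf '%s\n' "$url"
}
```

A script could then run `curl -L "$(rewrite_bioc_url "$url")"`, so that environments without the variable keep the current behaviour.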

Ugly workaround

I was able to get the package installed by:

  1. Locating the cached bioconductor-data-packages package in my conda installation: cd <conda installation path>/pkgs/bioconductor-data-packages-20231203-hdfd78af_0/share/bioconductor-data-packages
  2. Backing up dataURLs.json: cp dataURLs.json dataURLs.json.b
  3. Replacing the bioconductor URL with our custom repository: sed -i 's|https://bioconductor\.org/|https://< custom domain path >/|g' dataURLs.json

The installation then succeeded, but (rightfully) gave the warning:

SafetyError: The package for bioconductor-data-packages located at /project/home/vizano/mambaforge/pkgs/bioconductor-data-packages-20231203-hdfd78af_0
appears to be corrupted. The path 'share/bioconductor-data-packages/dataURLs.json'
has an incorrect size.
  reported size: 1152107 bytes
  actual size: 1219005 bytes
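
The backup and replace steps of the workaround can be wrapped in a small helper that is safe to re-run. This is a sketch under assumptions: `patch_data_urls` is a hypothetical name, the mirror URL must not contain `|` or `&` (both are special in the sed replacement), and conda will still report the SafetyError above because the file size changes.

```shell
# Hypothetical helper: back up a dataURLs.json file and point its
# bioconductor.org URLs at a mirror.
patch_data_urls() {
    json_file="$1"      # path to dataURLs.json
    mirror="${2%/}"     # mirror base URL, trailing slash removed
    [ -f "$json_file" ] || { echo "not found: $json_file" >&2; return 1; }
    # Create the backup only once so a re-run never overwrites the original.
    [ -f "$json_file.b" ] || cp "$json_file" "$json_file.b"
    # Always rewrite from the pristine backup, which keeps re-runs idempotent.
    sed "s|https://bioconductor\.org/|$mirror/|g" "$json_file.b" > "$json_file"
}
```

Restoring the original is then a matter of copying dataURLs.json.b back over dataURLs.json.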