running OmaStandalone without network access #5

Open
EricDeveaud opened this issue Nov 24, 2021 · 9 comments


EricDeveaud commented Nov 24, 2021

Hello,
our cluster compute nodes do not have internet access, so oma fails on its first download attempt, http://purl.obolibrary.org/obo/go.obo

I can execute a run on a machine that has internet access and provide the resulting $HOME/.cache/oma/GOdata.drw to our users.

But I saw that darwinlib/Taxonomy also performs a download, from http://www.uniprot.org/taxonomy/?query=*&compress=yes&format=tab
Is there a way that I can download and process (ConvertRawFile) this file and provide the resulting UniProtTaxonomy.drw file to our users, so that they are able to run oma without internet access?
This way oma will be really Standalone ;-)
regards

Eric

edit typo


@alpae
Member

alpae commented Nov 25, 2021

Hi Eric,

The UniProt taxonomy is only needed with very specific config settings, e.g. with DoHierarchicalGroups := 'top-down';. So usually it is not needed. If you want to make sure that OmaStandalone is able to run without internet access in any configuration, you can download and convert the taxonomy with the following command:

bin/omadarwin -E << EOF
     datadirname := getenv('HOME').'/.cache/oma';
     CallSystem('mkdir -p '.datadirname);
     GOdownload();
     TaxonomyDownload();
EOF

This should create all the necessary files in the ~/.cache/oma folder of the current user (Gene Ontology and UniProt Taxonomy).
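On a cluster without network access, the files generated this way on a connected machine still have to be staged somewhere the compute nodes can read. A minimal sketch of that copy step, assuming the default cache location mentioned above; the function name `stage_oma_cache` and any destination path are hypothetical and not part of OmaStandalone:

```shell
# Hypothetical helper (not part of OmaStandalone): copy everything in the
# OMA cache directory, files and symlinks alike, into a shared directory.
# $1 = source cache dir (e.g. "$HOME/.cache/oma"), $2 = destination dir.
stage_oma_cache() {
    src="${1:-}"
    dest="${2:-}"
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: stage_oma_cache SRC DEST" >&2
        return 2
    fi
    if [ ! -d "$src" ]; then
        echo "cache directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -R recurse, -P keep symlinks as symlinks, -p preserve modes and times
    cp -RPp "$src"/. "$dest"/
}
```

It could be called as `stage_oma_cache "$HOME/.cache/oma" /path/to/shared/oma-data`, where the destination is whatever shared location the site uses.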

Cheers Adrian

@EricDeveaud
Author

Hi Adrian,

many thanks for the input.

best regards

Eric

@EricDeveaud
Author

Hi Adrian,
it works well, I was able to get the necessary files.
Thanks again.

I have a few more questions:

  1. from ToyExample/parameters.drw one can read:
# Folder where auxillary data (e.g. GeneOntology definitions, etc)
# will be stored. The folder must be writable by the user. If not set
# or commented, the default will be ~/.cache/oma/
AuxDataPath := 'data/';

If I understand correctly, when AuxDataPath is set in the parameters.drw file it supersedes the datadirname set in $omadir/darwinlib/darwinit.
Is this right?

  2. And where it says "The folder must be writable by the user": are there any files other than GOdata.drw.gz and UniProtTaxonomy.drw.gz that will be stored in this directory?
    I ask because the installation on our cluster lives on a read-only shared file system, so I must be sure that I can host the files there.
    If not, I will have to provide some way for users to store the required files.

best regards

Eric

@alpae
Member

alpae commented Nov 26, 2021

Hi Eric,

Indeed, when you set AuxDataPath in the parameter file, it supersedes the default datadirname. The two files (and two symlinks) are the only files that are used from this folder. So in principle I think it would be OK to set an absolute path for AuxDataPath in the parameters.drw file in the installation folder. When users generate a new parameter file for their project with oma -p, that path will already be set and used.

However, maybe it would be more sensible to have an environment variable that can be set as a default. Then the path to these auxiliary data would be resolved in this order:

  1. set in parameter file (AuxDataPath parameter)
  2. set to path from an environment variable if set
  3. use ~/.cache/oma as fall-back

Would that make sense from your point of view, and would it simplify setting up the package on an HPC system?
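The proposed lookup order can be sketched in shell to make the precedence concrete. This is only an illustration of the three steps listed above, not how OmaStandalone resolves the path; the function name is hypothetical, and the variable name `OMA_DATA` is an assumption, since the proposal does not name the environment variable:

```shell
# Hypothetical sketch of the proposed precedence (not OmaStandalone code).
# $1 = value of AuxDataPath from the parameter file, empty if not set.
# OMA_DATA is an assumed name for the environment variable.
resolve_aux_data_path() {
    param_value="${1:-}"
    if [ -n "$param_value" ]; then
        # 1. an explicit AuxDataPath in the parameter file wins
        printf '%s\n' "$param_value"
    elif [ -n "${OMA_DATA:-}" ]; then
        # 2. otherwise use the environment variable, if set
        printf '%s\n' "$OMA_DATA"
    else
        # 3. fall back to the per-user cache directory
        printf '%s\n' "$HOME/.cache/oma"
    fi
}
```

With this order, a site can export one variable cluster-wide while individual projects remain free to override it in their own parameter file.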

Cheers Adrian

@EricDeveaud
Author

Thanks Adrian,
I ended up with the same scheme that you describe.

The "default" parameters.drw I provide has AuxDataPath set like this:
AuxDataPath := getenv('OMA_DATA');

and in the OMA_DATA path we provide the GOdata.drw.gz and UniProtTaxonomy.drw.gz files.

It seems to work.
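Since the compute nodes cannot download anything, a job would fail if OMA_DATA were unset or the files were missing. A small pre-flight check is one way to catch that before submitting a job. The sketch below is hypothetical (the function name is not part of OmaStandalone) and assumes the two file names mentioned in this thread are the only ones required:

```shell
# Hypothetical pre-flight check (not part of OmaStandalone): confirm that
# the two auxiliary files named in this thread are readable in $OMA_DATA.
check_oma_data() {
    dir="${OMA_DATA:-}"
    if [ -z "$dir" ]; then
        echo "OMA_DATA is not set" >&2
        return 1
    fi
    status=0
    for f in GOdata.drw.gz UniProtTaxonomy.drw.gz; do
        if [ ! -r "$dir/$f" ]; then
            echo "missing or unreadable: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}
```

A job script could run `check_oma_data || exit 1` before invoking oma, so that a misconfigured environment fails immediately with a clear message.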

Can you provide me some information about the ftp://ftp.uniprot.org/pub/databases/uniprot/knowledgebase/docs/speclist.txt URL used in the darwinlib/TaxTools library?

And finally, are you the author of Darwin?
I would suggest embedding a private copy of the darwin libs in which GetTmpDir() from Wrappers/Common is used instead of having '/tmp/' hardcoded in multiple places.
Many clusters out there set a TMPDIR environment variable that points to a fast scratch location instead of the usual /tmp.
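For illustration, the pattern being asked for looks like this in shell: honour TMPDIR when it is set and fall back to /tmp otherwise. This is a sketch of the general idea only, not the implementation of Darwin's GetTmpDir(); the function name and the `oma.` prefix are hypothetical:

```shell
# Hypothetical sketch (not Darwin's GetTmpDir): create a private scratch
# directory under $TMPDIR, falling back to /tmp when TMPDIR is unset or empty.
make_scratch_dir() {
    base="${TMPDIR:-/tmp}"
    # mktemp -d creates the directory and prints its path
    mktemp -d "$base/oma.XXXXXX"
}
```

A caller would capture the path with `scratch=$(make_scratch_dir)` and remove the directory when the job finishes.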

best regards

Eric

@alpae
Member

alpae commented Nov 26, 2021

Hi Eric,

yes, that seems like a good setup.

regarding your darwin questions: yes, I am a co-author of that language. The darwinlib/TaxTools functionality isn't needed by OmaStandalone at all, so you won't need to download that data.

about hardcoded /tmp dir - where did you find that? I don't think that this is used anywhere. The GetTmpDir() function actually already uses the TMPDIR environment variable...

@EricDeveaud
Author

Adrian,

thanks for your feedback

regarding the use of hardcoded /tmp in darwinlib
you may find it just by doing

rpm_maker:src/OMA > wget -q https://omabrowser.org/standalone/OMA.2.5.0.tgz 
rpm_maker:src/OMA > tar xf OMA.2.5.0.tgz 
rpm_maker:src/OMA > cd OMA.2.5.0/darwinlib/
rpm_maker:OMA.2.5.0/darwinlib > grep -Rl '/tmp' 
Wrappers/Common
FigPlot
Plot2Gif
ParExecSlave
FileConv
Descriptions
Server/MassDynSearch
Server/TreeGen
Server/MassSearch
Server/TreeConstruction
Server/AllAll
Server/PepPepSearch
Server/TestNewFunction1
Server/MultAlign
Server/cbrg.server
Server/Gendb
Server/AllAllDB
Server/TestNewFunction
Server/mail_handler
Server/PredictGenes
Server/NuclPepSearch
Server/EvolutionaryAnalysis
Ontology
ParExec2
MBA_Toolkit
Taxonomy
MySQL
IPC
DBTools
HelpText.txt

I guess some of these library files are not used by OMA, but some are ;-)

regards

Eric

@alpae
Member

alpae commented Nov 29, 2021

Hi Eric,

Indeed, there are quite a few places in the darwinlib, but in OmaStandalone only the functions in Taxonomy and Ontology are used. I will make an attempt to update these functions before the next OmaStandalone release. Thanks for your valuable feedback!
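To see what would need updating, the search from earlier in the thread can be narrowed to those two library files and made to print line numbers. A sketch, assuming it is pointed at the unpacked darwinlib directory; the function name is hypothetical:

```shell
# Hypothetical helper: list lines mentioning '/tmp' in the two darwinlib
# files that OmaStandalone uses. $1 = path to darwinlib (default: current dir).
list_tmp_refs() {
    libdir="${1:-.}"
    # -n prints line numbers, -F treats '/tmp' as a fixed string
    grep -nF '/tmp' "$libdir/Taxonomy" "$libdir/Ontology"
}
```

For example, `list_tmp_refs OMA.2.5.0/darwinlib` would print one `file:line:text` entry per match.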

Best wishes
Adrian
