Support and Requirements
Installing Base/Command Line Version
Concurrency
Built-in Webserver
A Note on Cache
Known Issues
Configure With Apache and Celeryd
Installation is simpler with the Docker package (similar to running in VirtualBox). However, with Docker you won't be able to use the command line version of MetAnnotate, or doing so will be inconvenient.
The Docker package works on Mac and Linux, but make sure that your machine meets the RAM and disk space requirements.
Follow the instructions at https://bitbucket.org/doxeylab/metannotateinstaller
The rest of this README is for Ubuntu 14.04+ users with fair technical knowledge. Follow it if you want to use the command line version of MetAnnotate.
Debian/Ubuntu, with at least 6 GB of disk space plus the space required to store the RefSeq database.
The following packages are required. Note that One Command Install installs these dependencies automatically.
- python-dev (version 2.7)
- build-essential
- default-jre
- git
- wget
- python-mysqldb
- rabbitmq-server
- libssl-dev
- libffi-dev
- sqlite3
For Ubuntu 14.04:
NOTE: the machine's root password is required; you will be prompted for it.
Step 1:
# cd to home directory
cd ~
sudo apt-get update
# install git only if it is not already available
if ! command -v git >/dev/null 2>&1; then
sudo apt-get install -y git
fi
git clone --depth=1 https://bitbucket.org/doxeylab/metannotate.git
cd metannotate
bash one_command_install.sh
# enter password as required
python run_metannotate.py --orf_files=data/MetagenomeTest.fa --hmm_files=data/hmms/RPOB.HMM --reference_database=data/ReferenceTest.fa --output_dir=test_output --tmp_dir=test_tmp --run_mode=both
Note that in the example above, a tiny test reference database was specified to make the run faster. If no reference database is specified, the default data/Refseq.fa database is used. You should now see the run outputs in the test_output directory.
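As a quick sanity check after the test run, the sketch below verifies that an output directory exists and is not empty. The helper name `check_output_dir` is hypothetical and not part of MetAnnotate, and it makes no assumption about which files a run produces.

```shell
# Hypothetical helper (not part of MetAnnotate): succeeds only if the given
# directory exists and contains at least one entry.
check_output_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir" >&2
        return 1
    fi
    # ls -A lists every entry except . and ..; empty output means an empty directory.
    if [ -z "$(ls -A "$dir")" ]; then
        echo "empty: $dir" >&2
        return 1
    fi
    echo "ok: $dir"
}
```

For the example run above, this would be called as `check_output_dir test_output`.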
For more options:
python run_metannotate.py --help
You can speed up MetAnnotate by specifying a higher concurrency in metannotate/concurrency.txt. This increases concurrency for the HMMER, FastTree, and pplacer commands.
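A minimal sketch of setting the concurrency, defaulting to the number of CPU cores. It assumes that concurrency.txt holds a single integer on one line, which this README does not state explicitly, so check the existing file before overwriting it. The helper name `set_concurrency` is hypothetical.

```shell
# Hypothetical helper: write a concurrency value to the given file.
# ASSUMPTION: the file holds a single positive integer on one line.
set_concurrency() {
    file="$1"
    # Default to the number of cores (nproc on Linux, sysctl on macOS).
    n="${2:-$(nproc 2>/dev/null || sysctl -n hw.ncpu)}"
    # Reject empty or non-numeric values before touching the file.
    case "$n" in
        ''|*[!0-9]*) echo "not a positive integer: $n" >&2; return 1 ;;
    esac
    [ "$n" -gt 0 ] || { echo "not a positive integer: $n" >&2; return 1; }
    echo "$n" > "$file"
}
```

From inside the metannotate directory, `set_concurrency concurrency.txt 8` would request a concurrency of 8. Keep in mind that higher values also increase memory use.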
The web server is installed as part of One Command Install.
There are two files that provide additional configuration:
- metagenome_directories_root.txt
- metagenome_directories.txt
These files are already placed in the main metannotate directory.
metagenome_directories_root.txt contains the root path for all metagenome directories that will be read by the program. metagenome_directories.txt lists all the directories in that root directory that should be read as metagenome directories (in the case that you have other directories in the root directory that shouldn't be interpreted as metagenome directories).
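For illustration, the sketch below writes both files for a made-up layout. It assumes that metagenome_directories_root.txt holds a single path and that metagenome_directories.txt lists one directory name per line; the description above does not specify the exact format, so compare against the files shipped in the repository first. The helper name, the root path, and the directory names are all hypothetical.

```shell
# Hypothetical helper: write the two configuration files into a target directory.
# ASSUMPTION: root file = one path; directories file = one directory name per line.
write_metagenome_config() {
    target="$1"   # directory that holds the two .txt files
    root="$2"     # root path containing the metagenome directories
    shift 2
    printf '%s\n' "$root" > "$target/metagenome_directories_root.txt"
    # Remaining arguments are the directory names to expose, one per line.
    if [ "$#" -gt 0 ]; then
        printf '%s\n' "$@" > "$target/metagenome_directories.txt"
    else
        : > "$target/metagenome_directories.txt"
    fi
}
```

For example, `write_metagenome_config . /data/metagenomes soil_samples ocean_samples` would expose two directories under /data/metagenomes and ignore any others in that root.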
To start the server:
bash start-server.sh
# enter password as prompted
To stop the server:
bash stop.sh
# enter password as prompted
Server output is written to out.txt for debugging purposes.
Occasionally, when no job is running, you can run rm tmp/* to free up disk space.
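If you would rather not delete everything in tmp/ at once, a more conservative variant removes only files that have not been modified for a given number of days. This is a sketch, not part of MetAnnotate; the helper name `clean_tmp` and the one-day default are assumptions. Like rm tmp/*, it only touches regular files directly inside the directory.

```shell
# Hypothetical helper: delete old regular files directly inside a directory.
clean_tmp() {
    dir="${1:-tmp}"
    days="${2:-1}"
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }
    # -maxdepth 1 keeps find out of subdirectories.
    # -mtime +N matches files whose age in whole days is greater than N.
    find "$dir" -maxdepth 1 -type f -mtime +"$days" -delete
}
```

Run it from the MetAnnotate directory as `clean_tmp tmp 1`, still only when no job is running.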
MetAnnotate has a cache layer that is cleaned every week. To change the cleanup interval, edit the cron job by running crontab -e.
Rerunning with the same HMM files is much faster with the cache (95% speedup).
If an ORF file has no HMM hit, the script will terminate and the job will error out.
Configuring the app to run with an Apache server and an existing celery daemon process is safer and ensures that the tool runs on startup.
- sudo adduser celery
- sudo passwd celery # Create a password. Remember it for the next step.
- vim celery.pw # Add celery password to file.
- sudo adduser celery www-data
- Install https://raw.githubusercontent.com/celery/celery/3.1/extra/generic-init.d/celeryd into /etc/init.d/celeryd
- sudo chmod a+x /etc/init.d/celeryd
- sudo vim /etc/default/celeryd
# Names of nodes to start
# most will only start one node:
CELERYD_NODES="worker1"
#CELERY_RESULT_BACKEND='amqp://'
CELERY_TASK_RESULT_EXPIRES=172800
# Absolute or relative path to the 'celery' command:
CELERY_BIN="/usr/local/bin/celery"
# App instance to use
CELERY_APP="tasks"
# Where to chdir at start.
CELERYD_CHDIR="" # Change as needed to directory containing tasks.py.
# Extra command-line arguments to the worker
CELERYD_OPTS="--concurrency=1"
# %N will be replaced with the first part of the nodename.
CELERYD_LOG_FILE="/var/log/celery/%N.log"
CELERYD_PID_FILE="/var/run/celery/%N.pid"
# Workers should run as an unprivileged user.
# You need to create this user manually (or you can choose
# a user/group combination that already exists, e.g. nobody).
CELERYD_USER="celery"
CELERYD_GROUP="celery"
# If enabled pid and log directories will be created if missing,
# and owned by the userid/group configured.
CELERY_CREATE_DIRS=1
- sudo update-rc.d celeryd defaults
- sudo /etc/init.d/celeryd restart
- sudo apt-get install libapache2-mod-wsgi
- sudo vim /etc/apache2/sites-available/000-default.conf
<VirtualHost *:80>
#ServerName www.example.com
ServerAdmin webmaster@localhost
DocumentRoot /var/www/html
WSGIDaemonProcess metannotate
WSGIProcessGroup metannotate
# Change the path below as needed.
WSGIScriptAlias / /path/to/app.wsgi
# Change the directory below as needed.
<Directory /path/to/dir/containing/app.wsgi>
Options All
AllowOverride All
Require all granted
</Directory>
ErrorLog ${APACHE_LOG_DIR}/error.log
CustomLog ${APACHE_LOG_DIR}/access.log combined
</VirtualHost>
- sudo chgrp www-data realinput/
- sudo chown www-data realinput/
- sudo chmod g+w realinput/
- sudo chgrp www-data input/
- sudo chown www-data input/
- sudo chmod g+w input/
- sudo chgrp www-data output/
- sudo chown www-data output/
- sudo chmod g+w output/
- sudo chgrp www-data databases/
- sudo chown www-data databases/
- sudo chmod g+w databases/
- sudo chgrp www-data tmp/
- sudo chown www-data tmp/
- sudo chmod g+w tmp/
- sudo chgrp www-data session/
- sudo chown www-data session/
- sudo chmod g+w session/
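The eighteen permission commands above apply the same three operations to six directories, so they can be condensed into a loop. The sketch below is one way to do that; the function name `apply_permissions` is hypothetical. By default it only prints the commands so you can review them, and it runs them only when called with the argument `run`.

```shell
# Hypothetical helper: apply (or preview) the ownership and permission changes
# for the directories listed above. Run it from the MetAnnotate directory.
apply_permissions() {
    # "run" applies the changes; any other value only prints the commands.
    mode="${1:-print}"
    for d in realinput input output databases tmp session; do
        for cmd in "chgrp www-data" "chown www-data" "chmod g+w"; do
            if [ "$mode" = "run" ]; then
                # $cmd is left unquoted on purpose so it splits into command and argument.
                sudo $cmd "$d/"
            else
                echo "sudo $cmd $d/"
            fi
        done
    done
}
```

Call `apply_permissions` to preview the commands and `apply_permissions run` to execute them.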
Upon a restart of the system, all components will be restarted as well. If necessary, the components can be restarted manually using the following commands:
- sudo /etc/init.d/rabbitmq-server restart
- sudo /etc/init.d/celeryd restart
- sudo /etc/init.d/apache2 reload