Skip to content

Latest commit

 

History

History
289 lines (197 loc) · 12.1 KB

INSTALL.md

File metadata and controls

289 lines (197 loc) · 12.1 KB

Setting up RCloud

Installing using VM or Docker images

There are DockerHub images as well as VM images available with ready-to-run RCloud for simple configurations.

Installing on your system

Prerequisites

RCloud requires R 3.1.0 or higher and several R packages. If you want to compile R and all necessary packages from sources these are the necessary dependencies:

## Ubuntu 14.04 (or higher), Debian 8 (or higher) - dependencies:
sudo apt-get install gcc g++ gfortran libcairo-dev libreadline-dev libxt-dev libjpeg-dev \
libicu-dev libssl-dev libcurl4-openssl-dev subversion git automake make libtool \
libtiff-dev libpcre2-dev liblzma-dev libbz2-dev gettext redis-server rsync curl

## jupyter support
sudo apt-get install -y jupyter python-ipython python-ipykernel python-nbconvert python-nbformat python-jupyter-client python-jupyter-core

## to install R from the CRAN PPA - Ubuntu 14.04 or Debian 8 only (skip this step on 16.04 or 9!):
sudo add-apt-repository ppa:marutter/rrutter
sudo apt-get update

## install R - alternatively you can install from sources
sudo apt-get install r-base-dev

## RedHat/CentOS 6+
sudo yum install gcc-gfortran gcc-c++ cairo-devel readline-devel libXt-devel libjpeg-devel \
bzip2-devel xz-devel libicu-devel boost-devel openssl-devel libcurl-devel subversion git automake redis

## install R
sudo yum install R

## jupyter support
sudo yum install python3-jupyter-core python3-ipykernel python3-nbconvert python3-nbformat python3-jupyter-client

If you have already R installed you may need only a subset of the above.

Installing from GitHub

Check out the RCloud repository to a place that will be your RCloud root directory. For illustration purposes we will use /data/rcloud as the root, but it can be any other directory.

cd /data

## check out RCloud from GitHub
git clone https://github.com/att/rcloud.git
cd rcloud

## install all dependent R packages
sh scripts/bootstrapR.sh

Now is the time to edit the configuration

## copy configuration template and edit it
cp conf/rcloud.conf.samp conf/rcloud.conf
vi conf/rcloud.conf

You probably want to set the Host: configuration option and Github authentication - see the next section for details. Once you have done that, you can start RCloud

sh scripts/fresh_start.sh

You can use the above also for re-starts. If you didn't touch any code and want to restart, you can add --no-build for a quick re-start without re-building any packages.

Once it is running, you can go to http://your-host:8080/login.R to login to your RCloud instance. We don't supply index.html as part of the sources so you can customize the user experience on your own.

Gist storage

There are three different possible was to store gists in RCloud:

  1. gitgist: local git repositories. This is the most simple setup where each notebook is a local git repository. It only works if you use a single compute node since there is no multi-node access (and no locking).
  2. GitHub: a GitHub installation (either public github.com for GitHub Enterprise) is used to manage notebooks, RCloud is registered as an application. In additon, GitHub can provide authentication using OAUTH. RCloud uses gists for storage and Github accounts for authentication.
  3. gist service: a Java-based server (rcloud-gist-service) that uses git repositories locally, but exposes them using gist API. This is the recommended setup for multi-user and multi-node deployments, but the most complex to setup.

Using gitgist

Simply uncomment the following lines in rcloud.conf:

gist.backend: gitgist
gist.git.root: ${ROOT}/data/gists

The latter allows you to specify the directory where to store the git repositories.

Using GitHub

You'll need to create a github application. This github application will need to point to the location you will deploy RCloud (let's assume you're only testing it for now, so 127.0.0.1 works). In that case, your application's URL will most likely be http://127.0.0.1:8080, and your Callback URL must be http://127.0.0.1:8080/login_successful.R. (the host and port need to match the application URL, and the path must be login_successful.R).

Then, you need to create a file under your configuration root (typically that's /conf) called rcloud.conf (see rcloud.conf.samp in the distribution of an example starting point and the rcloud.conf WiKi page for full details). If you're using github.com, then your file will look like this:

github.client.id: your.20.character.client.id
github.client.secret: your.40.character.client.secret
github.base.url: https://github.com/
github.api.url: https://api.github.com/
github.gist.url: https://gist.github.com/

The last three lines are the base URL of the github website, the entry point for the github API and the entry point for gists.

Enterprise Github deployment

If you have an enterprise github deployment where the gists URL ends with /gist instead of beginning with gist., you may need to omit github.gist.url.

If you'd like to control which Github users are allowed to log in to your RCloud deployment, you can add a whitelist to your configuration:

github.user.whitelist: user1,user2,user3,etc

Using gist service

The gist service can be obtained from https://github.com/att/rcloud-gist-services. It also requires RCloud setup with a SessionKeyServer (see below) for authentication as the service itself doesn't provide authentication. See its documentation for configuration. Once setup, it is configured in RCloud using the same method as GitHub with few additional twists. Assuming a host rcloud.example.com and a gist service running on gist.example.com and SessionKeyServer running on sks.example.com the configuration could look like this:

session.server: https://sks.example.com:4301
github.client.id: server1
github.client.secret: X
github.api.url: https://gist.example.com:13020/
github.auth: exec.token
github.auth.forward: https://rcloud.example.com/login_successful.R
rational.githubgist: true

The client.id is arbitrary but can be used to manage multiple SessionKeyServers mappings if multiple authentication servers are used (rare). The client.secret is arbitrary but must be present. github.auth.forward must be set to specify where to forward successful authentication. In the GitHub setup this was specified in the application, but since there is no application configuration here, you have to tell RCloud itself. Finally rational.githubgist tells RCloud that it doesn't have to work around some idiosyncrasies and bugs in GitHub's implementation of the API.

Hostnames

If your computer doesn't resolve its hostname to what you will be using, (let's say 127.0.0.1) you may also want to add:

host: 127.0.0.1

Then go to http://127.0.0.1:8080/login.R and authorize access to your account. You can also use Cookie.domain: instead if you want to have more control over the domain that will hold authentication tokens. A special setting Cookie.domain: * can be used in cases where the depolyment location is not known ahead of time (e.g., a floating container or VM) in which case the browser is left to decide on the domain setting.

Installing from a distribution tar ball

If you don't have internet access on the target machine, it is possible to install RCloud from a distribution tar ball which has all dependent packages included. The process is essentially identical to the above, only that you don't use git to check out the sources, but unpack the tar ball instead.

Make sure R 3.1.0 or higher is installed. Download the distribution tar ball, change to the directory where you want to install it, e.g. /data and run

$ tar fxz rcloud-2.1.tar.gz
$ cd rcloud
$ sh scripts/bootstrapR.sh

It will install the packages included in the release tar ball. Copy conf/rcloud.conf.samp to conf/rcloud.conf and edit it to match your GitHub setup. Then start RCloud via

$ sh scripts/fresh_start.sh

Will you be hacking on the code?

If you're just running RCloud, skip this session. If you're going to be hacking the code, you'll need to install a recent version of node.js. Then, in your shell:

$ npm install

This will install the node.js dependencies necessary to concatenate and minify the JavaScript files used in RCloud.

Starting rcloud

The safest way to install rcloud currently is to simply run the scripts/fresh_start.sh script. This will reinstall the rcloud.support package, recompile the javascript files (if you have node.js and the necessary dependencies installed), kill any old instances of RCloud running, deauthorize all login tokens (only if SessionServer is not used), and start a new version of RCloud. For repeated starts it is also possible to use

sh scripts/fresh_start.sh --no-build

Pitfalls

If you have trouble with authentication, make sure your hostname is FQDN (fully qualified domain name) and it matches your external name. You can use hostname -f to check that. The reason is that the cookie domain defaults to the hostname if not otherwise specified. If either of the above is not true, then create conf/rcloud.conf file with

Cookie.Domain: myserver.mydomain

Alternatively, you can set Host: instead with the same effect (Host is used in other places not just the cookie domain).

Also if things are failing, make sure you have the latest R packages installed. You can use update.packages including both CRAN and http://rforge.net as the repository. Also you can run

sh scripts/build.sh --all

to re-build all packages in RCloud.

Optional functionality

Redis

It is strongly recommended to use Redis as the back-end for key/value storage in RCloud. Install Redis server (in Debian/Ubuntu sudo apt-get install redis-server) and add rcs.engine: redis to the rcloud.conf configuration file.

Note: the default up until RCloud 1.0 is file-based RCS back-end which is limited and deprecated and thus the default may become Redis in future releases.

Search

RCloud 1.0 uses Apache Solr to index gists and provide search functionality if desired. See conf/solr/README.md for details. Quick start: install Java JDK (Debian/Ubuntu sudo apt-get install openjdk-8-jdk) and run

cd $ROOT/conf/solr
sh solrsetup.sh $ROOT/services/solr

assuming $ROOT is set to your RCloud root directory. It will download Solr, setup the configuration, start Solr and create a collection used by RCloud. Then add

solr.url: http://127.0.0.1:8983/solr/rcloudnotebooks

to rcloud.conf.

SessionKeyServer

For enhanced security RCloud can be configured to use a session key server instead of flat files. To install the reference server (it requires Java so e.g. sudo apt-get install openjdk-8-jdk), use

cd $ROOT
mkdir services
cd services
git clone https://github.com/s-u/SessionKeyServer.git
cd SessionKeyServer
make
sh run &

Then add Session.server: http://127.0.0.1:4301 to rcloud.conf.

PAM authentication

This is the most advanced setup so use only if you know how this works. If you want to use user switching and PAM authentication, you can compile PAM support in the session server - make sure you have setup the session server (see above) and PAM is available (e.g. sudo apt-get install libpam-dev on Ubuntu/Debian), then

cd $ROOT/services/SessionKeyServer
make pam

You may need to edit the Makefile if you're not on Ubuntu/Debian since it assumes java-7-openjdk-amd64to find the Java components. Common configuration in that case:

Exec.auth: pam
Exec.match.user: login
Exec.anon.user: nobody
HTTP.user: www-data

This setup allows RCloud to switch the execution environment according to the user than has authenticated. For this to work, RCloud must be started as root. Again, use only if you know what you're doing since misconfiguring RCloud run as root can have grave security implications.