V0.7.0 release notes

Ben McClelland edited this page Feb 10, 2015 · 5 revisions

The v0.7.0 release focuses on Validated Database and RAS Monitoring capabilities.

Base operating system for v0.7.0 release validation: CentOS 6.5

Outstanding issues known for this release:

  • Issue 39 ORCM socket errors retry forever and would eventually fill up the local disk if being logged
  • Issue 44 Occasionally, orcmd does not respond to a killall orcmd (SIGTERM)
  • Issue 70 Orcmd will complain if a compute node connects and no orcmsched is running
  • Issue 75 Nodepower reports suspicious values on the aggregator node
  • Issue 84 orcm won't build with libtool 2.4.3 or greater
  • Issue 90 Amount of data logged into the DB is different for aggregator and compute daemons
  • Issue 103 ORCM OPAL_PREFIX and LD_LIBRARY_PATH conflicts with openmpi.
  • Issue 106 OMPI --map-by board causes reentrant event loop
  • Issue 107 ORCM CLI/Job Launch tools --help shows non-posix switches.
  • Issue 129 ORCM orun still tries to execute binary for misconfigured MPI job
  • Issue 140 ORCMD - Inventory Gathering - Numeric fields for Inventory not stored in the proper format
  • Issue 144 ORCM logs negative values when asked to log only SIGAR component data
  • Issue 145 ORCM aggregator does not write monitoring data into the DB at all for some nodes
  • Issue 147 ORCM orun does not wire up MPI processes
  • Issue 154 ORCM job launch CN warning -- shmem mmap error --
  • Issue 159 Aggregator daemon does not reconnect to database if for some reason the DB becomes unavailable during monitoring

Non-validated binaries included but not supported in this release:

  • orun
  • osub

A note on collecting sensor data:

Database performance is a key consideration when storing the sensor data collected by the nodes. Actual database performance depends heavily on the particular setup: the hardware characteristics and the DBMS configuration.

ORCM maintains a queue of database store requests. If the database cannot keep up with the incoming requests, they accumulate in the queue and there is a lag before they reach the database. ORCM can be tuned to manage this. The amount of data written to the database depends on the number of nodes, the number of active sensors, and the sampling rate, so these parameters can be adjusted for the current setup to achieve the desired performance.
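
As a rough aid for this tuning, the relationship between those three parameters and the queue can be sketched in a few lines of Python. This is only a back-of-the-envelope model, not part of ORCM: the function names are hypothetical, and it assumes one store request per active sensor per sample per node and a database that absorbs requests at a constant rate, which may not match how ORCM actually batches its writes.

```python
def store_request_rate(nodes, sensors_per_node, sample_interval_s):
    """Estimated store requests per second arriving at the queue.

    Assumption (not from ORCM): each active sensor on each node
    produces exactly one store request per sampling interval.
    """
    if sample_interval_s <= 0:
        raise ValueError("sample_interval_s must be positive")
    return nodes * sensors_per_node / sample_interval_s


def queue_backlog(arrival_rate, db_rate, duration_s):
    """Requests left waiting in the queue after duration_s seconds.

    db_rate is the measured number of store requests per second
    the database can absorb; the backlog never goes below zero.
    """
    return max(0.0, (arrival_rate - db_rate) * duration_s)


def min_sample_interval(nodes, sensors_per_node, db_rate):
    """Shortest sampling interval (seconds) that does not grow the queue."""
    if db_rate <= 0:
        raise ValueError("db_rate must be positive")
    return nodes * sensors_per_node / db_rate


if __name__ == "__main__":
    # Illustrative numbers only: 100 nodes, 5 active sensors each,
    # sampled every 10 s, against a database absorbing 40 requests/s.
    rate = store_request_rate(100, 5, 10)
    print("arrival rate (req/s):", rate)
    print("backlog after 60 s:", queue_backlog(rate, 40.0, 60))
    print("minimum interval (s):", min_sample_interval(100, 5, 40.0))
```

If the estimated arrival rate exceeds what the database can absorb, the backlog grows without bound, so the options are to lengthen the sampling interval, disable some sensors, or improve the database setup.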