
1.2 Core Features

  • Plugin architecture based on the Modular Component Architecture (MCA)

    • Sophisticated auto-select algorithms based on system size and available resources
    • Support for proprietary binary plugins
    • On-the-fly updates for maintenance*
    • Addition of new plugin capabilities/features without requiring a system-wide restart, provided compatibility requirements are met
  • Hardware discovery support

    • Automatic reporting of hardware inventory on startup
    • Automatic updating upon node removal and replacement
  • Scalable overlay network*

    • Supports multiple topologies, including both tree and mesh plugins
    • Automatic route failover and restoration, with messages cached pending communication recovery
    • Both in-band and out-of-band transports with auto-failover between them
  • Sensors

    • Both push and pull models supported
    • Sensors can be read as a group at a specified rate, individual sensors can be read at their own regular intervals, or any combination of sensors can be polled on request (a sampling-model sketch follows this feature list)
    • Polling requests can return readings directly to the caller or include them in the next database update, as specified by the caller
    • Data collected locally and sent to an aggregator for recording into a database at specified time intervals
    • Environment sensors
      • Processor temperature - on-board sensor for reading processor temperatures when the coretemp kernel module is loaded
      • Processor frequency - on-board sensor for reading processor frequencies; requires read access to the /sys/devices/system/cpu directory (a sysfs-reading sketch follows this feature list)
      • IPMI readings of AC power, cabinet temperature, water and air temperatures, etc.
      • Processor power - reads processor power from the on-board MSRs; supported only on Intel Sandy Bridge, Ivy Bridge, and Haswell processors
    • Resource utilization
      • Wide range of process- and node-level resource utilization metrics, including memory, CPU, network I/O, and disk I/O
    • Process failure
      • Monitors specified file(s) for programmatic access within specified time intervals and/or against file size restrictions
    • Heartbeat monitoring
  • Analytics*

    • Supports in-flight reduction of sensor data
    • User-defined workflows for data analysis, built from available plugins connected in user-defined chains (a conceptual chain sketch follows this feature list)
    • Analysis chain outputs can be included in database reporting or sent to the requestor, at the user's direction
    • Plugins can support event and alert generation
    • A "tap" plugin directs a copy of a selected data stream from a specified point to a remote requestor
    • Chains can be defined at any aggregation point in the system
  • Diagnostics*

    • Supports system diagnostics, including memory, processor, and Ethernet diagnostics
    • Diagnostics can be launched from octl on a specific node or a list of nodes
  • Database

    • Both raw and processed data can be stored in one or more databases
    • Supports both SQL and non-SQL* databases
      • Multiple instances of either type can be used in parallel*
      • The target database for different data sources and/or data types can be specified using MCA parameters during configure and startup, and can be altered by command during operation*
    • Cross-data correlation maintained
      • Relationship between job, sensor, and performance data tracked and linked for easy retrieval*
  • Fault Tolerance

    • Self-healing communication system (see above)
    • Non-heartbeat detection of node failures
    • Automatic state recovery based on retrieval of state information from peers*

* indicates areas under development
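
The sampling model described under Sensors (group reads at a configured rate, per-sensor intervals, and on-demand polls whose results go either to the caller or into the next database update) can be pictured with a short sketch. The Python below is purely conceptual and is not Sensys code or its plugin API; names such as SensorGroup, poll, and run are hypothetical.

```python
"""Conceptual sketch of the sampling model described above; not the Sensys API.
Sensors are read as a group at a fixed rate and pushed to an aggregator, and any
subset can be polled on demand, returning to the caller or joining the next push."""
import time

class SensorGroup:
    def __init__(self, sensors, aggregator, rate_sec):
        self.sensors = sensors        # {name: zero-argument callable returning a reading}
        self.aggregator = aggregator  # callable receiving {name: reading}
        self.rate_sec = rate_sec      # group sampling rate in seconds
        self.pending = {}             # poll results deferred to the next push

    def poll(self, names, to_caller=True):
        """Read selected sensors now; return them or defer them to the next push."""
        readings = {name: self.sensors[name]() for name in names}
        if to_caller:
            return readings
        self.pending.update(readings)
        return None

    def run(self, cycles):
        """Read the whole group at the configured rate and push to the aggregator."""
        for _ in range(cycles):
            readings = {name: read() for name, read in self.sensors.items()}
            readings.update(self.pending)   # include deferred poll results
            self.pending.clear()
            self.aggregator(readings)
            time.sleep(self.rate_sec)

if __name__ == "__main__":
    group = SensorGroup(
        sensors={"load1": lambda: float(open("/proc/loadavg").read().split()[0])},
        aggregator=lambda r: print("push to aggregator:", r),
        rate_sec=1.0,
    )
    print("on-demand poll:", group.poll(["load1"]))  # returned directly to the caller
    group.run(cycles=2)                              # periodic group reads
```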
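
The processor temperature and frequency sensors read standard Linux interfaces: hwmon entries exposed by the coretemp kernel module and the cpufreq files under /sys/devices/system/cpu. The Python sketch below reads those same sources; it is illustrative only and is not part of Sensys.

```python
#!/usr/bin/env python3
"""Illustrative only (not Sensys code): read processor temperatures from the
coretemp hwmon interface and current CPU frequencies from cpufreq sysfs files."""
import glob
import os

def read_coretemp_celsius():
    """Return {label: temperature in C} from hwmon devices named 'coretemp'."""
    temps = {}
    for name_file in glob.glob("/sys/class/hwmon/hwmon*/name"):
        with open(name_file) as f:
            if f.read().strip() != "coretemp":
                continue
        hwmon_dir = os.path.dirname(name_file)
        for temp_input in glob.glob(os.path.join(hwmon_dir, "temp*_input")):
            label_file = temp_input.replace("_input", "_label")
            if os.path.exists(label_file):
                with open(label_file) as lf:
                    label = lf.read().strip()
            else:
                label = os.path.basename(temp_input)
            with open(temp_input) as tf:
                temps[label] = int(tf.read().strip()) / 1000.0  # reported in millidegrees
    return temps

def read_cpu_freq_khz():
    """Return {cpuN: current frequency in kHz} from /sys/devices/system/cpu."""
    freqs = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq"
    for path in glob.glob(pattern):
        cpu = path.split("/")[5]           # e.g. 'cpu0'
        with open(path) as f:
            freqs[cpu] = int(f.read().strip())
    return freqs

if __name__ == "__main__":
    print("temperatures (C):", read_coretemp_celsius())
    print("frequencies (kHz):", read_cpu_freq_khz())
```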
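
An analytics chain can be pictured as plugins connected in sequence, each reducing sensor data in flight and optionally raising alerts, with the chain's output reported to the database or to a requestor. The following Python sketch shows the idea only; Average, Threshold, and run_chain are hypothetical names and do not reflect the actual Sensys workflow or plugin interfaces.

```python
"""Conceptual sketch of a user-defined analysis chain; not the Sensys analytics API."""

class Average:
    """Reduce a window of samples to their mean (in-flight data reduction)."""
    def __init__(self, window):
        self.window, self.buf = window, []
    def process(self, sample):
        self.buf.append(sample)
        if len(self.buf) < self.window:
            return None                      # nothing to pass downstream yet
        mean = sum(self.buf) / len(self.buf)
        self.buf.clear()
        return mean

class Threshold:
    """Pass data through unchanged, raising an alert when a limit is exceeded."""
    def __init__(self, limit, alert):
        self.limit, self.alert = limit, alert
    def process(self, sample):
        if sample is not None and sample > self.limit:
            self.alert(f"alert: value {sample:.1f} exceeded limit {self.limit}")
        return sample

def run_chain(chain, samples):
    """Feed samples through each plugin in order; values reduced away are dropped."""
    outputs = []
    for s in samples:
        for plugin in chain:
            s = plugin.process(s)
            if s is None:
                break
        else:
            outputs.append(s)
    return outputs

if __name__ == "__main__":
    chain = [Average(window=3), Threshold(limit=80.0, alert=print)]
    temps = [70.0, 72.0, 75.0, 85.0, 90.0, 88.0]
    print("reduced chain output:", run_chain(chain, temps))
```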

The following are the Sensys software components:

  1. orcmd – A Sensys daemon that runs on all compute nodes with root privileges. It collects in-band RAS monitoring (sensor) data on the compute nodes.
  2. orcmd aggregator – Another instance of the Sensys daemon (orcmd) that runs on a service node (rack controller, row controller, or system management server) and collects out-of-band (OOB) and in-band RAS monitoring data from the compute nodes. OOB data collection requires IPMI access to the BMC on each compute node; in-band data collection goes through the management network. The binary is the same orcmd as above, with the aggregator role assigned in the Sensys configuration file (orcm-site.xml).
  3. orcmsched – A scheduler daemon that runs on the SMS (system management server) node or head node. It manages the compute resources in the cluster.
  4. octl – An administrative command-line interface for Sensys.