3.7 octl

lrrajesh edited this page Sep 14, 2016 · 1 revision

Admin-focused tool for interacting with Sensys. This tool can run as an interactive shell or as a single one-shot command. Currently the tool provides information about configured resources and sensors, and allows modifying the workflows of Sensys.

The octl command itself takes the following options:

Usage: octl [OPTIONS]
  Open Resilient Cluster Manager "octl" Tool

   -am <arg0>            Aggregate MCA parameter set file list
   -gomca|--gomca <arg0> <arg1>
                         Pass global MCA parameters that are applicable to
                         all contexts (arg0 is the parameter name; arg1 is
                         the parameter value)
   -h|--help             This help message
   -l|-config-file|--config-file <arg0>
                         Logical group configuration file for this orcm
                         chain
   -omca|--omca <arg0> <arg1>
                         Pass context-specific MCA parameters; they are
                         considered global if --gomca is not used and only
                         one context is specified (arg0 is the parameter
                         name; arg1 is the parameter value)
   -tune <arg0>          Application profile options file list
   -v|--verbose          Be Verbose
   -V|--version          Show version information

Interactive shell:
Use 'quit' or 'exit' - for exiting the shell
Use '<tab>' or '<?>' for help

The subcommands have the option to take arguments specific to that command as well.

3.1.1.1 Interactive CLI

The interactive mode of the CLI is invoked by running the command without any subcommands. Optional arguments such as MCA parameters can be specified as well.

% octl
*** WELCOME TO OCTL ***
 Possible commands:
   resource             Resource Information
   diag                 Diagnostics
   sensor               sensor
   notifier             notifier
   grouping             Logical Grouping Information
   Workflow             Workflow information
   query                Query data from DB
   chassis-id           Enable/Disable chassis identify LED.
   quit                 Exit the shell
octl>

Once in the interactive shell, the <tab> key can be used to either autocomplete unambiguous partial commands or list possible completions for ambiguous partial commands. The <?> key will display more information about all of the commands at the current hierarchy.

For example, pressing <tab> after entering res autocompletes the resource command, and pressing <tab> again once the resource command is fully entered shows:

octl> resource
        status add remove drain resume
octl> resource

As another example, pressing <?> after entering sensor shows:

octl> sensor 
Possible commands:
   set                  Set Sensor Commands
   get                  Get Sensor Commands
   enable               Enable sampling for the current datagroup or sensor for a node-list: enable <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
   disable              Disable sampling for the current datagroup or sensor for a node-list: disable <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
   reset                Reset sampling to service load defaults for the current datagroup or sensor for a node-list: reset <node-list> <datagroup|"all"[:{sensor_label|"all"}]>
   store                Sensor store Commands

octl> sensor

Notice how in both cases, control is returned to the user to complete the command as desired.

To exit interactive mode type 'quit' from the command prompt.

In the following sections, examples for each command will be given using normal one-shot execution mode. However, they can also be executed in interactive mode.

3.1.1.2 Resource

The resource command set (i.e. status, add/remove, drain/resume) is used to display and change information about the resources (nodes) configured in the system. Currently resource add/remove is not supported administratively.

(1) resource status

The current implementation displays the node connection state, which is either up (U), down (D), or unknown (?), and the job state, which is either allocated or unallocated. The node specification is a Sensys node regex.

Usage:
% octl resource status

Example output:

TOTAL NODES : 10
NODES                : STATE  SCHED_STATE
-----------------------------------------
node001              : U      UNALLOCATED
node[3:2-10]         : ?            UNDEF

(2) resource drain

This command only takes the nodes out of the available resource pool of the scheduler. It has nothing to do with the running status of the nodes.

Usage:

% octl resource drain <nodelist>

Example:

% octl resource drain c[2:1-10]

(3) resource resume

This command only brings the nodes into the available resource pool of the scheduler. It has nothing to do with the running status of the nodes.

Usage:

% octl resource resume <nodelist>

Example:

% octl resource resume c[2:1-10]

3.1.1.3 Diag

The diagnostic command set allows running diagnostics on remote Sensys daemons. These commands require a node regex specification for determining which remote daemons to run on. See the 3.3 Sensys Node Regular Expressions section for details on how to construct the regex.

Example 1: run cpu diagnostics on node001 through node010

% octl diag cpu node[3:1-10]
Success

Example 2: run ethernet diagnostics on node001 through node010

% octl diag eth node[3:1-10]
Success

Example 3: run memory diagnostics on node001 through node010

% octl diag mem node[3:1-10]
Success
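
The node regex used in the examples above can be illustrated with a short Python sketch. It assumes only the simple "prefix[width:start-end]" form shown on this page (for example node[3:1-10] for node001 through node010); the full syntax is described in section 3.3 and is not covered here. The function name expand_node_regex is hypothetical and is not part of octl.

```python
import re

# Assumption: only the "prefix[width:start-end]" form used on this page is
# handled. The complete Sensys node regex syntax (section 3.3) is richer.
_SIMPLE_FORM = re.compile(
    r"^(?P<prefix>[^\[\]]*)\[(?P<width>\d+):(?P<start>\d+)-(?P<end>\d+)\]$"
)


def expand_node_regex(spec):
    """Expand e.g. 'node[3:1-10]' into ['node001', ..., 'node010']."""
    match = _SIMPLE_FORM.match(spec)
    if match is None:
        if "[" in spec or "]" in spec:
            raise ValueError(f"unsupported node regex form: {spec!r}")
        # A plain host name expands to itself.
        return [spec]
    width = int(match.group("width"))
    start = int(match.group("start"))
    end = int(match.group("end"))
    prefix = match.group("prefix")
    # Zero-pad every number to the requested width.
    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


nodes = expand_node_regex("node[3:1-10]")
print(nodes[0], "...", nodes[-1])
```

Such a helper is only useful for previewing which hosts a command would target; octl itself performs the expansion.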

3.1.1.4 Workflow

The workflow command allows the user to add/remove/list workflows from any management node on a cluster.

Example 1: Adding the workflow

        % octl workflow add workflow.xml [target]

After running the above command, the workflow_id is printed on the console. Here target is the aggregator list and is an optional argument. When given, the target argument overrides the aggregator information provided in the XML file.

Example 2: list the workflows

        % octl workflow list target

After running the above command, the console shows the list of workflows running on the target/aggregator, with their workflow names and workflow ids.

Example 3: Remove the workflow

        % octl workflow remove target workflow_name workflow_id

This command removes a workflow running on the target. The wildcard character "*" is allowed for workflow_name and workflow_id. Note that if the wildcard character is used for workflow_name, all the workflows on the target are removed regardless of the workflow_id value.

Sample workflow XML files can be found at section 3-Sensys-User-Guide/3.9-Data-Smoothing-Algorithms-Analytics.

The name attribute is mandatory for the workflow and step tags. In every workflow, "filter" must be the first step. The parser does not accept any attribute other than name, or any element other than step, in the workflow element.
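
The rules above can be checked before a file is passed to octl workflow add. The following Python sketch is one possible pre-check, not the parser used by Sensys. It assumes that "filter as the first step" means the first step element carries name="filter", and that workflow elements may appear at any depth in the file; the function names are hypothetical. See the sample files referenced above for the real layout.

```python
import xml.etree.ElementTree as ET


def check_workflow(workflow):
    """Return a list of rule violations for one <workflow> element."""
    problems = []
    if "name" not in workflow.attrib:
        problems.append("workflow is missing the name attribute")
    extra_attrs = sorted(set(workflow.attrib) - {"name"})
    if extra_attrs:
        problems.append(f"workflow has unsupported attributes: {extra_attrs}")
    steps = []
    for child in workflow:
        if child.tag == "step":
            steps.append(child)
        else:
            problems.append(f"workflow has unsupported element <{child.tag}>")
    for index, step in enumerate(steps):
        if "name" not in step.attrib:
            problems.append(f"step {index} is missing the name attribute")
    # Assumption: the first step is identified by name="filter".
    if not steps or steps[0].get("name") != "filter":
        problems.append('the first step must be named "filter"')
    return problems


def check_workflow_xml(text):
    """Check every <workflow> element found in an XML document."""
    root = ET.fromstring(text)
    # Element.iter() also yields the root itself when its tag matches.
    return [p for wf in root.iter("workflow") for p in check_workflow(wf)]
```

An empty list means none of the rules listed above were violated; it does not guarantee that the workflow is otherwise valid.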

Default loading of a workflow file is supported. For this feature to work, the workflow file must be located at the following location:

        <user install directory>/etc/orcm-default-config.xml

If a default workflow is found at the above location, it is loaded regardless of the aggregator tag present in the XML file.

3.1.1.5 Sensor Inventory Listing

This command is designed to retrieve from the inventory database the list of sensors for the given node(s) that can be used in the Analytics plugins (described immediately above in 3.1.1.6). The command syntax is:

% octl sensor get inventory <node-regex|logical-group> [sensor_search_string]

Or in the interactive shell:

octl> sensor get inventory <node-regex|logical-group> [sensor_search_string]

The node-regex is described in section 3.3 and the logical grouping is described in section 3.5. The sensor_search_string is currently limited to searching the stored plugin names. Omitting the parameter returns all sensors matching the regex or logical group name, and using '*' for sensor_search_string has the same effect. Partial plugin names are also accepted (NOTE: the search is done as if your string were followed by the wildcard character '*').

Examples:

% octl sensor get inventory node01 ip

This will match all plugins that start with 'ip' like ipmi. An equivalent would be:

% octl sensor get inventory node01 'ip*'

This explicitly adds the wildcard implied by the first example. All wildcard usage must be a “starts-with” wildcard, which is sufficient for most use cases. If this is not ideal for a specific case, the suggestion is to write a filter script in bash, Python, or another language appropriate to the job. An example of an unsupported pattern:

% octl sensor get inventory node01 '*pm*'

This will not match the ipmi plugin and will in fact return an error.
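
The matching rule described above can be summarized in a few lines of Python. This is a sketch of the documented behaviour, not the code used by octl. It assumes the comparison is case-sensitive, and the function name plugin_matches is hypothetical.

```python
def plugin_matches(search, plugin_name):
    """Apply the starts-with search rule to a single plugin name."""
    if search in ("", "*"):
        # Same as omitting the search string: everything matches.
        return True
    # A trailing '*' is implied, so drop it when it is written explicitly.
    prefix = search[:-1] if search.endswith("*") else search
    if "*" in prefix:
        # For example '*pm*': only a trailing wildcard is supported.
        raise ValueError(f"unsupported wildcard position in {search!r}")
    return plugin_name.startswith(prefix)


plugins = ["ipmi", "coretemp", "freq"]
print([name for name in plugins if plugin_matches("ip", name)])

# A "contains" search has to be done outside octl, for example:
print([name for name in plugins if "pm" in name])
```

The last line shows the kind of external filter suggested above for cases where a starts-with search is not enough.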

The output is in standard CSV format with double quotes for both machine parsing (importable into most spreadsheet programs) and human readability. The first line for each new node output is a CSV header with names for the columns. The nodes are output one after another since each node may have different sensors listed in the database. This feature on the OCTL tool does not merge results into one set or union of sensors.
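
Because each node starts with its own CSV header, a consumer has to split the output into one table per node. The Python sketch below shows one way to do that. The column names are not listed on this page, so the caller passes the first header field explicitly; the names used in the sample ("hostname", "plugin name", "sensor name") are hypothetical placeholders and not the real column names.

```python
import csv
import io


def split_inventory(text, first_header_field):
    """Group CSV rows into one table per node.

    A row whose first field equals first_header_field is treated as the
    header that starts a new node's table.
    """
    tables = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines between tables
        if row[0] == first_header_field:
            tables.append({"header": row, "rows": []})
        elif tables:
            header = tables[-1]["header"]
            tables[-1]["rows"].append(dict(zip(header, row)))
    return tables


# Hypothetical sample; real column names and values will differ.
sample = (
    '"hostname","plugin name","sensor name"\n'
    '"node01","ipmi","bmc_temp"\n'
    '"hostname","plugin name","sensor name"\n'
    '"node02","coretemp","core0"\n'
)
for table in split_inventory(sample, "hostname"):
    print(table["header"], len(table["rows"]))
```

Keeping the tables separate mirrors the tool's behaviour of not merging results into a union of sensors.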

3.1.1.6 Sensor Sample Rate Listing

This command is designed to retrieve the sample rate for the individual sensor for a list of nodes. The command syntax is:

% octl sensor get sample-rate <sensor-name> <node-regex|logical-group>

Or in the interactive shell:

octl> sensor get sample-rate <sensor-name> <node-regex|logical-group>

Examples:

% octl sensor get sample-rate base c[2:1-10]
% octl sensor get sample-rate coretemp c[2:1-10]
% octl sensor get sample-rate freq c[2:1-10]

3.1.1.7 Sensor Sample Rate Setting

This command is designed to set the sample rate for the individual sensor for a list of nodes. Note that for per sensor (e.g. coretemp, freq) sample rate, this command will take effect only if the sensor is started in a progress thread. The setting of the base sample rate does not have this requirement. The command syntax is:

% octl sensor set sample-rate <sensor-name> <sample-rate> <node-regex|logical-group>

Or in the interactive shell:

octl> sensor set sample-rate <sensor-name> <sample-rate> <node-regex|logical-group>

Examples:

% octl sensor set sample-rate base 10 c[2:1-10]

Setting per sensor sample rate in the examples below will take effect only if the sensor is started in a progress thread.

Examples:

% octl sensor set sample-rate coretemp 10 c[2:1-10]
% octl sensor set sample-rate freq 10 c[2:1-10]

3.1.1.8 Sensor Sampling Control

The sensor sampling control is designed to enable, disable, or reset the sensor sampling for datagroups (plugin names). The orcmd service startup state of the sensor data sampling is controlled by the following set of MCA parameters:

sensor_{sensor-name|"base"}_collect_metrics = "true" | "false"

Using "base" turns on ("true") or off ("false") all sensors loaded using the "sensor" MCA parameter (only loaded sensors can be controlled, not plugins excluded from being loaded). Using the sensor name instead of "base" overrides the sensor_base_collect_metrics MCA parameter for that sensor. The default value of each individual datagroup (plugin) is inherited from sensor_base_collect_metrics at orcmd service load time. This octl command requires that orcmsched be running in the cluster.
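
The inheritance between the base parameter and the per-sensor parameters can be sketched as a small lookup. The Python function below is illustrative only. It assumes the parameters are available as a dictionary of strings, and since this page does not state the built-in default of sensor_base_collect_metrics, that default is passed in by the caller. The function name is hypothetical.

```python
def collects_metrics(sensor_name, mca_params, base_default):
    """Return True if the sensor would sample at orcmd service load time.

    mca_params maps MCA parameter names to "true"/"false" strings.
    base_default is used when sensor_base_collect_metrics is not set
    (assumption: its built-in default is not documented on this page).
    """
    def as_bool(value):
        return str(value).strip().lower() == "true"

    base = mca_params.get("sensor_base_collect_metrics")
    base_state = base_default if base is None else as_bool(base)
    # A per-sensor parameter overrides the base value; otherwise inherit it.
    own = mca_params.get(f"sensor_{sensor_name}_collect_metrics")
    return base_state if own is None else as_bool(own)


params = {
    "sensor_base_collect_metrics": "false",
    "sensor_errcounts_collect_metrics": "true",
}
print(collects_metrics("errcounts", params, base_default=True))
print(collects_metrics("coretemp", params, base_default=True))
```

With these example parameters only errcounts samples at startup, which is the same end state that the disable/enable command pair in this section produces at runtime.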

The command syntax, for both one-shot and interactive shell use, is:

octl sensor {enable|disable|reset} {node-regex|logical-group} {datagroup|"all"}

For example:

$ octl sensor disable node01 all
$ octl sensor enable node01 errcounts

After these commands node01 will only be logging data from the errcounts datagroup (plugin). Then:

$ octl sensor reset node01 all

restores node01 to its original sampling state (defined by MCA parameters at orcmd service load time).

For finer control the following syntax can be used

octl sensor {enable|disable|reset} {node-regex|logical-group} {datagroup|"all"}:{sensor-label-name}

where sensor-label-name is an individual label from the datagroup. This effectively gives control over individual sensor data items. For example, in the coretemp datagroup the sensor labels use the naming convention "coreN", where N is the zero-based core number. So the following steps will have orcmd sample only "core3" from the "coretemp" datagroup on "node01".

$ octl sensor disable node01 all
$ octl sensor enable node01 coretemp:core3

The string "all" can also be used for labels except when the datagroup is specified as "all". Using "all:all" is not legal, instead just use "all". Also using "all:{sensor-label-name}" is not legal since there is no commonality of sensor-label-name between datagroups.
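
The legality rules for the datagroup and label argument can be expressed as a short validator. The Python sketch below only encodes the rules stated above; it does not check whether a datagroup or label actually exists. The function name is hypothetical.

```python
def parse_sampling_target(spec):
    """Split 'datagroup[:label]' and reject the combinations called illegal.

    Returns (datagroup, label), where label is None when no label is given.
    """
    datagroup, separator, label = spec.partition(":")
    if not datagroup or (separator and not label):
        raise ValueError(f"malformed target {spec!r}")
    if datagroup == "all" and separator:
        # Covers both "all:all" and "all:{sensor-label-name}".
        raise ValueError('"all" cannot be combined with a label; use "all"')
    return datagroup, (label if separator else None)


print(parse_sampling_target("coretemp:core3"))
print(parse_sampling_target("all"))
```

A wrapper script could call this before invoking octl sensor enable, disable, or reset to catch illegal combinations early.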

Not all current sensor datagroups respond to sensor sampling control. Notable exceptions are

  • heartbeat - This is not a real sensor datagroup.
  • dmidata - No control possible as this doesn't collect periodic data.
  • resusage - Only datagroup control. No sensor level control possible.
  • nodepower - Only datagroup control. Only one sensor label is returned.
  • mcedata - Only datagroup control. Only one sensor label is returned.
  • syslog - Only datagroup control. Only one sensor label is returned.

3.1.1.9 Sensor Storage Control

Sensor storage control is designed to control the storage of environmental (raw) data and event data. Two MCA parameters are provided for this purpose: analytics_base_store_raw_data controls environmental data and analytics_base_store_event_data controls event data.

To change these MCA parameters on the fly, the following four octl commands are provided.

To store Environmental data:

$ octl sensor store raw_data target

Note that this command controls only sensor environmental data. It doesn't turn on/off the other storage policies (like event data)

To store Event data:

$ octl sensor store event_data target

Note that this command controls only event data. It doesn't turn on/off the other storage policies (like environmental data)

To store both Environmental and Event data

$ octl sensor store all target

To disable storing both Environmental and Event data

$ octl sensor store none target

For the octl sensor store command, target means the aggregator list.

3.1.1.10 Query

This component is used to query the database in order to obtain basic information about the nodes and their sensors. To use the query component, the following setup is required:

  • Set the following parameters in the openmpi-mca-params.conf file:
        db_postgres_database=orcmdb
        db_postgres_user=orcmuser:orcmpassword
        db_postgres_uri=localhost
  • Launch an octl console:
        $ bin/octl

Whenever you need to specify a node list in a query command you can use:

  • Simple comma separated list. Example: host01,host02,host03
  • Sensys nodes regular expression. Example: host[2:1-3] to specify host01 through host03.
  • Sensys nodes with wildcards. Example: host* to specify every node whose name starts with host.
  • '*' to specify every node.

Query commands:

  • HISTORY:
    • SYNTAX:
        query history [start-date [end-date]] <nodelist>
- USE: Returns all the data logged by the provided nodes (<nodelist>) during the specified time.
- Date has the format: YYYY-MM-DD hh:mm:ss
                       YYYY-MM-DD (00:00:00 will be added)
                       hh:mm:ss (current day will be added)
- EXAMPLE:
        $ query history 2015-11-13 15:00:00 2015-11-13 16:00:00 master4
- OUTPUT:
TIME,HOSTNAME,DATA ITEM,VALUE,UNITS
2015-10-21 15:31:47,master4,procstat_orcmd_pid,11960,
2015-10-21 15:31:48,master4,procstat_running_processes,0,
2015-10-21 15:31:48,master4,procstat_zombie_processes,0,
2015-10-21 15:31:49,master4,coretemp_core0,23,degrees C
4 rows were found (0.091 seconds)
  • SENSOR:
    • SYNTAX:
        query sensor <sensor-list> [start-date [end-date]]  <upper-bound lower-bound> [nodelist]
- USE: Returns the logged data corresponding to the given sensor, time and node list. Additionally, the query can be constrained within a range of values defined by an upper and lower bound. Notice that logs containing the bound values will not be included in the result.
- Date has the format: YYYY-MM-DD hh:mm:ss
                       YYYY-MM-DD (00:00:00 will be added)
                       hh:mm:ss (current day will be added)
- EXAMPLE:
        $ query sensor coretemp* 2015-11-13 14:00:00 2015-11-13 16:00:00 0.1 1 master4
- OUTPUT:
TIME,HOSTNAME,DATA ITEM,VALUE,UNITS
2015-10-21 15:31:38,master4,coretemp_core 0,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 1,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 2,35,degrees C
2015-10-21 15:31:38,master4,coretemp_core 3,35,degrees C
...
10 rows were found (0.141 seconds)
  • LOG:
    • SYNTAX:
        query log <search word> [start-date [end-date]] [nodelist]
- USE: Returns the logged syslog data corresponding to the given nodes and search word. The search word accepts the '*' wildcard.
- Date has the format: YYYY-MM-DD hh:mm:ss
                       YYYY-MM-DD (00:00:00 will be added)
                       hh:mm:ss (current day will be added)
- EXAMPLE:
        $ query log *access* master4,c01
- OUTPUT:
HOSTNAME,SENSOR_LOG,MESSAGE
master4,syslog_log_message_0,<86>Oct 28 08:30:29 c01: access granted for user root (uid=0)
c01,syslog_log_message_0,<86>Oct 28 08:30:33 master4: access granted for user root (uid=0)
2 rows were found (0.056 seconds)
  • IDLE:
    • SYNTAX:
        query idle [minimum idle time] <nodelist>
- USE: Returns the nodes in <nodelist> that have been idle for the given time or more.
- Minimum idle time has the format: #.#U
                                    U = H for hours, M for minutes, S for seconds (default)
                                    hh:mm:ss
- EXAMPLE:
        $ query idle 60 master4
        $ query idle 60S master4
- OUTPUT:
        NODE,IDLE_TIME
        master4,03:13:59.024666
        1 rows were found (0.016 seconds)
  • STATUS:
    • SYNTAX:
        query node status <nodelist>
- USE: Returns the status logged in the database for the nodes in <nodelist>.
- EXAMPLE:
        $ query node status master4
- OUTPUT:
    There is no output for now as this query returns the field "status" on the "node" table and this field is not being populated yet.
  • EVENT
    • DATA:
      • SYNTAX:
            query event data [start-date [end-date]] <nodelist>
    - USE: Returns events from database.
    - Date has the format: YYYY-MM-DD hh:mm:ss
                           YYYY-MM-DD (00:00:00 will be added)
                           hh:mm:ss (current day will be added)
    - EXAMPLE:
           $ query event data 2016-02-16 08:22:00 2016-02-16 08:22:14 master4
     - OUTPUT:
EVENT_ID,TIME,SEVERITY,TYPE,HOSTNAME,MESSAGE
66846,2016-02-16 08:22:04,CRITICAL,EXCEPTION,master4,core 0 value 44.00 degrees C,greater than threshold 25.00 degrees C
66847,2016-02-16 08:22:04,CRITICAL,EXCEPTION,master4,core 1 value 42.00 degrees C,greater than threshold 25.00 degrees C
66865,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 2 value 44.00 degrees C,greater than threshold 25.00 degrees C
66866,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 3 value 41.00 degrees C,greater than threshold 25.00 degrees C
66881,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 4 value 45.00 degrees C,greater than threshold 25.00 degrees C
66882,2016-02-16 08:22:09,CRITICAL,EXCEPTION,master4,core 5 value 46.00 degrees C,greater than threshold 25.00 degrees C
6 rows were found (0.396 seconds)
    • SENSOR-DATA:
      • SYNTAX:
            query event sensor-data <event-id> [interval] <sensor-list> [nodelist]
    - USE: Returns the sensor data around an event.
    - Interval has the format: [before/after] #.#U
                               U = H for hours, M for minutes (default), S for seconds
                               [before/after] hh:mm:ss
    - EXAMPLE:
            $ query event sensor-data 1 after 10S coretemp* master4
    - OUTPUT:
TIME,HOSTNAME,DATA_ITEM,VALUE,UNITS
2016-02-09 13:53:52,master4,coretemp_core 0,23,degrees C
2016-02-09 13:53:52,master4,coretemp_core 1,23,degrees C
2016-02-09 13:53:52,master4,coretemp_core 2,23,degrees C
3 rows were found (0.396 seconds)
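
The date and duration formats accepted by the query subcommands can be made concrete with a small Python sketch. It is not part of octl and only mirrors the formats listed above: a date is normalized to YYYY-MM-DD hh:mm:ss, and a duration is converted to seconds. The default unit is a parameter because it differs between query idle (seconds) and query event sensor-data (minutes). Accepting only upper-case unit letters is an assumption based on the examples, and the function names are hypothetical.

```python
from datetime import date, datetime

_UNIT_SECONDS = {"H": 3600.0, "M": 60.0, "S": 1.0}


def normalize_date(text, today=None):
    """Fill in the parts the tool adds: 00:00:00 or the current day."""
    today = today or date.today()
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%H:%M:%S"):
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        if fmt == "%H:%M:%S":
            # Only a time was given, so the current day is added.
            parsed = datetime.combine(today, parsed.time())
        return parsed.strftime("%Y-%m-%d %H:%M:%S")
    raise ValueError(f"unrecognized date: {text!r}")


def duration_seconds(text, default_unit="S"):
    """Convert '#.#U' or 'hh:mm:ss' into a number of seconds."""
    if not text:
        raise ValueError("empty duration")
    if ":" in text:
        hours, minutes, seconds = (float(part) for part in text.split(":"))
        return hours * 3600 + minutes * 60 + seconds
    unit = default_unit
    if text[-1] in _UNIT_SECONDS:
        unit, text = text[-1], text[:-1]
    return float(text) * _UNIT_SECONDS[unit]


print(normalize_date("2015-11-13"))
print(duration_seconds("60"), duration_seconds("10", default_unit="M"))
```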

Please note that, due to the amount of information that could result from the history or sensor subcommands, this tool currently limits those subcommands to returning only 100 rows. Users who need to inspect more data or require more SQL-oriented functionality are advised to access the DB directly by other means.
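
When the query output is post-processed by a script, two details from the examples above matter: the last line is a row-count footer rather than data, and the last column (for example MESSAGE) may itself contain commas. The Python sketch below handles both. It is an illustration based only on the sample outputs shown in this section; the function names and the ROW_LIMIT constant are hypothetical.

```python
import re

ROW_LIMIT = 100  # cap described above for the history and sensor subcommands
_FOOTER = re.compile(r"^(\d+) rows were found")


def parse_query_output(text):
    """Return (rows, reported_count) from the text printed by a query."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return [], None
    header = lines[0].split(",")
    rows = []
    reported = None
    for line in lines[1:]:
        footer = _FOOTER.match(line)
        if footer:
            reported = int(footer.group(1))
            break
        # Split at most len(header) - 1 times so that commas inside the
        # last field (such as MESSAGE) are preserved.
        values = line.split(",", len(header) - 1)
        rows.append(dict(zip(header, values)))
    return rows, reported


def may_be_truncated(rows):
    """True when the row cap may have been reached."""
    return len(rows) >= ROW_LIMIT
```

If may_be_truncated returns True for a history or sensor query, narrowing the date range or node list is one way to stay under the cap.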

3.1.1.11 Notifier

The 'notifier' commands are used for configuring and querying the policies for error and exception notification during runtime.

  • The following command is used for setting up severity levels and their corresponding actions:

  • NOTIFIER:

    • SYNTAX:
        notifier set policy <severity> <action> <nodelist>
- USE: Returns Success when the operation is successful, or Failed with an error message otherwise.
- EXAMPLE:
$ octl notifier set policy emerg  smtp   rack01
$ octl notifier set policy alert  smtp   rack01
$ octl notifier set policy crit   smtp   rack01
$ octl notifier set policy error  syslog rack01
$ octl notifier set policy warn   syslog rack01
$ octl notifier set policy notice syslog rack01
$ octl notifier set policy debug  syslog rack01
- OUTPUT:
        Notifier set policy on Node:rack01

        Success
  • The following command is used for retrieving severity levels and their corresponding actions:

  • NOTIFIER:

    • SYNTAX:
        notifier get policy <nodelist>
- USE: Returns Success when the operation is successful, or Failed with an error message otherwise.
- EXAMPLE:
        $ octl notifier get policy rack01
- OUTPUT:
Node          Severity      Action
-----------------------------------
rack01        EMERG         smtp
rack01        ALERT         syslog
rack01        CRIT          syslog
rack01        ERROR         syslog
rack01        WARN          syslog
rack01        NOTICE        syslog
rack01        INFO          syslog
rack01        DEBUG         syslog
  • The following command is used for changing the existing smtp configuration:

  • NOTIFIER:

    • SYNTAX:
        notifier set smtp-policy <key> <value> <nodelist>
- USE: Returns Success when the operation is successful, or Failed with an error message otherwise.
- EXAMPLE:
$ octl notifier set smtp-policy server_name <email-server>  rack01
$ octl notifier set smtp-policy server_port <portno>  rack01
$ octl notifier set smtp-policy to_addr <email-addr>  rack01
$ octl notifier set smtp-policy from_addr <email-addr>  rack01
$ octl notifier set smtp-policy from_name <name>  rack01
$ octl notifier set smtp-policy subject <email-subject>  rack01
$ octl notifier set smtp-policy body_prefix <email-body-prefix>  rack01
$ octl notifier set smtp-policy body_suffix <email-body-suffix>  rack01
$ octl notifier set smtp-policy priority <1-high,2-normal,3-low>  rack01
- OUTPUT:
        Notifier set smtp-policy on Node:rack01

        Success
  • The following command is used for retrieving smtp configuration information from a given nodelist:

  • NOTIFIER:

    • SYNTAX:
        notifier get smtp-policy <nodelist>
- USE: Returns Success when the operation is successful, or Failed with an error message otherwise.
- EXAMPLE:
        $ octl notifier get smtp-policy rack01
- OUTPUT:
NODE            SMTP_KEY          SMTP_VALUE
-----------------------------------------------
rack01          server_name   emailserver.com
rack01          server_port   25
rack01          to_addr       <email-addr>
rack01          from_addr     <email-addr>
rack01          from_name     RAS Monitoring System
rack01          subject       EVENT-NOTIFICATION
rack01          body_prefix   NOTIFICATION MESSAGE BEGIN
rack01          body_suffix   NOTIFICATION MESSAGE END
rack01          priority      1

3.1.1.12 Chassis-id

The chassis-id command is used for enabling or disabling the chassis ID LED on a given node. Every chassis-id action is stored as an event in the database. The command syntax is:

octl> chassis-id {state|enable [seconds]|disable} <node_list>

Where node_list could be

  • Sensys regex
  • logical group
  • a node list separated by a comma

The state sub-command retrieves the current state of the chassis ID LED on the requested nodes.

Usage:
    octl> chassis-id state myNode

Output:

    Node          Chassis ID LED
    -----------------------------------
    myNode               OFF

The enable sub-command turns ON the chassis ID LED on the requested nodes according to the following rules:

  • The chassis ID LED will be turned ON for the specified number of seconds.
  • If no seconds are specified, the chassis ID LED will be turned ON indefinitely.
  • The maximum value for seconds is 255.

Usage:
    octl> chassis-id enable [seconds] myNode

Output:
    myNode: Success!
Or:
    myNode: Failed
    Failure information
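
The rules for the seconds argument can be checked before the command is issued. The Python sketch below builds the tokens of a chassis-id enable command and rejects values above the documented maximum of 255. Treating values below 1 as invalid is an assumption, since no minimum is stated on this page; the function name is hypothetical.

```python
MAX_SECONDS = 255  # maximum value stated in the rules above


def chassis_id_enable_tokens(node_list, seconds=None):
    """Build the tokens for 'chassis-id enable [seconds] <node_list>'."""
    tokens = ["chassis-id", "enable"]
    if seconds is not None:
        # Assumption: the lower bound of 1 is not documented on this page.
        if not 1 <= seconds <= MAX_SECONDS:
            raise ValueError(f"seconds must be between 1 and {MAX_SECONDS}")
        tokens.append(str(seconds))
    # Without seconds the LED stays ON indefinitely.
    tokens.append(node_list)
    return tokens


print(" ".join(chassis_id_enable_tokens("myNode", 30)))
print(" ".join(chassis_id_enable_tokens("myNode")))
```

The resulting text can be typed at the octl> prompt.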

The disable sub-command turns OFF the chassis ID LED on the requested nodes.

Usage:
    octl> chassis-id disable myNode

Output:
    myNode: Success!
Or:
    myNode: Failed
    Failure information

NOTE: This command is hardware-dependent and it might not be supported on all platforms.

NOTE: IPMI support is required for this feature.

NOTE: The ipmi and nodepower sensors may interfere with the chassis-id command due to limitations in the BMC and/or the ipmiutil driver.

NOTE: Both enable and disable subcommands perform write operations. It is important to configure BMC access with the proper privilege level, otherwise the chassis-id feature might not be able to change the LED state.
