Skip to content

2.1.10 Troubleshooting

lrrajesh edited this page Sep 14, 2016 · 1 revision

####Problems with Yum:

Note: Yum installation requires privileged access.

Do you have an accessible yum repository? Check in /etc/yum.repos.d/rhel-source.repo: If the RedHat repos are unavailable, you can try to add an alternate, e.g.:

[centos]
name=Centos
baseurl=http://vault.centos.org/6.4/os/x86_64/
enabled=1
gpgcheck=0

####Problems running orcmd: If you get an error such as the following after completing the build process:

shell$ orcmd
[chris-linux-01:11937] [[0,0],INVALID] ORTE_ERROR_LOG: Not found in file
../../../../orcm/mca/cfgi/base/cfgi_base_fns.c at line 764

Check that you have the correct orcm-site.xml in <install_folder>/etc. The 'make install' process overwrites this file with a default version.

####Problems with orcmd startup: Add verbosity switches

--omca mca_base_verbose 100
--omca oob_base_verbose 100
--omca db_base_verbose 100
--omca sensor_base_verbose 100

####Duplicate key error when storing sensor data to the database

If Sensys is unable to store sensor data because an error message is returned from the database explaining that a duplicate key violates a unique constraint, this most likely means there is an issue with that data that is being collected, possibly due to an incorrect configuration on one or more nodes. For example, check and make sure that all the nodes on the system have unique hostnames. If multiple nodes have the same hostname, when trying to store sensor data from these nodes, it will appear that same node is trying to store multiple values for the same metric at the same time (thus generating a unique key violation in the database).

####Unusual errors when trying to store data to the database

If Sensys is able to connect to the database, but is unable to store some or any data to the database and returns unusual database errors (e.g. errors that indicate that a certain table or procedure does not exist), this most likely means that there is a version mismatch between Sensys and the schema. Please make sure that you're using the schema that corresponds to a particular Sensys release (included with each release).

####Problems with orcmd connectivity in large cluster environments

When running Sensys in large cluster with more than 1024 compute nodes, the expected configuration is to setup one dedicated node to act as a multiple aggregator node, for instructions on how to set orcm-site.xml please check section "Running multiple aggregators on one node" within 3.4 Sensys CFGI User Guide

By design, running all aggregators in a single compute node would mean that incoming data from a large number of compute nodes will reach the same physical aggregator(s) node, for that matter we recommend to increase some system configuration boundaries which are usually set by default in the operating system.

i.e. for a 2048 node system the recommended settings are:

sysctl net.core.somaxconn=1024
sysctl net.core.netdev_max_backlog=2000
sysctl net.ipv4.tcp_max_syn_backlog=2048
sysctl net.ipv4.neigh.default.gc_thresh3=4096

In case your network is using ipv6 you should set:

sysctl net.ipv6.neigh.default.gc_thresh3=4096
Clone this wiki locally