IBM Tivoli Monitoring Unix OS agent fails to start on some AIX servers


Tivoli Monitoring 6.2.3 FP1 Unix Agent fails to start on some AIX servers.
The aixdp_daemon fails to initialize the AIX Perfagent tools SPMI library. The error messages are as follows:
SpmiInit error: [no msg] attempt:1
SpmiInit error: [no msg] attempt:2
SpmiInit error: [no msg] attempt:3
SpmiInit error: [no msg] attempt:4
SpmiInit error: [no msg] attempt:5
SpmiInit error:
(5061A9EE.0001-1:aixdp_daemon.cpp,200,"main") FATAL error,
Initialization failure



This might be caused by shared memory corruption in the AIX SPMI library.

Diagnosing the problem

Check the Unix OS agent logs in the <ITM install dir>/logs directory for the SpmiInit error messages shown above.
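
The log check can be sketched as a small shell helper. This is only an illustration: the function name find_spmi_errors is hypothetical, and the directory you pass must be the logs directory of your own installation.

```shell
# Hypothetical helper: list the files in a log directory that contain
# SpmiInit failures.
# Usage: find_spmi_errors <log directory>
find_spmi_errors() {
    logdir=$1
    # -l prints only the names of files with at least one match;
    # messages about subdirectories or unreadable files are discarded.
    grep -l "SpmiInit error" "$logdir"/* 2>/dev/null
}
```

For example, run find_spmi_errors "<ITM install dir>/logs" with the real installation path substituted. An empty result means that no file in that directory contains the message.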

Resolving the problem

When shared memory in the underlying operating system is corrupted, the portal might show no data or incorrect data. The Unix OS agent calls the SPMI library to retrieve data for the AIX Premium attributes, so shared memory corruption affects data collection. To correct the problem, clear the shared memory and restart the Unix OS agent so that correct data starts flowing again. The steps are as follows:
a) Stop the Unix OS agent

b) List all processes that are using libSpmi.a:
# genld -l | grep -p Spmi | grep Proc

c) Kill those processes:
# genld -l | grep -p Spmi | grep Proc | awk '{print $2}' | xargs kill

d) Repeat step (b). If any processes are still listed, kill each one individually with kill -9 <pid of the process>

e) Clean up the stale IPC shared memory segments:
Run the command:
# ipcs -a | grep 0x78 | awk '{print $2}'

If the command lists any IDs, remove them one at a time by running:
# ipcrm -m <id returned by the previous command>
Rerun "ipcs -a | grep 0x78 | awk '{print $2}'" to confirm that no segments remain.

f) Then run:

g) Remove any *Spmi* files from the /tmp directory.

h) Restart the Unix OS agent and ensure that data is being retrieved.
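
The filter in step (e) matches 0x78 anywhere on a line, so it can also pick up message queues or semaphores. A slightly stricter variant is sketched below. It assumes the AIX ipcs -a layout, with the resource type in column 1, the ID in column 2, and the key in column 3. The function names are hypothetical, and the second function only prints the ipcrm commands so that they can be reviewed before anything is removed.

```shell
# Read `ipcs -a` output on stdin and print the IDs of shared memory
# segments (type "m") whose key starts with 0x78.
# Assumes the column order: type, ID, key.
spmi_shm_ids() {
    awk '$1 == "m" && $3 ~ /^0x78/ { print $2 }'
}

# Dry run: print one ipcrm command per matching ID instead of running it.
print_ipcrm_commands() {
    spmi_shm_ids | while read -r id; do
        echo "ipcrm -m $id"
    done
}
```

Typical use is ipcs -a | print_ipcrm_commands. After checking the list, the printed commands can be run one at a time as in step (e).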

Shared memory corruption could occur due to a library version mismatch. It could also occur due to memory overflows and memory overwrites. On AIX 6.1, the APARs IZ56426 and IZ64808 document the problem. On AIX 5.3, the corresponding APAR number is IZ56425. If the fixes for the APARs are applied, problems due to shared memory corruption should be minimal.
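
One way to check whether these APARs are present is the AIX instfix command. The sketch below assumes that instfix -ik <APAR> returns a zero exit status when all filesets for the APAR are installed. The function name check_apars is hypothetical. An APAR that was superseded by a later maintenance level might not be reported, so treat a negative result as a prompt to check further, not as proof.

```shell
# Report, for each APAR given as an argument, whether instfix finds it.
check_apars() {
    for apar in "$@"; do
        # Assumption: exit status 0 means all filesets for the APAR
        # are installed.
        if instfix -ik "$apar" >/dev/null 2>&1; then
            echo "$apar: installed"
        else
            echo "$apar: not reported as installed"
        fi
    done
}
```

On AIX 6.1 this would be called as check_apars IZ56426 IZ64808, and on AIX 5.3 as check_apars IZ56425.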

Note: In addition to problems due to shared memory corruption, some other underlying AIX problems related to the fileset can cause Unix OS agent data collection to fail. Refer to the following link for details:

The versions mentioned in the above article should also fix problems related to shared memory corruption.