IBM Link: padmin login fails with “INIT: failed write of utmp entry”

Problem(Abstract)

The PowerVM Virtual I/O Server (VIOS) prime administrator (padmin) user fails to log in remotely.

Symptom

Attempts to log in to the VIOS over the network using ssh or telnet fail with the following error:
INIT: failed write of utmp entry: "          cons"

 

Cause

The message is commonly caused by an I/O problem with one or more rootvg file systems.

Environment

VIOS 2.2 or higher managed by an HMC

Diagnosing the problem

Common causes may include (but are not limited to) any of the following:

1. Permissions issue for /etc/utmp or /var/adm/wtmp.
2. Full rootvg file systems.
3. Hardware failure related to VIOS rootvg, such as disk or adapter failure.
4. rootvg file systems are in read-only mode.

Determine whether the VIOS has console access by logging in to the HMC via the GUI or the Command Line Interface (CLI) as the hscroot user. (This requires the HMC to have Secure Shell access enabled.)
Using HMC GUI
In the navigation pane, open Systems Management > click Servers > click the name of the managed system where the VIOS in question resides.
In the work pane, select the VIOS partition > click Tasks > Console Window > Open Terminal Window.
Using HMC CLI
At the HMC command line, type vtmenu > enter the number of the managed system > enter the number of the VIOS partition in question.
If the console window is non-responsive, you may see something similar to the following:

Opening Virtual Terminal On Partition t720vio1 . . .
Open in progress
Open Completed.
IBM Virtual I/O Server
login: padmin
/dev/vty0: You must "exec" login from the lowest login shell.
INIT: failed write of utmp entry: "          cons"
INIT: failed write of utmp entry: "          cons"
INIT: failed write of utmp entry: "          cons"
INIT: Command is respawning too rapidly. Check for possible errors.

If console access fails with the above errors, check whether VIOS commands can be run from the HMC CLI using the viosvrcmd command. There are two ways to do this:

Method #1 – Run padmin command

hscroot@hmchost:~> command=`printf "ioslevel"`; viosvrcmd -m VIRT-9117-MMB-SN10F6B1R -p p7virtvios1 -c "$command"
2.2.4.10 < this is the command output
where ioslevel is the padmin command, VIRT-9117-MMB-SN10F6B1R is your managed system name, and p7virtvios1 is the name of the VIOS partition in question.

Method #2 – Run command via oem_setup_env/AIX root shell

hscroot@hmchost:~> command=`printf "oem_setup_env\nls -l /etc/utmp"`; viosvrcmd -m VIRT-9117-MMB-SN10F6B1R -p p7virtvios1 -c "$command"

-rw-r--r--    1 root     system        35640 Jun 20 15:55 /etc/utmp
where ls -l /etc/utmp is the command
hscroot@hmchost:~> command=`printf "oem_setup_env\nls -l /var/adm/wtmp"`; viosvrcmd -m VIRT-9117-MMB-SN10F6B1R -p p7virtvios1 -c "$command"
-rw-rw-r--    1 adm      adm         3906792 Jun 20 15:55 /var/adm/wtmp
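The two methods above can be wrapped in small helper functions so that the managed system and partition names are typed only once. This is a minimal sketch, not an IBM-supplied tool: the function names vios_padmin and vios_root and the variables MANAGED_SYSTEM and VIOS_LPAR are hypothetical, and it assumes a shell session in which viosvrcmd is available and function definitions are permitted.

```shell
# Placeholder names taken from the examples above; substitute your own.
MANAGED_SYSTEM="${MANAGED_SYSTEM:-VIRT-9117-MMB-SN10F6B1R}"
VIOS_LPAR="${VIOS_LPAR:-p7virtvios1}"

# Method #1: run a single command in the padmin shell.
vios_padmin() {
    viosvrcmd -m "$MANAGED_SYSTEM" -p "$VIOS_LPAR" -c "$1"
}

# Method #2: run a command in the AIX root shell by sending
# oem_setup_env first, followed by the command on a new line.
vios_root() {
    viosvrcmd -m "$MANAGED_SYSTEM" -p "$VIOS_LPAR" \
        -c "$(printf 'oem_setup_env\n%s' "$1")"
}

# Example calls (commented out so that defining the helpers runs nothing):
#   vios_padmin ioslevel
#   vios_root 'ls -l /etc/utmp'
```

Passing the command as a single quoted argument keeps embedded spaces intact, which mirrors the quoted "$command" in the original invocations.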

If the VIOS is reachable through the viosvrcmd command, check the probable causes below. If the VIOS is still not reachable, it may be in a “hung” condition. To troubleshoot a hung VIOS, contact your local IBM SupportLine Representative for OS Dumps support.

Resolving the problem

Probable Cause #1 – Permissions issue for /etc/utmp or /var/adm/wtmp

    viosvrcmd method #2 shows the expected permissions for /etc/utmp and /var/adm/wtmp. If your output matches the above, your permissions are correct. If your permissions are different, change them to match the above output, then try to log in again.
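To make the comparison less error-prone, a captured ls -l line can be checked mechanically against the mode, owner, and group shown in the Method #2 output. The following is a sketch under stated assumptions: check_ls_line is a hypothetical helper name, it needs a workstation shell with awk, and the expected values are simply those from the sample output (-rw-r--r-- root system for /etc/utmp, -rw-rw-r-- adm adm for /var/adm/wtmp).

```shell
# check_ls_line LINE MODE OWNER GROUP
# Succeeds (exit status 0) only when the first line of the ls -l output
# has the expected mode string (field 1), owner (field 3), and group (field 4).
check_ls_line() {
    printf '%s\n' "$1" | awk -v m="$2" -v o="$3" -v g="$4" '
        NR == 1 { ok = ($1 == m && $3 == o && $4 == g) }
        END     { exit !ok }
    '
}

# Example with the sample line from the Method #2 output:
#   line='-rw-r--r--    1 root     system        35640 Jun 20 15:55 /etc/utmp'
#   check_ls_line "$line" '-rw-r--r--' root system && echo "utmp permissions OK"
```

The mode strings correspond to octal 644 for /etc/utmp and 664 for /var/adm/wtmp, which is what chmod would need if a correction is required from the root shell.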

Probable Cause #2 – Full rootvg file system

      Check for full file systems through viosvrcmd by substituting df -g for the command. If any file systems are full or nearly full, address the issue before re-attempting the login. For more details, see Diagnosing Full File Systems in PowerVM VIOS.
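When the df -g output is long, a small filter can pick out the file systems at or above a chosen usage level. This sketch assumes the usual AIX df -g column layout (Filesystem, GB blocks, Free, %Used, Iused, %Iused, Mounted on) and a workstation shell with awk; full_filesystems is a hypothetical name, and the 90 percent default is an arbitrary illustration, not an IBM recommendation.

```shell
# full_filesystems [LIMIT]
# Reads df -g output on standard input and prints "mount-point %Used"
# for every file system whose %Used column is at or above LIMIT (default 90).
full_filesystems() {
    awk -v limit="${1:-90}" '
        # Only consider rows whose 4th field looks like a percentage;
        # this skips the header and pseudo file systems that show "-".
        $4 ~ /^[0-9]+%$/ {
            used = $4
            sub(/%/, "", used)
            if (used + 0 >= limit + 0) print $NF, $4
        }
    '
}

# Example: pipe df -g output captured through viosvrcmd into the filter.
#   ... df -g output ... | full_filesystems 95
```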

Probable Cause #3 – Hardware failure related to VIOS rootvg

      Substitute the command in viosvrcmd with errlog (padmin) or errpt (if in the oem_setup_env/AIX root shell) to determine whether there are disk or adapter errors related to rootvg.

      If there are no hardware errors related to rootvg, check whether there are any LVM or file system errors associated with rootvg.
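To narrow a long error report down to hardware entries, the summary listing can be filtered on its class column. This sketch assumes the standard errpt summary layout (IDENTIFIER, TIMESTAMP, T, C, RESOURCE_NAME, DESCRIPTION), in which class H marks hardware errors, and a workstation shell with awk; hardware_errors is a hypothetical helper name.

```shell
# hardware_errors
# Reads an errpt summary listing on standard input and prints only the
# entries whose class column (4th field) is H, i.e. hardware errors.
hardware_errors() {
    awk '$4 == "H"'
}

# Example: pipe the errpt output captured through viosvrcmd into the filter,
# then look for resource names that belong to rootvg disks or their adapters.
#   ... errpt output ... | hardware_errors
```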

Probable Cause #4 – rootvg file systems are in read-only mode

      This can happen if the system loses its path to the rootvg disks. To test this, check for ioscli.log in /home/padmin and try making a copy of the file (cp ioscli.log test123.out) through viosvrcmd to see whether it completes successfully or fails with an error such as:

HSCL2970 The IOServer command has failed because of the following reason:
Unable to open file: /home/ios/logs/ioscli_global.trace for append
Error from cliCheckFile:-1
Unable to open file: /home/ios/logs/ioscli_global.trace for append
Error from cliCheckFile:-1
...
Unable to open file: /home/ios/logs/ioscli_global.trace for append
Error from cliCheckFile:-1
cp: test123.out: Read-only file system     <-------
rc=1
...

Note: If VIOS boots from SAN disk, contact your SAN administrator to check if the SAN/boot disk may have been put in read-only mode. This is known to happen when disk copy utilities are used by the storage.

If no hardware errors were found in the error log, but a “read-only file system” message is encountered, run the mount command using viosvrcmd method #2. The mount output shows which file systems are mounted read-only. Possible output may include:

 node    mounted    mounted over    vfs       date        options
------ ----------- --------------  ------ ------------ ---------------
       /dev/hd4    /                jfs2  Mar 15 09:47 rw,log=/dev/hd8
       /dev/hd2    /usr             jfs2  Mar 15 09:47 rw,log=/dev/hd8
...

OR

 node    mounted    mounted over    vfs       date        options
------ ----------- --------------  ------ ------------ ---------------
       /dev/hd4    /                jfs2  Mar 03 01:25 ro,degraded
       /dev/hd2    /usr             jfs2  Mar 03 09:47 ro,degraded
...
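The read-only mounts in output like the above can be listed with a short filter. This sketch assumes local mounts, where the node column is empty and the device is therefore the first whitespace-separated field, and a workstation shell with awk; readonly_filesystems is a hypothetical helper name.

```shell
# readonly_filesystems
# Reads AIX mount output on standard input and prints the mount point of
# every local file system whose options field contains the "ro" option.
readonly_filesystems() {
    awk '
        # Data lines for local mounts start with a device path; the header
        # and separator lines do not. The options are in the last field.
        $1 ~ /^\// && $NF ~ /(^|,)ro(,|$)/ { print $2 }
    '
}

# Example: pipe the mount output captured through viosvrcmd into the filter.
#   ... mount output ... | readonly_filesystems
```

Matching ro as a whole comma-separated option avoids false hits on other option text, such as a log device name that happens to contain those letters.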

        If all file systems are in read-only mode, the file systems have generally lost access to the underlying disks and/or writes to the disks have failed (basically I/O errors). In that situation JFS2 deliberately puts the file system in read-only mode so that no further writes reach the disks, which avoids file system corruption due to I/O errors.
        In that case, boot the VIOS in maintenance mode, make sure the disk is reachable, and run a thorough fsck to verify file system integrity and bring the file systems to a clean state. If the system is in such a state, it is very likely that the boot disk has issues, but without access to the error log and snap data it is difficult to confirm categorically.
        Since the only option at that point is to reboot the VIOS to maintenance mode and run fsck on the file systems, a system dump may be worth taking while bringing down the system for further investigation.