Question
========
What data should I collect to give the best chance of successful problem source identification for filesystem or data corruption on a JFS or JFS2 filesystem?
Answer
======
The following is a comprehensive list of data that may be required to determine the source of filesystem or data corruption within a JFS or JFS2 filesystem. Collect it PRIOR to taking any corrective action to repair the problem filesystem.
1) Collect a snap
First and foremost, a full snap should be collected from the server before any corrective action is taken…
Remove any existing snap data…
# snap -r
Gather a full snap. Do not compress it yet. We’ll compress and upload it in the last step, after adding additional data to the snap subdirectory…
# snap -a
Change to the empty “testcase” directory in the snap, which will hold the output from the commands below…
# cd /tmp/ibmsupt/testcase
2) Start a script session
To log subsequent commands, along with their stdout and stderr, to a file within the snap subdirectory, execute the following command…
# script myscript.out
3) Collect dumpfs
Gather dumpfs output for the filesystem while it is in its corrupted state (before any corrective action is taken)…
# /usr/sbin/dumpfs /fsname > dumpfs.fsname
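When more than one filesystem is affected, it can help to derive the output filename from the mount point so the files in the testcase directory stay distinguishable. The following is a minimal sketch only: the fs_tag helper and the example mount point /data/app01 are hypothetical and not part of AIX, and the dumpfs call is guarded because the command exists only on AIX…

```shell
# Hypothetical helper: turn a mount point into a filename-safe tag,
# e.g. /data/app01 -> data_app01, and / -> root.
fs_tag() {
    tag=$(printf '%s' "$1" | sed -e 's|^/*||' -e 's|/*$||' -e 's|/|_|g')
    [ -n "$tag" ] || tag=root
    printf '%s\n' "$tag"
}

# Example mount point (an assumption); substitute the affected filesystem.
FS=/data/app01
OUT="dumpfs.$(fs_tag "$FS")"

# Run dumpfs only where it is installed at the path used in this document.
if [ -x /usr/sbin/dumpfs ]; then
    /usr/sbin/dumpfs "$FS" > "$OUT"
else
    echo "dumpfs not found; would have written $OUT"
fi
```

Run it from /tmp/ibmsupt/testcase so the output file is picked up when the snap is compressed in the last step.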
4) Collect filesystem metadata (JFS2 only)
There are a number of possible reasons for filesystem metadata corruption, as noted in the following technote…
AIX Filesystem Metacorruption
http://www.ibm.com/support/docview.wss?uid=isg3T1010896
If filesystem metadata corruption occurs and problem source identification (PSI) is requested, collection and analysis of the filesystem’s metadata in its corrupted state (i.e., prior to any corrective action) can be crucial. The following technote describes how to collect a filesystem’s metadata for analysis…
Gathering JFS2 Metacapture For Problem Diagnosis
http://www.ibm.com/support/entdocview.wss?uid=isg3T1010897
*** Save all output into the /tmp/ibmsupt/testcase subdirectory using descriptive filenames.
*** The metacapture flag is only valid on JFS2 filesystems at AIX 5.2 TL8 and above, AIX 5.3 and AIX 6.1
5) Collect Fileplace information
If data corruption is suspected (as opposed to filesystem metadata corruption), the fileplace command can gather information that may help determine the cause of the data corruption. The following technote describes the procedure…
Fileplace and ‘dd’ data collection procedures for data corruption analysis
http://www.ibm.com/support/docview.wss?uid=isg3T1011158
*** Save all output into the /tmp/ibmsupt/testcase subdirectory using descriptive filenames.
6) Repair the filesystem with fsck
*** If you ran fsck -yvv during step 4, please skip this step.
After collecting the above information, if the filesystem is needed and cannot be left in its current state, the fsck command can be used to attempt to repair the metadata corruption. For additional verbosity, use the double “v” argument…
# fsck -yvv /dev/
7) Collect fscklog output on JFS2 filesystems
The undocumented ‘fscklog’ command may or may not be able to retrieve the most recent fsck output, or the one before it (“-p” is the prior-log flag), for JFS2 filesystems. If this data is being collected before any corrective action, the fsck output from Step 6 should be sufficient. However, if fsck was run before this data was collected, fscklog may show the output of that earlier run, so it is worth collecting.
To check most recent fsck log output for /fsname:
# /sbin/helpers/jfs2/fscklog /fsname > fscklog.out
To check previous fsck log output for /fsname:
# /sbin/helpers/jfs2/fscklog -p /fsname > fscklogprev.out
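The two fscklog invocations above can be wrapped so that a missing helper or a failed run is reported instead of silently leaving an empty file. This is a sketch under stated assumptions: collect_fscklog is a hypothetical wrapper, /fsname is the same placeholder mount point used above, and because fscklog is undocumented its exit status is not assumed to mean anything beyond "the command did not complete cleanly"…

```shell
FSCKLOG=/sbin/helpers/jfs2/fscklog   # helper path as given in this document
FS=/fsname                           # placeholder mount point

# Hypothetical wrapper: $1 = mount point, $2 = output file,
# any remaining arguments are passed to fscklog as flags.
collect_fscklog() {
    fs=$1; out=$2; shift 2
    if [ -x "$FSCKLOG" ]; then
        "$FSCKLOG" "$@" "$fs" > "$out" || echo "fscklog $* $fs failed; check $out"
    else
        echo "fscklog helper not found; skipping $out"
    fi
}

collect_fscklog "$FS" fscklog.out          # most recent fsck log
collect_fscklog "$FS" fscklogprev.out -p   # log from the fsck run before that
```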
8) Upload the testcase for analysis
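Exit the script session, then compress the snap, rename it, and upload it for review. The compressed snap.pax.Z file is written to /tmp/ibmsupt, so change to that directory before renaming…
# exit
# snap -c
# cd /tmp/ibmsupt
# mv snap.pax.Z pmr#.branch#.000.snap.pax.Z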
# ftp testcase.software.ibm.com
Log in as user ‘anonymous’, using your email address as the password.
ftp> cd /toibm/aix
ftp> bin
ftp> put pmr#.branch#.000.snap.pax.Z
ftp> bye
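The rename step can be scripted so the PMR and branch numbers are typed only once. The following is a rough sketch: snap_upload_name is a hypothetical helper, the numbers 12345 and 678 are example values only, and /tmp/ibmsupt is assumed to be the directory where snap -c left snap.pax.Z (the default snap output directory)…

```shell
# Hypothetical helper: build the upload filename from the PMR and branch
# numbers, following the pmr#.branch#.000.snap.pax.Z pattern above.
snap_upload_name() {
    printf '%s.%s.000.snap.pax.Z\n' "$1" "$2"
}

PMR=12345     # example value only; use your own PMR number
BRANCH=678    # example value only; use your own branch number
NAME=$(snap_upload_name "$PMR" "$BRANCH")

# Rename only if the compressed snap is where we assume it is.
if [ -f /tmp/ibmsupt/snap.pax.Z ]; then
    mv /tmp/ibmsupt/snap.pax.Z "/tmp/ibmsupt/$NAME"
fi
echo "$NAME"
```

With the example values this prints 12345.678.000.snap.pax.Z, which is the name to use with the ftp put command.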