Question
========
I’m having a problem with cron or at jobs. How can I find out what’s wrong?
Answer
======
Diagnosing Cron and At Issues
The following steps list information to gather and places to look when diagnosing a cron or at job issue.
1. Is only a single user having a problem, or is this common across all users?
2. Is there a specific entry in the user’s crontab file that is failing?
3. Did this work before? If so, what has changed?
4. Does this happen only at certain times of the day or year (for example, at a DST change)?
5. Check that cron has the correct permissions and ownership. Note that cron is a SETUID root binary:
# ls -l /usr/sbin/cron
-r-s--S--- 1 root cron 77152 Jul 03 12:10 /usr/sbin/cron
6. Check that cron has a “respawn” entry in the /etc/inittab file:
cron:23456789:respawn:/usr/sbin/cron
7. Check to see that cron is still running using ps -ef.
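Steps 6 and 7 can be scripted. The following is a minimal sketch, not an official tool: the function names has_cron_respawn and cron_is_running are hypothetical, and the inittab pattern assumes the entry looks like the one shown in step 6.

```shell
# Succeeds if the given inittab file (default /etc/inittab) contains an
# active "respawn" entry for /usr/sbin/cron. The match is anchored at the
# start of the line, so an entry prefixed with another character does not count.
has_cron_respawn() {
    grep -q '^cron:[^:]*:respawn:/usr/sbin/cron' "${1:-/etc/inittab}"
}

# Reads "ps -ef" output on stdin and succeeds if a process whose last
# field is /usr/sbin/cron is listed.
cron_is_running() {
    awk '$NF == "/usr/sbin/cron" { found = 1 } END { exit !found }'
}

# Usage (on the affected system):
#   has_cron_respawn || echo "no respawn entry for cron in /etc/inittab"
#   ps -ef | cron_is_running || echo "cron is not running"
```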
8. Get a zsnap which will include cron information. If you cannot get a zsnap then at least gather these from the customer:
In the /var/adm/cron directory
*.allow files (if present)
*.deny files (if present)
log (cron log)
queuedefs
/etc/cronlog.conf
/var/spool/cron/crontabs/user (specific user’s crontab)
/var/spool/mail/user (specific user’s mailbox)
9. Check the user’s mailbox.
Any cron job that writes to STDOUT or STDERR causes cron to mail that output to the user who owns the job. For example:
Date: Mon, 4 Oct 2010 13:15:05 -0700
From: root
To: you
Subject: Output from cron job date, user@hostname, exit status 0
Cron Environment:
SHELL = /usr/bin/sh
PATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java5/jre/bin:/usr/java5/bin
CRONDIR=/var/spool/cron/crontabs
ATDIR=/var/spool/cron/atjobs
LOGNAME=user
HOME=/home/user
Your “cron” job executed on hostname on Mon Oct 4 13:15:00 PDT 2010
date
produced the following output:
Mon Oct 4 13:15:04 PDT 2010
*****************************************************************
cron: The previous message is the standard output
and standard error of one of the cron commands.
A few things to note here:
1. The exit status of 0 means the job completed with no errors.
2. The time and date the job was executed.
3. The command that was run.
4. The environment set up for the job.
5. The actual output of the job.
If cron starts a job but the job itself fails, cron is most likely working fine, and the customer should investigate why the job does not run correctly in the environment cron sets up for it.
10. Compare this email output with the cron log file to see what it says about the job.
user : CMD ( date ) : PID ( 426004 ) : Mon Oct 4 13:15:00 2010
Cron Job with pid: 426004 Successful
This gives:
1. The user the job was run as.
2. The PID of the process that cron forked for the job.
3. The command run with arguments.
4. The date and time it was run.
5. Whether or not the job started successfully.
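When the cron log is long, it can help to list the jobs that were started but have no matching "Successful" line. The sketch below assumes the log uses exactly the two line formats shown above; the function name cron_unconfirmed_jobs is hypothetical.

```shell
# Reads a cron log (file arguments or stdin) and prints every CMD line
# whose PID never appears in a later "Cron Job with pid: N Successful" line.
cron_unconfirmed_jobs() {
    awk '
        / : CMD \( / && / : PID \( / {
            # Extract the number between "PID ( " and " )".
            pid = $0
            sub(/.* : PID \( /, "", pid)
            sub(/ \).*/, "", pid)
            line[pid] = $0
            order[++n] = pid
        }
        /^Cron Job with pid: [0-9]+ Successful/ {
            # Field 5 is the PID; the job is confirmed, so forget it.
            delete line[$5]
        }
        END {
            for (i = 1; i <= n; i++) {
                if (order[i] in line) {
                    print line[order[i]]
                    delete line[order[i]]   # avoid duplicates on PID reuse
                }
            }
        }
    ' "$@"
}

# Usage:
#   cron_unconfirmed_jobs /var/adm/cron/log
```

Any line it prints is a job worth checking against the user's mailbox, since it either failed or was still running when the log was read.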
If your customer sees the job kicked off successfully, but has STDERR and STDOUT redirected to /dev/null, have them change that crontab entry.
From:
15 13 * * * date > /dev/null 2>&1
to:
15 13 * * * date
That way STDOUT and STDERR will be emailed to them as seen above.
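To find which entries in a crontab discard their output, something like the following can be used. It is a rough sketch: the function name find_silenced_entries is hypothetical, and it only looks for a redirection to /dev/null on non-comment lines.

```shell
# Reads a crontab on stdin and prints the non-comment entries that
# redirect output to /dev/null, so they can be edited temporarily.
find_silenced_entries() {
    grep -v '^[[:space:]]*#' | grep -E '>[[:space:]]*/dev/null'
}

# Usage (as the affected user):
#   crontab -l | find_silenced_entries
```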
11. If cron is hung, try to get a stack trace of it.
On AIX 5.3 and up you can use procstack to get a quick view of the current state of the process:
# ps -ef | grep cron
root 258184 1 0 Oct 01 - 0:00 /usr/sbin/cron
# procstack 258184
258184: /usr/sbin/cron
0xd0383a34 read(??, ??, ??) + 0x1a8
0x10000c74 msg_wait() + 0xe8
0x10004240 idle(??, ??) + 0x4c
0x10004f50 main(??, ??) + 0x544
0x10000198 __start() + 0x98
12. Check to see if cron has forked off a child process that may be hung. This can easily be done with the proctree command:
# proctree -a 258184
1 /etc/init
258184 /usr/sbin/cron
Adding the “-a” option allows you to see that init started cron from the inittab, and still lists it as a child process.
NOTE: Both /usr/bin/procstack and /usr/bin/proctree are found in the bos.perf.proctools fileset.
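If proctree is not installed, the direct children of cron can still be listed from ps -ef output. This is only a sketch of an alternative, assuming the usual ps -ef column order (UID, PID, PPID, ...); the function name children_of is hypothetical.

```shell
# Reads "ps -ef" output on stdin and prints the lines whose parent PID
# (third column) equals the PID given as the first argument.
# The header line is skipped naturally because "PPID" never matches a number.
children_of() {
    awk -v parent="$1" '$3 == parent'
}

# Usage, with the cron PID from the example above:
#   ps -ef | children_of 258184
```

No output means cron currently has no child processes, which matches the proctree listing above.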
Further Steps for At Command Issues
For the “at” command, which is also run via cron, more information can be gathered.
1. Was the at job scheduled or did it fail?
If it failed to schedule, the user should have seen an error on the command line similar to:
at: 0481-098 The specified date is not in the correct format.
Remember that the /usr/bin/at command takes STDIN as the command to be run. This syntax is incorrect:
$ at now +1 minute command
But this will work:
$ echo "command" | at now + 1 minute
A correctly scheduled job returns a message similar to:
Job user.1286233998.a will be run at Mon Oct 4 16:13:18 PDT 2010.
2. If the time has not come for the job to run, check the at queue using atq:
$ atq
user.1286233998.a Mon Oct 4 16:13:18 PDT 2010
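The job names in these examples appear to follow the pattern user.seconds.queue: 1286233998 seconds after the Unix epoch is Mon Oct 4 16:13:18 PDT 2010, which matches the scheduled time shown. The sketch below splits a job name into those parts. The function name parse_at_job is hypothetical, and reading the trailing letter as the queue name is an assumption based on the examples here.

```shell
# Splits an at job name of the form user.seconds.queue and prints the
# three parts separated by spaces. Parsing starts from the right so that
# a user name containing dots is still handled.
parse_at_job() {
    name=$1
    queue=${name##*.}     # text after the last dot (assumed queue letter)
    rest=${name%.*}       # everything before the last dot
    epoch=${rest##*.}     # scheduled run time, seconds since the epoch
    user=${rest%.*}       # whatever remains is the user name
    printf '%s %s %s\n' "$user" "$epoch" "$queue"
}

# Usage:
#   parse_at_job user.1286233998.a
# If perl is available, the seconds value can be turned into local time:
#   perl -e 'print scalar localtime(shift), "\n"' 1286233998
```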
3. You can also check the /var/spool/cron/atjobs directory for any jobs awaiting their run time:
# ls
user.1286234667.a
The job itself should include the environment for the job, plus the command to run:
# cat user.1286234667.a
REAL_USER=userLOGIN_USER=userREAL_GROUP=staffGROUPS=staff,SUADMINAUDIT_CLASES=general,tcpipRLIMIT_CPU=9223372036854775807RLIMIT_FSIZE=2097151RLIMIT_DATA=262144RLIMIT_STACK=65536RLIMIT_CORE=2097151RLIMIT_RSS=65536RLIMIT_NOFILE=2000RLIMIT_THREADS=9223372036854775807RLIMIT_NPROC=9223372036854775807RLIMIT_CPU_HARD=9223372036854775807RLIMIT_FSIZE_HARD=2097151RLIMIT_DATA_HARD=18014398509481984RLIMIT_STACK_HARD=8388608RLIMIT_CORE_HARD=18014398509481984RLIMIT_RSS_HARD=18014398509481984RLIMIT_NOFILE_HARD=9223372036854775807RLIMIT_THREADS_HARD=9223372036854775807RLIMIT_NPROC_HARD=9223372036854775807UMASK=22PAG_DATA=USRENVIRON:_=/usr/bin/atLANG=en_USLOGIN=userPATH=/usr/bin:/etc:/usr/sbin:/usr/ucb:/usr/bin/X11:/sbin:/usr/java5/jre/bin:/usr/java5/binLC__FASTMSG=trueLOGNAME=userMAIL=/usr/spool/mail/userLOCPATH=/usr/lib/nls/locUSER=userAUTHSTATE=compatSHELL=/usr/bin/kshODMDIR=/etc/objreposHOME=/home/userTERM=xtermMAILMSG=[YOU HAVE NEW MAIL]PWD=/home/userTZ=America/Los_AngelesA__z=!LOGNAMESYSENVIRON:LOGNAME=userNAME=userTTY=/dev/pts/3
umask 022
cd /home/user
oslevel
4. As with cron jobs, the user should receive an email containing any STDOUT or STDERR output from the command.
5. Check the permissions and ownership of the at command. Like cron, it is a SETUID root binary:
# ls -l /usr/bin/at
-r-sr-sr-x 1 root cron 56566 Jul 03 12:13 /usr/bin/at
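The ownership and SETUID checks for both /usr/sbin/cron and /usr/bin/at can be done in a script rather than by reading ls output. This is a minimal sketch: the function name is_setuid_owned_by is hypothetical, and it compares numeric user IDs so that root is simply UID 0.

```shell
# Succeeds if the file has its set-user-ID bit on and is owned by the
# given numeric UID (default 0, that is, root).
is_setuid_owned_by() {
    path=$1
    uid=${2:-0}
    [ -u "$path" ] || return 1                      # set-user-ID bit set?
    owner=$(ls -ldn "$path" | awk '{ print $3 }')   # numeric owner from ls -n
    [ "$owner" = "$uid" ]
}

# Usage:
#   for f in /usr/sbin/cron /usr/bin/at; do
#       is_setuid_owned_by "$f" || echo "$f: not SETUID root"
#   done
```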