There is no such thing as bug-free software. There are always bugs, nasty behaviour and the like. This document should help to solve these problems. The usual symptom is that a client cannot connect to the server or that the server does not respond.
An essential prerequisite of SICS is that the server is up and running. The system is configured to restart the SICServer whenever it fails. The SICServer only has to be restarted by hand after a reboot or after the keepalive processes have been killed (see below). For all instruments this is done by typing the following at the command prompt:
startsics
startsics actually starts two programs. One is the replicator application, which is responsible for automatically copying data files to the laboratory server. The other is the SICS server. Both programs are started by means of a shell script called keepalive. keepalive is basically an endless loop which calls the program again and again, and thus ensures that the program never stops running.
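To illustrate the idea, here is a minimal sketch of a keepalive-style loop. It is not the keepalive script shipped with SICS; the function name keepalive and the variables KEEPALIVE_MAX and KEEPALIVE_DELAY are assumptions added only for illustration and testing.

```shell
#!/bin/sh
# Sketch of a keepalive-style wrapper (hypothetical, not the real SICS script).
# It runs the given command, waits until it exits or crashes, and starts it again.
# KEEPALIVE_MAX (hypothetical): stop after this many runs; 0 or unset = loop forever.
# KEEPALIVE_DELAY (hypothetical): seconds to wait between restarts; default 1.
keepalive() {
    runs=0
    while :; do
        "$@"                               # run the program until it terminates
        runs=$((runs + 1))
        if [ "${KEEPALIVE_MAX:-0}" -gt 0 ] && [ "$runs" -ge "$KEEPALIVE_MAX" ]; then
            break                          # only reached when a limit was set
        fi
        sleep "${KEEPALIVE_DELAY:-1}"      # avoid a tight restart loop
    done
}
```

Called as, for example, keepalive /home/DMC/bin/SICServer /home/DMC/bin/dmc.tcl, the loop would restart the server every time it dies. This also explains why killing only the SICServer is not enough to stop SICS for good: the loop itself has to be killed first.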
When the SICS server hangs, or when you want to force a reinitialization of everything, the server process must be killed. This can be done either manually or through a shell script.
All SICS processes can be stopped with the command
killsics
given at the Unix command line. You must be logged in as the instrument user (for example DMC) on the instrument computer for this to work properly.
When killing the SICS server manually, the first thing to do is to find the server process. Log in as the instrument user on the instrument computer (for instance DMC on lnsa05) and type the command:
/home/DMC> ps -A
Note the capital A given as the parameter. The result will be a listing like this:
PID TTY S TIME CMD
0 ?? R 01:56:28 [kernel idle]
1 ?? I 1:24.44 /sbin/init -a
3 ?? IW 0:00.20 /sbin/kloadsrv
24 ?? S 40:39.58 /sbin/update
97 ?? S 0:04.87 /usr/sbin/syslogd
99 ?? IW 0:00.03 /usr/sbin/binlogd
159 ?? S 1:43.70 /usr/sbin/routed -q
285 ?? S 1:00.45 /usr/sbin/portmap
293 ?? S 6:03.45 /usr/sbin/ypserv
299 ?? I 0:00.37 /usr/sbin/ypbind -s -S psunix,lnsa05.psi.ch
307 ?? I 0:00.52 /usr/sbin/mountd -i
309 ?? I 0:00.07 /usr/sbin/nfsd -t8 -u8
311 ?? I 0:00.09 /usr/sbin/nfsiod 7
317 ?? S 5:51.54 /usr/sbin/automount -f /etc/auto.master -M /psi
370 ?? I 0:28.58 -accepting connections (sendmail)
389 ?? S 1:41.15 /usr/sbin/xntpd -g -c /etc/ntp.conf
419 ?? S 6:00.16 /usr/sbin/snmpd
422 ?? S 1:00.91 /usr/sbin/os_mibs
438 ?? S 34:29.67 /usr/sbin/advfsd
449 ?? I 3:16.29 /usr/sbin/inetd
482 ?? IW 0:11.53 /usr/sbin/cron
510 ?? IW 0:00.02 /usr/lbin/lpd
525 ?? I 5:31.67 /usr/opt/psw/psw_agent -x/dev/null -f/usr/opt/psw/psw_agent.conf
532 ?? I 0:00.74 /usr/opt/psw/psw_sensor_syswd 1 -x/dev/null
555 ?? I 0:00.58 /usr/bin/nsrexecd
571 ?? I 0:20.27 /usr/dt/bin/dtlogin -daemon
583 ?? S 1:38.27 lpsbootd -F /etc/lpsodb -l 0 -x 1
585 ?? IW 0:00.04 /usr/sbin/getty /dev/lat/620 console vt100
586 ?? IW 0:00.03 /usr/sbin/getty /dev/lat/621 console vt100
587 ?? I 35:59.85 /usr/bin/X11/X :0 -auth /var/dt/authdir/authfiles/A:0-aaarBa
657 ?? I 0:01.46 rpc.ttdbserverd
4705 ?? IW 0:00.05 dtlogin -daemon
9127 ?? I 0:00.37 /usr/bin/X11/dxconsole -geometry 480x150-0-0 -daemon -nobuttons -verbose -notify -exitOnFail -nostdin -bg gray
9317 ?? IW 0:00.73 dtgreet -display :0
14412 ?? S 0:39.71 netscape
15524 ?? I 0:00.57 rpc.cmsd
21678 ?? S 0:00.11 telnetd
31912 ?? S 0:10.65 /home/DMC/bin/SICServer /home/DMC/bin/dmc.tcl
584 console IW + 0:00.21 /usr/sbin/getty console console vt100
21978 ttyp1 S 0:00.63 -tcsh (tcsh)
22269 ttyp1 R + 0:00.10 ps -A
This is a listing of all processes running on the machine where the command was typed. In this example, note the entry for the SICS server near the bottom, in the line starting with 31912 ??. Here the server is running; if the server were down, no such entry would be present.
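If you prefer not to pick the number out of the listing by eye, the lookup can be scripted. The following is a minimal sketch, not part of SICS: the function name sics_pid is hypothetical, and it assumes that the server's line contains the word SICServer while the line of the restart wrapper reads keepalive SICServer, as described in this document.

```shell
#!/bin/sh
# Hypothetical helper: print the PID (first column) of the SICServer process
# from a `ps -A` listing read on stdin.
# The [S] bracket trick keeps awk from matching its own command line when
# ps and awk run in the same pipeline; lines of the keepalive wrapper are skipped.
sics_pid() {
    awk '/[S]ICServer/ && !/keepalive/ { print $1 }'
}
```

Usage, after loading the function into your shell: ps -A | sics_pid. An empty result means that no SICS server is running.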
Suppose the SICS server no longer responds. It then needs to be stopped forcefully. Note that it is always better to close the server with the Sics_Exitus command, typed with manager privilege in one of the command clients. To kill the server, you first have to find it using the procedure given above. The information needed is the number given as the first item on the line where the server is listed, in this case 31912. This number changes whenever the server is restarted, so always look it up first. The command to force the server to stop is:
/home/DMC> kill -9 31912
Note that the second parameter is the number found with ps -A. The SICServer will then be restarted automatically by the system. Occasionally you may not be able to connect to the SICS server after such an operation. This is due to network buffering problems; repeating the kill usually solves the problem.
Stopping SICS completely is done for you by the killsics shell script. Just type
killsics
at the Unix command line. Here is what killsics does for you: to shut down the SICS server completely, two processes must be killed, namely the actual SICS server and the process which automatically restarts the SICServer. The latter must be killed first. It can be found in the ps -A listing as a line reading keepalive SICServer. Kill that one as described above, then kill the SICServer. To restart SICS afterwards, use the startsics command.
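The killsics procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the actual killsics script: the function names sics_kill_order and killsics_sketch are hypothetical, and the sketch assumes the wrapper appears in ps -A with keepalive SICServer on its line. The wrapper is killed first so that it cannot restart the server.

```shell
#!/bin/sh
# Hypothetical helper: read a `ps -A` listing on stdin and print the PIDs to
# kill, keepalive wrapper(s) first, then the SICServer itself.
# The [S] bracket trick stops awk from matching its own command line.
sics_kill_order() {
    awk '/[S]ICServer/ { if (/keepalive/) wrapper = wrapper $1 "\n"; else server = server $1 "\n" }
         END { printf "%s%s", wrapper, server }'
}

# Hypothetical stand-in for killsics: kill the wrapper, then the server.
# Must be run as the instrument user, who owns these processes.
killsics_sketch() {
    for pid in $(ps -A | sics_kill_order); do
        kill -9 "$pid"
    done
}
```

Because the PIDs are collected from a single ps -A snapshot before anything is killed, the server's PID is still valid after its wrapper has gone. Processes such as keepalive replicator are not touched, since their lines do not mention SICServer.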
If nothing seems to work any more, no connections can be obtained and so on, the next step is to restart everything. This is especially necessary if mechanics or electronics people have been closer to the instrument than 400 meters.
Sometimes you might want to be sure that you have the latest SICS software. This is how to get it:
When there is trouble with SICS, one of the SICS programmers may ask you to copy the most recent development version of the SICS server to your machine. This is done as follows:
The SICS server should not hang or crash. To sort out such problems, it is very helpful if all available debugging information is saved and handed to the programmers. The available information consists of the log files written continuously by the SICS server and any core files lying around; these are simply named core. To save them, create a new directory (for example dump2077) and copy the files into it. This looks like:
/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
The /home/DMC> is just the command prompt. Please note that core files are only available after crashes of the server. Saving these files helps the programmers to analyse the cause of the problem and eventually to resolve it.
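The three commands above can be wrapped in a small helper, so that a missing core file (the normal case when the server hung rather than crashed) does not produce an error. save_sics_dump is a hypothetical name, and the sketch assumes it is run from the instrument user's home directory, where the log directory and any core file live, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: copy SICS log files and a core file, if present,
# into the dump directory given as the first argument.
save_sics_dump() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "usage: save_sics_dump DUMPDIR" >&2
        return 1
    fi
    mkdir -p "$dir" || return 1
    # Log files are written continuously by the server.
    for f in log/*.log; do
        [ -e "$f" ] && cp "$f" "$dir"/
    done
    # A core file only exists after the server has crashed.
    if [ -f core ]; then
        cp core "$dir"/
    fi
}
```

Usage: /home/DMC> save_sics_dump dump2077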