<html>
<head>
<title>SICS Trouble Shooting</title>
</head>
<body>
<h1>SICS Trouble Shooting</h1>
<hr size=4 width="66%">
<h2>Inspecting Log Files</h2>
<p>
Suppose something went wrong over the weekend or during the night and
you are not absolutely sure what the problem was. In such a case it is
helpful to look at the SICS log files. They live in the log directory
of the instrument account. A new log file is created for each day (or after
each restart of the SICS server). The files are named according to the
following convention:
<pre>
autoYYYY-mm-dd@hh-MM-ss.log
</pre>
with YYYY denoting the year, mm the month, dd the day, hh the hour,
MM the minute and ss the second of creation. The most recent log file can
be viewed with the <b>sicstail</b> command: <b>sicstail num</b> shows the
last num lines of the log file. Within SICS, and especially in the SICS
command line client, the last 1000 lines of the log are accessible through the
<b>commandlog tail num</b> command. The command log is also accessible
through the WWW at lns00. The log file contains hourly time
stamps, which help to find out when exactly a problem began to
appear.
</p>
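<p>
Because the fields of the log file name run from the year down to the second
and are zero-padded, sorting the names alphabetically also sorts them by
time. The following minimal POSIX shell sketch uses this to find the newest
log file on a machine where <tt>sicstail</tt> is not at hand. The function
names <tt>log_name_for_now</tt> and <tt>latest_log</tt> are hypothetical and
not part of SICS; the default directory <tt>log</tt> assumes you are in the
instrument home directory, and a 24-hour clock is assumed for hh.
</p>
<pre>
# Hypothetical helpers, not part of SICS.

# Print the name a log file created right now would get,
# autoYYYY-mm-dd@hh-MM-ss.log (assuming a 24-hour clock for hh).
log_name_for_now() {
    date '+auto%Y-%m-%d@%H-%M-%S.log'
}

# Print the path of the most recent log file in the given directory
# (default: ./log). The zero-padded fields make a plain byte-wise sort
# of the names chronological. Prints nothing if there is no log file.
latest_log() {
    ls "${1:-log}"/auto*.log 2>/dev/null | LC_ALL=C sort | tail -n 1
}
</pre>
<p>
For example, <tt>tail -n 50 "$(latest_log "$HOME/log")"</tt> shows the last
50 lines of the newest log file, similar to what <tt>sicstail 50</tt> is
described as doing above.
</p>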
<p>
Quite often the inspection of the log files will indicate problems
which are not software related, such as:
<ul>
<li>Communication problems (usually network).
<li>Positioning problems of motors.
<li>BAD_EMERG_STOP: the motor emergency stop was engaged. It must be
released before the motors will move again.
<li>BAD_STP: a motor had been switched off.
</ul>
</p>
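<p>
To find such messages without reading a whole log file, a plain
<tt>grep</tt> is enough. A minimal sketch, assuming the error codes appear
literally in the log lines as written above; the function name
<tt>scan_log_errors</tt> is hypothetical and not part of SICS.
</p>
<pre>
# Hypothetical helper, not part of SICS.
# Print every log line mentioning BAD_EMERG_STOP or BAD_STP, prefixed
# with file name and line number, so that the nearby hourly time stamps
# can be looked up. Default: all log files in ./log.
scan_log_errors() {
    [ "$#" -gt 0 ] || set -- log/auto*.log
    # The extra /dev/null argument makes grep print the file name even
    # when only a single log file is searched.
    grep -n -e BAD_EMERG_STOP -e BAD_STP "$@" /dev/null
}
</pre>
<p>
Like <tt>grep</tt> itself, the function returns a non-zero status when no
matching line is found.
</p>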
<h2>Restarting SICS</h2>
<hr size=4 width="66%">
<p>
There is no such thing as bug-free software. There are always bugs, nasty
behaviour etc. This document is meant to help with solving these problems. The usual
symptom is that a client cannot connect to the server or that the server does
not respond.
</p>
<p>
An essential prerequisite of SICS is that the servers are up
and running. The system is configured to restart the SICServer whenever it
fails. The SICServer must be restarted by hand only after a reboot or after the
keepalive processes were killed (see below). This is done for all instruments by
typing:
<pre>
startsics
</pre>
at the command prompt. startsics actually starts several programs; see
the Setup section for details. All programs are started by means of a
shell script called
<b>keepalive</b>. keepalive is basically an endless loop which starts
the program again and again whenever it exits, and thus ensures that the
program keeps running.
</p>
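<p>
The idea behind keepalive can be illustrated with a few lines of POSIX shell.
This is a minimal sketch, not the actual keepalive script used by SICS: the
function name <tt>keepalive_sketch</tt>, the pause between restarts
(<tt>KEEPALIVE_DELAY</tt>, default 5 seconds) and the optional run limit
(<tt>KEEPALIVE_MAX</tt>, only useful for trying the sketch out) are all
assumptions made for this example.
</p>
<pre>
# Hypothetical sketch, not the real keepalive script.
# Run the given command; whenever it exits, wait a little and start it
# again. KEEPALIVE_MAX (hypothetical) stops after that many runs;
# unset or 0 means loop forever, as keepalive does.
keepalive_sketch() {
    runs=0
    while :; do
        "$@"
        status=$?
        runs=$((runs + 1))
        if [ "${KEEPALIVE_MAX:-0}" -gt 0 ] && [ "$runs" -ge "$KEEPALIVE_MAX" ]; then
            return "$status"
        fi
        echo "keepalive_sketch: $1 exited with status $status, restarting" >&2
        # The pause keeps a program that dies immediately at startup
        # from being restarted in a tight loop.
        sleep "${KEEPALIVE_DELAY:-5}"
    done
}
</pre>
<p>
With this sketch, <tt>keepalive_sketch /home/DMC/bin/SICServer /home/DMC/bin/dmc.tcl</tt>
would start the server again each time it stops, which is why a killed
SICServer comes back on its own.
</p>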
<p>
When the SICS server hangs, or you want to enforce a reinitialization of
everything, the server process must be killed. This can be done either manually or through a shell script.
</p>
<h2>Stopping SICS</h2>
<p>
All SICS processes can be stopped with the command:
<pre>
killsics
</pre>
given at the unix command line. For this to work properly, you must be
logged in as the instrument user (for example DMC) on the instrument computer.
</p>
<h2>Finding the SICS server</h2>
<p>The first step when killing the SICS server manually is to find the
server process.
Log in as the instrument user on the instrument computer (for instance DMC on
lnsa05) and type the command:
<pre>
/home/DMC> ps -A
</pre>
Note the capital A given as the parameter. The result will be a listing like this:
<pre width=132>
  PID TTY      S           TIME CMD
    0 ??       R       01:56:28 [kernel idle]
    1 ??       I        1:24.44 /sbin/init -a
    3 ??       IW       0:00.20 /sbin/kloadsrv
   24 ??       S       40:39.58 /sbin/update
   97 ??       S        0:04.87 /usr/sbin/syslogd
   99 ??       IW       0:00.03 /usr/sbin/binlogd
  159 ??       S        1:43.70 /usr/sbin/routed -q
  285 ??       S        1:00.45 /usr/sbin/portmap
  293 ??       S        6:03.45 /usr/sbin/ypserv
  299 ??       I        0:00.37 /usr/sbin/ypbind -s -S psunix,lnsa05.psi.ch
  307 ??       I        0:00.52 /usr/sbin/mountd -i
  309 ??       I        0:00.07 /usr/sbin/nfsd -t8 -u8
  311 ??       I        0:00.09 /usr/sbin/nfsiod 7
  317 ??       S        5:51.54 /usr/sbin/automount -f /etc/auto.master -M /psi
  370 ??       I        0:28.58 -accepting connections (sendmail)
  389 ??       S        1:41.15 /usr/sbin/xntpd -g -c /etc/ntp.conf
  419 ??       S        6:00.16 /usr/sbin/snmpd
  422 ??       S        1:00.91 /usr/sbin/os_mibs
  438 ??       S       34:29.67 /usr/sbin/advfsd
  449 ??       I        3:16.29 /usr/sbin/inetd
  482 ??       IW       0:11.53 /usr/sbin/cron
  510 ??       IW       0:00.02 /usr/lbin/lpd
  525 ??       I        5:31.67 /usr/opt/psw/psw_agent -x/dev/null -f/usr/opt/psw/psw_agent.conf
  532 ??       I        0:00.74 /usr/opt/psw/psw_sensor_syswd 1 -x/dev/null
  555 ??       I        0:00.58 /usr/bin/nsrexecd
  571 ??       I        0:20.27 /usr/dt/bin/dtlogin -daemon
  583 ??       S        1:38.27 lpsbootd -F /etc/lpsodb -l 0 -x 1
  585 ??       IW       0:00.04 /usr/sbin/getty /dev/lat/620 console vt100
  586 ??       IW       0:00.03 /usr/sbin/getty /dev/lat/621 console vt100
  587 ??       I       35:59.85 /usr/bin/X11/X :0 -auth /var/dt/authdir/authfiles/A:0-aaarBa
  657 ??       I        0:01.46 rpc.ttdbserverd
 4705 ??       IW       0:00.05 dtlogin -daemon
 9127 ??       I        0:00.37 /usr/bin/X11/dxconsole -geometry 480x150-0-0 -daemon -nobuttons -verbose -notify -exitOnFail -nostdin -bg gray
 9317 ??       IW       0:00.73 dtgreet -display :0
14412 ??       S        0:39.71 netscape
15524 ??       I        0:00.57 rpc.cmsd
21678 ??       S        0:00.11 telnetd
31912 ??       S        0:10.65 /home/DMC/bin/SICServer /home/DMC/bin/dmc.tcl
  584 console  IW +     0:00.21 /usr/sbin/getty console console vt100
21978 ttyp1    S        0:00.63 -tcsh (tcsh)
22269 ttyp1    R +      0:00.10 ps -A
</pre>
This is a listing of all processes running on the machine where the command
has been typed. Near the bottom, in the line starting with
<tt>31912 ??</tt>, there is an entry for the SICS server. In this example the server
is running. If the server were down, no such entry would be present.
</p>
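<p>
Instead of reading through the whole listing, the <tt>ps -A</tt> output can
be filtered. A minimal sketch with a hypothetical function name,
<tt>sics_pids</tt>, that prints only the process number of each SICServer
line. It skips the <tt>keepalive SICServer</tt> wrapper described further
below, and also the line of the <tt>awk</tt> filter itself, whose command line
contains the word SICServer on systems that show full command lines.
</p>
<pre>
# Hypothetical helper, not part of SICS.
# Read ps -A output on standard input and print the PID (first column)
# of every SICServer process. Lines of the keepalive wrapper and of
# this awk filter are skipped.
sics_pids() {
    awk '/SICServer/ && !/keepalive/ && !/awk/ { print $1 }'
}
</pre>
<p>
For the listing above, <tt>ps -A | sics_pids</tt> would print <tt>31912</tt>.
</p>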
<h2>Killing a hanging SICS server</h2>
<p>
Suppose the SICS server does not respond any more and needs to be stopped
forcefully. Please note that it is always better to
close the server with the <tt>Sics_Exitus</tt> command, typed with manager
privilege in one of the command clients. In order to kill the server, you
first need to find it using the scheme given above. The information
needed is the number given as the first item in the line where the server
is listed, in this case <tt>31912</tt>. Please note that this number will
be different every time. The command to force the server to stop is:
<pre>
/home/DMC> kill -9 31912
</pre>
Note that the second parameter is the number found with <tt>ps -A</tt>. The
SICServer will then be restarted automatically by the system. Occasionally it
may happen that you cannot connect to the SICS server after such an
operation. This is due to network buffering problems. Killing the server
again usually solves the problem.
</p>
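<p>
The find-and-kill steps above can be combined into one command. A minimal
sketch with a hypothetical function name, <tt>kill_sicserver</tt>; like the
manual procedure it uses <tt>kill -9</tt>, so it is meant only for the case
where <tt>Sics_Exitus</tt> is no longer possible. It leaves the keepalive
wrapper running (assuming that wrapper's line contains the word keepalive),
so the server is restarted automatically afterwards.
</p>
<pre>
# Hypothetical helper, not part of SICS.
# Force-kill every running SICServer, leaving the keepalive wrapper
# alone so that the server gets restarted.
# Returns 1 if no SICServer process is found.
kill_sicserver() {
    pids=$(ps -A | awk '/SICServer/ && !/keepalive/ && !/awk/ { print $1 }')
    if [ -z "$pids" ]; then
        echo "kill_sicserver: no SICServer process found" >&2
        return 1
    fi
    echo "kill_sicserver: sending kill -9 to $pids"
    # $pids is deliberately unquoted: several PIDs become separate
    # arguments to kill.
    kill -9 $pids
}
</pre>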
<h2>Shutting The SICS Server Down Completely</h2>
<p>
This is done for you by the killsics shell script. Just type
<pre>
killsics
</pre>
at the unix command line. Here is what killsics does for you:
in order to shut down the SICS server completely, two processes must be killed:
the actual SICS server and the process which automatically restarts the
SICServer. The latter must be killed first. It can be found in the ps -A
listing as a line reading <b>keepalive SICServer</b>. Kill that one as
described above, then kill the SICServer. To restart SICS after this,
use the startsics command.
</p>
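<p>
The order of the two steps can be illustrated with a short shell sketch. This
is not the actual killsics script; the function names <tt>pids_matching</tt>
and <tt>stop_sics_sketch</tt> are hypothetical. It uses the POSIX form
<tt>ps -A -o pid= -o args=</tt> so that the full command line, including the
word keepalive, is visible regardless of how a particular system formats
plain <tt>ps -A</tt>.
</p>
<pre>
# Hypothetical sketch of the order of operations, not the real killsics.

# Print the PIDs of processes whose full command line matches the
# extended regular expression $1 but not the optional regex $2.
# Lines mentioning awk are skipped so the filter never matches itself.
pids_matching() {
    ps -A -o pid= -o args= |
        awk -v re="$1" -v ex="${2:-^$}" '$0 ~ re && $0 !~ ex && !/awk/ { print $1 }'
}

stop_sics_sketch() {
    # 1. Kill the keepalive wrapper first, so it cannot restart the server.
    ka=$(pids_matching 'keepalive.*SICServer')
    if [ -n "$ka" ]; then
        kill -9 $ka
    fi
    # 2. Then kill the SICServer itself (any keepalive line excluded).
    srv=$(pids_matching 'SICServer' 'keepalive')
    if [ -n "$srv" ]; then
        kill -9 $srv
    fi
}
</pre>
<p>
Killing the server first would not work: keepalive would simply start a new
SICServer, which is why the wrapper has to go first.
</p>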
<h2>Restart Everything</h2>
<p>
If nothing seems to work any more, no connections can be obtained etc., then
the next guess is to restart everything. This is especially necessary if
mechanics or electronics people have been closer to the instrument than 400 meters.
<OL>
<LI> Reboot the histogram memory. It has a tiny button labelled RST. That's
the one. It can be pressed with a hairpin, a ball point pen or the like.
<LI> Restart the SICServer. Watch for any messages about things not being
connected or configured.
<LI> Restart and reconnect the client programs.
</OL>
If this fails (even after a second attempt), there may be a network problem which
cannot be resolved by simple means.
</p>
<h2>Getting New SICS Software</h2>
<p>
Sometimes you might want to be sure that you have the latest SICS software.
This is how to get it:
<ol>
<li>Log in to the instrument account.
<li>If you are not already there, type cd to get into the home directory.
<li>Type <b>killsics</b> at the unix prompt in order to stop the SICS server.
<li>Type <b>sicsinstall exe</b> at the unix prompt to copy the new
SICS software from the general distribution area.
<li>Type <b>startsics</b> to restart the SICS software.
</ol>
</p>
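<p>
If you do this often, the steps can be chained in a small shell function. A
minimal sketch with a hypothetical name, <tt>update_sics</tt>. Joining the
commands with <tt>&amp;&amp;</tt> means each step only runs if the previous
one succeeded, so a failed <tt>sicsinstall exe</tt> leaves SICS stopped for
you to investigate instead of restarting it on a possibly incomplete
installation.
</p>
<pre>
# Hypothetical helper, not part of SICS: the steps listed above,
# run from the instrument account.
update_sics() {
    cd "$HOME" || return 1
    # Each step runs only if the previous one succeeded.
    killsics && sicsinstall exe && startsics
}
</pre>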
<h2>Hot Fixes</h2>
<p>
When there is trouble with SICS, you may be asked by one of the SICS
programmers to copy the most recent development version of the SICS server
to your machine. This is done as follows:
<ol>
<li>Log in to the instrument account.
<li>cd into the bin directory, for example: /home/DMC/bin.
<li>Type <b>killsics</b> at the unix prompt in order to stop the SICS server.
<li>Type <b>cp /data/koenneck/src/sics/SICServer .</b> at the unix prompt.
<li>Type <b>startsics</b> to restart the SICS software.
</ol>
<b>!!!!!! WARNING !!!!!! Do this only when advised to do so by a competent
SICS programmer. Otherwise you might be copying a SICS server in an
unstable experimental state!</b>
</p>
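<p>
The copy step can be wrapped in a small function that refuses to run if the
source file is missing and keeps the current binary as
<tt>SICServer.old</tt>, so that it can be put back if the development version
misbehaves. The function name <tt>install_hotfix</tt> and the backup file name
are hypothetical, and the backup is an extra precaution that is not part of
the procedure above.
</p>
<pre>
# Hypothetical helper, not part of SICS.
# Usage: install_hotfix SOURCE_BINARY BIN_DIRECTORY
install_hotfix() {
    src=$1        # e.g. /data/koenneck/src/sics/SICServer
    bindir=$2     # e.g. /home/DMC/bin
    if [ ! -f "$src" ]; then
        echo "install_hotfix: $src not found" >&2
        return 1
    fi
    # Keep the current server binary (extra precaution, see above).
    if [ -f "$bindir/SICServer" ]; then
        cp -p "$bindir/SICServer" "$bindir/SICServer.old" || return 1
    fi
    cp "$src" "$bindir/SICServer"
}
</pre>
<p>
For example: <tt>killsics &amp;&amp; install_hotfix /data/koenneck/src/sics/SICServer /home/DMC/bin &amp;&amp; startsics</tt>.
</p>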
<h2>HELP debugging!!!!</h2>
<p>
The SICS server should not hang or crash. In order to sort out such
problems, it is very helpful if all available debugging information is
saved and presented to the programmers. The available information consists
of the log files, written continuously by the SICS server, and any core files
lying around; these are simply named core. In order to save them, create a
new directory (for example dump2077) and copy the files into it. This looks
like:
<pre>
/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
</pre>
The <tt>/home/DMC> </tt> is just the command prompt. Please note that core
files are only available after crashes of the server. Saving these files
helps to analyse the cause of the problem and eventually to resolve it.
</p>
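<p>
The same steps can be collected in a small function that picks a fresh,
time-stamped directory name instead of a fixed one such as dump2077, and
copies a core file only if there is one. A minimal sketch; the function name
<tt>save_debug_info</tt> and the dumpYYYYmmdd-HHMMSS naming are hypothetical.
</p>
<pre>
# Hypothetical helper, not part of SICS.
# Copy all log files and, if present, the core file from the given
# instrument home directory (default: $HOME) into a new directory
# named dumpYYYYmmdd-HHMMSS, then print that directory's path.
save_debug_info() {
    base=${1:-$HOME}
    dump="$base/dump$(date '+%Y%m%d-%H%M%S')"
    mkdir "$dump" || return 1
    # Error messages are suppressed in case the log directory is empty.
    cp "$base"/log/*.log "$dump"/ 2>/dev/null
    if [ -f "$base/core" ]; then
        cp "$base/core" "$dump"/
    fi
    echo "$dump"
}
</pre>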
</body>
</html>