Initial revision

This commit is contained in:
cvs
2000-02-07 10:38:55 +00:00
commit fdc6b051c9
846 changed files with 230218 additions and 0 deletions

doc/user/trouble.htm Normal file

@@ -0,0 +1,207 @@
<html>
<head>
<title>SICS Trouble Shooting</title>
</head>
<body>
<h1>SICS Trouble Shooting </h1>
<hr size=4 width="66%">
<p>
There is no such thing as bug-free software: there are always bugs, nasty
behaviour and the like. This document is meant to help solve such problems. The usual
symptom is that a client cannot connect to the server or that the server is
not responding.
</p>
<p>
An essential prerequisite of SICS is that the server is up
and running. The system is configured to restart the SICServer whenever it
fails. The SICServer must be restarted by hand only after a reboot or when the
keepalive processes have been killed (see below). This is done for all instruments by
typing:
<pre>
startsics
</pre>
at the command prompt. startsics actually starts two programs: one is
the replicator application, which is responsible for automatically
copying data files to the laboratory server; the other is the SICS
server. Both programs are started by means of a shell script called
<b>keepalive</b>. keepalive is basically an endless loop which calls
the program again and again and thus ensures that the program is
restarted whenever it stops.
</p>
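<p>
The restart loop can be pictured with the following minimal sketch. It is
not the actual keepalive script: the function name keepalive_sketch and the
one-second pause between restarts are assumptions made for illustration.
</p>
<pre>
# Hypothetical sketch of a keepalive-style loop (not the real script).
# Usage: keepalive_sketch program [arguments...]
keepalive_sketch() {
    while true; do
        # Run the program and wait until it exits; the exit status is
        # ignored because the program is restarted in any case.
        "$@" || true
        # Short pause (an assumption) so that a program which fails
        # immediately does not spin in a tight loop.
        sleep 1
    done
}
</pre>
<p>
Because the loop never ends on its own, killing only the program it runs
simply triggers another restart.
</p>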
<p>
When the SICS server hangs, or you want to enforce a reinitialization of
everything, the server process must be killed. This can be accomplished either manually or through a shell script.
</p>
<h2>Stopping SICS</h2>
<p>
All SICS processes can be stopped through the command:
<pre>
killsics
</pre>
given at the unix command line. You must be the instrument user
(for example DMC) on the instrument computer for this to work properly.
</p>
<h2>Finding the SICS server</h2>
<p>The first thing when killing the SICS server manually is to find the
server process.
Log in as the instrument user on the instrument computer (for instance DMC on
lnsa05) and type the command:
<pre>
/home/DMC> ps -A
</pre>
Note the capital A given as parameter. The result will be a listing like this:
<pre width =132>
PID TTY S TIME CMD
0 ?? R 01:56:28 [kernel idle]
1 ?? I 1:24.44 /sbin/init -a
3 ?? IW 0:00.20 /sbin/kloadsrv
24 ?? S 40:39.58 /sbin/update
97 ?? S 0:04.87 /usr/sbin/syslogd
99 ?? IW 0:00.03 /usr/sbin/binlogd
159 ?? S 1:43.70 /usr/sbin/routed -q
285 ?? S 1:00.45 /usr/sbin/portmap
293 ?? S 6:03.45 /usr/sbin/ypserv
299 ?? I 0:00.37 /usr/sbin/ypbind -s -S psunix,lnsa05.psi.ch
307 ?? I 0:00.52 /usr/sbin/mountd -i
309 ?? I 0:00.07 /usr/sbin/nfsd -t8 -u8
311 ?? I 0:00.09 /usr/sbin/nfsiod 7
317 ?? S 5:51.54 /usr/sbin/automount -f /etc/auto.master -M /psi
370 ?? I 0:28.58 -accepting connections (sendmail)
389 ?? S 1:41.15 /usr/sbin/xntpd -g -c /etc/ntp.conf
419 ?? S 6:00.16 /usr/sbin/snmpd
422 ?? S 1:00.91 /usr/sbin/os_mibs
438 ?? S 34:29.67 /usr/sbin/advfsd
449 ?? I 3:16.29 /usr/sbin/inetd
482 ?? IW 0:11.53 /usr/sbin/cron
510 ?? IW 0:00.02 /usr/lbin/lpd
525 ?? I 5:31.67 /usr/opt/psw/psw_agent -x/dev/null -f/usr/opt/psw/psw_agent.conf
532 ?? I 0:00.74 /usr/opt/psw/psw_sensor_syswd 1 -x/dev/null
555 ?? I 0:00.58 /usr/bin/nsrexecd
571 ?? I 0:20.27 /usr/dt/bin/dtlogin -daemon
583 ?? S 1:38.27 lpsbootd -F /etc/lpsodb -l 0 -x 1
585 ?? IW 0:00.04 /usr/sbin/getty /dev/lat/620 console vt100
586 ?? IW 0:00.03 /usr/sbin/getty /dev/lat/621 console vt100
587 ?? I 35:59.85 /usr/bin/X11/X :0 -auth /var/dt/authdir/authfiles/A:0-aaarBa
657 ?? I 0:01.46 rpc.ttdbserverd
4705 ?? IW 0:00.05 dtlogin -daemon
9127 ?? I 0:00.37 /usr/bin/X11/dxconsole -geometry 480x150-0-0 -daemon -nobuttons -verbose -notify -exitOnFail -nostdin -bg gray
9317 ?? IW 0:00.73 dtgreet -display :0
14412 ?? S 0:39.71 netscape
15524 ?? I 0:00.57 rpc.cmsd
21678 ?? S 0:00.11 telnetd
31912 ?? S 0:10.65 /home/DMC/bin/SICServer /home/DMC/bin/dmc.tcl
584 console IW + 0:00.21 /usr/sbin/getty console console vt100
21978 ttyp1 S 0:00.63 -tcsh (tcsh)
22269 ttyp1 R + 0:00.10 ps -A
</pre>
This is a listing of all processes running on the machine where the command
was typed. Note the entry for the SICS server near the bottom, in the line
starting with <tt> 31912 ?? </tt>. In this example the server
is running. If the server were down, no such entry would be present.
</p>
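<p>
On a long listing the server line is easy to miss. The following is a
minimal sketch of a filter that prints only the process number of the SICS
server. It assumes a listing in the format shown above, with the process
number in the first column; the function name sics_pids is made up for this
example.
</p>
<pre>
# Hypothetical helper: read a "ps -A" style listing on standard input and
# print the first column (the PID) of every line that mentions SICServer.
# Lines mentioning keepalive (the restart wrapper) or awk (this filter
# itself) are skipped.
sics_pids() {
    awk '/SICServer/ { if ($0 !~ /keepalive|awk/) print $1 }'
}
</pre>
<p>
It would be used as <tt>ps -A | sics_pids</tt>. If nothing is printed, no
SICS server entry was found in the listing.
</p>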
<h2> Killing a hanging SICS server </h2>
<p>
Suppose that the SICS server no longer responds and needs to be forcefully
terminated. Please note that it is always better to
close the server via the <tt>Sics_Exitus</tt> command, typed with manager
privilege in one of the command clients. In order to kill the server, you
first need to find it using the scheme given above. The information
needed is the number given as the first item in the line where the server
is listed, in this case <tt>31912</tt>. Please note that this number will
be different each time. The command to force the server to stop is:
<pre>
/home/DMC> kill -9 31912
</pre>
Note that the second parameter is the number found with <tt>ps -A</tt>. The
SICServer will be restarted automatically by the system. Occasionally, you
may find that you cannot connect to the SICS server after such an
operation. This is due to network buffering problems. Killing the server
again usually solves the problem.
</p>
<h2> Shutting The SICS Server Down Completely</h2>
<p>
This is done for you by the killsics shell script. Just type
<pre>
killsics
</pre>
at the unix command line. Here is what killsics does for you:
in order to shut down the SICS server completely, two processes must be killed:
the actual SICS server and the process which automatically restarts the
SICServer. The latter must be killed first. It can be found in the ps -A
listing as a line reading <b>keepalive SICServer </b>. Kill that one as
described above, then kill the SICServer. To restart SICS after this,
use the startsics command.
</p>
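<p>
The order of the two kills matters: if the SICServer is killed first, the
keepalive process starts it again straight away. The following minimal
sketch shows the order only. It is not the killsics script itself; the
function name stop_sics_completely is hypothetical, and both process
numbers are assumed to have been looked up in the ps -A listing beforehand.
</p>
<pre>
# Hypothetical sketch, not the real killsics script.
# Usage: stop_sics_completely KEEPALIVE_PID SERVER_PID
stop_sics_completely() {
    if [ "$#" -ne 2 ]; then
        echo "usage: stop_sics_completely KEEPALIVE_PID SERVER_PID"
        return 2
    fi
    # First the keepalive loop, so that nothing restarts the server.
    kill -9 "$1"
    # Then the SICS server itself.
    kill -9 "$2"
}
</pre>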
<h2>Restart Everything</h2>
<p>
If nothing seems to work any more, no connections can be obtained, etc., then
the next step is to restart everything. This is especially necessary if
mechanics or electronics people were closer to the instrument than 400 meters.
<OL>
<LI> Reboot the Macintosh PC by switching it off at the silver button on the
left. Press it firmly for a few seconds for it to take effect. The LED to the right of the
button should be off before you press again to boot the Macintosh.
<LI> Reboot the histogram memory. It has a tiny button labelled RST. That's
the one. It can be operated with a hairpin, a ballpoint pen or the like.
<LI> Wait 5 minutes. The Macintosh may take that time to come up again.
<LI> Restart the SICServer. Watch for any messages about things not being
connected or configured.
<LI> Restart and reconnect the client programs.
</OL>
If this fails (even after a second attempt), there may be a network problem which
cannot be resolved by simple means.
</p>
<h2>Getting New SICS Software</h2>
<p>
Sometimes you might want to be sure that you have the latest SICS software.
This is how to get it:
<ol>
<li>Log in to the instrument account.
<li>If you are not there already, type cd to get into the home directory.
<li>Type <b>killsics</b> at the unix prompt in order to stop the SICS server.
<li>Type <b>sicsinstall exe</b> at the unix prompt for copying new
SICS software from the general distribution area.
<li>Type <b> startsics</b> to restart the SICS software.
</ol>
</p>
<h2>Hot Fixes</h2>
<p>
When there is trouble with SICS, you may be asked by one of the SICS
programmers to copy the most recent development version of the SICS server
to your machine. This is done as follows:
<ol>
<li>Log in to the instrument account.
<li>cd into the bin directory, for example: /home/DMC/bin.
<li>Type <b> killsics</b> at the unix prompt in order to stop the SICS server.
<li>Type <b>cp /data/koenneck/src/sics/SICServer .</b> at the unix prompt.
<li>Type <b> startsics</b> to restart the SICS software.
</ol>
<b>!!!!!! WARNING !!!!!! Do this only when advised to do so by a competent
SICS programmer. Otherwise you might be copying a SICS server in an
unstable experimental state!</b>
</p>
<h2> HELP debugging!!!!</h2>
<p>
The SICS server should not hang or crash. In order to sort such
problems out, it is very helpful if any available debugging information is
saved and presented to the programmers. The information available consists of the log
files written continuously by the SICS server and any core files
lying around. Core files have just this name: core. In order to save them, create a
new directory (for example dump2077) and copy the files into it. This looks
like:
<pre>
/home/DMC> mkdir dump2077
/home/DMC> cp log/*.log dump2077
/home/DMC> cp core dump2077
</pre>
The <tt>/home/DMC> </tt> is just the command prompt. Please note that core
files are only available after the server has crashed. These few commands
will help to analyse the cause of the problem and eventually to resolve it.
</p>
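<p>
The same steps can be wrapped in a small shell function so that nothing is
forgotten in a hurry. This is a minimal sketch, assuming it is run from the
instrument home directory with the log files in log/ as in the commands
above; the function name save_debug_info is hypothetical.
</p>
<pre>
# Hypothetical helper: copy the log files and, if present, the core file
# into the dump directory given as the only argument.
save_debug_info() {
    dumpdir=$1
    mkdir -p "$dumpdir"
    for f in log/*.log; do
        # The test skips the unexpanded pattern when no log file exists.
        if [ -f "$f" ]; then
            cp "$f" "$dumpdir"
        fi
    done
    # A core file only exists after a crash of the server.
    if [ -f core ]; then
        cp core "$dumpdir"
    fi
}
</pre>
<p>
For example, <tt>save_debug_info dump2077</tt> fills the directory
dump2077 used in the commands above.
</p>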
</body>
</html>