fixed structure

Jeff Hill
2004-01-27 19:34:40 +00:00
parent 5bd8fe9148
commit 791fba7274


@@ -1160,47 +1160,49 @@ Interval.</a></p>
<h4><a name="Server1">A Server's IP Address Was Changed</a></h4>
<p>Starting with EPICS R3.14.4 the Channel Access Client Library was modified
so that when communication over a circuit times out, then the disconnect
callback handler for each channel attached to that circuit is called, but the
circuit is not disconnected until TCP/IP's internal keep alive timer expires.
The disconnected channels remain attached to the beleaguered circuit and no
attempt is made to search for, or to reestablish, a new circuit. If, at some
time in the future, the circuit becomes responsive again, then the reconnect
handlers are called for each channel that is attached to the circuit, and any
monitor subscriptions that updated while the channel was disconnected are
refreshed. This behavior is more robust during periods of
CPU/network/resource/mbuf congestion. Of course, if at any time the library
receives an indication from the operating system that the beleaguered circuit
has shutdown or was disconnected then the library will immediately attempt to
find a new server and build a circuit to it.</p>
<p>When communication over a virtual circuit times out, each channel
attached to the circuit enters a disconnected state and the disconnect
callback handler specified for the channel is called. However, the circuit is
not disconnected until TCP/IP's internal, typically long duration, keep alive
timer expires. The disconnected channels remain attached to the beleaguered
circuit, and no attempt is made to search for, or to establish, a new
circuit. If, at some time in the future, the circuit becomes responsive
again, then the attached channels enter a connected state again and their
reconnect callback handlers are called. Any monitor subscriptions that
received an update message while the channel was disconnected are also
refreshed. If at any time the library receives an indication from the
operating system that a beleaguered circuit has shut down or was disconnected,
then the library will immediately attempt to find servers for each channel
and connect circuits to them.</p>
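<p>The state transitions described above can be sketched as a small model.
This is an illustration only, not the Channel Access client library code: the
names <code>Channel</code>, <code>Circuit</code>, <code>on_timeout</code>,
<code>on_responsive</code> and <code>on_os_shutdown</code> are hypothetical,
and the search for new servers is reduced to a state change.</p>

```python
from dataclasses import dataclass, field
from enum import Enum


class ChannelState(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"
    SEARCHING = "searching"


@dataclass
class Channel:
    name: str
    state: ChannelState = ChannelState.CONNECTED
    events: list = field(default_factory=list)  # record of handler calls


@dataclass
class Circuit:
    channels: list
    is_open: bool = True

    def on_timeout(self):
        """Communication timed out: notify channels, keep the circuit."""
        for ch in self.channels:
            if ch.state is ChannelState.CONNECTED:
                ch.state = ChannelState.DISCONNECTED
                ch.events.append("disconnect")

    def on_responsive(self):
        """Circuit answers again: channels reconnect on the same circuit."""
        if not self.is_open:
            return
        for ch in self.channels:
            if ch.state is ChannelState.DISCONNECTED:
                ch.state = ChannelState.CONNECTED
                ch.events.append("reconnect")

    def on_os_shutdown(self):
        """OS reports the circuit closed: detach channels and search again."""
        self.is_open = False
        for ch in self.channels:
            if ch.state is ChannelState.CONNECTED:
                ch.events.append("disconnect")
            ch.state = ChannelState.SEARCHING
        self.channels = []


# Hypothetical channel name, used only for this walk-through.
pv = Channel("demo:pv")
circuit = Circuit([pv])
circuit.on_timeout()      # disconnect handler runs, circuit stays open
circuit.on_responsive()   # reconnect handler runs on the same circuit
circuit.on_timeout()
circuit.on_os_shutdown()  # only now is the channel searched for again
```

<p>In this sketch a timeout alone never puts a channel into the searching
state; only the operating system's indication that the circuit is gone does
so.</p>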
<p>In the past the beleaguered circuit was immediately closed when
communication over it timed out, any attached channels were immediately
searched for, and after successful search responses arrived then attempts
were made to build a new circuit. This behavior could result in undesirable
load fluctuations during periods of CPU/network/resource/mbuf congestion.
There could be undesirable CPU consumption resulting from periodic circuit
setup and teardown overhead.</p>
<p>A well known negative side effect is that R3.14.5 CA clients will wait the
full (typically long) duration of TCP/IP's internal keep alive timer prior to
reconnecting under the following scenario (all of the following occur):</p>
<p>A well-known negative side effect of the above behavior is that CA clients
will wait the full (typically long) duration of TCP/IP's internal keep alive
timer prior to reconnecting when all of the following occur:</p>
<ul>
<li>An IOC's operating system crashes (or is abruptly turned off) or a
vxWorks system is stopped by any means</li>
<li>A server's (IOC's) operating system crashes (or is abruptly turned
off) or a vxWorks system is stopped by any means</li>
<li>This operating system does not immediately reboot using the same IP
address</li>
<li>A duplicate of the IOC is started appearing at a different IP
address</li>
<li>A duplicate of the server (IOC) is started and appears at a different IP
address</li>
</ul>
<p>It is unlikely that any rational organization will advocate the above
scenario while the system is operational. Nevertheless, this <em>is</em>
undoubtedly a negative side effect because there are opportunities for users
to become confused during control system development, but it is felt that the
improvements in operational system robustness justify the confusion resulting
in the small number of situations where the above scenarios occur.</p>
scenario in a production system. Nevertheless, there <em>are</em>
opportunities for users to become confused during control system
<em>development</em>, but it is felt that the robustness improvements justify
isolated confusion during the system integration and checkout activities
where the above scenarios are most likely to occur.</p>
<p>Contrast the above behavior with the behavior of releases prior to R3.14.5,
where the beleaguered circuit was immediately closed when communication over
it timed out. Any attached channels were immediately searched for, and once
successful search responses arrived, attempts were made to build a new
circuit. This behavior could result in undesirable load fluctuations during
periods of CPU / network / IP kernel buffer congestion. There could also be
undesirable resource consumption resulting from periodic circuit setup and
teardown overhead (thrashing).</p>
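<p>The difference in setup and teardown overhead between the two policies can
be illustrated with a rough count. This is a simplified model under stated
assumptions, not a measurement: the function
<code>count_circuit_setups</code> and the event labels are hypothetical, and
every successful search is assumed to lead to exactly one new circuit.</p>

```python
def count_circuit_setups(events, close_on_timeout):
    """Count circuits built after the initial one for a sequence of events.

    events: iterable of "timeout", "responsive" or "shutdown"
        (hypothetical labels for this sketch).
    close_on_timeout: True models releases prior to R3.14.5,
        False models the newer behavior.
    """
    circuit_open = True
    setups = 0
    for event in events:
        if event == "shutdown" or (event == "timeout" and close_on_timeout):
            circuit_open = False  # circuit torn down, channels searched for
        elif event == "responsive" and not circuit_open:
            circuit_open = True   # search succeeded, a new circuit is built
            setups += 1
    return setups


# Five short congestion episodes, each followed by recovery.
congestion = ["timeout", "responsive"] * 5
old = count_circuit_setups(congestion, close_on_timeout=True)
new = count_circuit_setups(congestion, close_on_timeout=False)
print(old, new)
```

<p>Under these assumptions the older policy rebuilds the circuit after every
episode, while the newer policy rebuilds it only after the operating system
reports that the circuit is gone.</p>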
<h3><a name="Problems">ENOBUFS Messages</a></h3>