fixed structure
@@ -1160,47 +1160,49 @@ Interval.</a></p>
<h4><a name="Server1">A Server's IP Address Was Changed</a></h4>
<p>Starting with EPICS R3.14.4 the Channel Access Client Library was modified
so that when communication over a circuit times out, then the disconnect
callback handler for each channel attached to that circuit is called, but the
circuit is not disconnected until TCP/IP's internal keep alive timer expires.
The disconnected channels remain attached to the beleaguered circuit and no
attempt is made to search for, or to reestablish, a new circuit. If, at some
time in the future, the circuit becomes responsive again, then the reconnect
handlers are called for each channel that is attached to the circuit, and any
monitor subscriptions that updated while the channel was disconnected are
refreshed. This behavior is more robust during periods of
CPU/network/resource/mbuf congestion. Of course, if at any time the library
receives an indication from the operating system that the beleaguered circuit
has shutdown or was disconnected then the library will immediately attempt to
find a new server and build a circuit to it.</p>
<p>When communication over a virtual circuit times out, each channel
attached to the circuit enters a disconnected state and the disconnect
callback handler specified for the channel is called. However, the circuit is
not disconnected until TCP/IP's internal, typically long duration, keep alive
timer expires. The disconnected channels remain attached to the beleaguered
circuit and no attempt is made to search for, or to reestablish, a new
circuit. If, at some time in the future, the circuit becomes responsive
again, then the attached channels enter a connected state again and their
reconnect callback handlers are called. Any monitor subscriptions that
received an update message while the channel was disconnected are also
refreshed. If at any time the library receives an indication from the
operating system that a beleaguered circuit has shut down or was disconnected,
then the library will immediately reattempt to find servers for each channel
and connect circuits to them.</p>
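<p>The state transitions described above can be sketched as a small
simulation. This is an illustrative model only, not the Channel Access client
library: the names <code>Circuit</code>, <code>Channel</code>,
<code>on_timeout</code>, <code>on_responsive_again</code> and
<code>on_os_disconnect</code> are hypothetical and chosen for this example.</p>

```python
from dataclasses import dataclass, field
from enum import Enum


class CircuitState(Enum):
    RESPONSIVE = "responsive"
    UNRESPONSIVE = "unresponsive"  # timed out, TCP connection still open
    CLOSED = "closed"              # OS reported shutdown or disconnect


@dataclass
class Channel:
    name: str
    connected: bool = True
    events: list = field(default_factory=list)  # record of handler calls


@dataclass
class Circuit:
    channels: list
    state: CircuitState = CircuitState.RESPONSIVE
    searching: bool = False  # True once servers are being searched for again

    def on_timeout(self):
        # Communication timed out: channels disconnect, the circuit stays
        # open, and no search for a new server is started.
        if self.state is not CircuitState.RESPONSIVE:
            return
        self.state = CircuitState.UNRESPONSIVE
        for ch in self.channels:
            ch.connected = False
            ch.events.append("disconnect")

    def on_responsive_again(self):
        # The same circuit recovered: channels reconnect without a search.
        if self.state is not CircuitState.UNRESPONSIVE:
            return
        self.state = CircuitState.RESPONSIVE
        for ch in self.channels:
            ch.connected = True
            ch.events.append("reconnect")

    def on_os_disconnect(self):
        # The OS reported that the circuit shut down or was disconnected
        # (for example, keep alive expiry): only now does searching begin.
        if self.state is CircuitState.CLOSED:
            return
        self.state = CircuitState.CLOSED
        for ch in self.channels:
            if ch.connected:
                ch.connected = False
                ch.events.append("disconnect")
        self.searching = True


circuit = Circuit([Channel("pv:a"), Channel("pv:b")])
circuit.on_timeout()
circuit.on_responsive_again()
print(circuit.channels[0].events)
```

<p>In this model a timeout alone never sets <code>searching</code>; only an
indication from the operating system does, which mirrors the distinction drawn
in the paragraph above.</p>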
<p>In the past the beleaguered circuit was immediately closed when
communication over it timed out, any attached channels were immediately
searched for, and after successful search responses arrived then attempts
were made to build a new circuit. This behavior could result in undesirable
load fluctuations during periods of CPU/network/resource/mbuf congestion.
There could be undesirable CPU consumption resulting from periodic circuit
setup and teardown overhead.</p>
<p>A well known negative side effect is that R3.14.5 CA clients will wait the
full (typically long) duration of TCP/IP's internal keep alive timer prior to
reconnecting under the following scenario (all of the following occur):</p>
<p>A well known negative side effect of the above behavior is that CA clients
will wait the full (typically long) duration of TCP/IP's internal keep alive
timer prior to reconnecting under the following scenario (all of the
following occur):</p>
<ul>
<li>An IOC's operating system crashes (or is abruptly turned off) or a
vxWorks system is stopped by any means</li>
<li>A server's (IOC's) operating system crashes (or is abruptly turned
off) or a vxWorks system is stopped by any means</li>
<li>This operating system does not immediately reboot using the same IP
address</li>
<li>A duplicate of the IOC is started appearing at a different IP
address</li>
<li>A duplicate of the server (IOC) is started, appearing at a different IP
address</li>
</ul>
<p>It is unlikely that any rational organization will advocate the above
scenario while the system is operational. Nevertheless, this <em>is</em>
undoubtedly a negative side effect because there are opportunities for users
to become confused during control system development, but it is felt that the
improvements in operational system robustness justify the confusion resulting
in the small number of situations where the above scenarios occur.</p>
scenario in a production system. Nevertheless, there <em>are</em>
opportunities for users to become confused during control system
<em>development</em>, but it is felt that the robustness improvements justify
isolated confusion during the system integration and checkout activities
where the above scenarios are most likely to occur.</p>
<p>Contrast the above behavior with the behavior of releases prior to R3.14.5
where the beleaguered circuit was immediately closed when communication over
it timed out. Any attached channels were immediately searched for, and after
successful search responses arrived, attempts were made to build a new
circuit. This behavior could result in undesirable load fluctuations during
periods of CPU / network / IP kernel buffer congestion. There could be
undesirable resource consumption resulting from periodic circuit setup and
teardown overhead (thrashing).</p>
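<p>The thrashing described above can be illustrated with a toy count of
circuit rebuilds under intermittent congestion. This is a rough sketch under
simplifying assumptions, not a measurement of any EPICS release: the function
<code>count_circuit_rebuilds</code> and its parameters are hypothetical, the
server is assumed to stay at the same address, searches are assumed to succeed
as soon as the server responds, and keep alive expiry is not modeled.</p>

```python
def count_circuit_rebuilds(responsive, close_on_timeout):
    """Count how many times a new circuit has to be built.

    responsive: one boolean per check interval, True if the server answered.
    close_on_timeout: True models closing the circuit as soon as a timeout
    occurs; False models keeping the circuit open across timeouts.
    """
    rebuilds = 0
    circuit_open = True
    for answered in responsive:
        if answered:
            if not circuit_open:
                # A search succeeded, so a fresh circuit is set up.
                circuit_open = True
                rebuilds += 1
        elif close_on_timeout and circuit_open:
            # Timeout: tear the circuit down immediately.
            circuit_open = False
    return rebuilds


# Alternating congestion: the server answers every other interval.
congested = [True, False, True, False, True, False, True]
print(count_circuit_rebuilds(congested, close_on_timeout=True))
print(count_circuit_rebuilds(congested, close_on_timeout=False))
```

<p>Under these assumptions, closing on every timeout pays one setup and
teardown cycle per congestion episode, while keeping the circuit open pays
none.</p>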
<h3><a name="Problems">ENOBUFS Messages</a></h3>