more changes for arrayPerformance; added vectorPerformanceMain.cpp
@@ -38,7 +38,7 @@
|
||||
<h1>pvDatabaseCPP</h1>
|
||||
<!-- Maturity: Working Draft or Request for Comments, or Recommendation, and date. -->
|
||||
|
||||
<h2 class="nocount">EPICS v4 Working Group, Working Draft, 28-Aug-2013</h2>
|
||||
<h2 class="nocount">EPICS v4 Working Group, Working Draft, 04-Sep-2013</h2>
|
||||
<dl>
|
||||
<dt>Latest version:</dt>
|
||||
<dd><a
|
||||
@@ -46,10 +46,11 @@
|
||||
</dd>
|
||||
<dt>This version:</dt>
|
||||
<dd><a
|
||||
href= "pvDatabaseCPP_20130828.html">pvDatabaseCPP20130828.html</a>
|
||||
href= "pvDatabaseCPP_20130904.html">pvDatabaseCPP20130904.html</a>
|
||||
</dd>
|
||||
<dt>Previous version:</dt>
|
||||
<dd><a href="pvDatabaseCPP_20130725.html">pvDatabaseCPP20130725.html</a>
|
||||
<dd><a
|
||||
href= "pvDatabaseCPP_20130828.html">pvDatabaseCPP20130828.html</a>
|
||||
</dd>
|
||||
<dt>Editors:</dt>
|
||||
<dd>Marty Kraimer, BNL</dd>
|
||||
@@ -70,10 +71,15 @@ The framework can be extended in order to create record instances that implement
|
||||
The minimum that an extension must provide is a top level PVStructure and a process method.
|
||||
</p>
|
||||
|
||||
<p>EPICS version 4 is a set of related products in the EPICS
|
||||
V4 control system programming environment:<br />
|
||||
<a href="http://epics-pvdata.sourceforge.net/relatedDocumentsV4.html">relatedDocumentsV4.html</a>
|
||||
</p>
|
||||
|
||||
|
||||
<h2 class="nocount">Status of this Document</h2>
|
||||
|
||||
<p>This is the 28-Aug-2013 version of pvDatabaseCPP.</p>
|
||||
<p>This is the 04-Sep-2013 version of pvDatabaseCPP.</p>
|
||||
<p><b>NOTE:</b>
|
||||
This is built against pvDataCPP-md NOT against pvDataCPP.
|
||||
To build you must also
|
||||
@@ -94,11 +100,6 @@ This project is ready for alpha users.
|
||||
<dd>The arguments that have type <b>int</b> should be changed to <b>size_t</b>
|
||||
This will not be done until pvDataCPP-md is merged into pvDataCPP.
|
||||
</dd>
|
||||
<dt>arrayPerformanceMain</dt>
|
||||
<dd>When this is run without a pvAccess client the performance is great.
But when a pvAccess client is monitoring, the performance slows
more than expected.
I do not understand why and must spend more time determining the cause.</dd>
|
||||
<dt>Monitor Algorithms</dt>
|
||||
<dd>Monitor algorithms have not been implemented.
|
||||
Thus all monitors are onPut.</dd>
|
||||
@@ -106,8 +107,10 @@ This project is ready for alpha users.
|
||||
<dd>Needs more testing.
|
||||
Also none of the examples that use pvAccess can be run with gdb.
|
||||
</dd>
|
||||
<dt>Termination issues.</dt>
|
||||
<dd>longArrayMonitor has memory leaks when main terminates.</dd>
|
||||
<dt>High Performance Issues.</dt>
|
||||
<dd>When arrayPerformance is run with size=5000 and delay really small or zero,
|
||||
either arrayPerformance or longArrayMonitor will sometimes crash while running or at termination.
|
||||
Also, in the same environment, longArrayMonitor frequently gets the array with size = 0.</dd>
|
||||
<dt>Create regression tests</dt>
|
||||
<dd>Currently only examples exist and have been used for testing.</dd>
|
||||
</dl>
|
||||
@@ -1608,14 +1611,17 @@ mrk>
|
||||
mrk> bin/linux-x86_64/arrayPerformanceMain
|
||||
arrayPerformance arrayPerformance 50000000 0.01 local 1 false
|
||||
...
|
||||
first 0 last 0 sum 0 elements/sec 525.011million changed {1, 2} overrun {}
|
||||
first 1 last 1 sum 50000000 elements/sec 494.511million changed {1, 2} overrun {}
|
||||
first 2 last 2 sum 100000000 elements/sec 515.34million changed {1, 2} overrun {}
|
||||
first 3 last 3 sum 150000000 elements/sec 154.402million changed {1, 2} overrun {}
|
||||
first 4 last 4 sum 200000000 elements/sec 513.414million changed {1, 2} overrun {}
|
||||
first 5 last 5 sum 250000000 elements/sec 473.672million changed {1, 2} overrun {}
|
||||
first 6 last 6 sum 300000000 elements/sec 503.855million changed {1, 2} overrun {}
|
||||
arrayPerformance value 8 time 1.66373 iterations/sec 4.80847 elements/sec 240.424million
|
||||
first 0 last 0 sum 0 elements/sec 529.007million changed {1, 2} overrun {}
|
||||
first 1 last 1 sum 50000000 elements/sec 510.686million changed {1, 2} overrun {}
|
||||
first 2 last 2 sum 100000000 elements/sec 520.114million changed {1, 2} overrun {}
|
||||
first 3 last 3 sum 150000000 elements/sec 514.842million changed {1, 2} overrun {}
|
||||
first 4 last 4 sum 200000000 elements/sec 507.642million changed {1, 2} overrun {}
|
||||
first 5 last 5 sum 250000000 elements/sec 505.598million changed {1, 2} overrun {}
|
||||
first 6 last 6 sum 300000000 elements/sec 517.081million changed {1, 2} overrun {}
|
||||
first 7 last 7 sum 350000000 elements/sec 516.508million changed {1, 2} overrun {}
|
||||
first 8 last 8 sum 400000000 elements/sec 513.711million changed {1, 2} overrun {}
|
||||
first 9 last 9 sum 450000000 elements/sec 505.309million changed {1, 2} overrun {}
|
||||
arrayPerformance value 11 time 1.08257 iterations/sec 10.161 elements/sec 508.049million
|
||||
...
|
||||
</pre>
|
||||
<h3>arrayPerformance</h3>
|
||||
@@ -1707,9 +1713,9 @@ The delay will be a millisecond.
|
||||
There will be a single monitor and it will connect directly
|
||||
to the local channelProvider, i. e. it will not use any network
|
||||
connection.</p>
|
||||
<p>The report shows that arrayPerformance can perform about 8 iterations per second
|
||||
and is putting about 350 million elements per second.
Since each element is an int64, this means about 2.8 gigabytes per second.
|
||||
<p>The report shows that arrayPerformance can perform about 10 iterations per second
|
||||
and is putting about 500 million elements per second.
Since each element is an int64, this means about 4 gigabytes per second.
|
||||
</p>
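<p>The conversion from elements per second to bytes per second can be checked with a few lines of C++.
This is only a sketch of the arithmetic; the helper names <b>bytesPerSecond</b> and <b>toGigaBytes</b>
are hypothetical and are not part of pvDatabaseCPP. It assumes 8-byte int64 elements and decimal gigabytes.</p>

```cpp
#include <cstdint>

// Hypothetical helper: bytes moved per second when every element is an int64.
constexpr double bytesPerSecond(double elementsPerSecond) {
    return elementsPerSecond * sizeof(std::int64_t);
}

// Hypothetical helper: converts bytes to decimal gigabytes (1e9 bytes).
constexpr double toGigaBytes(double bytes) {
    return bytes / 1e9;
}
```

<p>With these helpers, 500 million elements per second corresponds to
toGigaBytes(bytesPerSecond(500e6)), which is 4 gigabytes per second.</p>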
|
||||
<p>When no monitors are requested and a remote longArrayMonitorMain is run:</p>
|
||||
<pre>
|
||||
@@ -1717,9 +1723,86 @@ mr> pwd
|
||||
/home/hg/pvDatabaseCPP-md
|
||||
mrk> bin/linux-x86_64/longArrayMonitorMain
|
||||
</pre>
|
||||
<p>The performance drops to about 90 million elements per second.
|
||||
In addition the time between reports varies from just over 1 second to 3 seconds.
|
||||
I do not understand why.</p>
|
||||
<p>The performance drops to about 300 million elements per second.
In addition the time between reports varies from just over 1 second to 1.3 seconds.
The reason is contention for transferring data between main memory and local caches.
The next section has an example that demonstrates what happens.
Note that if the array size is small enough to fit in the local cache then running longArrayMonitor
has almost no effect on arrayPerformance.
|
||||
</p>
|
||||
<h2>Vector Performance</h2>
|
||||
<p>This example demonstrates how array size affects performance.
|
||||
The example is run as:</p>
|
||||
<pre>
|
||||
bin/linux-x86_64/vectorPerformanceMain -help
|
||||
vectorPerformanceMain size delay nThread
|
||||
default
|
||||
vectorPerformance 50000000 0.01 1
|
||||
</pre>
|
||||
<p>Consider the following:</p>
|
||||
<pre>
|
||||
bin/linux-x86_64/vectorPerformanceMain 50000000 0.00 1
|
||||
...
|
||||
thread0 value 20 time 1.01897 iterations/sec 19.6277 elements/sec 981.383million
|
||||
thread0 value 40 time 1.01238 iterations/sec 19.7554 elements/sec 987.772million
|
||||
thread0 value 60 time 1.00878 iterations/sec 19.826 elements/sec 991.299million
|
||||
...
|
||||
bin/linux-x86_64/vectorPerformanceMain 50000000 0.00 2
|
||||
...
|
||||
thread0 value 21 time 1.00917 iterations/sec 9.90911 elements/sec 495.455million
|
||||
thread1 value 31 time 1.05659 iterations/sec 9.46443 elements/sec 473.221million
|
||||
thread0 value 31 time 1.07683 iterations/sec 9.28648 elements/sec 464.324million
|
||||
thread1 value 41 time 1.0108 iterations/sec 9.89312 elements/sec 494.656million
|
||||
...
|
||||
bin/linux-x86_64/vectorPerformanceMain 50000000 0.00 3
|
||||
thread0 value 7 time 1.0336 iterations/sec 6.77244 elements/sec 338.622million
|
||||
thread1 value 7 time 1.03929 iterations/sec 6.73534 elements/sec 336.767million
|
||||
thread2 value 7 time 1.04345 iterations/sec 6.70852 elements/sec 335.426million
|
||||
thread0 value 14 time 1.03335 iterations/sec 6.77406 elements/sec 338.703million
|
||||
thread1 value 14 time 1.03438 iterations/sec 6.76734 elements/sec 338.367million
|
||||
thread2 value 14 time 1.04197 iterations/sec 6.71805 elements/sec 335.903million
|
||||
...
|
||||
bin/linux-x86_64/vectorPerformanceMain 50000000 0.00 4
|
||||
thread2 value 5 time 1.00746 iterations/sec 4.96298 elements/sec 248.149million
|
||||
thread1 value 5 time 1.02722 iterations/sec 4.86751 elements/sec 243.376million
|
||||
thread3 value 5 time 1.032 iterations/sec 4.84496 elements/sec 242.248million
|
||||
thread0 value 6 time 1.18882 iterations/sec 5.04703 elements/sec 252.351million
|
||||
thread2 value 10 time 1.00388 iterations/sec 4.98068 elements/sec 249.034million
|
||||
thread3 value 10 time 1.02755 iterations/sec 4.86592 elements/sec 243.296million
|
||||
thread1 value 10 time 1.04836 iterations/sec 4.76936 elements/sec 238.468million
|
||||
thread0 value 11 time 1.01575 iterations/sec 4.92249 elements/sec 246.124million
|
||||
</pre>
|
||||
<p>The more threads are running, the slower each thread runs.</p>
|
||||
<p>But now consider a size that fits in a local cache.</p>
|
||||
<pre>
|
||||
bin/linux-x86_64/vectorPerformanceMain 5000 0.00 1
|
||||
...
|
||||
thread0 value 283499 time 1 iterations/sec 283498 elements/sec 1417.49million
|
||||
thread0 value 569654 time 1 iterations/sec 286154 elements/sec 1430.77million
|
||||
thread0 value 856046 time 1 iterations/sec 286392 elements/sec 1431.96million
|
||||
...
|
||||
bin/linux-x86_64/vectorPerformanceMain 5000 0.00 2
|
||||
...
|
||||
thread0 value 541790 time 1 iterations/sec 271513 elements/sec 1357.56million
|
||||
thread1 value 541798 time 1 iterations/sec 271418 elements/sec 1357.09million
|
||||
thread0 value 813833 time 1 iterations/sec 272043 elements/sec 1360.21million
|
||||
thread1 value 813778 time 1 iterations/sec 271979 elements/sec 1359.89million
|
||||
...
|
||||
bin/linux-x86_64/vectorPerformanceMain 5000 0.00 3
|
||||
...
|
||||
thread0 value 257090 time 1 iterations/sec 257089 elements/sec 1285.45million
|
||||
thread1 value 256556 time 1 iterations/sec 256556 elements/sec 1282.78million
|
||||
thread2 value 514269 time 1 iterations/sec 257839 elements/sec 1289.19million
|
||||
thread0 value 514977 time 1 iterations/sec 257887 elements/sec 1289.43million
|
||||
thread1 value 514119 time 1 iterations/sec 257563 elements/sec 1287.81million
|
||||
thread2 value 770802 time 1 iterations/sec 256532 elements/sec 1282.66million
|
||||
</pre>
|
||||
<p>Now the number of threads has a far smaller effect on the performance of each thread.
|
||||
</p>
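<p>The kind of measurement shown above can be sketched as follows.
This is not the source of vectorPerformanceMain; it is a minimal illustration that assumes
each thread repeatedly rewrites its own private vector of int64 elements and reports elements per second.
The names <b>fillLoop</b> and <b>measure</b> are hypothetical, and size and seconds are assumed to be greater than zero.</p>

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: rewrites its own vector until `seconds` have elapsed,
// then stores the measured elements per second in `result`.
// Assumes size > 0 and seconds > 0.
void fillLoop(std::size_t size, double seconds, double &result) {
    std::vector<std::int64_t> data(size);
    std::int64_t value = 0;
    std::size_t iterations = 0;
    const auto start = std::chrono::steady_clock::now();
    double elapsed = 0.0;
    do {
        // One iteration writes every element of the vector.
        for (std::size_t i = 0; i < size; ++i) data[i] = value;
        ++value;
        ++iterations;
        elapsed = std::chrono::duration<double>(
            std::chrono::steady_clock::now() - start).count();
    } while (elapsed < seconds);
    // Read one element back so the result depends on the stored data.
    if (data[size - 1] != value - 1) { result = 0.0; return; }
    result = static_cast<double>(iterations) * static_cast<double>(size) / elapsed;
}

// Hypothetical driver: runs nThread workers concurrently and returns
// the elements per second measured by each one.
std::vector<double> measure(std::size_t size, double seconds, unsigned nThread) {
    std::vector<double> rates(nThread, 0.0);
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < nThread; ++t)
        threads.emplace_back(fillLoop, size, seconds, std::ref(rates[t]));
    for (auto &th : threads) th.join();
    return rates;
}
```

<p>Calling measure(50000000, 1.0, n) and measure(5000, 1.0, n) for increasing n is one way to compare
a vector that does not fit in a local cache with one that does; the actual rates depend on the machine.</p>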
|
||||
|
||||
</div>
|
||||
</body>
|
||||
|
||||