2a9fd084ab93ba5b99142dd1aa3aceccb5f6fa0b
7
Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
2a9fd084ab |
TCPStreamPusher: post-zerocopy cleanup + fix queue-path backpressure drop
Follow-up simplifications after removing the zerocopy machinery, plus a real backpressure bug the cleanup surfaced:

- SendImage(ZeroCopyReturnValue&) imposed a hard 2s deadline on enqueueing and then marked the connection broken. At high frame rate the 128-deep queue fills in tens of ms, so any filesystem stall longer than ~2s dropped the run even though the writer was alive and heartbeating, defeating the whole BUSY-heartbeat backpressure design. Block instead while the peer is alive (!broken && active); the real liveness decision already lives in SendAll's peer-liveness timeout, which the writer's BUSY heartbeats keep fresh. This makes the queue path consistent with the send path: both wait out arbitrarily long stalls and only give up when the peer goes genuinely silent.
- Drop the dead per-connection data_sent counter (written, never read) and the redundant ImagePusherQueueElement.image_data set on the TCP path (only the HDF5 pusher reads that field).
- Add SetPeerLivenessTimeout() so the liveness window is tunable (and testable).

Add TCPImageCommTest_StalledWriter_SurvivesViaHeartbeat: a controllable raw writer double connects, ACKs START, then stops draining for 4s while still sending BUSY heartbeats (peer-liveness window set to 2s). The run must ride out the stall on the zero-copy queue path and deliver all 1000 images. Verified to fail (115/1000 delivered, connection dropped) against the old 2s-deadline behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
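The queue-path change described in the commit above (block on a full queue while the peer is alive, instead of giving up after a fixed deadline) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual TCPStreamPusher code: `BoundedSlotQueue`, `PutBlocking`, `Get`, and `MarkBroken` are hypothetical names, and the liveness decision, which the commit message places in `SendAll`, is reduced here to a flag that another thread would set.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// Hypothetical sketch; none of these names come from the real code base.
class BoundedSlotQueue {
public:
    explicit BoundedSlotQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Enqueue a slot id. Blocks for as long as the queue is full AND the
    // connection is still considered alive; there is no fixed deadline.
    // Returns false only once the connection has been marked broken.
    bool PutBlocking(int slot) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_full_.wait(lock, [this] { return broken_ || queue_.size() < capacity_; });
        if (broken_)
            return false;
        queue_.push_back(slot);
        not_empty_.notify_one();
        return true;
    }

    // Dequeue the oldest slot id. Blocks while the queue is empty and the
    // connection is alive. Returns false when nothing is left to hand out.
    bool Get(int &slot) {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return broken_ || !queue_.empty(); });
        if (queue_.empty())
            return false;
        slot = queue_.front();
        queue_.pop_front();
        not_full_.notify_one();
        return true;
    }

    // Called by whoever decides the peer is gone (e.g. a liveness timeout).
    void MarkBroken() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            broken_ = true;
        }
        not_full_.notify_all();
        not_empty_.notify_all();
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::condition_variable not_full_;
    std::condition_variable not_empty_;
    std::deque<int> queue_;
    bool broken_ = false;
};
```

In this sketch a producer stalled on a full queue is released either by a consumer calling `Get` or by `MarkBroken`, so the only way to lose the run is an explicit liveness verdict rather than an enqueue timer.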
|
|
f859f8108f |
TCPStreamPusher: remove MSG_ZEROCOPY machinery, use plain blocking send
The MSG_ZEROCOPY path was the common factor behind the occasional mid-run writer disconnects and added substantial failure surface for no benefit at this throughput (tens of MB/s): the socket error queue raises POLLERR as a normal event (entangled with liveness detection), and the per-connection completion-id counter was reset every run while the kernel's sk_zckey is monotonic for the life of the socket, so on a persistent connection the bookkeeping diverged from run 2 onward.

Replace it with a straightforward synchronous send():

- SendAll/SendFrame lose all zerocopy params; DATA payloads are sent with a plain ::send(MSG_NOSIGNAL), so the payload has been copied into the kernel socket buffer once send() returns and WriterThread releases the image-buffer slot immediately.
- Drop ZeroCopyCompletionThread and the zc_pending/zc_mutex/zc_cv/zc_*_id state, SO_ZEROCOPY setup, and the errqueue include.
- StopDataCollectionThreads now drains and releases any queued-but-unsent slots on the stalled-writer path (active==false makes WriterThread exit without draining) instead of Clear()-ing them, avoiding a slot leak.

The BUSY-heartbeat peer-liveness timeout (backpressure tolerance) is kept; it is independent of zerocopy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> |
||
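For illustration, a plain blocking send loop of the kind the commit above describes might look like the sketch below. It is an assumption-laden simplification: `SendAllBlocking` is a hypothetical name, and the real `SendAll` additionally enforces the peer-liveness timeout, which is omitted here. The only system call used is POSIX `send()` with `MSG_NOSIGNAL`, so a closed peer yields an error return (`EPIPE`) rather than a `SIGPIPE`.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper, not the project's SendAll: send exactly `len` bytes
// on a blocking socket, looping over partial writes.
// Returns true once every byte has been handed to the kernel.
bool SendAllBlocking(int fd, const void *data, std::size_t len) {
    assert(data != nullptr || len == 0);
    const char *p = static_cast<const char *>(data);
    while (len > 0) {
        // MSG_NOSIGNAL: report a dead peer via errno instead of raising SIGPIPE.
        ssize_t n = ::send(fd, p, len, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;  // interrupted by a signal before any data was sent; retry
            return false;  // genuine error, e.g. EPIPE or ECONNRESET
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    // The data now lives in the kernel socket buffer, so the caller may
    // reuse or release the source buffer immediately.
    return true;
}
```

Because the copy into the kernel is complete when the function returns, no completion notifications or per-connection id bookkeeping are needed, which is the simplification the commit message argues for.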
|
|
75e401f0e5 |
v1.0.0-rc.153 (#63)
Build Packages / Unit tests (push) Successful in 1h31m59s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 8m43s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m5s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m27s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m56s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 9m24s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m27s
Build Packages / build:rpm (rocky8) (push) Successful in 9m20s
Build Packages / build:rpm (rocky9) (push) Successful in 10m50s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 9m54s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 8m38s
Build Packages / DIALS test (push) Successful in 12m13s
Build Packages / XDS test (durin plugin) (push) Successful in 7m8s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 7m8s
Build Packages / XDS test (neggia plugin) (push) Successful in 7m50s
Build Packages / Generate python client (push) Successful in 16s
Build Packages / Build documentation (push) Successful in 50s
Build Packages / Create release (push) Skipped
This is an UNSTABLE release. It includes many experimental features, as well as many AI-generated fixes. We recommend using rc.152 for production use.

* jfjoch_broker: Add EXPERIMENTAL pixelrefine mode for image processing
* jfjoch_broker: Allow loading the user mask from 8-bit and 16-bit TIFF files
* jfjoch_broker: Add ROI calculation in non-FPGA workflow
* jfjoch_broker: Fixes to TCP image pusher
* jfjoch_broker: Remove NUMA bindings
* jfjoch_broker: Improvements to indexing
* jfjoch_broker: For PSI EIGER, trimming energies are taken from the detector configuration (now compulsory) instead of hardcoded values
* jfjoch_writer: Save ROI definitions and the per-pixel ROI bitmap in the master file; azimuthal ROIs support phi (angular) sectors
* jfjoch_viewer: Major redesign with dockable panels and saved layouts, plus on-canvas creation/move/resize of box, circle and azimuthal ROIs
* jfjoch_viewer: Run jfjoch_process reprocessing jobs from inside the GUI and overlay per-run results

Reviewed-on: #63 |
||
|
|
fc68a9baed |
v1.0.0-rc.146 (#56)
Build Packages / Unit tests (push) Skipped
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m34s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 10m0s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m23s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m23s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m16s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 11m49s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 8m32s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 9m15s
Build Packages / XDS test (durin plugin) (push) Successful in 7m16s
Build Packages / Generate python client (push) Successful in 16s
Build Packages / build:rpm (rocky9) (push) Successful in 10m12s
Build Packages / Create release (push) Skipped
Build Packages / Build documentation (push) Successful in 47s
Build Packages / DIALS test (push) Successful in 10m18s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 5m46s
Build Packages / build:rpm (rocky8) (push) Successful in 1h41m2s
Build Packages / XDS test (neggia plugin) (push) Successful in 1h59m18s
This is an UNSTABLE release. The release has significant modifications for data processing; in case of trouble, go back to 1.0.0-rc.144.

* jfjoch_process: Generate a dedicated file (_process.h5), which can be used as a replacement for the _master.h5 file for a reanalyzed dataset.
* jfjoch_process: Improve the performance of scaling and merging, implement on-the-fly scaling.
* jfjoch_writer: All final data analysis results are repopulated in the _master.h5 file.
* jfjoch_scale: Dedicated tool for rescaling/merging existing data.
* jfjoch_viewer: Fix bugs where pixel labels were displayed on the wrong pixel.

WARNING! Scaling and merging are experimental at the moment, and may not provide reasonable results for the time being.

Reviewed-on: #56 |
||
|
|
bb9f5c715f |
v1.0.0-rc.135 (#44)
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m55s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m28s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m56s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m47s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 13m7s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 12m31s
Build Packages / build:rpm (rocky8) (push) Successful in 12m59s
Build Packages / build:rpm (rocky9) (push) Successful in 14m5s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 15m30s
Build Packages / Generate python client (push) Successful in 1m18s
Build Packages / Build documentation (push) Successful in 1m3s
Build Packages / Create release (push) Has been skipped
Build Packages / build:rpm (ubuntu2404) (push) Successful in 10m8s
Build Packages / XDS test (durin plugin) (push) Successful in 9m16s
Build Packages / XDS test (neggia plugin) (push) Successful in 7m59s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 9m12s
Build Packages / DIALS test (push) Successful in 11m44s
Build Packages / Unit tests (push) Successful in 1h23m8s
This is an UNSTABLE release. The release has significant modifications and bug fixes; if things go wrong, it is better to revert to 1.0.0-rc.132.

* Multiple small bug fixes scattered across the whole code base (detected with GPT-5.4)
* jfjoch_viewer: Improve image render performance

Reviewed-on: #44
Co-authored-by: Filip Leonarski <filip.leonarski@psi.ch>
Co-committed-by: Filip Leonarski <filip.leonarski@psi.ch> |
||
|
|
64002f1e29 |
v1.0.0-rc.129 (#36)
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 11m14s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 10m43s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m35s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 9m20s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m23s
Build Packages / Generate python client (push) Successful in 39s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 11m24s
Build Packages / Create release (push) Has been skipped
Build Packages / Build documentation (push) Successful in 1m0s
Build Packages / build:rpm (rocky8) (push) Successful in 10m35s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 10m35s
Build Packages / build:rpm (rocky9) (push) Successful in 11m17s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m9s
Build Packages / Unit tests (push) Failing after 1h18m57s
This is an UNSTABLE release. The release has significant modifications and bug fixes; if things go wrong, it is better to revert to 1.0.0-rc.124.

* jfjoch_broker: Significant improvements to the TCP image socket, making it a viable alternative to ZeroMQ sockets (only a single port on the broker side, number of writers can change dynamically, acknowledgments for written files)
* jfjoch_broker: Delta phi is now also calculated for still data in Bragg prediction
* jfjoch_broker: Image pusher statistics are accessible via the REST interface
* jfjoch_writer: Supports TCP image sockets and, for these, an auto-forking option

Reviewed-on: #36
Co-authored-by: Filip Leonarski <filip.leonarski@psi.ch>
Co-committed-by: Filip Leonarski <filip.leonarski@psi.ch> |
||
|
|
f3e0a15d26 |
v1.0.0-rc.127 (#34)
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m51s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m0s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m6s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m7s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 9m47s
Build Packages / Generate python client (push) Successful in 29s
Build Packages / Build documentation (push) Successful in 43s
Build Packages / Create release (push) Has been skipped
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m46s
Build Packages / build:rpm (rocky8) (push) Successful in 9m33s
Build Packages / Unit tests (push) Has been skipped
Build Packages / build:rpm (ubuntu2204) (push) Successful in 8m47s
Build Packages / build:rpm (rocky9) (push) Successful in 9m55s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m4s
This is an UNSTABLE release. The release has significant modifications and bug fixes; if things go wrong, it is better to revert to 1.0.0-rc.124.

* jfjoch_broker: Default EIGER readout time is 20 microseconds
* jfjoch_broker: Multiple performance improvements
* jfjoch_broker: Image buffer allows tracking frames that are being prepared or sent
* jfjoch_broker: Dedicated thread for ZeroMQ transmission to better utilize the image buffer
* jfjoch_broker: Experimental implementation of transmission with raw TCP/IP sockets
* jfjoch_writer: Fixes for properly closing files in long data collections
* jfjoch_process: Scale & merge has been significantly improved, but it is not yet integrated into mainstream code

Reviewed-on: #34 |