Commit Graph
7 Commits
Author SHA1 Message Date
leonarski_fandClaude Opus 4.8 2a9fd084ab TCPStreamPusher: post-zerocopy cleanup + fix queue-path backpressure drop
Follow-up simplifications after removing the zerocopy machinery, plus a real
backpressure bug the cleanup surfaced:

- SendImage(ZeroCopyReturnValue&) imposed a hard 2s deadline on enqueueing and
  then marked the connection broken. At high frame rate the 128-deep queue
  fills in tens of ms, so any filesystem stall longer than ~2s dropped the run
  even though the writer was alive and heartbeating -- defeating the whole
  BUSY-heartbeat backpressure design. Block instead while the peer is alive
  (!broken && active); the real liveness decision already lives in SendAll's
  peer-liveness timeout, which the writer's BUSY heartbeats keep fresh. This
  makes the queue path consistent with the send path: both wait out arbitrarily
  long stalls and only give up when the peer goes genuinely silent.
- Drop the dead per-connection data_sent counter (written, never read) and the
  redundant ImagePusherQueueElement.image_data set on the TCP path (only the
  HDF5 pusher reads that field).
- Add SetPeerLivenessTimeout() so the liveness window is tunable (and testable).

Add TCPImageCommTest_StalledWriter_SurvivesViaHeartbeat: a controllable raw
writer double connects, ACKs START, then stops draining for 4s while still
sending BUSY heartbeats (peer-liveness window set to 2s). The run must ride out
the stall on the zero-copy queue path and deliver all 1000 images. Verified to
fail (115/1000 delivered, connection dropped) against the old 2s-deadline
behavior.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 15:55:49 +02:00
leonarski_fandClaude Opus 4.8 f859f8108f TCPStreamPusher: remove MSG_ZEROCOPY machinery, use plain blocking send
The MSG_ZEROCOPY path was the common factor behind the occasional mid-run
writer disconnects and added substantial failure surface for no benefit at
this throughput (tens of MB/s): the socket error queue raises POLLERR as a
normal event (entangled with liveness detection), and the per-connection
completion-id counter was reset every run while the kernel's sk_zckey is
monotonic for the life of the socket, so on a persistent connection the
bookkeeping diverged from run 2 onward.

Replace it with a straightforward synchronous send():
- SendAll/SendFrame lose all zerocopy params; DATA payloads are sent with a
  plain ::send(MSG_NOSIGNAL), so the image-buffer slot is owned by the kernel
  once send() returns and WriterThread releases it immediately.
- Drop ZeroCopyCompletionThread and the zc_pending/zc_mutex/zc_cv/zc_*_id
  state, SO_ZEROCOPY setup, and the errqueue include.
- StopDataCollectionThreads now drains and releases any queued-but-unsent
  slots on the stalled-writer path (active==false makes WriterThread exit
  without draining) instead of Clear()-ing them, avoiding a slot leak.

The BUSY-heartbeat peer-liveness timeout (backpressure tolerance) is kept;
it is independent of zerocopy.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-25 15:55:49 +02:00
leonarski_f 75e401f0e5 v1.0.0-rc.153 (#63)
Build Packages / Unit tests (push) Successful in 1h31m59s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 8m43s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m5s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m27s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m56s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 9m24s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m27s
Build Packages / build:rpm (rocky8) (push) Successful in 9m20s
Build Packages / build:rpm (rocky9) (push) Successful in 10m50s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 9m54s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 8m38s
Build Packages / DIALS test (push) Successful in 12m13s
Build Packages / XDS test (durin plugin) (push) Successful in 7m8s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 7m8s
Build Packages / XDS test (neggia plugin) (push) Successful in 7m50s
Build Packages / Generate python client (push) Successful in 16s
Build Packages / Build documentation (push) Successful in 50s
Build Packages / Create release (push) Skipped
This is an UNSTABLE release. It includes many experimental features, as well as many AI generated fixes. We recommend using rc.152 for production use.

* jfjoch_broker: Add EXPERIMENTAL pixelrefine mode for image processing
* jfjoch_broker: Allow to load user mask from 8-bit and 16-bit TIFF files
* jfjoch_broker: Add ROI calculation in non-FPGA workflow
* jfjoch_broker: Fixes to TCP image pusher
* jfjoch_broker: Remove NUMA bindings
* jfjoch_broker: Improvements to indexing
* jfjoch_broker: For PSI EIGER, trimming energies are taken from the detector configuration (now compulsory) instead of hardcoded values
* jfjoch_writer: Save ROI definitions and the per-pixel ROI bitmap in the master file; azimuthal ROIs support phi (angular) sectors
* jfjoch_viewer: Major redesign with dockable panels and saved layouts, plus on-canvas creation/move/resize of box, circle and azimuthal ROIs
* jfjoch_viewer: Run jfjoch_process reprocessing jobs from inside the GUI and overlay per-run results

Reviewed-on: #63
2026-06-23 20:29:49 +02:00
leonarski_f fc68a9baed v1.0.0-rc.146 (#56)
Build Packages / Unit tests (push) Skipped
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m34s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 10m0s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m23s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m23s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m16s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 11m49s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 8m32s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 9m15s
Build Packages / XDS test (durin plugin) (push) Successful in 7m16s
Build Packages / Generate python client (push) Successful in 16s
Build Packages / build:rpm (rocky9) (push) Successful in 10m12s
Build Packages / Create release (push) Skipped
Build Packages / Build documentation (push) Successful in 47s
Build Packages / DIALS test (push) Successful in 10m18s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 5m46s
Build Packages / build:rpm (rocky8) (push) Successful in 1h41m2s
Build Packages / XDS test (neggia plugin) (push) Successful in 1h59m18s
This is an UNSTABLE release. The release has significant modifications for data processing - in case of troubles go back to 1.0.0-rc.144.

jfjoch_process: Generate a dedicated file (_process.h5), which can be used as a replacement for the _master.h5 file for a reanalyzed dataset.
jfjoch_process: Improve the performance of scaling and merging, implement on the fly scaling.
jfjoch_writer: All final data analysis results are repopulated in the _master.h5 file.
jfjoch_scale: Dedicated tool for rescaling/merging existing data.
jfjoch_viewer: Fix bugs where pixel labels where displayed on a wrong pixel.

WARNING! Scaling and merging are experimental at the moment, and may not provide reasonable results for the time being.

Reviewed-on: #56
2026-05-28 18:48:35 +02:00
leonarski_f bb9f5c715f v1.0.0-rc.135 (#44)
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m55s
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m28s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m56s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m47s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 13m7s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 12m31s
Build Packages / build:rpm (rocky8) (push) Successful in 12m59s
Build Packages / build:rpm (rocky9) (push) Successful in 14m5s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 15m30s
Build Packages / Generate python client (push) Successful in 1m18s
Build Packages / Build documentation (push) Successful in 1m3s
Build Packages / Create release (push) Has been skipped
Build Packages / build:rpm (ubuntu2404) (push) Successful in 10m8s
Build Packages / XDS test (durin plugin) (push) Successful in 9m16s
Build Packages / XDS test (neggia plugin) (push) Successful in 7m59s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 9m12s
Build Packages / DIALS test (push) Successful in 11m44s
Build Packages / Unit tests (push) Successful in 1h23m8s
This is an UNSTABLE release. The release has significant modifications and bug fixes, if things go wrong, it is better to revert to 1.0.0-rc.132.

* Multiple small bug fixes scattered across the whole code base. (detected with GPT-5.4)
* jfjoch_viewer: Improve image render performance

Reviewed-on: #44
Co-authored-by: Filip Leonarski <filip.leonarski@psi.ch>
Co-committed-by: Filip Leonarski <filip.leonarski@psi.ch>
2026-04-16 11:59:59 +02:00
leonarski_f 64002f1e29 v1.0.0-rc.129 (#36)
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 11m14s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 10m43s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m35s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 9m20s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m23s
Build Packages / Generate python client (push) Successful in 39s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 11m24s
Build Packages / Create release (push) Has been skipped
Build Packages / Build documentation (push) Successful in 1m0s
Build Packages / build:rpm (rocky8) (push) Successful in 10m35s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 10m35s
Build Packages / build:rpm (rocky9) (push) Successful in 11m17s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m9s
Build Packages / Unit tests (push) Failing after 1h18m57s
This is an UNSTABLE release. The release has significant modifications and bug fixes, if things go wrong, it is better to revert to 1.0.0-rc.124.

* jfjoch_broker: Significant improvements in TCP image socket, as a viable alternative for ZeroMQ sockets (only a single port on broker side, dynamically change number of writers, acknowledgments for written files)
* jfjoch_broker: Delta phi is calculated also for still data in Bragg prediction
* jfjoch_broker: Image pusher statistics are accessible via the REST interface
* jfjoch_writer: Supports TCP image socket and for these auto-forking option

Reviewed-on: #36
Co-authored-by: Filip Leonarski <filip.leonarski@psi.ch>
Co-committed-by: Filip Leonarski <filip.leonarski@psi.ch>
2026-03-05 22:13:12 +01:00
leonarski_f f3e0a15d26 v1.0.0-rc.127 (#34)
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m51s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m0s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m6s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m7s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 9m47s
Build Packages / Generate python client (push) Successful in 29s
Build Packages / Build documentation (push) Successful in 43s
Build Packages / Create release (push) Has been skipped
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m46s
Build Packages / build:rpm (rocky8) (push) Successful in 9m33s
Build Packages / Unit tests (push) Has been skipped
Build Packages / build:rpm (ubuntu2204) (push) Successful in 8m47s
Build Packages / build:rpm (rocky9) (push) Successful in 9m55s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m4s
This is an UNSTABLE release. The release has significant modifications and bug fixes, if things go wrong, it is better to revert to 1.0.0-rc.124.

* jfjoch_broker: Default EIGER readout time is 20 microseconds
* jfjoch_broker: Multiple improvements regarding performance
* jfjoch_broker: Image buffer allows to track frames in preparation and sending
* jfjoch_broker: Dedicated thread for ZeroMQ transmission to better utilize the image buffer
* jfjoch_broker: Experimental implementation of transmission with raw TCP/IP sockets
* jfjoch_writer: Fixes regarding properly closing files in long data collections
* jfjoch_process: Scale & merge has been significantly improved, but it is not yet integrated into mainstream code

Reviewed-on: #34
2026-03-02 15:57:12 +01:00