145b300fa2a4e82cd4b02e6c8e6288239548ba03
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
97c3008c39 |
TCP stream: tolerate writer backpressure via BUSY heartbeats
Build Packages / Unit tests (push) Successful in 56m25s
Build Packages / DIALS test (push) Successful in 13m9s
Build Packages / XDS test (durin plugin) (push) Successful in 9m34s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 9m37s
Build Packages / XDS test (neggia plugin) (push) Successful in 5m57s
Build Packages / Generate python client (push) Successful in 14s
Build Packages / Build documentation (push) Successful in 40s
Build Packages / Create release (push) Skipped
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 9m57s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m47s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m11s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 9m57s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m20s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m41s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 10m19s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m42s
Build Packages / build:rpm (rocky8) (push) Successful in 11m4s
Build Packages / build:rpm (rocky9) (push) Successful in 11m43s
A slow filesystem could stall the writer's consume pipeline, propagating TCP backpressure to the pusher. The pusher then treated that backpressure as a dead peer and force-closed the connection mid-run; the writer reconnected as a brand-new connection outside the active session, so the rest of the run was silently dropped and the half-written HDF5 file was later finalized with holes. Replace the throughput/progress-based send timeout with a peer-liveness timeout: - Add TCPFrameType::BUSY (wire version 2 -> 3). - TCPImagePuller runs a heartbeat thread that sends BUSY every 250ms on a thread independent of the (possibly stalled) write path, so liveness keeps flowing during deep stalls. A send_mutex serializes ACK/pong/heartbeat writes. - TCPStreamPusher refreshes last_peer_activity_ns on every inbound frame and only declares a connection dead after peer_liveness_timeout (15s) of complete silence, tolerating arbitrarily long backpressure while still catching a genuinely dead peer (and immediate EPIPE/ECONNRESET). - Re-key both backpressure waits (the SendAll data path and the post-END WaitForEndAck) onto this liveness signal instead of byte-progress / DATA-ACK-progress, so a slow final flush at END is tolerated too. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> |
||
|
|
64002f1e29 |
v1.0.0-rc.129 (#36)
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 11m14s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 10m43s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 11m35s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 9m20s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m23s
Build Packages / Generate python client (push) Successful in 39s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 11m24s
Build Packages / Create release (push) Has been skipped
Build Packages / Build documentation (push) Successful in 1m0s
Build Packages / build:rpm (rocky8) (push) Successful in 10m35s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 10m35s
Build Packages / build:rpm (rocky9) (push) Successful in 11m17s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m9s
Build Packages / Unit tests (push) Failing after 1h18m57s
This is an UNSTABLE release. The release has significant modifications and bug fixes, if things go wrong, it is better to revert to 1.0.0-rc.124. * jfjoch_broker: Significant improvements in TCP image socket, as a viable alternative for ZeroMQ sockets (only a single port on broker side, dynamically change number of writers, acknowledgments for written files) * jfjoch_broker: Delta phi is calculated also for still data in Bragg prediction * jfjoch_broker: Image pusher statistics are accessible via the REST interface * jfjoch_writer: Supports TCP image socket and for these auto-forking option Reviewed-on: #36 Co-authored-by: Filip Leonarski <filip.leonarski@psi.ch> Co-committed-by: Filip Leonarski <filip.leonarski@psi.ch> |
||
|
|
f3e0a15d26 |
v1.0.0-rc.127 (#34)
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 10m51s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 8m0s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m6s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m7s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 9m47s
Build Packages / Generate python client (push) Successful in 29s
Build Packages / Build documentation (push) Successful in 43s
Build Packages / Create release (push) Has been skipped
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m46s
Build Packages / build:rpm (rocky8) (push) Successful in 9m33s
Build Packages / Unit tests (push) Has been skipped
Build Packages / build:rpm (ubuntu2204) (push) Successful in 8m47s
Build Packages / build:rpm (rocky9) (push) Successful in 9m55s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m4s
This is an UNSTABLE release. The release has significant modifications and bug fixes, if things go wrong, it is better to revert to 1.0.0-rc.124. * jfjoch_broker: Default EIGER readout time is 20 microseconds * jfjoch_broker: Multiple improvements regarding performance * jfjoch_broker: Image buffer allows to track frames in preparation and sending * jfjoch_broker: Dedicated thread for ZeroMQ transmission to better utilize the image buffer * jfjoch_broker: Experimental implementation of transmission with raw TCP/IP sockets * jfjoch_writer: Fixes regarding properly closing files in long data collections * jfjoch_process: Scale & merge has been significantly improved, but it is not yet integrated into mainstream code Reviewed-on: #34 |