97c3008c39
Build Packages / Unit tests (push) Successful in 56m25s
Build Packages / DIALS test (push) Successful in 13m9s
Build Packages / XDS test (durin plugin) (push) Successful in 9m34s
Build Packages / XDS test (JFJoch plugin) (push) Successful in 9m37s
Build Packages / XDS test (neggia plugin) (push) Successful in 5m57s
Build Packages / Generate python client (push) Successful in 14s
Build Packages / Build documentation (push) Successful in 40s
Build Packages / Create release (push) Skipped
Build Packages / build:rpm (rocky8_nocuda) (push) Successful in 9m57s
Build Packages / build:rpm (ubuntu2204_nocuda) (push) Successful in 9m47s
Build Packages / build:rpm (rocky9_nocuda) (push) Successful in 10m11s
Build Packages / build:rpm (ubuntu2404_nocuda) (push) Successful in 9m57s
Build Packages / build:rpm (rocky8_sls9) (push) Successful in 10m20s
Build Packages / build:rpm (rocky9_sls9) (push) Successful in 10m41s
Build Packages / build:rpm (ubuntu2204) (push) Successful in 10m19s
Build Packages / build:rpm (ubuntu2404) (push) Successful in 9m42s
Build Packages / build:rpm (rocky8) (push) Successful in 11m4s
Build Packages / build:rpm (rocky9) (push) Successful in 11m43s
A slow filesystem could stall the writer's consume pipeline, propagating TCP backpressure to the pusher. The pusher then treated that backpressure as a dead peer and force-closed the connection mid-run; the writer reconnected as a brand-new connection outside the active session, so the rest of the run was silently dropped and the half-written HDF5 file was later finalized with holes. Replace the throughput/progress-based send timeout with a peer-liveness timeout: - Add TCPFrameType::BUSY (wire version 2 -> 3). - TCPImagePuller runs a heartbeat thread that sends BUSY every 250ms on a thread independent of the (possibly stalled) write path, so liveness keeps flowing during deep stalls. A send_mutex serializes ACK/pong/heartbeat writes. - TCPStreamPusher refreshes last_peer_activity_ns on every inbound frame and only declares a connection dead after peer_liveness_timeout (15s) of complete silence, tolerating arbitrarily long backpressure while still catching a genuinely dead peer (and immediate EPIPE/ECONNRESET). - Re-key both backpressure waits (the SendAll data path and the post-END WaitForEndAck) onto this liveness signal instead of byte-progress / DATA-ACK-progress, so a slow final flush at END is tolerated too. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>