Optionally yield incomplete datasets #1

Open
opened 2025-12-02 10:01:55 +01:00 by augustin_s · 6 comments
Owner

(Reported by @reiche)

Currently, repack "hides" incomplete datasets (https://gitea.psi.ch/SwissFEL/BStrd/src/commit/4e5343d14d4317c03ff7f7587f50d1e7fbf7b39d/bstrd/bscache.py#L176):

if any(v is None for v in res.values()):

For many use cases this is the correct behavior, since incompleteness is temporary (i.e., it affects only one dataset out of many) ...

If the incompleteness persists because a device is not sending anything, we get stuck waiting for complete data.

Ideas:

  • add a counter for incomplete datasets, and start to warn for a sufficiently large number of them?
  • add an option for yielding the incomplete datasets with the Nones?
  • add an option for yielding the incomplete datasets without the Nones?
  • raise an error?
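The following is a minimal sketch of the filtering behavior described above. It assumes incoming messages are dicts and that only the requested channels matter. `repack_complete` is a hypothetical stand-in, not the actual code in bscache.py. It only shows why a channel that never arrives means nothing is ever yielded:

```python
from typing import Iterable, Iterator


def repack_complete(messages: Iterable[dict], channels: list[str]) -> Iterator[dict]:
    """Hypothetical stand-in for repack: yield only complete datasets."""
    for msg in messages:
        # Map every requested channel to its value, or None if it is missing
        res = {ch: msg.get(ch) for ch in channels}
        # Same check as in bscache.py: silently skip incomplete datasets
        if any(v is None for v in res.values()):
            continue
        yield res


# "B" is missing only once: the incomplete message is dropped, the rest get through
print(list(repack_complete([{"A": 1, "B": 2}, {"A": 3}], ["A", "B"])))
# "B" never arrives: nothing is yielded at all
print(list(repack_complete([{"A": 1}, {"A": 2}], ["A", "B"])))
```

With a live stream instead of a finite list, the second case means the consumer loop simply waits forever, which is the hang described here.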

I would prefer that, by default, an incomplete dataset is returned with Nones. Otherwise, a lot of my programs would crash, since I access the data record via the list of requested channels. My code is more likely to cope with Nones than with missing keys.

But there should be a flag to get only valid datasets, where None entries are excluded.

Independently, I would add a warning in BSCache.run() for incomplete datasets, similar to the case where msg is None.
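Here is a minimal sketch of such a warning, using the standard `logging` module. `IncompleteTracker`, the `warn_every` parameter and the logger name are hypothetical and not part of BStrd. The sketch warns on the first incomplete dataset and then only on every `warn_every`-th one, so that a permanently silent channel does not flood the log:

```python
import logging

log = logging.getLogger("bscache_sketch")  # hypothetical logger name


class IncompleteTracker:
    """Hypothetical helper: count incomplete datasets and warn periodically."""

    def __init__(self, warn_every: int = 100):
        self.warn_every = warn_every
        self.count = 0

    def check(self, res: dict) -> bool:
        """Return True if res is complete; otherwise count it, maybe warn, return False."""
        missing = [ch for ch, v in res.items() if v is None]
        if not missing:
            return True
        self.count += 1
        # Warn on the first incomplete dataset, then on every warn_every-th one
        if self.count == 1 or self.count % self.warn_every == 0:
            log.warning("incomplete dataset #%d, missing channels: %s", self.count, missing)
        return False
```

Inside a run() loop, `tracker.check(res)` could be called once per dataset, next to the existing handling of `msg is None`.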

Author
Owner

It should be the exact same change, TBH.

The idiomatic pattern is to replace

x = data["channel"]

with

x = data.get("channel")

and then something like

if x:

or

if x is None:

But, this also allows for:

x = data.get("channel", 0)

for cases where you know the default you want to use...
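One caveat with the `.get` pattern is that the default in `data.get("channel", 0)` is only used when the key is absent. It is not used when the key is present with a None value. So code written for records with Nones and code written for records with missing keys are not quite interchangeable. Also, `if x:` rejects legitimate readings such as 0 or 0.0, and it raises a ValueError for multi-element NumPy arrays (e.g. waveform channels, if they arrive as arrays). For numeric channels, `if x is None:` is therefore the safer test.

Below is a minimal sketch of a helper that covers both cases. `read_channel` and the channel names are hypothetical:

```python
def read_channel(data: dict, channel: str, default=None):
    """Return data[channel], or default if the channel is absent or its value is None."""
    x = data.get(channel)  # None both for a missing key and for a stored None
    if x is None:  # explicit check: "if x:" would also reject valid readings such as 0.0
        return default
    return x


# dict.get only falls back to its default when the key is missing:
print({"energy": None}.get("energy", 0))            # prints None
print(read_channel({"energy": None}, "energy", 0))  # prints 0
print(read_channel({}, "energy", 0))                # prints 0
print(read_channel({"charge": 0.0}, "charge", -1))  # prints 0.0
```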


Anyway, I think I will have to keep the default behavior as it is now, and make the other options configurable via an argument to the BSCache constructor...


It would work for me, but I have to check all programs where I read from the bs-stream via BSCache.
I typically go through all my code once a year as a sanity check.
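For that check, here is a rough Python stand-in for grep that lists every line constructing a BSCache. `find_bscache_calls` is a hypothetical helper. It only looks at `*.py` files, and it only matches the name followed by an opening parenthesis on the same line. Plain `import` lines are therefore skipped, while a `class BSCache(...)` definition would also be reported:

```python
import re
from pathlib import Path

# The name BSCache followed by an opening parenthesis, i.e. a call on one line
CALL = re.compile(r"\bBSCache\s*\(")


def find_bscache_calls(root: str) -> list[tuple[str, int, str]]:
    """Return (file, line number, stripped line) for each BSCache( occurrence under root."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if CALL.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Usage: `for hit in find_bscache_calls("path/to/programs"): print(hit)`.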

Author
Owner

It would be one grep for "BSCache" and then changing the relevant ones to:

bscache = BSCache(..., handle_incomplete="keep")

or something similar.

And handle_incomplete could be:

  • None : default, current behavior
  • "keep" : what you want
  • "drop" / "remove" : return dict with None values removed
  • "raise" : raise a ValueError

Or something similar...
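The following is a minimal sketch of how the proposed `handle_incomplete` options could be dispatched per dataset. `handle_incomplete_dataset` and `VALID_OPTIONS` are hypothetical names. The option values follow the list above, and a return value of None stands for "skip this dataset":

```python
VALID_OPTIONS = (None, "keep", "drop", "remove", "raise")


def handle_incomplete_dataset(res: dict, handle_incomplete=None):
    """Hypothetical dispatch on handle_incomplete; returns the dataset to yield, or None to skip."""
    if handle_incomplete not in VALID_OPTIONS:
        raise ValueError(f"unknown handle_incomplete option: {handle_incomplete!r}")
    missing = [ch for ch, v in res.items() if v is None]
    if not missing:
        return res  # complete datasets are always yielded unchanged
    if handle_incomplete is None:
        return None  # current behavior: skip incomplete datasets
    if handle_incomplete == "keep":
        return res  # yield with None values
    if handle_incomplete in ("drop", "remove"):
        return {ch: v for ch, v in res.items() if v is not None}
    # only "raise" is left at this point
    raise ValueError(f"incomplete dataset, missing channels: {missing}")
```

In an actual BSCache constructor, validating the option once at construction time would catch typos such as "kep" immediately, rather than at the first dataset.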


Hi Sven,
For handle_incomplete = None, I would at least emit a message (at least once) that the dataset is incomplete, without raising a ValueError. That way one can at least see where the program is hanging.
For me, the problem was not so much that it behaves as it does, but not knowing that it is in this state.
It also helps to inform the operator that the bs-stream returns incomplete data, which is a more general problem the operator should be aware of.
In the meantime, I will go through my code and make sure it can handle these different cases.
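Records are accessed via the list of requested channels. One way to handle the "keep" variant (Nones) and the "drop" variant (missing keys) the same way is to look up every requested channel with `.get` and collect the missing ones. Here is a minimal sketch; `split_record` is a hypothetical helper, not part of BStrd:

```python
def split_record(record: dict, channels: list[str]) -> tuple[dict, list[str]]:
    """Split a record into present values and missing channel names.

    Works the same for "keep" records (missing channels map to None)
    and "drop" records (missing channels are absent).
    """
    present = {}
    missing = []
    for ch in channels:
        value = record.get(ch)  # None for absent keys and for stored Nones alike
        if value is None:
            missing.append(ch)
        else:
            present[ch] = value
    return present, missing
```

A program can then decide per record whether to skip it, substitute defaults, or report the channels in `missing`.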

Author
Owner

Oh sure, the warning+counter should be there for all cases...

2 Participants
Reference: SwissFEL/BStrd#1