

Saving multiple sources at SwissFEL: a very serious guide

Scroll down for the boring code

Hey, look at my nice data!
What a load of bs!

Oy, no need to be rude!
I'm not! I mean, what a load of Beam Synchronous data!

Well, thanks... but it's not all saved.
Bummer. How were you saving it?

I used this cool command-line script bs...
Ooooh... I'm going to stop you right there.

But my bs command works; there must be a problem with the sources!
Calm down, detective, try saving each source individually with the bs command...

OK, hang on... what the?! They both save fine on their own!
Yep, the issue appears when you try to save sources from different IOCs/devices with the bs command. The data is taken from the dispatcher, and different sources arrive there at different times. If two sources don't arrive at the dispatcher within a small time window, only the first source is sent in the message.

What type of BS is that?! I can't wait and wait...
Good question. Some bs data comes from pipelines, where calculations and moving data around take time. You're not just saving numbers; you're saving processed results.

Pipelines?! I want data, not plumbing problems!
Think of pipelines as hardworking elves doing data analysis behind the scenes. No pipelines, more work for you.

Alright, I'm sold. But how do I save multiple sources without all this drama?
You need to save from the data buffer. The system can handle sources arriving at slightly different times there.

The data buffer? How?
You've got plenty of tools for accessing it:

  • DataHub (don't ask about a front-end)
  • Data API (if you speak code)
  • Eco, Slic, Service Now, Concour, Time (some might not work).

Steady your sources, Doc Brown! I just checked my data: one of the sources isn't running at 100 Hz and is missing data. I told you the source was the problem!

Missing data isn't necessarily a problem; the approaches above can handle missing pulse IDs and return all the data that is in the data buffer.

Pulse IDs? Does my data need to prove its age?

No, it's a way to sort data: with bs data, every shot has a unique pulse ID. You can use SwissFEL data analysis packages to match up arrays with missing shots for you.

Cool! Anything else I should know?
Yes: the data buffer can't clear your desk at PiA.

Do Say: Pulse IDs rock my world!

Don't Say: Beam synchronous PV
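
A toy model of the dispatcher problem

The dispatcher behaviour described in the chat can be pictured with a small toy model. This is only a sketch of the idea and not the real dispatcher code: the function name, the source names and the 10 ms window are all made up for illustration.

```python
# Toy model only: this is NOT the real dispatcher code.
# WINDOW_MS, build_message and the source names are hypothetical.
WINDOW_MS = 10.0  # made-up matching window in milliseconds


def build_message(arrivals, window_ms=WINDOW_MS):
    """Return the sources that would end up in one message for one pulse.

    arrivals maps source name -> arrival time at the dispatcher (ms).
    The first source to arrive is always sent; the others are only
    included if they arrive within window_ms of it.
    """
    first_time = min(arrivals.values())
    return sorted(
        name for name, t in arrivals.items() if t - first_time <= window_ms
    )


# Two sources that arrive almost together: both end up in the message.
same_ioc = build_message({"SOURCE_A": 0.0, "SOURCE_B": 1.5})

# A pipeline result that arrives late: only the first source makes it.
with_pipeline = build_message({"SOURCE_A": 0.0, "PIPELINE_C": 40.0})

print(same_ioc)       # ['SOURCE_A', 'SOURCE_B']
print(with_pipeline)  # ['SOURCE_A']
```

In this picture each source saves fine on its own, because a single source is always "first". The trouble only shows up when a slower source has to share a message with a faster one.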

Example scripts using datahub

Saving historic data for a set time range

from datahub import Daqbuf, Table
from datetime import datetime, timedelta

# Set time range: from 6 minutes ago to 5 minutes ago
now = datetime.now()
from_time, to_time = [
    (now - timedelta(minutes=m)).strftime('%Y-%m-%d %H:%M:%S.%f')[:-3] 
    for m in [6, 5]
]

# Define the channels to monitor
channels = [
    "SARFE10-PSSS059:SPECTRUM_Y",
    "SAROP21-PBPS133:INTENSITY",
    "SARFE10-PBPG050:FAST-PULSE-ENERGY"
]

# Construct the query with channels and time range
query = {
    "channels": channels,
    "start": from_time,
    "end": to_time
}

# Connect to the data source and retrieve data
with Daqbuf(backend="sf-databuffer", cbor=True) as source:
    table = Table()
    source.add_listener(table)
    source.request(query)
    dataframe = table.as_dataframe(index=Table.PULSE_ID)
    
    # Iterate through each channel and print the number of pulses
    for channel in channels:
        if channel in dataframe.columns:
            num_shots = dataframe[channel].count()
            print(f"{channel}: {num_shots} pulses")
        else:
            print(f"{channel}: Channel not found in the dataframe.")

Example of a stream of live data

from datahub import Bsread, Table

channels = [
    "SARFE10-PSSS059:SPECTRUM_Y",
    "SAROP21-PBPS133:INTENSITY",
    "SARFE10-PBPG050:FAST-PULSE-ENERGY"
]

with Bsread() as source:
    table = Table()
    source.add_listener(table)
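    # Request the channels for a time window given as start and end
    # (0.0 and 2.0 here; assumed to be seconds relative to now)
    source.req(channels, 0.0, 2.0)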
    dataframe = table.as_dataframe(index=Table.PULSE_ID)

    for channel in channels:
        if channel in dataframe.columns:
            num_shots = dataframe[channel].count()
            print(f"{channel}: {num_shots} pulses")
        else:
            print(f"{channel}: Channel not found in the dataframe.")
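
Example of matching sources with missing pulse IDs

The pulse ID matching mentioned in the chat can be sketched with plain pandas. This is a minimal illustration on made-up numbers, not the SwissFEL analysis packages themselves: the source names and values are hypothetical, and it assumes your data ends up indexed by pulse ID, as in the dataframes above.

```python
import pandas as pd

# Made-up data: one source with every shot, one that missed pulses 1002 and 1004.
fast = pd.Series(
    [0.1, 0.2, 0.3, 0.4, 0.5],
    index=[1000, 1001, 1002, 1003, 1004],
    name="FAST_SOURCE",  # hypothetical source name
)
gappy = pd.Series(
    [10.0, 11.0, 13.0],
    index=[1000, 1001, 1003],
    name="GAPPY_SOURCE",  # hypothetical source name
)

# Line the two sources up on pulse ID; shots missing from a source become NaN.
combined = pd.concat([fast, gappy], axis=1)
combined.index.name = "pulse_id"

# Keep only the shots where every source has data.
matched = combined.dropna()

print(combined)
print(f"{len(matched)} of {len(combined)} pulses have data from both sources")
```

Because `count()` skips NaN values, the per-channel pulse counts printed by the scripts above can differ from one channel to the next when a source has missing shots.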