# **Saving multiple sources at SwissFEL: a very serious guide**
*Scroll down for the boring code*
**Hey, look at my nice data!**
*What a load of bs!*
**Oy, no need to be rude!**
*I'm not! I mean, what a load of **Beam Synchronous** data!*
**Well, thanks... but it's not all saved.**
*Bummer. How were you saving it?*
**I used this cool command-line script `bs...`**
*Ooooh... I'm going to stop you right there.*
**But my bs command works; there must be a problem with the sources!**
*Calm down, detective, try saving each source individually with the `bs` command...*
**OK, hang on... what the?! They *both* save fine on their own!**
*Yep, the issue is when you try and save sources from different IOCs/devices with the `bs` command. The data is taken from the **dispatcher**. If the two sources dont arrive at the dispatcher within a small time window, only the first source is sent in the message. Different sources arrive at the dispatcher at **different times**.*
**What type of BS is that?! I can't wait and wait...**
*Good question. Some bs data comes from **pipelines**, where calculations and moving data around **take time**. You're not just saving numbers; you're saving processed results.*
**Pipelines?! I want data, not plumbing problems!**
*Think of pipelines as hardworking elves doing data analysis behind the scenes. No pipelines, more work for you.*
**Alright, I'm sold. But how do I save multiple sources without all this drama?**
*You need to save from the **data buffer**. The system can handle sources arriving at slightly different times there.*
**The data buffer? How?**
*You've got plenty of tools for accessing it:*
- **[DataHub](https://github.com/paulscherrerinstitute/datahub)** (don't ask about a front-end)
- **[Data API](https://github.com/paulscherrerinstitute/data_api_python)** (if you speak code)
- **[Eco](https://github.com/paulscherrerinstitute/eco), [Slic](https://gitlab.psi.ch/slic), Service Now, Concour, Time** (some might not work).
**Steady your sources, Doc Brown! I just checked my data: one of the sources isn't running at 100 Hz and is missing data. I told you the source was the problem!**
*Missing data isn't necessarily a problem; the approaches above can handle missing pulse IDs and return all the data that is in the data buffer.*
**Pulse IDs? Does my data need to prove its age?**
*No, it's a way to sort data: with bs data, every shot has a unique pulse ID. You can use SwissFEL data analysis packages to align arrays with missing shots for you.*
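*(A minimal sketch of that matching, using plain pandas rather than any SwissFEL package; the pulse IDs and values here are invented.)*
```python
import pandas as pd

# Two channels indexed by pulse ID; the slow one missed pulses 2 and 4
fast = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0], index=[1, 2, 3, 4, 5])
slow = pd.Series([0.1, 0.3, 0.5], index=[1, 3, 5])
df = pd.DataFrame({"fast": fast, "slow": slow})

# Keep only the shots where both channels recorded data
complete = df.dropna()
print(complete.index.tolist())  # pulse IDs present in both: [1, 3, 5]
```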
**Cool! Anything else I should know?**
*Yes: the data buffer can't clear your desk at PiA.*
**✅ Do Say:**
*Pulse IDs rock my world!*
**❌ Don't Say:**
*Beam synchronous PV*
## Example scripts using datahub
### Saving historic data for a set time range
```python
from datahub import *
from datetime import datetime, timedelta

# Set the time range: from 6 minutes ago to 5 minutes ago
now = datetime.now()
from_time, to_time = [
    (now - timedelta(minutes=m)).strftime('%Y-%m-%d %H:%M:%S.%f')[:-3]
    for m in [6, 5]
]

# Define the channels to retrieve
channels = [
    "SARFE10-PSSS059:SPECTRUM_Y",
    "SAROP21-PBPS133:INTENSITY",
    "SARFE10-PBPG050:FAST-PULSE-ENERGY",
]

# Construct the query with channels and time range
query = {
    "channels": channels,
    "start": from_time,
    "end": to_time,
}

# Connect to the data buffer and retrieve the data
with Daqbuf(backend="sf-databuffer", cbor=True) as source:
    table = Table()
    source.add_listener(table)
    source.request(query)
    dataframe = table.as_dataframe(index=Table.PULSE_ID)

# Print the number of pulses recorded for each channel
for channel in channels:
    if channel in dataframe.columns:
        num_shots = dataframe[channel].count()
        print(f"{channel}: {num_shots} pulses")
    else:
        print(f"{channel}: channel not found in the dataframe.")
```
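Because the dataframe is indexed by pulse ID, gaps (shots a source missed) are easy to spot. A small sketch with a synthetic index standing in for `dataframe.index`:
```python
import pandas as pd

# Synthetic stand-in for the pulse-ID index returned above
pulse_ids = pd.Index([100, 101, 103, 104, 107])

# All pulse IDs we would expect between the first and last recorded shot
expected = pd.RangeIndex(int(pulse_ids.min()), int(pulse_ids.max()) + 1)
missing = expected.difference(pulse_ids)
print(list(missing))  # pulse IDs with no data: [102, 105, 106]
```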
### Example of a stream of live data
```python
from datahub import Bsread, Table

channels = [
    "SARFE10-PSSS059:SPECTRUM_Y",
    "SAROP21-PBPS133:INTENSITY",
    "SARFE10-PBPG050:FAST-PULSE-ENERGY",
]

# Stream live bs data: start now (0.0 s) and stop after 2.0 s
with Bsread() as source:
    table = Table()
    source.add_listener(table)
    source.req(channels, 0.0, 2.0)
    dataframe = table.as_dataframe(index=Table.PULSE_ID)

# Print the number of pulses recorded for each channel
for channel in channels:
    if channel in dataframe.columns:
        num_shots = dataframe[channel].count()
        print(f"{channel}: {num_shots} pulses")
    else:
        print(f"{channel}: channel not found in the dataframe.")
```