# Finding historical data
If you've already read the *Creating a schema* and *Updating the knowledge base* user guides, you should be familiar with how the data in the knowledge base is structured and updated. But sooner or later you will want to access that data; for example, to review which parameters work best for your QPU, or to investigate performance trends over time.
Currently, there are two ways to access data stored in the knowledge base:

- through the dashboard, and
- programmatically, using Python code.
These two methods complement each other: accessing data through the dashboard does not require any advanced knowledge, but performing in-depth analyses is easier using the Python code. These are detailed in the following sections.
## Using the dashboard
The dashboard provides a very convenient way to browse through different QPUs, workflows, and workflow runs. When you select a given workflow run, you get an overview of all the tasks executed. Clicking a task node expands a sidebar showing the task details, the notebook that was used to execute it, and a preview of the data analysis.
You can also browse all the experiments and their results using the Experiment Database. With its help, you can access experimental data organised by knowledge base branches, qubits, experiments, and execution times. If a particular experiment was performed as a part of a workflow run, you'll be able to navigate to that specific run.
## Using Python code
Despite the ease of use the dashboard provides, it cannot be used to perform automated analyses, e.g. inside a Jupyter notebook. For that, you'll need to do some Python programming.
The knowledge base can be queried using GraphQL. To understand the query structure and logic, you need to look into the structure of the schema, as it provides the data model for the GraphQL queries.
Let's start with an example and then go into the details.
### Example of querying all T1 experiments
Let's assume that your `schema.py` file contains the following entry:
```python
from qruise.experiment.schema.core import CoreSchema, Experiment, Quantity
from qruise.kb import Schema

with Schema(
    title="My schema",
    extends=CoreSchema,
) as MySchema:

    class T1(Experiment):
        t1: Quantity | None

    # the rest of the schema file
    # ...
```
Now suppose that you want to find all runs of the T1 experiment. You can achieve that with the following code snippet:
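A minimal sketch of such a snippet is shown below. The GraphQL query itself follows the schema, but the way a session is constructed and a `gql` client is obtained from it (`Session()`, `session.graphql()`) are assumptions for illustration; consult your QruiseOS API reference for the exact calls.

```python
# Sketch of querying all T1 experiments from the knowledge base.
# NOTE: how a gql client is obtained from a knowledge base session is an
# assumption for illustration -- check your QruiseOS API reference.

T1_QUERY = """
{
    T1 {
        id
        timestamp
    }
}
"""


def fetch_t1_runs(client):
    """Execute the query with a gql client and return the result dictionary."""
    from gql import gql  # third-party: pip install gql

    return client.execute(gql(T1_QUERY))


# Usage (requires a running knowledge base):
#
#   from qruise.kb import Session
#
#   session = Session()           # constructor arguments omitted
#   client = session.graphql()    # hypothetical helper returning a gql client
#   result = fetch_t1_runs(client)
#   print(result)
```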
If that looks intimidating, don't worry – let's break down each line to understand what's going on.
The first few lines import the necessary libraries. Note that besides the usual QruiseOS modules, we also import the `gql` library. We then create a knowledge base session and use it to instantiate a GraphQL client.

Next comes the query itself. Its syntax means we want to query all objects of type `T1` and select their `id` and `timestamp` attributes. Note that neither attribute is explicitly defined in the schema – both `id` and `timestamp` are inherited from the base `Experiment` class.

Finally, we use the client to execute the query against the knowledge base and print the result. Easy!
## Schema structure and queries
The GraphQL queries that you execute must be compatible with the knowledge base schema. This means you can only query the fields available in your knowledge base class. Which fields are those? Simply put: the ones you define in the schema, plus all the fields inherited from the base class.
To understand that concept better, let's look at the example of the `T1` class again:
```python
from qruise.experiment.schema.core import CoreSchema, Experiment, Quantity
from qruise.kb import Schema

with Schema(
    title="My schema",
    extends=CoreSchema,
) as MySchema:

    class T1(Experiment):
        t1: Quantity | None

    # the rest of the schema file
    # ...
```
As you can see, the `T1` class defines one field, namely `t1`. Now let's look at the definition of the base class, `qruise.experiment.schema.core.Experiment`:
```python
class Experiment(Event):
    # class docstring omitted for brevity

    _abstract = ()

    targets: List[Component]
    success: bool | None
    error: str | None
    batch: str | None
    storage: str | None
    notebook: str | None
    data: Dataset | None
    output_key: str | None
    html_output_key: str | None
    user: str | None
    start_time: dt.datetime | None
    elapsed: float | None
    ref: str | None
    flow_run_id: str | None
    task_run_id: str | None
```
As you can see, there are quite a few fields! All of these can be used to construct queries for the `T1` class. However, notice that there is no `timestamp` field – so how could we have used it for querying `T1` in the previous section? The answer becomes clear when we take a look at the base `qruise.experiment.schema.core.Event` class:
```python
class Event(DocumentTemplate):
    # class docstring omitted for brevity

    _abstract = ()  # Abstract means that it cannot have any instances
    _key = LexicalKey("id")  # Specifies a specific key generation method to use

    id: str  # Unique event identifier (lexical key)
    desc: str | None  # Event description (added by system)
    comment: str | None  # User comment (added by user)
    timestamp: dt.datetime | None  # Event timestamp (added by system)
    type_id: str | None  # Event type identifier
```
You can see that both the `timestamp` and `id` fields are defined here, i.e. the `Experiment` class inherits them from the `Event` base class.
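Since inheritance flattens into the set of queryable fields, a single selection set can freely mix fields from all levels of the hierarchy. A sketch (field names taken from the class definitions above):

```python
# A query mixing fields defined at different levels of the class hierarchy:
# `id` and `timestamp` come from Event, `success` and `user` from Experiment.
MIXED_QUERY = """
{
    T1 {
        id
        timestamp
        success
        user
    }
}
"""
```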
## Querying nested elements
Let's have another look at the `T1` class defined in the schema in the previous sections. The `t1` field is of type `qruise.experiment.schema.core.Quantity`, itself defined as:
```python
class Quantity(QuantityBase):
    """Physical Quantity"""

    value: float | None
    std: float | None  # standard deviation
    range: Range | None
    window: float | None
    step: float | None
    default: float | None  # default value
    display: Display | None
    # the rest of the definition omitted for brevity
```
As you can see, it has fields such as `value` and `std`. You can use GraphQL queries to retrieve those fields, too, using so-called nested bracketing:
As you can see, besides querying for the `id` and `timestamp` fields, we also ask the knowledge base for the `name` of the `targets` (i.e. the qubit names), and for the `value` and `std` fields of the `t1` field itself.
## Data formatting
When you use GraphQL queries, the client returns a standard Python dictionary containing the query result. You can transform it into a more useful form, such as a `pandas` dataframe, using the `pandas.json_normalize` function:
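A sketch of the conversion, using a hand-made result dictionary in place of a real query response (the exact shape of the response may differ in your setup):

```python
import pandas as pd

# A stand-in for the dictionary a GraphQL client would return
result = {
    "T1": [
        {"id": "T1/001", "timestamp": "2024-06-01T10:00:00",
         "targets": [{"name": "Q1"}], "t1": {"value": 42.0e-6, "std": 1.5e-6}},
        {"id": "T1/002", "timestamp": "2024-06-01T11:00:00",
         "targets": [{"name": "Q2"}], "t1": {"value": 38.5e-6, "std": 2.0e-6}},
    ]
}

# json_normalize flattens the nested `t1` dictionary into dotted columns;
# list-valued fields such as `targets` are kept as-is
df = pd.json_normalize(result["T1"])
print(sorted(df.columns))
# ['id', 't1.std', 't1.value', 'targets', 'timestamp']
```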
You can now use this dataframe for filtering, statistical analysis, plotting, and much more – the possibilities are endless! For example, to get statistics of \(T_1\) time by qubit, you can simply run the following:
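For instance, a per-qubit summary could look like the following sketch, built on a small flattened dataframe of the kind `pandas.json_normalize` produces (the column names follow the hypothetical response shape used in this guide):

```python
import pandas as pd

# A small flattened dataframe; in practice you would derive the qubit name
# from the `targets` column, e.g.
#   df["qubit"] = [t[0]["name"] for t in df["targets"]]
df = pd.DataFrame({
    "qubit": ["Q1", "Q1", "Q2"],
    "t1.value": [42.0e-6, 44.0e-6, 38.5e-6],
})

# Per-qubit summary statistics of the T1 time
stats = df.groupby("qubit")["t1.value"].describe()
print(stats[["count", "mean", "std"]])
```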
## Exporting measurements and analysis
Now that we have found the data and metadata of interest, how do we load the corresponding dataset back? For that, we need the experiment class name and the unique identifier of an experiment. Then we can simply call the `load_document` method of a `Session` object:
```python
experiment_type = 'PulsedSpectroscopy'
uid = 'clx8z24pi0000puqs5ueohlni_Q1'
document = session.load_document(f'{experiment_type}/{uid}')
ds = document.data
```
The loaded `document` variable has the structure defined by the experiment class in the schema. In addition, its `data` attribute holds the dataset of raw measurements and analysis results, which we assigned to `ds` above.