# Finding historical data
If you've already read the *Creating a schema* and *Updating the knowledge base* user guides, you should be familiar with how the data in the knowledge base is structured and updated. But sooner or later you will want to access that data; for example, to review which parameters work best for your QPU, or to investigate performance trends over time.
Currently, there are two ways to access data stored in the knowledge base:

- through the dashboard, and
- programmatically, using Python code.
These two methods complement each other: accessing data through the dashboard does not require any advanced knowledge, but performing in-depth analyses is easier using the Python code. These are detailed in the following sections.
## Using the dashboard
The dashboard provides a very convenient way to browse through different QPUs, workflows, and workflow runs. When you select a given workflow run, you get an overview of all the tasks executed. Clicking a task node expands a sidebar showing the task details, the notebook that was used to execute it, and a preview of the data analysis.
You can also browse all the experiments and their results using the Experiment Database. With its help, you can access experimental data organised by knowledge base branches, qubits, experiments, and execution times. If a particular experiment was performed as a part of a workflow run, you'll be able to navigate to that specific run.
## Using Python code
Despite the ease of use the dashboard provides, it cannot be used to perform automated analyses, e.g. inside a Jupyter notebook. For that, you'll need to do some Python programming.
The knowledge base can be queried using GraphQL. To understand the query structure and logic, you need to look into the structure of the schema, as it provides the data model for the GraphQL queries.
Let's start with an example and then go into the details.
### Example of querying all T1 experiments
Let's assume that your `schema.py` file contains the following entry:
```python
from qruise.experiment.schema.core import CoreSchema, Experiment, Quantity
from qruise.kb import Schema

with Schema(
    title="My schema",
    extends=CoreSchema,
) as MySchema:

    class T1(Experiment):
        t1: Quantity | None

    # the rest of the schema file
    # ...
```
Now suppose that you want to find all runs of the T1 experiment. You can achieve that with the following code snippet:
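A minimal sketch of such a snippet is shown below. The GraphQL query itself follows the schema, but the way a session is constructed and a `gql` client is obtained from it (`Session()`, `session.graphql()`) are assumptions for illustration; consult your QruiseOS API reference for the exact calls.

```python
# Sketch of querying all T1 experiments from the knowledge base.
# NOTE: how a gql client is obtained from a knowledge base session is an
# assumption for illustration -- check your QruiseOS API reference.

T1_QUERY = """
{
    T1 {
        id
        timestamp
    }
}
"""


def fetch_t1_runs(client):
    """Execute the query with a gql client and return the result dictionary."""
    from gql import gql  # third-party: pip install gql

    return client.execute(gql(T1_QUERY))


# Usage (requires a running knowledge base):
#
#   from qruise.kb import Session
#
#   session = Session()           # constructor arguments omitted
#   client = session.graphql()    # hypothetical helper returning a gql client
#   result = fetch_t1_runs(client)
#   print(result)
```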
If that looks intimidating, don't worry – let's break down each line to understand what's going on.
The first few lines import the necessary libraries. Note that besides the usual QruiseOS modules, we also import the `gql` library. We then create a knowledge base session and use it to instantiate a GraphQL client.

Next comes the query itself. Its syntax means we want to query all objects of type `T1` and select their `id` and `timestamp` attributes. Note that neither attribute is explicitly defined in the schema – both `id` and `timestamp` are inherited from the base `Experiment` class.

Finally, we use the client to execute the query against the knowledge base and print the result. Easy!
## Schema structure and queries
The GraphQL queries that you execute must be compatible with the knowledge base schema. This means you can only query the fields available in your knowledge base class. Which fields are those? Simply put: the ones you define in the schema, plus all the fields inherited from the base class.
To understand that concept better, let's look at the example of the `T1` class again:
```python
from qruise.experiment.schema.core import CoreSchema, Experiment, Quantity
from qruise.kb import Schema

with Schema(
    title="My schema",
    extends=CoreSchema,
) as MySchema:

    class T1(Experiment):
        t1: Quantity | None

    # the rest of the schema file
    # ...
```
As you can see, the `T1` class defines one field, namely `t1`. Now let's look at the definition of the base class, `qruise.experiment.schema.core.Experiment`:
```python
class Experiment(Event):
    # class docstring omitted for brevity

    _abstract = ()

    targets: List[Component]
    success: bool | None
    error: str | None
    batch: str | None
    storage: str | None
    notebook: str | None
    data: Dataset | None
    output_key: str | None
    html_output_key: str | None
    user: str | None
    start_time: dt.datetime | None
    elapsed: float | None
    ref: str | None
    flow_run_id: str | None
    task_run_id: str | None
```
As you can see, there are quite a few fields! All of these can be used to construct queries for the `T1` class. However, notice that there is no `timestamp` field – so how could we have used it for querying `T1` in the previous section? The answer becomes clear when we take a look at the base `qruise.experiment.schema.core.Event` class:
```python
class Event(DocumentTemplate):
    # class docstring omitted for brevity

    _abstract = ()  # Abstract means that it cannot have any instances
    _key = LexicalKey("id")  # Specifies a specific key generation method to use

    id: str  # Unique event identifier (lexical key)
    desc: str | None  # Event description (added by system)
    comment: str | None  # User comment (added by user)
    timestamp: dt.datetime | None  # Event timestamp (added by system)
    type_id: str | None  # Event type identifier
```
You can see that both the `timestamp` and `id` fields are defined here, i.e. the `Experiment` class inherits them from the `Event` base class.
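Since inheritance flattens into the set of queryable fields, a single selection set can freely mix fields from all levels of the hierarchy. A sketch (field names taken from the class definitions above):

```python
# A query mixing fields defined at different levels of the class hierarchy:
# `id` and `timestamp` come from Event, `success` and `user` from Experiment.
MIXED_QUERY = """
{
    T1 {
        id
        timestamp
        success
        user
    }
}
"""
```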
## Querying nested elements
Let's have another look at the `T1` class defined in the schema in the previous sections. The `t1` field is of type `qruise.experiment.schema.core.Quantity`, itself defined as:
```python
class Quantity(QuantityBase):
    """Physical Quantity"""

    value: float | None
    std: float | None  # standard deviation
    range: Range | None
    window: float | None
    step: float | None
    default: float | None  # default value
    display: Display | None
    # the rest of the definition omitted for brevity
```
As you can see, it has fields such as `value` and `std`. You can use GraphQL queries to retrieve those fields, too, using so-called nested bracketing:
As you can see, besides querying for the `id` and `timestamp` fields, we also ask the knowledge base for the `name` of the `targets` (i.e. the qubit names), and for the `value` and `std` fields of the `t1` field itself.
## Data formatting
When you use GraphQL queries, the client returns a standard Python dictionary containing the query result. You can transform it into a more useful form, such as a `pandas` dataframe, using the `pandas.json_normalize` function:
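A sketch of the conversion, using a hand-made result dictionary in place of a real query response (the exact shape of the response may differ in your setup):

```python
import pandas as pd

# A stand-in for the dictionary a GraphQL client would return
result = {
    "T1": [
        {"id": "T1/001", "timestamp": "2024-06-01T10:00:00",
         "targets": [{"name": "Q1"}], "t1": {"value": 42.0e-6, "std": 1.5e-6}},
        {"id": "T1/002", "timestamp": "2024-06-01T11:00:00",
         "targets": [{"name": "Q2"}], "t1": {"value": 38.5e-6, "std": 2.0e-6}},
    ]
}

# json_normalize flattens the nested `t1` dictionary into dotted columns;
# list-valued fields such as `targets` are kept as-is
df = pd.json_normalize(result["T1"])
print(sorted(df.columns))
# ['id', 't1.std', 't1.value', 'targets', 'timestamp']
```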
You can now use this dataframe for filtering, statistical analysis, plotting, and much more – the possibilities are endless! For example, to get statistics of \(T_1\) time by qubit, you can simply run the following:
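For instance, a per-qubit summary could look like the following sketch, built on a small flattened dataframe of the kind `pandas.json_normalize` produces (the column names follow the hypothetical response shape used in this guide):

```python
import pandas as pd

# A small flattened dataframe; in practice you would derive the qubit name
# from the `targets` column, e.g.
#   df["qubit"] = [t[0]["name"] for t in df["targets"]]
df = pd.DataFrame({
    "qubit": ["Q1", "Q1", "Q2"],
    "t1.value": [42.0e-6, 44.0e-6, 38.5e-6],
})

# Per-qubit summary statistics of the T1 time
stats = df.groupby("qubit")["t1.value"].describe()
print(stats[["count", "mean", "std"]])
```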
## Exporting measurements and analysis
Now that we have found the data and metadata of interest, how do we load the corresponding dataset back? For that, we need the experiment class name and the unique identifier of an experiment. Then we can simply call the `load_document` method of a `Session` object:
```python
experiment_type = 'PulsedSpectroscopy'
uid = 'clx8z24pi0000puqs5ueohlni_Q1'
document = session.load_document(f'{experiment_type}/{uid}')
ds = document.data
```
The loaded `document` variable has the structure defined by the experiment class in the schema. In addition, its `data` attribute holds the dataset of raw measurements and analysis results, which we assigned to `ds` above.