Implement a New Test

Purpose of this Chapter

The aim of this chapter is to explain how to implement a new test by working through a simple example. Let’s assume that we want to implement a simple Global Minimum test that checks whether a value is lower than a defined minimum. If so, the data point is invalid; otherwise, it’s valid. As already mentioned, tests return a probability for each data point. In our case, values lower than the minimum are 0% valid, and values greater than or equal to the minimum are 100% valid.

Define the Metadata of the Test

Note

Each test has to inherit from the abstract class QAQCTest and has to implement the abstract method perform. Moreover, each test has to provide its metadata. The following metadata has to be provided by each test: NAME, DESCRIPTION, CATEGORY and SUPPORTED_STRUCTURES. See also: autom8qc.qaqc.base.QAQCTest

Warning

Make sure that the class name ends with the suffix Test. Other modules will check for the suffix to identify that the class is a test.

import numpy as np
import pandas as pd

from autom8qc.core import exceptions
from autom8qc.core.structures import Series
from autom8qc.core.parameters import Parameter
from autom8qc.core.parameters import ParameterList
from autom8qc.qaqc.base import QAQCTest
from autom8qc.qaqc.categories import LIMIT_TEST

class GlobalMinimumTest(QAQCTest):

    NAME = "Global Minimum Test"
    DESCRIPTION = "Checks if a data point falls below the defined minimum"
    CATEGORY = LIMIT_TEST
    SUPPORTED_STRUCTURES = Series

Define the Supported Parameters

Each test has to provide its supported parameters. Therefore, you have to implement the static method supported_parameters. The method allows you to access the supported parameters without creating an instance of the class. If your test doesn’t need additional parameters, you don’t have to implement it. In our case, we have the additional parameter min_val which defines the lower limit for the values.

@staticmethod
def supported_parameters():
    return ParameterList(
        Parameter(
            name="min_val",
            description="Limit for valid values",
            dtype=float,
            optional=False,
        )
    )

Implement the Constructor

The constructor is a method that is called when an object is created. In our case, we have to pass the parameter min_val to the constructor and store the value in the related Parameter that we defined in the method supported_parameters. Note that you don’t have to check the type of the parameters, since a Parameter checks the type when you set the value. If you want to implement additional checks (e.g., the value must be greater than 0), you have to implement them in the constructor and raise an exception if a constraint is not satisfied. Finally, you have to call the inherited method check_metadata, which checks if the instance is valid.

Warning

Make sure that you call the super constructor before assigning the parameters.

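def __init__(self, min_val):
    # Additional constraint check (the type is checked by Parameter)
    if min_val <= 0:
        raise exceptions.InvalidValue("min_val must be greater than 0!")
    # Call the super constructor before assigning the parameters
    super().__init__()
    self.parameters["min_val"] = min_val
    # Check that the instance is valid
    self.check_metadata()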
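
The constraint check from the constructor can be tried in isolation. The following is a minimal sketch: InvalidValue and check_min_val are hypothetical stand-ins defined locally for illustration and are not imported from autom8qc. It only shows how such a guard behaves for a valid and an invalid value.

```python
class InvalidValue(ValueError):
    """Local stand-in for exceptions.InvalidValue (assumption, not the real class)."""


def check_min_val(min_val):
    """Hypothetical helper: raise if the constraint min_val > 0 is violated."""
    if min_val <= 0:
        raise InvalidValue("min_val must be greater than 0!")
    return min_val


# A valid value passes the guard unchanged.
accepted = check_min_val(2.5)

# An invalid value raises before anything else can happen.
try:
    check_min_val(-1.0)
    rejected = False
except InvalidValue:
    rejected = True
```

Raising at this point means an invalid value never reaches the code that stores it.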

Implement the Abstract Method

Finally, we have to implement the abstract method perform. The method expects the data as a parameter and returns the probabilities (0: Invalid, 1: Valid) for each data point. It’s up to you to define which structures are supported. For our example, we assume that we only support series (Series or pd.Series). If a non-series is passed to the method, it raises an error. To achieve this, we use the inherited method get_data, which returns the values of the series and raises an error if the type is invalid.

Warning

Make sure that you don’t overwrite data points.

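def perform(self, data):
    # Validate the input type and get the underlying series
    series = self.get_data(data)
    min_val = self.parameters["min_val"].value
    # True where the value is greater than or equal to the minimum
    mask = series >= min_val
    # Convert the mask to probabilities (0: Invalid, 1: Valid)
    probabilities = pd.Series(mask, index=series.index, dtype=np.float64)
    # Missing values get no probability
    probabilities[series.isna()] = np.nan
    return probabilities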
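
The logic of perform can be checked without the package by reproducing it with plain pandas. The following is a minimal sketch: the function global_minimum_probabilities is hypothetical and not part of autom8qc, and it assumes that get_data returns a pd.Series.

```python
import numpy as np
import pandas as pd


def global_minimum_probabilities(series, min_val):
    """Hypothetical stand-alone version of the logic in perform."""
    # True where the value is at least min_val, False otherwise
    # (comparisons with NaN evaluate to False).
    mask = series >= min_val
    # Convert the boolean mask to probabilities: True -> 1.0, False -> 0.0.
    probabilities = mask.astype(np.float64)
    # Missing values can't be judged, so their probability is NaN.
    probabilities[series.isna()] = np.nan
    return probabilities


data = pd.Series([-5.0, 0.0, 3.2, np.nan, 10.0])
result = global_minimum_probabilities(data, min_val=0.0)
print(result.tolist())  # [0.0, 1.0, 1.0, nan, 1.0]
```

The result is a new series with the same index as the input, and the input values themselves are left untouched.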