Customize a Test Sequence

Purpose of this Chapter

This chapter explains how to create and perform a test sequence, using a generated time series as an example.

Basic Example

Let’s assume that our test sequence has three tests. The results of the tests are merged by taking the lowest probability for each data point. After merging, we check whether too many data points failed the sequence. A data point fails when its probability is lower than 10%. If more than 40% of the data points fail, all probabilities are set to 0. Finally, we map the probabilities to validities.
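
The merging and the frequency check described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the implementation used by autom8qc: the helper names merge_worst and apply_frequency_rule and the sample probabilities are made up for this example.

```python
def merge_worst(per_test_probs):
    """Merge per-test probabilities by taking the lowest value per data point."""
    return [min(point) for point in zip(*per_test_probs)]


def apply_frequency_rule(probs, threshold=0.1, rel_frequency=0.4):
    """Set every probability to 0 if too many data points failed.

    A data point counts as failed when its probability is below `threshold`.
    (Hypothetical helper; the comparison operators are assumptions.)
    """
    if not probs:
        return []
    failed = sum(1 for p in probs if p < threshold)
    if failed / len(probs) > rel_frequency:
        return [0.0] * len(probs)
    return list(probs)


# Made-up results of three tests for five data points
test_a = [1.0, 0.9, 0.05, 1.0, 0.8]
test_b = [1.0, 0.7, 0.60, 0.0, 0.9]
test_c = [0.9, 1.0, 1.00, 1.0, 0.0]

merged = merge_worst([test_a, test_b, test_c])
final = apply_frequency_rule(merged)
print(merged)  # [0.9, 0.7, 0.05, 0.0, 0.0]
print(final)   # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Three of the five merged probabilities are below 10%, which is 60% of the data points and therefore more than the allowed 40%, so every probability is set to 0.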

Generate Sample Data

For demonstration purposes, we generate a time series. The series intentionally contains some features that the test sequence should detect. We assume that our series contains invalid values at the beginning and the end. Moreover, our series has an outlier and three points that are smaller than five. Since the constructor of the class TestSequence expects a BaseStructure as a parameter, we have to create an instance of the class Series.

import numpy as np
import pandas as pd
from autom8qc.core.structures import Series

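# Draw 1000 normally distributed values (mean 50, standard deviation 2)
np.random.seed(42)
mu, sigma = 50, 2
values = np.random.normal(mu, sigma, 1000)
# Shift the points 100 to 900 down by 40 so that they scatter around 10
values[100:901] -= 40
# Insert a single outlier
values[500] = 30
# One data point per second, starting at 2021-01-01 00:00:00
index = pd.date_range(start="1/1/2021", periods=1000, freq="S")
series = pd.Series(values, index=index)
# Wrap the pandas series in the structure expected by TestSequence
data = Series(name="Example", data=series)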

Create Sequence

Create Tests

As already mentioned, the time series has invalid values at the beginning and the end. These can be detected with the TimeRangeTest. Therefore, you have to define the start and the end of the range for valid data points. All data points outside the specified range are invalid. Moreover, we perform the GlobalMinimumTest that checks if a data point is lower than the defined minimum. If so, the data point is invalid. Finally, we create an instance of the class LOFTest (Local Outlier Factor Test) to detect the outlier.

from datetime import datetime
from autom8qc.qaqc.general import TimeRangeTest
from autom8qc.qaqc.limit import GlobalMinimumTest
from autom8qc.qaqc.outlier import LOFTest

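# Data points outside the range from 00:01:40 to 00:15:00 are invalid
start = datetime(2021, 1, 1, 0, 1, 40)
end = datetime(2021, 1, 1, 0, 15)
time_test = TimeRangeTest(start=start, end=end)
# Data points lower than 5 are invalid
minimum_test = GlobalMinimumTest(min_val=5)
# Local Outlier Factor test to detect the outlier
outlier_test = LOFTest(neighbors=100, contamination=1e-3)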
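
To illustrate what the first two tests check, the following sketch applies the same conditions to a handful of hand-picked data points. It does not use autom8qc. The functions time_range_probs and global_minimum_probs are hypothetical, and we assume that a test returns a probability of 1 for a valid point and 0 for an invalid one and that the boundaries of the time range count as valid. The library may handle both differently.

```python
from datetime import datetime, timedelta


def time_range_probs(timestamps, start, end):
    """Return 1.0 for timestamps inside [start, end], otherwise 0.0.

    Assumption: both boundaries are treated as valid.
    """
    return [1.0 if start <= t <= end else 0.0 for t in timestamps]


def global_minimum_probs(values, min_val):
    """Return 0.0 for values lower than min_val, otherwise 1.0."""
    return [0.0 if v < min_val else 1.0 for v in values]


# Six made-up data points, given as seconds after 2021-01-01 00:00:00
origin = datetime(2021, 1, 1)
timestamps = [origin + timedelta(seconds=s) for s in (0, 99, 100, 500, 900, 901)]
values = [50.2, 49.1, 10.4, 4.7, 9.8, 51.0]

start = datetime(2021, 1, 1, 0, 1, 40)  # 100 seconds after the origin
end = datetime(2021, 1, 1, 0, 15)       # 900 seconds after the origin

time_probs = time_range_probs(timestamps, start, end)
min_probs = global_minimum_probs(values, min_val=5)
print(time_probs)  # [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
print(min_probs)   # [1.0, 1.0, 1.0, 0.0, 1.0, 1.0]
```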

Create Mapper and Rule

Now we create an instance of the class StandardValidityMapper to map the probabilities to validities (see also: autom8qc.core.validities.StandardValidities). Moreover, we create an instance of the class LowerFrequencyRule, which checks whether too many data points failed the sequence. A data point fails the sequence if its probability is lower than the defined threshold (here 0.1). If more than 40% of the data points fail the sequence, the instance sets all probabilities to 0.

from autom8qc.mappers.validities import StandardValidityMapper
from autom8qc.rules.frequency import LowerFrequencyRule

mapper = StandardValidityMapper()
rule = LowerFrequencyRule(threshold=0.1, rel_frequency=0.4)
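
The idea of mapping probabilities to validities can be sketched as follows. The labels and the cut-off values in this example are purely hypothetical and chosen for illustration. They are not the categories of StandardValidities or the thresholds of StandardValidityMapper.

```python
from enum import Enum


class Validity(Enum):
    """Hypothetical validity labels (not the library's StandardValidities)."""

    VALID = "valid"
    DOUBTFUL = "doubtful"
    INVALID = "invalid"


def map_to_validity(prob, invalid_below=0.1, valid_from=0.9):
    """Map a single probability to a label using assumed cut-off values."""
    if prob < invalid_below:
        return Validity.INVALID
    if prob >= valid_from:
        return Validity.VALID
    return Validity.DOUBTFUL


probabilities = [0.0, 0.5, 0.95]
validities = [map_to_validity(p) for p in probabilities]
print([v.name for v in validities])  # ['INVALID', 'DOUBTFUL', 'VALID']
```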

Create Sequence

Note

To access the results of the test sequence, you have to use the property sequence.results.

Now it’s time to create the sequence. First, we create an instance and then add the tests. Finally, we perform the sequence and plot it. The default measure of a sequence is an instance of the class WorstProbabilityMeasure, which takes the lowest probability. If you want to use another measure, you have to pass it to the constructor.

from autom8qc.qaqc.base import TestSequence

# Create the sequence with the sample data, the mapper and the rule
sequence = TestSequence(data=data, mapper=mapper, prob_rule=rule)
# Add the three tests
sequence.add_test(name="Time Range Test", threshold=0.5, test=time_test)
sequence.add_test(name="Global Minimum Test", threshold=0.5, test=minimum_test)
sequence.add_test(name="Outlier Test", threshold=0.5, test=outlier_test)
# Perform the sequence and plot it
sequence.perform()
sequence.plot()