Customize a Test Sequence
Purpose of this Chapter
This chapter explains how to create and perform a test sequence, using a generated time series as an example.
Basic Example
Let’s assume that our test sequence has three tests. The results of the tests are merged by taking the lowest probability for each data point. After merging, we check whether too many data points failed the sequence. A data point fails when its probability is lower than 10%. If more than 40% of the data points fail, all probabilities are set to 0. Finally, we map the probabilities to validities.
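The merging step and the frequency check can be sketched in plain NumPy, independent of autom8qc. The probability values and the helper names merge_worst and apply_frequency_rule are made up for illustration and are not part of the library's API; the sketch only mirrors the behaviour described in the text.

```python
import numpy as np

# Hypothetical probabilities returned by three tests for five data points
test_probs = np.array([
    [0.9, 0.80, 0.05, 0.70, 1.0],  # test 1
    [1.0, 0.02, 0.60, 0.90, 0.8],  # test 2
    [0.7, 0.90, 0.90, 0.03, 0.9],  # test 3
])


def merge_worst(probs):
    """Merge the per-test probabilities by taking the lowest value per data point."""
    return probs.min(axis=0)


def apply_frequency_rule(probs, threshold=0.1, rel_frequency=0.4):
    """Set all probabilities to 0 if more than rel_frequency of the points fail."""
    failed = probs < threshold
    if failed.mean() > rel_frequency:
        return np.zeros_like(probs)
    return probs


merged = merge_worst(test_probs)      # [0.7, 0.02, 0.05, 0.03, 0.8]
final = apply_frequency_rule(merged)  # 3 of 5 points fail (60% > 40%), so all zeros
print(merged)
print(final)
```

In this toy case three of the five merged probabilities are below 0.1, so the rule resets every probability to 0. With only one failing point out of three, the probabilities would pass through unchanged.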
Generate Sample Data
For demonstration purposes, we generate a time series. The series intentionally contains some features that the test sequence should detect: invalid values at the beginning and the end, one outlier, and three points that are smaller than five. Since the constructor of the class TestSequence expects a BaseStructure as a parameter, we have to create an instance of the class Series.
import numpy as np
import pandas as pd
from autom8qc.core.structures import Series
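np.random.seed(42)
# Normally distributed values around 50
mu, sigma = 50, 2
values = np.random.normal(mu, sigma, 1000)
# Shift the middle part down to about 10, so that the edges stand out
values[100:901] -= 40
# Insert a single outlier
values[500] = 30
# One data point per second, starting at 2021-01-01 00:00:00
index = pd.date_range(start="2021-01-01", periods=1000, freq="s")
series = pd.Series(values, index=index)
# Wrap the pandas series in the autom8qc structure
data = Series(name="Example", data=series)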
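Before running any tests, it can help to confirm that the generated series really has the intended features. The following sketch uses only NumPy and pandas (no autom8qc) and rebuilds the same values. The variable names edges and middle are introduced here for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
values = np.random.normal(50, 2, 1000)
values[100:901] -= 40
values[500] = 30
index = pd.date_range(
    start="2021-01-01", periods=1000, freq=pd.Timedelta(seconds=1)
)
series = pd.Series(values, index=index)

# Untouched values at the beginning and the end versus the shifted middle part
edges = pd.concat([series.iloc[:100], series.iloc[901:]])
middle = series.iloc[100:901]

print("mean at the edges:", edges.mean())    # close to 50
print("mean in the middle:", middle.mean())  # close to 10
print("points below 5:", int((series < 5).sum()))
print("largest middle value:", middle.max(), "at", middle.idxmax())
```

The last line reports the outlier of 30 at 00:08:20, which is the data point with positional index 500.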
Create Sequence
Create Tests
As already mentioned, the time series has invalid values at the beginning and the end. These can be detected with the TimeRangeTest. To use it, you define the start and the end of the range for valid data points. All data points outside the specified range are invalid. Moreover, we perform the GlobalMinimumTest, which checks whether a data point is lower than the defined minimum. If so, the data point is invalid. Finally, we create an instance of the class LOFTest (Local Outlier Factor Test) to detect the outlier.
from datetime import datetime
from autom8qc.qaqc.general import TimeRangeTest
from autom8qc.qaqc.limit import GlobalMinimumTest
from autom8qc.qaqc.outlier import LOFTest
start = datetime(2021, 1, 1, 0, 1, 40)
end = datetime(2021, 1, 1, 0, 15)
time_test = TimeRangeTest(start=start, end=end)
minimum_test = GlobalMinimumTest(min_val=5)
outlier_test = LOFTest(neighbors=100, contamination=1e-3)
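To make the first two checks more concrete, the sketch below reproduces their idea with plain pandas boolean masks. It is not how autom8qc implements the tests (the library's tests produce probabilities, not masks). Treating both ends of the time range as valid is an assumption made here, and the constant series with one low value is made-up data.

```python
from datetime import datetime

import numpy as np
import pandas as pd

index = pd.date_range(
    start="2021-01-01", periods=1000, freq=pd.Timedelta(seconds=1)
)
series = pd.Series(np.full(1000, 10.0), index=index)
series.iloc[200] = 4.0  # one made-up value below the minimum of 5

start = datetime(2021, 1, 1, 0, 1, 40)
end = datetime(2021, 1, 1, 0, 15)

# Time range check: assumption that both range ends count as valid
in_range = (series.index >= start) & (series.index <= end)

# Global minimum check: values lower than 5 are flagged
above_min = (series >= 5).to_numpy()

print(int(in_range.sum()))      # 801 points lie inside the range
print(int((~above_min).sum()))  # 1 point lies below the minimum
```

The range from 00:01:40 to 00:15:00 covers the positional indices 100 to 900, which matches the slice values[100:901] used when generating the sample data.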
Create Mapper and Rule
Now we create an instance of the class StandardValidityMapper to map the probabilities to validities (see also: autom8qc.core.validities.StandardValidities). Moreover, we create an instance of the class LowerFrequencyRule, which checks whether too many data points failed the sequence. A data point fails if its probability is lower than the defined threshold (here 0.1). If more than 40% of the data points fail the sequence, the instance sets all probabilities to 0.
from autom8qc.mappers.validities import StandardValidityMapper
from autom8qc.rules.frequency import LowerFrequencyRule
mapper = StandardValidityMapper()
rule = LowerFrequencyRule(threshold=0.1, rel_frequency=0.4)
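The idea of mapping probabilities to validities can be illustrated with a small NumPy sketch. The labels "valid", "doubtful" and "invalid" and the two cut-off values are purely hypothetical; the actual categories are defined in autom8qc.core.validities.StandardValidities and may differ.

```python
import numpy as np


def map_to_validity(probs, doubtful_below=0.5, invalid_below=0.1):
    """Map probabilities to hypothetical validity labels (not the library's)."""
    probs = np.asarray(probs, dtype=float)
    return np.where(
        probs < invalid_below,
        "invalid",
        np.where(probs < doubtful_below, "doubtful", "valid"),
    )


labels = map_to_validity([0.0, 0.05, 0.3, 0.5, 0.95])
print(labels.tolist())
# ['invalid', 'invalid', 'doubtful', 'valid', 'valid']
```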
Create Sequence
Note
To access the results of the test sequence, you have to use the property sequence.results.
Now it’s time to create the sequence. First, we create an instance and then add the tests. The default measure of a sequence is an instance of the class WorstProbabilityMeasure, which takes the lowest probability. If you want to use another measure, you have to pass it to the constructor. Finally, we perform the sequence and plot the results.
from autom8qc.qaqc.base import TestSequence
sequence = TestSequence(data=data, mapper=mapper, prob_rule=rule)
sequence.add_test(name="Time Range Test", threshold=0.5, test=time_test)
sequence.add_test(name="Global Minimum Test", threshold=0.5, test=minimum_test)
sequence.add_test(name="Outlier Test", threshold=0.5, test=outlier_test)
sequence.perform()
sequence.plot()