*************************
Customize a Test Sequence
*************************

Purpose of this Chapter
=======================

The aim of this chapter is to explain how to create and perform a test
sequence, using a generated time series as an example.

Basic Example
=============

Let's assume that our test sequence has three tests. The results of the tests
are merged by taking the lowest probability. After merging, we check whether
too many data points failed the sequence. A data point fails when its
probability is lower than 10%. If more than 40% of the data points failed,
all probabilities are set to 0. Finally, we map the probabilities to
validities.

.. figure:: ../../figures/examples/sequence_example.svg
    :width: 100%

Generate Sample Data
====================

For demonstration purposes, we generate a time series. The series
intentionally has some features that the test sequence will detect. We assume
that our series contains invalid values at the beginning and the end.
Moreover, our series has an outlier and three points that are smaller than
five. Since the constructor of the class **TestSequence** expects a
**BaseStructure** as a parameter, we have to create an instance of the class
**Series**.

.. code-block:: python

    import numpy as np
    import pandas as pd
    from autom8qc.core.structures import Series

    # Fix the seed so that the generated series is reproducible
    np.random.seed(42)
    mu, sigma = 50, 2
    values = np.random.normal(mu, sigma, 1000)
    # Shift the middle part down; the untouched edges act as invalid values
    values[100:901] -= 40
    # Insert a single outlier
    values[500] = 30
    # One data point per second
    index = pd.date_range(start="1/1/2021", periods=1000, freq="S")
    series = pd.Series(values, index=index)
    data = Series(name="Example", data=series)

Create Sequence
===============

Create Tests
------------

As already mentioned, the time series has invalid values at the beginning and
the end. These can be detected with the **TimeRangeTest**. To use it, you
have to define the *start* and the *end* of the range for valid data points.
All data points outside the specified range are invalid.
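
To make the effect of such a range concrete, the following standard-library
sketch counts how many timestamps of the sample series fall inside and outside
a range. It does not use **TimeRangeTest** itself. The bounds are the ones
chosen for this chapter's example, and treating both bounds as inclusive is an
assumption made here for illustration.

.. code-block:: python

    from datetime import datetime, timedelta

    # Rebuild the timestamps of the sample series: 1000 points, one per second
    first = datetime(2021, 1, 1)
    timestamps = [first + timedelta(seconds=i) for i in range(1000)]

    start = datetime(2021, 1, 1, 0, 1, 40)
    end = datetime(2021, 1, 1, 0, 15)

    # Assumption: both bounds are inclusive
    inside = [start <= t <= end for t in timestamps]
    n_inside = sum(inside)
    n_outside = len(inside) - n_inside

    print(n_inside, n_outside)

Under this assumption, the points with the positions 100 to 900 lie inside the
range, which is exactly the part of the sample series that was shifted down.
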
Moreover, we perform the **GlobalMinimumTest**, which checks whether a data
point is lower than the defined minimum. If so, the data point is invalid.
Finally, we create an instance of the class **LOFTest** *(Local Outlier
Factor Test)* to detect the outlier.

.. code-block:: python

    from datetime import datetime
    from autom8qc.qaqc.general import TimeRangeTest
    from autom8qc.qaqc.limit import GlobalMinimumTest
    from autom8qc.qaqc.outlier import LOFTest

    # Range of valid data points
    start = datetime(2021, 1, 1, 0, 1, 40)
    end = datetime(2021, 1, 1, 0, 15)
    time_test = TimeRangeTest(start=start, end=end)
    # Values below 5 are invalid
    minimum_test = GlobalMinimumTest(min_val=5)
    outlier_test = LOFTest(neighbors=100, contamination=1e-3)

Create Mapper and Rule
----------------------

Now we create an instance of the class **StandardValidityMapper** to map the
probabilities to validities (see also:
**autom8qc.core.validities.StandardValidities**). Moreover, we create an
instance of the class **LowerFrequencyRule**, which checks whether too many
data points failed the sequence. A data point fails if its probability is
lower than the defined threshold. If more than 40% of the data points failed
the sequence, the instance sets all probabilities to 0.

.. code-block:: python

    from autom8qc.mappers.validities import StandardValidityMapper
    from autom8qc.rules.frequency import LowerFrequencyRule

    mapper = StandardValidityMapper()
    # threshold: a data point fails below a probability of 10%
    # rel_frequency: the rule applies if more than 40% of the points failed
    rule = LowerFrequencyRule(threshold=0.1, rel_frequency=0.4)

Create Sequence
---------------

.. note::
    To access the results of the test sequence, you have to use the property
    **sequence.results**.

Now it's time to create the sequence. First, we create an instance, and then
we add the tests. The default measure of a sequence is an instance of the
class **WorstProbabilityMeasure**, which takes the lowest probability. If you
want to use another measure, you have to pass it to the constructor.

.. code-block:: python

    from autom8qc.qaqc.base import TestSequence

    sequence = TestSequence(data=data, mapper=mapper, prob_rule=rule)
    sequence.add_test(name="Time Range Test", threshold=0.5, test=time_test)
    sequence.add_test(name="Global Minimum Test", threshold=0.5, test=minimum_test)
    sequence.add_test(name="Outlier Test", threshold=0.5, test=outlier_test)
    # Run all tests, then plot the results
    sequence.perform()
    sequence.plot()

.. figure:: ../../figures/examples/sequence_plot.svg
    :width: 100%
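
The merging step and the frequency rule described in this chapter can be
illustrated without the library. The following pure-Python sketch is not the
implementation of **WorstProbabilityMeasure** or **LowerFrequencyRule**. The
function names ``merge_worst`` and ``apply_lower_frequency_rule`` as well as
the probabilities are hypothetical and only mirror the behaviour described
above: take the lowest probability per data point, then set everything to 0 if
more than 40% of the points are below 10%.

.. code-block:: python

    def merge_worst(probabilities):
        """Merge per-test probabilities by taking the lowest value per point."""
        return [min(column) for column in zip(*probabilities)]


    def apply_lower_frequency_rule(merged, threshold=0.1, rel_frequency=0.4):
        """Return all zeros if too many points are below the threshold."""
        if not merged:
            return []
        n_failed = sum(p < threshold for p in merged)
        if n_failed / len(merged) > rel_frequency:
            return [0.0] * len(merged)
        return list(merged)


    # Hypothetical probabilities of three tests for five data points
    per_test = [
        [0.9, 0.8, 0.05, 0.7, 0.6],
        [0.7, 0.9, 0.90, 0.8, 0.9],
        [0.8, 0.6, 0.80, 0.0, 0.9],
    ]

    merged = merge_worst(per_test)
    result = apply_lower_frequency_rule(merged)
    print(merged)
    print(result)

In this made-up example two of the five merged probabilities are below the
threshold. That is exactly 40% and therefore not more than 40%, so the sketch
leaves the probabilities unchanged. A third failing point would set all of
them to 0.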