Performing a Test

Purpose of this Chapter

This chapter explains how to perform a test and map the resulting probabilities to another domain. First, a series is read from a CSV file and parsed into a pd.Series. A test is then performed on it to validate the data points of the series, and the resulting probabilities are plotted. Finally, a mapper is used to map the probabilities to validities.

Read the Data

Note

For simplicity’s sake, the data is read from a CSV file. pandas also provides methods to read data from a database, to parse the results of REST API requests, and more. It’s highly recommended to check the official pandas documentation at https://pandas.pydata.org/docs/ to find out how to read the data from your system.

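import pandas as pd
# The squeeze=True argument was removed from read_csv in pandas 2.0;
# squeeze the single-column DataFrame into a Series instead.
series = pd.read_csv("example.csv", index_col=0, parse_dates=True).squeeze("columns")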
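To try the reading step without a file on disk, the following minimal sketch parses a small in-memory CSV into a pd.Series. The column names and values are made up for illustration (only the name NO_P2b is borrowed from the plots later in this chapter), and the assumed layout is one timestamp column followed by one value column.

```python
import io

import pandas as pd

# Hypothetical CSV content: a timestamp column and a single value column.
csv_text = """timestamp,NO_P2b
2021-01-01 00:00:00,0.12
2021-01-01 00:10:00,-0.05
2021-01-01 00:20:00,0.31
"""

# Read the CSV with the first column as a datetime index ...
frame = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
# ... and turn the single-column DataFrame into a Series.
series = frame.squeeze("columns")

print(series)
```

If the file contains more than one value column, squeeze("columns") returns the DataFrame unchanged, so in that case select the desired column explicitly.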

Perform a Global Minimum Test

Let’s assume that we want to perform the Global Minimum test. Values lower than the defined minimum min_val are invalid; all other values are valid. Optionally, you can use the parameter min_lim, which defines a lower limit for doubtful values. Data points between min_val and min_lim get a linearly interpolated probability: the closer a data point is to min_val, the higher its probability, and the closer it is to min_lim, the lower its probability. To perform the test, create an instance of the class GlobalMinimumTest and pass the parameters min_val and min_lim to the constructor. You can then apply the test to the parsed data.

from autom8qc.qaqc.limit import GlobalMinimumTest
test = GlobalMinimumTest(min_val=0, min_lim=-0.072)
probabilities = test.perform(series)
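The interpolation described above can be illustrated with plain pandas. This is only a sketch of the described behaviour, not the implementation used by GlobalMinimumTest: the function name global_minimum_probabilities is hypothetical, and it assumes that values at or above min_val get probability 1, values at or below min_lim get probability 0, and missing values stay missing.

```python
import pandas as pd


def global_minimum_probabilities(series, min_val, min_lim=None):
    """Illustrative re-implementation of the behaviour described in the text."""
    if min_lim is None:
        # Without min_lim: hard threshold at min_val (assumption).
        probs = (series >= min_val).astype(float)
    else:
        # Linear interpolation between min_lim (0) and min_val (1),
        # clipped to the interval [0, 1] outside that range.
        probs = ((series - min_lim) / (min_val - min_lim)).clip(lower=0.0, upper=1.0)
    # Keep missing values missing instead of assigning them a probability.
    return probs.where(series.notna())


values = pd.Series([-0.1, -0.072, -0.036, 0.0, 1.5, float("nan")])
probabilities = global_minimum_probabilities(values, min_val=0, min_lim=-0.072)
print(probabilities)
```

With min_val=0 and min_lim=-0.072, the value -0.036 lies halfway between the two limits and therefore receives a probability of about 0.5 in this sketch.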

Plot the Results

It’s often useful to plot the results of a test. Instead of writing the plotting code yourself, you can use the method plot, which plots the results in a standardized way. Since the method is implemented in the abstract base class autom8qc.qaqc.base.QAQCTest, every test provides it. The method plot expects the following parameters:

  • series: Data points of the series

  • probabilities (optional): If the probabilities are not passed, the method calculates them itself; otherwise, the given probabilities are used

  • series_name: Name of the series

test.plot(series=series, probabilities=probabilities, series_name="NO_P2b [ppb]")
[Figure: standardized plot of the test results (test_standardized.svg)]

Map the Probabilities

In the previous section, we learned how to perform a simple test. Next, we want to map the resulting probabilities to validities. To do so, we use the StandardValidityMapper, which maps probabilities between 0.3 and 0.8 to Limited, probabilities less than or equal to 0 to Erroneous, probabilities greater than or equal to 0.8 to Good, and NaN values to Missing. To visualize the results, you can use the method plot. The method expects the following parameters:

  • data: Data points of the series

  • values: Values that should be mapped

  • mapped_values (optional): If not passed, the values are mapped before plotting

  • description: Description of the data

from autom8qc.mappers.validities import StandardValidityMapper

mapper = StandardValidityMapper(good_limit=0.8, limited_limit=0.3)
mapped_values = mapper.map(probabilities)
mapper.plot(series, mapped_values=mapped_values, description="NO_P2b [ppb]")
[Figure: plot of the mapped validities (test_mapping.svg)]
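The threshold logic of such a mapping can be sketched with plain pandas. This is not the code of StandardValidityMapper: the function map_validities is hypothetical, and it assumes that every probability below limited_limit is treated as Erroneous, whereas the description above only states this for probabilities less than or equal to 0.

```python
import pandas as pd


def map_validities(probabilities, good_limit=0.8, limited_limit=0.3):
    """Map probabilities to validity labels (illustrative sketch only)."""
    # Start with the lowest category (assumption: everything below
    # limited_limit counts as Erroneous) and overwrite it step by step.
    labels = pd.Series("Erroneous", index=probabilities.index, dtype=object)
    labels[probabilities >= limited_limit] = "Limited"
    labels[probabilities >= good_limit] = "Good"
    # NaN compares as False above, so missing values are labelled last.
    labels[probabilities.isna()] = "Missing"
    return labels


example = pd.Series([0.0, 0.3, 0.5, 0.8, 1.0, float("nan")])
validities = map_validities(example)
print(validities.tolist())
```

Applying the thresholds from lowest to highest keeps the function short, because each later assignment simply overrides the earlier, weaker label.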