autom8qc.qaqc.base

autom8qc.qaqc.base.QAQCContainer

class autom8qc.qaqc.base.QAQCContainer

This class implements a container that collects independent test groups, test sequences, and test managers. Its method perform performs every element in the container. Internally, the class manages the elements in a dictionary.

Important

This class is primarily designed to integrate the framework into an existing system. It lets you collect all tests that should be performed in one container and perform them with a single call. It also gives you easy and efficient access to the cached results.

See also

  • autom8qc.core.structures.QAQCComponent

add_element(elem, name)

Adds the given element to the container.

Raises
Parameters
  • elem (QAQCComponent) – Element

  • name (str) – Name of the component

Returns

None

Return type

None

perform()

Performs all elements in the container.

Returns

None

Return type

None
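The container idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the autom8qc implementation: SketchContainer and CountingElement are hypothetical stand-ins, and raising TypeError for elements without a perform method is an assumption, since the exceptions raised by add_element are not listed above.

```python
class SketchContainer:
    """Hypothetical stand-in for a container of performable elements."""

    def __init__(self):
        # Elements are kept in a dictionary, keyed by their name.
        self._elements = {}

    def add_element(self, elem, name):
        # Assumption: anything without a perform() method is rejected.
        if not callable(getattr(elem, "perform", None)):
            raise TypeError("element must provide a perform() method")
        self._elements[name] = elem

    def perform(self):
        # Perform every element, in insertion order.
        for elem in self._elements.values():
            elem.perform()


class CountingElement:
    """Toy element that records how often it was performed."""

    def __init__(self):
        self.calls = 0

    def perform(self):
        self.calls += 1


container = SketchContainer()
first, second = CountingElement(), CountingElement()
container.add_element(first, "limits")
container.add_element(second, "outliers")
container.perform()
print(first.calls, second.calls)  # 1 1
```

A single call to perform reaches every registered element, which is the property that makes the container convenient as an integration point.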

autom8qc.qaqc.base.QAQCTest

../_images/design_test.svg
class autom8qc.qaqc.base.QAQCTest

This class defines the interface for each QA/QC test. Each test has to provide its information (NAME, DESCRIPTION, CATEGORY, parameters and SUPPORTED_STRUCTURES). Moreover, each test has to implement the abstract method perform which expects an instance of the class BaseStructure and returns a probability (between 0 and 1) for each data point.

See also

Warning

If you inherit from this class, make sure that you call the super constructor and implement the abstract method perform.

Parameters
  • NAME (str) – Name of the test

  • DESCRIPTION (str) – Description of the test

  • CATEGORY (str) – Category of the test

  • SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)

  • parameters (ParameterList) – Supported parameters (default: None)
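The inheritance pattern described in the warning above can be sketched with the standard abc module. SketchTest and UpperLimitSketch are hypothetical names that only mirror the shape of the interface (class-level metadata, a super constructor call, an abstract perform); they use plain lists instead of BaseStructure, and the metadata check and category string are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class SketchTest(ABC):
    """Hypothetical stand-in for a test interface with metadata."""

    NAME = None
    DESCRIPTION = None
    CATEGORY = None

    def __init__(self):
        # Subclasses call this via super().__init__() so the metadata is checked.
        if not all((self.NAME, self.DESCRIPTION, self.CATEGORY)):
            raise ValueError("NAME, DESCRIPTION and CATEGORY must be set")

    @abstractmethod
    def perform(self, data):
        """Return one probability in [0, 1] per data point."""


class UpperLimitSketch(SketchTest):
    NAME = "Upper limit"
    DESCRIPTION = "Flags values above a fixed limit"
    CATEGORY = "Limit Test"  # assumed category string

    def __init__(self, limit):
        super().__init__()
        self.limit = limit

    def perform(self, data):
        # 1.0 = valid, 0.0 = invalid; the input is read, never modified.
        return [1.0 if value <= self.limit else 0.0 for value in data]


test = UpperLimitSketch(limit=30.0)
probabilities = test.perform([12.5, 31.0, 29.9])
print(probabilities)  # [1.0, 0.0, 1.0]
```

Because perform is abstract, a subclass that forgets to implement it cannot be instantiated, which catches the mistake early.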

check_category()

Checks if the CATEGORY is set and is supported by the package.

Raises
Returns

None

Return type

None

check_metadata()

Checks if the defined metadata for the test is valid.

Raises
Returns

None

Return type

None

check_structures()

Checks if the supported structures are correctly set.

Raises

InvalidType – Structure is not correctly defined

Returns

None

Return type

None

get_data(structure)

Returns the data of the structure. For example, if your test supports Series, the method will return a pd.Series. If the given data is not supported, the method will raise an error.

Raises

InvalidType – If structure is not supported

Parameters

structure (BaseStructure) – Data structure

Returns

Data of the structure

Return type

object
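The unwrap-and-validate behaviour can be illustrated with a small stand-in. SketchStructure and get_data_sketch are hypothetical; the sketch uses a plain list as the supported raw type and the built-in TypeError in place of InvalidType.

```python
class SketchStructure:
    """Hypothetical wrapper that holds raw data, like a structure class."""

    def __init__(self, data):
        self.data = data


def get_data_sketch(structure, supported_raw_type=list):
    # Wrapped input is unwrapped first; raw input is used as given.
    raw = structure.data if isinstance(structure, SketchStructure) else structure
    # Either way, the raw data must have the supported type.
    if not isinstance(raw, supported_raw_type):
        raise TypeError(f"unsupported data type: {type(raw).__name__}")
    return raw


values = [1.0, 2.0, 3.0]
unwrapped = get_data_sketch(SketchStructure(values))
passed_through = get_data_sketch(values)
print(unwrapped is values, passed_through is values)  # True True
```

Note that the data is returned as-is rather than copied, so a test working on it should not modify it in place.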

property metadata

Returns the metadata of the test in a dictionary.

Returns

Metadata of the test

Return type

dict

abstract perform(data)

Performs the test and returns the probabilities.

Warning

Make sure that you don’t overwrite data points of the data. For efficiency reasons, the data won’t be copied.

Important

Use the method get_data to access the data. The method checks if the given data is supported by the test. Moreover, it ensures that your test supports instances of the class BaseStructure and the dtype of the BaseStructure. For example, if your test supports Series, you can pass the types pd.Series and autom8qc.core.structures.Series.

Raises

NotImplementedError – This is an abstract method

Parameters

data (BaseStructure, pd.Series, pd.DataFrame) – Data points

Returns

Probabilities (1=Valid, 0=Invalid)

Return type

pd.Series

plot(series, probabilities=None, series_name=None)

This method plots the results of the test. To do so, it performs the test and plots two axes: the left axis shows the series, and the right axis shows the marked values. Data points that failed the test (i.e., 0% valid) are marked red. Data points that passed the test (i.e., 100% valid) are marked green. Optionally, you can pass the parameter probabilities to avoid performing the test twice.

Parameters
  • series (pd.Series) – Series

  • probabilities (pd.Series) – Results of the test (optional)

  • series_name (str) – Name of the series (optional)

Returns

None

Return type

None

savefig(filename, series, probabilities=None, series_name=None)

Saves the plot.

Important

The extension of the filename determines the format. For example, if you want to store an SVG figure, the filename has to be ./example.svg.

Parameters
  • filename (string) – Name of the file

  • series (pd.Series) – Series

  • probabilities (pd.Series) – Results of the test (optional)

  • series_name (str) – Name of the series (optional)

Returns

None

Return type

None
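How the extension of a filename maps to an output format can be sketched with pathlib. The helper format_from_filename is hypothetical and only illustrates the rule stated above; how savefig resolves the format internally is not documented here.

```python
from pathlib import Path


def format_from_filename(filename):
    # The suffix without its leading dot names the output format.
    suffix = Path(filename).suffix
    if not suffix:
        raise ValueError("filename needs an extension such as .svg or .png")
    return suffix[1:].lower()


print(format_from_filename("./example.svg"))  # svg
```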

autom8qc.qaqc.base.TestCategory

class autom8qc.qaqc.base.TestCategory

This class provides all categories that are supported by the framework. With this approach, you can make sure that tests of the same categories always have the same category name. Currently, the following categories are supported:

  • TestCategory.GENERAL

  • TestCategory.LIMIT_TEST

  • TestCategory.OUTLIER_TEST

  • TestCategory.FLATLINE_DETECTION

  • TestCategory.PEAK_DETECTION
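The benefit of shared category constants can be shown with a small stand-in. CategorySketch and check_category_sketch are hypothetical, the string values are assumptions (the actual values of the TestCategory members are not listed above), and only three of the five categories are reproduced.

```python
class CategorySketch:
    """Hypothetical stand-in: one shared constant per category."""

    GENERAL = "General"
    LIMIT_TEST = "Limit Test"
    OUTLIER_TEST = "Outlier Test"

    @classmethod
    def supported(cls):
        # Collect the values of all upper-case class attributes.
        return {value for key, value in vars(cls).items() if key.isupper()}


def check_category_sketch(category):
    # Free-form strings that are not one of the constants are rejected.
    if category not in CategorySketch.supported():
        raise ValueError(f"unsupported category: {category!r}")


check_category_sketch(CategorySketch.LIMIT_TEST)  # passes silently
print(sorted(CategorySketch.supported()))
```

Referring to the constant instead of typing the string avoids near-duplicates such as "Limit test" and "Limit Test" ending up as two different categories.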

autom8qc.qaqc.base.TestGroup

../_images/group_design.svg
class autom8qc.qaqc.base.TestGroup(measure, mapper=None, post_func=None, prob_rule=None, mapped_rule=None)

A TestGroup lets you perform several tests in isolation and merge the results with a measure. In contrast to a sequence or a manager, a test group isn’t bound to any data. It only performs the items (which are bound to data) and merges the results.

Warning

A test group handles the QA/QC tests in isolation and combines the results of the tests. If you want to create a sequence in which QA/QC tests depend on each other, you have to use TestSequence.

Execution Pipeline:
  1. Perform tests (required)

  2. Apply rule on the probabilities (optional)

  3. Map the probabilities to another domain (optional)

  4. Apply rule on the mapped values (optional)

  5. Postprocessing (optional)
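The five pipeline steps above can be sketched with plain functions. This is an illustration under assumptions, not the library's code: run_group_sketch and weighted_mean_measure are hypothetical, the per-test results are given directly instead of being produced by managers, and a weighted mean is only one possible measure.

```python
def weighted_mean_measure(results, weights):
    # Combine per-test probabilities point by point with a weighted mean.
    total = sum(weights)
    return [
        sum(w * probs[i] for probs, w in zip(results, weights)) / total
        for i in range(len(results[0]))
    ]


def run_group_sketch(results, weights, measure,
                     prob_rule=None, mapper=None, mapped_rule=None, post_func=None):
    # Step 1: the tests have been performed; merge their results.
    values = measure(results, weights)
    # Steps 2-5: each optional stage is applied only if it was provided.
    for step in (prob_rule, mapper, mapped_rule, post_func):
        if step is not None:
            values = step(values)
    return values


# Two isolated tests over three data points; the second test counts double.
per_test = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
merged = run_group_sketch(per_test, [1, 2], weighted_mean_measure)
flags = run_group_sketch(
    per_test, [1, 2], weighted_mean_measure,
    mapper=lambda probs: ["good" if p >= 0.5 else "bad" for p in probs],
)
print(merged)
print(flags)  # ['good', 'good', 'bad']
```

Because the tests never see each other's output, the order of the rows in per_test does not change the merged result.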

Parameters
  • mapper (BaseMapper) – Mapper that maps the total results

  • measure (BaseMeasure) – Measure to combine the results

  • post_func (BaseFunction) – Function that will be applied after the test

  • prob_rule (BaseRule) – Rule that will be applied on the probabilities

  • mapped_rule (BaseRule) – Rule that will be applied on the mapped values

add(name, weight=1, test=None)

Adds the given test to the test group. Note that test must be an instance of TestManager or TestGroup.

Raises
Parameters
  • name (str) – Name used to handle the test

  • weight (float) – Weight of the test

  • test (TestManager or TestGroup) – Manager to handle the test or other test group

Returns

None

Return type

None

add_item(item)

Adds a group item to the group.

Parameters

item (GroupItem) – Group item

property mapped_rule

Returns the rule that will be applied on the mapped values.

Returns

Rule that will be applied on the mapped values.

Return type

BaseRule

property mapper

Returns the mapper.

Returns

Mapper that will be applied on the probabilities.

Return type

BaseMapper

property measure

Returns the measure.

Returns

Measure to combine the results

Return type

BaseMeasure

perform()

Performs all tests of the test group and merges the results.

Important

The results of the execution will be cached. The group checks if cached results already exist. If so, the results will be returned without performing the pipeline.

Raises

NoItemsExist – If test group doesn’t contain any test.

Returns

Total results of the test group

Return type

pd.Series

property post_function

Returns the function that will be applied after the test.

Returns

Function that will be applied after the test.

Return type

BaseFunction

property prob_rule

Returns the rule that will be applied on the probabilities.

Returns

Rule that will be applied on the probabilities.

Return type

BaseRule

property results

Returns the results of the tests as a pd.DataFrame. If the results for a test don’t exist, the test will be performed first. Otherwise the cached results will be used.

Warning

These are the results of each individual test, not the total results. If you want to access the results that are combined with the measure, you need to use the property total_results.

Returns

Results of the tests

Return type

pd.DataFrame

property tests

Returns the names of the tests.

Returns

Names of the tests

Return type

List<str>

property total_results

Returns the combined results of the test group.

Note

If you want to access the results of each test, you have to use the property results.

Returns

Total results

Return type

pd.Series
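The difference between results and total_results can be made concrete with a toy example. The dictionary below is a plain stand-in for the per-test table, and taking the lowest probability per data point is an assumed measure chosen for illustration; a real group uses whichever measure was passed to its constructor.

```python
# Per-test probabilities, keyed by test name (a plain-dict stand-in for a table).
results = {
    "limits": [1.0, 0.2, 1.0],
    "outliers": [0.9, 1.0, 0.4],
}

# One combined value per data point; the worst (lowest) probability is used here.
total_results = [min(column) for column in zip(*results.values())]

print(total_results)  # [0.9, 0.2, 0.4]
```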

autom8qc.qaqc.base.TestManager

../_images/manager_design.svg
class autom8qc.qaqc.base.TestManager(data, test, mapper=None, pre_func=None, post_func=None, prob_rule=None, mapped_rule=None, filter_options=None)

A TestManager simplifies the execution of a test and caches the results. Each manager expects an instance of the class BaseStructure to handle the data in a standardized way. Moreover, you have to define the test that should be executed (e.g., Global Minimum Test). Optionally, you can specify a mapper that maps the probabilities to other values (e.g., validities). You can also define a pre-processing function that is applied before the test is executed (e.g., linear interpolation) and a post-processing function that is applied to the results (e.g., filling gaps). In addition, you can define rules that are applied to the probabilities and to the mapped values.

Execution Pipeline:
  1. Preprocessing (optional)

  2. Perform test (required)

  3. Apply rule on the probabilities (optional)

  4. Map the probabilities to another domain (optional)

  5. Apply rule on the mapped values (optional)

  6. Postprocessing (optional)
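The six steps above can be sketched as a chain of optional callables. The function run_manager_sketch and the helpers fill_gaps and upper_limit_test are hypothetical and operate on plain lists; the forward-fill preprocessing is a toy replacement for something like linear interpolation.

```python
def run_manager_sketch(data, test, pre_func=None, prob_rule=None,
                       mapper=None, mapped_rule=None, post_func=None):
    if pre_func is not None:
        data = pre_func(data)                 # 1. preprocessing
    values = test(data)                       # 2. perform test
    for step in (prob_rule, mapper, mapped_rule, post_func):
        if step is not None:                  # 3-6. optional stages, in order
            values = step(values)
    return values


def fill_gaps(data):
    # Toy preprocessing: replace missing points with the previous value.
    filled, last = [], None
    for value in data:
        last = value if value is not None else last
        filled.append(last)
    return filled


def upper_limit_test(data, limit=30.0):
    # 1.0 = valid, 0.0 = invalid.
    return [1.0 if value <= limit else 0.0 for value in data]


validities = run_manager_sketch(
    [12.0, None, 45.0],
    upper_limit_test,
    pre_func=fill_gaps,
    mapper=lambda probs: ["valid" if p == 1.0 else "invalid" for p in probs],
)
print(validities)  # ['valid', 'valid', 'invalid']
```

When no mapper is given, the same call returns the raw probabilities, which matches the "probabilities or mapped values" return described for perform.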

Parameters
  • data (BaseStructure) – Data (e.g., time series)

  • test (QAQCTest) – Test

  • mapper (BaseMapper) – Mapper to map the probabilities

  • probabilities (pd.Series) – Probabilities

  • mapped_values (pd.Series) – Mapped values

  • pre_function (BaseFunction) – Function that will be applied before the test

  • post_function (BaseFunction) – Function that will be applied after the test

  • prob_rule (BaseRule) – Rule that will be applied on the probabilities

  • mapped_rule (BaseRule) – Rule that will be applied on the mapped values

  • filter_options (dict) – Options to filter the data

clear()

Clears the cache.

Returns

None

Return type

None

property mapped_rule

Returns the rule that will be applied on the mapped values.

Returns

Rule that will be applied on the mapped values.

Return type

BaseRule

property mapper

Returns the mapper.

Returns

Mapper that will be applied on the probabilities.

Return type

BaseMapper

perform()

Performs the execution pipeline and returns the results.

Important

The results of the execution will be cached. The manager checks if cached results already exist. If so, the results will be returned without performing the pipeline.

Returns

Probabilities or mapped values

Return type

pd.Series
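The interplay between cached results and clear can be sketched as follows. CachingRunnerSketch is a hypothetical stand-in that counts how often the pipeline actually runs; it does not reflect how the manager stores its cache internally.

```python
class CachingRunnerSketch:
    """Hypothetical stand-in showing result caching with an explicit reset."""

    def __init__(self, pipeline):
        self._pipeline = pipeline
        self._cache = None
        self.runs = 0

    def perform(self):
        # Run the pipeline only when nothing is cached yet.
        if self._cache is None:
            self._cache = self._pipeline()
            self.runs += 1
        return self._cache

    def clear(self):
        # Drop the cached results so the next perform() recomputes them.
        self._cache = None


runner = CachingRunnerSketch(lambda: [1.0, 0.0, 1.0])
runner.perform()
runner.perform()          # served from the cache
runs_before_clear = runner.runs
runner.clear()
runner.perform()          # recomputed
print(runs_before_clear, runner.runs)  # 1 2
```

In this model, changing an input after the first call has no effect until the cache is cleared, which is the main thing to keep in mind when reusing a manager.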

property post_function

Returns the function that will be applied after the test.

Returns

Function that will be applied after the test.

Return type

BaseFunction

property pre_function

Returns the function that will be applied before the test.

Returns

Function that will be applied before the test.

Return type

BaseFunction

property prob_rule

Returns the rule that will be applied on the probabilities.

Returns

Rule that will be applied on the probabilities.

Return type

BaseRule

property results

Returns the results of the test.

Returns

Results of the test

Return type

pd.Series

autom8qc.qaqc.base.TestSequence

../_images/sequence_design.svg
class autom8qc.qaqc.base.TestSequence(data, mapper=None, pre_func=None, post_func=None, prob_rule=None, mapped_rule=None, measure=None, filter_options=None)

A TestSequence allows you to create a sequence of several tests that build on each other. If a data point fails a test, it is not passed on to the next test. A data point fails a test if its probability is lower than the defined threshold. With this approach, you can ensure that invalid data points don’t affect the next test. This data structure is highly recommended especially for tests that consider the global range of the data (e.g., autom8qc.qaqc.outlier.LOFTest).

Important

If you don’t pass a measure to the constructor, the autom8qc.measures.probabilities.WorstProbabilityMeasure will be used.

Warning

A test group handles the QA/QC tests in isolation and combines the results of the tests. If your tests do not depend on each other, use a TestGroup instead.
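The filtering behaviour of a sequence can be sketched on plain lists. Everything here is a hypothetical illustration: run_sequence_sketch, range_test, and distance_to_mean_test are not part of the package, and merging by the lowest probability a point received is modelled on the worst-probability default named above without reproducing its exact semantics.

```python
def run_sequence_sketch(data, stages):
    """stages: list of (test, threshold); test maps values to probabilities."""
    active = list(range(len(data)))           # indices still in the sequence
    per_point = [[] for _ in data]            # probabilities collected per point
    for test, threshold in stages:
        probs = test([data[i] for i in active])
        survivors = []
        for index, prob in zip(active, probs):
            per_point[index].append(prob)
            if prob >= threshold:             # below threshold: point drops out
                survivors.append(index)
        active = survivors
    # Worst-probability merge over the stages each point actually reached.
    return [min(probs) for probs in per_point]


def range_test(values):
    return [1.0 if 0.0 <= v <= 50.0 else 0.0 for v in values]


def distance_to_mean_test(values):
    # Toy global test: points far from the mean of the passed values score lower.
    if not values:
        return []
    mean = sum(values) / len(values)
    return [1.0 if abs(v - mean) <= 5.0 else 0.5 for v in values]


data = [10.0, 11.0, 999.0, 12.0]
total = run_sequence_sketch(data, [(range_test, 0.5), (distance_to_mean_test, 0.5)])
print(total)  # [1.0, 1.0, 0.0, 1.0]
```

Running distance_to_mean_test on the unfiltered data would pull the mean towards 999.0 and downgrade every point; removing the invalid point first keeps the second stage meaningful.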

Execution Pipeline:
  1. Preprocessing (optional)

  2. Perform tests (required)

  3. Apply rule on the probabilities (optional)

  4. Map the probabilities to another domain (optional)

  5. Apply rule on the mapped values (optional)

  6. Postprocessing (optional)

Parameters
  • data (BaseStructure) – Data that shall be tested

  • mapper (BaseMapper) – Mapper to map the final results

  • pre_function (BaseFunction) – Function that will be applied before the sequence

  • post_function (BaseFunction) – Function that will be applied after the sequence

  • prob_rule (BaseRule) – Rule that will be applied on the probabilities

  • mapped_rule (BaseRule) – Rule that will be applied on the mapped values

  • measure (BaseMeasure) – Measure for the total result

  • filter_options (dict) – Options to filter the data

add_item(item, stage=None)

Adds an item to the sequence.

Parameters

item (SequenceItem) – Item that should be added

Returns

None

Return type

None

add_test(test, threshold, name, stage=None, weight=1)

Adds the test to the sequence. Each test needs a threshold (between 0 and 1) to filter the good values. If a probability is lower than the defined threshold, the data point won’t pass to the next test. Optionally, you can use the parameter stage to set the position of the test. If the stage is not set, the test will be appended.

Warning

The sequence is zero-based. If you want to add a new test at the first position, you have to pass 0 for the stage.

Raises
Parameters
  • test (QAQCTest) – Test that should be performed

  • threshold (float) – Threshold to filter the good points (between 0 and 1)

  • name (str) – Name of the test

  • stage (int) – Stage (position) of the test (optional)

  • weight (int) – Weight of the test

Returns

None

Return type

None
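The zero-based stage handling can be illustrated with a plain list. The helper add_test_sketch is hypothetical, stores only the name and threshold, and raises ValueError for out-of-range thresholds as an assumption, since the exceptions raised by add_test are not listed above.

```python
def add_test_sketch(sequence, name, threshold, stage=None):
    # Assumption: thresholds outside [0, 1] are rejected.
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    item = (name, threshold)
    if stage is None:
        sequence.append(item)          # no stage given: append at the end
    else:
        sequence.insert(stage, item)   # zero-based: stage=0 is the first position


sequence = []
add_test_sketch(sequence, "outliers", 0.5)
add_test_sketch(sequence, "flatline", 0.8)
add_test_sketch(sequence, "limits", 0.9, stage=0)
order = [name for name, _ in sequence]
print(order)  # ['limits', 'outliers', 'flatline']
```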

property mapped_rule

Returns the rule that will be applied on the mapped values.

Returns

Rule that will be applied on the mapped values.

Return type

BaseRule

property mapper

Returns the mapper.

Returns

Mapper that will be applied on the probabilities.

Return type

BaseMapper

property measure

Returns the measure.

Returns

Measure to combine the results

Return type

BaseMeasure

perform()

Performs the sequence and returns the results. If the probabilities are mapped, the mapped values are returned.

Important

The results of the execution will be cached. The sequence checks if cached results already exist. If so, the results will be returned without performing the pipeline.

Raises

NoItemsExist – Sequence does not contain any test

Returns

Probabilities or mapped values

Return type

pd.Series

plot(min_val=None, max_val=None)

This method plots the results of the test sequence.

Returns

None

Return type

None

property post_function

Returns the function that will be applied after the test.

Returns

Function that will be applied after the test.

Return type

BaseFunction

property pre_function

Returns the function that will be applied before the test.

Returns

Function that will be applied before the test.

Return type

BaseFunction

property prob_rule

Returns the rule that will be applied on the probabilities.

Returns

Rule that will be applied on the probabilities.

Return type

BaseRule

property results

Returns the results of the test.

Returns

Results of the test

Return type

pd.Series

property sequence_results

Returns the results of all stages in a pd.DataFrame.

Returns

Results of all stages

Return type

pd.DataFrame