autom8qc.qaqc.general

IncreasingTimeTest

Class

class autom8qc.qaqc.general.IncreasingTimeTest(index=True, timestamps=True)

Bases: autom8qc.qaqc.base.QAQCTest

This class implements a test that checks whether the time index of the series or dataframe is strictly increasing. If a timestamp is less than or equal to the previous timestamp, its probability is set to 0.

Parameters
  • NAME (str) – Name of the test

  • DESCRIPTION (str) – Description of the test

  • CATEGORY (str) – Category of the test

  • SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)

  • parameters (ParameterList) – Supported parameters (default: None)

Supported parameters:
  • index (bool): If True (default), the index is checked. If False, the data values are checked.

  • timestamps (bool): If True (default), the times are timestamps. If False, the times are given as numbers (e.g., seconds).
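
The rule above can be sketched in plain pandas. This is a minimal illustration, not autom8qc's implementation: the function name increasing_time_probabilities is hypothetical, and treating the first point as valid is an assumption, since it has no predecessor to compare against. Because the > comparison works for datetimes and plain numbers alike, the sketch needs no separate timestamps switch.

```python
import numpy as np
import pandas as pd


def increasing_time_probabilities(series: pd.Series, index: bool = True) -> pd.Series:
    """Hypothetical sketch of the IncreasingTimeTest rule (not autom8qc code)."""
    # Times to check: the index (index=True) or the values themselves (index=False).
    times = series.index.to_series() if index else series
    # Strictly greater than the immediately preceding time -> 1, otherwise 0.
    # Comparisons against the NaT/NaN produced by shift() evaluate to False.
    probs = (times > times.shift(1)).astype(float)
    # Assumption: the first point has no predecessor and is treated as valid.
    if len(probs) > 0:
        probs.iloc[0] = 1.0
    return pd.Series(probs.to_numpy(), index=series.index)


# Same time offsets as in the example below: 59 s comes after 60 s.
seconds = np.array([0, 60, 59, 183, 249, 309])
index = pd.Timestamp("2021-01-01") + pd.to_timedelta(seconds, unit="s")
series = pd.Series(np.arange(6, dtype=float), index=index)
print(increasing_time_probabilities(series).tolist())
# [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
```

Only the point that goes backwards in time is flagged; the next point (183 s) is compared with its immediate predecessor (59 s) and is therefore valid again.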

perform(data)

Performs the test and returns the probabilities.

Raises

InvalidType – If structure of the given data is not supported

Parameters

data (BaseStructure, pd.Series, pd.DataFrame) – Data points

Returns

Probabilities (1=Valid, 0=Invalid)

Return type

pd.Series

static supported_parameters()

Returns the supported parameters.

Returns

Supported parameters

Return type

ParameterList

Example

# Generate sample data
import datetime
import numpy as np
import pandas as pd
np.random.seed(42)
seconds = np.array([0, 60, 59, 183, 249, 309])
date = datetime.datetime(year=2021, month=1, day=1)
index = date + pd.to_timedelta(seconds, unit="s")
series = pd.Series(np.random.rand(6) * 100, index=index)

# Perform test and plot the results
from autom8qc.qaqc.general import IncreasingTimeTest
test = IncreasingTimeTest()
test.plot(series=series, series_name="Example")

Visualization

[Figure: IncreasingTimeTest visualization of the example series]

IsNaNTest

Class

class autom8qc.qaqc.general.IsNaNTest(valid=False)

Bases: autom8qc.qaqc.base.QAQCTest

This class implements a test that checks for NaN values.

Parameters
  • NAME (str) – Name of the test

  • DESCRIPTION (str) – Description of the test

  • CATEGORY (str) – Category of the test

  • SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)

  • parameters (ParameterList) – Supported parameters (default: None)

Supported parameters:
  • valid (bool): Defines whether NaN values are valid or invalid. If True, NaN values get probability 1 and non-NaN values get 0. If False (default), NaN values get probability 0 and non-NaN values get 1.
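
The effect of the valid parameter can be illustrated with a few lines of pandas. This is a minimal sketch of the documented mapping, not the library's code; the function name isnan_probabilities is hypothetical.

```python
import numpy as np
import pandas as pd


def isnan_probabilities(series: pd.Series, valid: bool = False) -> pd.Series:
    """Hypothetical sketch of the IsNaNTest mapping (not autom8qc code)."""
    # valid=False (default): NaN -> 0, other values -> 1
    # valid=True:            NaN -> 1, other values -> 0
    mask = series.isna() if valid else series.notna()
    return mask.astype(float)


series = pd.Series([1.5, np.nan, 3.0, np.nan])
print(isnan_probabilities(series).tolist())              # [1.0, 0.0, 1.0, 0.0]
print(isnan_probabilities(series, valid=True).tolist())  # [0.0, 1.0, 0.0, 1.0]
```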

SUPPORTED_STRUCTURES

alias of autom8qc.core.structures.Series

perform(data)

Performs the test and returns the probabilities.

Raises

InvalidType – If structure of the given data is not supported

Parameters

data (BaseStructure, pd.Series, pd.DataFrame) – Data points

Returns

Probabilities (1=Valid, 0=Invalid)

Return type

pd.Series

static supported_parameters()

Returns the supported parameters.

Returns

Supported parameters

Return type

ParameterList

Example

# Generate sample data
import numpy as np
import pandas as pd
mu, sigma = 50, 5
values = np.random.normal(mu, sigma, 1000)
values[[42, 100, 230, 456, 666, 667, 668]] = np.nan
index = pd.date_range(start="1/1/2021", periods=1000, freq="min")
series = pd.Series(values, index=index)

# Perform test
from autom8qc.qaqc.general import IsNaNTest
test = IsNaNTest()
probabilities = test.perform(series)
print(probabilities)

SpecificValueTest

Class

class autom8qc.qaqc.general.SpecificValueTest(value, valid=True)

Bases: autom8qc.qaqc.base.QAQCTest

This class implements a test that checks for a specific value given by the user. Optionally, the user can define whether this value is valid or invalid. If the value is valid, only data points with that value get a probability of 1 (100%); all other data points are invalid and get 0. If the value is invalid, data points with that value get 0 and all other data points get 1.

Note

The test supports only instances of pd.Series.

Parameters
  • NAME (str) – Name of the test

  • DESCRIPTION (str) – Description of the test

  • CATEGORY (str) – Category of the test

  • parameters (ParameterList) – Supported parameters

Supported parameters:
  • value (float): Value to check for

  • valid (bool): If True, data points with the value are valid. If False, data points with the value are invalid. (default: True)
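
The behaviour described above, together with the NaN handling documented for perform(), can be sketched as follows. This is an illustrative re-implementation under those documented rules, not autom8qc's code; the function name specific_value_probabilities is hypothetical.

```python
import numpy as np
import pandas as pd


def specific_value_probabilities(series: pd.Series, value: float, valid: bool = True) -> pd.Series:
    """Hypothetical sketch of the SpecificValueTest rule (not autom8qc code)."""
    matches = series == value
    # valid=True: matching points -> 1, others -> 0; valid=False inverts this.
    probs = (matches if valid else ~matches).astype(float)
    # NaN input values get a NaN probability, as described for perform().
    return probs.where(series.notna())


series = pd.Series([0, 1, 2, np.nan, 1])
print(specific_value_probabilities(series, value=1).tolist())
# [0.0, 1.0, 0.0, nan, 1.0]
print(specific_value_probabilities(series, value=1, valid=False).tolist())
# [1.0, 0.0, 1.0, nan, 0.0]
```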

SUPPORTED_STRUCTURES

alias of autom8qc.core.structures.Series

perform(data)

Performs the test and returns the probabilities. If the value of a data point is NaN, its probability is also NaN.

Raises

InvalidType – If structure of the given data is not supported

Parameters

data (BaseStructure) – Data points

Returns

Probabilities

Return type

pd.Series

static supported_parameters()

Returns the supported parameters.

Returns

Supported parameters

Return type

ParameterList

Example

# Generate sample data
import numpy as np
import pandas as pd
np.random.seed(42)
values = np.random.randint(3, size=500)
index = pd.date_range(start="1/1/2021", periods=500, freq="min")
series = pd.Series(values, index=index)

# Perform test
from autom8qc.qaqc.general import SpecificValueTest
test = SpecificValueTest(value=1, valid=True)
test.plot(series=series, series_name="Example")

Visualization

[Figure: SpecificValueTest visualization of the example series]

TimeGapsTest

Class

class autom8qc.qaqc.general.TimeGapsTest(distance, delta=5)

Bases: autom8qc.qaqc.base.QAQCTest

This test checks for time gaps that are too long or too short. For example, if your measurements are recorded every minute, you expect consecutive points to be 60 seconds apart; in that case, set the parameter distance to 60. Optionally, the parameter delta defines a tolerance range (distance ± delta). Timestamps outside the range are invalid. Inside the range, the probability is linearly interpolated: the further the gap deviates from the expected distance, the lower the probability. If you don't want interpolated probabilities, set delta=1.

Warning

Gaps of exactly distance - delta or distance + delta have a probability of 0. Only gaps strictly between these limits have a positive probability.

Important

If a gap is too long or too short, the probability of the second data point of the pair is set to 0, not that of the first.
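
The interpolation rule can be made concrete with a short pandas sketch. It assumes the ramp is linear from probability 1 at exactly distance to 0 at distance ± delta, which matches the Warning above but is not confirmed as the library's exact formula; treating the first point (which has no preceding gap) as valid is also an assumption, and the function name time_gap_probabilities is hypothetical.

```python
import numpy as np
import pandas as pd


def time_gap_probabilities(series: pd.Series, distance: int, delta: int = 5) -> pd.Series:
    """Hypothetical sketch of the TimeGapsTest rule (not autom8qc code)."""
    # Gap in seconds between each point and the point before it.
    gaps = series.index.to_series().diff().dt.total_seconds()
    # Assumed linear ramp: 1 at the expected distance, 0 at distance +- delta and beyond.
    probs = (1.0 - (gaps - distance).abs() / delta).clip(lower=0.0)
    # Each gap is attributed to the second point of the pair.
    # Assumption: the first point has no gap and is treated as valid.
    if len(probs) > 0:
        probs.iloc[0] = 1.0
    return pd.Series(probs.to_numpy(), index=series.index)


# Same time offsets as in the example below: gaps are 60, 63, 60, 66, 60 s.
seconds = np.array([0, 60, 123, 183, 249, 309])
index = pd.Timestamp("2021-01-01") + pd.to_timedelta(seconds, unit="s")
series = pd.Series(np.arange(6, dtype=float), index=index)
print(time_gap_probabilities(series, distance=60, delta=5).round(2).tolist())
# [1.0, 1.0, 0.4, 1.0, 0.0, 1.0]
```

Under this assumption, the 63 s gap is 3 s off, giving 1 - 3/5 = 0.4, while the 66 s gap lies outside the ±5 s range and gives 0.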

Parameters
  • NAME (str) – Name of the test

  • DESCRIPTION (str) – Description of the test

  • CATEGORY (str) – Category of the test

  • SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)

  • parameters (ParameterList) – Supported parameters (default: None)

Supported parameters:
  • distance (int): Expected distance between consecutive points in seconds

  • delta (int): Tolerance limit in seconds (default: 5)

perform(data)

Performs the test and returns the probabilities.

Raises

InvalidType – If structure is not supported

Parameters

data (pd.Series or Series) – Data points

Returns

Probabilities

Return type

pd.Series

static supported_parameters()

Returns the supported parameters.

Returns

Supported parameters

Return type

ParameterList

Example

# Generate sample data
import datetime
import numpy as np
import pandas as pd
seconds = np.array([0, 60, 123, 183, 249, 309])
date = datetime.datetime(year=2021, month=1, day=1)
index = date + pd.to_timedelta(seconds, unit="s")
series = pd.Series(np.random.rand(6) * 100, index=index)

# Perform test
from autom8qc.qaqc.general import TimeGapsTest
test = TimeGapsTest(distance=60, delta=5)
test.plot(series=series, series_name="Example")

Visualization

[Figure: TimeGapsTest visualization of the example series]

TimeRangeTest

Class

class autom8qc.qaqc.general.TimeRangeTest(start, end)

Bases: autom8qc.qaqc.base.QAQCTest

This class implements a test that checks whether the timestamps lie inside the defined time range. If a data point is not in the time range, the point is invalid; otherwise, the data point is valid.

Parameters
  • NAME (str) – Name of the test

  • DESCRIPTION (str) – Description of the test

  • CATEGORY (str) – Category of the test

  • SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)

  • parameters (ParameterList) – Supported parameters (default: None)

Supported parameters:
  • start (datetime): Start of the time range

  • end (datetime): End of the time range
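
A time-range check like this reduces to two comparisons on the index. The sketch below assumes both limits are inclusive, which the description does not state explicitly; the function name time_range_probabilities is hypothetical and this is not autom8qc's implementation.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def time_range_probabilities(series: pd.Series, start: datetime, end: datetime) -> pd.Series:
    """Hypothetical sketch of the TimeRangeTest rule (not autom8qc code)."""
    # Assumption: both start and end are inclusive.
    inside = (series.index >= start) & (series.index <= end)
    return pd.Series(inside.astype(float), index=series.index)


# Same range as in the example below: ten daily points, 3 to 8 January is valid.
index = pd.date_range(start="1/1/2021", end="1/10/2021")
series = pd.Series(np.arange(10, dtype=float), index=index)
probs = time_range_probabilities(series, datetime(2021, 1, 3), datetime(2021, 1, 8))
print(probs.tolist())
# [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```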

perform(data)

Performs the test and returns the probabilities.

Raises

InvalidType – If structure of the given data is not supported

Parameters

data (BaseStructure, pd.Series, pd.DataFrame) – Data points

Returns

Probabilities (1=Valid, 0=Invalid)

Return type

pd.Series

static supported_parameters()

Returns the supported parameters.

Returns

Supported parameters

Return type

ParameterList

Example

# Generate sample data
from datetime import datetime
import numpy as np
import pandas as pd
index = pd.date_range(start="1/1/2021", end="1/10/2021")
series = pd.Series(np.random.rand(10) * 100, index=index)

# Perform test
from autom8qc.qaqc.general import TimeRangeTest
test = TimeRangeTest(start=datetime(2021, 1, 3), end=datetime(2021, 1, 8))
test.plot(series=series, series_name="Example")

Visualization

[Figure: TimeRangeTest visualization of the example series]