autom8qc.qaqc.general
IncreasingTimeTest
Class
- class autom8qc.qaqc.general.IncreasingTimeTest(index=True, timestamps=True)
Bases:
autom8qc.qaqc.base.QAQCTest
This class implements a test that checks if the time index of the series or dataframe is strictly monotonically increasing. If a timestamp is less than or equal to the previous timestamp, its probability is set to 0.
- Parameters
NAME (str) – Name of the test
DESCRIPTION (str) – Description of the test
CATEGORY (str) – Category of the test
SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)
parameters (ParameterList) – Supported parameters (default: None)
- Supported parameters:
index (bool): True if the index should be checked (default). False if the data values should be checked.
timestamps (bool): True if the time values are timestamps (default). False if the time is given as numbers (e.g., seconds).
- perform(data)
Performs the test and returns the probabilities.
- Raises
InvalidType – If structure of the given data is not supported
- Parameters
data (BaseStructure, pd.Series, pd.DataFrame) – Data points
- Returns
Probabilities (1=Valid, 0=Invalid)
- Return type
pd.Series
- static supported_parameters()
Returns the supported parameters.
- Returns
Supported parameters
- Return type
ParameterList
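The check described above can be approximated in plain pandas. This is a minimal sketch, not the library's implementation: the helper name increasing_time_probabilities is hypothetical, and treating the first point as valid (it has no predecessor) is an assumption.

```python
import numpy as np
import pandas as pd


def increasing_time_probabilities(index: pd.DatetimeIndex) -> pd.Series:
    """Return 1.0 where a timestamp is later than its predecessor, else 0.0."""
    # Difference to the previous timestamp; the first entry is NaT.
    deltas = index.to_series().diff()
    # Assumption: the first point has no predecessor and counts as valid.
    increasing = (deltas > pd.Timedelta(0)) | deltas.isna()
    return increasing.astype(float)


seconds = np.array([0, 60, 59, 183, 249, 309])
index = pd.Timestamp("2021-01-01") + pd.to_timedelta(seconds, unit="s")
probabilities = increasing_time_probabilities(index)
print(probabilities.tolist())  # [1.0, 1.0, 0.0, 1.0, 1.0, 1.0]
```

Only the third point (59 s after 60 s) is flagged, because it is earlier than its predecessor.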
Example
# Generate sample data
import datetime
import numpy as np
import pandas as pd
np.random.seed(42)
seconds = np.array([0, 60, 59, 183, 249, 309])
date = datetime.datetime(year=2021, month=1, day=1)
index = date + pd.to_timedelta(seconds, unit="s")
series = pd.Series(np.random.rand(6) * 100, index=index)
# Perform test and plot the results
from autom8qc.qaqc.general import IncreasingTimeTest
test = IncreasingTimeTest()
test.plot(series=series, series_name="Example")
Visualization
IsNaNTest
Class
- class autom8qc.qaqc.general.IsNaNTest(valid=False)
Bases:
autom8qc.qaqc.base.QAQCTest
This class implements a test that checks for NaN-values.
- Parameters
NAME (str) – Name of the test
DESCRIPTION (str) – Description of the test
CATEGORY (str) – Category of the test
SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)
parameters (ParameterList) – Supported parameters (default: None)
- Supported parameters:
valid (bool): Defines whether NaN values are treated as valid. True: NaN values get probability 1 and all other values get probability 0. False (default): NaN values get probability 0 and all other values get probability 1.
- SUPPORTED_STRUCTURES
alias of
autom8qc.core.structures.Series
- perform(data)
Performs the test and returns the probabilities.
- Raises
InvalidType – If structure of the given data is not supported
- Parameters
data (BaseStructure, pd.Series, pd.DataFrame) – Data points
- Returns
Probabilities (1=Valid, 0=Invalid)
- Return type
pd.Series
- static supported_parameters()
Returns the supported parameters.
- Returns
Supported parameters
- Return type
ParameterList
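The effect of the valid parameter can be illustrated with a short pandas sketch. It mirrors the documented mapping only; the helper name isnan_probabilities is hypothetical and is not part of autom8qc.

```python
import numpy as np
import pandas as pd


def isnan_probabilities(series: pd.Series, valid: bool = False) -> pd.Series:
    """Map NaN values to 0.0 (valid=False) or to 1.0 (valid=True)."""
    is_nan = series.isna()
    # valid=True marks the NaN values as valid; otherwise the non-NaN values.
    flags = is_nan if valid else ~is_nan
    return flags.astype(float)


series = pd.Series([1.0, np.nan, 3.0])
default_probs = isnan_probabilities(series)
inverted_probs = isnan_probabilities(series, valid=True)
print(default_probs.tolist())   # [1.0, 0.0, 1.0]
print(inverted_probs.tolist())  # [0.0, 1.0, 0.0]
```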
Example
# Generate sample data
import numpy as np
import pandas as pd
np.random.seed(42)
mu, sigma = 50, 5
values = np.random.normal(mu, sigma, 1000)
values[[42, 100, 230, 456, 666, 667, 668]] = np.nan
index = pd.date_range(start="1/1/2021", periods=1000, freq="min")
series = pd.Series(values, index=index)
# Perform test
from autom8qc.qaqc.general import IsNaNTest
test = IsNaNTest()
probabilities = test.perform(series)
print(probabilities)
SpecificValueTest
Class
- class autom8qc.qaqc.general.SpecificValueTest(value, valid=True)
Bases:
autom8qc.qaqc.base.QAQCTest
This class implements a test that checks for a specific value given by the user. Optionally, the user can define whether that value is valid or invalid. If the value is valid, only data points with that value have a probability of 100%; all other data points are invalid and have a probability of 0%. If the value is invalid, data points with that value have a probability of 0% and all others have a probability of 100%.
Note
The test supports only instances of pd.Series
- Parameters
NAME (str) – Name of the test
DESCRIPTION (str) – Description of the test
CATEGORY (str) – Category of the test
parameters (ParameterList) – Supported parameters
- Supported parameters:
value (float): Value
valid (bool): If True, data points with the value are valid. If False, data points with the value are invalid. (default: True)
- SUPPORTED_STRUCTURES
alias of
autom8qc.core.structures.Series
- perform(data)
Performs the test and returns the probabilities. If a data point is NaN, its probability is also NaN.
- Raises
InvalidType – If structure of the given data is not supported
- Parameters
data (BaseStructure) – Data points
- Returns
Probabilities
- Return type
pd.Series
- static supported_parameters()
Returns the supported parameters.
- Returns
Supported parameters
- Return type
ParameterList
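The behaviour described for value and valid, including the NaN propagation noted under perform, can be sketched in pandas as follows. The helper name specific_value_probabilities is hypothetical; this is an illustration of the documented rules, not the library's code.

```python
import numpy as np
import pandas as pd


def specific_value_probabilities(
    series: pd.Series, value: float, valid: bool = True
) -> pd.Series:
    """Return 1.0/0.0 depending on whether points equal `value`; NaN stays NaN."""
    matches = series == value
    # valid=True: matching points are valid; valid=False: matching points are invalid.
    flags = matches if valid else ~matches
    # Keep NaN inputs as NaN in the output instead of forcing them to 0 or 1.
    return flags.astype(float).where(series.notna())


series = pd.Series([0, 1, np.nan, 2, 1])
valid_probs = specific_value_probabilities(series, value=1, valid=True)
invalid_probs = specific_value_probabilities(series, value=1, valid=False)
print(valid_probs.tolist())    # [0.0, 1.0, nan, 0.0, 1.0]
print(invalid_probs.tolist())  # [1.0, 0.0, nan, 1.0, 0.0]
```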
Example
# Generate sample data
import numpy as np
import pandas as pd
np.random.seed(42)
values = np.random.randint(3, size=500)
index = pd.date_range(start="1/1/2021", periods=500, freq="min")
series = pd.Series(values, index=index)
# Perform test
from autom8qc.qaqc.general import SpecificValueTest
test = SpecificValueTest(value=1, valid=True)
test.plot(series=series, series_name="Example")
Visualization
TimeGapsTest
Class
- class autom8qc.qaqc.general.TimeGapsTest(distance, delta=5)
Bases:
autom8qc.qaqc.base.QAQCTest
This test checks for time gaps that are too long or too short. For example, if your measurements are recorded every minute, you expect the distance between consecutive points to be 60 seconds. In that case, set the parameter distance to 60. Optionally, you can use the parameter delta, which defines a tolerance range (distance ± delta). Timestamps outside the range are invalid. Inside the range, the probability is linearly interpolated: the further a timestamp is from the expected timestamp, the lower its probability. If you don’t want interpolated probabilities, set delta=1.
Warning
The probability at the limits themselves (distance - delta and distance + delta) is 0. Only values strictly between the limits have a positive probability.
Important
If a gap is too long or too short, the probability of the second point of the gap is set to 0, not that of the first point.
- Parameters
NAME (str) – Name of the test
DESCRIPTION (str) – Description of the test
CATEGORY (str) – Category of the test
SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)
parameters (ParameterList) – Supported parameters (default: None)
- Supported parameters:
distance (int): Expected distance between the points in seconds
delta (int): Tolerance limit in seconds (default: 5)
- perform(data)
Performs the test and returns the probabilities.
- Raises
InvalidType – If structure is not supported
- Parameters
data (pd.Series or Series) – Series
- Returns
Probabilities
- Return type
pd.Series
- static supported_parameters()
Returns the supported parameters.
- Returns
Supported parameters
- Return type
ParameterList
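One way to read the interpolation rule above is as a triangular function of the gap: probability 1 at exactly distance, falling linearly to 0 at distance ± delta. The sketch below implements that reading in pandas. The helper name time_gap_probabilities, the exact shape of the interpolation, and the probability of 1 for the first point (which has no preceding gap) are assumptions, not confirmed library behaviour.

```python
import numpy as np
import pandas as pd


def time_gap_probabilities(
    index: pd.DatetimeIndex, distance: int, delta: int = 5
) -> pd.Series:
    """Score each point by how close the gap to its predecessor is to `distance`."""
    # Gap to the previous timestamp in seconds; NaN for the first point.
    gaps = index.to_series().diff().dt.total_seconds()
    deviation = (gaps - distance).abs()
    # Linear falloff: 1 at the expected distance, 0 at or beyond distance +/- delta.
    probabilities = (1.0 - deviation / delta).clip(lower=0.0)
    # Assumption: the first point has no gap to evaluate and counts as valid.
    return probabilities.fillna(1.0)


seconds = np.array([0, 60, 123, 183, 249, 309])
index = pd.Timestamp("2021-01-01") + pd.to_timedelta(seconds, unit="s")
probabilities = time_gap_probabilities(index, distance=60, delta=5)
print(probabilities.round(2).tolist())  # [1.0, 1.0, 0.4, 1.0, 0.0, 1.0]
```

The 63-second gap deviates by 3 seconds and scores 1 - 3/5 = 0.4; the 66-second gap lies outside the tolerance, so the second point of that gap scores 0.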
Example
# Generate sample data
import datetime
import numpy as np
import pandas as pd
np.random.seed(42)
seconds = np.array([0, 60, 123, 183, 249, 309])
date = datetime.datetime(year=2021, month=1, day=1)
index = date + pd.to_timedelta(seconds, unit="s")
series = pd.Series(np.random.rand(6) * 100, index=index)
# Perform test
from autom8qc.qaqc.general import TimeGapsTest
test = TimeGapsTest(distance=60, delta=5)
test.plot(series=series, series_name="Example")
Visualization
TimeRangeTest
Class
- class autom8qc.qaqc.general.TimeRangeTest(start, end)
Bases:
autom8qc.qaqc.base.QAQCTest
This class implements a test that checks if the timestamps are inside the defined time range. If a data point is not in the time range, the point is invalid; otherwise, the data point is valid.
- Parameters
NAME (str) – Name of the test
DESCRIPTION (str) – Description of the test
CATEGORY (str) – Category of the test
SUPPORTED_STRUCTURES (tuple or BaseStructure) – Supported data structures (e.g., Series)
parameters (ParameterList) – Supported parameters (default: None)
- Supported parameters:
start (datetime): Start of the time range
end (datetime): End of the time range
- perform(data)
Performs the test and returns the probabilities.
- Raises
InvalidType – If structure of the given data is not supported
- Parameters
data (BaseStructure, pd.Series, pd.DataFrame) – Data points
- Returns
Probabilities (1=Valid, 0=Invalid)
- Return type
pd.Series
- static supported_parameters()
Returns the supported parameters.
- Returns
Supported parameters
- Return type
ParameterList
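The range check can be sketched with a boolean mask on the index. The helper name time_range_probabilities is hypothetical, and treating both start and end as inclusive is an assumption, since the description does not say how the boundaries are handled.

```python
from datetime import datetime

import pandas as pd


def time_range_probabilities(
    index: pd.DatetimeIndex, start: datetime, end: datetime
) -> pd.Series:
    """Return 1.0 for timestamps inside [start, end], else 0.0."""
    # Assumption: both boundaries are inclusive.
    inside = (index >= start) & (index <= end)
    return pd.Series(inside.astype(float), index=index)


index = pd.date_range(start="1/1/2021", end="1/10/2021")
probabilities = time_range_probabilities(
    index, start=datetime(2021, 1, 3), end=datetime(2021, 1, 8)
)
print(probabilities.tolist())
# [0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
```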
Example
# Generate sample data
from datetime import datetime
import numpy as np
import pandas as pd
np.random.seed(42)
index = pd.date_range(start="1/1/2021", end="1/10/2021")
series = pd.Series(np.random.rand(10) * 100, index=index)
# Perform test
from autom8qc.qaqc.general import TimeRangeTest
test = TimeRangeTest(start=datetime(2021, 1, 3), end=datetime(2021, 1, 8))
test.plot(series=series, series_name="Example")