Implement a New Rule

Purpose of this Chapter

The aim of this chapter is to explain how to implement a new rule, using a simple example. Let’s assume that we want to implement a rule that checks whether too many data points failed the test. To do so, the user defines a limit for the probabilities and a relative frequency. A data point fails the test if its probability is less than or equal to the defined limit. If the share of failed data points is greater than the defined relative frequency, all probabilities are set to 0.
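
Before turning to the autom8qc classes, the decision logic described above can be sketched in plain Python. This is only an illustration: the function name apply_frequency_rule is hypothetical, it works on a plain list instead of a series, and it assumes a non-empty input.

```python
def apply_frequency_rule(probabilities, threshold, rel_frequency):
    """Return all zeros if too many data points failed, else the input unchanged."""
    # A data point fails if its probability is less than or equal to the threshold.
    failed = sum(1 for p in probabilities if p <= threshold)
    # Share of failed data points among all data points (input assumed non-empty).
    failed_share = failed / len(probabilities)
    if failed_share > rel_frequency:
        return [0.0] * len(probabilities)
    return probabilities


data = [0.9, 0.8, 0.1, 0.7, 0.95]
# 1 of 5 points (20 %) is at or below 0.2, which exceeds a limit of 10 %.
print(apply_frequency_rule(data, threshold=0.2, rel_frequency=0.1))
# With a limit of 50 %, the same data is returned unchanged.
print(apply_frequency_rule(data, threshold=0.2, rel_frequency=0.5))
```

Note that the comparison with the relative frequency is strict: if the share of failed points is exactly equal to the limit, the data is left as it is.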

Define the Metadata of the Rule

Note

Each rule has to inherit from the abstract base class BaseRule and has to implement the abstract method apply. Moreover, each rule has to provide its metadata. The following metadata has to be provided by each rule: NAME and DESCRIPTION. See also: autom8qc.rules.base.BaseRule

Warning

Make sure that the class name ends with the suffix Rule. Other modules will check for the suffix to identify that the class is a rule.

import pandas as pd

from autom8qc.core import exceptions
from autom8qc.core.parameters import Parameter
from autom8qc.core.parameters import ParameterList
from autom8qc.rules.base import BaseRule

class LowerFrequencyRule(BaseRule):

    NAME = "Probability-Frequency-Rule"
    DESCRIPTION = (
        "Checks the frequency of invalid data points. If too many data points "
        "are invalid, all probabilities will be set to 0. A data point is "
        "invalid if its probability is less than or equal to the defined "
        "threshold."
    )
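
The warning above says that other modules identify rules by the Rule suffix. How autom8qc performs this check is not shown in this chapter; the following is only a sketch of what a name-based check could look like. The classes are stand-ins, and the helper find_rule_names is hypothetical.

```python
import inspect


class BaseRule:
    """Stand-in for autom8qc.rules.base.BaseRule (illustration only)."""


class LowerFrequencyRule(BaseRule):
    pass


class LowerFrequency(BaseRule):
    # Missing the "Rule" suffix, so a name-based check skips this class.
    pass


def find_rule_names(namespace):
    """Hypothetical helper: collect class names that end with 'Rule'."""
    return sorted(
        name
        for name, obj in namespace.items()
        if inspect.isclass(obj)
        and name.endswith("Rule")
        and obj is not BaseRule
    )


namespace = {
    "BaseRule": BaseRule,
    "LowerFrequencyRule": LowerFrequencyRule,
    "LowerFrequency": LowerFrequency,
}
print(find_rule_names(namespace))  # ['LowerFrequencyRule']
```

In this sketch, LowerFrequency inherits from the base class but is still overlooked, which is why the suffix matters.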

Define the Supported Parameters

Each rule has to declare the parameters it supports. To do so, implement the static method supported_parameters. Because the method is static, the parameters can be accessed without creating an instance of the class. If your rule doesn’t need additional parameters, you don’t have to implement it. In our example, the parameters are threshold and rel_frequency.

@staticmethod
def supported_parameters():
    return ParameterList(
        Parameter(
            name="threshold",
            description="Threshold value",
            dtype=float,
            optional=False,
        ),
        Parameter(
            name="rel_frequency",
            description="Relative frequency",
            dtype=float,
            optional=False,
            default=0.1
        ),
    )
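
To illustrate why a static method is useful here, the following sketch lists the parameters of a rule without creating an instance. Parameter and DemoRule are simplified stand-ins written for this example, not the autom8qc classes, and a plain list replaces ParameterList.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """Stand-in for autom8qc's Parameter (illustration only)."""

    name: str
    description: str


class DemoRule:
    def __init__(self, threshold):
        self.threshold = threshold

    @staticmethod
    def supported_parameters():
        # A static method receives neither the instance nor the class.
        return [
            Parameter("threshold", "Threshold value"),
            Parameter("rel_frequency", "Relative frequency"),
        ]


# No DemoRule(...) call is needed to list the parameters.
names = [p.name for p in DemoRule.supported_parameters()]
print(names)  # ['threshold', 'rel_frequency']
```

This kind of access can be handy for tools that need to describe a rule, for example to generate documentation or a configuration form, before any values are known.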

Implement the Constructor

The constructor is a method that is called when an object is created. In our case, we have to pass the parameters threshold and rel_frequency to the constructor and store the values in the related Parameters, which we defined in the method supported_parameters. Note that you don’t have to check the type of the parameters since a Parameter checks the type when you set the value. If you want to implement additional checks, you have to implement them in the constructor and raise an exception if a constraint is not satisfied. Finally, you have to call the super constructor and the super method check_metadata that checks if the metadata is valid.

Warning

Make sure that you call the super constructor.

def __init__(self, threshold, rel_frequency=0.1):
    if threshold < 0 or threshold >= 1:
        raise exceptions.InvalidValue("Threshold must be between 0 and 1!")
    if rel_frequency < 0 or rel_frequency > 1:
        raise exceptions.InvalidValue(
            "Relative frequency must be between 0 and 1!"
        )
    super().__init__()
    self.parameters["threshold"] = threshold
    self.parameters["rel_frequency"] = rel_frequency
    self.check_metadata()
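
The range checks in the constructor above treat the two parameters slightly differently: threshold must lie in [0, 1), whereas rel_frequency may also be exactly 1. The following sketch mirrors only these checks so that the boundaries can be tried out in isolation. The functions validate and is_accepted are hypothetical, and ValueError stands in for exceptions.InvalidValue.

```python
def validate(threshold, rel_frequency=0.1):
    """Mirror the range checks of the constructor (illustration only)."""
    if threshold < 0 or threshold >= 1:
        raise ValueError("Threshold must be in [0, 1)")
    if rel_frequency < 0 or rel_frequency > 1:
        raise ValueError("Relative frequency must be in [0, 1]")
    return threshold, rel_frequency


def is_accepted(threshold, rel_frequency=0.1):
    """Return True if both values pass the range checks."""
    try:
        validate(threshold, rel_frequency)
    except ValueError:
        return False
    return True


print(is_accepted(0.0))        # True: the lower bound is allowed
print(is_accepted(1.0))        # False: a threshold of 1 is rejected
print(is_accepted(0.5, 1.0))   # True: a relative frequency of 1 is allowed
print(is_accepted(0.5, 1.5))   # False
```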

Implement the Abstract Method

Finally, we have to implement the abstract method apply. In our case, the rule only supports series. If an object of another type is passed, the method should raise an error. Moreover, we expect the series to contain only probabilities (values between 0 and 1).

Note

Make sure that you don’t change the given data if the condition of your rule is not satisfied. For example, if only 5 % of the data points failed and rel_frequency is 0.1, we return the probabilities unchanged.

def apply(self, probabilities):
    if not isinstance(probabilities, pd.Series):
        raise exceptions.InvalidType("Type is not supported by rule!")
    if probabilities.max() > 1 or probabilities.min() < 0:
        raise exceptions.InvalidValue("Invalid probabilities!")
    threshold = self.parameters["threshold"].value
    rel_frequency = self.parameters["rel_frequency"].value
    total = probabilities.shape[0]
    invalid = (probabilities <= threshold).sum()
    if rel_frequency < (invalid / total):
        probabilities = pd.Series(
            0, index=probabilities.index, dtype=probabilities.dtype
        )
    return probabilities