********************
Implement a new Rule
********************

Purpose of this Chapter
=======================

The aim of this chapter is to explain how to implement a new rule, using a
simple example. Let's assume that we want to implement a rule that checks
whether too many data points failed the test. To this end, the user can define
a limit for the probabilities and a relative frequency. A data point fails the
test if its probability is lower than or equal to the defined limit. If the
fraction of failed data points is greater than the defined relative frequency,
all probabilities are set to 0.

Define the Metadata of the Rule
===============================

.. note::
    Each rule has to inherit from the abstract base class **BaseRule** and has
    to implement the abstract method **apply**. Moreover, each rule has to
    provide its metadata: *NAME* and *DESCRIPTION*.

    See also: :class:`autom8qc.rules.base.BaseRule`

.. warning::
    Make sure that the class name ends with the suffix **Rule**. Other modules
    check for this suffix to identify the class as a rule.

.. code-block:: python

    from autom8qc.core import exceptions
    from autom8qc.core.parameters import Parameter
    from autom8qc.core.parameters import ParameterList
    from autom8qc.rules.base import BaseRule


    class LowerFrequencyRule(BaseRule):

        NAME = "Probability-Frequency-Rule"
        DESCRIPTION = (
            "Checks the frequency of invalid data points. If too many data points "
            "are invalid, all probabilities will be set to 0. A data point is "
            "invalid if the probability is lower than or equal to the defined "
            "threshold."
        )

Define the Supported Parameters
===============================

Each rule has to provide the parameters it supports. To do so, implement the
static method **supported_parameters**. Because the method is static, the
parameters can be accessed without creating an instance of the class.
If your rule doesn't need additional parameters, you don't have to implement it.
For our example, we have the parameters *threshold* and *rel_frequency*.

.. code-block:: python

    @staticmethod
    def supported_parameters():
        return ParameterList(
            Parameter(
                name="threshold",
                description="Threshold value",
                dtype=float,
                optional=False,
            ),
            Parameter(
                name="rel_frequency",
                description="Relative frequency",
                dtype=float,
                optional=False,
                default=0.1,
            ),
        )

Implement the Constructor
=========================

The constructor is the method that is called when an object is created. In our
case, we pass the parameters *threshold* and *rel_frequency* to the constructor
and store the values in the corresponding Parameters, which we defined in the
method *supported_parameters*. Note that you don't have to check the type of
the parameters, since a Parameter checks the type when you set its value. If
you want additional checks, implement them in the constructor and raise an
exception if a constraint is not satisfied. In addition, you have to call the
super constructor and the inherited method *check_metadata*, which checks
whether the metadata is valid.

.. warning::
    Make sure that you call the super constructor.

.. code-block:: python
    :emphasize-lines: 8

    def __init__(self, threshold, rel_frequency=0.1):
        if threshold < 0 or threshold >= 1:
            raise exceptions.InvalidValue("Threshold must be between 0 and 1!")
        if rel_frequency < 0 or rel_frequency > 1:
            raise exceptions.InvalidValue(
                "Relative frequency must be between 0 and 1!"
            )
        super().__init__()
        # Store the validated values in the parameters defined above
        self.parameters["threshold"] = threshold
        self.parameters["rel_frequency"] = rel_frequency
        # Verify that NAME and DESCRIPTION are provided
        self.check_metadata()

Implement the Abstract Method
=============================

Finally, we have to implement the abstract method **apply**. In our case, the
rule only supports series. If an invalid type is passed, the method should
raise an error.
Moreover, we expect that the series only contains probabilities *(between 0
and 1)*.

.. note::
    Make sure that you don't change the given data if the condition of your
    rule is not satisfied. For example, if only 5 % of the data points failed
    and the relative frequency is the default of 0.1, the probabilities are
    returned unchanged.

.. code-block:: python

    # Required at the top of the module: import pandas as pd

    def apply(self, probabilities):
        # Only pandas Series are supported
        if not isinstance(probabilities, pd.Series):
            raise exceptions.InvalidType("Type is not supported by rule!")
        # All values must be valid probabilities
        if probabilities.max() > 1 or probabilities.min() < 0:
            raise exceptions.InvalidValue("Invalid probabilities!")
        threshold = self.parameters["threshold"].value
        rel_frequency = self.parameters["rel_frequency"].value
        total = probabilities.shape[0]
        # A data point is invalid if its probability is <= threshold
        invalid = (probabilities <= threshold).sum()
        # Too many invalid data points: set all probabilities to 0
        if rel_frequency < (invalid / total):
            probabilities = pd.Series(
                0, index=probabilities.index, dtype=probabilities.dtype
            )
        return probabilities
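
To make the decision logic of **apply** easier to follow, the following
minimal sketch reproduces it with plain Python lists, so it runs without
autom8qc or pandas. The function name ``apply_frequency_rule`` and the guard
for empty input are assumptions made for this illustration only; they are not
part of the library.

.. code-block:: python

    def apply_frequency_rule(probabilities, threshold, rel_frequency=0.1):
        """Hypothetical stand-alone version of the rule, for illustration."""
        # Assumption: an empty input is returned unchanged
        if not probabilities:
            return list(probabilities)
        # A data point fails if its probability is <= threshold
        failed = sum(1 for p in probabilities if p <= threshold)
        # Reset everything only if the failed fraction is strictly greater
        if failed / len(probabilities) > rel_frequency:
            return [0.0] * len(probabilities)
        return list(probabilities)


    data = [0.05, 0.8, 0.9, 0.7]

    # 1 of 4 points failed (25 % > 20 %): all probabilities are set to 0
    reset = apply_frequency_rule(data, threshold=0.1, rel_frequency=0.2)

    # 1 of 4 points failed (25 % is not greater than 25 %): data is unchanged
    kept = apply_frequency_rule(data, threshold=0.1, rel_frequency=0.25)

The second call shows the boundary case: the probabilities are only reset
when the failed fraction strictly exceeds the relative frequency.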