# Great Expectations Task

These tasks have been tested and verified by Prefect.

A collection of tasks for interacting with Great Expectations deployments and APIs.

Note that all tasks currently must be executed in an environment where the Great Expectations configuration directory can be found; to learn how to initialize a Great Expectations deployment, see the Great Expectations Getting Started docs.

# RunGreatExpectationsValidation

class prefect.tasks.great_expectations.checkpoints.RunGreatExpectationsValidation(checkpoint_name=None, ge_checkpoint=None, checkpoint_kwargs=None, context=None, assets_to_validate=None, batch_kwargs=None, expectation_suite_name=None, context_root_dir=None, runtime_environment=None, run_name=None, run_info_at_end=True, disable_markdown_artifact=False, validation_operator="action_list_operator", evaluation_parameters=None, **kwargs)

Task for running data validation with Great Expectations. Works with both the Great Expectations v2 (batch_kwargs) and v3 (Batch Request) APIs.

Example using the GE getting started tutorial: https://github.com/superconductive/ge_tutorials/tree/main/getting_started_tutorial_final_v3_api

The task can be used to run validation in one of the following ways:

1. checkpoint_name: the name of a pre-configured checkpoint (which bundles expectation suites and batch_kwargs). This is the preferred option.
2. expectation_suite_name AND batch_kwargs, where batch_kwargs is a dict. This only works with the Great Expectations v2 API.
3. assets_to_validate: a list of dicts of expectation_suite + batch_kwargs. This only works with the Great Expectations v2 API.

To create a checkpoint you can use:

- for the v2 API: `great_expectations checkpoint new <expectations_suite_name> <checkpoint_name>`
- for the v3 API: `great_expectations --v3-api checkpoint new <checkpoint_name>`

Here is an example that works with both the v2 and v3 APIs, provided that the checkpoint has already been created as described above:

from prefect import Flow, Parameter
from prefect.tasks.great_expectations import RunGreatExpectationsValidation

validation_task = RunGreatExpectationsValidation()

with Flow("ge_test") as flow:
    checkpoint_name = Parameter("checkpoint_name")
    prev_run_row_count = 100  # could be taken, e.g., from the Prefect KV store
    validation_task(
        checkpoint_name=checkpoint_name,
        evaluation_parameters=dict(prev_run_row_count=prev_run_row_count),
    )

flow.run(parameters={"checkpoint_name": "my_checkpoint"})
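The example above covers the checkpoint-based option. For the v2-only option that passes an expectation suite name together with batch_kwargs, a minimal sketch might look like the following. The suite name, datasource name, and file path are hypothetical placeholders, and the `path` and `datasource` keys assume a file-based Great Expectations v2 datasource. Prefect is imported inside `build_v2_flow` so the dictionary-building helper can be inspected on its own.

```python
def build_v2_validation_kwargs(suite_name, datasource, path):
    """Collect the keyword arguments for the v2-only call style.

    All three values are placeholders for illustration; the "path" and
    "datasource" keys assume a file-based Great Expectations v2 datasource.
    """
    return {
        "expectation_suite_name": suite_name,
        "batch_kwargs": {"path": path, "datasource": datasource},
    }


def build_v2_flow(validation_kwargs):
    """Build a Prefect flow that validates one batch against one suite."""
    # Imported lazily so the helper above works without Prefect installed.
    from prefect import Flow
    from prefect.tasks.great_expectations import RunGreatExpectationsValidation

    validation_task = RunGreatExpectationsValidation()
    with Flow("ge_v2_test") as flow:
        validation_task(**validation_kwargs)
    return flow


kwargs = build_v2_validation_kwargs(
    suite_name="my_suite",              # hypothetical suite name
    datasource="my_files_datasource",   # hypothetical datasource name
    path="data/my_file.csv",            # hypothetical file path
)
```

Calling `build_v2_flow(kwargs).run()` would then run the validation, assuming the suite and datasource exist in the Great Expectations project that the task can find.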

Args:

checkpoint_name (str, optional): the name of a pre-configured checkpoint; should match the filename of the checkpoint without the extension. Either checkpoint_name or ge_checkpoint is required when using the Great Expectations v3 API.
ge_checkpoint (Checkpoint, optional): an in-memory GE Checkpoint object used to perform validation. If not provided, checkpoint_name is used to load the specified checkpoint. Either checkpoint_name or ge_checkpoint is required when using the Great Expectations v3 API.
checkpoint_kwargs (Dict, optional): a dictionary whose keys match the parameters of CheckpointConfig, which can be used to update and populate the task's Checkpoint at runtime. Only used in the Great Expectations v3 API.
context (DataContext, optional): an in-memory GE DataContext object, e.g. ge.data_context.DataContext(). If not provided, context_root_dir is used to look for one.
assets_to_validate (list, optional): a list of assets to validate when running the validation operator. Only used in the Great Expectations v2 API.
batch_kwargs (dict, optional): a dictionary of batch kwargs to be used when validating assets. Only used in the Great Expectations v2 API.
expectation_suite_name (str, optional): the name of an expectation suite to be used when validating assets. Only used in the Great Expectations v2 API.
context_root_dir (str, optional): the absolute or relative path to the directory holding your great_expectations.yml.
runtime_environment (dict, optional): a dictionary of Great Expectations config key-value pairs that overwrite your config in great_expectations.yml.
run_name (str, optional): the name of this Great Expectations validation run; defaults to the task slug.
run_info_at_end (bool, optional): add run info to the end of the artifact generated by this task. Defaults to True.
disable_markdown_artifact (bool, optional): toggle the posting of a markdown artifact from this task. Defaults to False.
validation_operator (str, optional): configure the actions to be executed after running validation. Defaults to action_list_operator.
evaluation_parameters (Optional[dict], optional): the evaluation parameters to use when running validation. For more information, see the example above and the Great Expectations docs.
**kwargs (dict, optional): additional keyword arguments to pass to the Task constructor.
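As a rough illustration of the constructor arguments listed above, the sketch below gathers a few of them into a dictionary and passes them when the task is created. The `great_expectations` subdirectory name assumes the default project layout, and the project directory and run name are hypothetical values; adjust them to your own deployment.

```python
from pathlib import Path


def build_task_defaults(project_dir, run_name=None):
    """Assemble constructor arguments for RunGreatExpectationsValidation.

    Assumes great_expectations.yml lives in <project_dir>/great_expectations,
    which is the default layout but may differ in your project.
    """
    defaults = {
        "context_root_dir": str(Path(project_dir) / "great_expectations"),
        "run_info_at_end": True,
        "disable_markdown_artifact": False,
        "validation_operator": "action_list_operator",
    }
    if run_name is not None:
        # When omitted, the documented default (the task slug) applies.
        defaults["run_name"] = run_name
    return defaults


def make_validation_task(defaults):
    """Create the task with the given defaults."""
    # Imported lazily so build_task_defaults works without Prefect installed.
    from prefect.tasks.great_expectations import RunGreatExpectationsValidation

    return RunGreatExpectationsValidation(**defaults)


task_defaults = build_task_defaults("my_project", run_name="nightly_check")
```

Since the run method accepts the same arguments, values set here can presumably still be overridden per call, for example by passing a different checkpoint_name inside the flow.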

methods:

prefect.tasks.great_expectations.checkpoints.RunGreatExpectationsValidation.run(checkpoint_name=None, ge_checkpoint=None, checkpoint_kwargs=None, context=None, assets_to_validate=None, batch_kwargs=None, expectation_suite_name=None, context_root_dir=None, runtime_environment=None, run_name=None, run_info_at_end=True, disable_markdown_artifact=False, validation_operator="action_list_operator", evaluation_parameters=None)

Task run method.

Args:

checkpoint_name (str, optional): the name of a pre-configured checkpoint; should match the filename of the checkpoint without the extension. Either checkpoint_name or ge_checkpoint is required when using the Great Expectations v3 API.
ge_checkpoint (Checkpoint, optional): an in-memory GE Checkpoint object used to perform validation. If not provided, checkpoint_name is used to load the specified checkpoint.
checkpoint_kwargs (Dict, optional): a dictionary whose keys match the parameters of CheckpointConfig, which can be used to update and populate the task's Checkpoint at runtime.
context (DataContext, optional): an in-memory GE DataContext object, e.g. ge.data_context.DataContext(). If not provided, context_root_dir is used to look for one.
assets_to_validate (list, optional): a list of assets to validate when running the validation operator. Only used in the Great Expectations v2 API.
batch_kwargs (dict, optional): a dictionary of batch kwargs to be used when validating assets. Only used in the Great Expectations v2 API.
expectation_suite_name (str, optional): the name of an expectation suite to be used when validating assets. Only used in the Great Expectations v2 API.
context_root_dir (str, optional): the absolute or relative path to the directory holding your great_expectations.yml.
runtime_environment (dict, optional): a dictionary of Great Expectations config key-value pairs that overwrite your config in great_expectations.yml.
run_name (str, optional): the name of this Great Expectations validation run; defaults to the task slug.
run_info_at_end (bool, optional): add run info to the end of the artifact generated by this task. Defaults to True.
disable_markdown_artifact (bool, optional): toggle the posting of a markdown artifact from this task. Defaults to False.
evaluation_parameters (Optional[dict], optional): the evaluation parameters to use when running validation. For more information, see the example above and the Great Expectations docs.
validation_operator (str, optional): configure the actions to be executed after running validation. Defaults to action_list_operator.

Raises:

signals.FAIL if the validation was not successful

Returns:

('great_expectations.checkpoint.checkpoint.CheckpointResult'): The Great Expectations metadata returned from running the provided checkpoint if a checkpoint name is provided.
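Because the run method raises a FAIL signal instead of returning a failed result, code that calls it directly may want to catch that signal. The sketch below is one hedged way to do so: `run_and_report` is a hypothetical helper, not part of Prefect, and it takes the exception type as an argument so it can be exercised without Prefect installed. Calling `task.run` outside a flow run, and disabling the markdown artifact for that case, are assumptions rather than documented behaviour.

```python
def run_and_report(run_fn, fail_exception, **run_kwargs):
    """Call run_fn and turn a failure signal into a (passed, payload) tuple.

    On success the payload is whatever run_fn returned; on failure it is
    the caught exception instance.
    """
    try:
        result = run_fn(**run_kwargs)
    except fail_exception as exc:
        return False, exc
    return True, result


def validate_checkpoint(checkpoint_name):
    """Run one checkpoint directly and report whether validation passed."""
    # Imported lazily so run_and_report can be used and tested on its own.
    from prefect.engine import signals
    from prefect.tasks.great_expectations import RunGreatExpectationsValidation

    task = RunGreatExpectationsValidation()
    return run_and_report(
        task.run,
        signals.FAIL,
        checkpoint_name=checkpoint_name,
        disable_markdown_artifact=True,  # assumption: no artifact outside a flow run
    )
```

Inside a flow this wrapper is usually unnecessary, since Prefect itself interprets the signal and marks the task run as failed.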

This documentation was auto-generated from commit bd9182e
on July 31, 2024 at 18:02 UTC