# SodaSpark Tasks


This module contains a collection of tasks for running data quality tests with the soda-spark library.

# SodaSparkScan

class prefect.tasks.sodaspark.sodaspark_tasks.SodaSparkScan(scan_def=None, df=None, **kwargs)

Task for running a SodaSpark scan given a scan definition and a Spark DataFrame. For more information about SodaSpark, refer to https://docs.soda.io/soda-spark/install-and-use.html. SodaSpark uses PySpark under the hood, so Java must be installed on the machine where this task runs.

Args:

  • scan_def (str, optional): scan definition. Can be either a path to a YAML file containing the scan definition or the scan definition given as a valid YAML string. Refer to https://docs.soda.io/soda-sql/scan-yaml.html for more information.
  • df (pyspark.sql.DataFrame, optional): Spark DataFrame against which to run the tests defined in the scan definition.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
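
The arguments above can be sketched in a small flow. This is a minimal illustration, not the library's reference usage: it assumes Prefect 1.x, pyspark and soda-spark are installed, and the table name `demodata`, the `id` column, and the helper names `build_flow` and `make_df` are hypothetical. The scan definition is passed as an inline YAML string.

```python
# Hypothetical scan definition in the soda-sql scan YAML format.
# The table name and the test are made up for illustration.
SCAN_DEF = """
table_name: demodata
metrics:
  - row_count
tests:
  - row_count > 0
"""


def build_flow():
    """Build a Prefect 1.x flow that scans a small Spark DataFrame.

    Imports are local so this snippet can be loaded without prefect,
    pyspark or soda-spark installed; calling it requires all three.
    """
    from prefect import Flow, task
    from prefect.tasks.sodaspark.sodaspark_tasks import SodaSparkScan
    from pyspark.sql import SparkSession

    @task
    def make_df():
        # Hypothetical two-row DataFrame used only for this example.
        spark = SparkSession.builder.getOrCreate()
        return spark.createDataFrame([("a1",), ("b2",)], ["id"])

    # scan_def and df can also be passed to the constructor,
    # per the class signature above.
    scan = SodaSparkScan()

    with Flow("soda-spark-scan") as flow:
        scan(scan_def=SCAN_DEF, df=make_df())
    return flow


# Usage (requires Java, as noted above):
#   build_flow().run()
```

Creating the DataFrame in an upstream task keeps the Spark session inside the flow run rather than at flow-definition time.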

methods:

prefect.tasks.sodaspark.sodaspark_tasks.SodaSparkScan.run(scan_def=None, df=None)

Task run method. Execute a scan against a Spark DataFrame.

Args:

  • scan_def (str, optional): scan definition. Can be either a path to a YAML file containing the scan definition or the scan definition given as a valid YAML string. Refer to https://docs.soda.io/soda-sql/scan-yaml.html for more information.
  • df (pyspark.sql.DataFrame, optional): Spark DataFrame against which to run the tests defined in the scan definition.
Returns:


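The file-path form of scan_def for the run method can be sketched as follows. This is an illustration under assumptions: the file name `scan.yml`, the table name `demodata`, and the helpers `write_scan_file` and `run_scan_from_file` are hypothetical, and calling `run` directly (outside a flow) assumes Prefect 1.x with pyspark and soda-spark installed.

```python
import tempfile
from pathlib import Path

# Hypothetical scan definition; table name and test are illustrative only.
SCAN_YAML = (
    "table_name: demodata\n"
    "metrics:\n"
    "  - row_count\n"
    "tests:\n"
    "  - row_count > 0\n"
)


def write_scan_file(directory: str) -> str:
    """Write the scan definition to scan.yml in `directory` and return its path."""
    path = Path(directory) / "scan.yml"
    path.write_text(SCAN_YAML, encoding="utf-8")
    return str(path)


def run_scan_from_file(df, scan_path: str):
    """Call the task's run method directly with a path to the YAML file."""
    # Imported lazily so write_scan_file works without prefect installed.
    from prefect.tasks.sodaspark.sodaspark_tasks import SodaSparkScan

    return SodaSparkScan().run(scan_def=scan_path, df=df)


# Demonstrate only the stdlib part: create the file and show its name.
with tempfile.TemporaryDirectory() as tmp:
    scan_path = write_scan_file(tmp)
    print(Path(scan_path).name)  # prints "scan.yml"
```

With a Spark DataFrame `df` in hand, `run_scan_from_file(df, scan_path)` would pass the path instead of the inline YAML string; both forms are accepted according to the argument description above.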

    This documentation was auto-generated from commit bd9182e
    on July 31, 2024 at 18:02 UTC