# SodaSpark Tasks
This module contains a collection of tasks for running data quality tests with the soda-spark library.
# SodaSparkScan
class prefect.tasks.sodaspark.sodaspark_tasks.SodaSparkScan(scan_def=None, df=None, **kwargs)

Task for running a SodaSpark scan given a scan definition and a Spark DataFrame. For information about SodaSpark, please refer to https://docs.soda.io/soda-spark/install-and-use.html. SodaSpark uses PySpark under the hood, so Java must be installed on the machine where this task runs.
Args:
- `scan_def (str, optional)`: scan definition. Can be either a path to a YAML file containing the scan definition or the scan definition itself given as a valid YAML string. Please refer to https://docs.soda.io/soda-sql/scan-yaml.html for more information.
- `df (pyspark.sql.DataFrame, optional)`: Spark DataFrame against which the tests defined in the scan definition are run.
- `**kwargs (dict, optional)`: additional keyword arguments to pass to the Task constructor.
Methods:

prefect.tasks.sodaspark.sodaspark_tasks.SodaSparkScan.run(scan_def=None, df=None)

Task run method. Execute a scan against a Spark DataFrame.
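
The two accepted forms of `scan_def` (a YAML string or a path to a YAML file) can be illustrated with a minimal sketch. The scan definition contents, the DataFrame, and the helpers `write_scan_def` and `run_scan` are hypothetical and are not part of the task library. The sketch also assumes that `run()` can be called directly outside a flow and falls back to the constructor arguments; it needs Prefect 1.x, PySpark, soda-spark, and Java to actually execute a scan.

```python
import tempfile
from pathlib import Path

# Minimal soda-sql style scan definition; table name and test are illustrative.
SCAN_DEF = """\
table_name: demodata
metrics:
  - row_count
tests:
  - row_count > 0
"""


def write_scan_def(directory: str) -> str:
    """Save SCAN_DEF as a YAML file and return its path (hypothetical helper)."""
    path = Path(directory) / "scan.yml"
    path.write_text(SCAN_DEF)
    return str(path)


def run_scan(scan_def: str):
    """Run the task on a tiny local DataFrame (hypothetical helper)."""
    # Imported lazily: these need prefect 1.x, pyspark, soda-spark, and Java.
    from pyspark.sql import SparkSession
    from prefect.tasks.sodaspark.sodaspark_tasks import SodaSparkScan

    spark = SparkSession.builder.master("local[1]").appName("soda-demo").getOrCreate()
    df = spark.createDataFrame([("a", 1), ("b", 2)], ["id", "value"])
    # scan_def may be the YAML string itself or a path to a YAML file.
    return SodaSparkScan(scan_def=scan_def, df=df).run()
```

Calling `run_scan(SCAN_DEF)` passes the definition as a YAML string, while `run_scan(write_scan_def(some_directory))` passes it as a file path. Inside a flow, the task would instead be instantiated and called within a `Flow` context.
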
This documentation was auto-generated from commit bd9182e on July 31, 2024 at 18:02 UTC