# Google Cloud Tasks
Verified by Prefect
Tasks that interface with various components of Google Cloud Platform.
Note that these tasks allow for a wide range of custom usage patterns, such as:
- Initialize a task with all settings for one time use
- Initialize a "template" task with default settings and override as needed
- Create a custom Task that inherits from a Prefect Task and utilizes the Prefect boilerplate
All GCP-related tasks can be authenticated using the GCP_CREDENTIALS Prefect Secret. See Third Party Authentication for more information. A short sketch of the first two usage patterns above follows.
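For illustration, a minimal sketch of the one-time and template patterns, assuming a hypothetical bucket and hypothetical blob names:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSUpload

# One-time use: every setting fixed at initialization
one_off = GCSUpload(bucket="my-bucket", blob="report.csv")

# Template: defaults at initialization, overridden per call
template = GCSUpload(bucket="my-bucket")

with Flow("gcp-usage-patterns") as flow:
    one_off(data="col_a,col_b\n1,2")
    template(data="col_a,col_b\n3,4", blob="override.csv")
```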
# GCPSecret
class prefect.tasks.gcp.secretmanager.GCPSecret(project_id=None, secret_id=None, version_id="latest", credentials=None, **kwargs)

Task for retrieving a secret from GCP Secret Manager and returning it as a dictionary. Note that all initialization arguments can optionally be provided or overwritten at runtime.
For authentication, there are three options: set the GCP_CREDENTIALS Prefect Secret containing your GCP access keys, [explicitly provide a credentials dictionary](https://googleapis.dev/python/google-api-core/latest/auth.html#explicit-credentials), or otherwise fall back to [default Google client logic](https://googleapis.dev/python/google-api-core/latest/auth.html).
Args:
- project_id (Union[str, int], optional): the name of the project where the Secret is saved
- secret_id (str, optional): the name of the secret to retrieve
- version_id (Union[str, int], optional): the version number of the secret to use; defaults to "latest"
- credentials (dict, optional): dictionary containing GCP credentials
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.secretmanager.GCPSecret.run(project_id=None, secret_id=None, version_id="latest", credentials=None): Task run method.
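For illustration, a minimal sketch of retrieving a secret, assuming hypothetical project and secret names:

```python
from prefect import Flow
from prefect.tasks.gcp.secretmanager import GCPSecret

# Template task: project fixed at init, secret_id supplied at runtime
get_secret = GCPSecret(project_id="my-project")

with Flow("read-secret") as flow:
    db_password = get_secret(secret_id="db-password")
```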
# GCSDownload
class prefect.tasks.gcp.storage.GCSDownload(bucket, blob=None, project=None, chunk_size=None, request_timeout=60, **kwargs)

Task template for downloading data from Google Cloud Storage as a string.
Args:
- bucket (str): default bucket name to download from
- blob (str, optional): default blob name to download
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- chunk_size (int, optional): the size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSDownload.run(bucket=None, blob=None, project=None, chunk_size=None, credentials=None, encryption_key=None, request_timeout=60): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
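A minimal sketch of a download, assuming a hypothetical bucket and blob and passing the GCP_CREDENTIALS secret explicitly at runtime:

```python
from prefect import Flow
from prefect.tasks.secrets import PrefectSecret
from prefect.tasks.gcp.storage import GCSDownload

download = GCSDownload(bucket="my-bucket")

with Flow("gcs-download") as flow:
    # Retrieve the GCP_CREDENTIALS Prefect Secret and pass it explicitly
    creds = PrefectSecret("GCP_CREDENTIALS")
    contents = download(blob="data/input.csv", credentials=creds)
```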
# GCSUpload
class prefect.tasks.gcp.storage.GCSUpload(bucket, blob=None, project=None, chunk_size=104857600, create_bucket=False, request_timeout=60, **kwargs)

Task template for uploading data to Google Cloud Storage. Data can be a string, bytes, or io.BytesIO.
Args:
- bucket (str): default bucket name to upload to
- blob (str, optional): default blob name to upload to; otherwise a random string beginning with "prefect-" and containing the Task Run ID will be used
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- chunk_size (int, optional): the size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification
- create_bucket (bool, optional): boolean specifying whether to create the bucket if it does not exist, otherwise an Exception is raised. Defaults to False
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSUpload.run(data, bucket=None, blob=None, project=None, chunk_size=None, credentials=None, encryption_key=None, create_bucket=False, content_type=None, content_encoding=None, request_timeout=60): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
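A minimal sketch of an upload, assuming hypothetical bucket and blob names:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSUpload

upload = GCSUpload(bucket="my-bucket", create_bucket=True)

with Flow("gcs-upload") as flow:
    # data may be a string, bytes, or io.BytesIO
    upload(data="hello, world", blob="greetings.txt",
           content_type="text/plain")
```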
# GCSCopy
class prefect.tasks.gcp.storage.GCSCopy(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, create_bucket=False, request_timeout=60, **kwargs)

Task template for copying data from one Google Cloud Storage bucket to another, without downloading it locally.

Note that some arguments are required for the task to run and must be provided either at initialization or at runtime.
Args:
- source_bucket (str, optional): default source bucket name
- source_blob (str, optional): default source blob name
- dest_bucket (str, optional): default destination bucket name
- dest_blob (str, optional): default destination blob name
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- create_bucket (bool, optional): boolean specifying whether to create the dest_bucket if it does not exist, otherwise an Exception is raised. Defaults to False
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSCopy.run(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, credentials=None, create_bucket=False, request_timeout=60): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
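A minimal sketch of a bucket-to-bucket copy, assuming hypothetical bucket and blob names:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSCopy

copy = GCSCopy(source_bucket="raw-data", dest_bucket="archive")

with Flow("gcs-copy") as flow:
    copy(source_blob="events/today.json",
         dest_blob="events/backup.json")
```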
# GCSBlobExists
class prefect.tasks.gcp.storage.GCSBlobExists(bucket_name=None, blob=None, project=None, wait_seconds=0, fail_if_not_found=True, request_timeout=60, **kwargs)

Task template for checking a Google Cloud Storage bucket for a given object.
Args:
- bucket_name (str, optional): the bucket to check
- blob (str, optional): object for which to search within the bucket
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- wait_seconds (int, optional): retry until the blob is found or until wait_seconds have elapsed, whichever comes first. Defaults to 0
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- fail_if_not_found (bool, optional): will raise a Fail signal on the task if the blob is not found. Defaults to True
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSBlobExists.run(bucket_name=None, blob=None, project=None, wait_seconds=0, fail_if_not_found=True, credentials=None): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
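A minimal sketch of polling for a blob, assuming a hypothetical bucket and blob:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSBlobExists

# Poll for up to five minutes; raise a Fail signal if never found
blob_exists = GCSBlobExists(bucket_name="my-bucket", wait_seconds=300)

with Flow("wait-for-blob") as flow:
    found = blob_exists(blob="exports/latest.csv")
```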
# BigQueryTask
class prefect.tasks.gcp.bigquery.BigQueryTask(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None, **kwargs)

Task for executing queries against a Google BigQuery table and (optionally) returning the results. Note that all initialization settings can be provided or overwritten at runtime.
Args:
- query (str, optional): a string of the query to execute
- query_params (list[tuple], optional): a list of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- location (str, optional): location of the dataset that will be queried; defaults to "US"
- dry_run_max_bytes (int, optional): if provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a ValueError if the maximum is exceeded
- dataset_dest (str, optional): the optional name of a destination dataset to write the query results to, if you don't want them returned; if provided, table_dest must also be provided
- table_dest (str, optional): the optional name of a destination table to write the query results to, if you don't want them returned; if provided, dataset_dest must also be provided
- to_dataframe (bool, optional): if True, returns the results of the query as a pandas DataFrame instead of a list of bigquery.table.Row objects. Defaults to False
- job_config (dict, optional): an optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected)
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryTask.run(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, credentials=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
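A minimal sketch of a parameterized query, assuming a hypothetical table and that scalar parameters follow BigQuery's (name, type, value) tuple layout:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryTask

query = BigQueryTask(
    query=(
        "SELECT name, value "
        "FROM `my-project.my_dataset.my_table` "
        "WHERE value > @threshold"
    ),
    # 3-tuples of (name, type, value); scalar parameters only
    query_params=[("threshold", "INT64", 100)],
)

with Flow("bq-query") as flow:
    df = query(to_dataframe=True)
```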
# BigQueryStreamingInsert
class prefect.tasks.gcp.bigquery.BigQueryStreamingInsert(dataset_id=None, table=None, project=None, location="US", **kwargs)

Task for inserting records into a Google BigQuery table via the streaming API. Note that all of these settings can optionally be provided or overwritten at runtime.
Args:
- dataset_id (str, optional): the id of a destination dataset to write the records to
- table (str, optional): the name of a destination table to write the records to
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- location (str, optional): location of the dataset that will be written to; defaults to "US"
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryStreamingInsert.run(records, dataset_id=None, table=None, project=None, location="US", credentials=None, **kwargs): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
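A minimal sketch of a streaming insert, assuming a hypothetical dataset and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryStreamingInsert

stream_insert = BigQueryStreamingInsert(dataset_id="my_dataset",
                                        table="events")

with Flow("bq-stream") as flow:
    # records are dicts keyed by column name
    stream_insert(records=[{"user": "alice", "action": "login"}])
```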
# CreateBigQueryTable
class prefect.tasks.gcp.bigquery.CreateBigQueryTable(project=None, dataset=None, table=None, schema=None, clustering_fields=None, time_partitioning=None, **kwargs)

Ensures a BigQuery table exists, creating it if it does not. Note that most initialization keywords can optionally be provided at runtime.
Args:
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- dataset (str, optional): the name of the dataset in which the table will be created
- table (str, optional): the name of the table to create
- schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
- clustering_fields (List[str], optional): a list of fields to cluster the table by
- time_partitioning (bigquery.TimePartitioning, optional): a bigquery.TimePartitioning object specifying a partitioning of the newly created table
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.CreateBigQueryTable.run(project=None, credentials=None, dataset=None, table=None, schema=None): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
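A minimal sketch of creating a table, assuming a hypothetical project, dataset, and schema:

```python
from google.cloud import bigquery
from prefect import Flow
from prefect.tasks.gcp.bigquery import CreateBigQueryTable

create_table = CreateBigQueryTable(
    project="my-project", dataset="my_dataset", table="events"
)

with Flow("bq-create-table") as flow:
    create_table(schema=[
        bigquery.SchemaField("user", "STRING"),
        bigquery.SchemaField("logins", "INTEGER"),
    ])
```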
# BigQueryLoadGoogleCloudStorage
class prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)

Task for inserting records into a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.
Args:
- uri (str, optional): GCS path to load data from
- dataset_id (str, optional): the id of a destination dataset to write the records to
- table (str, optional): the name of a destination table to write the records to
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
- location (str, optional): location of the dataset that will be queried; defaults to "US"
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage.run(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
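A minimal sketch of a load from GCS, assuming a hypothetical URI, dataset, and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryLoadGoogleCloudStorage

load_from_gcs = BigQueryLoadGoogleCloudStorage(
    dataset_id="my_dataset", table="events"
)

with Flow("bq-load-gcs") as flow:
    load_from_gcs(uri="gs://my-bucket/exports/events.json")
```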
# BigQueryLoadFile
class prefect.tasks.gcp.bigquery.BigQueryLoadFile(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)

Task for inserting records into a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.
Args:
- file (Union[str, path-like object], optional): a string or path-like object of the file to be loaded
- rewind (bool, optional): if True, seek to the beginning of the file handle before reading the file
- size (int, optional): the number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used
- num_retries (int, optional): the maximum number of retries for loading the BigQuery table from file. Defaults to 6
- dataset_id (str, optional): the id of a destination dataset to write the records to
- table (str, optional): the name of a destination table to write the records to
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
- location (str, optional): location of the dataset that will be queried; defaults to "US"
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryLoadFile.run(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
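A minimal sketch of a load from a local file, assuming a hypothetical file path, dataset, and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryLoadFile

load_file = BigQueryLoadFile(dataset_id="my_dataset", table="events")

with Flow("bq-load-file") as flow:
    load_file(file="exports/events.csv")
```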
This documentation was auto-generated from commit bd9182e on July 31, 2024 at 18:02 UTC.