# Google Cloud Tasks
Verified by Prefect
Tasks that interface with various components of Google Cloud Platform.
Note that these tasks allow for a wide range of custom usage patterns, such as:
- Initialize a task with all settings for one time use
- Initialize a "template" task with default settings and override as needed
- Create a custom Task that inherits from a Prefect Task and utilizes the Prefect boilerplate
All GCP-related tasks can be authenticated using the GCP_CREDENTIALS Prefect Secret. See Third Party Authentication for more information. A short sketch of the first two usage patterns above follows.
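For illustration, a minimal sketch of the one-time and template patterns, assuming a hypothetical bucket and hypothetical blob names:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSUpload

# One-time use: every setting fixed at initialization
one_off = GCSUpload(bucket="my-bucket", blob="report.csv")

# Template: defaults at initialization, overridden per call
template = GCSUpload(bucket="my-bucket")

with Flow("gcp-usage-patterns") as flow:
    one_off(data="col_a,col_b\n1,2")
    template(data="col_a,col_b\n3,4", blob="override.csv")
```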
# GCPSecret
class prefect.tasks.gcp.secretmanager.GCPSecret(project_id=None, secret_id=None, version_id="latest", credentials=None, **kwargs)

Task for retrieving a secret from GCP Secret Manager and returning it as a dictionary. Note that all initialization arguments can optionally be provided or overwritten at runtime.
For authentication, there are three options: set the GCP_CREDENTIALS Prefect Secret containing your GCP access keys, [explicitly provide a credentials dictionary](https://googleapis.dev/python/google-api-core/latest/auth.html#explicit-credentials), or otherwise fall back to [default Google client logic](https://googleapis.dev/python/google-api-core/latest/auth.html).
Args:
- project_id (Union[str, int], optional): the name of the project where the Secret is saved
- secret_id (str, optional): the name of the secret to retrieve
- version_id (Union[str, int], optional): the version number of the secret to use; defaults to "latest"
- credentials (dict, optional): dictionary containing GCP credentials
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.secretmanager.GCPSecret.run(project_id=None, secret_id=None, version_id="latest", credentials=None): Task run method.
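For illustration, a minimal sketch of retrieving a secret, assuming hypothetical project and secret names:

```python
from prefect import Flow
from prefect.tasks.gcp.secretmanager import GCPSecret

# Template task: project fixed at init, secret_id supplied at runtime
get_secret = GCPSecret(project_id="my-project")

with Flow("read-secret") as flow:
    db_password = get_secret(secret_id="db-password")
```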
# GCSDownload
class prefect.tasks.gcp.storage.GCSDownload(bucket, blob=None, project=None, chunk_size=None, request_timeout=60, **kwargs)

Task template for downloading data from Google Cloud Storage as a string.
Args:
- bucket (str): default bucket name to download from
- blob (str, optional): default blob name to download
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- chunk_size (int, optional): the size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSDownload.run(bucket=None, blob=None, project=None, chunk_size=None, credentials=None, encryption_key=None, request_timeout=60): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
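A minimal sketch of a download, assuming a hypothetical bucket and blob and passing the GCP_CREDENTIALS secret explicitly at runtime:

```python
from prefect import Flow
from prefect.tasks.secrets import PrefectSecret
from prefect.tasks.gcp.storage import GCSDownload

download = GCSDownload(bucket="my-bucket")

with Flow("gcs-download") as flow:
    # Retrieve the GCP_CREDENTIALS Prefect Secret and pass it explicitly
    creds = PrefectSecret("GCP_CREDENTIALS")
    contents = download(blob="data/input.csv", credentials=creds)
```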
# GCSUpload
class prefect.tasks.gcp.storage.GCSUpload(bucket, blob=None, project=None, chunk_size=104857600, create_bucket=False, request_timeout=60, **kwargs)

Task template for uploading data to Google Cloud Storage. Data can be a string, bytes, or io.BytesIO.
Args:
- bucket (str): default bucket name to upload to
- blob (str, optional): default blob name to upload to; otherwise a random string beginning with "prefect-" and containing the Task Run ID will be used
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- chunk_size (int, optional): the size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification
- create_bucket (bool, optional): boolean specifying whether to create the bucket if it does not exist, otherwise an Exception is raised. Defaults to False
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSUpload.run(data, bucket=None, blob=None, project=None, chunk_size=None, credentials=None, encryption_key=None, create_bucket=False, content_type=None, content_encoding=None, request_timeout=60): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
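A minimal sketch of an upload, assuming hypothetical bucket and blob names:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSUpload

upload = GCSUpload(bucket="my-bucket", create_bucket=True)

with Flow("gcs-upload") as flow:
    # data may be a string, bytes, or io.BytesIO
    upload(data="hello, world", blob="greetings.txt",
           content_type="text/plain")
```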
# GCSCopy
class prefect.tasks.gcp.storage.GCSCopy(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, create_bucket=False, request_timeout=60, **kwargs)

Task template for copying data from one Google Cloud Storage bucket to another, without downloading it locally.

Note that some arguments are required for the task to run and must be provided either at initialization or at runtime.
Args:
- source_bucket (str, optional): default source bucket name
- source_blob (str, optional): default source blob name
- dest_bucket (str, optional): default destination bucket name
- dest_blob (str, optional): default destination blob name
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- create_bucket (bool, optional): boolean specifying whether to create the dest_bucket if it does not exist, otherwise an Exception is raised. Defaults to False
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSCopy.run(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, credentials=None, create_bucket=False, request_timeout=60): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
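A minimal sketch of a bucket-to-bucket copy, assuming hypothetical bucket and blob names:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSCopy

copy = GCSCopy(source_bucket="raw-data", dest_bucket="archive")

with Flow("gcs-copy") as flow:
    copy(source_blob="events/today.json",
         dest_blob="events/backup.json")
```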
# GCSBlobExists
class prefect.tasks.gcp.storage.GCSBlobExists(bucket_name=None, blob=None, project=None, wait_seconds=0, fail_if_not_found=True, request_timeout=60, **kwargs)

Task template for checking a Google Cloud Storage bucket for a given object.
Args:
- bucket_name (str, optional): the bucket to check
- blob (str, optional): object for which to search within the bucket
- project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
- wait_seconds (int, optional): retry until the blob is found or until wait_seconds have elapsed, whichever comes first. Defaults to 0
- request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout)
- fail_if_not_found (bool, optional): will raise a Fail signal on the task if the blob is not found. Defaults to True
- **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Methods:
- prefect.tasks.gcp.storage.GCSBlobExists.run(bucket_name=None, blob=None, project=None, wait_seconds=0, fail_if_not_found=True, credentials=None): Run method for this Task. Invoked by calling this Task after initialization within a Flow context.
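A minimal sketch of polling for a blob, assuming a hypothetical bucket and blob:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSBlobExists

# Poll for up to five minutes; raise a Fail signal if never found
blob_exists = GCSBlobExists(bucket_name="my-bucket", wait_seconds=300)

with Flow("wait-for-blob") as flow:
    found = blob_exists(blob="exports/latest.csv")
```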
# BigQueryTask
class prefect.tasks.gcp.bigquery.BigQueryTask(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None, **kwargs)

Task for executing queries against a Google BigQuery table and (optionally) returning the results. Note that all initialization settings can be provided or overwritten at runtime.
Args:
- query (str, optional): a string of the query to execute
- query_params (list[tuple], optional): a list of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- location (str, optional): location of the dataset that will be queried; defaults to "US"
- dry_run_max_bytes (int, optional): if provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a ValueError if the maximum is exceeded
- dataset_dest (str, optional): the optional name of a destination dataset to write the query results to, if you don't want them returned; if provided, table_dest must also be provided
- table_dest (str, optional): the optional name of a destination table to write the query results to, if you don't want them returned; if provided, dataset_dest must also be provided
- to_dataframe (bool, optional): if True, returns the results of the query as a pandas DataFrame instead of a list of bigquery.table.Row objects. Defaults to False
- job_config (dict, optional): an optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected)
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryTask.run(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, credentials=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
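A minimal sketch of a parameterized query, assuming a hypothetical table and that scalar parameters follow BigQuery's (name, type, value) tuple layout:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryTask

query = BigQueryTask(
    query=(
        "SELECT name, value "
        "FROM `my-project.my_dataset.my_table` "
        "WHERE value > @threshold"
    ),
    # 3-tuples of (name, type, value); scalar parameters only
    query_params=[("threshold", "INT64", 100)],
)

with Flow("bq-query") as flow:
    df = query(to_dataframe=True)
```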
# BigQueryStreamingInsert
class prefect.tasks.gcp.bigquery.BigQueryStreamingInsert(dataset_id=None, table=None, project=None, location="US", **kwargs)

Task for inserting records into a Google BigQuery table via the streaming API. Note that all of these settings can optionally be provided or overwritten at runtime.
Args:
- dataset_id (str, optional): the id of a destination dataset to write the records to
- table (str, optional): the name of a destination table to write the records to
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- location (str, optional): location of the dataset that will be written to; defaults to "US"
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryStreamingInsert.run(records, dataset_id=None, table=None, project=None, location="US", credentials=None, **kwargs): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
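A minimal sketch of a streaming insert, assuming a hypothetical dataset and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryStreamingInsert

stream_insert = BigQueryStreamingInsert(dataset_id="my_dataset",
                                        table="events")

with Flow("bq-stream") as flow:
    # records are dicts keyed by column name
    stream_insert(records=[{"user": "alice", "action": "login"}])
```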
# CreateBigQueryTable
class prefect.tasks.gcp.bigquery.CreateBigQueryTable(project=None, dataset=None, table=None, schema=None, clustering_fields=None, time_partitioning=None, **kwargs)

Ensures a BigQuery table exists, creating it if it does not. Note that most initialization keywords can optionally be provided at runtime.
Args:
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- dataset (str, optional): the name of the dataset in which the table will be created
- table (str, optional): the name of the table to create
- schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
- clustering_fields (List[str], optional): a list of fields to cluster the table by
- time_partitioning (bigquery.TimePartitioning, optional): a bigquery.TimePartitioning object specifying a partitioning of the newly created table
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.CreateBigQueryTable.run(project=None, credentials=None, dataset=None, table=None, schema=None): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
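A minimal sketch of creating a table, assuming a hypothetical project, dataset, and schema:

```python
from google.cloud import bigquery
from prefect import Flow
from prefect.tasks.gcp.bigquery import CreateBigQueryTable

create_table = CreateBigQueryTable(
    project="my-project", dataset="my_dataset", table="events"
)

with Flow("bq-create-table") as flow:
    create_table(schema=[
        bigquery.SchemaField("user", "STRING"),
        bigquery.SchemaField("logins", "INTEGER"),
    ])
```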
# BigQueryLoadGoogleCloudStorage
class prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)

Task for inserting records into a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.
Args:
- uri (str, optional): GCS path to load data from
- dataset_id (str, optional): the id of a destination dataset to write the records to
- table (str, optional): the name of a destination table to write the records to
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
- location (str, optional): location of the dataset that will be queried; defaults to "US"
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage.run(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
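A minimal sketch of a load from GCS, assuming a hypothetical URI, dataset, and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryLoadGoogleCloudStorage

load_from_gcs = BigQueryLoadGoogleCloudStorage(
    dataset_id="my_dataset", table="events"
)

with Flow("bq-load-gcs") as flow:
    load_from_gcs(uri="gs://my-bucket/exports/events.json")
```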
# BigQueryLoadFile
class prefect.tasks.gcp.bigquery.BigQueryLoadFile(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)

Task for inserting records into a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.
Args:
- file (Union[str, path-like object], optional): a string or path-like object of the file to be loaded
- rewind (bool, optional): if True, seek to the beginning of the file handle before reading the file
- size (int, optional): the number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used
- num_retries (int, optional): the maximum number of retries for loading the BigQuery table from file. Defaults to 6
- dataset_id (str, optional): the id of a destination dataset to write the records to
- table (str, optional): the name of a destination table to write the records to
- project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
- schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
- location (str, optional): location of the dataset that will be queried; defaults to "US"
- **kwargs (optional): additional kwargs to pass to the Task constructor
Methods:
- prefect.tasks.gcp.bigquery.BigQueryLoadFile.run(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs): Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.
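A minimal sketch of a load from a local file, assuming a hypothetical file path, dataset, and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryLoadFile

load_file = BigQueryLoadFile(dataset_id="my_dataset", table="events")

with Flow("bq-load-file") as flow:
    load_file(file="exports/events.csv")
```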
This documentation was auto-generated from commit bd9182e on July 31, 2024 at 18:02 UTC.