# Google Cloud Tasks


Tasks that interface with various components of Google Cloud Platform.

Note that these tasks allow for a wide range of custom usage patterns, such as:

  • Initialize a task with all settings for one time use
  • Initialize a "template" task with default settings and override as needed
  • Create a custom Task that inherits from a Prefect Task and utilizes the Prefect boilerplate

All GCP related tasks can be authenticated using the GCP_CREDENTIALS Prefect Secret. See Third Party Authentication for more information.
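For example, here is a minimal sketch of the "template" pattern described above, using the GCSDownload task documented below; the bucket and blob names are placeholders:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSDownload

# Initialize a "template" task with default settings
gcs_download = GCSDownload(bucket="my-bucket")  # hypothetical bucket name

with Flow("gcp-example") as flow:
    # Override individual settings at runtime for each use of the task
    daily = gcs_download(blob="exports/daily.csv")
    other = gcs_download(blob="exports/other.csv", bucket="another-bucket")
```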

# GCPSecret

class

prefect.tasks.gcp.secretmanager.GCPSecret

(project_id=None, secret_id=None, version_id="latest", credentials=None, **kwargs)[source]

Task for retrieving a secret from GCP Secret Manager and returning it as a string. Note that all initialization arguments can optionally be provided or overwritten at runtime.

For authentication, there are three options: set the GCP_CREDENTIALS Prefect Secret containing your GCP access keys, [explicitly provide a credentials dictionary](https://googleapis.dev/python/google-api-core/latest/auth.html#explicit-credentials), or fall back to [default Google client logic](https://googleapis.dev/python/google-api-core/latest/auth.html).

Args:

  • project_id (Union[str, int], optional): the name of the project where the Secret is saved
  • secret_id (str, optional): the name of the secret to retrieve
  • version_id (Union[str, int], optional): the version number of the secret to use; defaults to 'latest'
  • credentials (dict, optional): dictionary containing GCP credentials
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.secretmanager.GCPSecret.run

(project_id=None, secret_id=None, version_id="latest", credentials=None)[source]

Task run method.

Args:

  • project_id (Union[str, int], optional): the name of the project where the Secret is saved
  • secret_id (str, optional): the name of the secret to retrieve
  • version_id (Union[str, int], optional): the version number of the secret to use; defaults to "latest"
  • credentials (dict, optional): your GCP credentials passed from an upstream Secret task. If not provided here, [default Google client logic](https://googleapis.dev/python/google-api-core/latest/auth.html) will be used.
Returns:
  • str: the contents of this secret
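A minimal sketch of retrieving a secret within a Flow, assuming a hypothetical project and secret ID:

```python
from prefect import Flow
from prefect.tasks.gcp.secretmanager import GCPSecret

# Template task with defaults; any argument can be overridden in the run call
get_secret = GCPSecret(project_id="my-project", secret_id="db-password")

with Flow("read-gcp-secret") as flow:
    # Returns the secret payload as a string
    db_password = get_secret(version_id="latest")
```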



# GCSDownload

class

prefect.tasks.gcp.storage.GCSDownload

(bucket, blob=None, project=None, chunk_size=None, encryption_key_secret=None, request_timeout=60, **kwargs)[source]

Task template for downloading data from Google Cloud Storage as a string.

Args:

  • bucket (str): default bucket name to download from
  • blob (str, optional): default blob name to download.
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • chunk_size (int, optional): The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when downloading the Blob
  • request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each individual occurrence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSDownload.run

(bucket=None, blob=None, project=None, chunk_size=None, credentials=None, encryption_key=None, encryption_key_secret=None, request_timeout=60)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • bucket (str, optional): the bucket name to download from
  • blob (str, optional): blob name to download from
  • project (str, optional): Google Cloud project to work within. If not provided here or at initialization, will be inferred from your Google Cloud credentials
  • chunk_size (int, optional): The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • encryption_key (str, optional): an encryption key
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when uploading the Blob
  • request_timeout (Union[float, Tuple[float, float]], optional): the number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
Returns:
  • str: the data from the blob, as a string
Raises:
  • google.cloud.exception.NotFound: if the bucket name is not found
  • ValueError: if blob name hasn't been provided
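A minimal sketch that passes credentials from an upstream PrefectSecret task; the bucket, blob, and secret names are placeholders:

```python
from prefect import Flow
from prefect.tasks.secrets import PrefectSecret
from prefect.tasks.gcp.storage import GCSDownload

download = GCSDownload(bucket="my-bucket")  # default bucket for every run

with Flow("gcs-download") as flow:
    # Optionally pass GCP credentials explicitly from an upstream Secret task
    creds = PrefectSecret("GCP_CREDENTIALS")
    # Returns the blob contents as a string
    contents = download(blob="raw/data.csv", credentials=creds)
```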



# GCSUpload

class

prefect.tasks.gcp.storage.GCSUpload

(bucket, blob=None, project=None, chunk_size=104857600, create_bucket=False, encryption_key_secret=None, request_timeout=60, **kwargs)[source]

Task template for uploading data to Google Cloud Storage. Data can be a string, bytes, or io.BytesIO.

Args:

  • bucket (str): default bucket name to upload to
  • blob (str, optional): default blob name to upload to; otherwise a random string beginning with prefect- and containing the Task Run ID will be used
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • chunk_size (int, optional): The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • create_bucket (bool, optional): boolean specifying whether to create the bucket if it does not exist, otherwise an Exception is raised. Defaults to False.
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when uploading the Blob
  • request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each individual occurrence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSUpload.run

(data, bucket=None, blob=None, project=None, chunk_size=None, credentials=None, encryption_key=None, create_bucket=False, encryption_key_secret=None, content_type=None, content_encoding=None, request_timeout=60)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • data (Union[str, bytes, BytesIO]): the data to upload; can be either string, bytes or io.BytesIO
  • bucket (str, optional): the bucket name to upload to
  • blob (str, optional): blob name to upload to; if not provided, a string beginning with prefect- and containing the Task Run ID will be used
  • project (str, optional): Google Cloud project to work within. Can be inferred from credentials if not provided.
  • chunk_size (int, optional): The size of a chunk of data whenever iterating (in bytes). This must be a multiple of 256 KB per the API specification.
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • encryption_key (str, optional): an encryption key
  • create_bucket (bool, optional): boolean specifying whether to create the bucket if it does not exist, otherwise an Exception is raised. Defaults to False.
  • encryption_key_secret (str, optional, DEPRECATED): the name of the Prefect Secret storing an optional encryption_key to be used when uploading the Blob.
  • content_type (str, optional): HTTP ‘Content-Type’ header for this object.
  • content_encoding (str, optional): HTTP ‘Content-Encoding’ header for this object.
  • request_timeout (Union[float, Tuple[float, float]], optional): the number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
Raises:
  • TypeError: if data is not a string, bytes, or io.BytesIO.
  • google.cloud.exception.NotFound: if create_bucket=False and the bucket name is not found
Returns:
  • str: the blob name that now stores the provided data
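A minimal sketch of uploading a string and capturing the resulting blob name; the bucket and blob names are placeholders:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSUpload

upload = GCSUpload(bucket="my-bucket", create_bucket=False)

with Flow("gcs-upload") as flow:
    # Returns the blob name that now stores the provided data
    blob_name = upload(
        data="hello, world",
        blob="greetings/hello.txt",
        content_type="text/plain",
    )
```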



# GCSCopy

class

prefect.tasks.gcp.storage.GCSCopy

(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, request_timeout=60, **kwargs)[source]

Task template for copying data from one Google Cloud Storage bucket to another, without downloading it locally.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • source_bucket (str, optional): default source bucket name.
  • source_blob (str, optional): default source blob name.
  • dest_bucket (str, optional): default destination bucket name.
  • dest_blob (str, optional): default destination blob name.
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each individual occurrence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSCopy.run

(source_bucket=None, source_blob=None, dest_bucket=None, dest_blob=None, project=None, credentials=None, request_timeout=60)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • source_bucket (str, optional): default source bucket name.
  • source_blob (str, optional): default source blob name.
  • dest_bucket (str, optional): default destination bucket name.
  • dest_blob (str, optional): default destination blob name.
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • request_timeout (Union[float, Tuple[float, float]], optional): the number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
Returns:
  • str: the name of the destination blob
Raises:
  • ValueError: if source_bucket, source_blob, dest_bucket, or dest_blob are missing or point at the same object.
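A minimal sketch of a server-side copy between two hypothetical buckets:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSCopy

copy_blob = GCSCopy(source_bucket="source-bucket", dest_bucket="dest-bucket")

with Flow("gcs-copy") as flow:
    # Source and destination must not point at the same object
    dest = copy_blob(source_blob="raw/data.csv", dest_blob="archive/data.csv")
```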



# GCSBlobExists

class

prefect.tasks.gcp.storage.GCSBlobExists

(bucket_name=None, blob=None, project=None, wait_seconds=0, fail_if_not_found=True, request_timeout=60, **kwargs)[source]

Task template for checking a Google Cloud Storage bucket for a given object

Args:

  • bucket_name (str, optional): the bucket to check
  • blob (str, optional): object for which to search within the bucket
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • wait_seconds (int, optional): retry until the file is found or until wait_seconds has elapsed, whichever comes first. Defaults to 0
  • request_timeout (Union[float, Tuple[float, float]], optional): default number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
  • fail_if_not_found (bool, optional): Will raise Fail signal on task if blob is not found. Defaults to True
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
Note that the design of this task allows you to initialize a template with default settings. Each individual occurrence of the task in a Flow can overwrite any of these default settings for custom use (for example, if you want to pull different credentials for a given Task, or specify the Blob name at runtime).

methods:                                                                                                                                                       

prefect.tasks.gcp.storage.GCSBlobExists.run

(bucket_name=None, blob=None, project=None, wait_seconds=0, fail_if_not_found=True, credentials=None, request_timeout=60)[source]

Run method for this Task. Invoked by calling this Task after initialization within a Flow context.

Note that some arguments are required for the task to run, and must be provided either at initialization or as arguments.

Args:

  • bucket_name (str, optional): the bucket to check
  • blob (str, optional): object for which to search within the bucket
  • project (str, optional): default Google Cloud project to work within. If not provided, will be inferred from your Google Cloud credentials
  • wait_seconds (int, optional): retry until the file is found or until wait_seconds has elapsed, whichever comes first. Defaults to 0
  • fail_if_not_found (bool, optional): Will raise Fail signal on task if blob is not found. Defaults to True
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • request_timeout (Union[float, Tuple[float, float]], optional): the number of seconds the transport should wait for the server response. Can also be passed as a tuple (connect_timeout, read_timeout).
Returns:
  • bool: the object exists
Raises:
  • ValueError: if bucket_name or blob are missing
  • FAIL: if object not found and fail_if_not_found is True
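A minimal sketch that polls for a hypothetical blob for up to a minute before failing:

```python
from prefect import Flow
from prefect.tasks.gcp.storage import GCSBlobExists

blob_exists = GCSBlobExists(bucket_name="my-bucket", fail_if_not_found=True)

with Flow("wait-for-blob") as flow:
    # Retries for up to 60 seconds; raises FAIL if the blob is still missing
    found = blob_exists(blob="expected/output.csv", wait_seconds=60)
```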



# BigQueryTask

class

prefect.tasks.gcp.bigquery.BigQueryTask

(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None, **kwargs)[source]

Task for executing queries against a Google BigQuery table and (optionally) returning the results. Note that all initialization settings can be provided / overwritten at runtime.

Args:

  • query (str, optional): a string of the query to execute
  • query_params (list[tuple], optional): a list of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • dry_run_max_bytes (int, optional): if provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a ValueError if the maximum is exceeded
  • dataset_dest (str, optional): the optional name of a destination dataset to write the query results to, if you don't want them returned; if provided, table_dest must also be provided
  • table_dest (str, optional): the optional name of a destination table to write the query results to, if you don't want them returned; if provided, dataset_dest must also be provided
  • to_dataframe (bool, optional): if provided, returns the results of the query as a pandas dataframe instead of a list of bigquery.table.Row objects. Defaults to False
  • job_config (dict, optional): an optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected)
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryTask.run

(query=None, query_params=None, project=None, location="US", dry_run_max_bytes=None, credentials=None, dataset_dest=None, table_dest=None, to_dataframe=False, job_config=None)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • query (str, optional): a string of the query to execute
  • query_params (list[tuple], optional): a list of 3-tuples specifying BigQuery query parameters; currently only scalar query parameters are supported. See the Google documentation for more details on how both the query and the query parameters should be formatted
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • dry_run_max_bytes (int, optional): if provided, the maximum number of bytes the query is allowed to process; this will be determined by executing a dry run and raising a ValueError if the maximum is exceeded
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • dataset_dest (str, optional): the optional name of a destination dataset to write the query results to, if you don't want them returned; if provided, table_dest must also be provided
  • table_dest (str, optional): the optional name of a destination table to write the query results to, if you don't want them returned; if provided, dataset_dest must also be provided
  • to_dataframe (bool, optional): if provided, returns the results of the query as a pandas dataframe instead of a list of bigquery.table.Row objects. Defaults to False
  • job_config (dict, optional): an optional dictionary of job configuration parameters; note that the parameters provided here must be pickleable (e.g., dataset references will be rejected)
Raises:
  • ValueError: if the query is None
  • ValueError: if only one of dataset_dest / table_dest is provided
  • ValueError: if the query will exceed dry_run_max_bytes
Returns:
  • list: a fully populated list of Query results, with one item per row
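A minimal sketch of a parameterized query with a dry-run byte limit; the project, dataset, table, and column names are placeholders:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryTask

bigquery_task = BigQueryTask(location="US")

with Flow("bq-query") as flow:
    rows = bigquery_task(
        query="""
            SELECT name, created_at
            FROM `my-project.my_dataset.my_table`
            WHERE created_at > @cutoff
        """,
        # Scalar query parameters as (name, type, value) 3-tuples
        query_params=[("cutoff", "TIMESTAMP", "2021-01-01 00:00:00")],
        # Abort (via a dry run) if the query would process more than ~10 MB
        dry_run_max_bytes=10_000_000,
    )
```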



# BigQueryStreamingInsert

class

prefect.tasks.gcp.bigquery.BigQueryStreamingInsert

(dataset_id=None, table=None, project=None, location="US", **kwargs)[source]

Task for inserting records into a Google BigQuery table via the streaming API. Note that all of these settings can optionally be provided or overwritten at runtime.

Args:

  • dataset_id (str, optional): the id of a destination dataset to write the records to
  • table (str, optional): the name of a destination table to write the records to
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryStreamingInsert.run

(records, dataset_id=None, table=None, project=None, location="US", credentials=None, **kwargs)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • records (list[dict]): the list of records to insert as rows into the BigQuery table; each item in the list should be a dictionary whose keys correspond to columns in the table
  • dataset_id (str, optional): the id of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization
  • table (str, optional): the name of a destination table to write the records to; if not provided here, will default to the one provided at initialization
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • **kwargs (optional): additional kwargs to pass to the insert_rows_json method; see the documentation here: https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html
Raises:
  • ValueError: if all required arguments haven't been provided
  • ValueError: if any of the records result in errors
Returns:
  • the response from insert_rows_json
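A minimal sketch of streaming a couple of rows into a hypothetical dataset and table:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryStreamingInsert

stream_insert = BigQueryStreamingInsert(dataset_id="my_dataset", table="events")

with Flow("bq-stream-insert") as flow:
    # Each record is a dict whose keys match columns in the destination table
    response = stream_insert(
        records=[
            {"user_id": 1, "event": "signup"},
            {"user_id": 2, "event": "login"},
        ]
    )
```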



# CreateBigQueryTable

class

prefect.tasks.gcp.bigquery.CreateBigQueryTable

(project=None, dataset=None, table=None, schema=None, clustering_fields=None, time_partitioning=None, **kwargs)[source]

Ensures a BigQuery table exists; creates it otherwise. Note that most initialization keywords can optionally be provided at runtime.

Args:

  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • dataset (str, optional): the name of the dataset in which the table will be created
  • table (str, optional): the name of a table to create
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • clustering_fields (List[str], optional): a list of fields to cluster the table by
  • time_partitioning (bigquery.TimePartitioning, optional): a bigquery.TimePartitioning object specifying a partitioning of the newly created table
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.CreateBigQueryTable.run

(project=None, credentials=None, dataset=None, table=None, schema=None)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • dataset (str, optional): the name of the dataset in which the table will be created
  • table (str, optional): the name of a table to create
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
Returns:
  • None
Raises:
  • SUCCESS: a SUCCESS signal if the table already exists
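A minimal sketch of ensuring a hypothetical table exists with a two-column schema; the project, dataset, and table names are placeholders:

```python
from google.cloud import bigquery
from prefect import Flow
from prefect.tasks.gcp.bigquery import CreateBigQueryTable

create_events_table = CreateBigQueryTable(
    dataset="my_dataset",
    table="events",
    schema=[
        bigquery.SchemaField("user_id", "INTEGER"),
        bigquery.SchemaField("event", "STRING"),
    ],
)

with Flow("ensure-bq-table") as flow:
    # Raises a SUCCESS signal (not an error) if the table already exists
    create_events_table(project="my-project")
```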



# BigQueryLoadGoogleCloudStorage

class

prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage

(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)[source]

Task for inserting records into a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.

Args:

  • uri (str, optional): GCS path to load data from
  • dataset_id (str, optional): the id of a destination dataset to write the records to
  • table (str, optional): the name of a destination table to write the records to
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryLoadGoogleCloudStorage.run

(uri=None, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • uri (str, optional): GCS path to load data from
  • dataset_id (str, optional): the id of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization
  • table (str, optional): the name of a destination table to write the records to; if not provided here, will default to the one provided at initialization
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task. If not provided, Prefect will first check context for GCP_CREDENTIALS and lastly will use default Google client logic.
  • **kwargs (optional): additional kwargs to pass to the bigquery.LoadJobConfig; see the documentation here: https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html
Raises:
  • ValueError: if all required arguments haven't been provided
  • ValueError: if the load job results in an error
Returns:
  • google.cloud.bigquery.job.LoadJob: the response from load_table_from_uri
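A minimal sketch of loading newline-delimited JSON from a hypothetical GCS URI; the source_format keyword assumes, per the note above, that extra kwargs are forwarded to bigquery.LoadJobConfig:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryLoadGoogleCloudStorage

load_from_gcs = BigQueryLoadGoogleCloudStorage(dataset_id="my_dataset", table="events")

with Flow("bq-load-from-gcs") as flow:
    # Returns the LoadJob object from load_table_from_uri
    load_job = load_from_gcs(
        uri="gs://my-bucket/exports/events.json",
        source_format="NEWLINE_DELIMITED_JSON",  # forwarded to LoadJobConfig
    )
```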



# BigQueryLoadFile

class

prefect.tasks.gcp.bigquery.BigQueryLoadFile

(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", **kwargs)[source]

Task for inserting records into a Google BigQuery table via a load job. Note that all of these settings can optionally be provided or overwritten at runtime.

Args:

  • file (Union[str, path-like object], optional): A string or path-like object of the file to be loaded
  • rewind (bool, optional): if True, seek to the beginning of the file handle before reading the file
  • size (int, optional): the number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
  • num_retries (int, optional): the number of max retries for loading the bigquery table from file. Defaults to 6
  • dataset_id (str, optional): the id of a destination dataset to write the records to
  • table (str, optional): the name of a destination table to write the records to
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be queried; defaults to "US"
  • **kwargs (optional): additional kwargs to pass to the Task constructor

methods:                                                                                                                                                       

prefect.tasks.gcp.bigquery.BigQueryLoadFile.run

(file=None, rewind=False, size=None, num_retries=6, dataset_id=None, table=None, project=None, schema=None, location="US", credentials=None, **kwargs)[source]

Run method for this Task. Invoked by calling this Task within a Flow context, after initialization.

Args:

  • file (Union[str, path-like object], optional): A string or path-like object of the file to be loaded
  • rewind (bool, optional): if True, seek to the beginning of the file handle before reading the file
  • size (int, optional): the number of bytes to read from the file handle. If size is None or large, resumable upload will be used. Otherwise, multipart upload will be used.
  • num_retries (int, optional): the number of max retries for loading the bigquery table from file. Defaults to 6
  • dataset_id (str, optional): the id of a destination dataset to write the records to; if not provided here, will default to the one provided at initialization
  • table (str, optional): the name of a destination table to write the records to; if not provided here, will default to the one provided at initialization
  • project (str, optional): the project to initialize the BigQuery Client with; if not provided, will default to the one inferred from your credentials
  • schema (List[bigquery.SchemaField], optional): the schema to use when creating the table
  • location (str, optional): location of the dataset that will be written to; defaults to "US"
  • credentials (dict, optional): a JSON document containing Google Cloud credentials. You should provide these at runtime with an upstream Secret task.
  • **kwargs (optional): additional kwargs to pass to the bigquery.LoadJobConfig; see the documentation here: https://googleapis.github.io/google-cloud-python/latest/bigquery/generated/google.cloud.bigquery.client.Client.html
Raises:
  • ValueError: if all required arguments haven't been provided or file does not exist
  • IOError: if file can't be opened and read
  • ValueError: if the load job results in an error
Returns:
  • google.cloud.bigquery.job.LoadJob: the response from load_table_from_file
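A minimal sketch of loading a local file into a hypothetical dataset and table; the file path is a placeholder:

```python
from prefect import Flow
from prefect.tasks.gcp.bigquery import BigQueryLoadFile

load_file = BigQueryLoadFile(dataset_id="my_dataset", table="events", num_retries=6)

with Flow("bq-load-file") as flow:
    # Raises ValueError if the file does not exist, IOError if it cannot be read
    load_job = load_file(file="data/events.csv")
```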


