# Azure ML Service Tasks


This module contains a collection of tasks for interacting with Azure Machine Learning Service resources.

# DatasetCreateFromDelimitedFiles

class

prefect.tasks.azureml.dataset.DatasetCreateFromDelimitedFiles

(dataset_name=None, datastore=None, path=None, dataset_description="", dataset_tags=None, include_path=False, infer_column_types=True, set_column_types=None, fine_grain_timestamp=None, coarse_grain_timestamp=None, separator=",", header=azureml.data.dataset_type_definitions.PromoteHeadersBehavior.ALL_FILES_HAVE_SAME_HEADERS, partition_format=None, create_new_version=False, **kwargs)[source]

Task for creating a TabularDataset from delimited files for use in an Azure Machine Learning service Workspace. The files should exist in a Datastore. Note that all initialization arguments can optionally be provided or overwritten at runtime.

Args:

  • dataset_name (str, optional): The name of the Dataset in the Workspace
  • datastore (azureml.core.datastore.Datastore, optional): The Datastore which holds the files.
  • path (Union[str, List[str]], optional): The path to the delimited files in the Datastore.
  • dataset_description (str, optional): Description of the Dataset.
  • dataset_tags (str, optional): Tags to associate with the Dataset.
  • include_path (bool, optional): Boolean to keep path information as column in the dataset.
  • infer_column_types (bool, optional): Boolean to infer column data types.
  • set_column_types (Dict[str, azureml.data.DataType], optional): A dictionary that sets column data types, where each key is a column name and each value is an azureml.data.DataType.
  • fine_grain_timestamp (str, optional): The name of the column to use as the fine grain timestamp.
  • coarse_grain_timestamp (str, optional): The name of the column to use as the coarse grain timestamp.
  • separator (str, optional): The separator used to split columns.
  • header (azureml.data.dataset_type_definitions.PromoteHeadersBehavior, optional): Controls how column headers are promoted when reading from files. Defaults to assume that all files have the same header.
  • partition_format (str, optional): Specify the partition format of path. Defaults to None. The partition information of each path will be extracted into columns based on the specified format. The format part {column_name} creates a string column, and {column_name:yyyy/MM/dd/HH/mm/ss} creates a datetime column, where yyyy, MM, dd, HH, mm and ss are used to extract the year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue until the end of the file path. For example, given the path ../Germany/2019/01/01/data.csv where the partition is by country and time, partition_format="/{Country}/{PartitionDate:yyyy/MM/dd}/data.csv" creates a string column Country with the value Germany and a datetime column PartitionDate with the value 2019-01-01.
  • create_new_version (bool, optional): Boolean to register the dataset as a new version under the specified name.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
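
The partition_format rule described above can be illustrated with a small, standard-library-only sketch. It is not the Azure ML implementation: `extract_partitions` is a hypothetical helper that turns a format string into a regular expression and applies it to the end of a path, covering only the `{column_name}` and `{column_name:yyyy/MM/dd/HH/mm/ss}` forms named in the description.

```python
import re
from datetime import datetime

# Date tokens from the description, mapped to (regex fragment, strptime code).
_TOKENS = {
    "yyyy": (r"\d{4}", "%Y"),
    "MM": (r"\d{2}", "%m"),
    "dd": (r"\d{2}", "%d"),
    "HH": (r"\d{2}", "%H"),
    "mm": (r"\d{2}", "%M"),
    "ss": (r"\d{2}", "%S"),
}
_TOKEN_RE = re.compile("|".join(_TOKENS))
_FIELD_RE = re.compile(r"\{(\w+)(?::([^}]+))?\}")


def extract_partitions(path, partition_format):
    """Hypothetical helper: mimic the documented partition_format behaviour."""
    pattern, date_formats, pos = "", {}, 0
    for field in _FIELD_RE.finditer(partition_format):
        # Literal text between fields must match exactly.
        pattern += re.escape(partition_format[pos:field.start()])
        name, date_spec = field.group(1), field.group(2)
        if date_spec is None:
            # {column_name}: a string column holding one path segment.
            pattern += f"(?P<{name}>[^/]+)"
        else:
            # {column_name:yyyy/MM/...}: a datetime column.
            regex_part, strptime_part, p = "", "", 0
            for token in _TOKEN_RE.finditer(date_spec):
                literal = date_spec[p:token.start()]
                regex_part += re.escape(literal) + _TOKENS[token.group(0)][0]
                strptime_part += literal + _TOKENS[token.group(0)][1]
                p = token.end()
            regex_part += re.escape(date_spec[p:])
            strptime_part += date_spec[p:]
            pattern += f"(?P<{name}>{regex_part})"
            date_formats[name] = strptime_part
        pos = field.end()
    # The format runs from the first partition key to the end of the path.
    pattern += re.escape(partition_format[pos:]) + "$"
    match = re.search(pattern, path)
    if match is None:
        return None
    return {
        name: datetime.strptime(value, date_formats[name]) if name in date_formats else value
        for name, value in match.groupdict().items()
    }


columns = extract_partitions(
    "../Germany/2019/01/01/data.csv",
    "/{Country}/{PartitionDate:yyyy/MM/dd}/data.csv",
)
print(columns)
```

For the example path from the description, the sketch yields a string value Germany for Country and a datetime of 2019-01-01 for PartitionDate.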

methods:

prefect.tasks.azureml.dataset.DatasetCreateFromDelimitedFiles.run

(dataset_name=None, datastore=None, path=None, dataset_description="", dataset_tags=None, include_path=False, infer_column_types=True, set_column_types=None, fine_grain_timestamp=None, coarse_grain_timestamp=None, separator=",", header=azureml.data.dataset_type_definitions.PromoteHeadersBehavior.ALL_FILES_HAVE_SAME_HEADERS, partition_format=None, create_new_version=False)[source]

Task run method.

Args:

  • dataset_name (str, optional): The name of the Dataset in the Workspace
  • datastore (azureml.core.datastore.Datastore, optional): The Datastore which holds the files.
  • path (Union[str, List[str]], optional): The path to the delimited files in the Datastore.
  • dataset_description (str, optional): Description of the Dataset.
  • dataset_tags (str, optional): Tags to associate with the Dataset.
  • include_path (bool, optional): Boolean to keep path information as column in the dataset.
  • infer_column_types (bool, optional): Boolean to infer column data types.
  • set_column_types (Dict[str, azureml.data.DataType], optional): A dictionary that sets column data types, where each key is a column name and each value is an azureml.data.DataType.
  • fine_grain_timestamp (str, optional): The name of the column to use as the fine grain timestamp.
  • coarse_grain_timestamp (str, optional): The name of the column to use as the coarse grain timestamp.
  • separator (str, optional): The separator used to split columns.
  • header (azureml.data.dataset_type_definitions.PromoteHeadersBehavior, optional): Controls how column headers are promoted when reading from files. Defaults to assume that all files have the same header.
  • partition_format (str, optional): Specify the partition format of path.
  • create_new_version (bool, optional): Boolean to register the dataset as a new version under the specified name.
Returns:
  • azureml.data.TabularDataset: the created TabularDataset
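
As a rough usage sketch, the task might be wired into a flow as follows. It assumes a Prefect release that still ships `prefect.tasks.azureml` together with the Azure ML SDK, and an existing `Workspace` object supplied by the caller; the flow name, dataset name, and file path are hypothetical. The imports sit inside the function so that the snippet can be loaded without those packages installed.

```python
def build_dataset_flow(workspace):
    """Build (but do not run) a flow that registers a TabularDataset."""
    # Lazy imports: only needed when the flow is actually built.
    from prefect import Flow
    from prefect.tasks.azureml.datastore import DatastoreGet
    from prefect.tasks.azureml.dataset import DatasetCreateFromDelimitedFiles

    # With no datastore_name, DatastoreGet returns the Workspace default Datastore.
    get_datastore = DatastoreGet(workspace=workspace)

    # Some arguments are fixed at initialization time...
    create_dataset = DatasetCreateFromDelimitedFiles(
        dataset_name="sales",  # hypothetical dataset name
        separator=";",
    )

    with Flow("register-sales-dataset") as flow:  # hypothetical flow name
        datastore = get_datastore()
        # ...and the remaining ones are provided at runtime.
        create_dataset(datastore=datastore, path="sales/2019/data.csv")  # hypothetical path
    return flow
```

Splitting the arguments this way reflects the note that initialization arguments can be provided or overwritten at runtime; calling `flow.run()` on the returned flow would execute both tasks.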



# DatasetCreateFromParquetFiles

class

prefect.tasks.azureml.dataset.DatasetCreateFromParquetFiles

(dataset_name=None, datastore=None, path=None, dataset_description="", dataset_tags=None, include_path=False, set_column_types=None, fine_grain_timestamp=None, coarse_grain_timestamp=None, partition_format=None, create_new_version=False, **kwargs)[source]

Task for creating a TabularDataset from Parquet files for use in an Azure Machine Learning service Workspace. The files should exist in a Datastore. Note that all initialization arguments can optionally be provided or overwritten at runtime.

Args:

  • dataset_name (str, optional): The name of the Dataset in the Workspace
  • datastore (azureml.core.datastore.Datastore, optional): The Datastore which holds the files.
  • path (Union[str, List[str]], optional): The path to the Parquet files in the Datastore.
  • dataset_description (str, optional): Description of the Dataset.
  • dataset_tags (str, optional): Tags to associate with the Dataset.
  • include_path (bool, optional): Boolean to keep path information as column in the dataset.
  • set_column_types (Dict[str, azureml.data.DataType], optional): A dictionary that sets column data types, where each key is a column name and each value is an azureml.data.DataType.
  • fine_grain_timestamp (str, optional): The name of the column to use as the fine grain timestamp.
  • coarse_grain_timestamp (str, optional): The name of the column to use as the coarse grain timestamp.
  • partition_format (str, optional): Specify the partition format of path. Defaults to None. The partition information of each path will be extracted into columns based on the specified format. The format part {column_name} creates a string column, and {column_name:yyyy/MM/dd/HH/mm/ss} creates a datetime column, where yyyy, MM, dd, HH, mm and ss are used to extract the year, month, day, hour, minute and second for the datetime type. The format should start from the position of the first partition key and continue until the end of the file path. For example, given the path ../Germany/2019/01/01/data.csv where the partition is by country and time, partition_format="/{Country}/{PartitionDate:yyyy/MM/dd}/data.csv" creates a string column Country with the value Germany and a datetime column PartitionDate with the value 2019-01-01.
  • create_new_version (bool, optional): Boolean to register the dataset as a new version under the specified name.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor

methods:

prefect.tasks.azureml.dataset.DatasetCreateFromParquetFiles.run

(dataset_name=None, datastore=None, path=None, dataset_description="", dataset_tags=None, include_path=False, set_column_types=None, fine_grain_timestamp=None, coarse_grain_timestamp=None, partition_format=None, create_new_version=None)[source]

Task run method.

Args:

  • dataset_name (str, optional): The name of the Dataset in the Workspace
  • datastore (azureml.core.datastore.Datastore, optional): The Datastore which holds the files.
  • path (Union[str, List[str]], optional): The path to the Parquet files in the Datastore.
  • dataset_description (str, optional): Description of the Dataset.
  • dataset_tags (str, optional): Tags to associate with the Dataset.
  • include_path (bool, optional): Boolean to keep path information as column in the dataset.
  • set_column_types (Dict[str, azureml.data.DataType], optional): A dictionary that sets column data types, where each key is a column name and each value is an azureml.data.DataType.
  • fine_grain_timestamp (str, optional): The name of the column to use as the fine grain timestamp.
  • coarse_grain_timestamp (str, optional): The name of the column to use as the coarse grain timestamp.
  • partition_format (str, optional): Specify the partition format of path.
  • create_new_version (bool, optional): Boolean to register the dataset as a new version under the specified name.
Returns:
  • azureml.data.TabularDataset: the created TabularDataset.



# DatasetCreateFromFiles

class

prefect.tasks.azureml.dataset.DatasetCreateFromFiles

(dataset_name=None, datastore=None, path=None, dataset_description="", dataset_tags=None, create_new_version=False, **kwargs)[source]

Task for creating a FileDataset from files for use in an Azure Machine Learning service Workspace. The files should exist in a Datastore. Note that all initialization arguments can optionally be provided or overwritten at runtime.

Args:

  • dataset_name (str, optional): The name of the Dataset in the Workspace
  • datastore (azureml.core.datastore.Datastore, optional): The Datastore which holds the files.
  • path (Union[str, List[str]], optional): The path to the files in the Datastore.
  • dataset_description (str, optional): Description of the Dataset.
  • dataset_tags (str, optional): Tags to associate with the Dataset.
  • create_new_version (bool, optional): Boolean to register the dataset as a new version under the specified name.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor

methods:

prefect.tasks.azureml.dataset.DatasetCreateFromFiles.run

(dataset_name=None, datastore=None, path=None, dataset_description="", dataset_tags=None, create_new_version=False)[source]

Task run method.

Args:

  • dataset_name (str, optional): The name of the Dataset in the Workspace
  • datastore (azureml.core.datastore.Datastore, optional): The Datastore which holds the files.
  • path (Union[str, List[str]], optional): The path to the files in the Datastore.
  • dataset_description (str, optional): Description of the Dataset.
  • dataset_tags (str, optional): Tags to associate with the Dataset.
  • create_new_version (bool, optional): Boolean to register the dataset as a new version under the specified name.
Returns:
  • azureml.data.FileDataset: the created FileDataset



# DatastoreRegisterBlobContainer

class

prefect.tasks.azureml.datastore.DatastoreRegisterBlobContainer

(workspace, container_name=None, datastore_name=None, create_container_if_not_exists=False, overwrite_existing_datastore=False, azure_credentials_secret="AZ_CREDENTIALS", set_as_default=False, **kwargs)[source]

Task for registering an Azure Blob Storage container as a Datastore in an Azure ML service Workspace.

Args:

  • workspace (azureml.core.workspace.Workspace): The Workspace to which the Datastore is to be registered.
  • container_name (str, optional): The name of the container.
  • datastore_name (str, optional): The name of the datastore. If not defined, the container name will be used.
  • create_container_if_not_exists (bool, optional): Create a container, if one does not exist with the given name.
  • overwrite_existing_datastore (bool, optional): Overwrite an existing datastore. If the datastore does not exist, it will be created.
  • azure_credentials_secret (str, optional): The name of the Prefect Secret that stores your Azure credentials; this Secret must be a JSON string with two keys: ACCOUNT_NAME and either ACCOUNT_KEY or SAS_TOKEN (if both are defined, then ACCOUNT_KEY is used).
  • set_as_default (bool, optional): Set the created Datastore as the default datastore for the Workspace.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
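
The shape of the credentials Secret and the precedence rule between ACCOUNT_KEY and SAS_TOKEN can be sketched with the standard library alone. `resolve_storage_credentials` is a hypothetical helper written for illustration, not the task's own code, and the account name and credential values are placeholders.

```python
import json


def resolve_storage_credentials(secret_value):
    """Hypothetical helper: validate the secret and pick the credential to use."""
    creds = json.loads(secret_value)
    if "ACCOUNT_NAME" not in creds:
        raise ValueError("The secret must contain ACCOUNT_NAME.")
    account_key = creds.get("ACCOUNT_KEY")
    sas_token = creds.get("SAS_TOKEN")
    if account_key is None and sas_token is None:
        raise ValueError("The secret must contain ACCOUNT_KEY or SAS_TOKEN.")
    return {
        "account_name": creds["ACCOUNT_NAME"],
        "account_key": account_key,
        # When both are defined, ACCOUNT_KEY wins and the SAS token is ignored.
        "sas_token": None if account_key is not None else sas_token,
    }


# Placeholder values only; real credentials belong in a Prefect Secret.
secret = json.dumps(
    {
        "ACCOUNT_NAME": "examplestorage",
        "ACCOUNT_KEY": "key-placeholder",
        "SAS_TOKEN": "sas-placeholder",
    }
)
resolved = resolve_storage_credentials(secret)
print(resolved)
```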

methods:

prefect.tasks.azureml.datastore.DatastoreRegisterBlobContainer.run

(container_name=None, datastore_name=None, create_container_if_not_exists=False, overwrite_existing_datastore=False, azure_credentials_secret="AZ_CREDENTIALS", set_as_default=False)[source]

Task run method.

Args:

  • container_name (str, optional): The name of the container.
  • datastore_name (str, optional): The name of the datastore. If not defined, the container name will be used.
  • create_container_if_not_exists (bool, optional): Create a container, if one does not exist with the given name.
  • overwrite_existing_datastore (bool, optional): Overwrite an existing datastore. If the datastore does not exist, it will be created.
  • azure_credentials_secret (str, optional): The name of the Prefect Secret that stores your Azure credentials; this Secret must be a JSON string with two keys: ACCOUNT_NAME and either ACCOUNT_KEY or SAS_TOKEN (if both are defined, then ACCOUNT_KEY is used).
  • set_as_default (bool, optional): Set the created Datastore as the default datastore for the Workspace.
Returns:
  • (azureml.data.azure_storage_datastore.AzureBlobDatastore): The registered Datastore.



# DatastoreList

class

prefect.tasks.azureml.datastore.DatastoreList

(workspace, **kwargs)[source]

Task for listing the Datastores in a Workspace.

Args:

  • workspace (azureml.core.workspace.Workspace): The Workspace whose Datastores are to be listed.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor

methods:

prefect.tasks.azureml.datastore.DatastoreList.run

()[source]

Task run method.

Returns:

  • Dict[str, Datastore]: a dictionary with the datastore names as keys and Datastore objects as values.



# DatastoreGet

class

prefect.tasks.azureml.datastore.DatastoreGet

(workspace, datastore_name=None, **kwargs)[source]

Task for getting a Datastore registered to a given Workspace.

Args:

  • workspace (azureml.core.workspace.Workspace): The Workspace from which the Datastore is retrieved.
  • datastore_name (str, optional): The name of the Datastore. If None, then the default Datastore of the Workspace is returned.
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor

methods:

prefect.tasks.azureml.datastore.DatastoreGet.run

(datastore_name=None)[source]

Task run method.

Args:

  • datastore_name (str, optional): The name of the Datastore. If None, then the default Datastore of the Workspace is returned.
Returns:
  • (azureml.core.datastore.Datastore): The Datastore.



# DatastoreUpload

class

prefect.tasks.azureml.datastore.DatastoreUpload

(datastore=None, relative_root=None, path=None, target_path=None, overwrite=False, **kwargs)[source]

Task for uploading local files to a Datastore.

Args:

  • datastore (azureml.data.azure_storage_datastore.AbstractAzureStorageDatastore, optional): The datastore to upload the files to.
  • relative_root (str, optional): The root used to determine the path of the files in the blob. For example, if we upload /path/to/file.txt and define the base path to be /path, then file.txt is uploaded to the blob storage with the path /to/file.txt.
  • path (Union[str, List[str]], optional): The path to a single file, a single directory, or a list of paths to files to be uploaded.
  • target_path (str, optional): The location in the blob container to upload to. If None, then upload to root.
  • overwrite (bool, optional): Overwrite existing file(s).
  • **kwargs (dict, optional): additional keyword arguments to pass to the Task constructor
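
How relative_root and target_path combine can be sketched without any Azure dependency. `blob_path` is a hypothetical helper that reproduces only the path arithmetic from the descriptions above; it returns paths relative to the container root (to/file.txt rather than /to/file.txt) and does not model what the SDK does when relative_root is omitted.

```python
import posixpath


def blob_path(local_path, relative_root, target_path=None):
    """Hypothetical helper: where a local file would land in the container."""
    # Strip the relative_root prefix from the local path.
    relative = posixpath.relpath(local_path, relative_root)
    # target_path=None means the file is placed relative to the container root.
    if target_path is None:
        return relative
    return posixpath.join(target_path, relative)


print(blob_path("/path/to/file.txt", "/path"))         # to/file.txt
print(blob_path("/path/to/file.txt", "/path", "raw"))  # raw/to/file.txt
```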

methods:

prefect.tasks.azureml.datastore.DatastoreUpload.run

(datastore=None, path=None, relative_root=None, target_path=None, overwrite=False)[source]

Task run method.

Args:

  • datastore (azureml.data.azure_storage_datastore.AbstractAzureStorageDatastore, optional): The datastore to upload the files to.
  • relative_root (str, optional): The root used to determine the path of the files in the blob. For example, if we upload /path/to/file.txt and define the base path to be /path, then file.txt is uploaded to the blob storage with the path /to/file.txt.
  • path (Union[str, List[str]], optional): The path to a single file, a single directory, or a list of paths to files to be uploaded.
  • target_path (str, optional): The location in the blob container to upload to. If None, then upload to root.
  • overwrite (bool, optional): Overwrite existing file(s).
Returns:
  • (azureml.data.data_reference.DataReference): The DataReference instance for the target path uploaded
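
The datastore and dataset tasks can be chained in one flow. The following is a minimal sketch under the same assumptions as any other use of this module: a Prefect release that includes `prefect.tasks.azureml`, the Azure ML SDK, and an existing `Workspace` passed in by the caller. The local directory, target path, flow name, and dataset name are hypothetical.

```python
def build_upload_flow(workspace):
    """Build (but do not run) a flow that uploads files and registers a FileDataset."""
    # Lazy imports: only needed when the flow is actually built.
    from prefect import Flow
    from prefect.tasks.azureml.datastore import DatastoreGet, DatastoreUpload
    from prefect.tasks.azureml.dataset import DatasetCreateFromFiles

    get_datastore = DatastoreGet(workspace=workspace)
    upload = DatastoreUpload(relative_root="/data", target_path="raw", overwrite=True)
    create_dataset = DatasetCreateFromFiles(dataset_name="raw-images")  # hypothetical name

    with Flow("upload-and-register") as flow:  # hypothetical flow name
        datastore = get_datastore()
        # Files under /data/images would land under raw/images in the container.
        uploaded = upload(datastore=datastore, path="/data/images")
        # upstream_tasks makes the dataset wait for the upload to finish.
        create_dataset(
            datastore=datastore,
            path="raw/images",
            upstream_tasks=[uploaded],
        )
    return flow
```

The explicit `upstream_tasks` dependency is there because the dataset task does not consume the DataReference returned by the upload, so without it the two tasks would have no ordering between them.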



This documentation was auto-generated from commit bd9182e
on July 31, 2024 at 18:02 UTC