# PIN 16: Results and Targets
Date: March 4, 2020
Authors: Chris White & Laura Lorenz
# Status
Accepted
# Context
Currently, Prefect tracks all data explicitly returned by tasks for downstream consumption. In addition, Prefect allows for caching the success of a task based on a certain condition (specified by a cache_validator) over a certain time frame (specified by cache_for on the task). Prefect also currently allows for the data returned by individual tasks to be stored in diverse, configurable locations via the ResultHandler interface. Despite the configurable nature of the result handler and caching APIs, there are still a few use cases that are not sufficiently covered within Prefect's API:
- "Makefile"-like semantics which allow the existence of data in a certain location to prevent the rerun of a task
- Prefect has no way of tracking data produced as side-effects of tasks
- Prefect exposes no first-class hooks for validating data (whether returned from a task or produced by a task during the run)
- Prefect output caches exist only in-memory for Core-only users, so cached computations are not recoverable between Python processes
While most of this can technically still be achieved within Prefect, the popularity of these patterns along with the lack of an obvious "correct" way to achieve them in Prefect leaves room for improving Prefect's API to accommodate such patterns in a first-class way.
# Proposal
The current proposal seeks to address all use cases listed above with a common underlying implementation. In particular, we will extend the current Result class API with the following new methods and settings:
- we will introduce new
Resulttypes which correspond to different storage backends for data (e.g.,LocalFileResult,S3Result, etc.) - new
read/writemethods which mimic the currentResultHandlerimplementations and will supercede the need for result handlers altogether - the initialization of
Resultobjects will allow for setting a configurable and templatable location (e.g., a templatable file name), along with a collection of validation functions which will optionally be called on the underlying object - a new
existsmethod for determining whether data exists in a given location - a new
validatemethod which accepts an optional list of validators; both the provided validators along with those set at initialization will then be called on the underlying data to validate it in customizable ways cache_forandcache_validatorscan be set on the Result, allowing for output cache behaviors to be exposed using this same interface
With this new collection of result "types", we can now introduce the following new features / settings on Tasks and Flows:
- Flows can be provided with a default Result type for its tasks
- individual tasks can optionally override the default Flow type (identical to the current way result handlers work)
- tasks can receive an optional set of validation functions under the
validatorskwarg which will be appended to the validators attached to the task's result type - when
validate=Trueon theResulttype, thevalidatemethod described above will be called as an additional pipeline step in theTaskRunner; if validation fails, the task run will enter a newValidationErrorfailed state - a new
target: Resultkwarg on tasks which serves two purposes: a) the output result type can be overriden through this kwarg and b) the newexistsmethod will be called as a pre-run pipeline check; if the result exists, it will bereadand optionally validated, and the task run will then enter aCachedstate with the data that was read - if configured, a Result's
cache_fororcache_validatorswill be run during the pre-run pipeline check on the read Result, and if successful the task run instead enter aCachedState with the data that was read
# Consequences
The new API proposed here will allow for makefile like workflows, first-class type validation on all data returned / emitted by tasks (including data produced as a side effect so long as users use the Result API), and hopefully make the current result_handler interface more intuitive for users. Because of the large scale of the proposed changes, we will need to consider whether we can implement them in a backwards compatible way (we probably can).
# Actions
We attempt to list a roughly independent set of action items for achieving the above proposal:
- introduce a new
ValidationErrorfailed state - merge the interfaces of
Results andResultHandlers into a singleResultinterface with multiple types - decide on how templating of file locations should work (e.g., will we call
.format(**prefect.context)?) - introduce three new keywords to
Task.__init__:result(orresult_type),validators, andtarget - utilize or merge existing output caching logic to operate against Result targets instead of the in-memory cache.
cache_forwill be usable for any Result target type that has an inspectable last modified time - develop a future PIN that allows for target file locations to be automatically configured by Prefect based on some default templating behavior