Concepts
Models
Data Cubes
Transforms
What
A transform is, very broadly speaking, a node in a computation plan that accepts as input something other than a “scalar” parameter and produces something other than a “scalar” output. In practice, this currently means taking one or more datacubes as input and producing one or more datacubes as output. See the Data Cubes section for more details on datacubes. One can think of a transform as an arbitrary computation lifted into the domain of datacubes.
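To make the idea of "lifting" concrete, here is a minimal sketch. It models a datacube as a plain dictionary from coordinate tuples to values. That representation, and the names `Datacube` and `lift`, are assumptions made for illustration only; they are not SuperMaaS types or APIs.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a datacube: coordinate tuple -> numeric value.
Datacube = Dict[Tuple[str, ...], float]


def lift(fn: Callable[[float], float]) -> Callable[[Datacube], Datacube]:
    """Turn a scalar function into a function over whole datacubes."""

    def transform(cube: Datacube) -> Datacube:
        # Apply the scalar function to every cell; coordinates are unchanged.
        return {coords: fn(value) for coords, value in cube.items()}

    return transform


# A scalar computation (doubling) lifted to operate on a datacube.
double = lift(lambda v: v * 2.0)
cube = {("2020-01-01", "Portland"): 1.5, ("2020-01-02", "Portland"): 0.0}
doubled = double(cube)
```

The lifted function returns a new dictionary rather than mutating its input, which mirrors the idea that a transform consumes one datacube and produces another.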
Why
The gap between data producers and data consumers can be vast. Some of the guarantees SuperMaaS provides, and some of the restrictions it imposes, help bridge the gap between producers and a single, well-defined, reasonably well-structured method of data representation. Transform infrastructure, and transforms themselves, help bridge the gap between that method of representation and disparate data consumption spaces.
For example, consider two models that produce rainfall data. Model A produces CSV files specifying inches of rainfall in named cities, and model B produces GeoTIFF images specifying centimeters of rainfall at latitude/longitude points. The registration process and the metadata it requires will tell SuperMaaS how to interpret these models’ outputs, store them internally, and provide API access to fetch their output data in a normalized format. Transforms interface with that API to fetch the data as SuperMaaS stores it, perform arbitrary computation over it, and return it to SuperMaaS for storage and potential further fetching, by another transform or by an end consumer. In this example, one can imagine the following workflow:
1. Model A’s output is piped to transform C, which converts the city names model A provides into latitude/longitude coordinate pairs.
2. Model B’s output is piped to transform D, which converts the numeric values B produces in centimeters to inches.
3. Transform C’s output is piped to transform E as the first of two expected inputs. E determines the difference/error between two sources of data representing the same concept. This is an intentionally vague description as, again, the transform’s computation can be arbitrary.
4. Transform D’s output is also piped to transform E, as the second of two expected inputs.
5. Transform E’s output can be shipped off elsewhere, to an end consumer or to another transform.
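The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: model outputs are represented as plain dictionaries rather than real datacubes, the city-to-coordinate table is invented, and the function names `transform_c`, `transform_d`, and `transform_e` are hypothetical. Transform E is shown here as a simple per-point subtraction, which is just one possible choice for a "difference".

```python
from typing import Dict, Tuple

Point = Tuple[float, float]  # (latitude, longitude)

CM_PER_INCH = 2.54

# Hypothetical lookup table; a real transform might call a geocoding service.
CITY_COORDS: Dict[str, Point] = {
    "Portland": (45.52, -122.68),
    "Seattle": (47.61, -122.33),
}


def transform_c(inches_by_city: Dict[str, float]) -> Dict[Point, float]:
    """Replace city names with latitude/longitude pairs."""
    return {CITY_COORDS[city]: inches for city, inches in inches_by_city.items()}


def transform_d(cm_by_point: Dict[Point, float]) -> Dict[Point, float]:
    """Convert rainfall values from centimeters to inches."""
    return {point: cm / CM_PER_INCH for point, cm in cm_by_point.items()}


def transform_e(first: Dict[Point, float], second: Dict[Point, float]) -> Dict[Point, float]:
    """Per-point difference between two sources, over the points they share."""
    shared = first.keys() & second.keys()
    return {point: first[point] - second[point] for point in shared}


# Made-up sample outputs standing in for models A and B.
model_a_output = {"Portland": 2.0, "Seattle": 1.0}  # inches, by city
model_b_output = {(45.52, -122.68): 5.08, (47.61, -122.33): 3.81}  # cm, by point

difference = transform_e(transform_c(model_a_output), transform_d(model_b_output))
```

Each function takes the normalized output of an earlier step and returns data in the same shape, which is what allows transforms to be chained.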
How
Though nothing in SuperMaaS strictly requires it, transforms tend to be written in Python (>= 3.8). One reason for this is the Galois-authored-and-maintained supermaas_utils Python library, which provides some API abstractions to ease datacube pull-modify-push workflows, the bread and butter of many transforms. For a literate sample transform, see here.
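The general shape of a pull-modify-push workflow can be sketched as follows. This sketch deliberately does not use supermaas_utils: the `InMemoryCubeStore` class, its `pull` and `push` methods, and the `run_transform` helper are all hypothetical stand-ins, written only to show the pattern. They make no claim about the actual library's API.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


class InMemoryCubeStore:
    """Hypothetical stand-in for a datacube service; NOT the supermaas_utils API."""

    def __init__(self) -> None:
        self._cubes: Dict[str, List[Row]] = {}

    def pull(self, cube_id: str) -> List[Row]:
        # Return copies so callers cannot mutate stored data by accident.
        return [dict(row) for row in self._cubes[cube_id]]

    def push(self, cube_id: str, rows: List[Row]) -> None:
        # Store copies for the same reason.
        self._cubes[cube_id] = [dict(row) for row in rows]


def run_transform(
    store: InMemoryCubeStore,
    source_id: str,
    target_id: str,
    modify: Callable[[Row], Row],
) -> None:
    """Pull a cube, apply `modify` to every row, and push the result."""
    rows = store.pull(source_id)                          # pull
    store.push(target_id, [modify(row) for row in rows])  # modify, then push


store = InMemoryCubeStore()
store.push("rain_cm", [{"lat": 45.52, "lon": -122.68, "value": 5.08}])

# Convert the stored values from centimeters to inches.
run_transform(
    store,
    "rain_cm",
    "rain_in",
    lambda row: {**row, "value": row["value"] / 2.54},
)
```

Writing the result under a new identifier leaves the source cube untouched, so other transforms or end consumers can still fetch the original data.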