8. API Reference

8.1. Datasets

cate.core.find_data_sources(data_stores: typing.Union[cate.core.ds.DataStore, typing.Sequence[cate.core.ds.DataStore]] = None, ds_id: str = None, query_expr: str = None) → typing.Sequence[cate.core.ds.DataSource][source]

Find data sources in the given data store(s) matching the given id or query_expr.

See also open_dataset().

Return type:

Sequence

Parameters:
  • data_stores – If given, these data stores will be queried. Otherwise, all registered data stores will be used.
  • ds_id (str) – A data source identifier.
  • query_expr (str) – A query expression.
Returns:

All data sources matching the given constraints.

cate.core.open_dataset(data_source: typing.Union[cate.core.ds.DataSource, str], time_range: typing.Union[typing.Tuple[str, str], typing.Tuple[datetime.datetime, datetime.datetime], typing.Tuple[datetime.date, datetime.date], str] = None, region: typing.Union[shapely.geometry.Polygon, typing.List[typing.Tuple[float, float]], str, typing.Tuple[float, float, float, float]] = None, var_names: typing.Union[typing.List[str], str] = None, force_local: bool = False, local_ds_id: str = None, monitor: cate.util.monitor.Monitor = Monitor.NONE) → typing.Any[source]

Open a dataset from a data source.

Parameters:
  • data_source – A DataSource object or a string. Strings are interpreted as the identifier of an ECV dataset and must not be empty.
  • time_range – An optional time constraint comprising start and end date. If given, it must be a TimeRangeLike.
  • region – An optional region constraint. If given, it must be a PolygonLike.
  • var_names – Optional names of variables to be included. If given, it must be a VarNamesLike.
  • force_local (bool) – Optional flag, for remote data sources only. Whether to make a local copy of the data source if it is not already present.
  • local_ds_id (str) – Optional, for remote data sources only. Local data source ID for the newly created copy of the remote data source.
  • monitor (Monitor) – A progress monitor
Returns:

A new dataset instance.
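
For example, a minimal usage sketch; the data source identifier, region, and variable name below are purely illustrative:

from cate.core import find_data_sources, open_dataset

# query all registered data stores; the identifier is illustrative
sources = find_data_sources(ds_id='esacci.SST.day.L4.SSTdepth.multi-sensor')
if sources:
    sst = open_dataset(sources[0],
                       time_range=('2007-01-01', '2007-12-31'),
                       region=(-10.0, 40.0, 10.0, 60.0),  # lon_min, lat_min, lon_max, lat_max
                       var_names='analysed_sst')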

8.2. Operations

8.2.1. Anomaly calculation

cate.ops.anomaly_internal(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Calculate anomaly using as reference data the mean of an optional region and time slice from the given dataset. If no time slice/spatial region is given, the operation will calculate anomaly using the mean of the whole dataset as the reference.

This is done for each data array in the dataset.

Parameters:
  • ds – The dataset to calculate anomalies from
  • time_range – Time range to use for reference data
  • region – Spatial region to use for reference data
  • monitor (Monitor) – a progress monitor.
Returns:

The anomaly dataset

cate.ops.anomaly_external(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Calculate anomaly with external reference data, for example, a climatology. The given reference dataset is expected to consist of 12 time slices, one for each month.

The returned dataset will contain the variable names found in both the reference and the given dataset. Names found in the given dataset, but not in the reference, will be dropped from the resulting dataset. The calculated anomaly will be against the corresponding month of the reference data, e.g. January against January.

In case spatial extents differ between the reference and the given dataset, the anomaly will be calculated on the intersection.

Parameters:
  • ds – The dataset to calculate anomalies from
  • file – Path to reference data file
  • transform – Apply the given transformation before calculating the anomaly. For supported operations see help on ‘ds_arithmetics’ operation.
  • monitor (Monitor) – a progress monitor.
Returns:

The anomaly dataset
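
A usage sketch; ds stands for a dataset opened beforehand, and the climatology file path is illustrative:

from cate.ops import anomaly_external

# ds: a dataset opened beforehand, e.g. via cate.core.open_dataset()
# the reference file is expected to contain 12 monthly time slices
anomalies = anomaly_external(ds=ds, file='sst_climatology_1981_2010.nc')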

8.2.2. Arithmetic

cate.ops.ds_arithmetics(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Do arithmetic operations on the given dataset by providing a list of arithmetic operations and the corresponding constant. The operations will be applied to the dataset in the order in which they appear in the list. For example: ‘log,+5,-2,/3,*2’

Currently supported arithmetic operations: log,log10,log2,log1p,exp,+,-,/,*

where:
  • log – natural logarithm
  • log10 – base 10 logarithm
  • log2 – base 2 logarithm
  • log1p – log(1+x)
  • exp – the exponential

The operations will be applied element-wise to all arrays of the dataset.

Parameters:
  • ds – The dataset to which to apply arithmetic operations
  • op – A comma separated list of arithmetic operations to apply
  • monitor (Monitor) – a progress monitor.
Returns:

The dataset with given arithmetic operations applied
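
A minimal, self-contained sketch of the operation-string syntax:

import numpy as np
import xarray as xr
from cate.ops import ds_arithmetics

ds = xr.Dataset({'var': ('x', np.array([1.0, 2.0, 3.0]))})
# natural logarithm, then add 5, subtract 2, divide by 3, multiply by 2
result = ds_arithmetics(ds=ds, op='log,+5,-2,/3,*2')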

8.2.3. Averaging

cate.ops.long_term_average(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Perform a long term average of the given dataset by taking the mean of monthly values over the time range covered by the dataset. E.g., it averages all January values, all February values, etc., to create a dataset with twelve time slices, each containing a mean of the respective monthly values.

For further information on climatological datasets, see http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#climatological-statistics

Parameters:
  • ds – A monthly dataset to average
  • var – If given, only these variables will be preserved in the resulting dataset
  • monitor (Monitor) – A progress monitor
Returns:

A climatological long term average dataset

cate.ops.temporal_aggregation(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Perform monthly aggregation of a daily dataset according to the given method.

Parameters:
  • ds – Dataset to aggregate
  • method – Aggregation method
Returns:

Aggregated dataset
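
For example, aggregating a daily dataset to monthly means and then deriving a climatology; the method name 'mean' and the variable ds_daily are assumptions:

from cate.ops import temporal_aggregation, long_term_average

# ds_daily: a daily dataset opened beforehand
monthly = temporal_aggregation(ds=ds_daily, method='mean')
climatology = long_term_average(ds=monthly)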

8.2.4. Coregistration

cate.ops.coregister(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Perform coregistration of two datasets by resampling the slave dataset onto the grid of the master. If upsampling has to be performed, this is achieved using interpolation; if downsampling has to be performed, the pixels of the slave dataset are aggregated to form a coarser grid.

The returned dataset will contain the lat/lon intersection of the provided master and slave datasets, resampled onto the master grid frequency.

This operation works on datasets whose spatial dimensions are defined on pixel-registered grids that are equidistant in lat/lon coordinates, i.e. data points define the middle of a pixel and pixels have the same size across the dataset.

This operation will resample all variables in a dataset, as the lat/lon grid is defined per dataset. It works only if all variables in the dataset have lat and lon as dimensions.

For an overview of downsampling/upsampling methods used in this operation, please see https://github.com/CAB-LAB/gridtools

Whether upsampling or downsampling has to be performed is determined automatically based on the relationship of the grids of the provided datasets.

Parameters:
  • ds_master – The dataset whose grid is used for resampling
  • ds_slave – The dataset that will be resampled
  • method_us – Interpolation method to use for upsampling.
  • method_ds – Interpolation method to use for downsampling.
  • monitor (Monitor) – a progress monitor.
Returns:

The slave dataset resampled on the grid of the master
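
A usage sketch, assuming ds_ref and ds_other are two lat/lon datasets as described above:

from cate.ops import coregister

# resample ds_other onto the grid of ds_ref; up-/downsampling is chosen automatically
resampled = coregister(ds_master=ds_ref, ds_slave=ds_other)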

8.2.5. Correlation

cate.ops.pearson_correlation_scalar(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Do product moment Pearson’s correlation analysis.

Performs a simple correlation analysis on two timeseries and returns a correlation coefficient and the corresponding p_value.

Positive correlation implies that as x grows, so does y. Negative correlation implies that as x increases, y decreases.

Parameters:
  • ds_x – The ‘x’ dataset
  • ds_y – The ‘y’ dataset
  • var_x – Dataset variable to use for correlation analysis in the ‘variable’ dataset
  • var_y – Dataset variable to use for correlation analysis in the ‘dependent’ dataset
  • monitor (Monitor) – a progress monitor.
Returns:

{‘corr_coef’: correlation coefficient, ‘p_value’: probability value}
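
A usage sketch; the datasets and variable names are assumptions:

from cate.ops import pearson_correlation_scalar

# ds_a, ds_b: two datasets containing comparable timeseries variables
result = pearson_correlation_scalar(ds_x=ds_a, ds_y=ds_b,
                                    var_x='analysed_sst', var_y='analysed_sst')
print(result['corr_coef'], result['p_value'])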

cate.ops.pearson_correlation(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Do product moment Pearson’s correlation analysis.

Perform Pearson correlation on two datasets and produce a lon/lat map of correlation coefficients and the corresponding p_values.

In case two 3D time/lat/lon datasets are provided, a pixel-by-pixel correlation will be performed, producing a lat/lon map of correlation coefficients and p_values of the underlying timeseries in the provided datasets.

The lat/lon definition of both datasets has to be the same. The lengths of the time dimensions must be equal, but need not have the same definition; e.g., it is possible to correlate different times of the same area.

There are ‘x’ and ‘y’ datasets. Positive correlations imply that as x grows, so does y. Negative correlations imply that as x increases, y decreases.

Parameters:
  • ds_x – The ‘x’ dataset
  • ds_y – The ‘y’ dataset
  • var_x – Dataset variable to use for correlation analysis in the ‘variable’ dataset
  • var_y – Dataset variable to use for correlation analysis in the ‘dependent’ dataset
  • monitor (Monitor) – a progress monitor.
Returns:

a dataset containing a map of correlation coefficients and p_values

8.2.6. Input / Output

cate.ops.open_dataset(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Open a dataset from a data source identified by ds_name.

Parameters:
  • ds_name – The name of the data source. This parameter has been deprecated; please use ds_id instead.
  • ds_id – The identifier for the data source.
  • time_range – Optional time range of the requested dataset
  • region – Optional spatial region of the requested dataset
  • var_names – Optional names of variables of the requested dataset
  • normalize – Whether to normalize the dataset’s geo- and time-coding upon opening. See operation normalize.
  • force_local – Whether to make a local copy of remote data source if it’s not present
  • local_ds_id – Optional local identifier for newly created local copy of remote data source. Used only if force_local=True.
  • monitor (Monitor) – A progress monitor
Returns:

A new dataset instance.

cate.ops.save_dataset(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Save a dataset to NetCDF file.

Parameters:
  • ds – The dataset
  • file – File path
  • format – NetCDF format flavour, one of ‘NETCDF4’, ‘NETCDF4_CLASSIC’, ‘NETCDF3_64BIT’, ‘NETCDF3_CLASSIC’.
  • monitor (Monitor) – a progress monitor.
cate.ops.read_object(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Read a data object from a file.

Parameters:
  • file – The file path.
  • format – Optional format name.
Returns:

The data object.

cate.ops.write_object(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Write a data object to a file.

Parameters:
  • obj – The data object.
  • file – The file path.
  • format – Optional format name.

cate.ops.read_text(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Read a string object from a text file.

Parameters:
  • file – The text file path.
  • encoding – Optional encoding, e.g. “utf-8”.
Returns:

The string object.

cate.ops.write_text(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Write an object as string to a text file.

Parameters:
  • obj – The data object.
  • file – The text file path.
  • encoding – Optional encoding, e.g. “utf-8”.
cate.ops.read_json(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Read a data object from a JSON text file.

Parameters:
  • file – The JSON file path.
  • encoding – Optional encoding, e.g. “utf-8”.
Returns:

The data object.

cate.ops.write_json(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Write a data object to a JSON text file. Note that the data object must be JSON-serializable.

Parameters:
  • obj – A JSON-serializable data object.
  • file – The JSON file path.
  • encoding – Optional encoding, e.g. “utf-8”.
  • indent – Indent to be used in the file, e.g. “  ” (two spaces).
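
A round-trip sketch for the JSON operations; the file path and contents are illustrative:

from cate.ops import write_json, read_json

write_json(obj={'name': 'demo', 'threshold': 0.5}, file='config.json', indent='  ')
config = read_json(file='config.json')
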
cate.ops.read_csv(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Read comma-separated values (CSV) from plain text file into a Pandas DataFrame.

Parameters:
  • file – The CSV file path.
  • delimiter – Delimiter to use. If delimiter is None, will try to automatically determine this.
  • delim_whitespace – Specifies whether or not whitespaces will be used as delimiter. If this option is set, nothing should be passed in for the delimiter parameter.
  • quotechar – The character used to denote the start and end of a quoted item. Quoted items can include the delimiter and it will be ignored.
  • comment – Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character.
  • index_col – The name of the column that provides unique identifiers
  • more_args – Other optional pandas.read_csv() keyword arguments
Returns:

The DataFrame object.
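
A usage sketch; the file and column names are illustrative:

from cate.ops import read_csv

# the delimiter is determined automatically when not given
df = read_csv(file='stations.csv', index_col='station_id')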

cate.ops.read_geo_data_frame(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Returns a GeoDataFrame from a file.

Parameters:file – Either the absolute or relative path of the file to be opened
Returns:A GeoDataFrame
cate.ops.read_geo_data_collection(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Returns a GeoDataFrame from a file.

Parameters:file – Either the absolute or relative path of the file to be opened
Returns:A GeoDataFrame
cate.ops.read_netcdf(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Read a dataset from a netCDF 3/4 or HDF file.

Parameters:
  • file – The netCDF file path.
  • drop_variables – List of variables to be dropped.
  • decode_cf – Whether to decode CF attributes and coordinate variables.
  • normalize – Whether to normalize the dataset’s geo- and time-coding upon opening. See operation normalize.
  • decode_times – Whether to decode time information (convert time coordinates to datetime objects).
  • engine – Optional netCDF engine name.
cate.ops.write_netcdf3(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Write a data object to a netCDF 3 file. Note that the data object must be netCDF-serializable.

Parameters:
  • obj – A netCDF-serializable data object.
  • file – The netCDF file path.
  • engine – Optional netCDF engine to be used
cate.ops.write_netcdf4(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Write a data object to a netCDF 4 file. Note that the data object must be netCDF-serializable.

Parameters:
  • obj – A netCDF-serializable data object.
  • file – The netCDF file path.
  • engine – Optional netCDF engine to be used
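
A round-trip sketch for the netCDF operations; file paths are illustrative:

from cate.ops import read_netcdf, write_netcdf4

ds = read_netcdf(file='input.nc', normalize=True)
write_netcdf4(obj=ds, file='output.nc')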

8.2.7. Data visualization

cate.ops.plot_map(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Create a geographic map plot for the variable given by dataset ds and variable name var.

Plots the given variable from the given dataset on a map with coastal lines. In case no variable name is given, the first encountered variable in the dataset is plotted. In case no time is given, the first time slice is taken. It is also possible to set the extents of the plot. If no extents are given, a global plot is created.

The plot can either be shown using pyplot functionality, or saved, if a path is given. The following file formats for saving the plot are supported: eps, jpeg, jpg, pdf, pgf, png, ps, raw, rgba, svg, svgz, tif, tiff

Parameters:
  • ds – the dataset containing the variable to plot
  • var – the variable’s name
  • indexers – Optional indexers into the data array of var. indexers is a dictionary or a comma-separated string of key-value pairs that maps the variable’s dimension names to constant labels, e.g. “layer=4”.
  • time – time slice index to plot, can be a string “YYYY-MM-DD” or an integer number
  • region – Region to plot
  • projection – name of a global projection, see http://scitools.org.uk/cartopy/docs/v0.15/crs/projections.html
  • central_lon – central longitude of the projection in degrees
  • title – an optional title
  • properties – optional plot properties for Python matplotlib, e.g. “bins=512, range=(-1.5, +1.5)” For full reference refer to https://matplotlib.org/api/lines_api.html and https://matplotlib.org/api/_as_gen/matplotlib.axes.Axes.contourf.html
  • file – path to a file in which to save the plot
Returns:

a matplotlib figure object or None if in IPython mode
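
A usage sketch; the variable name, date, and region are illustrative:

from cate.ops import plot_map

# ds: a dataset opened beforehand, e.g. via cate.core.open_dataset()
plot_map(ds=ds, var='analysed_sst', time='2007-06-15',
         region='-10, 40, 10, 60', file='sst_map.png')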

cate.ops.plot(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Create a 1D/line or 2D/image plot of a variable given by dataset ds and variable name var.

Parameters:
  • ds – Dataset or Dataframe that contains the variable named by var.
  • var – The name of the variable to plot
  • indexers – Optional indexers into the data array of var. indexers is a dictionary or a comma-separated string of key-value pairs that maps the variable’s dimension names to constant labels, e.g. “lat=12.4, time=‘2012-05-02’”.
  • title – an optional plot title
  • properties – optional plot properties for Python matplotlib, e.g. “bins=512, range=(-1.5, +1.5), label=’Sea Surface Temperature’” For full reference refer to https://matplotlib.org/api/lines_api.html and https://matplotlib.org/devdocs/api/_as_gen/matplotlib.patches.Patch.html#matplotlib.patches.Patch
  • file – path to a file in which to save the plot
Returns:

a matplotlib figure object or None if in IPython mode

cate.ops.plot_data_frame(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Plot a data frame.

This is a wrapper of pandas.DataFrame.plot() function.

For further documentation please see http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.html

Parameters:
  • df – A pandas dataframe to plot
  • plot_type – Plot type
  • file – path to a file in which to save the plot
  • kwargs – Keyword arguments to pass to the underlying pandas.DataFrame.plot function

8.2.8. Resampling

cate.ops.resample_2d(src, w, h, ds_method=54, us_method=11, fill_value=None, mode_rank=1, out=None)[source]

Resample a 2-D grid to a new resolution.

Parameters:
  • src – 2-D ndarray
  • w (int) – New grid width
  • h (int) – New grid height
  • ds_method (int) – one of the DS_ constants, optional. Grid cell aggregation method for a possible downsampling.
  • us_method (int) – one of the US_ constants, optional. Grid cell interpolation method for a possible upsampling.
  • fill_value – scalar, optional. If None, it is taken from src if it is a masked array, otherwise from out if it is a masked array, otherwise numpy’s default value is used.
  • mode_rank (int) – scalar, optional. The rank of the frequency determined by the ds_method DS_MODE. One (the default) means the most frequent value, two means the second most frequent value, and so forth.
  • out – 2-D ndarray, optional. Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output.
Returns:

A resampled version of the src array.

cate.ops.downsample_2d(src, w, h, method=54, fill_value=None, mode_rank=1, out=None)[source]

Downsample a 2-D grid to a lower resolution by aggregating original grid cells.

Parameters:
  • src – 2-D ndarray
  • w (int) – Grid width, which must be less than or equal to src.shape[-1]
  • h (int) – Grid height, which must be less than or equal to src.shape[-2]
  • method (int) – one of the DS_ constants, optional. Grid cell aggregation method.
  • fill_value – scalar, optional. If None, it is taken from src if it is a masked array, otherwise from out if it is a masked array, otherwise numpy’s default value is used.
  • mode_rank (int) – scalar, optional. The rank of the frequency determined by the method DS_MODE. One (the default) means the most frequent value, two means the second most frequent value, and so forth.
  • out – 2-D ndarray, optional. Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output.
Returns:

A downsampled version of the src array.

cate.ops.upsample_2d(src, w, h, method=11, fill_value=None, out=None)[source]

Upsample a 2-D grid to a higher resolution by interpolating original grid cells.

Parameters:
  • src – 2-D ndarray
  • w (int) – Grid width, which must be greater than or equal to src.shape[-1]
  • h (int) – Grid height, which must be greater than or equal to src.shape[-2]
  • method (int) – one of the US_ constants, optional. Grid cell interpolation method.
  • fill_value – scalar, optional. If None, it is taken from src if it is a masked array, otherwise from out if it is a masked array, otherwise numpy’s default value is used.
  • out – 2-D ndarray, optional. Alternate output array in which to place the result. The default is None; if provided, it must have the same shape as the expected output.
Returns:

An upsampled version of the src array.
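
A self-contained sketch of the resampling functions on a synthetic grid:

import numpy as np
from cate.ops import downsample_2d, upsample_2d

src = np.random.rand(180, 360)           # a 2-D source grid
coarse = downsample_2d(src, w=90, h=45)  # aggregate to a coarser grid
fine = upsample_2d(src, w=720, h=360)    # interpolate to a finer grid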

8.2.9. Subsetting

cate.ops.select_var(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Filter the dataset, by leaving only the desired variables in it. The original dataset information, including original coordinates, is preserved.

Parameters:
  • ds – The dataset or dataframe from which to perform selection.
  • var – One or more variable names to select and preserve in the dataset. All of these are valid: ‘var_name’, ‘var_name1,var_name2,var_name3’, [‘var_name1’, ‘var_name2’]. One can also use wildcards in the selection: e.g., choosing ‘var_name*’ will select all variables that start with ‘var_name’. This can be used to select variables along with their auxiliary variables, to select all uncertainty variables, and so on.
Returns:

A filtered dataset
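
A usage sketch; the variable names are illustrative:

from cate.ops import select_var

# ds: a dataset opened beforehand; keep one named variable plus all
# variables whose names start with 'uncertainty'
subset = select_var(ds=ds, var='analysed_sst, uncertainty*')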

cate.ops.subset_spatial(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Do a spatial subset of the dataset.

Parameters:
  • ds – Dataset to subset
  • region – Spatial region to subset
  • mask – Whether values falling within the bounding box of the polygon, but not within the polygon itself, should be masked with NaN.
Returns:

Subset dataset

cate.ops.subset_temporal(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Do a temporal subset of the dataset.

Parameters:
  • ds – Dataset or dataframe to subset
  • time_range – Time range to select
Returns:

Subset dataset

cate.ops.subset_temporal_index(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Do a temporal subset based on time indices.

Parameters:
  • ds – Dataset or dataframe to subset
  • time_ind_min – Minimum time index to select
  • time_ind_max – Maximum time index to select
Returns:

Subset dataset
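
A sketch combining spatial and temporal subsetting; region and dates are illustrative:

from cate.ops import subset_spatial, subset_temporal

# ds: a dataset opened beforehand
ds_region = subset_spatial(ds=ds, region='-10, 40, 10, 60', mask=True)
ds_period = subset_temporal(ds=ds_region, time_range=('2007-01-01', '2007-06-30'))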

8.2.10. Timeseries

cate.ops.tseries_point(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Extract a time-series from ds at the given lon, lat position, using the given interpolation method, for each variable named in the comma-separated list var.

The operation returns a new timeseries dataset that contains the point timeseries for all required variables, with the original variable meta-information preserved.

If a variable has more than three dimensions, the resulting timeseries variable will preserve all other dimensions except for lon/lat.

Parameters:
  • ds – The dataset from which to perform timeseries extraction.
  • point – Point to extract, e.g. (lon,lat)
  • var – Variable(s) for which to perform the timeseries selection. If none is given, all variables in the dataset will be used.
  • method – Interpolation method to use.
Returns:

A timeseries dataset

cate.ops.tseries_mean(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Extract spatial mean timeseries of the provided variables. The returned dataset contains, in addition to all the information in the given dataset, timeseries data for the provided variables, following the naming convention ‘var_name1_ts_mean’.

If a data variable with more dimensions than time/lat/lon is provided, the data will be reduced by taking the mean of all data values at a single time position, resulting in a one-dimensional timeseries data variable.

Parameters:
  • ds – The dataset from which to perform timeseries extraction.
  • var – Variables for which to perform timeseries extraction
  • calculate_std – Whether to calculate std in addition to mean
  • std_suffix – Std suffix to use for resulting datasets, if std is calculated.
  • monitor (Monitor) – a progress monitor.
Returns:

Dataset with timeseries variables
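
A usage sketch for both timeseries operations; the point coordinates and variable name are illustrative:

from cate.ops import tseries_point, tseries_mean

# ds: a dataset opened beforehand
point_ts = tseries_point(ds=ds, point=(10.5, 52.2), var='analysed_sst')
mean_ts = tseries_mean(ds=ds, var='analysed_sst', calculate_std=True)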

8.2.11. Misc

cate.ops.normalize(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Normalize the geo- and time-coding of the given dataset w.r.t. a common (CF-compatible) convention used within Cate. This will maximize the compatibility of the dataset with Cate’s operations.

That is:
  • variables named “latitude” will be renamed to “lat”;
  • variables named “longitude” or “long” will be renamed to “lon”.

Then, for equi-rectangular grids:
  • 2D “lat” and “lon” variables will be removed;
  • two new 1D coordinate variables “lat” and “lon” will be generated from the original 2D forms.

Finally, it is ensured that any “time” coordinate variable will be of type datetime.

Parameters:ds – The dataset to normalize.
Returns:The normalized dataset, or the original dataset, if it is already “normal”.
cate.ops.sel(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Return a new dataset with each array indexed by tick labels along the specified dimension(s).

This is a wrapper for the xarray.Dataset.sel() method.

For documentation refer to xarray documentation at http://xarray.pydata.org/en/stable/generated/xarray.Dataset.sel.html#xarray.Dataset.sel

Parameters:
  • ds – The dataset from which to select.
  • point – Optional geographic point given by longitude and latitude
  • time – Optional time
  • indexers – Keyword arguments with names matching dimensions and values given by scalars, slices or arrays of tick labels. For dimensions with multi-index, the indexer may also be a dict-like object with keys matching index level names.
  • method – Method to use for inexact matches: None – only exact matches; pad / ffill – propagate the last valid index value forward; backfill / bfill – propagate the next valid index value backward; nearest (default) – use the nearest valid index value.
Returns:

A new Dataset with the same contents as this dataset, except each variable and dimension is indexed by the appropriate indexers. In general, each variable’s data will be a view of the variable’s data in this dataset.
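
A usage sketch, assuming ds has lat, lon and time coordinates:

from cate.ops import sel

# select the grid cell nearest to the given position and date
cell = sel(ds=ds, point=(10.5, 52.2), time='2007-06-15', method='nearest')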

cate.ops.from_dataframe(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Convert the given dataframe to an xarray dataset.

This is a wrapper for the xarray.Dataset.from_dataframe() function.

For documentation refer to xarray documentation at http://xarray.pydata.org/en/stable/generated/xarray.Dataset.from_dataframe.html#xarray.Dataset.from_dataframe

Parameters:df – Dataframe to convert
Returns:A dataset created from the given dataframe
cate.ops.identity(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Return the given value. This operation can be useful to create constant resources to be used as input for other operations.

Parameters:value – An arbitrary (Python) value.
cate.ops.literal(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Return the given value. This operation can be useful to create constant resources to be used as input for other operations.

Parameters:value – An arbitrary (Python) literal.
cate.ops.pandas_fillna(*args, monitor: cate.util.monitor.Monitor = Monitor.NONE, **kwargs)[source]

Return a new dataframe with NaN values filled according to the given value or method.

This is a wrapper for the pandas.fillna() function. For additional keyword arguments and information, refer to the pandas documentation at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html

Parameters:
  • df – The dataframe to fill
  • value – Value to fill
  • method – Method according to which to fill NaN. ffill/pad will propagate the last valid observation to the next valid observation. backfill/bfill will propagate the next valid observation back to the last valid observation.
  • limit – Maximum number of NaN values to forward/backward fill.
Returns:

A dataframe with NaN values filled with the given value or according to the given method.
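
A usage sketch, assuming df is a pandas DataFrame containing NaN gaps:

from cate.ops import pandas_fillna

# forward-fill, filling at most 3 NaN values
filled = pandas_fillna(df=df, method='ffill', limit=3)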

8.3. Data Stores and Data Sources API

class cate.core.DataStore(ds_id: str, title: str = None, is_local: bool = False)[source]

Represents a data store of data sources.

Parameters:
  • ds_id – Unique data store identifier.
  • title – A human-readable title.
id

Return the unique identifier for this data store.

is_local

Whether this is a local data source, i.e. one not requiring any internet connection when its query() method is called or when the open_dataset() and make_local() methods are called on one of its data sources.

query(ds_id: str = None, query_expr: str = None, monitor: cate.util.monitor.Monitor = Monitor.NONE) → typing.Sequence[cate.core.ds.DataSource][source]

Retrieve data sources in this data store using the given constraints.

Return type:

Sequence

Parameters:
  • ds_id (str) – Data source identifier.
  • query_expr (str) – Query expression which may be used if ds_id is unknown.
  • monitor (Monitor) – A progress monitor.
Returns:

Sequence of data sources.

title

Return a human-readable title for this data store.

class cate.core.DataSource[source]

An abstract data source from which datasets can be retrieved.

cache_info

Return information about cached, locally available data sets. The returned dict, if any, is JSON-serializable.

data_store

The data store to which this data source belongs.

id

Data source identifier.

info_string

Return a textual representation of the meta-information about this data source. Useful for CLI / REPL applications.

make_local(local_name: str, local_id: str = None, time_range: typing.Union[typing.Tuple[str, str], typing.Tuple[datetime.datetime, datetime.datetime], typing.Tuple[datetime.date, datetime.date], str] = None, region: typing.Union[shapely.geometry.Polygon, typing.List[typing.Tuple[float, float]], str, typing.Tuple[float, float, float, float]] = None, var_names: typing.Union[typing.List[str], str] = None, monitor: cate.util.monitor.Monitor = Monitor.NONE) → typing.Union[_ForwardRef('DataSource'), NoneType][source]

Turns this (likely remote) data source into a local data source given a name and a number of optional constraints.

If this is a remote data source, data will be downloaded and turned into a local data source which will be added to the data store named “local”.

If this is already a local data source, a new local data source will be created by copying required data or data subsets.

The method returns the newly created local data source.

Parameters:
  • local_name (str) – A human readable name for the new local data source.
  • local_id (str) – A unique ID to be used for the new local data source. If not given, a new ID will be generated.
  • time_range – An optional time constraint comprising start and end date. If given, it must be a TimeRangeLike.
  • region – An optional region constraint. If given, it must be a PolygonLike.
  • var_names – Optional names of variables to be included. If given, it must be a VarNamesLike.
  • monitor (Monitor) – a progress monitor.
Returns:

the new local data source

matches(ds_id: str = None, query_expr: str = None) → bool[source]

Test whether this data source matches the given ds_id or query_expr. If neither ds_id nor query_expr is given, the method returns True.

Return type:

bool

Parameters:
  • ds_id (str) – A data source identifier.
  • query_expr (str) – A query expression. Currently, only simple search strings are supported.
Returns:

True, if this data source matches the given ds_id or query_expr.

meta_info

Return meta-information about this data source. The returned dict, if any, is JSON-serializable.

open_dataset(time_range: typing.Union[typing.Tuple[str, str], typing.Tuple[datetime.datetime, datetime.datetime], typing.Tuple[datetime.date, datetime.date], str] = None, region: typing.Union[shapely.geometry.Polygon, typing.List[typing.Tuple[float, float]], str, typing.Tuple[float, float, float, float]] = None, var_names: typing.Union[typing.List[str], str] = None, protocol: str = None) → typing.Any[source]

Open a dataset from this data source.

Parameters:
  • time_range – An optional time constraint comprising start and end date. If given, it must be a TimeRangeLike.
  • region – An optional region constraint. If given, it must be a PolygonLike.
  • var_names – Optional names of variables to be included. If given, it must be a VarNamesLike.
  • protocol (str) – Deprecated. Protocol name; if None, the default protocol will be used to access the data.
Returns:

A dataset instance or None if no data is available for the given constraints.

schema

The data Schema for any dataset provided by this data source or None if unknown. Currently unused in cate.

status

Return information about the data source’s accessibility.

temporal_coverage(monitor: cate.util.monitor.Monitor = Monitor.NONE) → typing.Union[typing.Tuple[datetime.datetime, datetime.datetime], NoneType][source]

The temporal coverage as tuple (start, end) where start and end are UTC datetime instances.

Parameters:monitor (Monitor) – a progress monitor.
Returns:A tuple of (start, end) UTC datetime instances or None if the temporal coverage is unknown.
title

Human-readable data source title. The default implementation tries to retrieve the title from meta_info['title'].

variables_info

Return meta-information about the variables contained in this data source. The returned dict, if any, is JSON-serializable.

8.4. Operation Registration API

class cate.core.Operation(wrapped_op: typing.Callable, op_meta_info=None)[source]

An Operation comprises a wrapped callable (e.g. function, constructor, lambda form) and additional meta-information about the wrapped operation itself and its inputs and outputs.

Parameters:
  • wrapped_op – some callable object that will be wrapped.
  • op_meta_info – operation meta information.
op_meta_info
Returns:Meta-information about the operation, see cate.core.op.OpMetaInfo.
wrapped_op
Returns:The actual operation object which may be any callable.
class cate.core.OpMetaInfo(qualified_name: str, has_monitor: bool = False, header: dict = None, input_names: typing.List[str] = None, inputs: typing.Dict[str, typing.Dict[str, typing.Any]] = None, outputs: typing.Dict[str, typing.Dict[str, typing.Any]] = None)[source]

Represents meta-information about an operation:

  • qualified_name: an ideally unique, qualified operation name
  • header: dictionary of arbitrary operation attributes
  • input: ordered dictionary of named inputs, each mapping to a dictionary of arbitrary input attributes
  • output: ordered dictionary of named outputs, each mapping to a dictionary of arbitrary output attributes

Warning: OpMetaInfo objects should be considered immutable. However, the dictionaries mentioned above are returned “as-is”, mostly for performance reasons. Changing entries in these dictionaries directly may cause unwanted side-effects.

Parameters:
  • qualified_name – The operation’s qualified name.
  • has_monitor – Whether the operation supports a Monitor keyword argument named monitor.
  • header – Header information dictionary.
  • input_names – Ordered list of input names.
  • inputs – Input information dictionary.
  • outputs – Output information dictionary.
MONITOR_INPUT_NAME = 'monitor'

The constant 'monitor', which is the name of an operation input that will receive a Monitor object as value.

RETURN_OUTPUT_NAME = 'return'

The constant 'return', which is the name of a single, unnamed operation output.

has_monitor
Returns:True if the operation supports a Monitor value as additional keyword argument named monitor.
has_named_outputs
Returns:True if the output value of the operation is expected to be a dictionary-like mapping of output names to output values.
header
Returns:Operation header attributes.
input_names

The input names in the order they have been declared.

Returns:List of input names.
inputs

Mapping from an input name to a dictionary of properties describing the input.

Returns:Named inputs.
outputs

Mapping from an output name to a dictionary of properties describing the output.

Returns:Named outputs.
qualified_name
Returns:Fully qualified name of the actual operation.
set_default_input_values(input_values: typing.Dict)[source]

For any input value missing in input_values, set it to the value of the corresponding “default_value” property, if one exists.

Parameters:input_values (Dict) – The dictionary of input values that will be modified.
to_json_dict(data_type_to_json=None) → typing.Dict[str, typing.Any][source]

Return a JSON-serializable dictionary representation of this object. E.g. values of the data_type property are converted from Python types to their string representation.

Returns:A JSON-serializable dictionary
validate_input_values(input_values: typing.Dict, except_types=None)[source]

Validate given input_values against the operation’s input properties.

Parameters:
  • input_values (Dict) – The dictionary of input values.
  • except_types – A set of types or None. If an input value’s type is in this set, it will not be validated against the various input properties, such as data_type, nullable, value_set, value_range.
Raises:

ValueError – If input_values are invalid w.r.t. the operation’s input properties.

validate_output_values(output_values: typing.Dict)[source]

Validate given output_values against the operation’s output properties.

Parameters:output_values (Dict) – The dictionary of output values.
Raises:ValueError – If output_values are invalid w.r.t. the operation’s output properties.
cate.core.op(tags=UNDEFINED, version=UNDEFINED, res_pattern=UNDEFINED, deprecated=UNDEFINED, registry=OP_REGISTRY, **properties)[source]

op is a decorator function that registers a Python function or class in the default operation registry or the one given by registry, if any. Any other keyword arguments in properties are added to the operation’s meta-information header. Classes annotated by this decorator must have callable instances.

When a function is registered, an introspection is performed. During this process, the initial meta-information header property description is derived from the function’s docstring.

If any output of this operation is to have its history information automatically updated, version information must be present in the operation header. Thus it’s always a good idea to add it to all operations:

@op(version='X.x')
Parameters:
  • tags – An optional list of string tags.
  • version – An optional version string.
  • res_pattern – An optional pattern that will be used to generate the names for data resources that are used to hold a reference to the objects returned by the operation and that are cached in a Cate workspace. Currently, the only pattern variable that is supported and that must be present is {index} which will be replaced by an integer number that is guaranteed to produce a unique resource name.
  • deprecated – An optional boolean or a string. If a string is used, it should explain why the operation has been deprecated and which new operation to use instead. If set to True, the operation’s doc-string should explain the deprecation.
  • registry – The operation registry.
  • properties – Other properties (keyword arguments) that will be added to the meta-information of operation.
cate.core.op_input(input_name: str, default_value=UNDEFINED, units=UNDEFINED, data_type=UNDEFINED, nullable=UNDEFINED, value_set_source=UNDEFINED, value_set=UNDEFINED, value_range=UNDEFINED, deprecated=UNDEFINED, position=UNDEFINED, context=UNDEFINED, registry=OP_REGISTRY, **properties)[source]

op_input is a decorator function that provides meta-information for an operation input identified by input_name. If the decorated function or class is not registered as an operation yet, it is added to the default operation registry or the one given by registry, if any.

When a function is registered, an introspection is performed. During this process, initial operation meta-information input properties are derived for each positional and keyword argument named input_name:

Derived properties and their sources:
  • position – The position of a positional argument, e.g. 2 for input z in def f(x, y, z, c=2).
  • default_value – The value of a keyword argument, e.g. 52.3 for input latitude from argument definition latitude:float=52.3.
  • data_type – The type annotation, e.g. float for input latitude from argument definition latitude:float.

The derived properties listed above plus any of value_set, value_range, and any key-value pairs in properties are added to the input’s meta-information. A key-value pair in properties will always overwrite the derived properties listed above.

Parameters:
  • input_name (str) – The name of an input.
  • default_value – A default value.
  • units – The geo-physical units of the input value.
  • data_type – The data type of the input values. If not given, the type of any given, non-None default_value is used.
  • nullable – If True, the value of the input may be None. If not given, it will be set to True if the default_value is None.
  • value_set_source – The name of an input, which can be used to generate a dynamic value set.
  • value_set – A sequence of the valid values. Note that all values in this sequence must be compatible with data_type.
  • value_range – A sequence specifying the possible range of valid values.
  • deprecated – An optional boolean or a string. If a string is used, it should explain why the input has been deprecated and which new input to use instead. If set to True, the input’s doc-string should explain the deprecation.
  • position – The zero-based position of an input.
  • context – If True, the value of the operation input will be a dictionary representing the current execution context. For example, when the operation is executed from a workflow, the dictionary will hold at least three entries: workflow provides the current workflow, step is the currently executed step, and value_cache which is a mapping from step identifiers to step outputs. If context is a string, the value of the operation input will be the result of evaluating the string as Python expression with the current execution context as local environment. This means, context may be an expression such as ‘workspace’, ‘workspace.base_dir’, ‘step’, ‘step.id’.
  • properties – Other properties (keyword arguments) that will be added to the meta-information of the named output.
  • registry – Optional operation registry.
cate.core.op_output(output_name: str, data_type=UNDEFINED, deprecated=UNDEFINED, registry=OP_REGISTRY, **properties)[source]

op_output is a decorator function that provides meta-information for an operation output identified by output_name. If the decorated function or class is not registered as an operation yet, it is added to the default operation registry or the one given by registry, if any.

If your function does not return multiple named outputs, use the op_return() decorator function. Note that:

@op_return(...)
def my_func(...):
    ...

is equivalent to:

@op_output('return', ...)
def my_func(...):
    ...

To automatically add information about cate, its version, this operation and its inputs, to this output, set ‘add_history’ to True:

@op_output('name', add_history=True)

Note that the operation should have version information added to it when add_history is True:

@op(version='X.x')
Parameters:
  • output_name (str) – The name of the output.
  • data_type – The data type of the output value.
  • deprecated – An optional boolean or a string. If a string is used, it should explain why the output has been deprecated and which new output to use instead. If set to True, the output’s doc-string should explain the deprecation.
  • properties – Other properties (keyword arguments) that will be added to the meta-information of the named output.
  • registry – Optional operation registry.
cate.core.op_return(data_type=UNDEFINED, registry=OP_REGISTRY, **properties)[source]

op_return is a decorator function that provides meta-information for a single, anonymous operation return value (whose output name is "return"). If the decorated function or class is not registered as an operation yet, it is added to the default operation registry or the one given by registry, if any. Any other keyword arguments in properties are added to the output’s meta-information.

When a function is registered, an introspection is performed. During this process, initial operation meta-information output properties are derived from the function’s return type annotation, that is data_type will be e.g. float if a function is annotated as def f(x, y) -> float: ....

The derived data_type property and any key-value pairs in properties are added to the output’s meta-information. A key-value pair in properties will always overwrite a derived data_type.

If your function returns multiple named outputs, use the op_output() decorator function. Note that:

@op_return(...)
def my_func(...):
    ...

is equivalent to:

@op_output('return', ...)
def my_func(...):
    ...

To automatically add information about cate, its version, this operation and its inputs, to this output, set ‘add_history’ to True:

@op_return(add_history=True)

Note that the operation should have version information added to it when add_history is True:

@op(version='X.x')
Parameters:
  • data_type – The data type of the return value.
  • properties – Other properties (keyword arguments) that will be added to the meta-information of the return value.
  • registry – The operation registry.
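
Putting the three decorators together, a minimal sketch of registering a new operation; the operation name and all values are illustrative:

from cate.core import op, op_input, op_return

@op(tags=['demo'], version='1.0')
@op_input('factor', data_type=float, value_range=[0.0, 10.0])
@op_return(data_type=float, add_history=True)
def scale(x: float, factor: float = 1.0) -> float:
    """Multiply x by the given factor."""
    return x * factor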

8.5. Workflow API

class cate.core.Workflow(op_meta_info: cate.util.opmetainf.OpMetaInfo, node_id: str = None)[source]

A workflow of (connected) steps.

Parameters:
  • op_meta_info – Meta-information object of type OpMetaInfo.
  • node_id – A node ID. If None, an ID will be generated.
find_steps_to_compute(step_id: str) → typing.List[_ForwardRef('Step')][source]

Compute the list of steps required to compute the output of the step with the given step_id. The order of the returned list is the execution order, with the step given by step_id being the last one.

Return type:List
Parameters:step_id (str) – The step to be computed last and whose output value is requested.
Returns:a list of steps, which is never empty
invoke_steps(steps: typing.List[_ForwardRef('Step')], context: typing.Dict = None, monitor_label: str = None, monitor=Monitor.NONE) → None[source]

Invoke just the given steps.

Parameters:
  • steps (List) – Selected steps of this workflow.
  • context (Dict) – An optional execution context
  • monitor_label (str) – An optional label for the progress monitor.
  • monitor – The progress monitor.
classmethod load(file_path_or_fp: typing.Union[str, io.IOBase], registry=OP_REGISTRY) → cate.core.workflow.Workflow[source]

Load a workflow from a file or file pointer. The format is expected to be “Workflow JSON”.

Parameters:
  • file_path_or_fp – file path or file pointer
  • registry – Operation registry
Returns:

a workflow
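
A usage sketch; the file path and the input name are assumptions that depend on the actual workflow:

from cate.core import Workflow

workflow = Workflow.load('workflow.json')
# my_dataset: a previously opened dataset; input names must match
# the loaded workflow's declared inputs
result = workflow.call(input_values={'ds': my_dataset})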

remove_orphaned_sources(removed_node: cate.core.workflow.Node)[source]

Remove all input/output ports whose source is still referring to removed_node.

Parameters:removed_node (Node) – A removed node.

classmethod sort_steps(steps: typing.List[_ForwardRef('Step')])[source]

Sorts the list of workflow steps in the order in which they can be executed.

sorted_steps

The workflow steps in the order in which they can be executed.

steps

The workflow steps in the order in which they were added.

store(file_path_or_fp: typing.Union[str, io.IOBase]) → None[source]

Store a workflow to a file or file pointer. The format is “Workflow JSON”.

Parameters:file_path_or_fp – file path or file pointer
to_json_dict() → dict[source]

Return a JSON-serializable dictionary representation of this object.

Returns:A JSON-serializable dictionary
update_sources() → None[source]

Resolve unresolved source references in inputs and outputs.

update_sources_node_id(changed_node: cate.core.workflow.Node, old_id: str)[source]

Update the source references of input and output ports from old_id to the ID of changed_node.

class cate.core.OpStep(operation, node_id: str = None, registry=OP_REGISTRY)[source]

An OpStep is a step node that invokes a registered operation of type Operation.

Parameters:
  • operation – A fully qualified operation name or operation object such as a class or callable.
  • registry – An operation registry to be used to lookup the operation, if given by name.
  • node_id – A node ID. If None, a unique ID will be generated.
class cate.core.NoOpStep(inputs: dict = None, outputs: dict = None, node_id: str = None)[source]

A NoOpStep “performs” a no-op, which basically means it does nothing. However, it might still be useful to define a step that duplicates or renames output values by connecting its own output ports with any of its own input ports. In other cases it might be useful to have a NoOpStep as a placeholder or blackbox for some other real operation that will be put into place at a later point in time.

Parameters:
  • inputs – input name to input properties mapping.
  • outputs – output name to output properties mapping.
  • node_id – A node ID. If None, an ID will be generated.
class cate.core.ExpressionStep(expression: str, inputs=None, outputs=None, node_id=None)[source]

An ExpressionStep is a step node that computes its output from a simple (Python) expression string.

Parameters:
  • expression – A simple (Python) expression string.
  • inputs – input name to input properties mapping.
  • outputs – output name to output properties mapping.
  • node_id – A node ID. If None, an ID will be generated.
class cate.core.SubProcessStep(command: str, run_python: bool = False, env: typing.Dict[str, str] = None, cwd: str = None, shell: bool = False, started_re: str = None, progress_re: str = None, done_re: str = None, inputs: typing.Dict[str, typing.Dict] = None, outputs: typing.Dict[str, typing.Dict] = None, node_id: str = None)[source]

A SubProcessStep is a step node that computes its output by a sub-process created from the given program.

Parameters:
  • command – A pattern that will be interpolated by input values to obtain the actual command (program with arguments) to be executed. May contain “{input_name}” fields which will be replaced by the actual input value converted to text. input_name must refer to a valid operation input name in op_meta_info.input or it must be the value of either the “write_to” or “read_from” property of another input’s property map.
  • run_python – If True, command refers to a Python script which will be executed with the Python interpreter that Cate uses.
  • cwd – Current working directory to run the command line in.
  • env – Environment variables passed to the shell that executes the command line.
  • shell – Whether to use the shell as the program to execute.
  • started_re – A regex that must match a text line from the process’ stdout in order to signal the start of progress monitoring. The regex must provide the group names “label” or “total_work” or both, e.g. “(?P<label>\w+)” or “(?P<total_work>\d+)”
  • progress_re – A regex that must match a text line from the process’ stdout in order to signal progress. The regex must provide the group names “work” or “msg” or both, e.g. “(?P<msg>\w+)” or “(?P<work>\d+)”
  • done_re – A regex that must match a text line from the process’ stdout in order to signal the end of progress monitoring.
  • inputs – input name to input properties mapping.
  • outputs – output name to output properties mapping.
  • node_id – A node ID. If None, an ID will be generated.
class cate.core.WorkflowStep(workflow: cate.core.workflow.Workflow, resource: str, node_id: str = None)[source]

A WorkflowStep is a step node that invokes an externally stored Workflow.

Parameters:
  • workflow – The referenced workflow.
  • resource – A resource (e.g. file path, URL) from which the workflow was loaded.
  • node_id – A node ID. If None, an ID will be generated.
resource

The workflow’s resource path (file path, URL).

workflow

The workflow.

class cate.core.Step(op_meta_info: cate.util.opmetainf.OpMetaInfo, node_id: str = None)[source]

A step is an inner node of a workflow.

Parameters:node_id – A node ID. If None, a name will be generated.
enhance_json_dict(node_dict: collections.OrderedDict)[source]

Enhance the given JSON-compatible node_dict by step specific elements.

classmethod new_step_from_json_dict(json_dict, registry=OP_REGISTRY) → typing.Union[_ForwardRef('Step'), NoneType][source]

Create a new step node instance from the given json_dict.

parent_node

The node’s parent node or None if this node has no parent.

persistent

Return whether this step is persistent. That is, if the current workspace is saved, the result(s) of a persistent step may be written to a “resource” file in the workspace directory, using this step’s ID as filename. The file format and filename extension will be chosen according to each result’s data type. On the next attempt to execute the step, e.g. if a workspace is opened, persistent steps may read the “resource” file to produce the result rather than performing an expensive re-computation.

Returns:True, if so, False otherwise

to_json_dict()[source]

Return a JSON-serializable dictionary representation of this object.

Returns:A JSON-serializable dictionary
class cate.core.Node(op_meta_info: cate.util.opmetainf.OpMetaInfo, node_id: str = None)[source]

Base class for all nodes including parent nodes (e.g. Workflow) and child nodes (e.g. Step).

All nodes have inputs and outputs, and can be invoked to perform some operation.

Inputs and outputs are exposed as attributes of the input and output properties and are both of type NodePort.

Parameters:node_id – A node ID. If None, a name will be generated.
call(context: typing.Dict = None, monitor=Monitor.NONE, input_values: typing.Dict = None)[source]

Calls this workflow with given input_values and returns the result.

The method does the following:
  1. Set default values where input values are missing in input_values
  2. Validate the input_values using this workflow’s meta-info
  3. Set this workflow’s input port values
  4. Invoke this workflow with the given context and monitor
  5. Get this workflow’s output port values; named outputs will be returned as a dictionary

Parameters:
  • context (Dict) – An optional execution context. It will be used to automatically set the value of any node input which has a “context” property set to either True or a context expression string.
  • monitor – An optional progress monitor.
  • input_values (Dict) – The input values.
Returns:

The output values.

collect_predecessors(predecessors: typing.List[_ForwardRef('Node')], excludes: typing.List[_ForwardRef('Node')] = None)[source]

Collect this node (self) and preceding nodes in predecessors.

find_node(node_id) → typing.Union[_ForwardRef('Node'), NoneType][source]

Find a (child) node with the given node_id.

find_port(name) → typing.Union[_ForwardRef('NodePort'), NoneType][source]

Find the port with the given name. Output ports are searched first, then input ports.

Parameters:name – The port name
Returns:The port, or None if it couldn’t be found.

id

The node’s identifier.

inputs

The node’s inputs.

invoke(context: typing.Dict = None, monitor: cate.util.monitor.Monitor = Monitor.NONE) → None[source]

Invoke this node’s underlying operation with input values from its inputs. Output values in its outputs will be set from the underlying operation’s return value(s).

Parameters:
  • context (Dict) – An optional execution context.
  • monitor (Monitor) – An optional progress monitor.
max_distance_to(other_node: cate.core.workflow.Node) → int[source]

If other_node is a source of this node, then return the number of connections from this node to other_node. If it is a direct source, return 1; if it is a source of a source of this node, return 2, etc. If other_node is this node, return 0. If other_node is not a source of this node, return -1.

Return type:int
Parameters:other_node – The other node.
Returns:The distance to other_node
op_meta_info

The node’s operation meta-information.

outputs

The node’s outputs.

parent_node

The node’s parent node or None if this node has no parent.

requires(other_node: cate.core.workflow.Node) → bool[source]

Does this node require other_node for its computation? Is other_node a source of this node?

Return type:bool
Parameters:other_node – The other node.
Returns:True if this node is a target of other_node
root_node

The root node.

set_id(node_id: str) → None[source]

Set the node’s identifier.

Parameters:node_id (str) – The new node identifier. Must be unique within a workflow.
to_json_dict()[source]

Return a JSON-serializable dictionary representation of this object.

Returns:A JSON-serializable dictionary
update_sources()[source]

Resolve unresolved source references in inputs and outputs.

update_sources_node_id(changed_node: cate.core.workflow.Node, old_id: str)[source]

Update the source references of input and output ports from old_id to the ID of changed_node.

class cate.core.NodePort(node: cate.core.workflow.Node, name: str)[source]

Represents a named input or output port of a Node.

to_json(force_dict=False)[source]

Return a JSON-serializable dictionary representation of this object.

Returns:A JSON-serializable dictionary
update_source()[source]

Resolve this node port’s source reference, if any.

If the source reference has the form node-id.port-name then node-id must be the ID of the workflow or any contained step and port-name must be a name either of one of its input or output ports.

If the source reference has the form .port-name, then it will be resolved against either the current step or any of its parent nodes that contains an input or output named port-name.

If the source reference has the form node-id then node-id must be the ID of the workflow or any contained step which has exactly one output.

If node-id refers to a workflow, then port-name is resolved first against the workflow’s inputs followed by its outputs. If node-id refers to a workflow’s step, then port-name is resolved first against the step’s outputs followed by its inputs.

Raises:ValueError – if the source reference is invalid.
update_source_node_id(node: cate.core.workflow.Node, old_node_id: str) → None[source]

A node identifier has changed, so we update the source references of input and output ports from old_node_id to node.id.

Parameters:
  • node (Node) – The node whose identifier changed.
  • old_node_id (str) – The former node identifier.

8.6. Task Monitoring API

class cate.core.Monitor[source]

A monitor is used to both observe and control a running task.

The Monitor class is an abstract base class for concrete monitors. Derived classes must implement the following three abstract methods: start(), progress(), and done(). Derived classes must also implement the following two abstract methods if they want cancellation support: cancel() and is_cancelled().

Pass Monitor.NONE to functions that expect a monitor instead of passing None.

Given here is an example of how progress monitors should be used by functions:

def long_running_task(a, b, c, monitor):
    with monitor.starting('doing a long running task', total_work=100):
        # do 30% of the work here
        monitor.progress(work=30)
        # do 70% of the work here
        monitor.progress(work=70)

If a function makes calls to other functions that also support a monitor, a child-monitor is used:

def long_running_task(a, b, c, monitor):
    with monitor.starting('doing a long running task', total_work=100):
        # let other_task do 30% of the work
        other_task(a, b, c, monitor=monitor.child(work=30))
        # let other_task do 70% of the work
        other_task(a, b, c, monitor=monitor.child(work=70))
NONE = Monitor.NONE

A valid monitor that effectively does nothing. Use Monitor.NONE instead of passing None to functions and methods that expect an argument of type Monitor.

cancel()[source]

Request the task to be cancelled. This method will be usually called from the code that created the monitor, not by users of the monitor. For example, a GUI could create the monitor due to an invocation of a long-running task, and then the user wishes to cancel that task. The default implementation does nothing. Override to implement something useful.

check_for_cancellation()[source]

Checks if the monitor has been cancelled and raises a Cancellation in that case.

child(work: float = 1)[source]

Return a child monitor for the given partial amount of work.

Parameters:work (float) – The partial amount of work.
Returns:a sub-monitor
done()[source]

Call to signal that a task has been done.

is_cancelled() → bool[source]

Check if there is an external request to cancel the current task observed by this monitor.

Users of a monitor shall frequently call this method and check its return value. If cancellation is requested, they should politely exit the current processing in a proper way, e.g. by cleaning up allocated resources. The default implementation returns False. Subclasses shall override this method to return True if a task cancellation request was detected.

Returns:True if task cancellation was requested externally. The default implementation returns False.
observing(label: str)[source]

A context manager for easier use of progress monitors. Observes a dask task and reports back to the monitor.

Parameters:label (str) – Passed to the monitor’s start method
Returns:A context manager.
progress(work: float = None, msg: str = None)[source]

Call to signal that a task has made some progress.

Parameters:
  • work (float) – The incremental amount of work.
  • msg (str) – A detail message.
start(label: str, total_work: float = None)[source]

Call to signal that a task has started.

Note that label and total_work are not passed to __init__, because they are usually not known at construction time. It is the responsibility of the task to derive appropriate values for these.

Parameters:
  • label (str) – A task label
  • total_work (float) – The total amount of work
starting(label: str, total_work: float = None)[source]

A context manager for easier use of progress monitors. Calls the monitor’s start method with label and total_work. Will then take care of calling Monitor.done().

Parameters:
  • label (str) – Passed to the monitor’s start method
  • total_work (float) – Passed to the monitor’s start method
Returns:A context manager.

class cate.core.ConsoleMonitor(stay_in_line=False, progress_bar_size=1)[source]

A simple console monitor that directly writes to sys.stdout and detects user cancellation requests via CTRL+C.

Parameters:
  • stay_in_line – If True, the text written out will stay on the same line.
  • progress_bar_size – If > 1, a progress bar of at most progress_bar_size characters will be written to the console.
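
A usage sketch of the console monitor together with the starting() context manager described above:

from cate.core import ConsoleMonitor

monitor = ConsoleMonitor(stay_in_line=True, progress_bar_size=32)
with monitor.starting('long running task', total_work=100):
    monitor.progress(work=50, msg='halfway there')
    monitor.progress(work=50, msg='done')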