Data pipeline plugins
Driven by topic hierarchies, wis2box is a plugin architecture orchestrating all the required components of a WIS2 Node. wis2box also provides a data pipeline plugin architecture that lets users define, for a given topic hierarchy, the plugins used to publish incoming data (see Data mappings for more information).
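As an illustration of how a plugin referenced by its dotted code path (as in the configurations on this page) could be resolved to a class, here is a minimal sketch. It is not wis2box's own loader: the names load_class, PLUGINS_BY_EXTENSION and plugins_for are hypothetical, the assumption that the top-level configuration keys act like file extensions is ours, and standard-library classes stand in for real plugins so the example runs without wis2box installed.

```python
import importlib


def load_class(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, class_name = dotted_path.rsplit('.', 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Hypothetical mapping: file extension -> list of plugin code paths.
# Standard-library classes stand in for real wis2box plugins here.
PLUGINS_BY_EXTENSION = {
    'csv': ['collections.OrderedDict'],
    'bin': ['collections.Counter'],
}


def plugins_for(filename: str) -> list:
    """Return the plugin classes registered for a file's extension."""
    extension = filename.rsplit('.', 1)[-1]
    paths = PLUGINS_BY_EXTENSION.get(extension, [])
    return [load_class(path) for path in paths]
```

Calling plugins_for('observations.csv') returns the classes registered under the csv key, and an unregistered extension yields an empty list.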
Default pipeline plugins
wis2box provides a number of data pipeline plugins by default, which can be used “out of the box”. The list below describes each plugin and provides an example data mappings configuration.
wis2box.data.csv2bufr.ObservationDataCSV2BUFR
This plugin converts CSV observation data into BUFR using csv2bufr. A csv2bufr template can be configured to process the data accordingly. In addition, file-pattern can be used to filter on incoming data based on a regular expression. Consult the csv2bufr documentation for more information on configuration and templating.
A typical csv2bufr plugin workflow definition would be defined as follows:
csv:
    - plugin: wis2box.data.csv2bufr.ObservationDataCSV2BUFR
      template: /data/wis2box/synop_bufr.json  # locally created csv2bufr mapping (located in $WIS2BOX_HOST_DATADIR)
      notify: true  # trigger GeoJSON publishing for API and UI
      file-pattern: '^.*\.csv$'
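To make the effect of file-pattern concrete, the following sketch applies the regular expression from the configuration above to a few made-up filenames. The filenames are hypothetical, and using re.match is an assumption about how the pattern is applied; because the pattern is anchored with ^ and $, the result would be the same with re.search.

```python
import re

# Pattern copied from the file-pattern setting above.
FILE_PATTERN = re.compile(r'^.*\.csv$')

# Hypothetical incoming filenames.
incoming = ['obs_2023.csv', 'obs_2023.csv.bak', 'notes.txt']

# Keep only the filenames that match the pattern.
accepted = [name for name in incoming if FILE_PATTERN.match(name)]
print(accepted)
```

Only obs_2023.csv is kept: the trailing $ rejects obs_2023.csv.bak even though it contains ".csv".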
wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
This plugin is typically used for wis2box API publication, and converts BUFR observation data into GeoJSON using bufr2geojson. A file-pattern can be used to filter on incoming data based on a regular expression. Consult the bufr2geojson documentation for more information on configuration and templating.
A typical bufr2geojson plugin workflow definition would be defined as follows:
bufr4:
    - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
      file-pattern: '^.*\.bufr4$'
wis2box.data.geojson.ObservationDataGeoJSON
This plugin publishes GeoJSON data to the API.
wis2box.data.synop2bufr.ObservationDataSYNOP2BUFR
This plugin converts SYNOP ASCII data into BUFR using synop2bufr. A file-pattern can be used to filter on incoming data based on a regular expression.
Note that the regular expression must contain two groups (for 4-digit year and 2-digit month), which are used as part of synop2bufr processing. Consult the synop2bufr documentation for more information.
A typical synop2bufr plugin workflow definition would be defined as follows:
txt:
    - plugin: wis2box.data.synop2bufr.ObservationDataSYNOP2BUFR
      notify: true  # trigger GeoJSON publishing for API and UI
      file-pattern: '^station_123_(\d{4})(\d{2}).*\.txt$'  # example: station_123_202305_112342.txt (where 2023 is the year and 05 is the month)
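The two capture groups can be checked with a short sketch before deploying a pattern. The helper name year_month is hypothetical, and the pattern is the one from the configuration above with the dot before txt escaped; how synop2bufr consumes the values afterwards is not shown here.

```python
import re

# Two groups: 4-digit year, then 2-digit month.
FILE_PATTERN = re.compile(r'^station_123_(\d{4})(\d{2}).*\.txt$')


def year_month(filename: str):
    """Return (year, month) taken from the filename, or None if it does not match."""
    match = FILE_PATTERN.match(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(year_month('station_123_202305_112342.txt'))
```

A filename for another station, such as station_456_202305.txt, does not match this pattern and returns None.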
wis2box.data.bufr4.ObservationDataBUFR
This plugin takes an incoming BUFR4 data file and separates it into individual BUFR bulletins if there is more than one in a file. Those bulletins are then further divided into individual subsets for publication on WIS2. As part of the process, files are quality checked for valid WIGOS Station Identifiers and location information. Where these are missing, the information is either infilled using the wis2box station list, or the subset is discarded if no match is found. Missing temporal information results in the data being discarded.
For processing efficiency, and to allow for concurrent processing, it is recommended that the input data to this plugin is already separated into one BUFR message per file and one subset per message.
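One way to pre-split a multi-message file before handing it to the plugin is sketched below, using only the Python standard library. It relies on the BUFR edition 4 layout, in which Section 0 holds the characters BUFR followed by a 3-octet total message length, and each message ends with 7777. The function name split_bufr_messages is hypothetical, and this is not how wis2box itself performs the split. It also does not separate subsets within a message, which requires decoding the message with a BUFR library.

```python
def split_bufr_messages(data: bytes) -> list:
    """Split a byte string into individual BUFR edition 4 messages."""
    messages = []
    pos = data.find(b'BUFR')
    while pos != -1 and pos + 8 <= len(data):
        # Section 0: 'BUFR' (4 octets), total length (3 octets), edition (1 octet)
        length = int.from_bytes(data[pos + 4:pos + 7], 'big')
        end = pos + length
        if length < 12 or end > len(data) or data[end - 4:end] != b'7777':
            # Not a valid message start; resume the search one byte later
            pos = data.find(b'BUFR', pos + 1)
            continue
        messages.append(data[pos:end])
        # Skip any bulletin headers or padding between messages
        pos = data.find(b'BUFR', end)
    return messages
```

Each element of the returned list can then be written to its own file, giving one BUFR message per file.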
A typical BUFR4 plugin workflow definition would be defined as follows:
bin:
    - plugin: wis2box.data.bufr4.ObservationDataBUFR
      notify: true  # trigger GeoJSON publishing for API and UI
      file-pattern: '^.*\.bin$'
wis2box.data.universal.UniversalData
This plugin can be used to publish any data, without any transformation.
The plugin takes any incoming data and copies it to the /data endpoint configured in wis2box, providing minimal information in the WIS2 Notification:
- properties.datetime in the WIS2 Notification is parsed as match.group(1) of the regular expression defined in the plugin configuration. If the group cannot be parsed by dateutil.parser, an error will be raised and the data will not be published
- geometry in the WIS2 Notification will be null
For example, to publish GRIB2 data matching the file-pattern ^.*_(\d{8})\d{2}.*\.grib2$ the following configuration could be used:
grib2:
    - plugin: wis2box.data.universal.UniversalData
      notify: true
      buckets:
        - ${WIS2BOX_STORAGE_INCOMING}
      file-pattern: '^.*_(\d{8})\d{2}.*\.grib2$'  # example: Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-036.grib2 (where 20231207, captured by the first group, will be used as the datetime)
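It can be worth checking what the first group actually captures before relying on it for the notification datetime. The sketch below applies the pattern above to the example filename. As an assumption made to keep the example within the standard library, datetime.strptime stands in for dateutil.parser, so the accepted formats are narrower than in the plugin itself.

```python
import re
from datetime import datetime

# Pattern copied from the file-pattern setting above.
FILE_PATTERN = re.compile(r'^.*_(\d{8})\d{2}.*\.grib2$')

filename = 'Z_NAFP_C_BABJ_20231207000000_P_CMA-GEPS-GLB-036.grib2'

match = FILE_PATTERN.match(filename)
# group(1) holds only the eight digits inside the parentheses
captured = match.group(1)

# Stand-in for dateutil.parser (assumption: a YYYYMMDD string)
parsed = datetime.strptime(captured, '%Y%m%d')
print(captured, parsed.isoformat())
```

With this pattern the captured value is 20231207, so the resulting datetime carries the date only. To keep hours, minutes and seconds, the capture group would need to span all fourteen digits.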
See Data mappings for a full example data mapping configuration.