wis2box 0.5.1 documentation

Author

World Meteorological Organization (WMO)

Contact

https://github.com/wmo-im/wis2box

Release

0.5.1

Date

2022-12-08

Overview

wis2box is a Python reference implementation of a WMO WIS 2.0 node. The project provides a plug and play toolset to ingest, process, and publish weather/climate/water data using standards-based approaches in alignment with the WIS 2.0 principles. In addition, wis2box also provides access to all data in the WIS 2.0 network, from other wis2box instances and global centres.

wis2box is designed to have a low barrier to entry for data providers, providing enabling infrastructure and services for data discovery, access, and visualization.

wis2box features

Features

  • WIS 2.0 compliant: easily register your wis2box to WIS 2.0 infrastructure, conformant to WMO data and metadata standards

  • event driven or interactive data ingest/process/publishing pipeline

  • visualization of stations/data on interactive maps

  • discovery metadata management and publishing

  • download/access of data from the WIS 2.0 network to your local environment

  • standards-based data services and access mechanisms:

  • robust and extensible plugin framework. Write your own data processing engines and integrate seamlessly into wis2box!

  • free and open source (FOSS)

  • containerized: use of Docker, enabling easy deployment to cloud or on-premises infrastructure

WIS 2.0

The WMO Information System is a coordinated global infrastructure responsible for telecommunications and data management functions and is owned and operated by WMO Members.

WIS provides an integrated approach suitable for all WMO Programmes to meet the requirements for routine collection and automated dissemination of observed data and products, as well as data discovery, access, and retrieval services for weather, climate, water, and related data produced by centres and Member countries in the framework of any WMO Programme. It is capable of exchanging large data volumes, such as new ground and satellite-based systems, finer resolutions in numerical weather prediction, and hydrological models and their applications. These data and products must be available to National Hydrological and Meteorological Services (NHMS), but also national disaster authorities for more timely alerts where and when needed.

WIS is a vital data communications backbone for integrating the diverse real-time and non-real-time high priority data sets, regardless of location.

Further documentation on WIS 2.0 can be found at the following links:

How wis2box works

wis2box is implemented in the spirit of the Twelve-Factor App methodology.

wis2box is a Docker and Python-based platform with the capabilities for centres to publish their data holdings to the WMO Information System with a plug and play capability supporting data publishing, discovery and access.

High level system context

The following diagram provides a high level overview of the main functions of wis2box:

how wis2box works: System context

Core wis2box functionality includes the ability to:

  • integrate your existing data processing pipeline

  • cache station metadata from the OSCAR/Surface station metadata management tool

  • process and transform your weather/climate/water data into official WMO data formats

  • create and publish discovery metadata of your datasets

  • provide your data via OGC and PubSub standards mechanisms to your data, enabling easy access for web applications, desktop GIS tools, mobile applications

  • connect your wis2box to the WIS 2.0 network

  • make your data and services available to market search engines

  • subscribe to and download weather/climate/water data from the WIS 2.0 network

Docker Compose

wis2box is built as Docker Compose application, allowing for easy install and container management.

Container workflow

Let’s dive a little deeper. The following diagram provides a view of all wis2box containers:

how wis2box works: Containers

Container functionality can be described as follows:

  • Storage: core data and metadata persistence, the initial data entry point of wis2box. Data pipelines and workflow are triggered from here

  • Internal Message Broker: internal message bus

  • Public Message Broker: public facing broker. Provides data and metadata notifications

  • Data Management: the epicentre of wis2box. Provides core wis2box administration and data/workflow/publishing utilities

  • API Application: OGC APIs providing geospatial web services

  • Web Application: user interface

Technology

wis2box is built on free and open source (FOSS) technology.

Container

Function

Technology

Standards

Storage

data and metadata storage

MinIO Elasticsearch

S3

Internal Message Broker

PubSub

mosquitto

MQTT

Public Message Broker

PubSub

mosquitto

MQTT

Data Management

data processing and publishing

ecCodes csv2bufr bufr2geojson OWSLib pygeometa pyoscar

WCMP WMDR

API Application

data discovery and access

pygeoapi

OGC API

Web Application

data discovery and visualization

Vue.js Leaflet

OGC API

Installation

wis2box leverages Docker for easy installation across operating systems and environments.

Requirements and dependencies

Dependencies are installed as containers in the deployment of wis2box. This is true for the wis2box software itself, which runs as a container orchestrating the necessary data management workflows of a node in the WIS 2.0 network.

wis2box requires the following prior to installation:

Requirement

Version

Python

3.8

Docker Engine

20.10.14

Docker Compose

2.4.1

If these are already installed, you can skip to installing wis2box.

Successful installation can be confirmed by inspecting the versions on your system.

docker version
docker compose version
python3 -V

Note

Docker may require post-install configuration. Linux users may need to follow post install steps to grant docker privileges. Users in corporate settings my need to configure Docker’s HTTP/HTTPS proxy.

Installing wis2box

Once Python and Docker are installed, the wis2box software needs to be installed.

ZIP archive

wis2box can be installed from a ZIP archive of a the latest branch or a wis2box release.

# curl, wget or download from your web browser
curl https://github.com/wmo-im/wis2box/archive/refs/heads/main.zip -L -O -J
unzip wis2box-main.zip
cd wis2box-main

GitHub

wis2box can also be installed using the git CLI.

# clone wis2box GitHub repository
git clone https://github.com/wmo-im/wis2box.git
cd wis2box

Summary

Congratulations! Whichever of the abovementioned methods you chose, you have successfully installed wis2box onto your system. From here, you can get started with test data by following the Quickstart, or continue on to Configuration.

Quickstart

Requirements and dependencies

The quickstart assumes wis2box and its dependencies have been installed. If this is not true, please follow the Installation steps first.

Successful installation can be confirmed by inspecting the versions on your system.

docker version
docker-compose version
python3 -V

The quickstart deploys wis2box with test data. It is the minimal runtime configuration profile - as used in wis2box Github CI/CD.

Note

For information on how to quickly get started with your own data out of the box, proceed to Running. For more information on deployment, see Administration and Configuration.

wis2box passes environment variables from dev.env to its container on startup. The test enviroment file is provided in tests/test.env. Copy this file to dev.env in your working directory.

cp tests/test.env dev.env

Build and update wis2box

python3 wis2box-ctl.py build
python3 wis2box-ctl.py update

Start wis2box and login to the wis2box container

python3 wis2box-ctl.py start
python3 wis2box-ctl.py login

Once logged in, verify the enviroment

wis2box environment show

Publish test discovery metadata

wis2box metadata discovery publish $WIS2BOX_DATADIR/metadata/discovery/mwi-surface-weather-observations.yml
wis2box metadata discovery publish $WIS2BOX_DATADIR/metadata/discovery/ita-surface-weather-observations.yml
wis2box metadata discovery publish $WIS2BOX_DATADIR/metadata/discovery/dza-surface-weather-observations.yml

Setup observation collections from discovery metadata

wis2box data add-collection $WIS2BOX_DATADIR/metadata/discovery/mwi-surface-weather-observations.yml
wis2box data add-collection $WIS2BOX_DATADIR/metadata/discovery/ita-surface-weather-observations.yml
wis2box data add-collection $WIS2BOX_DATADIR/metadata/discovery/dza-surface-weather-observations.yml

Ingest data, using data ingest command to push the wis2box-incoming bucket

wis2box data ingest --topic-hierarchy mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP --path $WIS2BOX_DATADIR/observations/malawi
wis2box data ingest --topic-hierarchy ita.roma_met_centre.data.core.weather.surface-based-observations.SYNOP --path $WIS2BOX_DATADIR/observations/italy
wis2box data ingest --topic-hierarchy dza.alger_met_centre.data.core.weather.surface-based-observations.SYNOP --path $WIS2BOX_DATADIR/observations/algeria

Cache and publish stations

wis2box metadata station sync $WIS2BOX_DATADIR/metadata/station/station_list.csv

Logout of wis2box container:

exit

From here, you can run python3 wis2box-ctl.py status to confirm that containers are running.

To explore your wis2box installation and services, visit http://localhost:8999 in your web browser.

Configuration

Once you have installed wis2box, it is time to setup the configuration. wis2box setup is based on a simple configuration that can be adjusted depending the user’s needs and deployment environment.

Environment variables

wis2box configuration is driven primarily by a small set of environment variables. The runtime configuration is defined in the Env format in a plain text file named dev.env and docker/default.env.

Any values set in dev.env override the default environment variables in docker/default.env. For further / specialized configuration, see the sections below.

WIS2BOX_HOST_DATADIR

The minimum required setting in dev.env is the WIS2BOX_HOST_DATADIR environment variable. Setting this value is required to map the wis2box data directory from the host system to the containers.

It is recommended to set this value to an absolute path on your system.

Sections

Note

A reference configuration can always be found in the wis2box GitHub repository. The Quickstart uses a variant of wis2box.env with mappings to the test data, as an example. For complex installations, it is recommended to start configuring wis2box by copying the example wis2box.env file and modifying accordingly.

wis2box environment variables can be categorized via the following core sections:

  • Storage: MinIO configuration

  • API: API configuration for provisioning the OGC API capabilities

  • Logging: logging configuaration for wis2box

  • PubSub: PubSub options

  • Other: other miscellaneous options

Note

Configuration directives and reference are described below via annotated examples. Changes in configuration require a restart of wis2box to take effect. See the Administration section for information on managing wis2box.

Storage

wis2box currently supports S3 compatible storage (e.g. MinIO, Amazon S3). Additional storage types are planned for future releases.

The following environment variables can be used to configure WIS2BOX_STORAGE.

Note

When using wis2box in production and using the default MinIO-container, please specify a unique WIS2BOX_STORAGE_PASSWORD

WIS2BOX_STORAGE_TYPE=S3
WIS2BOX_STORAGE_SOURCE=http://minio:9000
WIS2BOX_STORAGE_USERNAME=minio  # username for the storage-layer
WIS2BOX_STORAGE_PASSWORD=minio123  # password for the storage-layer
WIS2BOX_STORAGE_INCOMING=wis2box-incoming  # name of the storage-bucket/folder for incoming files
WIS2BOX_STORAGE_PUBLIC=wis2box-public  # name of the storage-bucket/folder for public files
WIS2BOX_STORAGE_ARCHIVE=wis2box-archive  # name of the storage-bucket/folder for archived data
WIS2BOX_STORAGE_CONFIG=wis2box-config  # name of the storage-bucket/folder for configuration files
WIS2BOX_STORAGE_DATA_RETENTION_DAYS=7  # number of days to keep files in incoming and public

MinIO

wis2box uses MinIO as the default S3 storage capability.

When overriding the default storage environment variables, please redefine the MINIO* environment variables to match your configuration.

MINIO_ROOT_USER=${WIS2BOX_STORAGE_USERNAME}
MINIO_ROOT_PASSWORD=${WIS2BOX_STORAGE_PASSWORD}
MINIO_NOTIFY_MQTT_USERNAME_WIS2BOX=${WIS2BOX_BROKER_USERNAME}
MINIO_NOTIFY_MQTT_PASSWORD_WIS2BOX=${WIS2BOX_BROKER_PASSWORD}
MINIO_NOTIFY_MQTT_BROKER_WIS2BOX=tcp://${WIS2BOX_BROKER_HOST}:${WIS2BOX_BROKER_PORT}

API

API configurations drive control of the OGC API setup.

WIS2BOX_API_TYPE=pygeoapi  # server tpye
WIS2BOX_API_URL=http://localhost:8999/pygeoapi  # public landing page endpoint
WIS2BOX_API_BACKEND_TYPE=Elasticsearch  # backend provider type
WIS2BOX_API_BACKEND_URL=http://elasticsearch:9200  # internal backend connection URL
WIS2BOX_DOCKER_API_URL  # container name of API container (for internal communications/workflow)

Logging

The logging directives control logging level/severity and output.

WIS2BOX_LOGGING_LOGLEVEL=ERROR  # the logging level (see https://docs.python.org/3/library/logging.html#logging-levels)
WIS2BOX_LOGGING_LOGFILE=stdout  # the full file path to the logfile or ``stdout`` to display on console

PubSub

PubSub configuration provides connectivity information for the PubSub broker.

WIS2BOX_BROKER_HOST=mosquitto  # the hostname of the internal broker
WIS2BOX_BROKER_PORT=1883  # the port of the internal broker
WIS2BOX_BROKER_USERNAME=wis2box  # the username of the internal broker
WIS2BOX_BROKER_PASSWORD=wis2box  # the password of the internal broker
WIS2BOX_BROKER_PUBLIC=mqtt://foo:bar@localhost:1883  # RFC 1738 URL of public broker endpoint

Web application

Web application configuration provides the ability to customize web components.

WIS2BOX_BASEMAP_URL="https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png"  # URL of map tile server to use
WIS2BOX_BASEMAP_ATTRIBUTION="<a href="https://osm.org/copyright">OpenStreetMap</a> contributors"  # attribution of map tile server

Other

Additional directives provide various configurationscontrol of configuration options for the deployment of wis2box.

WIS2BOX_OSCAR_API_TOKEN=some_token  # OSCAR/Surface API token for OSCAR API interaction
WIS2BOX_URL=http://localhost:8999/  # public wis2box url
WIS2BOX_AUTH_STORE=http://wis2box-auth # wis2box auth service location

Note

To access internal containers, URL configurations should point to the named containers as specified in docker-compose.yml.

A full configuration example can be found below:

# please define a data-directory on your host machine
# this will map to /data/wis2box on the wis2box-container
WIS2BOX_HOST_DATADIR=/path/to/local/data/directory

# Optional
# Environment variable overrides
WIS2BOX_LOGGING_LOGLEVEL=WARNING
WIS2BOX_DATA_RETENTION_DAYS=30

WIS2BOX_DATADIR_DATA_MAPPINGS=/data/wis2box/data-mappings.yml
# data paths and retention
WIS2BOX_DATADIR=/data/wis2box

# API
WIS2BOX_API_TYPE=pygeoapi
WIS2BOX_API_URL=http://localhost:8999/oapi
WIS2BOX_API_BACKEND_TYPE=Elasticsearch
WIS2BOX_API_BACKEND_URL=http://elasticsearch:9200
WIS2BOX_DOCKER_API_URL=http://wis2box-api:80/oapi

# logging
WIS2BOX_LOGGING_LOGLEVEL=ERROR
WIS2BOX_LOGGING_LOGFILE=stdout

# PubSub
WIS2BOX_BROKER_USERNAME=wis2box
WIS2BOX_BROKER_PASSWORD=wis2box
WIS2BOX_BROKER_HOST=mosquitto
WIS2BOX_BROKER_PORT=1883

WIS2BOX_BROKER_PUBLIC=mqtt://wis2box:wis2box@mosquitto:1883

# Web application
WIS2BOX_BASEMAP_URL=https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png
WIS2BOX_BASEMAP_ATTRIBUTION=<a href="https://osm.org/copyright">OpenStreetMap</a> contributors

# other
WIS2BOX_OSCAR_API_TOKEN=some_token
WIS2BOX_URL=http://localhost:8999

# mappings of topic hierarchy to wis2box data plugins
# optionally override default mappings from wis2box data plugins
# WIS2BOX_DATADIR_DATA_MAPPINGS=${PWD}/data-mappings.yml

# access control
WIS2BOX_AUTH_URL=http://wis2box-auth

# storage
WIS2BOX_STORAGE_TYPE=S3
WIS2BOX_STORAGE_SOURCE=http://minio:9000
WIS2BOX_STORAGE_USERNAME=minio
WIS2BOX_STORAGE_PASSWORD=minio123
WIS2BOX_STORAGE_INCOMING=wis2box-incoming
WIS2BOX_STORAGE_PUBLIC=wis2box-public
WIS2BOX_STORAGE_ARCHIVE=wis2box-archive
WIS2BOX_STORAGE_CONFIG=wis2box-config
WIS2BOX_STORAGE_DATA_RETENTION_DAYS=7

# you should be okay from here

# MinIO
MINIO_ROOT_USER=${WIS2BOX_STORAGE_USERNAME}
MINIO_ROOT_PASSWORD=${WIS2BOX_STORAGE_PASSWORD}
MINIO_PROMETHEUS_AUTH_TYPE=public
MINIO_NOTIFY_MQTT_ENABLE_WIS2BOX=on
MINIO_NOTIFY_MQTT_USERNAME_WIS2BOX=${WIS2BOX_BROKER_USERNAME}
MINIO_NOTIFY_MQTT_PASSWORD_WIS2BOX=${WIS2BOX_BROKER_PASSWORD}
MINIO_NOTIFY_MQTT_BROKER_WIS2BOX=tcp://${WIS2BOX_BROKER_HOST}:${WIS2BOX_BROKER_PORT}
MINIO_NOTIFY_MQTT_TOPIC_WIS2BOX=wis2box-storage/minio
MINIO_NOTIFY_MQTT_TOPIC_WIS2BOX=wis2box-storage/minio
MINIO_NOTIFY_MQTT_QOS_WIS2BOX=1

Docker Compose

The Docker Compose setup is driven from the resulting dev.env file created. For advanced cases and/or power users, updates can also be made to docker-compose.yml or docker-compose.override.yml (for changes to ports).

Summary

At this point, you have defined the runtime configuration required to administer your wis2box installation.

Administration

wis2box is designed to be built as a network of virtual machines within a virtual network. Once this is built, users login into the main wis2box machine to setup their workflow and configurations for data processing and publishing.

The wis2box-ctl.py utility provides a number of tools for managing the wis2box containers.

The following steps provide an example of container management workflow.

# build all images
python3 wis2box-ctl.py build

# start system
python3 wis2box-ctl.py start

# stop system
python3 wis2box-ctl.py stop

# view status of all deployed containers
python3 wis2box-ctl.py status

Note

Run python3 wis2box-ctl.py --help for all usage options.

With wis2box now installed and started, it’s time to start up the box and login to the wis2box container:

python3 wis2box-ctl.py start
python3 wis2box-ctl.py login

Now that you are logged into the wis2box container, it’s now time to manage station metadata, discovery metadata and data processing pipelines.

Public environment variables

The following environment variables are used for public services:

  • WIS2BOX_API_URL: API application

  • WIS2BOX_BROKER_PUBLIC: MQTT broker

  • WIS2BOX_URL: Web application, including access to data download/object storage

Default service ports

A default wis2box installation utilizes the following ports for public services:

Public services

  • 8999: Web application, API application, storage

  • 1883: Message broker via MQTT

  • 8884: Message broker via MQTT/WebSockets

Internal services

  • 1883: Message broker

  • 9200: Elasticsearch

  • 9000: MinIO

  • 9001: MinIO admin UI

Changing default ports

The docker/docker-compose.override.yml file provides definitions on utilized ports. To change default ports, edit docker/default.env before stopping and starting wis2box for changes to take effect.

MQTT Quality of Service (QoS)

The quality of service level of all wis2box powered brokers is always 1 by default.

Running

wis2box workflows can be categorized as design time (interactive) or runtime (automated).

Design time

  • environment creation

  • topic hierarchy registration

  • station metadata caching

  • station metadata API publishing

  • discovery metadata API publishing

Runtime

  • automated data processing and publishing

Running topics

Concepts

Let’s clarify a few concepts as part working with wis2box:

  • topic hierarchy: structure defined by WMO to categorize and classify data, allowing for easy and efficient search and identification

  • discovery metadata: description of a dataset to be included in the WIS 2.0 Global Discovery Catalogue

  • catalogue: a collection of discovery metadata records

  • station metadata: description of the properties of an observing station, which provides observations and measurements

  • data mappings: the wis2box mechanism to define and associate a topic hierarchy to a processing pipeline

Topic hierarchy

Note

The WIS 2.0 topic hierarchy is currently in development. wis2box implementation of the topic hierarchies will change, based on ratifications/updates of the topic hierarchies in WMO technical regulations and publications.

wis2box implements the WIS 2.0 topic hierarchies, which are designed to efficiently categorize and classify data.

Environment

wis2box initializes the environment when starting, before data processing or publishing. To view the environment, run the following command:

wis2box environment show

For the purposes of documentation, the value WIS2BOX_DATADIR represents the base directory for all data managed in wis2box.

The default enviroment variables are below.

# data paths and retention
WIS2BOX_DATADIR=/data/wis2box

# API
WIS2BOX_API_TYPE=pygeoapi
WIS2BOX_API_URL=http://localhost:8999/oapi
WIS2BOX_API_BACKEND_TYPE=Elasticsearch
WIS2BOX_API_BACKEND_URL=http://elasticsearch:9200
WIS2BOX_DOCKER_API_URL=http://wis2box-api:80/oapi

# logging
WIS2BOX_LOGGING_LOGLEVEL=ERROR
WIS2BOX_LOGGING_LOGFILE=stdout

# PubSub
WIS2BOX_BROKER_USERNAME=wis2box
WIS2BOX_BROKER_PASSWORD=wis2box
WIS2BOX_BROKER_HOST=mosquitto
WIS2BOX_BROKER_PORT=1883

WIS2BOX_BROKER_PUBLIC=mqtt://wis2box:wis2box@mosquitto:1883

# Web application
WIS2BOX_BASEMAP_URL=https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png
WIS2BOX_BASEMAP_ATTRIBUTION=<a href="https://osm.org/copyright">OpenStreetMap</a> contributors

# other
WIS2BOX_OSCAR_API_TOKEN=some_token
WIS2BOX_URL=http://localhost:8999

# mappings of topic hierarchy to wis2box data plugins
# optionally override default mappings from wis2box data plugins
# WIS2BOX_DATADIR_DATA_MAPPINGS=${PWD}/data-mappings.yml

# access control
WIS2BOX_AUTH_URL=http://wis2box-auth

# storage
WIS2BOX_STORAGE_TYPE=S3
WIS2BOX_STORAGE_SOURCE=http://minio:9000
WIS2BOX_STORAGE_USERNAME=minio
WIS2BOX_STORAGE_PASSWORD=minio123
WIS2BOX_STORAGE_INCOMING=wis2box-incoming
WIS2BOX_STORAGE_PUBLIC=wis2box-public
WIS2BOX_STORAGE_ARCHIVE=wis2box-archive
WIS2BOX_STORAGE_CONFIG=wis2box-config
WIS2BOX_STORAGE_DATA_RETENTION_DAYS=7

# you should be okay from here

# MinIO
MINIO_ROOT_USER=${WIS2BOX_STORAGE_USERNAME}
MINIO_ROOT_PASSWORD=${WIS2BOX_STORAGE_PASSWORD}
MINIO_PROMETHEUS_AUTH_TYPE=public
MINIO_NOTIFY_MQTT_ENABLE_WIS2BOX=on
MINIO_NOTIFY_MQTT_USERNAME_WIS2BOX=${WIS2BOX_BROKER_USERNAME}
MINIO_NOTIFY_MQTT_PASSWORD_WIS2BOX=${WIS2BOX_BROKER_PASSWORD}
MINIO_NOTIFY_MQTT_BROKER_WIS2BOX=tcp://${WIS2BOX_BROKER_HOST}:${WIS2BOX_BROKER_PORT}
MINIO_NOTIFY_MQTT_TOPIC_WIS2BOX=wis2box-storage/minio
MINIO_NOTIFY_MQTT_TOPIC_WIS2BOX=wis2box-storage/minio
MINIO_NOTIFY_MQTT_QOS_WIS2BOX=1

Data mappings

Once a topic hierarchy is defined, it needs to be included in the wis2box data mappings configuration. wis2box provides a default data mapping (in YAML format):

data:
    mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP:
        plugins:
            csv:
                - plugin: wis2box.data.csv2bufr.ObservationDataCSV2BUFR
                  template: synop-bufr.json
                  notify: true
                  file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.csv$'
            bufr4:
                - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
                  file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$'
    ita.roma_met_centre.data.core.weather.surface-based-observations.SYNOP:
        plugins:
            bin:
                - plugin: wis2box.data.bufr4.ObservationDataBUFR
                  notify: true
                  file-pattern: '^.*\.bin$'
            bufr4:
                - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
                  file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$'
    dza.alger_met_centre.data.core.weather.surface-based-observations.SYNOP:
        plugins:
            bufr4:
                - plugin: wis2box.data.bufr4.ObservationDataBUFR
                  notify: true
                  buckets:
                    - ${WIS2BOX_STORAGE_INCOMING}
                  file-pattern: '^.*\.bufr4$'
                - plugin: wis2box.data.bufr2geojson.ObservationDataBUFR2GeoJSON
                  buckets:
                    - ${WIS2BOX_STORAGE_PUBLIC}
                  file-pattern: '^WIGOS_(\d-\d+-\d+-\w+)_.*\.bufr4$'

The data mappings are indicated by the data keyword, with each topic having a separate entry specifying:

  • plugins: all plugin objects associated with the topic, by file type/extension

Each plugin is based on the file extension to be detected and processed, with the following configuration:

  • plugin: the codepath of the plugin

  • notify: whether the plugin should publish a data notification

  • template: additional argument allowing a mapping template name to be passed to the plugin

  • file-pattern: additional argument allowing a file pattern to be passed to the plugin

  • buckets: the name(s) of the storage bucket(s) that data should be saved to (See Configuration for more informaiton on buckets)

The default data mapping can be overriden by user-defined data mappings with the following steps:

  • create a YAML file similar to the above to include your topic hierarchy

  • set the WIS2BOX_DATADIR_DATA_MAPPINGS environment variable to point to the new file of definitions

  • restart wis2box

See Extending wis2box for more information on adding your own data processing pipeline.

Station metadata

wis2box is designed to support data ingest and processing of any kind. For observations, processing workflow typically requires station metadata to be present at runtime.

wis2box provides the ability to cache station metadata from the WMO OSCAR/Surface system.

To cache your stations of interest, create a CSV file formatting per below, specifying one line (with station name and WIGOS station identifier [WSI]) per station:

station_name,wigos_station_identifier,traditional_station_identifier
Balaka,0-454-2-AWSBALAKA,
Malomo_EPA,0-454-2-AWSMALOMO,
IN-GUEZZAM,0-20000-0-60690,60690
BENI OUNIF,0-12-0-08BECCN60577,60577

Use this CSV to cache station metadata:

wis2box metadata station cache /path/to/station_list.csv

Resulting station metadata files (JSON) are stored in WIS2BOX_DATADIR/data/metadata/station and can be used by wis2box data processing pipelines. These data are required before starting automated processing.

Note

run the command wis2box metadata station sync /path/to/station_list.csv to both cache stations form OSCAR/Surface and publish station as a collection to the wis2box API

See also

API publishing

Summary

At this point, you have cached the required station metadata for your given dataset(s).

Discovery metadata

Discovery metadata describes a given dataset or collection. Data being published through a wis2box requires discovery metadata (describing it) to be created, maintained and published to the wis2box catalogue API.

wis2box supports managing discovery metadata using the WMO Core Metadata Profile (WCMP) 2.0 standard.

Note

WCMP 2.0 is currently in development as part of WMO activities.

Creating a discovery metadata record in wis2box is as easy as completing a YAML configuration file. wis2box leverages the pygeometa project’s metadata control file (MCF) format. Below is an example MCF file.

wis2box:
    retention: P30D
    topic_hierarchy: mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP
    data_category: observationsSurfaceLand
    country: mwi
    center_id: mwi_met_centre

mcf:
    version: 1.0

metadata:
    identifier: mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP
    language: en
    language_alternate: fr
    charset: utf8
    hierarchylevel: dataset
    datestamp: 2021-11-29

spatial:
    datatype: vector
    geomtype: point

identification:
    language: en
    charset: utf8
    title:
        en: Surface weather observations from Malawi
    abstract:
        en: Surface weather observations from Malawi
    dates:
        creation: 2021-11-29
        publication: 2021-11-29
    keywords:
        default:
            keywords:
                en:
                    - surface weather
                    - temperature
                    - observations
        wmo:
            keywords:
                en:
                    - weatherObservations
            keywords_type: theme
            vocabulary:
                name:
                    en: WMO Category Code
                url: https://github.com/wmo-im/wcmp-codelists/blob/main/codelists/WMO_CategoryCode.csv
        wis2:
            keywords:
                en:
                    - mw.malawi.weatherObservations.dataset_name
            keywords_type: theme
            vocabulary:
                name:
                    en: WMO Core Metadata profile topic hierarchy
                url: https://github.com/wmo-im/wcmp2-codelists/blob/main/codelists/topic_hierarchy.csv

    topiccategory:
        - climatologyMeteorologyAtmosphere
    extents:
        spatial:
            - bbox: [32.6881653175,-16.8012997372,35.7719047381,-9.23059905359]
              crs: 4326
        temporal:
            - begin: 2021-11-29
              end: null
              resolution: P1H
    fees: None
    accessconstraints: otherRestrictions
    rights:
        en: WMO Unified Policy for the International Exchange of Earth System Data
    url: https://example.org/malawi-surface-weather-observations
    status: onGoing
    maintenancefrequency: continual

contact:
    pointOfContact: &contact_poc
        organization: Department of Climate Change and Meteorologial Services (DCCMS)
        url: https://www.metmalawi.gov.mw
        individualname: Firstname Lastname
        positionname: Position Name
        phone: +265-1-822-014
        fax: +265-1-822-215
        address: P.O. Box 1808
        city: Blantyre
        administrativearea: Blantyre District
        postalcode: M3H 5T4
        country: Malawi
        email: you@example.org
        hoursofservice: 0700h - 1500h UTC
        contactinstructions: email

    distributor: *contact_poc

dataquality:
    scope:
        level: dataset
    lineage:
        statement: this data was generated by the csv2bufr tool

Note

There are no conventions to the MCF filename. The filename does not get used/exposed or published. It is up to the user to determine the best filename, keeping in mind your wis2box system may manage and publish numerous datasets (and MCF files) over time.

Summary

At this point, you have created discovery metadata for your given dataset(s).

Data ingest, processing and publishing

At this point, the system is ready for ingest/processing and publishing.

Data ingest, processing and publishing can be run in automated fashion or via the wis2box CLI. Data is ingested, processed, and published as WMO BUFR data, as well as GeoJSON features.

Warning

GeoJSON data representations provided in wis2box are in development and are subject to change based on evolving requirements for observation data representations in WIS 2.0 technical regulations.

Interactive ingest, processing and publishing

The wis2box CLI provides a data subsystem to process data interactively. CLI data ingest/processing/publishing can be run with explicit or implicit topic hierarchy routing (which needs to be tied to the pipeline via the Data mappings).

Explicit topic hierarchy workflow
# process a single CSV file
wis2box data ingest --topic-hierarchy foo.bar.baz -p /path/to/file.csv

# process a directory of CSV files
wis2box data ingest --topic-hierarchy foo.bar.baz -p /path/to/dir

# process a directory of CSV files recursively
wis2box data ingest --topic-hierarchy foo.bar.baz -p /path/to/dir -r
Implicit topic hierarchy workflow
# process incoming data; topic hierarchy is inferred from fuzzy filepath equivalent
# wis2box will detect 'foo/bar/baz' as topic hierarchy 'foo.bar.baz'
wis2box data ingest -p /path/to/foo/bar/baz/data/file.csv
Event driven ingest, processing and publishing

Once all metadata and topic hierarchies are setup, event driven workflow will immediately start to listen on files in the wis2box-incoming storage bucket as they are placed in the appropriate topic hierarchy directory.

Summary

Congratulations! At this point, you have successfully setup a wis2box data pipeline. Data should be flowing through the system.

Data pipeline plugins

Driven by topic hierarchies, wis2box is a plugin architecture orchestrating all the required components of a WIS 2.0 node. wis2box also provides a data pipeline plugin architecture which allows for users to define a plugin based on a topic hierarchy to publish incoming data (see Data mappings for more information).

See also

Data mappings

Default pipeline plugins

wis2box provides a number of data pipeline plugins which users can be used “out of the box”.

wis2box.data.csv2bufr.ObservationDataCSV2BUFR

This plugin converts CSV observation data into BUFR using csv2bufr. A csv2bufr template can be configured to process the data accordingly. In addition, file-pattern can be used to filter on incoming data based on a regular expression. Consult the csv2bufr documentation for more information on configuration and templating.

wis2box.data.bufr4.ObservationDataBUFR2GeoJSON

This plugin converts BUFR observation data into GeoJSON using bufr2geojson. A file-pattern can be used to filter on incoming data based on a regular expression. Consult the bufr2geojson documentation for more information on configuration and templating.

wis2box.data.geojson.ObservationDataGeoJSON

This plugin is for the purposes of publishing GeoJSON data to the API.

API publishing

When wis2box starts, the API provisioning environment is initialized. At this stage, the following steps are required:

  • station metadata has been configured

  • discovery metadata has been created

  • data pipelines are configured and running

Let’s dive into publishing the data and metadata:

wis2box provides an API supporting the OGC API standards using pygeoapi.

Station metadata API publishing

The first step is to publish our station metadata to the API. The command below will generate local station collection GeoJSON for pygeoapi publication.

wis2box metadata station publish-collection

Note

run the command wis2box metadata station sync /path/to/station_list.csv to both cache stations from OSCAR/Surface and publish station as a collection to the wis2box API

See also

Station metadata

Discovery metadata API publishing

This step will publish dataset discovery metadata to the API.

wis2box metadata discovery publish /path/to/discovery-metadata.yml
Dataset collection API publishing

The below command will add the dataset collection to pygeoapi from the discovery metadata MCF created as described in the Discovery metadata section.

wis2box data add-collection $WIS2BOX_DATADIR/data/config/foo/bar/baz/discovery-metadata.yml

To delete the colection from the API backend and configuration:

wis2box api delete-collection foo.bar.baz

Note

Changes to the API configuration are reflected and updated automatically.

Summary

At this point, you have successfully published the required data and metadata collections to the API.

Data retention

wis2box is configured to set data retention according to your requirements. Data retention is managed via the WIS2BOX_STORAGE_DATA_RETENTION_DAYS environment variable as part of configuring wis2box.

Cleaning

Cleaning applies to storage defined by WIS2BOX_STORAGE_PUBLIC and involves the deletion of files after set amount of time.

Cleaning is performed by default daily at 0Z by the system, and can also be run interactively with:

# delete data older than WIS2BOX_STORAGE_DATA_RETENTION_DAYS by default
wis2box data clean


# delete data older than --days (force override)
wis2box data clean --days=30
Archiving

Archiving applies to storage defined by WIS2BOX_STORAGE_INCOMING and involves moving files to the storage defined by WIS2BOX_STORAGE_ARCHIVE.

Archive is performed on incoming data by default daily at 1Z by the system, and can also be run interactively with:

wis2box data archive

Only files with a timestamp older than one hour are considered for archiving.

Too Long Didn’t Read

For the truly impatient, this section summarizes the steps to required to run wis2box from example configurations.

Environment

wis2box passes environment variables from dev.env to its containers on startup. An example file is provided in examples/config/wis2box.env. Copy this file to your working directory, and update it to suit your needs.

cp examples/config/wis2box.env dev.env

Note

You must map WIS2BOX_HOST_DATADIR to the absolute path of directory on your host machine.

For terminals that support environment variables, it may be useful to also define environment variables locally. For terminals that do not support environment variables, you will need to define the asolute path to your data directory.

# source
. dev.env
# verify
echo $WIS2BOX_HOST_DATADIR
Data mappings

To start processing your own data, you need to define a data mappings file. Example mapping files are provided in examples/config:

  • examples/config/synop-bufr-mappings.yml, input is .bufr containing SYNOP observation-data

  • examples/config/synop-csv-mappings.yml, input is .csv containing SYNOP observation-data

To publish .bufr files of SYNOP data:

cp examples/config/synop-bufr-mappings.yml $WIS2BOX_HOST_DATADIR/data-mappings.yml

Edit $WIS2BOX_HOST_DATADIR/data-mappings.yml and change [iso3c_country].[center_id].data.core.weather.surface-based-observations.SYNOP:

  • replace iso3c_country with your corresponding ISO 3166 alpha-3 code.

  • replace center_id with the string identifying the center running the wis2node.

Station metadata

wis2box needs to have a station_list.csv that contains the stations that will be sending data. An example is provided in examples/config/station_list.csv. Update the file with your stations.

cp examples/config/station_list.csv $WIS2BOX_HOST_DATADIR/station_list.csv
Discovery metadata

Discovery metadata is central to wis2box, wis2box-api, and wis2box-ui. An example is provided in examples/config/surface-weather-observations.yml.

cp examples/config/surface-weather-observations.yml $WIS2BOX_HOST_DATADIR/surface-weather-observations.yml

Edit the file $WIS2BOX_HOST_DATADIR/surface-weather-observations.yml to provide the correct metadata for your dataset:

  • replace [iso3c_country].[center_id].data.core.weather.surface-based-observations.SYNOP with the topic you used in $WIS2BOX_HOST_DATADIR/data-mappings.yml previously

  • text provided in title and abstract will be displayed in wis2box-ui

  • provide a valid bounding-box in bbox

Build wis2box

Run the build and update commands to set up wis2box. This will start the process of building the wis2box containers from source.

python3 wis2box-ctl.py build
python3 wis2box-ctl.py update

This might take a while the first time.

Start wis2box

Start wis2box containers and check that all services are running (and healthy).

python3 wis2box-ctl.py start
python3 wis2box-ctl.py status

If neccessary repeat the command until all services are up and running.

Runtime configuration

The last design-time steps required to run wis2box are once wis2box is running.

Login to the wis2box container

python3 wis2box-ctl.py login

Note

$WIS2BOX_DATADIR is the location that $WIS2BOX_HOST_DATADIR binds to the container. This allows wis2box command to access the configuration files from inside the wis2box container.

Setup observation data processing and API publication:

wis2box data add-collection $WIS2BOX_DATADIR/surface-weather-observations.yml

Cache and publish station collection and discovery metadata to the API:

wis2box metadata discovery publish $WIS2BOX_DATADIR/surface-weather-observations.yml
wis2box metadata station sync $WIS2BOX_DATADIR/station_list.csv

Logout of wis2box container:

exit
Data ingest

The runtime component of wis2box is data ingestion. This is an event driven workflow driven by s3 notifications from uploading data to wis2box-storage. An example is provided in examples/scripts/copy_to_incoming.py. To access the storage component, visit http://localhost:9001 in your web browser. The default username/password is minio/minio123

Debugging

Something’s now working? wis2box includes a local grafana-instance to help you collect and view logs and figure out what’s wrong. Visit http://localhost:3000 in your local web browser to view the local grafana instance.

wis2box-ui

wis2box includes a UI to view the data that has been ingested. To explore, visit http://localhost:8999 in your web browser.

Not seeing data? After data has been ingested for a station for the first time, you need to re-publish the stations. This will republish the station with a link relation to any associated observation collection.

python3 wis2box-ctl.py execute wis2box metadata station publish-collection

Storage

Overview

The default wis2box storage capability is powered by MinIO, which provides S3 compatible object storage.

The default wis2box MinIO administration user interface can be accessed locally at http://localhost:9001.

The username/password for MinIO is configured through environment variables (see Configuration).

MinIO login screen

Once logged in, buckets can be managed via the default “Buckets” menu item (click “Manage”). Click “Browse” provides a browsing capability for a storage administrator.

MinIO default administration UI

Uploading data to MinIO

Files can uploaded to the MinIO bucket in a number of ways. Any new file received on MinIO will trigger an MQTT notification which is received by wis2box.

Below are basic examples on sending data to the MinIO wis2box-incoming bucket. For more information and additional examples, consult the official MinIO documentation.

Using the boto3 Python Client

Install the Python boto3 package via pip:

pip3 install boto3

The below example copies a local file (myfile.csv) to the wis2box-incoming bucket with topic foo.bar.baz:

import boto3

endpoint_url = '<your-wis2box-url>'
filename = 'myfile.csv'

session = boto3.Session(
    aws_access_key_id='wis2box',
    aws_secret_access_key='Wh00data!'
)

s3client = session.client('s3', endpoint_url=endpoint_url)

with open(filename, 'rb') as fh:
    s3client.upload_fileobj(fh, 'wis2box-incoming', f'foo/bar/baz/{filename}')

To allow uploading files into MinIO remotely, the wis2box-incoming bucket is proxied via Nginx.

For example, to upload the local file (WIGOS_0-454-2-AWSNAMITAMBO_2021-11-18T0955.csv with topic) with topic mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP via the Nbinx proxy:

import boto3

endpoint_url = 'http://localhost:9000'
filename = 'myfile.csv'

session = boto3.Session(
    aws_access_key_id='wis2box',
    aws_secret_access_key='Wh00data!'
)

s3client = session.client('s3', endpoint_url=endpoint_url)

filename = 'WIGOS_0-454-2-AWSNAMITAMBO_2021-11-18T0955.csv'

with open(filename, 'rb') as f:
    s3client.upload_fileobj(f, 'wis2box-incoming', f'data/core/observations-surface-land/mw/FWCL/landFixed/{filename}')

Using the MinIO Python Client

MinIO provides a Python client which can be used as follows:

Install the Python minio module via pip:

pip3 install minio

The below example copies a local file (myfile.csv) to the wis2box-incoming bucket to topic foo.bar.baz:

from minio import Minio

client = Minio(
    'localhost:9000',
    access_key='minio',
    secret_key='minio123',
    secure=False
)

client.fput_object('wis2box-incoming', 'myfile.csv', '/foo/bar/baz/myfile.csv')

Using S3cmd

Given MinIO is S3 compatible, data can be uploaded using generic S3 tooling. The below example uses S3cmd to upload data to wis2box MinIO storage:

Edit the following fields in ~/.s3cfg:

cat << EOF > ~/.s3cfg
# Setup endpoint
host_base = localhost:9000
use_https = False

# Setup access keys
access_key = minio
secret_key = minio123
EOF

Below is a simple command line example to copy a local file called myfile.csv into the wis2box-incoming bucket, to topic foo/bar/baz:

s3cmd myfile.csv s3://wis2box-incoming/foo/bar/baz

Using the MinIO UI

Files can also be uploaded interactively via the MinIO adminstration interface. The example below demonstrates this capability when browsing the wis2box-incoming bucket:

Uploading files using the MinIO adminstration interface

Monitoring

wis2box has built-in monitoring functions based on Prometheus, Loki and Grafana.

The Grafana endpoint can be visualized at http://localhost/monitoring.

Grafana uses two data sources to display monitoring data:

  • Prometheus: actively ‘scrapes’ data from the configured prometheus-client exporters every X seconds

  • Loki: logging endpoint for the Docker containers that compose the wis2box

Prometheus exporters for wis2box

The exporters for wis2box are based on the Prometheus Python Client

  • mqtt_metric_collector: collects data on messages published, using an mqtt-session subscribed to the wis2box-broker

wis2box also analyzes prometheus metrics from MinIO.

Note

For more information see the list of supported MinIO metrics

Loki logging

The logs of the following Docker containers are sent to Loki:

  • mqp-publisher

  • wis2box

  • wis2box-ui

  • mosquitto

  • wis2box-api

  • wis2box-auth

Monitoring topics

Grafana dashboards

wis2box provides a Grafana dashboard in order to visualize and analyze various metrics.

Go to http://localhost:3000 to see the home dashboard of wis2box once the stack is running.

Grafana Home dashboard

Note

The dashboard configuration can be found in docker/grafana/dashboards/home.json.

Exploring logs

You can explore logs by selecting explore from the side-bar in Grafana.

Select wis2box-loki as a data source to browse the logs produced by the Docker containers that compose wis2box:

Grafana Loki Explore

Services

wis2box provides a number of data access services and mechanisms in providing data to users, applications and beyond.

Discovery Catalogue

The discovery catalogue is powered by OGC API - Records and is located at http://localhost:8999/oapi/collections/discovery-metadata

The OGC API endpoint is located by default at http://localhost:8999/oapi. The discovery catalogue endpoint is located at http://localhost:8999/oapi/collections/discovery-metadata

Below are some examples of working with the discovery catalogue.

Note

  • adding f=json to URLs will provide the equivalent JSON/GeoJSON representations

  • query predicates (datetime, bbox, q, etc.) can be combined

See also

Data access

Data API

wis2box data is made available via OGC API - Features and is located at http://localhost:8999/oapi standards.

The OGC API endpoint is located by default at http://localhost:8999/oapi

Below are some examples of working with the discovery catalogue.

Note

  • the examples below use the mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP collection as described in the Quickstart. For other dataset collections, use the same query patterns below, substituting the collection id accordingly

Note

  • adding f=json to URLs will provide the equivalent JSON/GeoJSON representations

  • query predicates (datetime, bbox, q, etc.) can be combined

See also

Data access

Management API

The Data API also provides a management API to manage resources in alignment with OGC API - Features - Part 4: Create, Replace, Update and Delete.

SpatioTemporal Asset Catalog (STAC)

The wis2box SpatioTemporal Asset Catalog (STAC) endpoint can be found at:

http://localhost:8999/stac

…providing the user with a crawlable catalogue of all data on a wis2box.

Web Accessible Folder (WAF)

The wis2box Web Accessible Folder publich bucket endpoint can be found at:

http://localhost:8999/data/

…providing the user with a crawlable online folder of all data on a wis2box.

Broker

The wis2box broker is powered by MQTT and can be found at:

mqtt://localhost:1883

…providing a PubSub capability for event driven subscription and access.

Adding services

wis2box’s architecture allows for additional services as required by adding Docker containers. Examples of additional services include adding a container for a samba share or FTP server. Key considerations for adding services:

  • Storage buckets can be found at http://minio:9000

  • Elasticsearch indexes can be found at the container/URL http://elasticsearch:9200

Examples of additional services can be found in docker/extras.

Authentication and Access control

wis2box provides built in access control for the WAF and API on a topic hierarchy basis. Configuration is done using the wis2box command line utility. Authentication tokens are only required for topics that have access control configured.

Adding Access Control

All topic hierarchies in wis2box are open by default. A topic becomes closed, with access control applied, the first time a token is generated for a topic hierarchy.

Note

Make sure you are logged into the wis2box container when using the wis2box CLI

wis2box auth add-token --topic-hierarchy mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP mytoken

If no token is provided, a random string will be generated. Be sure to the record token now, there is no way to retrieve it once it is lost.

Authenticating

Token credentials can be validated using the wis2box command line utility.

wis2box auth show
wis2box auth has-access --topic-hierarchy mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP mytoken
wis2box auth has-access --topic-hierarchy mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP notmytoken

Once a token has been generated, access to any data of that topic in the WAF or API requires token authentication. Tokens are passed as a bearer token in the Authentication header or as an argument appended to the URI. Headers can be easily added to requests using cURL.

curl -H "Authorization: Bearer mytoken" "http://localhost:8999/oapi/collections/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP"
curl -H "Authorization: Bearer notmytoken" "http://localhost:8999/oapi/collections/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP"

Removing Access Control

A topic becomes open and no longer requires authentication when all tokens have been deleted. This can be done by deleting individual tokens, or all tokens for a given topic hierarchy.

wis2box auth remove-tokens --topic-hierarchy mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP
wis2box auth show

Extending Access Control

wis2box provides access control out of the box with subrequests to wis2box-auth. wis2box-auth could be replaced in nginx for another auth server like Gluu or a Web SSO like LemonLDAP or Keycloak. These services are not yet configurable via the wis2box command line utility.

wis2box is intentionally plug and playable. Beyond custom authentication servers, extending wis2box provides an overview of more modifications that can be made to wis2box.

Data access

Overview

This section provides examples of interacting with wis2box data services as described in Services using a number of common tools and software packages.

API

Using Python, requests and Pandas

Python is a popular programming language which is heavily used in the data science domains. Python provides high level functionality supporting rapid application development with a large ecosystem of packages to work with weather/climate/water data.

Let’s use the Python requests package to further interact with the wis2box API, and Pandas to run some simple summary statistics.

[1]:
import json

import requests

def pretty_print(input):
    print(json.dumps(input, indent=2))


# define the endpoint of the OGC API
api = 'http://localhost:8999/oapi'
Stations

Let’s find all the stations in our wis2box:

[2]:
url = f'{api}/collections/stations/items?limit=50'

response = requests.get(url).json()

print(f"Number of stations: {response['numberMatched']}")

print('Stations:\n')
for station in response['features']:
    print(station['properties']['name'])
Number of stations: 26
Stations:

NAMBUMA
BALAKA
BILIRA
CHIDOOLE
CHIKANGAWA
CHIKWEO
CHINGALE
KALAMBO
KASIYA AWS
KASUNGU NATIONAL PARK AWS
KAWALAZI
KAYEREKERA
LENGWE NATIONAL PARK
LOBI AWS
MAKANJIRA
MALOMO
MISUKU
MLARE
MLOMBA
MTOSA BENGA
NAMITAMBO
NANKUMBA
NKHOMA UNIVERSITY
NKHULAMBE
NYACHILENDA
TOLEZA
Discovery Metadata

Now, let’s find all the dataset that are provided by the above stations. Each dataset is identified by a WIS 2.0 discovery metadata record.

[3]:
url = f'{api}/collections/discovery-metadata/items'

response = requests.get(url).json()

print('Datasets:\n')
for dataset in response['features']:
    print(f"id: {dataset['properties']['id']}, title: {dataset['properties']['title']}")
Datasets:

id: data.core.test-passthrough, title: Surface weather observations (passthrough)
id: mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP, title: Surface weather observations (hourly)

Let’s find all the data access links associated with the Surface weather observations (hourly) dataset:

[4]:
dataset_id = 'mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP'

url = f"{api}/collections/discovery-metadata/items/{dataset_id}"

response = requests.get(url).json()

print('Data access links:\n')
for link in response['links']:
    print(f"{link} {link['href']} ({link['type']}) {link['rel']}")
    link['rel']

[link['href'] for link in response['links']]
Data access links:

{'rel': 'self', 'type': 'application/geo+json', 'title': 'This document as GeoJSON', 'href': 'http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=json'} http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=json (application/geo+json) self
{'rel': 'alternate', 'type': 'application/ld+json', 'title': 'This document as RDF (JSON-LD)', 'href': 'http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=jsonld'} http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=jsonld (application/ld+json) alternate
{'rel': 'alternate', 'type': 'text/html', 'title': 'This document as HTML', 'href': 'http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=html'} http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=html (text/html) alternate
{'rel': 'collection', 'type': 'application/json', 'title': 'Discovery metadata', 'href': 'http://localhost:8999/oapi/collections/discovery-metadata'} http://localhost:8999/oapi/collections/discovery-metadata (application/json) collection
[4]:
['http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=json',
 'http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=jsonld',
 'http://localhost:8999/oapi/collections/discovery-metadata/items/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP?f=html',
 'http://localhost:8999/oapi/collections/discovery-metadata']

Let’s use the OGC API - Features (OAFeat) link to drill into the observations for Chidoole station

[5]:
dataset_api_link = 'http://localhost:8999/oapi/collections/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP'

dataset_api_link
[5]:
'http://localhost:8999/oapi/collections/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP'
Observations

Let’s inspect some of the data in the API’s raw GeoJSON format:

[6]:
url = f'{dataset_api_link}/items'

query_parameters = {
    'wigos_station_identifier': '0-454-2-AWSCHIDOOLE',
    'limit': 10000,
    'name': 'air_temperature'
}

response = requests.get(url, params=query_parameters).json()

pretty_print(response['features'][0])
{
  "id": "WIGOS_0-454-2-AWSCHINGALE_20220112T135500-25",
  "reportId": "WIGOS_0-454-2-AWSCHINGALE_20220112T135500",
  "type": "Feature",
  "geometry": {
    "type": "Point",
    "coordinates": [
      35.11,
      -15.24,
      623.0
    ]
  },
  "properties": {
    "wigos_station_identifier": "0-454-2-AWSCHINGALE",
    "phenomenonTime": "2022-01-12T13:55:00Z",
    "resultTime": "2022-01-12T13:55:00Z",
    "name": "air_temperature",
    "value": 24.85,
    "units": "Celsius",
    "description": null,
    "metadata": [
      {
        "name": "station_or_site_name",
        "value": null,
        "units": "CCITT IA5",
        "description": "Chingale"
      },
      {
        "name": "station_type",
        "value": 0,
        "units": "CODE TABLE",
        "description": "Automatic"
      },
      {
        "name": "height_of_barometer_above_mean_sea_level",
        "value": 624.0,
        "units": "m",
        "description": null
      },
      {
        "name": "height_of_sensor_above_local_ground_or_deck_of_marine_platform",
        "value": 1.5,
        "units": "m",
        "description": null
      }
    ],
    "index": 25,
    "fxxyyy": "012101",
    "id": "WIGOS_0-454-2-AWSCHINGALE_20220112T135500-25"
  }
}

Let’s inspect what’s measured at Chidoole:

[7]:
print('Observed property:\n')
feature = response['features'][9]
print(f"{feature['properties']['name']} ({feature['properties']['units']})")
Observed property:

air_temperature (Celsius)
Pandas

Let’s use the GeoJSON to build a more user-friendly table

[8]:
import pandas as pd

datestamp = [obs['properties']['resultTime'] for obs in response['features']]
air_temperature = [obs['properties']['value'] for obs in response['features']]

d = {
    'Date/Time': datestamp,
    'Air temperature (°C)': air_temperature
}

df = pd.DataFrame(data=d)
[9]:
df
[9]:
Date/Time Air temperature (°C)
0 2022-01-12T13:55:00Z 24.85
1 2022-01-12T14:55:00Z 27.25
2 2022-01-12T15:55:00Z 26.65
3 2022-01-12T16:55:00Z 25.95
4 2022-01-12T17:55:00Z 25.45
... ... ...
5101 2022-06-09T12:55:00Z 21.35
5102 2022-06-09T13:55:00Z 22.25
5103 2022-06-09T14:55:00Z 20.25
5104 2022-06-10T12:55:00Z 23.75
5105 2022-06-10T14:55:00Z 21.15

5106 rows × 2 columns

[10]:
print("Time extent\n")
print(f'Begin: {df["Date/Time"].min()}')
print(f'End: {df["Date/Time"].max()}')

print("Summary statistics:\n")
df[['Air temperature (°C)']].describe()
Time extent

Begin: 2022-01-12T13:55:00Z
End: 2022-06-10T14:55:00Z
Summary statistics:

[10]:
Air temperature (°C)
count 5106.000000
mean 23.541559
std 4.053172
min 13.550000
25% 20.950000
50% 23.350000
75% 26.350000
max 37.850000
[ ]:

Using Python and OWSLib

OWSLib is a Python package which provides Pythonic access to OGC APIs and web services. Let’s see how easy it is to work with wis2box with standards-based tooling:

[1]:
from owslib.ogcapi.features import Features

import pandas as pd

def pretty_print(input):
    print(json.dumps(input, indent=2))


api = 'http://localhost:8999/oapi'

Let’s load the wis2box API into OWSLib and inspect some data

[2]:
oafeat = Features(api)

collections = oafeat.collections()
print(f'This OGC API Features endpoint has {len(collections["collections"])} datasets')

for dataset in collections['collections']:
    print(dataset['title'])

malawi_obs = oafeat.collection_items('mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP')
malawi_obs_df = pd.DataFrame(malawi_obs['features'])

# then filter by station
obs = oafeat.collection_items('mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP', wigos_station_identifier='0-454-2-AWSCHIDOOLE', name='air_temperature', limit=10000)

datestamp = [obs['properties']['resultTime'] for obs in obs['features']]
air_temperature = [obs['properties']['value'] for obs in obs['features']]

d = {
    'Date/Time': datestamp,
    'Air temperature (°C)': air_temperature
}

df = pd.DataFrame(data=d)
This OGC API Features endpoint has 4 datasets
Surface weather observations (passthrough)
Discovery metadata
Stations
Surface weather observations (hourly)
[3]:
df.dtypes
[3]:
Date/Time                object
Air temperature (°C)    float64
dtype: object
[4]:
df.head(3)
[4]:
Date/Time Air temperature (°C)
0 2022-01-12T13:55:00Z 24.85
1 2022-01-12T14:55:00Z 27.25
2 2022-01-12T15:55:00Z 26.65
[5]:
print("Time extent\n")
print(f'Begin: {df["Date/Time"].min()}')
print(f'End: {df["Date/Time"].max()}')

print("Summary statistics:\n")
df[['Air temperature (°C)']].describe()
Time extent

Begin: 2022-01-12T13:55:00Z
End: 2022-06-10T14:55:00Z
Summary statistics:

[5]:
Air temperature (°C)
count 5106.000000
mean 23.541559
std 4.053172
min 13.550000
25% 20.950000
50% 23.350000
75% 26.350000
max 37.850000
[ ]:

R

R is a common programming language for data analysis and visualization. R provides easy access to various statiscal analysis libraries. We are going to use the R libraries: sf to load features, dplyr for data manipulation, and

Install Requirements

[ ]:
install.packages("sf")
install.packages("dplyr")

Import Requirements

[1]:
library(sf)
library(dplyr)

oapi <- "http://oapi/oapi" # jupyter is run through docker
#oapi = http://localhost:8999/oapi # jupyter is run on host machine
Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE


Attaching package: ‘dplyr’


The following objects are masked from ‘package:stats’:

    filter, lag


The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union


Stations
[2]:
stations <- read_sf(paste0(oapi,"/collections/stations/items?f=json"))
print(stations)
Simple feature collection with 7 features and 5 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 33.67305 ymin: -15.84052 xmax: 35.27428 ymax: -9.92951
z_range:       zmin: 618 zmax: 1288
Geodetic CRS:  WGS 84
# A tibble: 7 × 6
  wigos_station_identifier              name        url   status    id                  geometry
  <chr>                 <chr>       <chr> <chr>  <int>               <POINT [°]>
1 0-454-2-AWSLOBI       LOBI AWS    http… opera… 65618 Z (34.07244 -14.39528 12…
2 0-454-2-AWSKAYEREKERA KAYEREKERA  http… opera… 91840 Z (33.67305 -9.92951 848)
3 0-454-2-AWSMALOMO     MALOMO      http… opera… 91873 Z (33.83727 -13.14202 10…
4 0-454-2-AWSNKHOMA     NKHOMA UNI… http… opera… 91875 Z (34.10468 -14.04422 12…
5 0-454-2-AWSTOLEZA     TOLEZA      http… opera… 91880    Z (34.955 -14.948 764)
6 0-454-2-AWSNAMITAMBO  NAMITAMBO   http… opera… 91885 Z (35.27428 -15.84052 80…
7 0-454-2-AWSBALAKA     BALAKA      http… opera… 91893 Z (34.96667 -14.98333 61…
Discovery Metadata
[3]:
discovery_metadata <- read_sf(paste0(oapi,"/collections/discovery-metadata/items"))
print(discovery_metadata)
Simple feature collection with 1 feature and 13 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: 32.68817 ymin: -16.8013 xmax: 35.7719 ymax: -9.230599
Geodetic CRS:  WGS 84
# A tibble: 1 × 14
  identifier externalId title description themes providers language type  extent
  <chr>      <chr>      <chr> <chr>       <chr>  <chr>     <chr>    <chr> <chr>
1 data.core… "[ { \"sc… Surf… Surface we… "[ { … "[ { \"n… en       data… "{ \"…
# … with 5 more variables: created <date>, rights <chr>,
#   X_metadata.anytext <chr>, id <chr>, geometry <POLYGON [°]>
Observations
[4]:
malawi_obs <- read_sf(paste0(oapi,"/collections/mwi.mwi_met_centre.data.core.weather.surface-based-observations.SYNOP/items"))
print(malawi_obs)
Simple feature collection with 10 features and 7 fields
Geometry type: POINT
Dimension:     XYZ
Bounding box:  xmin: 35.27 ymin: -15.84 xmax: 35.27 ymax: -15.84
z_range:       zmin: 806 zmax: 806
Geodetic CRS:  WGS 84
# A tibble: 10 × 8
   identifier  phenomenonTime      resultTime          wigos_station_i… metadata
   <chr>       <dttm>              <dttm>              <chr>            <chr>
 1 WIGOS_0-45… 2021-07-07 14:55:00 2022-02-21 14:15:14 0-454-2-AWSNAMI… "[ { \"…
 2 WIGOS_0-45… 2021-07-07 15:55:00 2022-02-21 14:15:14 0-454-2-AWSNAMI… "[ { \"…
 3 WIGOS_0-45… 2021-07-07 16:55:00 2022-02-21 14:15:14 0-454-2-AWSNAMI… "[ { \"…
 4 WIGOS_0-45… 2021-07-07 17:55:00 2022-02-21 14:15:14 0-454-2-AWSNAMI… "[ { \"…
 5 WIGOS_0-45… 2021-07-07 18:55:00 2022-02-21 14:15:14 0-454-2-AWSNAMI… "[ { \"…
 6 WIGOS_0-45… 2021-07-07 19:55:00 2022-02-21 14:15:15 0-454-2-AWSNAMI… "[ { \"…
 7 WIGOS_0-45… 2021-07-07 20:55:00 2022-02-21 14:15:15 0-454-2-AWSNAMI… "[ { \"…
 8 WIGOS_0-45… 2021-07-07 21:55:00 2022-02-21 14:15:15 0-454-2-AWSNAMI… "[ { \"…
 9 WIGOS_0-45… 2021-07-07 22:55:00 2022-02-21 14:15:15 0-454-2-AWSNAMI… "[ { \"…
10 WIGOS_0-45… 2021-07-07 23:55:00 2022-02-21 14:15:15 0-454-2-AWSNAMI… "[ { \"…
# … with 3 more variables: observations <chr>, id <chr>, geometry <POINT [°]>
[ ]:

Using QGIS

Overview

This section provides examples of interacting with wis2box API using QGIS.

QGIS is a free and open-source cross-platform desktop GIS application that supports viewing, editing, and analysis of geospatial data. QGIS supports numerous format and encoding standards, which enables plug-and-play interoperability with wis2box data and discovery metadata.

QGIS desktop GIS application
Accessing the discovery catalogue

QGIS provides support for the OGC API - Records standard (discovery). To interact with the wis2box discovery catalogue:

  • from the QGIS menu, select Web -> MetaSearch -> MetaSearch

  • click the “Services” tab

  • click “New”

  • enter a name for the discovery catalogue endpoint

  • enter the URL to the discovery catalogue endpoint (i.e. http://localhost:8999/oapi/collections/discovery-metadata)

  • ensure “Catalogue Type” is set to “OGC API - Records”

  • click “OK”

QGIS MetaSearch new connection

This adds the discovery catalogue to the MetaSearch catalogue registry. Click “Service Info” to display the properties of the discovery catalogue service metadata.

To search the discovery catalogue, click the “Search” tab, which will provide the ability to search for metadata records by bounding box and/or full text search. Click the “Search” button to search the discovery catalogue and visualize search results. Clicking on metadata records in the search result table will show footprints on the map to help provide the location of the search result. Double-clicking a search result will show the entire metadata record.

QGIS MetaSearch search

Note

For more information on working with catalogues, consult the official QGIS MetaSearch documentation.

Visualizing stations

QGIS provides support for the OGC API - Features standard (access). To interact with the wis2box API:

  • from the QGIS menu, select Layer -> Add Layer -> Add WFS Layer…

  • click “New”

  • enter a name for the API endpoint

  • enter the URL to the API endpoint (i.e. http://localhost:8999/oapi)

  • under “WFS Options”, set “Version” to “OGC API - Features”

  • click “OK”

  • click “Connect”

QGIS OGC API - Features new connection

A list of collections is displayed. Select the “Stations” collection and click “Add”. The Stations collection is now added to the map. To further explore:

  • click on the “Identify” (i) and click on a station to display station properties

  • select Layer -> Open Attribute Table to open all stations in a tabular view

QGIS OGC API - Features stations

Note that the same QGIS workflow can be executed for any other collection listed from wis2box API.

Note

For more information on working with OGC API - Features, consult the official QGIS WFS documentation.

Summary

The above examples provide a number of ways to utilize the wis2box API from the QGIS desktop GIS application.

PubSub

Downloading data from WIS2 using pywis-pubsub

Overview

This section provides examples of using the pywis-pubsub tool to download data from WIS2 global services. WIS2 global services include a Global Broker that provides users the ability to subscribe to data (via topics) and download to their local environment / workstation / decision support system from the WIS2 Global Cache.

The pywis-pubsub tool

This repository includes docker files to help you subscribe and download data from the WIS2 network, by using pywis-pubsub inside the wis2box-subscribe-download container.

pywis-pubsub is a Python package that provides publish, subscription and download capability of data from WIS2 infrastructure services.

Before starting the wis2box-subscribe-download container, the default configuration (provided in docker/wis2box-subscribe-download/local.yml) must be updated, by defining the URL of the MQTT broker as well as the desired topics to subscribe to.

In addition, the storage path should be updated to specify where the downloaded should be saved to.

# fully qualified URL of broker
broker: mqtts://username:password@host:port

# whether to run checksum verification when downloading data (default true)
verify_data: true

# whether to validate broker messages (default true)
validate_message: true

# list of 1..n topics to subscribe to
topics:
    - 'cache/a/wis2/topic1/#'
    - 'cache/a/wis2/topic2/#'

# storage: filesystem
storage:
    type: fs
    options:
        path: /tmp/foo/bar

To start a continuous subscribe-and-download process, run the wis2box-subscribe-download container as follows (-d for detached mode, --build to ensure changes in local.yml are built into the container):

docker-compose -f docker/docker.subscribe-download.yml up -d --build

To stop the subscribe-and-download process, run the following command:

docker-compose -f docker/docker.subscribe-download.yml down
Running pywis-pubsub interactively

pywis-pubsub can also be run interactively from inside the wis2box main container as follows:

# login to wis2box main container
python3 wis2box-ctl.py login

# edit a local configuration by using docker/wis2box-subscribe-download/local.yml as a template
vi /data/wis2box/local.yml

# connect, and simply display data notifications
pywis-pubsub subscribe --config local.yml

# connect, and additionally download messages
pywis-pubsub subscribe --config local.yml --download

# connect, and filter messages by bounding box geometry
pywis-pubsub subscribe --config local.yml --bbox=-142,42,-52,84
Summary

The above examples provide examples of using pywis-pubsub to subscribe and download data from WIS2 global services.

Using Python and paho-mqtt

This example will use widely available and used python language and libraries to download some announcements, and then retrieve the corresponding data, using only the paho-mqtt client library, in addition to Python standard libraries.

[1]:
import json
import paho.mqtt.client as mqtt
import random
import urllib
import urllib.request


host='localhost'
user='wis2box'
password='wis2box'

r = random.Random()
clientId='MyQueueName'+ f"{r.randint(1,1000):04d}"
# number of messages to subscribe to.
messageCount = 0
messageCountMaximum = 5

# maximum size of data download to print.
sizeMaximumThreshold = 1023

The above imports the required modules. It is also assumed that localhost is set up and is publishing messages. Message queueing protocols provide real-time notification about availability of products.

The standard Python package used to subscribe to messages is paho-mqtt (paho.mqtt.client). The package uses callbacks.

Note that messageCount is used to limit the length of the demonstration (otherwise infinite, as it is a continuous flow).

Let’s investigate our callbacks.

[2]:
def sub_connect(client, userdata, flags, rc, properties=None):
    print("on connection to subscribe: ", mqtt.connack_string(rc))
    for s in ["origin/#"]:
        client.subscribe(s, qos=1)

The sub_connect callback needed is called when the connection is established, which required to subscribe to topics we are interested in (topics are: origin/#, where / is a topic separator and # is a wildcard for any tree of topics.

The qos=1 refers to Quality of Service, where 1 establishes reception of messages at least once. qos=1 is recommended.

The next callback is called every time a message is received, and decodes and prints the message.

To keep the output short for the demonstration, we limit the subscriber to a few messages.

[3]:
def sub_message(client, userdata, msg):
    """
    print messages received.  Exit on count received.
    """

    global messageCount,messageCountMaximum

    m = json.loads(msg.payload.decode('utf-8'))

    print(f"message {messageCount} topic: {msg.topic} received: {m}")
    print(f"message {messageCount} data: {getData(m)}")

    messageCount += 1

    if messageCount > messageCountMaximum:
        client.disconnect()
        client.loop_stop()

The message handler above calls the getData() (below). The messages themselves are usually announcements of data availability, but when data is small, they can include the data itself (inline) in the content field. Usually the message refers to the data using a link. Here is a routine to obtain the data given an announcement message:

[4]:
def getData(m, sizeMaximum=1000):
    """
    given a message, return the data it refers to
    """

    if 'size' in m and m['size'] > sizeMaximum:
        return f" data too large {m['size']} bytes"
    elif 'content' in m:
        if m['content']['encoding'] == 'base64':
            return b64decode(m['content']['value'])
        else:
            return m['content']['value'].encode('utf-8')
    else:
        url = m['baseUrl'] + '/' + m['relPath']
        with urllib.request.urlopen(url) as response:
            return response.read()

The calling code then registers the callbacks, connects to the broker, and starts the event loop:

[ ]:
client = mqtt.Client(client_id=clientId, protocol=mqtt.MQTTv5)
client.on_connect = sub_connect
client.on_message = sub_message
client.username_pw_set(user, password)
client.connect(host)

client.loop_forever()
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
on connection to subscribe:  Connection Accepted.
[ ]:

Running the examples

To be able to run these examples, one needs to start up a Jupyter Notebook environment. Below is an example of starting a Jupyter session:

git clone https://github.com/wmo-im/wis2box.git
cd docs/source/data-access
jupyter notebook --ip=0.0.0.0 --port=8888

When Jupyter starts up it may open a browser window for you. If not you would need to to point a browser at http://localhost:8888 to see the menu of notebooks available in this directory.

Summary

The above examples provide a number of ways to utilize the wis2box suite of services.

Extending wis2box

At its core, wis2box is a plugin architecture orchestrating all the required components of a node in the WIS 2.0 network. Driven by topic hierarchies, wis2box can be used to process and publish any type of geospatial data beyond the requirements of the WIS 2.0 itself.

In this section we will to explore how wis2box can be extended. wis2box plugin development requires knowledge of how to program in Python as well as Python’s packaging and module system.

Building your own data plugin

The heart of a wis2box data plugin is driven from the wis2box.data.base abstract base class (ABC) located in wis2box/data/base.py. Any wis2box plugin needs to inherit from wis2box.data.base.BaseAbstractData. A minimal example can be found below:

from datetime import datetime
from wis2box.data.base import BaseAbstractData

class MyCoolData(BaseAbstractData):
    """Observation data"""
    def __init__(self, defs: dict) -> None:
        super().__init__(defs)

    def transform(self, input_data: Path) -> bool:
        # transform data
        # populate self.output_data with a dict as per:
        self.output_data = {
            'c123': {
                '_meta': {
                    'identifier': 'c123',
                    'relative_filepath': '/path/to/item/',
                    'data_date': datetime_object
                },
                'bufr4': bytes(12356),
                'geojson': geojson_string
            }
        }
        return True

The key function that plugin needs to implement is the transform function. This function should return a True or False of the result of the processing, as well as populate the output_data property.

The output_data property is a dict of keys/values. Each key should be the identifier of the item, with the following values dict:

The _meta element can include the following:
  • identifier: identifier for report (WIGOS_<WSI>_<ISO8601>)

  • relative_filepath: path to data, required to publish data with BaseAbstractData.publish

  • geometry: GeoJSON geometry object, required to send geometry with WIS2.0 notification

  • md5: md5 checksum of encoded data

  • wigos_station_identifier: WIGOS identifier

  • data_date: (as Python datetime objects) based on the observed datetime

  • originating_centre: Originating centre (see Common code table C11)

  • data_category: Category of data, see BUFR Table A

  • <format-extension>: 1..n properties for each format representation, with the key being the filename extension. The value of this property can be a string or bytes, depending on whether the underlying data is ASCII or binary, for example

Packaging

The next step is assembling your plugin using standard Python packaging. All plugin code and configuration files should be made part of the package so that it can operate independently when running in wis2box. For distribution and installation, you have the following options:

  • publish to the Python Package Index (PyPI) and install in the wis2node container with pip3 install wis2box-mypackage

  • git clone or download your package, and install via python3 setup.py install

See the Python packaging tutorial or Cookiecutter PyPackage for guidance and templates/examples.

Note

It is recommended to name your wis2box packages with the convention wis2box-MYPLUGIN-NAME, as well as adding the keywords/topics wis2box and plugin to help discovery on platforms such as GitHub.

Integration

Once your package is installed on the wis2box container, the data mappings need to be updated to connect your plugin to a topic hierarchy. See Data mappings for more information.

An example plugin for proof of concept can be found in https://github.com/wmo-cop/wis2box-csv-observations

Example plugins

The following plugins provide useful examples of wis2box plugins implemented by downstream applications.

Plugin(s)

Organization/Project

Description

wis2box-csv-observations

WMO

plugin for CSV surface observation data

wis2box-pyopencdms-plugin

OpenCDMS

plugin for connecting the Open Climate Data Management System to wis2box

Development

wis2box is developed as a free and open source project on GitHub. The wis2box codebase can be found at https://github.com/wmo-im/wis2box.

Testing

wis2box continuous integration (CI) testing is managed by GitHub Actions. All commits and pull requests to wis2box trigger continuous integration (CI) testing on GitHub Actions.

GitHub Actions invokes functional testing as well as integration testing to ensure regressions.

Unit testing

Unit tests are in tests/unit.

Integration testing

Integration tests are in tests/integration/integration.py.

Functional testing

Functional tests are defined as part of GitHub Actions in .github/workflows/tests-docker.yml.

Versioning

wis2box follows the Semantic Versioning Specification (SemVer).

Code Conventions

Python code follows PEP8 coding conventions.

Contributing

wis2box is developed as a free and open source project on GitHub. Contributions to the project (documentation, bug fixes, enhancements, tests, etc.) are welcome and encouraged. Please consult the wis2box Contribution guidelines for more information.

Support

Please consult the wis2box Discussions for support with the project.

License

Software

                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!)  The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

Documentation

The documentation is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Indices and tables