Standard formats

The SDMX Information Model provides terms and concepts for data and metadata, but does not specify how that (meta)data is stored, represented, or serialized. Other parts of the SDMX standard describe formats for storing data, metadata, and structures.

sdmx.format captures information about these formats, including their versions and options/parameters. This information is used across other modules including sdmx.reader, sdmx.client, and sdmx.writer.

In general, the sdmx package:

  • reads most SDMX-CSV, SDMX-JSON 1.0, and SDMX-ML messages; see details in the individual sections below and the linked reader submodules.

  • writes certain SDMX-CSV and SDMX-ML formats; see details below and the linked .writer submodules.

  • is tested using collected specimens of messages in various formats, stored in the khaeru/sdmx-test-data Git repository. These are used to check that the code functions as intended, but can also be viewed to understand the data formats.

  • MEDIA_TYPES: SDMX media types.

  • Flag(*values): Flag values for information about MediaType.

  • MediaType(label, base, _version[, flags, full]): Structure of elements in MEDIA_TYPES.

  • list_media_types(**filters): Return the string for each item in MEDIA_TYPES matching filters.

class sdmx.format.Flag(*values)[source]

Bases: IntFlag

Flag values for information about MediaType:

  • data: True if this format contains (meta)data. False if it contains (meta)data structures.

  • meta: True if this format contains metadata (or metadata structures). False otherwise.

  • ss: True if this format contains data that is structure-specific. This distinction is only relevant before SDMX 3.0.

  • ts: True if this format contains time-series data. This distinction is only relevant before SDMX 3.0.

data = 1[source]
meta = 2[source]
ss = 4[source]
ts = 8[source]
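
Because Flag derives from IntFlag, its four values can be combined and tested with bitwise operators. The following is a stand-alone sketch, not code from the sdmx package: FlagSketch is a hypothetical name that only re-declares the member values documented above, so that the example runs without sdmx installed.

```python
from enum import IntFlag


class FlagSketch(IntFlag):
    """Hypothetical stand-in with the member values documented for Flag."""

    data = 1
    meta = 2
    ss = 4
    ts = 8


# Combine flags with bitwise OR: structure-specific, time-series data.
combined = FlagSketch.data | FlagSketch.ss | FlagSketch.ts

# Test for a single flag with bitwise AND.
has_ss = bool(combined & FlagSketch.ss)
has_meta = bool(combined & FlagSketch.meta)

print(int(combined), has_ss, has_meta)
```

With the real class, the same operators should apply to sdmx.format.Flag members, since the behaviour comes from IntFlag itself.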
sdmx.format.MEDIA_TYPES = [
    application/vnd.sdmx.generic+xml; version=2.1,
    application/vnd.sdmx.genericdata+xml; version=2.1,
    application/vnd.sdmx.genericmetadata+xml; version=2.1,
    application/vnd.sdmx.generictimeseriesdata+xml; version=2.1,
    application/vnd.sdmx.schema+xml; version=2.1,
    application/vnd.sdmx.structure+xml; version=2.1,
    application/vnd.sdmx.structurespecificdata+xml; version=2.1,
    application/vnd.sdmx.structurespecificmetadata+xml; version=2.1,
    application/vnd.sdmx.structurespecifictimeseriesdata+xml; version=2.1,
    application/xml; version=2.1,
    text/xml; version=2.1,
    application/vnd.sdmx.data+xml; version=3.0.0,
    application/vnd.sdmx.structure+xml; version=3.0.0,
    application/vnd.sdmx.metadata+xml; version=2.0.0,
    application/vnd.sdmx.data+json; version=1.0.0,
    application/vnd.sdmx.data+json; version=2.0.0,
    application/vnd.sdmx.structure+json; version=1.0.0,
    application/vnd.sdmx.structure+json; version=2.0.0,
    application/vnd.sdmx.metadata+json; version=2.0.0,
    application/vnd.sdmx.draft-sdmx-json+json; version=1.0.0,
    draft-sdmx-json; version=1.0.0,
    text/json; version=1.0.0,
    application/vnd.sdmx.data+csv; version=1.0.0,
    application/vnd.sdmx.metadata+csv; version=2.0.0,
][source]

SDMX media types. Each record is an instance of MediaType.

class sdmx.format.MediaType(label: str, base: ~typing.Literal['csv', 'json', 'xml'], _version: dataclasses.InitVar[str | sdmx.format.Version], flags: ~sdmx.format.Flag = <Flag: 0>, full: str | None = None)[source]

Bases: object

Structure of elements in MEDIA_TYPES.

The str() of a MediaType is generally of the form:

application/vnd.sdmx.{label}+{base};version={version}

…unless full is provided, in which case label and base are ignored.

base: Literal['csv', 'json', 'xml'][source]

The base media type or file format.

flags: Flag = 0[source]
full: str | None = None[source]

Specify the full media type string.

property is_data: bool[source]
property is_meta: bool[source]
property is_structure_specific: bool[source]
property is_time_series: bool[source]
label: str[source]

Distinguishing part of the media type.

match(value: str, strict: bool = False) → bool[source]

True if value matches the current MediaType.

version: Version[source]

Format version.
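
The string form described for MediaType can be illustrated with a small stand-alone sketch. MediaTypeSketch is a hypothetical class written for this example and is not the sdmx implementation; it follows the pattern quoted above, and it assumes that the version suffix is still appended when full is given, as the text/xml entry in MEDIA_TYPES suggests. Note that the MEDIA_TYPES listing shows a space after the semicolon, which this sketch does not reproduce.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MediaTypeSketch:
    """Hypothetical stand-in; not the sdmx.format.MediaType implementation."""

    label: str
    base: str
    version: str
    full: Optional[str] = None

    def __str__(self) -> str:
        # When `full` is given, `label` and `base` are ignored.
        prefix = self.full or f"application/vnd.sdmx.{self.label}+{self.base}"
        return f"{prefix};version={self.version}"


from_parts = str(MediaTypeSketch("data", "json", "2.0.0"))
from_full = str(MediaTypeSketch("", "xml", "2.1", full="text/xml"))

print(from_parts)
print(from_full)
```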

sdmx.format.list_media_types(**filters) → list[MediaType][source]

Return the string for each item in MEDIA_TYPES matching filters.

SDMX-CSV

Reference: https://github.com/sdmx-twg/sdmx-csv; see in particular the file sdmx-csv-field-guide.md.

Based on Comma-Separated Value (CSV). The SDMX-CSV format is versioned differently from the overall SDMX standard:

  • SDMX-CSV 1.0 corresponds to SDMX 2.1. It supports only data, not structures or metadata. SDMX-CSV 1.0 files are recognizable by the header DATAFLOW in the first column of the first row.

    Added in version 2.9.0: Support for writing SDMX-CSV 1.0. See writer.csv.

    sdmx does not currently support reading SDMX-CSV 1.0.

  • SDMX-CSV 2.0.0 corresponds to SDMX 3.0.0. The format differs from and is not backwards compatible with SDMX-CSV 1.0. SDMX-CSV 2.0.0 files are recognizable by the header STRUCTURE in the first column of the first row.

    reader.csv supports reading SDMX-CSV 2.0.0.

    Added in version 2.19.0: Initial support for reading SDMX-CSV 2.0.0.

    writer.csv supports writing SDMX-CSV 2.0.0. Currently, only Keys.none is supported; passing any other value raises ValueError.

    Added in version 2.23.0: Initial support for writing SDMX-CSV 2.0.0.
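
The header rule described above (DATAFLOW for SDMX-CSV 1.0, STRUCTURE for SDMX-CSV 2.0.0) can be sketched with the standard library alone. guess_sdmx_csv_version is a hypothetical helper written for this example, not a function of the sdmx package, and the two sample messages are made up for illustration.

```python
import csv
import io
from typing import Optional


def guess_sdmx_csv_version(text: str) -> Optional[str]:
    """Guess the SDMX-CSV version from the first cell of the first row."""
    first_row = next(csv.reader(io.StringIO(text)), [])
    first_cell = first_row[0].strip() if first_row else ""
    if first_cell == "DATAFLOW":
        return "1.0"
    if first_cell == "STRUCTURE":
        return "2.0.0"
    return None  # Not recognizable as SDMX-CSV by this rule


# Illustrative, made-up messages; only the first header cell matters here.
v1_sample = "DATAFLOW,FREQ,TIME_PERIOD,OBS_VALUE\nAGENCY:FLOW(1.0),A,2020,1.5\n"
v2_sample = (
    "STRUCTURE,STRUCTURE_ID,ACTION,FREQ,TIME_PERIOD,OBS_VALUE\n"
    "dataflow,AGENCY:FLOW(1.0),I,A,2020,1.5\n"
)

print(guess_sdmx_csv_version(v1_sample))
print(guess_sdmx_csv_version(v2_sample))
```

The sketch does not handle a byte-order mark or a non-comma delimiter; a real reader would need to account for both.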

Information about SDMX-CSV file formats.

class sdmx.format.csv.common.Attributes(*values)[source]

Attributes to include.

dataset = 8[source]

Attributes attached to the DataSet containing the Observations.

group_key = 4[source]

Attributes attached to any (0 or more) GroupKey associated with each Observation.

none = 0[source]

No attributes.

observation = 1[source]

Attributes attached to each Observation.

series_key = 2[source]

Attributes attached to any (0 or 1) SeriesKey associated with each Observation.

class sdmx.format.csv.common.CSVFormat[source]

Information about an SDMX-CSV format.

suffix: ClassVar[str] = 'csv'[source]

Preferred file name suffix.

class sdmx.format.csv.common.CSVFormatOptions(labels: Labels = Labels.id, time_format: TimeFormat = TimeFormat.original)[source]

SDMX-CSV format options.

These options and default values are common to SDMX-CSV 1.0, 2.0.0, and 2.1.0.

format[source]

alias of CSVFormat

labels: Labels = 1[source]

Types of labels included.

time_format: TimeFormat = 1[source]

Time format.

class sdmx.format.csv.common.Labels(*values)[source]

SDMX-CSV ‘labels’ parameter.

both = 2[source]

Display both the ID and the localized NameableArtefact.name.

id = 1[source]

Display only IdentifiableArtefact.id, for Dimension or DataAttribute in column headers and Code in data rows.

name = 3[source]

Display only the localized name. Not present in SDMX-CSV 1.0.

class sdmx.format.csv.common.TimeFormat(*values)[source]

SDMX-CSV ‘timeFormat’ parameter.

normalized = 2[source]

TIME_PERIOD values are converted to the most granular ISO 8601 representation taking into account the highest frequency of the data in the message and the moment in time when the lower-frequency values were collected.

original = 1[source]

Values for any dimension or attribute with ID TIME_PERIOD are displayed as recorded.

sdmx.format.csv.common.kwargs_to_format_options(kwargs: dict, cls: type[CSVFormatOptions]) → None[source]

Separate from kwargs any attributes of CSVFormatOptions.
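
The general idea of separating option fields from a dictionary of keyword arguments can be sketched as follows. This is only an illustration under stated assumptions: OptionsSketch and split_options are hypothetical names, the fields hold plain strings rather than Labels and TimeFormat members, and the sketch returns the options instance, whereas the documented function returns None and this page does not say where it stores the result.

```python
from dataclasses import dataclass, fields


@dataclass
class OptionsSketch:
    """Hypothetical options class with fields named as in CSVFormatOptions."""

    labels: str = "id"
    time_format: str = "original"


def split_options(kwargs: dict, cls: type):
    """Pop from `kwargs` every key naming a field of `cls`; return an instance."""
    names = {f.name for f in fields(cls)}
    # Iterate over a copy of the keys, because `kwargs` is modified in place.
    picked = {key: kwargs.pop(key) for key in list(kwargs) if key in names}
    return cls(**picked)


kwargs = {"labels": "both", "encoding": "utf-8"}
options = split_options(kwargs, OptionsSketch)

print(options)
print(kwargs)
```

After the call, kwargs holds only the keys that are not option fields, so it can be passed on to other code unchanged.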

SDMX-CSV 1.0 format.

class sdmx.format.csv.v1.FORMAT[source]
version: ClassVar[str] = '1.0'[source]

Format version.

class sdmx.format.csv.v1.FormatOptions(labels: Labels = Labels.id, time_format: TimeFormat = TimeFormat.original)[source]

Format options for SDMX-CSV version 1.0.

format[source]

alias of FORMAT

SDMX-CSV 2.x formats.

class sdmx.format.csv.v2.FORMAT[source]
version: ClassVar[str] = '2.0.0'[source]

Format version.

class sdmx.format.csv.v2.FormatOptions(labels: Labels = Labels.id, time_format: TimeFormat = TimeFormat.original, keys: Keys = Keys.none, custom_columns: list[bytes] = <factory>, delimiter: str = ',', delimiter_sub: str = '')[source]

SDMX-CSV 2.x format options.

custom_columns: list[bytes][source]

“Custom columns” detected by Reader.inspect_header().

delimiter: str = ','[source]

CSV field delimiter.

delimiter_sub: str = ''[source]

SDMX-CSV “sub-field” delimiter.

format[source]

alias of FORMAT

keys: Keys = 1[source]

SDMX-CSV ‘keys’ parameter.

class sdmx.format.csv.v2.Keys(*values)[source]

SDMX-CSV 2.x ‘keys’ parameter.

both = 2[source]

Both obs and series.

none = 1[source]

No related columns.

obs = 3[source]

Include OBS_KEY column with key values for all dimension(s).

series = 4[source]

Include SERIES_KEY column with key values for all dimension(s) except the one(s) attached to each observation.
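
The effect of each Keys value on the extra columns can be summarized in a short sketch. KeysSketch and key_columns are hypothetical names for this example and are not part of sdmx; the member values repeat those listed above, and the relative order of SERIES_KEY and OBS_KEY is an assumption.

```python
from enum import Enum


class KeysSketch(Enum):
    """Hypothetical stand-in with the member values documented for Keys."""

    none = 1
    both = 2
    obs = 3
    series = 4


def key_columns(keys: KeysSketch) -> list[str]:
    """Return the names of the extra key columns implied by `keys`."""
    columns = []
    if keys in (KeysSketch.series, KeysSketch.both):
        columns.append("SERIES_KEY")
    if keys in (KeysSketch.obs, KeysSketch.both):
        columns.append("OBS_KEY")
    return columns


for member in KeysSketch:
    print(member.name, key_columns(member))
```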

SDMX-JSON

Reference: https://github.com/sdmx-twg/sdmx-json

Based on JavaScript Object Notation (JSON). The SDMX-JSON format is versioned differently from the overall SDMX standard:

  • SDMX-JSON 1.0 corresponds to SDMX 2.1. It supports only data and not structures or metadata.

  • SDMX-JSON 2.0.0 corresponds to SDMX 3.0.0. It adds support for structures.

  • See reader.json.

Added in version 0.5: Support for reading SDMX-JSON 1.0.

SDMX-ML

Reference: https://github.com/sdmx-twg/sdmx-ml

Based on eXtensible Markup Language (XML). SDMX-ML can represent every class and property in the IM.

Added in version 2.11.0: Support for reading SDMX-ML 3.0.0.

class sdmx.format.xml.common.XMLFormat(model, base_ns: str, class_tag: Iterable[tuple[str, str]])[source]

Information about an SDMX-ML format.

class_for_tag(tag) → type | None[source]

Return a message or model class for an XML tag.

ns_prefix(url) → str[source]

Return the namespace prefix from NS given its full url.

qname(ns_or_name: str, name: str | None = None) → QName[source]

Return a fully-qualified tag name in namespace ns.

tag_for_class(cls)[source]

Return an XML tag for a message or model class.

sdmx.format.xml.common.construct_schema(schema_dir: Path | None = None, version: str | Version = Version.2.1) → XMLSchema[source]

Construct a lxml.etree.XMLSchema for SDMX-ML of the given version.

SDMXCommon.xsd includes the documentation:

XHTMLType allows for mixed content of text and XHTML tags. When using this type, one will have to provide a reference to the XHTML schema, since the processing of the tags within this type is strict, meaning that they are validated against the XHTML schema provided.

This function does so by inserting an <xs:import> element that refers to http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd, which is the URL given by https://www.w3.org/TR/xhtml1-schema. With the XMLSchema returned by this function, it is possible to validate <common:StructuredText> elements that represent XHTMLAttributeValue.
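
The insertion of an <xs:import> element can be illustrated on a tiny placeholder schema. This sketch is not the construct_schema implementation: it uses the standard library's xml.etree.ElementTree instead of lxml, it neither compiles nor validates anything, and the namespace attribute value (the XHTML namespace) is an assumption not stated on this page.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
XHTML_NS = "http://www.w3.org/1999/xhtml"  # assumed value for `namespace`
XHTML_XSD = "http://www.w3.org/2002/08/xhtml/xhtml1-strict.xsd"

# A tiny placeholder schema; the real SDMXCommon.xsd is far larger.
xsd_text = (
    f'<xs:schema xmlns:xs="{XS}">'
    '<xs:element name="Example" type="xs:string"/>'
    "</xs:schema>"
)
root = ET.fromstring(xsd_text)

# Insert <xs:import> as the first child, pointing at the XHTML schema.
import_el = ET.Element(
    f"{{{XS}}}import",
    {"namespace": XHTML_NS, "schemaLocation": XHTML_XSD},
)
root.insert(0, import_el)

found = root.find("xs:import", {"xs": XS})
print(ET.tostring(root, encoding="unicode"))
```

In the sketch the import is placed first because XML Schema expects import elements to precede the other top-level declarations.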

Format API

Common code for describing SDMX data formats.

class sdmx.format.common.Format[source]

Information about an SDMX data/file format.

Any concrete subclass corresponds to a specific version of a data/file format defined in a specific version of the SDMX standards.

suffix: ClassVar[str][source]

Preferred file name suffix.

version: ClassVar[str][source]

Format version.

class sdmx.format.common.FormatOptions[source]

Options for an SDMX data/file format.