Implementation notes#
sdmx aims for a precise, Pythonic, and useful implementation of the SDMX standards.
This means:
Classes and their attributes have the same names, types, and cardinality as in the standard.
Where the standard uses non-Pythonic naming conventions (for instance, “dimensionAtObservation”), sdmx follows the PEP-8 naming conventions (for instance, “dimension_at_observation” for an attribute).
Where the standard is itself ambiguous or imprecise, implementation choices (for instance, naming) in sdmx are clearly labeled.
Extensions, additional features, and conveniences in sdmx that do not appear in the standards are clearly labeled.
All behaviour visible “in the wild” (that is, from publicly available data sources and files) is supported, so long as it is verifiably standards-compliant.
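The naming convention described above can be illustrated with a short sketch. The helper to_snake_case is hypothetical and is not part of sdmx; it only shows how a camelCase name from the standard maps onto a PEP-8 style attribute name, and sdmx does not necessarily derive its names this way.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase name from the standard to a PEP-8 style name.

    Hypothetical helper for illustration only; not part of sdmx.
    """
    # Insert an underscore before each upper-case letter that follows a
    # lower-case letter or digit, then lower-case the whole string.
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


print(to_snake_case("dimensionAtObservation"))
```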
This page gives brief explanations of how this implementation is achieved. Although this page is organized (see the contents in the sidebar) to correspond to the standards, it (again) does not restate them or set out to explain all their details. For those purposes, see Resources; or the Walkthrough, which includes some incidental explanations.
SDMX standard versions 2.0, 2.1, and 3.0#
Multiple versions of the SDMX standards have been adopted:
2.0 in November 2005.
2.1 in August 2011; published by the International Organization for Standardization (ISO) in January 2013; and revised multiple times since.
3.0 in October 2021.
The standards are available on the SDMX website: https://sdmx.org/?page_id=5008
In SDMX 2.1, sections of the standards were numbered from 1 to 7. For instance, the Information model (SDMX-IM) is Section 2.
From SDMX 3.0, some of these section numbers have been removed. For instance, SDMX-ML was described in SDMX 2.1 sections 3A and 3B; in SDMX 3.0, these section numbers are no longer used and are replaced with a reference to the SDMX Technical Working Group (TWG) Git repository at sdmx-twg/sdmx-ml.
Some of these sections or sub-standards are versioned differently from the overall standard. See in particular SDMX-CSV, SDMX-JSON, and SDMX-REST web service API below.
For the current Python package, sdmx:
SDMX 2.0 is not implemented, and no implementation is currently planned.
Some data providers still offer only SDMX-ML 2.0 and/or an SDMX 2.0 REST web service. These implementations of SDMX 2.0 can be incomplete, inconsistent, or not fully compliant, which makes them more difficult and costly to support.
While no SDMX 2.0 implementation is planned, contributions from new developers are possible and welcome.
SDMX 2.1 and 3.0 are implemented as described on this page, with exhaustive implementation as the design goal for sdmx.
For SDMX 3.0 specifically, as of v2.14.0 sdmx implements:
The SDMX 3.0 information model (model.v30), to the same extent as SDMX 2.1.
Reading of SDMX-ML 3.0 (reader.xml.v30).
Construction of URLs and querying SDMX-REST API v2.1.0 data sources (rest.v30).
This implies the following are not yet supported:
Writing SDMX-ML 3.0.
Reading and writing SDMX-JSON 2.0 (see SDMX-JSON).
Follow What’s new?, #87, and other GitHub issues and pull requests for details. Please open an issue on GitHub to report real-world SDMX 3.0 web services and specimens of data that can be added.
Information model (SDMX-IM)#
Reference:
In general:
sdmx.model.common implements:
Classes that are fully identical in the SDMX 2.1 and 3.0 information models.
Base classes like BaseDataStructureDefinition that contain common attributes and features shared by SDMX 2.1 and 3.0 classes that differ in some ways. These classes should not be instantiated or used directly, except for type checking and hinting.
sdmx.model.v21 and sdmx.model.v30 contain:
Classes that only appear in one version of the information models or the other.
Concrete implementations of common base classes (for instance, v21.DataStructureDefinition and v30.DataStructureDefinition) with the features specific to each version of the information model.
Python dataclasses and type hinting are used to enforce the types of attributes that reference instances of other classes.
Some classes have convenience attributes not mentioned in the spec, to ease navigation between related objects. These are marked “sdmx extension not in the IM.”
Abstract classes and data types#
Many classes inherit from one of the following.
For example, every Code is a NameableArtefact; [2] this means it has name and description attributes. Because every NameableArtefact is an IdentifiableArtefact, a Code also has id, URI, and URN attributes.
AnnotableArtefact
has a list of annotations.
Each annotation has id, title, type, and url attributes, as well as a text.
The Annotation text attribute is an InternationalString with zero or more localizations in different locales. This provides support for internationalization of SDMX structures and metadata in multiple languages.
IdentifiableArtefact
is “annotable”; this means it is a subclass of AnnotableArtefact and also has the annotations attribute.
The id uniquely identifies the object against others of the same type in an SDMX message. The URI and URN are globally unique. See Wikipedia for a discussion of the differences between the two.
NameableArtefact
has a name and description, both InternationalString, and
is identifiable, therefore also annotable.
VersionableArtefact
has a version number,
may be valid between certain times (valid_from, valid_to), and
is nameable, identifiable, and annotable.
MaintainableArtefact
is under the authority of a particular maintainer, and
is versionable, nameable, identifiable, and annotable.
In an SDMX message, a maintainable object might not be given in full, but only as a reference (with is_external_reference set to True). If so, it might have a structure_url, where the maintainer provides more information about the object.
The API reference for sdmx.model shows the parent classes for each class, to describe whether they are maintainable, versionable, nameable, identifiable, and/or annotable.
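The inheritance chain described above can be sketched with plain dataclasses. This is a simplified illustration, not the actual sdmx.model code: the field types are assumptions (annotations as a plain list, InternationalString reduced to a mapping from locale to text, dates and the maintainer as strings), and only the attributes named in this section are included.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnnotableArtefact:
    # Annotations are simplified to a plain list in this sketch.
    annotations: list = field(default_factory=list)


@dataclass
class IdentifiableArtefact(AnnotableArtefact):
    id: str = ""
    uri: Optional[str] = None
    urn: Optional[str] = None


@dataclass
class NameableArtefact(IdentifiableArtefact):
    # InternationalString is simplified to a mapping from locale to text.
    name: dict = field(default_factory=dict)
    description: dict = field(default_factory=dict)


@dataclass
class VersionableArtefact(NameableArtefact):
    version: Optional[str] = None
    valid_from: Optional[str] = None
    valid_to: Optional[str] = None


@dataclass
class MaintainableArtefact(VersionableArtefact):
    maintainer: Optional[str] = None
    is_external_reference: bool = False
    structure_url: Optional[str] = None


@dataclass
class Code(NameableArtefact):
    """A Code is nameable, and therefore also identifiable and annotable."""


code = Code(id="CA", name={"en": "Canada"})
# A Code is identifiable, but it is not versionable.
print(isinstance(code, IdentifiableArtefact), isinstance(code, VersionableArtefact))
```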
Items and schemes#
ItemScheme, Item
These abstract classes allow for the creation of flat or hierarchical taxonomies.
ItemSchemes are maintainable (see above); their items attribute is a collection of Items. See the class documentation for details.
Data#
Observation
A single data point/datum.
The value is stored as the Observation.value attribute.
DataSet
A collection of Observations, SeriesKeys, and/or GroupKeys.
Note
There are no ‘Series’ or ‘Group’ classes in the IM!
Instead, the idea of ‘data series’ within a DataSet is modeled as:
SeriesKeys and GroupKeys are associated with a DataSet.
Observations are each associated with one SeriesKey and, optionally, referred to by one or more GroupKeys.
One can choose to think of a SeriesKey and the associated Observations, collectively, as a ‘data series’. But, in order to avoid confusion with the IM, sdmx does not provide ‘Series’ or ‘Group’ objects.
sdmx provides:
the DataSet.series and DataSet.group mappings from SeriesKey or GroupKey (respectively) to lists of Observations.
DataSet.obs, which is a list of all observations in the DataSet.
Depending on its structure, a DataSet may be flat, cross-sectional or time series.
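The relationship between a flat list of observations and a series mapping can be sketched as follows. This is an illustration only and does not use the sdmx classes: keys are simplified to tuples of (dimension id, value) pairs, the Observation dataclass and the group_by_series helper are hypothetical, and the values are made up.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    dimension: tuple  # dimension(s) varying at the observation level
    value: float
    series_key: Optional[tuple] = None  # dimension(s) shared by the series


def group_by_series(obs):
    """Build a mapping from series key to the observations that share it."""
    series = defaultdict(list)
    for o in obs:
        series[o.series_key].append(o)
    return dict(series)


ca = (("FREQ", "A"), ("GEO", "CA"))
mx = (("FREQ", "A"), ("GEO", "MX"))
obs = [
    Observation((("TIME_PERIOD", "2020"),), 1.0, ca),
    Observation((("TIME_PERIOD", "2021"),), 1.5, ca),
    Observation((("TIME_PERIOD", "2020"),), 2.0, mx),
]
series = group_by_series(obs)
print({key: [o.value for o in members] for key, members in series.items()})
```

The flat list plays the role of DataSet.obs and the resulting dict plays the role of DataSet.series; no separate ‘Series’ object is needed.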
Key
Values (Key.values) for one or more Dimensions. The meaning varies:
Ordinary Keys, e.g. Observation.dimension
The dimension(s) varying at the level of a specific observation.
SeriesKey
The dimension(s) shared by all Observations in a conceptual series.
GroupKey
The dimension(s) comprising the group. These may be a subset of all the dimensions in the DataSet, in which case all matching Observations are considered part of the ‘group’, even if they are associated with different SeriesKeys.
GroupKeys are often used to attach AttributeValues; see below.
AttributeValue
Value (AttributeValue.value) for a DataAttribute (AttributeValue.value_for).
May be attached to any of: DataSet, SeriesKey, GroupKey, or Observation. In the first three cases, the attachment means that the attribute applies to all Observations associated with the object.
Data structures#
Concept, ConceptScheme
An abstract idea or general notion, such as ‘age’ or ‘country’.
Concepts are one kind of Item, and are collected in an ItemScheme subclass called ConceptScheme.
Dimension, DataAttribute
These are Components of a data structure, linking a Concept (concept_identity) to its Representation (local_representation); see below.
A component can be either a DataAttribute that appears as an AttributeValue in data sets, or a Dimension that appears in Keys.
Representation, Facet
For example, the concept ‘country’ can be represented:
as a value of a certain type (e.g. ‘Canada’, a str), called a Facet;
using a Code from a specific CodeList (e.g. ‘CA’); multiple lists of codes are possible (e.g. ‘CAN’). See below.
DataStructureDefinition (DSD)
Collects structures used in data sets and data flows. These are stored as dimensions, attributes, group_dimensions, and DataStructureDefinition.measures.
For example, dimensions is a DimensionDescriptor object that collects a number of Dimensions in a particular order. Data that is “structured by” this DSD must have all the described dimensions.
See the API documentation for details.
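The requirement that data structured by a DSD has all the described dimensions, in the described order, can be sketched with a small check. The make_key helper below is hypothetical and does not use the sdmx classes; the dimension list stands in for a DimensionDescriptor, and the rejection of unknown dimensions is an added assumption rather than something stated above.

```python
def make_key(dimension_ids, **values):
    """Return (dimension, value) pairs ordered as in `dimension_ids`.

    Hypothetical helper: raises ValueError if a described dimension is
    missing or (an assumption of this sketch) an unknown one is given.
    """
    missing = [d for d in dimension_ids if d not in values]
    unknown = [d for d in values if d not in dimension_ids]
    if missing or unknown:
        raise ValueError(f"missing: {missing}; unknown: {unknown}")
    return tuple((d, values[d]) for d in dimension_ids)


# Stand-in for the ordered dimensions of a DSD.
dims = ["FREQ", "GEO", "TIME_PERIOD"]

# Values may be given in any order; the result follows the DSD order.
key = make_key(dims, GEO="CA", TIME_PERIOD="2020", FREQ="A")
print(key)
```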
Metadata#
Code, Codelist
…
Category, CategoryScheme, Categorisation
Categories serve to classify or categorize things like data flows, e.g. by subject matter.
A Categorisation links the thing to be categorized, e.g., a DataFlowDefinition, to a particular Category.
Constraints#
v21.Constraint, ContentConstraint
Classes that specify a subset of data or metadata to, for example, limit the contents of a data flow.
A ContentConstraint may have:
Zero or more CubeRegion stored at data_content_region.
Zero or one DataKeySet stored at data_content_keys.
Currently, ContentConstraint.to_query_string(), used by Client.get() to validate keys based on a data flow definition, only uses data_content_region, if any. data_content_keys are ignored. None of the data sources supported by sdmx appears to use this latter form.
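As a rough illustration of turning a cube region into a query key, the sketch below formats the allowed values per dimension using the SDMX-REST key syntax, in which dimensions are separated by “.” and alternative values by “+”. It is not the implementation of ContentConstraint.to_query_string(); the region_to_key helper, the dict-of-sets stand-in for a CubeRegion, and the dimension ids are all assumptions.

```python
def region_to_key(dimension_order, region):
    """Format allowed values per dimension as an SDMX-REST style key.

    `region` maps a dimension id to a set of allowed values; it is a
    simplified, hypothetical stand-in for a CubeRegion.
    """
    parts = []
    for dim in dimension_order:
        # A dimension without a constraint becomes an empty (wildcard) position.
        parts.append("+".join(sorted(region.get(dim, []))))
    return ".".join(parts)


order = ["FREQ", "CURRENCY", "CURRENCY_DENOM"]
region = {"FREQ": {"A", "M"}, "CURRENCY_DENOM": {"EUR"}}
print(region_to_key(order, region))
```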
File formats#
The IM provides terms and concepts for data and metadata, but does not specify how that (meta)data is stored or represented.
The SDMX standards include multiple formats for storing data, metadata, and structures.
In general, sdmx:
Reads most SDMX-ML 2.1 and 3.0 and SDMX-JSON 1.0 messages.
Uses collected specimens of messages in various formats, stored in the khaeru/sdmx-test-data Git repository. These are used by the test suite to check that the code functions as intended, but can also be viewed to understand the data formats.
SDMX-ML#
Reference: sdmx-twg/sdmx-ml
Based on eXtensible Markup Language (XML). SDMX-ML can represent every class and property in the IM.
An SDMX-ML document contains exactly one Message. See sdmx.message for the different classes of Messages and their attributes.
Added in version 2.11.0: Support for reading SDMX-ML 3.0.
SDMX-JSON#
Reference: sdmx-twg/sdmx-json
Based on JavaScript Object Notation (JSON). The SDMX-JSON format is versioned differently from the overall SDMX standard:
SDMX-JSON 1.0 corresponds to SDMX 2.1. It supports only data and not structures or metadata.
SDMX-JSON 2.0.0 corresponds to SDMX 3.0.0. It adds support for structures.
See reader.json.
Added in version 0.5: Support for reading SDMX-JSON 1.0.
SDMX-CSV#
Reference: sdmx-twg/sdmx-csv; see in particular the file sdmx-csv-field-guide.md.
Based on Comma-Separated Value (CSV). The SDMX-CSV format is versioned differently from the overall SDMX standard:
SDMX-CSV 1.0 corresponds to SDMX 2.1. It supports only data and metadata, not structures. SDMX-CSV 1.0 files are recognizable by the header DATAFLOW in the first column of the first row.
Added in version 2.9.0: Support for writing SDMX-CSV 1.0. See writer.csv.
sdmx does not currently support reading SDMX-CSV 1.0.
SDMX-CSV 2.0.0 corresponds to SDMX 3.0.0. The format differs from and is not backwards compatible with SDMX-CSV 1.0. SDMX-CSV 2.0.0 files are recognizable by the header STRUCTURE in the first column of the first row.
Added in version 2.19.0: Initial support for reading SDMX-CSV 2.0.0. See reader.csv.
sdmx does not currently support writing SDMX-CSV 2.0.0.
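The header rule described above is enough to tell the two SDMX-CSV versions apart. A minimal sketch using only the standard library follows; the sdmx_csv_version helper is hypothetical and is not how reader.csv is necessarily implemented, and the sample rows are made up for illustration.

```python
import csv
import io


def sdmx_csv_version(text):
    """Guess the SDMX-CSV version from the first column of the first row."""
    first_row = next(csv.reader(io.StringIO(text)), [])
    header = first_row[0].strip() if first_row else ""
    if header == "DATAFLOW":
        return "1.0"
    if header == "STRUCTURE":
        return "2.0.0"
    return None  # not recognizable as SDMX-CSV by this rule


# Made-up sample content.
v1 = "DATAFLOW,FREQ,TIME_PERIOD,OBS_VALUE\nAGENCY:FLOW(1.0),A,2020,1.0\n"
v2 = "STRUCTURE,STRUCTURE_ID,FREQ,TIME_PERIOD,OBS_VALUE\n"
print(sdmx_csv_version(v1), sdmx_csv_version(v2))
```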
SDMX-REST web service API#
The SDMX standards describe both RESTful and SOAP web service APIs.
sdmx does not support SDMX-SOAP, and no support is planned.
See Resources for the SDMX Technical Working Group’s specification of the REST API. The help materials from many data providers (for instance, ESTAT: Eurostat and related and ECB: European Central Bank) provide varying descriptions and examples of constructing query URLs and headers. These generally elaborate the SDMX standards, but in some cases also document source-specific quirks and errata.
The SDMX-REST web service API is versioned differently from the overall SDMX standard:
SDMX-REST API v1.5.0 and earlier correspond to SDMX 2.1 and earlier.
SDMX-REST API v2.0.0 and later correspond to SDMX 3.0 and later.
sdmx aims to support:
SDMX-REST API versions in the 1.x series, v1.5.0 and later.
SDMX-REST API versions in the 2.x series, v2.1.0 and later.
Data retrieved in SDMX 2.1 and 3.0 formats. Some existing services offer a parameter to select SDMX 2.1 or 2.0 format; sdmx does not support the latter. Other services only provide SDMX 2.0-formatted data; these cannot be used with sdmx (see above).
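The version ranges listed above can be written down as a small check. This is only a reading of the list, not code from sdmx; the parse_version and rest_api_in_scope helpers are hypothetical.

```python
def parse_version(text):
    """Parse a version such as 'v1.5.0' or '2.1' into a 3-tuple of integers."""
    parts = [int(p) for p in text.lstrip("v").split(".")]
    parts += [0] * (3 - len(parts))  # pad '2.1' to (2, 1, 0)
    return tuple(parts)


def rest_api_in_scope(text):
    """Return True if an SDMX-REST API version is in the ranges listed above."""
    version = parse_version(text)
    # Minimum version per major series: 1.x from v1.5.0, 2.x from v2.1.0.
    minimum = {1: (1, 5, 0), 2: (2, 1, 0)}.get(version[0])
    return minimum is not None and version >= minimum


for v in ("v1.4.0", "v1.5.0", "v2.0.0", "v2.1.0"):
    print(v, rest_api_in_scope(v))
```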
Client constructs valid URLs (using URL subclasses v21.URL and v30.URL).
For example, Client.get() automatically adds the HTTP header Accept: application/vnd.sdmx.structurespecificdata+xml; when a structure=... argument is provided and the data source supports this content type.
v21.URL supplies some default parameters in certain cases.
Query parameters and headers can always be specified exactly via Client.get().
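The header behaviour described above can be sketched as a small function. This is not the code of Client.get(); build_headers and its parameters are hypothetical, the rule that explicitly passed headers take precedence is an assumption based on the last sentence above, and only the media type itself is used, without any parameters that may follow it.

```python
def build_headers(structure=None, supports_structure_specific=True, headers=None):
    """Choose HTTP headers for a data query (hypothetical sketch)."""
    result = {}
    if structure is not None and supports_structure_specific:
        # Ask for structure-specific data only when a structure is given
        # and the data source supports this content type.
        result["Accept"] = "application/vnd.sdmx.structurespecificdata+xml"
    # Assumption: headers passed explicitly by the caller take precedence.
    result.update(headers or {})
    return result


print(build_headers(structure="some-dsd"))
print(build_headers(structure="some-dsd", headers={"Accept": "text/csv"}))
```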
Source and its subclasses handle documented or well-known idiosyncrasies/quirks/errata of the web services operated by different agencies, such as:
parameters or headers that are not supported, or must take very specific, non-standard values, or
unusual ways of returning data.
See Data source limitations, Data sources, and the source code for the details for each data source.
Please open an issue with reports of or information about data source–specific quirks that may be in scope for sdmx to handle, or a pull request to contribute code.