Data sources#

SDMX distinguishes:

  • a data provider: the original publisher or maintainer of statistical information and metadata.

  • a data source: a specific web service that provides access to SDMX content via a standard API.

A single data source might aggregate and provide data or metadata from many data providers. Or, an agency might operate a data source that only contains information they provide themselves; in this case, the source and provider are matched one-to-one.

sdmx has built-in support for a number of data sources, each identified with a string such as "ABS". Use list_sources() to list these, or see the file sources.json in the package source code.

https://khaeru.github.io/sdmx displays a summary of every SDMX-REST API endpoint for every data source known to sdmx; this summary is updated daily by an automatic run of the test suite. Read the following sections for more details on how the limitations and quirks of particular sources are handled.

sdmx also supports adding other data sources; see add_source() and Source.

Data source limitations#

Each SDMX web service provides a subset of the full SDMX feature set, so the same request made to two different sources may yield different results, or an error message. In order to anticipate and handle these differences:

  1. add_source() accepts “data_content_type” and “supports” keys. For example:

    [
      {
        "id": "ABS",
        "data_content_type": "JSON"
      },
      {
        "id": "UNESCO",
        "supports": {"datastructure": false}
      }
    ]
    

    sdmx will raise NotImplementedError on an attempt to query the “datastructure” API endpoint of either of these data sources.

  2. sdmx.source includes adapters (subclasses of Source) with hooks used when querying sources and interpreting their HTTP responses. These are documented below, e.g. ABS, ESTAT, and SGR.
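As a rough sketch of item 1, the entry below describes a hypothetical source in the same shape as the JSON example. The ID "EXAMPLE", its URL and its name are placeholders, not a real service. The call to add_source() assumes it accepts a JSON string with these keys, as the entries in sources.json suggest.

```python
import json

# Hypothetical entry: "EXAMPLE", its URL and its name are placeholders.
info = {
    "id": "EXAMPLE",
    "url": "https://example.org/sdmx/rest",
    "name": "Example statistics office",
    # This service is assumed to return SDMX-JSON rather than SDMX-ML
    "data_content_type": "JSON",
    # Endpoints are switched on or off explicitly
    "supports": {"data": True, "datastructure": False},
}

# Serialise to the JSON form used in sources.json
info_json = json.dumps(info, indent=2)


def register(info_json: str):
    """Register the source with sdmx and return a Client for it."""
    import sdmx  # imported here so the entry above can be built without sdmx

    sdmx.add_source(info_json)
    return sdmx.Client(json.loads(info_json)["id"])
```

Calling register(info_json) should then give a Client whose "datastructure" queries raise NotImplementedError, as described above.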

Handling and testing limitations and (un)supported endpoints#

As of version 2.5.0, sdmx handles service limitations as follows. Please open an issue if the supported endpoints or behaviour of a particular service appear to have changed.

  • source.Source.supports lists endpoints/resources that are not supported by any known web service.

  • sources.json contains supports: {"[resource]": false} for any endpoint where the service returns an HTTP 404 Not found response code. This means that the service fails to even give a proper 501 response (see below).

    Client.get() will refuse to query these sources at all, instead raising NotImplementedError. You can override this behaviour by giving force=True as an argument to get().

  • The test suite (test_sources) records all endpoints for which services return 400 Bad syntax or 501 Not implemented response codes. sdmx will make an actual query to these endpoints, but raise exceptions that can be caught and handled by user code:

    • For a 501 response code, NotImplementedError is raised.

      This behaviour is fully compliant with the SDMX standard: the service accurately and honestly responds when a client makes a request that the server does not implement.

    • For a 400 response code, HTTPError is raised.

      Some of these “bad syntax” responses are erroneous: the service actually has a non-standard URL scheme or handling, different from the SDMX-REST standard. The Client is constructing a standards-compliant URL, but the service idiosyncratically rejects it. Handling these idiosyncrasies is currently out-of-scope for sdmx.

  • Because of the large number of services and endpoints, this matrix of support is only periodically updated. https://khaeru.github.io/sdmx includes all endpoints known to return a reply, even if the reply is an error message of some sort.
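The handling described in this list could be wrapped in user code roughly as follows. This is a minimal sketch: the helper name query_or_explain is hypothetical, and it assumes the HTTPError mentioned above is the one from the requests package.

```python
try:
    # Assumption: sdmx surfaces HTTP 400 as requests.HTTPError
    from requests import HTTPError
except ImportError:
    # Stand-in so the sketch can be read and run without requests installed
    HTTPError = OSError


def query_or_explain(client, resource_type, **kwargs):
    """Return ``(message, None)`` on success, or ``(None, reason)`` on failure."""
    try:
        return client.get(resource_type, **kwargs), None
    except NotImplementedError as exc:
        # Either sdmx refused to send the query (endpoint marked unsupported),
        # or the service answered 501 Not implemented.
        return None, f"not implemented: {exc}"
    except HTTPError as exc:
        # For example 400 Bad syntax from a service with a non-standard URL scheme
        return None, f"HTTP error: {exc}"
```

For instance, query_or_explain(sdmx.Client("UNESCO"), "datastructure") would return a reason string instead of raising. Passing force=True as an extra keyword argument forwards it to get(), which sends the query anyway.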

SDMX-JSON—only services#

A key difference is between sources offering SDMX-ML and SDMX-JSON content. Although the SDMX-JSON 2.0 format (corresponding to SDMX 3.0) includes structure messages, many web services that return SDMX-JSON still do not provide such content or support structure queries; they support only data queries. The SDMX-REST standard specifies how services should respond to the HTTP Accept: header and return either SDMX-ML or SDMX-JSON, but implementation of this feature is inconsistent across known sources.

Where data structures are not available, sdmx cannot automatically construct keys. For such services, start by browsing the source’s website to identify a dataflow of interest. Then identify the key format and construct a key for the desired data request.
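Constructing a key by hand can be sketched as below. In the SDMX-REST key syntax, dimension values appear in the order defined by the data structure and are separated by ".", several values for one dimension are joined with "+", and an empty position selects all values. The helper name make_key and the dimension IDs are illustrative assumptions; the real dimension order must be looked up on the source's website.

```python
def make_key(dimension_order, selection):
    """Build an SDMX-REST key string from a mapping of dimension ID to value(s)."""
    parts = []
    for dim in dimension_order:
        values = selection.get(dim, [])
        if isinstance(values, str):
            values = [values]
        # Several values for one dimension are joined with "+";
        # an unselected dimension becomes an empty position (all values).
        parts.append("+".join(values))
    return ".".join(parts)


# Hypothetical dimension order for some dataflow
order = ["FREQ", "REF_AREA", "INDICATOR"]
key = make_key(order, {"FREQ": "A", "REF_AREA": ["AU", "NZ"]})
print(key)  # A.AU+NZ.
```

The resulting string can be given as the key argument of a data query.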

ABS: Australian Bureau of Statistics (SDMX-ML)#

SDMX-ML — Website

New in version 2.10.0.

ABS_JSON: Australian Bureau of Statistics (SDMX-JSON)#

SDMX-JSON — Website

class sdmx.source.abs_json.Source[source]#

BBK: German Federal Bank#

SDMX-ML — Website (en), (de)

New in version 2.5.0.

  • German name: Deutsche Bundesbank

  • The web service has some non-standard behaviour; see #82.

  • The version path component is not supported for non-data endpoints. sdmx discards other values with a warning.

  • Some endpoints, including codelist, return malformed URNs and cannot be handled with sdmx.

class sdmx.source.bbk.Source[source]#

Work around non-standard behaviour of the BBK: German Federal Bank web service.

modify_request_args(kwargs)[source]#

Modify arguments used to build query URL.

This hook is called by Client.get() to modify the keyword arguments before the query URL is built.

The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept:’ header when a ‘dsd’ is supplied as one of the kwargs.

See sgr.Source.modify_request_args() for an example override.

Return type:

None

BIS: Bank for International Settlements#

SDMX-ML — Website, API reference

New in version 2.5.0.

ECB: European Central Bank#

SDMX-ML — Website

  • Supports categorisations of data-flows.

  • Supports preview_data and series-key based key validation.

Changed in version 2.10.1: As of 2023-06-23 the ECB source is part of an “ECB Data Portal” that replaces an earlier “ECB Statistical Data Warehouse (SDW)” (documentation still available). The URL in sdmx is updated. Text on the ECB website (above) states that the previous URL (in sdmx ≤ 2.10.0) should continue to work until about 2024-06-23.

ILO: International Labour Organization#

SDMX-ML — Website

IMF: International Monetary Fund’s “SDMX Central” source#

SDMX-ML — Website

  • Subset of the data available on http://data.imf.org.

  • Supports series-key-only and hence dataset-based key validation and construction.

INEGI: National Institute of Statistics and Geography (Mexico)#

SDMX-ML — Website.

  • Spanish name: Instituto Nacional de Estadística y Geografía.

INSEE: National Institute of Statistics and Economic Studies (France)#

SDMX-ML — Website (en), (fr)

  • French name: Institut national de la statistique et des études économiques.

class sdmx.source.insee.Source[source]#
modify_request_args(kwargs)[source]#

Supply explicit provider agency ID for INSEE.

This web service accepts either “ALL” or “FR1” as a provider agency ID for structure endpoints, but not “INSEE” (see #21).

This hook sets the provider to “ALL” for structure queries if it is not given explicitly.
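The effect of this hook can be sketched with a stand-alone function. It is an illustration only, not the package's implementation, and it assumes the keyword arguments carry "resource_type" and "provider" entries.

```python
def modify_request_args(kwargs: dict) -> None:
    """Default the provider to "ALL" for structure (non-data) queries."""
    if kwargs.get("resource_type") != "data":
        # setdefault() leaves an explicitly given provider untouched
        kwargs.setdefault("provider", "ALL")


structure_query = {"resource_type": "dataflow"}
explicit_query = {"resource_type": "codelist", "provider": "FR1"}
data_query = {"resource_type": "data"}

for query in (structure_query, explicit_query, data_query):
    modify_request_args(query)

print(structure_query["provider"])  # ALL
print(explicit_query["provider"])   # FR1
print("provider" in data_query)     # False
```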

ISTAT: National Institute of Statistics (Italy)#

SDMX-ML — Website (en), (it)

  • Italian name: Istituto Nazionale di Statistica.

  • Similar server platform to Eurostat, with similar capabilities.

  • Distinct API endpoints are available for:

    • 2010 Agricultural census

    • 2011 Population and housing census

    • 2011 Industry and services census

    …see the above URLs for details.

LSD: National Institute of Statistics (Lithuania)#

SDMX-ML — Website

  • Lithuanian name: Lietuvos statistikos.

  • This web service returns the non-standard HTTP content-type “application/force-download”; sdmx replaces it with “application/xml”.
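A handle_response() hook that performs this replacement might look roughly like the following. The function is a stand-alone sketch rather than the package's code, and a SimpleNamespace stands in for the HTTP response object so that the example runs without a network connection.

```python
from types import SimpleNamespace


def handle_response(response, content):
    """Replace the non-standard content type with one that a reader can match."""
    # Note: a plain dict is case-sensitive, unlike real HTTP header mappings
    if response.headers.get("content-type") == "application/force-download":
        response.headers["content-type"] = "application/xml"
    return response, content


# Stand-in for a response received from the service
fake = SimpleNamespace(headers={"content-type": "application/force-download"})
fake, body = handle_response(fake, b"<xml/>")
print(fake.headers["content-type"])  # application/xml
```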

NB: Norges Bank (Norway)#

SDMX-ML — Website

  • Few data flows, so do not use category scheme.

  • It is unknown whether NB supports series-keys-only.

NBB: National Bank of Belgium (Belgium)#

SDMX-JSON — Website — API documentation (en)

  • French name: Banque Nationale de Belgique.

  • Dutch name: Nationale Bank van België.

  • As of 2020-12-13, this web service (like STAT_EE) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.

  • The web service returns a custom HTML error page rather than an SDMX error message for certain queries or an internal error. This appears as: ValueError: can't determine a SDMX reader for response content type 'text/html; charset=utf-8'

OECD: Organisation for Economic Cooperation and Development (SDMX-ML)#

SDMX-ML — Website, documentation

class sdmx.source.oecd.Source(id: str, url: str, name: str, headers: Dict[str, Any] = <factory>, data_content_type: DataContentType = DataContentType.XML, versions: Set[Version] = <factory>, supports: Dict[str | Resource, bool] = <factory>)[source]#
modify_request_args(kwargs)[source]#

Supply explicit provider agency ID for OECD.

The structures and data flows from this provider use a variety of agency IDs (for example “OECD.SDD.TPS”) to identify the specific organizational unit within the OECD that is responsible for each object. Queries requesting structures or data with agency ID “OECD” (strictly) may return few or zero results.

This hook sets the provider to “ALL” for structure queries if it is not given explicitly.

New in version 2.12.0.

OECD_JSON: Organisation for Economic Cooperation and Development (SDMX-JSON)#

SDMX-JSON — Website

Changed in version 2.12.0: Renamed from OECD.

sdmx.source.oecd_json.Client(*args, **kwargs) → sdmx.client.Client[source]#

Work around OECD_JSON legacy SSL issues.

As of 2023-08-16 the OECD_JSON data source uses an old, insecure version of SSL/TLS that, with default SSL configuration on properly patched systems, raises an SSLError “UNSAFE_LEGACY_RENEGOTIATION_DISABLED”.

This function creates a Client using the workaround described at https://stackoverflow.com/a/71646353/ to allow connecting to this data source.

Warning

Using this workaround disables SSL configuration that is intended to mitigate against man-in-the-middle attacks as described in CVE-2009-3555. Use with caution: in particular, do not change the Source.url to use with data sources other than OECD_JSON.

class sdmx.source.oecd_json.HTTPSAdapter(ssl_context=None, **kwargs)[source]#

HTTPAdapter with custom SSLContext.
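The custom SSLContext used by such a workaround can be sketched with the standard library alone. This assumes the approach in the linked answer, which enables OpenSSL's legacy server connect option (numeric value 0x4); the function name legacy_ssl_context is hypothetical. The warning above applies equally to this sketch.

```python
import ssl


def legacy_ssl_context() -> ssl.SSLContext:
    """Return a default client context that tolerates legacy renegotiation."""
    ctx = ssl.create_default_context()
    # OP_LEGACY_SERVER_CONNECT is exposed by name in newer Python versions;
    # fall back to its numeric value 0x4 otherwise.
    ctx.options |= getattr(ssl, "OP_LEGACY_SERVER_CONNECT", 0x4)
    return ctx


ctx = legacy_ssl_context()
```

Certificate verification and host name checking remain enabled in this context. An HTTPAdapter subclass would then pass ctx to its connection pool.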

SGR: SDMX Global Registry#

SDMX-ML — Website

class sdmx.source.sgr.Source[source]#
handle_response(response, content)[source]#

SGR responses do not specify content-type; set it directly.

modify_request_args(kwargs)[source]#

SGR is a data source but not a data provider.

Override the agency argument by setting agency='all' to retrieve all data republished by SGR from different providers.

SPC: Pacific Data Hub DotStat by the Pacific Community (SPC)#

SDMX-ML — API documentation, Web interface

  • French name: Communauté du Pacifique

STAT_EE: Statistics Estonia (Estonia)#

SDMX-JSON — Website (et) — API documentation (en), (et)

  • Estonian name: Eesti Statistika.

  • As of 2023-05-19, the site displays a message:

    From March 2023 onwards, data in this database are no longer updated! Official statistics can be found in the database at andmed.stat.ee.

    The latter URL indicates an API is provided, but it is not an SDMX API, and thus not supported.

  • As of 2020-12-13, this web service (like NBB) uses server software that serves SDMX-JSON or SDMX-ML 2.0. The latter is not supported by sdmx (see SDMX standard versions 2.0, 2.1, and 3.0).

UNESCO: UN Educational, Scientific and Cultural Organization#

SDMX-ML — Website

  • Free registration required; user credentials must be provided either as a parameter or as an HTTP header with each request.

Warning

An issue with structure-specific datasets has been reported. It seems that Series are not recognized due to some oddity in the XML format.

UNICEF: UN Children’s Fund#

SDMX-ML or SDMX-JSON — API documentation, Web interface, Data browser

  • This source always returns structure-specific messages for SDMX-ML data queries, even when the HTTP header Accept: application/vnd.sdmx.genericdata+xml is given.

  • UNICEF also serves data for the Countdown to 2030 initiative under a data flow with the ID CONSOLIDATED. The structures can be obtained by giving the provider argument to a structure query, and then used to query the data:

    import sdmx

    client = sdmx.Client("UNICEF")

    # Use the dataflow ID to obtain the data structure definition
    dsd = client.dataflow("CONSOLIDATED", provider="CD2030").structure[0]

    # Use the DSD to construct a query for indicator D5 (“Births”)
    client.data("CONSOLIDATED", key=dict(INDICATOR="D5"), dsd=dsd)
    
  • The example query from the UNICEF API documentation (also used in the sdmx test suite) returns XML like:

    <mes:Structure structureID="UNICEF_GLOBAL_DATAFLOW_1_0" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=UNICEF:GLOBAL_DATAFLOW(1.0):ObsLevelDim:TIME_PERIOD" dimensionAtObservation="TIME_PERIOD">
      <com:StructureUsage>
        <Ref agencyID="UNICEF" id="GLOBAL_DATAFLOW" version="1.0"/>
      </com:StructureUsage>
    </mes:Structure>
    

    Contrary to this, the corresponding DSD actually has the ID DSD_AGGREGATE, not GLOBAL_DATAFLOW. To retrieve the DSD—which is necessary to parse a data message—first query this data flow by ID, and select the DSD from the returned message:

    In [1]: import sdmx
    
    In [2]: msg = sdmx.Client("UNICEF").dataflow("GLOBAL_DATAFLOW")
    
    In [3]: msg
    Out[3]: 
    <sdmx.StructureMessage>
      <Header>
        id: 'IREF315756'
        prepared: '2024-03-20T10:46:15+00:00'
        receiver: <Agency not_supplied>
        sender: <Agency UNICEF>
        source: 
        test: False
      response: <Response [200]>
      Codelist (45): CL_CONF_STATUS CL_SEX CL_UNIT_MULT CL_ADMIN_LEVEL CL_A...
      ConceptScheme (1): UNICEF_CONCEPTS
      ContentConstraint (1): CONSTR_SEX
      DataflowDefinition (1): GLOBAL_DATAFLOW
      AgencyScheme (3): SDMX:AGENCIES UNICEF:AGENCIES UNICEF:DATA_PROVIDERS
      ProvisionAgreement (1): GLOBAL_DATAFLOW_UNICEF_UNICEF
      DataStructureDefinition (1): DSD_AGGREGATE
    
    In [4]: dsd = msg.structure[0]
    

    The resulting object dsd can be passed as an argument to a Client.get() data query. See the sdmx test suite for an example.

UNSD: United Nations Statistics Division#

SDMX-ML — Website

  • Supports preview_data and series-key based key validation.

WB: World Bank Group “World Integrated Trade Solution”#

SDMX-ML — Website

WB_WDI: World Bank Group “World Development Indicators”#

SDMX-ML — Website

  • This web service also supports SDMX-JSON. To retrieve messages in this format, pass the HTTP Accept: header described on the service website.

Source API#

This module defines Source and some utility functions. For built-in subclasses of Source used to provide sdmx’s built-in support for certain data sources, see Data sources.

class sdmx.source.Source[source]#

SDMX-IM RESTDatasource.

This class describes the location and features supported by an SDMX REST API data source/web service.

It also provides three hooks, with default implementations. Subclasses may override the hooks in order to handle specific features of different REST web services:

handle_response(response, content)

Handle response content of unknown type.

finish_message(message, request, **kwargs)

Postprocess retrieved message.

modify_request_args(kwargs)

Modify arguments used to build query URL.

This class should not be instantiated directly. Instead, use add_source(), and then create a new Client with the corresponding source ID.

data_content_type: DataContentType = DataContentType.XML[source]#

DataContentType indicating the type of data returned by the source.

finish_message(message, request, **kwargs)[source]#

Postprocess retrieved message.

This hook is called by Client.get() after a Message object has been successfully parsed from the query response.

See estat.Source.finish_message() for an example implementation.

get_url_class() → Type[sdmx.rest.common.URL][source]#

Return a class for constructing URLs for this Source.

handle_response(response: Response, content: IOBase) → Tuple[Response, IOBase][source]#

Handle response content of unknown type.

This hook is called by Client.get() only when the content cannot be parsed as XML or JSON.

See estat.Source.handle_response() and sgr.Source.handle_response() for example implementations.

headers: Dict[str, Any][source]#

Additional HTTP headers to supply by default with all requests.

id: str[source]#

id of the DataProvider.

modify_request_args(kwargs)[source]#

Modify arguments used to build query URL.

This hook is called by Client.get() to modify the keyword arguments before the query URL is built.

The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept:’ header when a ‘dsd’ is supplied as one of the kwargs.

See sgr.Source.modify_request_args() for an example override.

Return type:

None

name: str[source]#

Human-readable name of the data source.

supports: Dict[str | Resource, bool][source]#

Mapping from Resource values to bool indicating support for SDMX-REST endpoints and features. If not supplied, the defaults from SDMX_ML_SUPPORTS are used.

Two additional keys are valid:

  • "preview"=True if the source supports ?detail=serieskeysonly. See preview_data().

  • "structure-specific data"=True if the source can return structure-specific data messages.

url: str[source]#

Base URL (API entry point) for queries.

versions: Set[Version][source]#

SDMX REST API version(s) supported. Default: Version["2.1"] only.

sdmx.source.list_sources()[source]#

Return a sorted list of valid source IDs.

These can be used to create Client instances.

sdmx.source.load_package_sources()[source]#

Discover all sources listed in sources.json.