Data sources

SDMX makes a distinction between data providers and sources:

  • a data provider is the original publisher of statistical information and metadata.

  • a data source is a specific web service that provides access to statistical information.

Each data source might aggregate and provide data or metadata from multiple data providers. Alternatively, an agency might operate a data source that contains only information it provides itself; in this case, the source and provider are identical.

sdmx identifies each data source using a string such as 'ABS', and has built-in support for a number of data sources. Use list_sources() to list these. Read the following sections, or the file sources.json in the package source code, for more details.

sdmx also supports adding other data sources; see add_source() and Source.

Data source limitations

Each SDMX web service provides a subset of the full SDMX feature set, so the same request made to two different sources may yield different results, or an error message.

A key difference is between sources offering SDMX-ML and SDMX-JSON APIs. SDMX-JSON APIs support only data queries, not metadata or structure queries.

Note

For JSON APIs, start by browsing the source’s website to find the dataflow you’re interested in. Then fine-tune the planned data request by providing a valid key (that is, a selection of series from the dataset). Because structure metadata is unavailable, sdmx cannot validate keys automatically.
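Since keys for JSON sources cannot be validated, it can help to assemble the key string by hand and compare it with what the source's website shows. The following is a minimal sketch, assuming the usual SDMX REST key syntax: dimension values separated by ".", alternative values joined by "+", and an empty position meaning "all values". The function make_key() and the dimension IDs are hypothetical and not part of sdmx.

```python
def make_key(dimension_order, selection):
    """Build an SDMX REST key string from a selection of dimension values.

    `dimension_order` lists the dimension IDs in the order defined by the
    source's data structure; for JSON sources it must be looked up manually.
    Both arguments are illustrative only.
    """
    unknown = set(selection) - set(dimension_order)
    if unknown:
        raise ValueError(f"unknown dimension(s): {sorted(unknown)}")

    parts = []
    for dim in dimension_order:
        values = selection.get(dim, [])
        # Treat a single string as one value, not as a sequence of characters
        if isinstance(values, str):
            values = [values]
        # "+" joins alternative values; an empty part selects all values
        parts.append("+".join(values))
    return ".".join(parts)


# Hypothetical dimension IDs, for illustration only
key = make_key(
    ["FREQ", "REF_AREA", "INDICATOR"],
    {"FREQ": "A", "REF_AREA": ["AU", "NZ"]},
)
print(key)
```

The resulting string still has to be checked against the source itself; a key that is well-formed may nevertheless select no series.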

In order to anticipate and handle these differences:

  1. add_source() accepts “data_content_type” and “supports” keys. For example:

    [
      {
        "id": "ABS",
        "data_content_type": "JSON"
      },
      {
        "id": "UNESCO",
        "supports": {"datastructure": false}
      }
    ]
    

    sdmx will raise NotImplementedError on an attempt to query the “datastructure” API endpoint of either of these data sources.

  2. sdmx.source includes adapters (subclasses of Source) with hooks used when querying sources and interpreting their HTTP responses. These are documented below: ABS, ESTAT, ILO, INSEE, and SGR.
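To illustrate how such per-source settings can translate into a NotImplementedError, here is a minimal sketch. It is not the sdmx implementation: the SourceInfo class and its check() method are hypothetical, and only the data_content_type and supports names are borrowed from the class signatures documented below.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SourceInfo:
    """Hypothetical stand-in for a data source entry; not sdmx's Source."""

    id: str
    data_content_type: str = "XML"
    supports: Dict[str, bool] = field(default_factory=dict)

    def check(self, resource: str) -> None:
        """Raise NotImplementedError if `resource` cannot be queried."""
        # Assumption: JSON sources offer only the "data" endpoint
        if self.data_content_type == "JSON" and resource != "data":
            raise NotImplementedError(
                f"{self.id} (SDMX-JSON) does not support {resource!r}"
            )
        # Assumption: endpoints are supported unless explicitly marked False
        if not self.supports.get(resource, True):
            raise NotImplementedError(f"{self.id} does not support {resource!r}")


abs_ = SourceInfo(id="ABS", data_content_type="JSON")
unesco = SourceInfo(id="UNESCO", supports={"datastructure": False})

for source in (abs_, unesco):
    try:
        source.check("datastructure")
    except NotImplementedError as exc:
        print(exc)

unesco.check("data")  # No exception: data queries remain available
```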

ABS: Australian Bureau of Statistics

SDMX-JSON — Website

class sdmx.source.abs.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})

handle_response(response, content)

Handle ABS’ own text/html error page for some endpoints.

ESTAT: Eurostat

SDMX-ML — Website

  • Thousands of dataflows on a wide range of topics.

  • No categorisations available.

  • Long response times are reported. Increase the timeout attribute to avoid timeout exceptions.

  • Does not return DSDs for dataflow requests with the references='all' query parameter.

class sdmx.source.estat.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})

Handle Eurostat’s mechanism for large datasets.

For some requests, ESTAT returns a DataMessage that has no content except for a <footer:Footer> element containing a URL where the data will be made available as a ZIP file.

To configure finish_message(), pass its get_footer_url argument to Client.get().

New in version 0.2.1.

finish_message(message, request, get_footer_url=(30, 3), **kwargs)

Handle the initial response.

This hook identifies the URL in the footer of the initial response, makes a second request (polling as indicated by get_footer_url), and returns a new DataMessage with the parsed content.

Parameters

get_footer_url ((int, int)) – Tuple of the form (seconds, attempts), controlling the interval between attempts to retrieve the data from the URL, and the maximum number of attempts to make.
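The (seconds, attempts) tuple describes a simple polling loop. A minimal sketch of that idea follows. It is not the code used by sdmx: poll() and its fetch and sleep arguments are hypothetical, and waiting only between attempts (not before the first one) is an assumption based on the description above.

```python
import time


def poll(fetch, get_footer_url=(30, 3), sleep=time.sleep):
    """Call `fetch` until it returns data or the attempts are used up.

    `fetch` is a hypothetical callable returning the data, or None while
    the file is not yet available.
    """
    seconds, attempts = get_footer_url
    for attempt in range(attempts):
        if attempt > 0:
            sleep(seconds)  # Wait between attempts, not before the first one
        result = fetch()
        if result is not None:
            return result
    raise RuntimeError(f"data not available after {attempts} attempts")


# Simulate a file that becomes available on the third attempt; the waits are
# recorded instead of actually sleeping
responses = iter([None, None, b"<xml/>"])
waits = []
data = poll(lambda: next(responses), get_footer_url=(30, 3), sleep=waits.append)
print(data, waits)
```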

handle_response(response, content)

Handle the polled response.

The request for the indicated ZIP file URL returns an octet-stream; this handler saves it, opens it, and returns the content of the single contained XML file.
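Reading the single XML file out of a ZIP archive can be sketched with the standard library alone. This example works in memory rather than saving the archive to disk as the handler does, and extract_single_xml() is a hypothetical helper, not part of sdmx.

```python
import io
import zipfile


def extract_single_xml(content: bytes) -> bytes:
    """Return the bytes of the only file inside a ZIP archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError(f"expected exactly 1 file, found {len(names)}")
        return archive.read(names[0])


# Build a small archive in memory to stand in for the downloaded octet-stream
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.xml", "<message/>")

xml = extract_single_xml(buffer.getvalue())
print(xml)
```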

modify_request_args(kwargs)

Modify arguments used to build query URL.

This hook is called by Client.get() to modify the keyword arguments before the query URL is built.

The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept’ header when a ‘dsd’ is supplied as one of the kwargs.

See sgr.Source.modify_request_args() for an example override.

Return type

None
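The behaviour of the default hook can be pictured roughly as follows. This is a sketch under stated assumptions, not the library's code: the function name, the "headers" key, and the exact media type string (taken from the SDMX 2.1 REST conventions) are assumptions; only the "dsd" keyword is named in the description above.

```python
def add_structure_specific_header(kwargs: dict) -> None:
    """Add an Accept header for structure-specific data if a DSD is given.

    Modifies `kwargs` in place and returns None, like the hook described
    above. The "headers" key is an assumed name used for illustration.
    """
    if kwargs.get("dsd") is None:
        return
    headers = kwargs.setdefault("headers", {})
    # Do not overwrite a header that the caller set explicitly
    headers.setdefault(
        "Accept", "application/vnd.sdmx.structurespecificdata+xml;version=2.1"
    )


args = {"resource_type": "data", "dsd": object()}
add_structure_specific_header(args)
print(args["headers"])
```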

ECB: European Central Bank

SDMX-ML — Website

  • Supports categorisations of dataflows.

  • Supports preview_data and series-key based key validation.

  • Response times are generally short.

ILO: International Labour Organization

SDMX-ML — Website

  • sdmx.source.ilo.Source handles some particularities of the ILO web service. Others that are not handled:

    • Data flow IDs take on the role of a filter. E.g., there are dataflows for individual countries, ages, sexes etc. rather than merely for different indicators.

    • The service returns 413 Payload Too Large errors for some queries, with messages like: “Too many results, please specify codelist ID”. Test for sdmx.exceptions.HTTPError (= requests.exceptions.HTTPError) and/or specify a resource_id.

  • It is highly recommended to read the API guide.

class sdmx.source.ilo.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})

modify_request_args(kwargs)

Handle two limitations of ILO’s REST service.

  1. Service returns SDMX-ML 2.0 by default, whereas sdmx only supports SDMX-ML 2.1. Set ?format=generic_2_1 query parameter.

  2. The service does not support values ‘parents’, ‘parentsandsiblings’ (the default), and ‘all’ for the references query parameter. Override the default with ?references=none.

    Note

    Valid values are: none, parents, parentsandsiblings, children, descendants, all, or a specific structure reference such as ‘codelist’.
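The two adjustments amount to filling in default query parameters. A minimal sketch, assuming that the query parameters travel in a "params" entry of the keyword arguments; that key and the function name apply_ilo_defaults() are hypothetical, and the actual hook may differ in detail.

```python
def apply_ilo_defaults(kwargs: dict) -> None:
    """Fill in query parameters that work around the two ILO limitations."""
    params = kwargs.setdefault("params", {})
    # 1. Ask for SDMX-ML 2.1 instead of the service's default, SDMX-ML 2.0
    params.setdefault("format", "generic_2_1")
    # 2. Replace the unsupported default value, "parentsandsiblings"
    params.setdefault("references", "none")


args = {"resource_type": "codelist"}
apply_ilo_defaults(args)
print(args["params"])

# A value given explicitly by the caller is left untouched
explicit = {"params": {"references": "children"}}
apply_ilo_defaults(explicit)
print(explicit["params"])
```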

IMF: International Monetary Fund’s “SDMX Central” source

SDMX-ML — Website

  • Subset of the data available on http://data.imf.org.

  • Supports series-key-only and hence dataset-based key validation and construction.

INEGI: National Institute of Statistics and Geography (Mexico)

SDMX-ML — Website.

  • Spanish name: Instituto Nacional de Estadística y Geografía.

INSEE: National Institute of Statistics and Economic Studies (France)

SDMX-ML — Website

  • French name: Institut national de la statistique et des études économiques.

class sdmx.source.insee.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})

modify_request_args(kwargs)

Supply explicit provider agency ID for INSEE.

This web service accepts either “ALL” or “FR1” as a provider agency ID for structure endpoints, but not “INSEE” (see #21).

This hook sets the provider to “ALL” for structure queries if it is not given explicitly.

ISTAT: National Institute of Statistics (Italy)

SDMX-ML — Website

  • Italian name: Istituto Nazionale di Statistica.

  • Similar server platform to Eurostat, with similar capabilities.

LSD: National Institute of Statistics (Lithuania)

SDMX-ML — Website

  • Lithuanian name: Lietuvos statistikos departamentas.

  • This web service returns the non-standard HTTP content-type “application/force-download”; sdmx replaces it with “application/xml”.
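The content-type replacement can be pictured as a small header rewrite. The sketch below uses a plain dict and a hypothetical helper, fix_content_type(); it illustrates the idea only and is not how sdmx stores or processes response headers.

```python
def fix_content_type(headers: dict) -> dict:
    """Return a copy of `headers` with the non-standard content-type replaced."""
    fixed = {}
    for name, value in headers.items():
        # HTTP header names are case-insensitive, so compare in lower case
        is_content_type = name.lower() == "content-type"
        if is_content_type and value.strip().lower() == "application/force-download":
            value = "application/xml"
        fixed[name] = value
    return fixed


original = {"Content-Type": "application/force-download", "Content-Length": "1024"}
print(fix_content_type(original))
```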

NB: Norges Bank (Norway)

SDMX-ML — Website

  • Offers only a few dataflows, so there is no need to use categoryscheme.

  • It is unknown whether NB supports series-keys-only.

NBB: National Bank of Belgium (Belgium)

SDMX-JSON — Website — API documentation (en)

  • French name: Banque Nationale de Belgique.

  • Dutch name: Nationale Bank van België.

  • As of 2020-12-13, this web service (like STAT_EE) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.

  • The web service returns a custom HTML error page rather than an SDMX error message for certain queries or an internal error. This appears as: ValueError: can't determine a SDMX reader for response content type 'text/html; charset=utf-8'

OECD: Organisation for Economic Co-operation and Development

SDMX-JSON — Website

SGR: SDMX Global Registry

SDMX-ML — Website

class sdmx.source.sgr.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})

handle_response(response, content)

SGR responses do not specify content-type; set it directly.

modify_request_args(kwargs)

SGR is a data source but not a data provider.

Override the agency argument by setting agency='all' to retrieve all data republished by SGR from different providers.

SPC: Pacific Data Hub DotStat by the Pacific Community (SPC)

SDMX-ML — API documentation — Web interface

  • French name: Communauté du Pacifique.

STAT_EE: Statistics Estonia (Estonia)

SDMX-JSON — Website (et) — API documentation (en), (et)

  • Estonian name: Eesti Statistika.

  • As of 2020-12-13, this web service (like NBB) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.

UNSD: United Nations Statistics Division

SDMX-ML — Website

  • Supports preview_data and series-key based key validation.

UNESCO: UN Educational, Scientific and Cultural Organization

SDMX-ML — Website

  • Free registration is required; user credentials must be provided with each request, either as a parameter or as an HTTP header.

Warning

An issue with structure-specific datasets has been reported. It seems that Series are not recognized due to some oddity in the XML format.

UNICEF: UN Children’s Fund

SDMX-ML or SDMX-JSON — API documentation — Web interface

  • This source always returns structure-specific messages for SDMX-ML data queries, even when the HTTP header Accept: application/vnd.sdmx.genericdata+xml is given.

  • The example query from the UNICEF API documentation (also used in the sdmx test suite) returns XML like:

    <mes:Structure structureID="UNICEF_GLOBAL_DATAFLOW_1_0" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=UNICEF:GLOBAL_DATAFLOW(1.0):ObsLevelDim:TIME_PERIOD" dimensionAtObservation="TIME_PERIOD">
      <com:StructureUsage>
        <Ref agencyID="UNICEF" id="GLOBAL_DATAFLOW" version="1.0"/>
      </com:StructureUsage>
    </mes:Structure>
    

    The corresponding DSD actually has the ID DSD_AGGREGATE, which is not obvious from the message. To retrieve the DSD—which is necessary to parse a data message—first query this data flow by ID, and select the DSD from the returned message:

    In [1]: import sdmx
    
    In [2]: msg = sdmx.Client("UNICEF").dataflow("GLOBAL_DATAFLOW")
    
    In [3]: msg
    Out[3]: 
    <sdmx.StructureMessage>
      <Header>
        id: 'IREF531743'
        prepared: '2021-02-27T19:12:54+00:00'
        receiver: <Agency not_supplied>
        sender: <Agency UNICEF>
        source: 
        test: False
      response: <Response [200]>
      Codelist (29): CL_CONF_STATUS CL_SEX CL_UNIT_MULT CL_ADMIN_LEVEL CL_A...
      ConceptScheme (1): UNICEF_CONCEPTS
      ContentConstraint (1): CONSTR_SEX
      DataflowDefinition (1): GLOBAL_DATAFLOW
      DataStructureDefinition (1): DSD_AGGREGATE
      AgencyScheme (2): AGENCIES DATA_PROVIDERS
      ProvisionAgreement (1): GLOBAL_DATAFLOW_UNICEF_UNICEF
    
    In [4]: dsd = msg.structure[0]
    

    The resulting object dsd can be passed as an argument to a Client.get() data query. See the sdmx test suite for an example.

WB: World Bank Group “World Integrated Trade Solution”

SDMX-ML — Website

WB_WDI: World Bank Group “World Development Indicators”

SDMX-ML — Website

  • This web service also supports SDMX-JSON. To retrieve messages in this format, pass the HTTP Accept: header described on the service website.