Data sources¶
SDMX makes a distinction between data providers and sources:
A data provider is the original publisher of statistical information and metadata.
A data source is a specific web service that provides access to statistical information.
Each data source might aggregate and provide data or metadata from multiple data providers. Or, an agency might operate a data source that only contains information they provide themselves; in this case, the source and provider are identical.
sdmx identifies each data source using a string such as 'ABS', and has built-in support for a number of data sources.
Use list_sources() to list these.
Read the following sections, or the file sources.json in the package source code, for more details.
sdmx also supports adding other data sources; see add_source() and Source.
Data source limitations¶
Each SDMX web service provides a subset of the full SDMX feature set, so the same request made to two different sources may yield different results, or an error message.
A key difference is between sources offering SDMX-ML and SDMX-JSON APIs. SDMX-JSON APIs support only data queries, not metadata or structure queries.
Note
For JSON APIs, start by browsing the source’s website to retrieve the dataflow you’re interested in. Then try to fine-tune a planned data request by providing a valid key (= selection of series from the dataset).
Because structure metadata is unavailable, sdmx cannot automatically validate keys.
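Since keys for JSON sources must be assembled by hand, it can help to see how a selection maps onto the SDMX REST key syntax: one position per dimension, separated by ".", with "+" joining alternative values and an empty position meaning "all values". The following is only a sketch; build_key is a hypothetical helper that is not part of sdmx, and the dimension names and order are invented and would have to be read off the source's website.

```python
from typing import Mapping, Sequence, Union

Selection = Mapping[str, Union[str, Sequence[str]]]


def build_key(dimension_order: Sequence[str], selection: Selection) -> str:
    """Build an SDMX REST key string from a selection of dimension values."""
    unknown = set(selection) - set(dimension_order)
    if unknown:
        raise KeyError(f"not in dimension_order: {sorted(unknown)}")

    parts = []
    for dim in dimension_order:
        value = selection.get(dim, "")  # empty position = all values
        if isinstance(value, str):
            parts.append(value)
        else:
            parts.append("+".join(value))  # '+' joins alternative values
    return ".".join(parts)


# Hypothetical dimension order; a real one comes from the source's website
order = ["FREQ", "REF_AREA", "INDICATOR"]
print(build_key(order, {"FREQ": "A", "REF_AREA": ["DE", "FR"]}))
```

Nothing here checks that the values are valid codes; with a JSON source, an invalid key only shows up as an error or an empty response from the server.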
In order to anticipate and handle these differences:
add_source() accepts “data_content_type” and “supported” keys. For example:
    [
      { "id": "ABS", "data_content_type": "JSON" },
      { "id": "UNESCO", "unsupported": ["datastructure"] },
    ]
sdmx will raise NotImplementedError on an attempt to query the “datastructure” API endpoint of either of these data sources.
sdmx.source includes adapters (subclasses of Source) with hooks used when querying sources and interpreting their HTTP responses. These are documented below: ABS, ESTAT, and SGR.
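The effect of these two keys can be pictured with a small stand-alone model. This is a sketch only: SketchSource, from_info() and check() are hypothetical names and not how sdmx implements the check. It also assumes, following the text above, that a JSON source answers data queries only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SketchSource:
    """Hypothetical stand-in for a data source entry; not sdmx.source.Source."""

    id: str
    data_content_type: str = "XML"
    supports: Dict[str, bool] = field(default_factory=dict)

    @classmethod
    def from_info(cls, info: dict) -> "SketchSource":
        # Every endpoint listed under "unsupported" is marked as unavailable
        supports = {name: False for name in info.get("unsupported", [])}
        return cls(info["id"], info.get("data_content_type", "XML"), supports)

    def check(self, endpoint: str) -> None:
        """Raise NotImplementedError if the endpoint cannot be queried."""
        # Assumption: JSON sources only answer data queries
        json_only = self.data_content_type == "JSON" and endpoint != "data"
        if json_only or not self.supports.get(endpoint, True):
            raise NotImplementedError(f"{self.id} does not support {endpoint!r}")


CONFIG = [
    {"id": "ABS", "data_content_type": "JSON"},
    {"id": "UNESCO", "unsupported": ["datastructure"]},
]

for source in (SketchSource.from_info(info) for info in CONFIG):
    source.check("data")  # data queries pass for both entries
    try:
        source.check("datastructure")
    except NotImplementedError as exc:
        print(exc)
```

Failing before any HTTP request is made gives a clearer error than whatever a server returns for an endpoint it does not implement.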
ABS: Australian Bureau of Statistics¶
SDMX-JSON — Website
BBK: German Federal Bank¶
New in version 2.5.0.
German name: Deutsche Bundesbank
The web service has some non-standard behaviour; see #82.
The version path component is not supported for non-data endpoints. sdmx discards other values with a warning.
Some endpoints, including codelist, return malformed URNs and cannot be handled with sdmx.
- class sdmx.source.bbk.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})[source]¶
Work around non-standard behaviour of the BBK: German Federal Bank web service.
- modify_request_args(kwargs)[source]¶
Modify arguments used to build query URL.
This hook is called by Client.get() to modify the keyword arguments before the query URL is built.
The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept:’ header when a ‘dsd’ is supplied as one of the kwargs.
See sgr.Source.modify_request_args() for an example override.
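A default hook of this kind could look roughly like the following. This is a sketch, not the library's code. It assumes that request headers travel in a "headers" entry of kwargs, and it uses the SDMX 2.1 REST media type for structure-specific data.

```python
from typing import Any, Dict

# SDMX 2.1 REST media type for structure-specific data messages
STRUCTURE_SPECIFIC = "application/vnd.sdmx.structurespecificdata+xml;version=2.1"


def modify_request_args(kwargs: Dict[str, Any]) -> None:
    """Ask for structure-specific data when the caller supplies a DSD."""
    if kwargs.get("dsd") is None:
        return  # no DSD: leave the request untouched
    # Assumption: headers are carried in kwargs["headers"]
    headers = kwargs.setdefault("headers", {})
    # Keep any Accept header the caller set explicitly
    headers.setdefault("Accept", STRUCTURE_SPECIFIC)


request = {"resource_id": "EXAMPLE", "dsd": object()}  # placeholder DSD
modify_request_args(request)
print(request["headers"])
```

The hook changes kwargs in place, so a subclass can call the default first and then apply its own adjustments.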
BIS: Bank for International Settlements¶
SDMX-ML — Website — API reference
New in version 2.5.0.
ESTAT: Eurostat¶
SDMX-ML — Website
Thousands of dataflows on a wide range of topics.
No categorisations available.
Long response times are reported. Increase the timeout attribute to avoid timeout exceptions.
Does not return DSDs for dataflow requests with the references='all' query parameter.
- class sdmx.source.estat.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})[source]¶
Handle Eurostat’s mechanism for large datasets.
For some requests, ESTAT returns a DataMessage that has no content except for a <footer:Footer> element containing a URL where the data will be made available as a ZIP file.
To configure finish_message(), pass its get_footer_url argument to Client.get().
New in version 0.2.1.
- finish_message(message, request, get_footer_url=(30, 3), **kwargs)[source]¶
Handle the initial response.
This hook identifies the URL in the footer of the initial response, makes a second request (polling as indicated by get_footer_url), and returns a new DataMessage with the parsed content.
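The polling step can be sketched as a small loop. This assumes that the two values of get_footer_url mean (seconds to wait, number of attempts), and that the wait comes before each attempt; neither is stated above. poll() and its fetch and sleep arguments are hypothetical and not part of sdmx.

```python
import time
from typing import Callable, Optional, Tuple


def poll(
    fetch: Callable[[], Optional[bytes]],
    get_footer_url: Tuple[int, int] = (30, 3),
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it returns content, waiting before each attempt."""
    wait_seconds, attempts = get_footer_url  # assumed meaning of the tuple
    for _ in range(attempts):
        sleep(wait_seconds)  # give the server time to prepare the file
        content = fetch()
        if content is not None:
            return content
    raise RuntimeError(f"no content after {attempts} attempts")


# Demonstration with a fake download that succeeds on the second attempt
responses = iter([None, b"zip bytes"])
result = poll(lambda: next(responses), (30, 3), sleep=lambda seconds: None)
print(result)
```

Passing sleep as an argument lets the loop be exercised without actually waiting.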
- handle_response(response, content)[source]¶
Handle the polled response.
The request for the indicated ZIP file URL returns an octet-stream; this handler saves it, opens it, and returns the content of the single contained XML file.
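Reading the single XML file out of such an archive needs only the standard library. The sketch below works in memory, whereas the handler described above saves the file first. single_xml_from_zip() is a hypothetical name.

```python
import io
import zipfile


def single_xml_from_zip(payload: bytes) -> bytes:
    """Return the content of the only member of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        names = archive.namelist()
        if len(names) != 1:
            raise ValueError(f"expected 1 file in archive, found {len(names)}")
        return archive.read(names[0])


# Build a small archive in memory to stand in for the downloaded octet-stream
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.xml", "<mes:GenericData/>")

print(single_xml_from_zip(buffer.getvalue()))
```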
- modify_request_args(kwargs)[source]¶
Modify arguments used to build query URL.
This hook is called by Client.get() to modify the keyword arguments before the query URL is built.
The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept:’ header when a ‘dsd’ is supplied as one of the kwargs.
See sgr.Source.modify_request_args() for an example override.
ECB: European Central Bank¶
SDMX-ML — Website
Supports categorisations of data-flows.
Supports preview_data and series-key based key validation.
Response times are generally short.
ILO: International Labour Organization¶
SDMX-ML — Website
sdmx.source.ilo.Source handles some particularities of the ILO web service. Others that are not handled:
Data flow IDs take on the role of a filter. E.g., there are dataflows for individual countries, ages, sexes etc. rather than merely for different indicators.
The service returns 413 Payload Too Large errors for some queries, with messages like: “Too many results, please specify codelist ID”. Test for sdmx.exceptions.HTTPError (= requests.exceptions.HTTPError) and/or specify a resource_id.
It is highly recommended to read the API guide.
- class sdmx.source.ilo.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})[source]¶
- modify_request_args(kwargs)[source]¶
Handle two limitations of ILO’s REST service.
The service returns SDMX-ML 2.0 by default, whereas sdmx only supports SDMX-ML 2.1. Set the ?format=generic_2_1 query parameter.
The service does not support the values ‘parents’, ‘parentsandsiblings’ (the default), and ‘all’ for the references query parameter. Override the default with ?references=none.
Note
Valid values are: none, parents, parentsandsiblings, children, descendants, all, or a specific structure reference such as ‘codelist’.
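The two adjustments amount to filling in default query parameters. This is a sketch and not the code of sdmx.source.ilo.Source. It assumes that query parameters are carried in a "params" entry of kwargs, and it leaves any value the caller gives explicitly untouched.

```python
from typing import Any, Dict


def modify_request_args(kwargs: Dict[str, Any]) -> None:
    """Fill in query parameters the ILO service needs."""
    # Assumption: query parameters live in kwargs["params"]
    params = kwargs.setdefault("params", {})
    # Ask for SDMX-ML 2.1 instead of the service default, 2.0
    params.setdefault("format", "generic_2_1")
    # Replace the unsupported default, 'parentsandsiblings'
    params.setdefault("references", "none")


request: Dict[str, Any] = {"resource_id": "EXAMPLE"}
modify_request_args(request)
print(request["params"])
```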
IMF: International Monetary Fund’s “SDMX Central” source¶
SDMX-ML — Website
Subset of the data available on http://data.imf.org.
Supports series-key-only and hence dataset-based key validation and construction.
INEGI: National Institute of Statistics and Geography (Mexico)¶
SDMX-ML — Website.
Spanish name: Instituto Nacional de Estadística y Geografía.
INSEE: National Institute of Statistics and Economic Studies (France)¶
SDMX-ML — Website
French name: Institut national de la statistique et des études économiques.
ISTAT: National Institute of Statistics (Italy)¶
SDMX-ML — Website
Italian name: Istituto Nazionale di Statistica.
Similar server platform to Eurostat, with similar capabilities.
LSD: National Institute of Statistics (Lithuania)¶
SDMX-ML — Website
Lithuanian name: Lietuvos statistikos.
This web service returns the non-standard HTTP content-type “application/force-download”; sdmx replaces it with “application/xml”.
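Such a replacement can be pictured as a small function over the response headers. This is a sketch: fix_content_type() is a hypothetical helper, and it works on a plain dict rather than a real HTTP response object.

```python
from typing import Dict

NON_STANDARD = "application/force-download"


def fix_content_type(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of headers with the non-standard content type replaced."""
    fixed = dict(headers)
    for name, value in headers.items():
        # Header names are case-insensitive; the value may carry parameters
        if name.lower() == "content-type" and value.startswith(NON_STANDARD):
            fixed[name] = "application/xml"
    return fixed


print(fix_content_type({"Content-Type": "application/force-download"}))
```

With the content type corrected, the usual selection of a reader by content type can proceed as for any other SDMX-ML response.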
NB: Norges Bank (Norway)¶
SDMX-ML — Website
Few dataflows, so do not use categoryscheme.
It is unknown whether NB supports series-keys-only.
NBB: National Bank of Belgium (Belgium)¶
SDMX-JSON — Website — API documentation (en)
French name: Banque Nationale de Belgique.
Dutch name: Nationale Bank van België.
As of 2020-12-13, this web service (like STAT_EE) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.
The web service returns a custom HTML error page rather than an SDMX error message for certain queries or an internal error. This appears as:
ValueError: can't determine a SDMX reader for response content type 'text/html; charset=utf-8'
OECD: Organisation for Economic Cooperation and Development¶
SDMX-JSON — Website
SGR: SDMX Global Registry¶
SDMX-ML — Website
- class sdmx.source.sgr.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})[source]¶
SPC: Pacific Data Hub DotStat by the Pacific Community (SPC)¶
SDMX-ML — API documentation — Web interface
French name: Communauté du Pacifique
STAT_EE: Statistics Estonia (Estonia)¶
SDMX-JSON — Website (et) — API documentation (en), (et)
Estonian name: Eesti Statistika.
As of 2020-12-13, this web service (like NBB) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.
UNSD: United Nations Statistics Division¶
SDMX-ML — Website
Supports preview_data and series-key based key validation.
UNESCO: UN Educational, Scientific and Cultural Organization¶
SDMX-ML — Website
Free registration is required; user credentials must be provided with each request, either as a parameter or as an HTTP header.
Warning
An issue with structure-specific datasets has been reported. It seems that Series are not recognized due to some oddity in the XML format.
UNICEF: UN Children’s Fund¶
SDMX-ML or SDMX-JSON — API documentation — Web interface — Data browser
This source always returns structure-specific messages for SDMX-ML data queries, even when the HTTP header Accept: application/vnd.sdmx.genericdata+xml is given.
UNICEF also serves data for the Countdown to 2030 initiative under a data flow with the ID CONSOLIDATED. The structures can be obtained by giving the provider argument to a structure query, and then used to query the data:

    import sdmx

    UNICEF = sdmx.Client("UNICEF")

    # Use the dataflow ID to obtain the data structure definition
    dsd = UNICEF.dataflow("CONSOLIDATED", provider="CD2030").structure[0]

    # Use the DSD to construct a query for indicator D5 (“Births”)
    UNICEF.data("CONSOLIDATED", key=dict(INDICATOR="D5"), dsd=dsd)
The example query from the UNICEF API documentation (also used in the sdmx test suite) returns XML like:

    <mes:Structure
        structureID="UNICEF_GLOBAL_DATAFLOW_1_0"
        namespace="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=UNICEF:GLOBAL_DATAFLOW(1.0):ObsLevelDim:TIME_PERIOD"
        dimensionAtObservation="TIME_PERIOD">
      <com:StructureUsage>
        <Ref agencyID="UNICEF" id="GLOBAL_DATAFLOW" version="1.0"/>
      </com:StructureUsage>
    </mes:Structure>
Contrary to this, the corresponding DSD actually has the ID DSD_AGGREGATE, not GLOBAL_DATAFLOW. To retrieve the DSD, which is necessary to parse a data message, first query this data flow by ID, and select the DSD from the returned message:

    In [1]: import sdmx

    In [2]: msg = sdmx.Client("UNICEF").dataflow("GLOBAL_DATAFLOW")

    In [3]: msg
    Out[3]:
    <sdmx.StructureMessage>
      <Header>
        id: 'IREF009764'
        prepared: '2021-06-27T13:36:19+00:00'
        receiver: <Agency not_supplied>
        sender: <Agency UNICEF>
        source:
        test: False
      response: <Response [200]>
      Codelist (42): CL_CONF_STATUS CL_SEX CL_UNIT_MULT CL_ADMIN_LEVEL CL_A...
      ConceptScheme (1): UNICEF_CONCEPTS
      ContentConstraint (2): CONSTR_SEX UNICEF_USED_INDICATORS
      DataflowDefinition (1): GLOBAL_DATAFLOW
      DataStructureDefinition (1): DSD_AGGREGATE
      AgencyScheme (2): AGENCIES DATA_PROVIDERS
      ProvisionAgreement (1): GLOBAL_DATAFLOW_UNICEF_UNICEF

    In [4]: dsd = msg.structure[0]

The resulting object dsd can be passed as an argument to a Client.get() data query. See the sdmx test suite for an example.
WB: World Bank Group “World Integrated Trade Solution”¶
SDMX-ML — Website
WB_WDI: World Bank Group “World Development Indicators”¶
SDMX-ML — Website
This web service also supports SDMX-JSON. To retrieve messages in this format, pass the HTTP Accept: header described on the service website.