Data sources
SDMX makes a distinction between data providers and sources:
a data provider is the original publisher of statistical information and metadata.
a data source is a specific web service that provides access to statistical information.
A data source may aggregate and provide data or metadata from multiple data providers. Alternatively, an agency may operate a data source that contains only information it provides itself; in this case, the source and provider are identical.
sdmx identifies each data source using a string such as 'ABS', and has built-in support for a number of data sources. Use list_sources() to list these. Read the following sections, or the file sources.json in the package source code, for more details.

sdmx also supports adding other data sources; see add_source() and Source.
Data source limitations
Each SDMX web service provides a subset of the full SDMX feature set, so the same request made to two different sources may yield different results, or an error message.
A key difference is between sources offering SDMX-ML and SDMX-JSON APIs: SDMX-JSON APIs support only data queries, not metadata or structure queries.
Note
For JSON APIs, start by browsing the source’s website to find the dataflow you are interested in. Then refine the planned data request by providing a valid key (a selection of series from the dataset). Because structure metadata is unavailable, sdmx cannot automatically validate keys.
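Since the key for a JSON source has to be composed by hand, a small helper can make the SDMX REST key syntax explicit: dimensions separated by ".", alternative codes joined with "+", and an empty position as a wildcard. This is a sketch only; make_key() is a hypothetical helper (not part of sdmx), and the dimension order and codes below are made up, assumed to have been read off the source’s website.

```python
from typing import Sequence, Union

# One entry per dimension, in DSD order: a code, a list of codes, or None for "any".
DimValue = Union[str, Sequence[str], None]


def make_key(values: Sequence[DimValue]) -> str:
    """Join per-dimension selections into an SDMX REST key string.

    Dimensions are separated by ".", alternative codes are joined with "+",
    and an empty position (None or an empty list) acts as a wildcard.
    """
    parts = []
    for value in values:
        if value is None:
            parts.append("")
        elif isinstance(value, str):
            parts.append(value)
        else:
            parts.append("+".join(value))
    return ".".join(parts)


# Hypothetical 4-dimension dataset: frequency, region, (any), measure.
print(make_key(["A", ["AUS", "NZL"], None, "1"]))  # A.AUS+NZL..1
```

The resulting string can then be passed as the key argument of Client.get(); because no DSD is available, a wrong dimension count or an unknown code will only show up as an error from the web service.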
In order to anticipate and handle these differences:

1. add_source() accepts “data_content_type” and “supports” keys. For example:

    [
      {"id": "ABS", "data_content_type": "JSON"},
      {"id": "UNESCO", "supports": {"datastructure": false}}
    ]

2. sdmx will raise NotImplementedError on an attempt to query the “datastructure” API endpoint of either of these data sources.

3. sdmx.source includes adapters (subclasses of Source) with hooks used when querying sources and interpreting their HTTP responses. These are documented below: ABS, ESTAT, and SGR.
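The first two mechanisms can be modelled with the standard library alone: per-source flags decide which endpoints may be queried, and a disallowed query fails with NotImplementedError before any HTTP request is made. This is a simplified model, not sdmx’s internal code; is_supported() and query() are hypothetical helpers. Treating a JSON source as data-only by default follows the statement in this section that SDMX-JSON APIs support only data queries.

```python
import json

# Configuration in the style of sources.json (required url/name keys omitted).
CONFIG = json.loads(
    """
    [
      {"id": "ABS", "data_content_type": "JSON"},
      {"id": "UNESCO", "supports": {"datastructure": false}}
    ]
    """
)
SOURCES = {info["id"]: info for info in CONFIG}


def is_supported(info: dict, endpoint: str) -> bool:
    """Simplified model of how per-source flags gate API endpoints."""
    explicit = info.get("supports", {})
    if endpoint in explicit:
        # An explicit flag always wins.
        return explicit[endpoint]
    # SDMX-JSON sources only answer data queries.
    if info.get("data_content_type", "XML") == "JSON":
        return endpoint == "data"
    return True


def query(source_id: str, endpoint: str) -> str:
    """Refuse unsupported endpoints up front; no real request is sent."""
    info = SOURCES[source_id]
    if not is_supported(info, endpoint):
        raise NotImplementedError(
            f"{source_id} does not support the {endpoint!r} API endpoint"
        )
    return f"would query {source_id}/{endpoint}"  # placeholder for an HTTP request


for sid in SOURCES:
    try:
        query(sid, "datastructure")
    except NotImplementedError as exc:
        print(exc)
```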
ABS: Australian Bureau of Statistics
SDMX-JSON — Website
ESTAT: Eurostat
SDMX-ML — Website
Thousands of dataflows on a wide range of topics.
No categorisations available.
Long response times are reported. Increase the timeout attribute to avoid timeout exceptions.
Does not return DSDs for dataflow requests with the references='all' query parameter.
class sdmx.source.estat.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})
Handle Eurostat’s mechanism for large datasets.
For some requests, ESTAT returns a DataMessage that has no content except for a <footer:Footer> element containing a URL where the data will be made available as a ZIP file.
To configure finish_message(), pass its get_footer_url argument to Client.get().
New in version 0.2.1.
finish_message(message, request, get_footer_url=(30, 3), **kwargs)
Handle the initial response.
This hook identifies the URL in the footer of the initial response, makes a second request (polling as indicated by get_footer_url), and returns a new DataMessage with the parsed content.
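The polling step can be sketched with the standard library. This assumes the get_footer_url tuple means (seconds between attempts, maximum number of attempts); that reading, the poll() helper and the fetch() callable standing in for the HTTP request to the footer URL are all assumptions for illustration, not sdmx’s code.

```python
import time
from typing import Callable, Optional


def poll(fetch: Callable[[], Optional[bytes]], seconds: float, attempts: int) -> bytes:
    """Call fetch() up to `attempts` times, sleeping `seconds` between tries.

    fetch() is a hypothetical callable returning the file content once the
    ZIP file is ready at the footer URL, or None while it is being prepared.
    """
    for i in range(attempts):
        content = fetch()
        if content is not None:
            return content
        if i < attempts - 1:
            time.sleep(seconds)
    raise RuntimeError(f"data not available after {attempts} attempts")


# Simulated server: the file becomes available on the third request.
responses = iter([None, None, b"PK"])
print(poll(lambda: next(responses), seconds=0, attempts=3))
```

With sdmx itself there is no need for such a loop; the tuple is simply passed through Client.get() as described above.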
handle_response(response, content)
Handle the polled response.
The request for the indicated ZIP file URL returns an octet-stream; this handler saves it, opens it, and returns the content of the single contained XML file.
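Extracting the single XML file from such a ZIP payload can be sketched with the standard zipfile module. This version works in memory, whereas the hook described above saves the response first; single_xml_from_zip() and the member name "data.xml" are hypothetical.

```python
import io
import zipfile


def single_xml_from_zip(data: bytes) -> bytes:
    """Return the content of the only .xml member of a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = [n for n in zf.namelist() if n.lower().endswith(".xml")]
        if len(names) != 1:
            raise ValueError(f"expected exactly one XML file, found {names}")
        return zf.read(names[0])


# Build a small archive in memory to stand in for the octet-stream response.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.xml", "<mes:GenericData/>")

print(single_xml_from_zip(buf.getvalue()))  # b'<mes:GenericData/>'
```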
modify_request_args(kwargs)
Modify arguments used to build query URL.
This hook is called by Client.get() to modify the keyword arguments before the query URL is built.
The default implementation handles requests for ‘structure-specific data’ by adding an HTTP ‘Accept:’ header when a ‘dsd’ is supplied as one of the kwargs.
See sgr.Source.modify_request_args() for an example override.
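The default behaviour can be pictured as an in-place edit of a plain dict of keyword arguments. This is a sketch, not sdmx’s implementation: the "headers" key and the flow ID are hypothetical, while the media type is the SDMX-ML 2.1 structure-specific data type from the SDMX REST conventions.

```python
from typing import Any, Dict

# Media type for SDMX-ML 2.1 structure-specific data.
SS_DATA = "application/vnd.sdmx.structurespecificdata+xml;version=2.1"


def modify_request_args(kwargs: Dict[str, Any]) -> None:
    """Ask for structure-specific data when a DSD is supplied (edits kwargs in place)."""
    if kwargs.get("dsd") is not None:
        headers = kwargs.setdefault("headers", {})
        # Keep any Accept header the caller already set.
        headers.setdefault("Accept", SS_DATA)


args = {"resource_id": "EXAMPLE_FLOW", "dsd": object()}
modify_request_args(args)
print(args["headers"])
```

A source-specific subclass would override this hook, typically calling the default first and then adjusting the arguments further.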
ECB: European Central Bank
SDMX-ML — Website
Supports categorisations of dataflows.
Supports preview_data and series-key based key validation.
Response times are generally short.
ILO: International Labour Organization
SDMX-ML — Website
sdmx.source.ilo.Source handles some particularities of the ILO web service. Others are not handled:
- Dataflow IDs act as filters: for example, there are separate dataflows for individual countries, ages, sexes, etc., rather than only for different indicators.
- The service returns 413 Payload Too Large errors for some queries, with messages like: “Too many results, please specify codelist ID”. Test for sdmx.exceptions.HTTPError (= requests.exceptions.HTTPError) and/or specify a resource_id.
It is highly recommended to read the API guide.
class sdmx.source.ilo.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})

modify_request_args(kwargs)
Handle two limitations of ILO’s REST service.
1. The service returns SDMX-ML 2.0 by default, whereas sdmx only supports SDMX-ML 2.1. Set the ?format=generic_2_1 query parameter.
2. The service does not support the values ‘parents’, ‘parentsandsiblings’ (the default), and ‘all’ for the references query parameter. Override the default with ?references=none.

Note
Valid values are: none, parents, parentsandsiblings, children, descendants, all, or a specific structure reference such as ‘codelist’.
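The two adjustments can be sketched as a function over a plain dict of query parameters. ilo_params() is a hypothetical helper, not the adapter’s actual code; replacing a user-supplied unsupported value (not only the missing default) with "none" is a design choice of this sketch.

```python
from typing import Any, Dict

# Values of "references" that, per this section, the ILO service rejects.
UNSUPPORTED_REFERENCES = {"parents", "parentsandsiblings", "all"}


def ilo_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of query parameters adjusted for the ILO service."""
    result = dict(params)
    # Request SDMX-ML 2.1 instead of the service's SDMX-ML 2.0 default.
    result.setdefault("format", "generic_2_1")
    # A missing value means the service default, parentsandsiblings.
    if result.get("references", "parentsandsiblings") in UNSUPPORTED_REFERENCES:
        result["references"] = "none"
    return result


print(ilo_params({}))
print(ilo_params({"references": "children"}))
```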
IMF: International Monetary Fund’s “SDMX Central” source
SDMX-ML — Website
Subset of the data available on http://data.imf.org.
Supports series-key-only and hence dataset-based key validation and construction.
INEGI: National Institute of Statistics and Geography (Mexico)
SDMX-ML — Website.
Spanish name: Instituto Nacional de Estadística y Geografía.
INSEE: National Institute of Statistics and Economic Studies (France)
SDMX-ML — Website
French name: Institut national de la statistique et des études économiques.
ISTAT: National Institute of Statistics (Italy)
SDMX-ML — Website
Italian name: Istituto Nazionale di Statistica.
Similar server platform to Eurostat, with similar capabilities.
LSD: National Institute of Statistics (Lithuania)
SDMX-ML — Website
Lithuanian name: Lietuvos statistikos.
This web service returns the non-standard HTTP content-type “application/force-download”; sdmx replaces it with “application/xml”.
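How such a replacement fits into choosing a reader can be sketched as follows. This is not how sdmx selects readers internally: effective_content_type() and choose_reader() are hypothetical, the reader names are placeholders, and the set of XML media types is an assumption for illustration.

```python
# Media types this sketch treats as SDMX-ML (an assumption for illustration).
XML_TYPES = {"application/xml", "text/xml"}


def effective_content_type(header: str) -> str:
    """Return the media type from a Content-Type header, fixing LSD's label."""
    media_type = header.split(";", 1)[0].strip().lower()
    # LSD sends SDMX-ML as "application/force-download"; treat it as XML.
    if media_type == "application/force-download":
        return "application/xml"
    return media_type


def choose_reader(header: str) -> str:
    """Pick a placeholder reader name for a response Content-Type header."""
    media_type = effective_content_type(header)
    if media_type in XML_TYPES or media_type.endswith("+xml"):
        return "SDMX-ML"
    if media_type == "application/json" or media_type.endswith("+json"):
        return "SDMX-JSON"
    raise ValueError(f"can't determine a SDMX reader for response content type {header!r}")


print(choose_reader("application/force-download"))  # SDMX-ML
```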
NB: Norges Bank (Norway)
SDMX-ML — Website
There are only a few dataflows, so do not use categoryscheme.
It is unknown whether NB supports series-keys-only.
NBB: National Bank of Belgium (Belgium)
SDMX-JSON — Website — API documentation (en)
French name: Banque Nationale de Belgique.
Dutch name: Nationale Bank van België.
As of 2020-12-13, this web service (like STAT_EE) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.
For certain queries, or on an internal error, the web service returns a custom HTML error page rather than an SDMX error message. This appears as:

    ValueError: can't determine a SDMX reader for response content type 'text/html; charset=utf-8'
SGR: SDMX Global Registry
SDMX-ML — Website
class sdmx.source.sgr.Source(*, id: str, url: str, name: str, headers: Dict[str, Any] = {}, data_content_type: sdmx.source.DataContentType = <DataContentType.XML: 1>, supports: Dict[Union[str, sdmx.util.Resource], bool] = {<Resource.data: 'data'>: True})
SPC: Pacific Data Hub DotStat by the Pacific Community (SPC)
SDMX-ML — API documentation — Web interface
French name: Communauté du Pacifique
STAT_EE: Statistics Estonia (Estonia)
SDMX-JSON — Website (et) — API documentation (en), (et)
Estonian name: Eesti Statistika.
As of 2020-12-13, this web service (like NBB) uses server software that serves SDMX-ML 2.0 or SDMX-JSON. Since sdmx does not support SDMX-ML 2.0, the package is configured to use the JSON endpoint.
UNSD: United Nations Statistics Division
SDMX-ML — Website
Supports preview_data and series-key based key validation.
UNESCO: UN Educational, Scientific and Cultural Organization
SDMX-ML — Website
Free registration is required; user credentials must be provided with each request, either as a parameter or as an HTTP header.
Warning
An issue with structure-specific datasets has been reported. It seems that Series are not recognized due to some oddity in the XML format.
UNICEF: UN Children’s Fund
SDMX-ML or SDMX-JSON — API documentation — Web interface
This source always returns structure-specific messages for SDMX-ML data queries, even when the HTTP header Accept: application/vnd.sdmx.genericdata+xml is given.
The example query from the UNICEF API documentation (also used in the sdmx test suite) returns XML like:

    <mes:Structure structureID="UNICEF_GLOBAL_DATAFLOW_1_0" namespace="urn:sdmx:org.sdmx.infomodel.datastructure.Dataflow=UNICEF:GLOBAL_DATAFLOW(1.0):ObsLevelDim:TIME_PERIOD" dimensionAtObservation="TIME_PERIOD">
      <com:StructureUsage>
        <Ref agencyID="UNICEF" id="GLOBAL_DATAFLOW" version="1.0"/>
      </com:StructureUsage>
    </mes:Structure>
The corresponding DSD actually has the ID DSD_AGGREGATE, which is not obvious from the message. To retrieve the DSD, which is necessary to parse a data message, first query this dataflow by ID, and select the DSD from the returned message:

    In [1]: import sdmx

    In [2]: msg = sdmx.Client("UNICEF").dataflow("GLOBAL_DATAFLOW")

    In [3]: msg
    Out[3]:
    <sdmx.StructureMessage>
      <Header>
        id: 'IREF434067'
        prepared: '2021-02-26T12:43:27+00:00'
        receiver: <Agency not_supplied>
        sender: <Agency UNICEF>
        source:
        test: False
      response: <Response [200]>
      Codelist (29): CL_CONF_STATUS CL_SEX CL_UNIT_MULT CL_ADMIN_LEVEL CL_A...
      ConceptScheme (1): UNICEF_CONCEPTS
      ContentConstraint (1): CONSTR_SEX
      DataflowDefinition (1): GLOBAL_DATAFLOW
      DataStructureDefinition (1): DSD_AGGREGATE
      AgencyScheme (2): AGENCIES DATA_PROVIDERS
      ProvisionAgreement (1): GLOBAL_DATAFLOW_UNICEF_UNICEF

    In [4]: dsd = msg.structure[0]
The resulting object dsd can be passed as an argument to a Client.get() data query. See the sdmx test suite for an example.