Ten-line usage example
Suppose we want to analyze annual unemployment data for some European countries.
All we need to know in advance is the data provider: Eurostat.
sdmx makes it easy to inspect all data flows available from this provider. [1]
The data we want is in a data flow with the identifier ‘UNE_RT_A’.
The description of this data flow references a data structure definition (DSD) that happens to also have the ID ‘UNE_RT_A’. [2]
First we create a Client that we will use to make multiple queries to this provider’s SDMX-REST web service:
In [1]: import sdmx
In [2]: estat = sdmx.Client("ESTAT")
Next, we download a structure message containing the DSD and the other structural artifacts that it references. Together, these completely describe the data available through this data flow: the concepts, the things measured, the dimensions, the lists of codes used to label each dimension, the attributes, and so on:
In [3]: sm = estat.datastructure("UNE_RT_A", params=dict(references="descendants"))
In [4]: sm
Out[4]:
<sdmx.StructureMessage>
<Header>
id: 'DSD1773379213'
prepared: '2026-03-13T05:20:13.204000+00:00'
sender: <Agency ESTAT>
source:
test: False
response: <Response [200]>
Codelist (7): FREQ AGE UNIT SEX GEO OBS_FLAG CONF_STATUS
ConceptScheme (1): UNE_RT_A
DataStructureDefinition (1): UNE_RT_A
sm is a Python object of class StructureMessage.
We can explore some of the specific artifacts (for example, three code lists) using StructureMessage.get() to retrieve them and to_pandas() to convert them to pandas.Series:
In [5]: for cl in "ESTAT:AGE(15.0)", "ESTAT:SEX(1.13)", "ESTAT:UNIT(72.0)":
   ...:     print(sdmx.to_pandas(sm.get(cl)))
   ...:
AGE
TOTAL Total
LFD Late foetal death
LFD1 Late foetal death (group 1)
LFD2 Late foetal death (group 2)
MN0 Zero minutes
...
AVG Average
NRP No response
NSP Not specified
OTH Other
UNK Unknown
Name: Age class, Length: 679, dtype: str
SEX
T Total
M Males
F Females
DIFF Absolute difference between males and females
NAP Not applicable
NRP No response
UNK Unknown
Name: Sex, dtype: str
UNIT
TOTAL Total
NR Number
NR_HAB Number per inhabitant
NR_HHAB Number per hundred inhabitants
NR_HTHAB Number per hundred thousand persons
...
PTIR_LT_AVG Price-to-income ratio relative to long-term av...
SPTIR Standardised price-to-income ratio
M2_THAB Square metres per 1000 inhabitants
CFU Colony-forming unit (CFU)
SC Score
Name: Unit of measure, Length: 729, dtype: str
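The strings passed to StructureMessage.get() above appear to follow the compact pattern AGENCY:ID(VERSION). As a minimal sketch, assuming that pattern holds, the hypothetical helper below (parse_ref and ArtefactRef are illustrative names, not part of sdmx) splits such a string into its parts using only the standard library:

```python
import re
from typing import NamedTuple


class ArtefactRef(NamedTuple):
    """Hypothetical container for the parts of a reference; not an sdmx class."""

    agency: str
    id: str
    version: str


# Assumed pattern: AGENCY:ID(VERSION), for example "ESTAT:AGE(15.0)"
_REF = re.compile(r"^(?P<agency>[^:]+):(?P<id>[^(]+)\((?P<version>[^)]+)\)$")


def parse_ref(text: str) -> ArtefactRef:
    """Split 'AGENCY:ID(VERSION)' into its three components."""
    match = _REF.match(text)
    if match is None:
        raise ValueError(f"not in AGENCY:ID(VERSION) form: {text!r}")
    return ArtefactRef(**match.groupdict())


for text in ("ESTAT:AGE(15.0)", "ESTAT:SEX(1.13)", "ESTAT:UNIT(72.0)"):
    print(parse_ref(text))
```

Reading the reference this way makes it clear that the version is part of the identifier, so a code list published under a newer version would need a different string.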
Next, we download a data set containing a portion of the data in this data flow, structured by this DSD.
To obtain data only for Greece, Ireland and Spain, we use codes from the code list with the ID ‘GEO’ to specify a key for the dimension with the ID ‘geo’. [3]
We also use a query parameter, ‘startPeriod’, to limit the scope of the data returned along the ‘TIME_PERIOD’ dimension.
The query returns a data message (a Python object of class DataMessage) containing the data set:
In [6]: dm = estat.data(
   ...:     "UNE_RT_A",
   ...:     key={"geo": "EL+ES+IE"},
   ...:     params={"startPeriod": "2014"},
   ...: )
   ...:
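To make the key and params arguments more concrete, the sketch below shows how such a selection might be serialized in SDMX-REST key syntax: one position per dimension in DSD order, joined by ".", with "+" between alternative codes and an empty position meaning "all codes". The function names, the example.org base URL and the exact URL layout are assumptions for illustration only; sdmx builds the real request internally and may differ in detail.

```python
from urllib.parse import urlencode

# Non-time dimensions of UNE_RT_A, in DSD order
DIMENSIONS = ["freq", "age", "unit", "sex", "geo"]


def build_key(selection: dict[str, str], dimensions: list[str]) -> str:
    """Serialize a {dimension: codes} mapping as a dot-separated key."""
    unknown = set(selection) - set(dimensions)
    if unknown:
        raise KeyError(f"unknown dimension(s): {sorted(unknown)}")
    # An empty position selects every code of that dimension
    return ".".join(selection.get(dim, "") for dim in dimensions)


def build_url(base: str, flow: str, key: str, params: dict[str, str]) -> str:
    """Assemble a data query URL; `base` is a placeholder, not a real endpoint."""
    query = f"?{urlencode(params)}" if params else ""
    return f"{base}/data/{flow}/{key}{query}"


key = build_key({"geo": "EL+ES+IE"}, DIMENSIONS)
print(key)
print(build_url("https://example.org/sdmx", "UNE_RT_A", key, {"startPeriod": "2014"}))
```

The four leading dots in the resulting key stand for the unconstrained ‘freq’, ‘age’, ‘unit’ and ‘sex’ dimensions, which is why the response still contains every age group, unit and sex for the three selected countries.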
We again use to_pandas() to convert the entire dm to a pandas.Series with a multi-level index (one level per dimension of the DSD).
Then we can use pandas’ built-in methods, such as pandas.Series.xs(), to take a cross-section, selecting on the ‘age’ index level (which corresponds to an SDMX dimension):
In [7]: data = (
   ...:     sdmx.to_pandas(dm)
   ...:     .xs("Y15-74", level="age", drop_level=False)
   ...: )
   ...:
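The effect of drop_level=False is easiest to see on a small stand-in. The sketch below builds a tiny pandas.Series by hand (the index labels mirror this example, but the values are placeholders, not Eurostat data) and compares the two settings:

```python
import pandas as pd

# Hand-built stand-in for sdmx.to_pandas(dm); values are placeholders
index = pd.MultiIndex.from_product(
    [["Y15-24", "Y15-74"], ["EL", "IE"], ["2014", "2015"]],
    names=["age", "geo", "TIME_PERIOD"],
)
series = pd.Series([float(i) for i in range(len(index))], index=index, name="value")

# Default: the selected level is removed from the result's index
dropped = series.xs("Y15-74", level="age")
print(dropped.index.names)

# drop_level=False: the 'age' level is kept in the result's index
kept = series.xs("Y15-74", level="age", drop_level=False)
print(kept.index.names)
```

Keeping the level means the result still has one index level per dimension, so later selections can address ‘age’ by position in the same way as the other dimensions.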
We further examine the retrieved data set in the familiar form of a pandas.Series.
For example, we can show the dimension names:
In [8]: data.index.names
Out[8]: FrozenList(['freq', 'age', 'unit', 'sex', 'geo', 'TIME_PERIOD'])
…and corresponding key values along these dimensions:
In [9]: data.index.levels
Out[9]: FrozenList([['A'], ['Y15-24', 'Y15-29', 'Y15-74', 'Y20-64', 'Y25-54', 'Y25-74', 'Y55-74'], ['PC_ACT', 'PC_POP', 'THS_PER'], ['F', 'M', 'T'], ['EL', 'ES', 'IE'], ['2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021', '2022', '2023', '2024', '2025']])
Select some data of interest: show aggregate unemployment rates across ages (“Y15-74” on the ‘age’ dimension) and sexes (“T” on the ‘sex’ dimension), expressed as a percentage of active population (“PC_ACT” on the ‘unit’ dimension):
In [10]: data.loc[("A", "Y15-74", "PC_ACT", "T")]
Out[10]:
geo TIME_PERIOD
EL 2014 26.6
2015 25.0
2016 23.9
2017 21.8
2018 19.7
2019 17.9
2020 17.6
2021 14.7
2022 12.5
2023 11.1
2024 10.1
2025 8.9
ES 2014 24.5
2015 22.1
2016 19.6
2017 17.2
2018 15.3
2019 14.1
2020 15.5
2021 14.9
2022 13.0
2023 12.2
2024 11.4
2025 10.5
IE 2014 11.9
2015 9.9
2016 8.4
2017 6.7
2018 5.8
2019 5.0
2020 5.9
2021 6.2
2022 4.5
2023 4.3
2024 4.3
2025 4.7
Name: value, dtype: float64
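For side-by-side comparison it can help to pivot such a result into a table with one column per country. The sketch below is one possible follow-up step rather than part of the walkthrough itself: it re-enters the 2014 and 2015 values from the output above by hand and applies pandas.Series.unstack():

```python
import pandas as pd

# Values copied from the output above (2014 and 2015 only)
index = pd.MultiIndex.from_product(
    [["EL", "ES", "IE"], ["2014", "2015"]], names=["geo", "TIME_PERIOD"]
)
rates = pd.Series([26.6, 25.0, 24.5, 22.1, 11.9, 9.9], index=index, name="value")

# Move the 'geo' level from the row index to the columns
wide = rates.unstack("geo")
print(wide)
```

With the real data, the same call on the result of the selection above would give one row per year and one column each for EL, ES and IE.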