Ten-line usage example

Suppose we want to analyze annual unemployment data for some European countries. All we need to know in advance is the data provider: Eurostat.

sdmx makes it easy to search the directory of dataflows and to retrieve the complete structural metadata about the datasets available through a selected dataflow. (This example skips these steps; see the walkthrough.)

The data we want is in a dataflow with the identifier une_rt_a. This dataflow references a data structure definition (DSD) with the ID DSD_une_rt_a. The DSD, in turn, contains or references all the metadata describing the data sets available through this dataflow: the concepts, things measured, dimensions, and lists of codes used to label each dimension.

In [1]: import sdmx

In [2]: estat = sdmx.Client('ESTAT')

Download the structural metadata and display it:

In [3]: metadata = estat.datastructure('DSD_une_rt_a')

In [4]: metadata
Out[4]: 
<sdmx.StructureMessage>
  <Header>
    id: 'IDREF168025'
    prepared: '2021-05-31T09:22:16.858000+00:00'
    receiver: <Agency Unknown>
    sender: <Agency Unknown>
    source: 
    test: False
  response: <Response [200]>
  Codelist (7): CL_AGE CL_FREQ CL_GEO CL_OBS_FLAG CL_OBS_STATUS CL_SEX ...
  ConceptScheme (1): CS_DSD_une_rt_a
  DataStructureDefinition (1): DSD_une_rt_a

Explore the contents of some code lists:

In [5]: for cl in 'CL_AGE', 'CL_UNIT':
   ...:     print(sdmx.to_pandas(metadata.codelist[cl]))
   ...: 
CL_AGE
Y15-24    From 15 to 24 years
Y15-74    From 15 to 74 years
Y20-64    From 20 to 64 years
Y25-54    From 25 to 54 years
Y25-74    From 25 to 74 years
Y55-74    From 55 to 74 years
Name: AGE, dtype: object
CL_UNIT
THS_PER                   Thousand persons
PC_POP      Percentage of total population
PC_ACT     Percentage of active population
Name: UNIT, dtype: object

Next we download a dataset. To obtain data on Greece, Ireland and Spain only, we use codes from the code list ‘CL_GEO’ to specify a key for the dimension named ‘GEO’. We also use a query parameter, ‘startPeriod’, to limit the scope of the data returned:

In [6]: resp = estat.data(
   ...:     'une_rt_a',
   ...:     key={'GEO': 'EL+ES+IE'},
   ...:     params={'startPeriod': '2007'},
   ...:     )
   ...: 
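
As a rough illustration of what the key expresses, the sketch below builds the string form used in SDMX REST queries: one dot-separated slot per dimension, codes joined with '+', and an empty slot meaning "all codes". The helper build_key is hypothetical and is not part of sdmx; the dimension order is assumed to be the one this example later reports in data.index.names, without TIME_PERIOD, which is filtered through startPeriod instead.

```python
def build_key(dimension_order, selection):
    """Return an SDMX REST style key: one dot-separated slot per dimension."""
    parts = []
    for dim in dimension_order:
        # An unselected dimension leaves its slot empty (wildcard).
        codes = selection.get(dim, [])
        parts.append("+".join(codes))
    return ".".join(parts)


# Assumed dimension order for une_rt_a (hypothetical, see the note above).
dims = ["FREQ", "AGE", "UNIT", "SEX", "GEO"]
key = build_key(dims, {"GEO": ["EL", "ES", "IE"]})
print(key)  # ....EL+ES+IE
```

Passing a dict to estat.data(), as above, avoids having to write this positional string by hand.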

resp is now a DataMessage object. We use the built-in to_pandas() function to convert it to a pandas.Series, then use xs() to keep only the observations with the code ‘Y15-74’ on the AGE dimension:

In [7]: data = (sdmx.to_pandas(resp)
   ...:             .xs('Y15-74', level='AGE', drop_level=False))
   ...: 
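
The effect of drop_level=False can be checked offline on a small hand-built Series. This is only a sketch: the three values are the 2020 figures that this example prints in its final step, and the variable names sample, kept and dropped are made up for illustration.

```python
import pandas as pd

names = ["FREQ", "AGE", "UNIT", "SEX", "GEO", "TIME_PERIOD"]
index = pd.MultiIndex.from_tuples(
    [
        ("A", "Y15-74", "PC_ACT", "T", "EL", "2020"),
        ("A", "Y15-74", "PC_ACT", "T", "ES", "2020"),
        ("A", "Y15-74", "PC_ACT", "T", "IE", "2020"),
    ],
    names=names,
)
sample = pd.Series([16.3, 15.5, 5.7], index=index, name="value")

# drop_level=False keeps AGE in the index; the default removes it.
kept = sample.xs("Y15-74", level="AGE", drop_level=False)
dropped = sample.xs("Y15-74", level="AGE")

print(list(kept.index.names))
print(list(dropped.index.names))
```

Keeping the AGE level means the later selection can still address all dimensions in their original order.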

We can now explore the data set as expressed in a familiar pandas object. First, show dimension names:

In [8]: data.index.names
Out[8]: FrozenList(['FREQ', 'AGE', 'UNIT', 'SEX', 'GEO', 'TIME_PERIOD'])

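…and the corresponding key values along these dimensions. (The AGE level still lists all six codes, because pandas does not prune unused index levels after a selection.)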

In [9]: data.index.levels
Out[9]: FrozenList([['A'], ['Y15-24', 'Y15-74', 'Y20-64', 'Y25-54', 'Y25-74', 'Y55-74'], ['PC_ACT', 'PC_POP', 'THS_PER'], ['F', 'M', 'T'], ['EL', 'ES', 'IE'], ['2007', '2008', '2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016', '2017', '2018', '2019', '2020']])

Select some data of interest: show aggregate unemployment rates across ages (‘Y15-74’ on the AGE dimension) and sexes (‘T’ on the SEX dimension), expressed as a percentage of active population (‘PC_ACT’ on the UNIT dimension):

In [10]: data.loc[('A', 'Y15-74', 'PC_ACT', 'T')]
Out[10]: 
GEO  TIME_PERIOD
EL   2007            8.4
     2008            7.8
     2009            9.6
     2010           12.7
     2011           17.9
     2012           24.5
     2013           27.5
     2014           26.5
     2015           24.9
     2016           23.6
     2017           21.5
     2018           19.3
     2019           17.3
     2020           16.3
ES   2007            8.2
     2008           11.3
     2009           17.9
     2010           19.9
     2011           21.4
     2012           24.8
     2013           26.1
     2014           24.5
     2015           22.1
     2016           19.6
     2017           17.2
     2018           15.3
     2019           14.1
     2020           15.5
IE   2007            5.0
     2008            6.8
     2009           12.6
     2010           14.6
     2011           15.4
     2012           15.5
     2013           13.8
     2014           11.9
     2015           10.0
     2016            8.4
     2017            6.7
     2018            5.8
     2019            5.0
     2020            5.7
Name: value, dtype: float64
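
A possible next step, sketched here without a network call, is to pivot the selection into a table with one column per country. The Series below is rebuilt by hand from a few of the values listed above (EL and IE, 2019 and 2020); the names records, sample, rates and table are illustrative only.

```python
import pandas as pd

# Values copied from the output above (EL and IE, 2019-2020 only).
records = {
    ("A", "Y15-74", "PC_ACT", "T", "EL", "2019"): 17.3,
    ("A", "Y15-74", "PC_ACT", "T", "EL", "2020"): 16.3,
    ("A", "Y15-74", "PC_ACT", "T", "IE", "2019"): 5.0,
    ("A", "Y15-74", "PC_ACT", "T", "IE", "2020"): 5.7,
}
names = ["FREQ", "AGE", "UNIT", "SEX", "GEO", "TIME_PERIOD"]
index = pd.MultiIndex.from_tuples(list(records), names=names)
sample = pd.Series(list(records.values()), index=index, name="value")

# A partial tuple selects on the leading index levels, as in the step above.
rates = sample.loc[("A", "Y15-74", "PC_ACT", "T")]

# Move GEO from the index to the columns: rows are years, columns countries.
table = rates.unstack("GEO")
print(table)
```

The same two calls should apply to the full data object from the session, since it has the same index layout.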