What’s new?¶
v2.4.0 (2021-03-28)¶
IdentifiableArtefact
can besorted()
(PR #71).Add
DataStructureDefinition.iter_keys()
to iterate over valid keys, optionally with aConstraint
(PR #72)Also add
ContentConstraint.iter_keys()
,DataflowDefinition.iter_keys()
.Implement or improve
Constraint.__contains__()
,CubeRegion.__contains__()
,ContentConstraint.__contains__()
,KeyValue.__eq__()
, andKey.__eq__()
.
Speed up creation of
Key
objects by improvingpydantic
usage, updatingKey.__init__()
, and addingKey._fast()
.Simplify
validate_dictlike()
; adddictlike_field()
, and simplifypydantic
validation ofDictLike
objects, keys, and values.
v2.3.0 (2021-03-10)¶
v2.2.0 (2021-02-26)¶
New convenience method
AnnotableArtefact.get_annotation()
to return but not remove an Annotation, e.g. by its ID (PR #60).Add
py.typed
to support type checking (e.g. with mypy) in packages that depend onsdmx
.
v2.1.0 (2021-02-22)¶
ItemScheme.append()
now raisesValueError
on duplicate IDs (PR #58).Item.parent
stores a reference to the containingItemScheme
for top-level Items that have no hierarchy/parent of their own. This allows navigating from any Item to the ItemScheme that contains it.Item.get_scheme()
is added as a convenience method (PR #58).reader.xml
internals reworked for significant speedups in parsing of SDMX-ML (PR #58).New convenience method
StructureMessage.get()
to retrieve objects by ID across the multiple collections in StructureMessage (PR #58).New convenience method
AnnotableArtefact.pop_annotation()
to locate, remove, and return a Annotation, e.g. by its ID (PR #58).len()
of aDataKeySet
gives the length ofDataKeySet.keys
(PR #58).
v2.0.1 (2021-01-31)¶
v2.0.0 (2021-01-26)¶
Migration notes¶
Code that calls Request()
emits DeprecationWarning
and logs a message with level WARNING
:
>>> sdmx.Request("ECB")
Request class will be removed in v3.0; use Client(...)
<sdmx.client.Client object at 0x7f98787e7d60>
Instead, use:
sdmx.Client("ECB")
Per the standard semantic versioning approach, this feature is marked as deprecated in version 2.0, and will be removed no sooner than version 3.0.
References to sdmx.logger
should be updated to sdmx.log
.
Instead of passing the log_level parameter to Client
, access this standard Python Logger
and change its level, as described at HOWTO control logging.
All changes¶
The large library of test specimens for
sdmx
is no longer shipped with the package, reducing the archive size by about 80% (#18, PR #52). The specimens can be retrieved for running tests locally; see Test specimens.The
Request
class is renamedClient
for semantic clarity (#11, PR #44):A Client can open a
requests.Session
and might make manyrequests.Requests
against the same web service.The log_level parameter to
Client
is deprecated.Some internal modules are renamed. These should not affect user code; if they do, adjust that code to use the top-level objects.
sdmx.api
is renamedsdmx.client
.sdmx.remote
is renamedsdmx.session
.sdmx.reader.sdmxml
is renamedsdmx.reader.xml
, to conform withsdmx.format.xml
andsdmx.writer.xml
.sdmx.reader.sdmxjson
is renamedsdmx.reader.json
.
v1.7 and earlier¶
v1.7.0 (2021-01-26)¶
New features¶
Add The Pacific Community’s Pacific Data Hub as a data source (PR #30).
Add classes to
sdmx.model
:TimeRangeValue
,Period
,RangePeriod
, and parse<com:TimeRange>
and related tags in SDMX-ML (PR #30).
Bug fixes¶
Output SDMX-ML header elements in order expected by standard XSD (#42, PR #43).
Respect override argument to
add_source()
(PR #41).
v1.6.0 (2020-12-16)¶
New features¶
Support Python 3.9 (using pydantic ≥ 1.7) (PR #37).
Add National Bank of Belgium as a data source (PR #32).
Add Statistics Lithuania as a data source (PR #33).
v1.5.0 (2020-11-12)¶
Add a brief tutorial on creating SDMX-ML messages from pure Python objects (#23, PR #24).
Add Statistics Estonia as a data source (PR #25).
Supply provider=“ALL” to INSEE structure queries by default (#21, PR #22)
v1.4.0 (2020-08-17)¶
New features¶
Enhance
to_xml()
to handleDataMessages
(PR #13).In v1.4.0, this feature supports a subset of DataMessages and DataSets. If you have an example of a DataMessages that
sdmx1
1.4.0 cannot write, please file an issue on GitHub with a file attachment. SDMX-ML features used in such examples will be prioritized for future improvements.Add
compare()
methods toDataMessage
,DataSet
, and related classes (PR #13).
Bug fixes¶
Fix parsing of
MeasureDimension
returned by SGR for data structure queries (PR #14).
v1.3.0 (2020-08-02)¶
Adjust imports for compatibility with pandas 1.1.0 (PR #10).
Add World Bank World Development Indicators (WDI) service to supported sources (PR #10).
v1.2.0 (2020-06-04)¶
New features¶
Methods like
IdentifiableArtefact.compare()
are added for recursive comparison ofmodel
objects (PR #6).to_xml()
covers a larger subset of SDMX-ML, including almost all contents of aStructureMessage
(PR #6).
v1.1.0 (2020-05-18)¶
Data model changes¶
…to bring sdmx
into closer alignment with the standard Information Model (PR #4):
Change
Header.receiver
andHeader.sender
to optionalAgency
, notstr
.Add
Header.source
andtest
.IdentifiableArtefact.id
is strictly typed asstr
, with a singleton object (analogous toNone
) used for missing IDs.IdentifiableArtefact.id
,VersionableArtefact.version
, andMaintainableArtefact.maintainer
are inferred from a URN if one is passed during construction.VersionableArtefact.identical()
andMaintainableArtefact.identical()
compare on version and maintainer attributes, respectively.Facet
,Representation
, andISOConceptReference
are strictly validated, i.e. cannot be assigned non-IM attributes.Add
OrganisationScheme
,NoSpecifiedRelationship
,PrimaryMeasureRelationship
,DimensionRelationship
, andGroupRelationship
as distinct classes.Type of
DimensionRelationship.dimensions
isDimensionComponent
, not the narrowerDimension
.DataStructureDefinition.measures
is an emptyMeasureDescriptor
by default, notNone
.DataSet.add_obs()
now acceptsObservations
with noSeriesKey
association, and sets this association to the one provided as an argument.String representations are simplified but contain more information.
New features¶
Item.hierarchical_id
andItemScheme.get_hierarchical()
create and search on IDs like ‘A.B.C’ for Item ‘A’ with child/grandchild Items ‘B’ and ‘C’ (PR #4).New methods
parent_class()
,get_reader_for_path()
,detect_content_reader()
, andreader.register()
(PR #4).sdmxml.Reader
uses an event-driven, rather than recursive/tree iterating, parser (PR #4).The codebase is improved to pass static type checking with mypy (PR #4).
Add
to_xml()
to generate SDMX-ML for a subset of the IM (PR #3).
v1.0.0 (2020-05-01)¶
Project forked and renamed to
sdmx
(module) /sdmx1
(on PyPI, due to an older, unmaintained package with the same name).sdmx.model
is reimplemented.Python typing and pydantic are used to force tight compliance with the SDMX Information Model (IM). Users familiar with the IM can use
sdmx
without the need to understand implementation-specific details.IM classes are no longer tied to
sdmx.reader
instances and can be created and manipulated outside of a read operation.
sdmx.api
andsdmx.remote
are reimplemented to (1) match the semantics of the requests package and (2) be much thinner.Data sources are modularized in
Source
.Idiosyncrasies of particular data sources (e.g. ESTAT’s process for large requests) are handled by source-specific subclasses. As a result,
sdmx.api
is leaner.
Testing coverage is significantly expanded.
Promised, but untested, features of the 0.x series now have tests, to ensure feature parity.
There are tests for each data source (
tests/test_sources.py`
) to ensure the package can handle idiosyncratic behaviour.The pytest-remotedata pytest plugin allows developers and users to run or skip network tests with –remote-data.
Breaking changes¶
Python 3.6 and earlier (including Python 2) are not supported.
Migrating¶
Writer.write(…, reverse_obs=True)
: use the standard pandas indexing approach to reverse a pd.Series:s.iloc[::-1]
odo support is no longer built-in; however, users can still register a SDMX resource with odo. See the HOWTO.
write_dataset()
: the parse_time and fromfreq arguments are replaced by datetime; see the method documentation and the walkthrough section for examples.
pandaSDMX (versions 0.9 and earlier)¶
pandaSDMX v0.9 (2018-04)¶
This version is the last tested on Python 2.x. Future versions will be tested on Python 3.5+ only
New features¶
four new data providers INEGI (Mexico), Norges Bank (Norway), International Labour Organization (ILO) and Italian statistics office (ISTAT)
model: make Ref instances callable for resolving them, i.e. getting the referenced object by making a remote request if needed
improve loading of structure-specific messages when DSD is not passed / must be requested on the fly
process multiple and cascading content constraints as described in the Technical Guide (Chap. 6 of the SDMX 2.1 standard)
StructureMessages and DataMessages now have properties to compute the constrained and unconstrained codelists as dicts of frozensets of codes. For DataMessage this is useful when
series_keys
was set to True when making the request. This prompts the data provider to generate a dataset without data, but with the complete set of series keys. This is the most accurate representation of the available series. Agencies such as IMF and ECB support this feature.
v0.8.2 (2017-12-21)¶
fix reading of structure-specific data sets when DSD_ID is present in the data set
v0.8.1 (2017-12-20)¶
fix broken package preventing pip installs of the wheel
v0.8 (2017-12-12)¶
add support for an alternative data set format defined for SDMXML messages. These so-called structure-specific data sets lend themselves for large data queries. File sizes are typically about 60 % smaller than with equivalent generic data sets. To make use of structure-specific data sets, instantiate Request objects with agency IDs such as ‘ECB_S’, ‘INSEE_S’ or ‘ESTAT_S’ instead of ‘ECB’ etc. These alternative agency profiles prompt pandaSDMX to execute data queries for structure-specific data sets. For all other queries they behave exactly as their siblings. See a code example in chapter 5 of the docs.
raise ValueError when user attempts to request a resource other than data from an agency delivering data in SCMX-JSON format only (OECD and ABS).
Update INSEE profile
handle empty series properly
data2pd writer: the code for Series index generation was rewritten from scratch to make better use of pandas’ time series functionality. However, some data sets, in particular from INSEE, which come with bimonthly or semestrial frequencies cannot be rendered as PeriodIndex. Pass
parse_time=False
to the .write method to prevent errors.
v0.7.0 (2017-06-10)¶
add new data providers:
Australian Bureau of Statistics
International Monetary Fund - SDMXCentral only
United Nations Division of Statistics
UNESCO (free registration required)
World Bank - World Integrated Trade Solution (WITS)
new feature: load metadata on data providers from json file; allow the user to add new agencies on the fly by specifying an appropriate JSON file using the
pandasdmx.api.Request.load_agency_profile()
.new
pandasdmx.api.Request.preview_data()
providing a powerful fine-grain key validation algorithm by downloading all series-keys of a dataset and exposing them as a pandas DataFrame which is then mapped to the cartesian product of the given dimension values. Works only with data providers such as ECB and UNSD which support “series-keys-only” requests. This feature could be wrapped by a browser-based UI for building queries.sdjxjson reader: add support for flat and cross-sectional datasets, preserve dimension order where possible
structure2pd writer: in codelists, output Concept rather than Code attributes in the first line of each code-list. This may provide more information.
v0.6.1 (2017-02-03)¶
fix 2to3 issue which caused crashes on Python 2.7
v0.6 (2017-01-07)¶
This release contains some important stability improvements.
Bug fixes¶
JSON data from OECD is now properly downloaded
The data writer tries to gleen a frequency value for a time series from its attributes. This is helpful when exporting data sets, e.g., from INSEE (Issue 41).
Known issues¶
A data set which lacks a FREQ dimension or attribute can be exported as pandas DataFrame only when parse_time=False?, i.e. no DateTime index is generated. The resulting DataFrame has a string index. Use pandas magic to create a DateTimeIndex from there.
v0.5 (2016-10-30)¶
New features¶
new reader module for SDMX JSON data messages
add OECD as data provider (data messages only)
pandasdmx.model.Category
is now an iterator over categorised objects. This greatly simplifies category usage. Besides, categories with the same ID while belonging to multiple category schemes are no longer conflated.
API changes¶
Request constructor: make agency ID case-insensitive
As
Category
is now an iterator over categorised objects,Categorisations
is no longer considered part of the public API.
Bug fixes¶
sdmxml reader: fix AttributeError in write_source method, thanks to Topas
correctly distinguish between categories with same ID while belonging to different category schemes
v0.4 (2016-04-11)¶
New features¶
add new provider INSEE, the French statistics office (thanks to Stéphan Rault)
register ‘.sdmx’ files with Odo if available
logging of http requests and file operations.
new structure2pd writer to export codelists, dataflow-definitions and other structural metadata from structure messages as multi-indexed pandas DataFrames. Desired attributes can be specified and are represented by columns.
API changes¶
pandasdmx.api.Request
constructor accepts alog_level
keyword argument which can be set to a log-level for the pandasdmx logger and its children (currently only pandasdmx.api)pandasdmx.api.Request
now has atimeout
property to set the timeout for http requestsextend api.Request._agencies configuration to specify agency- and resource-specific settings such as headers. Future versions may exploit this to provide reader selection information.
api.Request.get: specify http_headers per request. Defaults are set according to agency configuration
Response instances expose Message attributes to make application code more succinct
rename
pandasdmx.api.Message
attributes to singular form. Old names are deprecated and will be removed in the future.pandasdmx.api.Request
exposes resource names such as data, datastructure, dataflow etc. as descriptors calling ‘get’ without specifying the resource type as string. In interactive environments, this saves typing and enables code completion.data2pd writer: return attributes as namedtuples rather than dict
use patched version of namedtuple that accepts non-identifier strings as field names and makes all fields accessible through dict syntax.
remove GenericDataSet and GenericDataMessage. Use DataSet and DataMessage instead
sdmxml reader: return strings or unicode strings instead of LXML smart strings
sdmxml reader: remove most of the specialized read methods. Adapt model to use generalized methods. This makes code more maintainable.
pandasdmx.model.Representation
for DSD attributes and dimensions now supports text not just codelists.
Other changes and enhancements¶
documentation has been overhauled. Code examples are now much simpler thanks to the new structure2pd writer
testing: switch from nose to py.test
improve packaging. Include tests in sdist only
numerous bug fixes
v0.3.1 (2015-10-04)¶
This release fixes a few bugs which caused crashes in some situations.
v0.3.0 (2015-09-22)¶
support for requests-cache allowing to cache SDMX messages in memory, MongoDB, Redis or SQLite.
pythonic selection of series when requesting a dataset: Request.get allows the
key
keyword argument in a data request to be a dict mapping dimension names to values. In this case, the dataflow definition and datastructure definition, and content-constraint are downloaded on the fly, cached in memory and used to validate the keys. The dotted key string needed to construct the URL will be generated automatically.The Response.write method takes a
parse_time
keyword arg. Set it to False to avoid parsing of dates, times and time periods as exotic formats may cause crashes.The Request.get method takes a
memcache
keyward argument. If set to a string, the received Response instance will be stored in the dictRequest.cache
for later use. This is useful when, e.g., a DSD is needed multiple times to validate keys.fixed base URL for Eurostat
major refactorings to enhance code maintainability
v0.2.2¶
Make HTTP connections configurable by exposing the requests.get API through the
pandasdmx.api.Request
constructor. Hence, proxy servers, authorisation information and other HTTP-related parameters consumed byrequests.get
can be specified for eachRequest
instance and used in subsequent requests. The configuration is exposed as a dict through a newRequest.client.config
attribute.Responses have a new
http_headers
attribute containing the HTTP headers returned by the SDMX server
v0.2.1¶
Request.get: allow fromfile to be a file-like object
extract SDMX messages from zip archives if given. Important for large datasets from Eurostat
automatically get a resource at an URL given in the footer of the received message. This allows to automatically get large datasets from Eurostat that have been made available at the given URL. The number of attempts and the time to wait before each request are configurable via the
get_footer_url
argument.
v0.2.0 (2015-04-13)¶
This version is a quantum leap. The whole project has been redesigned and rewritten from scratch to provide robust support for many SDMX features. The new architecture is centered around a pythonic representation of the SDMX information model. It is extensible through readers and writers for alternative input and output formats. Export to pandas has been dramatically improved. Sphinx documentation has been added.
v0.1.2 (2014-09-17)¶
fix xml encoding. This brings dramatic speedups when downloading and parsing data
extend description.rst
v0.1 (2014-09)¶
Initial release