Development
This page gives development guidelines and some possible future enhancements to sdmx.
For current development priorities, see the list of GitHub milestones and issues/PRs targeted to each.
Contributions are welcome!
Code style
Apply the following to new or modified code:
isort -rc . && black . && mypy . && flake8
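Respectively, these:
isort: sort import statements in a consistent order.
black: apply the Black code format.
mypy: check static type annotations.
flake8: check code style against PEP 8 and flag common errors.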
Write docstrings in the numpydoc style.
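As an illustration of the numpydoc layout, here is a minimal sketch. The function `mean_observation_value` and its parameters are hypothetical and are not part of sdmx; only the section headings (Parameters, Returns, Raises, Examples) follow the numpydoc convention.

```python
def mean_observation_value(values, skip_missing=True):
    """Return the arithmetic mean of a sequence of observation values.

    Hypothetical example function; not part of sdmx.

    Parameters
    ----------
    values : iterable of float or None
        Observation values; ``None`` marks a missing value.
    skip_missing : bool, optional
        If True (the default), ignore ``None`` entries. If False, raise
        an error when one is encountered.

    Returns
    -------
    float
        The mean of the non-missing values.

    Raises
    ------
    ValueError
        If there are no values to average, or if `skip_missing` is False
        and a value is missing.

    Examples
    --------
    >>> mean_observation_value([1.0, None, 3.0])
    2.0
    """
    # Materialize the iterable once so it can be scanned twice.
    values = list(values)
    if not skip_missing and any(v is None for v in values):
        raise ValueError("missing value encountered")
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("no values to average")
    return sum(present) / len(present)
```

The summary line, blank line, and underlined section headings are what numpydoc-aware tools look for when rendering the docstring.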
Roadmap
SDMX features & miscellaneous
Serialize Message objects as SDMX-CSV (simplest), -JSON, or -ML (most complex).
Parse SDMX-JSON structure messages.
Selective/partial parsing of SDMX-ML messages.
sdmx.api.Request._resources only contains a subset of: https://ec.europa.eu/eurostat/web/sdmx-web-services/rest-sdmx-2.1 (see “NOT SUPPORTED OPERATIONS”); provide the rest.
Get a set of API keys for testing UNESCO and encrypt them for use in CI: https://docs.travis-ci.com/user/encryption-keys/
Use the XML Schema definitions of SDMX-ML to validate messages and snippets.
Implement SOAP web service APIs. This would allow access to, e.g., a broader set of data from the International Monetary Fund (IMF) “SDMX Central” source.
Support SDMX-ML 2.0. Several data providers still exist which only return SDMX-ML 2.0 messages.
Performance. Parsing some messages can be slow. Install pytest-profiling and run, for instance:
$ py.test --profile --profile-svg -k xml_structure_insee
$ python3 -m pstats prof/combined.prof
% sort cumulative
% stats
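The same kind of cumulative-time report can also be produced from a script, without pytest-profiling, using the standard library's cProfile and pstats modules. The following is a rough sketch only: `parse_message` is a hypothetical stand-in for a slow parsing routine and `profile_call` is an illustrative helper; neither is part of sdmx.

```python
import cProfile
import io
import pstats


def parse_message(n_obs):
    """Hypothetical stand-in for a slow parsing routine (not part of sdmx)."""
    return [{"index": i, "value": str(i * 0.5)} for i in range(n_obs)]


def profile_call(func, *args, limit=10):
    """Profile func(*args); return its result and a text report.

    The report is sorted by cumulative time, like ``sort cumulative``
    followed by ``stats`` in the interactive pstats browser.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


result, report = profile_call(parse_message, 10_000)
print(report)
```

To profile real parsing code, the stand-in would be replaced with a call that reads an actual message.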
Use pd.DataFrame for internal storage
sdmx handles Observations as individual object instances.
An alternative is to use pandas or other data structures internally.
See:
sdmx/experimental.py for a partial mock-up of such code, and
tests/test_experimental.py for tests.
Choosing either the current or experimental DataSet as a default should be based on detailed performance (memory and time) evaluation under a variety of use-cases. To that end, note that the experimental DataSet involves three conversions:
1. A reader parses the XML or JSON source, creates Observation instances, and adds them using DataSet.add_obs().
2. experimental.DataSet.add_obs() populates a pd.DataFrame from these Observations, but discards them.
3. experimental.DataSet.obs() creates new Observation instances.
For a fair comparison, the API between the readers and DataSet could be changed to eliminate the round trip in #1/#2, but without sacrificing the data model consistency provided by pydantic on Observation instances.
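One possible way to collect the time and memory figures for such an evaluation is with the standard library's time.perf_counter and tracemalloc. The sketch below rests on loud assumptions: `Obs` is a hypothetical stand-in for Observation, and a plain dict of lists stands in for a pd.DataFrame so that the example has no third-party dependencies. It shows only the measurement pattern, not the real classes, and its numbers say nothing about sdmx itself.

```python
import time
import tracemalloc
from dataclasses import dataclass


@dataclass
class Obs:
    """Hypothetical stand-in for an Observation: one object per data point."""

    key: str
    value: float


def build_objects(n):
    """Object-per-observation storage, analogous to the current approach."""
    return [Obs(key=f"K{i}", value=float(i)) for i in range(n)]


def build_columns(n):
    """Columnar storage; a plain-dict stand-in for a pd.DataFrame."""
    return {
        "key": [f"K{i}" for i in range(n)],
        "value": [float(i) for i in range(n)],
    }


def measure(builder, n):
    """Return (elapsed seconds, peak traced bytes) for one call to builder(n)."""
    tracemalloc.start()
    start = time.perf_counter()
    data = builder(n)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    del data  # release the structure before the next measurement
    return elapsed, peak


for builder in (build_objects, build_columns):
    elapsed, peak = measure(builder, 50_000)
    print(f"{builder.__name__}: {elapsed:.4f} s, peak {peak / 1e6:.1f} MB")
```

Tracing allocations slows execution, so timings taken this way are useful only relative to each other; a careful evaluation would time and trace in separate runs and repeat each measurement over realistic messages.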
Inline TODOs
Todo
Support selection of language for conversion of InternationalString.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/sdmx1/checkouts/v1.1.0/doc/api.rst, line 142.)
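As a rough idea of what language selection could look like, the sketch below picks one text from a mapping of language codes to strings. Everything here is an assumption for illustration: the helper `localized`, its `preferred` parameter, and the use of a plain dict in place of an InternationalString are hypothetical and do not describe the sdmx API.

```python
def localized(localizations, preferred=("en",)):
    """Pick one text from a {language code: text} mapping.

    Hypothetical helper, not part of sdmx. Languages are tried in the
    order given by `preferred`; if none matches, the first available
    text is returned, or an empty string if the mapping is empty.
    """
    for lang in preferred:
        if lang in localizations:
            return localizations[lang]
    return next(iter(localizations.values()), "")


# A plain dict standing in for the localizations of one name.
name = {"en": "Frequency", "fr": "Fréquence"}
print(localized(name, preferred=("de", "fr")))  # prints "Fréquence"
```

An ordered preference list with a fallback avoids returning nothing when the requested language is unavailable; whether that is the right behavior for conversion is a design decision for the actual feature.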