Development

This page gives development guidelines and describes some possible future enhancements to sdmx. For current development priorities, see the list of GitHub milestones and the issues/PRs targeted to each. Contributions are welcome!

Code style

  • Apply the following to new or modified code:

    isort -rc . && black . && mypy . && flake8
    

    Respectively, these tools:

    • isort: sort import lines at the top of code files in a consistent way.

    • black: apply the black code style.

    • mypy: check that the code satisfies its type hints.

    • flake8: check code style against PEP 8.

  • Write docstrings in the numpydoc style.
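
    A minimal example in the numpydoc style; the function name and parameters are illustrative, not part of the sdmx code base:

    def coerce_freq(value, default="A"):
        """Return *value* as an SDMX frequency code.

        Parameters
        ----------
        value : str or None
            Raw frequency identifier, e.g. "M" for monthly.
        default : str, optional
            Code to return when *value* is None.

        Returns
        -------
        str
            A single-character frequency code.
        """
        return default if value is None else value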

Roadmap

SDMX features & miscellaneous

  • Serialize Message objects as SDMX-CSV (simplest), -JSON, or -ML (most complex); see the rough sketch after this list.

  • Parse SDMX-JSON structure messages.

  • Selective/partial parsing of SDMX-ML messages.

  • sdmx.api.Request._resources contains only a subset of the resources described at https://ec.europa.eu/eurostat/web/sdmx-web-services/rest-sdmx-2.1 (see “NOT SUPPORTED OPERATIONS”); provide the rest.

  • Get a set of API keys for testing UNESCO and encrypt them for use in CI: https://docs.travis-ci.com/user/encryption-keys/

  • Use the XML Schema definitions of SDMX-ML to validate messages and snippets; see the validation sketch after this list.

  • Implement SOAP web service APIs. This would allow access to, e.g., a broader set of data from the IMF (International Monetary Fund) “SDMX Central” source.

  • Support SDMX-ML 2.0. Several data providers still return only SDMX-ML 2.0 messages.

  • Performance. Parsing some messages can be slow. Install pytest-profiling and run, for instance:

    $ py.test --profile --profile-svg -k xml_structure_insee
    $ python3 -m pstats prof/combined.prof
    % sort cumulative
    % stats
    
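As a rough starting point for the SDMX-CSV item above, the existing pandas conversion can already flatten a data message; the output below is plain CSV keyed by the observation dimensions, not yet valid SDMX-CSV (which requires, e.g., a leading DATAFLOW column). The source, dataflow, key, and file name are examples only:

    import sdmx

    # Fetch a small data message (source, flow, and key are illustrative)
    ecb = sdmx.Request("ECB")
    msg = ecb.data("EXR", key={"CURRENCY": "USD", "FREQ": "M"})

    # Convert to a pandas object, then write plain CSV.
    # A true SDMX-CSV writer would add the DATAFLOW column and follow
    # the SDMX-CSV field order.
    data = sdmx.to_pandas(msg)
    data.to_csv("exr_usd_monthly.csv")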
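
For the XML Schema item, a minimal validation sketch using lxml; the schema and message file names are placeholders, and the official SDMX-ML 2.1 schemas would need to be obtained separately:

    from lxml import etree

    # Load an SDMX-ML message schema (file names are placeholders)
    schema = etree.XMLSchema(etree.parse("SDMXMessage.xsd"))

    # Parse a message or snippet and validate it against the schema
    doc = etree.parse("structure_message.xml")
    if not schema.validate(doc):
        # error_log lists each validation failure with line numbers
        print(schema.error_log)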

Use pd.DataFrame for internal storage

sdmx handles Observations as individual object instances. An alternative is to use pandas or other data structures internally. See:

  • sdmx/experimental.py for a partial mock-up of such code, and

  • tests/test_experimental.py for tests.

Choosing either the current or experimental DataSet as a default should be based on detailed performance (memory and time) evaluation under a variety of use-cases. To that end, note that the experimental DataSet involves three conversions:

  1. a reader parses the XML or JSON source, creates Observation instances, and adds them using DataSet.add_obs().

  2. experimental.DataSet.add_obs() populates a pd.DataFrame from these Observations, but discards them.

  3. experimental.DataSet.obs() creates new Observation instances.

For a fair comparison, the API between the readers and DataSet could be changed to eliminate the round trip in steps 1 and 2, without sacrificing the data model consistency provided by pydantic on Observation instances.
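
The pattern can be illustrated with a standalone sketch that does not use the actual sdmx classes: observation-like objects arrive, are flattened into rows of a pd.DataFrame (conversion 2), and are rebuilt as objects on access (conversion 3). All names here are hypothetical:

    from dataclasses import dataclass

    import pandas as pd

    @dataclass
    class Obs:
        """Stand-in for sdmx's Observation class; illustrative only."""
        key: dict
        value: float

    class FrameDataSet:
        """Hypothetical DataSet storing observations in a pd.DataFrame."""

        def __init__(self):
            self._df = pd.DataFrame()

        def add_obs(self, observations):
            # Flatten the objects into rows, then discard them (conversion 2)
            rows = pd.DataFrame(
                [{**o.key, "value": o.value} for o in observations]
            )
            self._df = (
                rows if self._df.empty
                else pd.concat([self._df, rows], ignore_index=True)
            )

        @property
        def obs(self):
            # Rebuild object instances on demand (conversion 3)
            for _, row in self._df.iterrows():
                yield Obs(key=row.drop("value").to_dict(), value=row["value"])

    ds = FrameDataSet()
    ds.add_obs([Obs({"FREQ": "A", "GEO": "CA"}, 1.2),
                Obs({"FREQ": "A", "GEO": "US"}, 3.4)])
    print(list(ds.obs))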

Inline TODOs

Todo

Support selection of language for conversion of InternationalString.

(The original entry is located in doc/api.rst, line 142.)
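
One possible approach for this TODO, sketched with a plain dict of localizations rather than the actual InternationalString class; the helper name and fallback order are assumptions:

    # Hypothetical helper: pick one localization from a mapping of
    # locale code -> text, preferring the requested language.
    def localize(localizations, lang="en", fallback=("en",)):
        for code in (lang, *fallback):
            if code in localizations:
                return localizations[code]
        # Last resort: return any available localization
        return next(iter(localizations.values()), "")

    name = {"en": "Gross domestic product", "fr": "Produit intérieur brut"}
    assert localize(name, lang="fr") == "Produit intérieur brut"
    assert localize(name, lang="de") == "Gross domestic product"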