Development

This page describes the development of sdmx. Contributions are welcome!

Code style

Testing

Specimens and data

Added in version 2.0.

A variety of specimens—example files from real web services, or published with the standards—are used to test that sdmx correctly reads and writes the different SDMX message formats.

Specimens are stored in the separate sdmx-test-data repository.

Running the test suite requires these files. The simplest way to obtain them is to give the --sdmx-fetch-data option when invoking pytest:

$ pytest --sdmx-fetch-data

This invokes SpecimenCollection.fetch(), which uses git (via GitPython) to retrieve and unpack the files to a directory like $HOME/.cache/sdmx/test-data/. See below for more advanced options.

Contents and layout

Specimen files are:

  • Arranged in directories with names matching particular sources in sources.json.

  • Named with:

    • Certain keywords:

      • -structure: a structure message, often associated with a file with a similar name containing a data message.

      • ts: time-series data, i.e. with a TimeDimension at the level of individual Observations.

      • xs: cross-sectional data arranged in other ways.

      • flat: flat DataSets with all Dimensions at the Observation level.

      • ss: structure-specific data messages.

    • In some cases, the query string or data flow/structure ID as the file name.

    • Hyphens - instead of underscores _.
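
The naming convention above can be sketched in a few lines. This is only an illustration: the keyword list is taken from this page, while the helper keywords_in() and the file name example-structure.xml are hypothetical and not part of sdmx.

```python
from pathlib import PurePosixPath

# Keywords from the list above; the matching logic is an assumption
KEYWORDS = ("structure", "ts", "xs", "flat", "ss")


def keywords_in(filename: str) -> list[str]:
    """Return the naming keywords found in a specimen file name."""
    # Drop the suffix, then split on hyphens, the separator used in file names
    parts = PurePosixPath(filename).stem.split("-")
    return [k for k in KEYWORDS if k in parts]


print(keywords_in("ng-ts-gf.xml"))  # ['ts']
print(keywords_in("ng-flat.xml"))  # ['flat']
print(keywords_in("example-structure.xml"))  # ['structure']
```

Matching whole hyphen-separated parts, rather than substrings, avoids false positives such as "ts" inside a longer word.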

The recorded/ directory contains recorded HTTP responses from certain SDMX-REST web services. These files are stored using the requests_cache file system backend; see the requests_cache documentation for the name and format of the files.

Custom test data directory

It is also possible to place the test data in a specific directory; for instance, in order to commit new files to the specimen collection. Follow these two steps:

  1. Obtain the files by one of two methods:

    1. Clone sdmx-test-data:

      $ git clone git@github.com:khaeru/sdmx-test-data.git
      
    2. Download https://github.com/khaeru/sdmx-test-data/archive/main.zip

  2. Indicate where pytest can find the files, by one of two methods:

    1. Set the SDMX_TEST_DATA environment variable:

      # Set the variable only for one command
      $ SDMX_TEST_DATA=/path/to/files pytest
      
      # Export the variable to the environment
      $ export SDMX_TEST_DATA=/path/to/files
      $ pytest
      
    2. Give the option --sdmx-test-data=<PATH> when invoking pytest:

      $ pytest --sdmx-test-data=/path/to/files
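
The two ways of indicating the directory can be pictured as a simple lookup. The sketch below is not the sdmx implementation: the helper resolve_test_data_dir() is hypothetical, and the precedence (command-line option, then SDMX_TEST_DATA, then the default cache directory) is an assumption made for illustration.

```python
import os
from pathlib import Path
from typing import Optional

# Default location mentioned earlier on this page
DEFAULT_DIR = Path.home() / ".cache" / "sdmx" / "test-data"


def resolve_test_data_dir(option: Optional[str] = None) -> Path:
    """Hypothetical helper: choose the test data directory.

    Assumed precedence: explicit option, then SDMX_TEST_DATA, then default.
    """
    if option:
        return Path(option)
    from_env = os.environ.get("SDMX_TEST_DATA")
    if from_env:
        return Path(from_env)
    return DEFAULT_DIR


os.environ["SDMX_TEST_DATA"] = "/path/to/files"
print(resolve_test_data_dir())  # the path taken from the environment
print(resolve_test_data_dir("/other/path"))  # the explicit option wins
```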
      

Network vs. offline tests

Tests related to particular SDMX-REST web services can be categorized as:

  • Ensuring sdmx can interact with the service as-is.

    These include the full matrix of source-endpoint tests, which run on a nightly schedule because they are slow. They also include other tests (for instance, of code snippets appearing in this documentation) marked with the custom pytest mark @pytest.mark.network that make actual network requests. These tests may appear ‘flaky’: they are vulnerable to network interruptions, or temporary downtime/incapacity of the targeted service(s).

  • Ensuring sdmx can handle certain SDMX messages or HTTP responses returned by services. This should remain true whether or not those services actually return the same content as they did at the moment the tests were written.

    These are handled using recorded responses, as described above. This makes the test outcomes deterministic, even if the services are periodically unavailable.

    These tests use session_with_stored_responses(), which is an in-memory CachedSession prepared using add_responses() and offline(); both are described in the internal code reference below.

Releasing

Before releasing, check:

Address any failures before releasing.

  1. Create a new branch:

    $ git checkout -b release/X.Y.Z
    
  2. Edit doc/whatsnew.rst. Comment out the heading “Next release”, then insert another heading below it, at the same level, with the version number and date.

  3. Make a commit with a message like “Mark vX.Y.Z in doc/whatsnew”.

  4. Tag the version as a release candidate, i.e. with a rcN suffix, and push:

    $ git tag vX.Y.ZrcN
    $ git push --tags --set-upstream origin release/X.Y.Z
    
  5. Open a pull request with the title “Release vX.Y.Z” using this branch. Check:

    If needed, address any warnings or errors that appear and then continue from step (3), i.e. make (a) new commit(s) and tag, incrementing the release candidate number, e.g. from rc1 to rc2.

  6. Merge the PR using the “rebase and merge” method.

  7. (optional) Tag the release itself and push:

    $ git tag vX.Y.Z
    $ git push --tags origin main
    

    This step (but not step (3)) can also be performed directly on GitHub; see (8), next.

  8. Visit https://github.com/khaeru/sdmx/releases and mark the new release: either using the pushed tag from (7), or by creating the tag and release simultaneously.

  9. Check at https://github.com/khaeru/sdmx/actions?query=workflow:publish and https://pypi.org/project/sdmx1/ that the distributions are published.

Internal code reference

testing: Testing utilities

class sdmx.testing.MessageTest[source]

Bases: object

Base class for tests of specific specimen files.

directory: str | Path = PosixPath('.')[source]
filename: str[source]
msg(path)[source]
path(test_data_path)[source]
sdmx.testing.XFAIL = {'unsupported': MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={'strict': True, 'reason': 'Not implemented by service', 'raises': (<class 'requests.exceptions.HTTPError'>, <class 'NotImplementedError'>, <class 'ValueError'>)})), 503: MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={'raises': <class 'requests.exceptions.HTTPError'>, 'reason': '503 Server Error: Service Unavailable'}))}[source]

Marks for use below.

sdmx.testing.assert_pd_equal(left, right, **kwargs)[source]

Assert equality of two pandas objects.

sdmx.testing.generate_endpoint_tests(metafunc)[source]

pytest hook for parametrizing tests that need an “endpoint” fixture.

This function relies on the DataSourceTest base class defined in test_sources. It:

  • Generates one parametrization for every Resource (= REST API endpoint).

  • Applies pytest “xfail” (expected failure) marks according to:

    1. Source.supports, i.e. if the particular source is marked as not supporting certain endpoints, the test is expected to fail.

    2. DataSourceTest.xfail, any other failures defined on the source test class (e.g. DataSourceTest subclass).

    3. DataSourceTest.xfail_common, common failures.

sdmx.testing.installed_schemas(worker_id, mock_gh_api, tmp_path_factory)[source]

Fixture that ensures schemas are installed locally in a temporary directory.

sdmx.testing.mock_gh_api()[source]

Mock GitHub API responses to avoid hitting rate limits.

For each API endpoint URL queried by _gh_zipball(), return a pared-down JSON response that contains the required “zipball_url” key.

sdmx.testing.pytest_addoption(parser)[source]

Add pytest command-line options.

sdmx.testing.pytest_configure(config)[source]

Handle the --sdmx-test-data command-line option.

sdmx.testing.pytest_generate_tests(metafunc)[source]

Generate tests.

Calls both parametrize_specimens() and generate_endpoint_tests().

sdmx.testing.pytest_sessionstart(session: Session) → None[source]

Create session-wide objects.

These are used by the fixtures specimen() and testsource().

sdmx.testing.session_with_pytest_cache(pytestconfig)[source]

Fixture: A Session that caches within .pytest_cache.

This subdirectory is ephemeral, and tests must pass whether or not it exists and is populated.

sdmx.testing.session_with_stored_responses(pytestconfig)[source]

Fixture: A Session that returns only stored responses from sdmx-test-data.

This session…

  1. uses the ‘memory’ requests_cache backend;

  2. contains the responses from testing.data.add_responses(); and

  3. is treated with offline(), so that only stored responses can be returned.

sdmx.testing.specimen(pytestconfig)[source]

Fixture: the SpecimenCollection.

sdmx.testing.test_data_path(pytestconfig)[source]

Fixture: the Path given as --sdmx-test-data.

sdmx.testing.testsource(pytestconfig)[source]

Fixture: the Source.id of a temporary data source.

Code for working with sdmx-test-data.

sdmx.testing.data.DEFAULT_DIR = PosixPath('/home/docs/.cache/sdmx/test-data')[source]

Default directory for local copy of sdmx-test-data.

sdmx.testing.data.EXPECTED = {'flat-json': {'index_col': [0, 1, 2, 3, 4, 5]}, 'ng-flat-xml': {'index_col': [0, 1, 2, 3, 4, 5]}, 'ng-ts-gf-xml': {'use': 'ng-flat-xml'}, 'ng-ts-xml': {'use': 'ng-flat-xml'}, 'ng-xs-xml': {'index_col': [0, 1, 2, 3, 4, 5]}, 'ts-json': {'use': 'flat-json'}, 'xs-json': {'index_col': [0, 1, 2, 3, 4, 5]}}[source]

Expected to_pandas() results for data files; see SpecimenCollection.expected_data()

  • Keys are the file name (see add_specimens()) with ‘.’ -> ‘-’: ‘foo.xml’ -> ‘foo-xml’.

  • Data is stored in sdmx-test-data/expected/KEY.txt.

  • Values are either keyword arguments to pandas.read_csv(), or a dict(use='other-key'), in which case the info for other-key is used instead.
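
The key derivation and the 'use' indirection can be illustrated as follows. This is a sketch under stated assumptions: the mapping is an abbreviated copy of the one shown above, and the helpers expected_key() and resolve() are hypothetical names, not functions provided by sdmx.

```python
from pathlib import Path

# Abbreviated copy of the mapping shown above
EXPECTED = {
    "flat-json": {"index_col": [0, 1, 2, 3, 4, 5]},
    "ts-json": {"use": "flat-json"},
}


def expected_key(path: Path) -> str:
    """Derive the key from a file name: 'foo.xml' -> 'foo-xml'."""
    return path.name.replace(".", "-")


def resolve(key: str) -> tuple[str, dict]:
    """Return (key of the data file, read_csv() arguments), following 'use'."""
    info = EXPECTED[key]
    if "use" in info:
        # Assumption: one level of indirection is enough
        key = info["use"]
        info = EXPECTED[key]
    return key, info


key, kwargs = resolve(expected_key(Path("ts.json")))
print(key)  # flat-json
print(kwargs)  # {'index_col': [0, 1, 2, 3, 4, 5]}
```

With this result, the expected values for ts.json would be read from expected/flat-json.txt using the returned keyword arguments.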

sdmx.testing.data.REMOTE_URL = 'https://github.com/khaeru/sdmx-test-data.git'[source]

Git remote URL for cloning test data.

class sdmx.testing.data.SpecimenCollection(path: Path, fetch: bool)[source]

Bases: object

Collection of test specimens.

Parameters:
  • path – Path containing sdmx-test-data, or to which to clone the repository.

  • fetch – If True, call fetch()

as_params(format: str | None = None, kind: str | None = None, marks: dict | None = None)[source]

Generate pytest.param() from specimens.

One param() is generated for each specimen that matches the format and kind arguments (if any). Marks are attached to each param from marks, wherein the keys are partial paths.

base_path: Path[source]

Base path containing the specimen collection.

expected_data(path)[source]

Return the expected to_pandas() result for the specimen path.

Data is retrieved from EXPECTED.

fetch() → None[source]

Fetch test data from GitHub.

parametrize(metafunc) → None[source]

Handle the parametrize_specimens mark for a specific test.

specimens: list[tuple[Path, str, str | None]][source]

Each tuple contains:

  1. Path to specimen file.

  2. Format: one of “csv”, “json”, “xml”.

  3. Message type: either “data”, “structure”, or None.
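
To make the tuple layout concrete, the sketch below filters such a list by format and message type, in the spirit of the format and kind arguments of as_params(). The specimen paths and the select() helper are hypothetical; the real method additionally wraps each match in pytest.param() and attaches marks.

```python
from pathlib import Path
from typing import Optional

Specimen = tuple[Path, str, Optional[str]]

# Hypothetical entries following the (path, format, message type) layout
specimens: list[Specimen] = [
    (Path("ECB_EXR/ng-flat.xml"), "xml", "data"),
    (Path("ECB_EXR/ng-structure.xml"), "xml", "structure"),
    (Path("ECB_EXR/flat.json"), "json", "data"),
    (Path("other/notes.csv"), "csv", None),
]


def select(
    items: list[Specimen],
    format: Optional[str] = None,
    kind: Optional[str] = None,
) -> list[Specimen]:
    """Keep specimens matching `format` and `kind`; None matches anything."""
    return [
        s
        for s in items
        if (format is None or s[1] == format) and (kind is None or s[2] == kind)
    ]


print([s[0].name for s in select(specimens, format="xml", kind="data")])
# ['ng-flat.xml']
```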

sdmx.testing.data.add_responses(session: CachedSession, file_cache_path: Path, source: Source) → None[source]

Populate cached responses for session.

Two sources are used:

  1. Responses stored in sdmx-test-data/responses/, as indicated by file_cache_path.

  2. For the TEST source as indicated by source, responses generated by this function. These are not stored in sdmx-test-data.

sdmx.testing.data.add_specimens(target: list[tuple[Path, str, str | None]], base: Path) → None[source]

Populate the target collection with specimens from sdmx-test-data.

class sdmx.testing.report.ServiceReporter(config)[source]

Bases: object

Report tests of individual data sources.

pytest_runtest_makereport(item, call)[source]
pytest_sessionfinish(session, exitstatus)[source]

Write results for each source to a separate JSON file.

sdmx.testing.report.main(base_path: Path | None = None)[source]

Collate results from multiple JSON files.

util: Utilities

sdmx.util.compare(attr, a, b, strict: bool) → bool[source]

Return True if a.attr == b.attr.

If strict is False, None is permissible as a or b; otherwise,

sdmx.util.direct_fields(cls) → Iterable[Field][source]

Return the data class fields defined on cls or its class.

This is like the __fields__ attribute, but excludes the fields defined on any parent class(es).
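
The idea of excluding inherited fields can be sketched with the standard dataclasses module. This is an illustration only, not the sdmx implementation; direct_fields_sketch() is a hypothetical name, and it assumes a subclass does not redefine a field of its parent.

```python
from dataclasses import dataclass, fields


def direct_fields_sketch(cls):
    """Return dataclass fields of `cls` that no parent class defines."""
    inherited = set()
    for base in cls.__mro__[1:]:
        # Non-dataclass bases (e.g. object) contribute nothing
        inherited.update(getattr(base, "__dataclass_fields__", {}))
    return [f for f in fields(cls) if f.name not in inherited]


@dataclass
class Parent:
    a: int = 0


@dataclass
class Child(Parent):
    b: str = ""


print([f.name for f in fields(Child)])  # ['a', 'b']
print([f.name for f in direct_fields_sketch(Child)])  # ['b']
```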

sdmx.util.only(iterator: Iterator) → Any[source]

Return the only element of iterator, or None.
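
One way to read this description is sketched below. The function only_sketch() is a hypothetical stand-in, and it assumes that None is returned both for an empty iterator and for one with more than one element.

```python
from typing import Any, Iterator


def only_sketch(iterator: Iterator) -> Any:
    """Return the single element of `iterator`, or None (assumed behaviour)."""
    sentinel = object()
    first = next(iterator, sentinel)
    if first is sentinel:
        return None  # empty iterator
    if next(iterator, sentinel) is not sentinel:
        return None  # more than one element
    return first


print(only_sketch(iter(["a"])))  # a
print(only_sketch(iter([])))  # None
print(only_sketch(iter(["a", "b"])))  # None
```

A sentinel object is used instead of catching StopIteration so that an element that is itself None is still handled predictably.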

sdmx.util.parse_content_type(value: str) → tuple[str, dict[str, Any]][source]

Return content type and parameters from value.

Modified from requests.utils.
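
For orientation, splitting a Content-Type header into a type and its parameters might look like the sketch below. It is not the sdmx code: parse_content_type_sketch() is a hypothetical name, and the handling of quotes and of parameters without a value is an assumption.

```python
from typing import Any


def parse_content_type_sketch(value: str) -> tuple[str, dict[str, Any]]:
    """Split 'type/subtype; key=value; ...' into (type, parameters)."""
    tokens = value.split(";")
    content_type = tokens[0].strip()
    params: dict[str, Any] = {}
    for token in tokens[1:]:
        token = token.strip()
        if not token:
            continue
        key, sep, val = token.partition("=")
        # Assumption: a parameter without '=' is recorded as True
        params[key.strip().lower()] = val.strip().strip("\"'") if sep else True
    return content_type, params


print(parse_content_type_sketch("application/vnd.sdmx.data+json; version=1.0.0"))
# ('application/vnd.sdmx.data+json', {'version': '1.0.0'})
```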

sdmx.util.ucfirst(value: str) → str[source]

Return value with its first character transformed to upper-case.

Utilities for working with requests and related packages.

class sdmx.util.requests.CacheMixin(cache_name: Path | str = 'http_cache', backend: str | BaseCache | None = None, serializer: str | SerializerPipeline | Stage | None = None, expire_after: None | int | float | str | datetime | timedelta = -1, urls_expire_after: Dict[str | Pattern, None | int | float | str | datetime | timedelta] | None = None, cache_control: bool = False, allowable_codes: Iterable[int] = (200,), allowable_methods: Iterable[str] = ('GET', 'HEAD'), always_revalidate: bool = False, ignored_parameters: Iterable[str] = ('Authorization', 'X-API-KEY', 'access_token', 'api_key'), match_headers: Iterable[str] | bool = False, filter_fn: Callable[[Response], bool] | None = None, key_fn: Callable[[...], str] | None = None, stale_if_error: bool | int = False, **kwargs)[source]

Bases: object

Mixin class that extends requests.Session with caching features. See CachedSession for usage details.

cache_disabled()[source]

Context manager for temporarily disabling the cache

Warning

This method is not thread-safe.

Example

>>> from requests_cache import CachedSession
>>> s = CachedSession()
>>> with s.cache_disabled():
...     s.get('https://httpbin.org/ip')

close()[source]

Close the session and any open backend connections

delete(url: str, **kwargs) → OriginalResponse | CachedResponse[source]
property expire_after: None | int | float | str | datetime | timedelta[source]
get(url: str, params=None, **kwargs) → OriginalResponse | CachedResponse[source]
head(url: str, **kwargs) → OriginalResponse | CachedResponse[source]
options(url: str, **kwargs) → OriginalResponse | CachedResponse[source]
patch(url: str, data=None, **kwargs) → OriginalResponse | CachedResponse[source]
post(url: str, data=None, **kwargs) → OriginalResponse | CachedResponse[source]
put(url: str, data=None, **kwargs) → OriginalResponse | CachedResponse[source]
request(method: str, url: str, *args, headers: MutableMapping[str, str] | None = None, expire_after: None | int | float | str | datetime | timedelta = None, only_if_cached: bool = False, refresh: bool = False, force_refresh: bool = False, **kwargs) → OriginalResponse | CachedResponse[source]

This method prepares and sends a request while automatically performing any necessary caching operations. This will be called by any other method-specific requests functions (get, post, etc.). This is not used by PreparedRequest objects, which are handled by send().

See requests.Session.request() for base parameters. Additional parameters:

Parameters:
  • expire_after – Expiration time to set only for this request. See Expiration for details.

  • only_if_cached – Only return results from the cache. If not cached, return a 504 response instead of sending a new request.

  • refresh – Revalidate with the server before using a cached response, and refresh if needed (e.g., a “soft refresh,” like F5 in a browser)

  • force_refresh – Always make a new request, and overwrite any previously cached response (e.g., a “hard refresh”, like Ctrl-F5 in a browser)

Returns:

Either a new or cached response

send(request: PreparedRequest, expire_after: None | int | float | str | datetime | timedelta = None, only_if_cached: bool = False, refresh: bool = False, force_refresh: bool = False, **kwargs) → OriginalResponse | CachedResponse[source]

Send a prepared request, with caching. See requests.Session.send() for base parameters, and see request() for extra parameters.

Order of operations: For reference, a request will pass through the following methods:

  1. requests.get(), CachedSession.get(), etc. (optional)

  2. CachedSession.request()

  3. requests.Session.request()

  4. CachedSession.send()

  5. BaseCache.get_response()

  6. requests.Session.send() (if not using a cached response)

  7. BaseCache.save_response() (if not using a cached response)

property settings: CacheSettings[source]

Settings that affect cache behavior

classmethod wrap(original_session: Session, **kwargs) → CacheMixin[source]

Add caching to an existing Session object, while retaining all original session settings.

Parameters:
  • original_session – Session object to wrap

  • kwargs – Keyword arguments for CachedSession

sdmx.util.requests.HAS_REQUESTS_CACHE = True[source]

True if requests_cache is installed.

class sdmx.util.requests.OfflineAdapter[source]

Bases: BaseAdapter

A request Adapter that raises RuntimeError for every request.

See also

offline

send(request, **kwargs)[source]

Sends PreparedRequest object. Returns Response object.

Parameters:
  • request – The PreparedRequest being sent.

  • stream – (optional) Whether to stream the request content.

  • timeout (float or tuple) – (optional) How long to wait for the server to send data before giving up, as a float, or a (connect timeout, read timeout) tuple.

  • verify – (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use

  • cert – (optional) Any user-provided SSL certificate to be trusted.

  • proxies – (optional) The proxies dictionary to apply to the request.

class sdmx.util.requests.SessionAttrs[source]

Bases: TypedDict

Attributes of requests.Session.

These are not available from requests itself, thus recorded here for use in sdmx.session.Session.__init__().

adapters: dict[source]
auth: object | None[source]
cert: str | tuple[str, str][source]
cookies: http.cookiejar.CookieJar[source]
headers: dict[source]
hooks: dict[source]
max_redirects: int[source]
params: dict[source]
proxies: dict[source]
stream: bool[source]
trust_env: bool[source]
verify: bool[source]
sdmx.util.requests.offline(s) → None[source]

Make session s behave as if offline.

Replace all of the Session.adapters of s with instances of OfflineAdapter. This has the effect that any request made through s that is not handled in some other way (for instance, by requests_cache) will raise RuntimeError.
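
The mechanism described here can be reproduced with plain requests, as in the following minimal sketch. It assumes the requests package is installed; RaisingAdapter and make_offline() are illustrative names and not the sdmx implementation.

```python
import requests
from requests.adapters import BaseAdapter


class RaisingAdapter(BaseAdapter):
    """Adapter that refuses to send anything."""

    def send(self, request, **kwargs):
        raise RuntimeError(f"offline: refusing {request.method} {request.url}")

    def close(self):
        pass


def make_offline(session: requests.Session) -> None:
    """Replace every mounted adapter of `session` with a RaisingAdapter."""
    for prefix in list(session.adapters):
        session.mount(prefix, RaisingAdapter())


session = requests.Session()
make_offline(session)

try:
    session.get("https://example.org/")
except RuntimeError as exc:
    print(f"Blocked: {exc}")
```

Because the exception is raised in the adapter, no connection is attempted; a caching layer that answers before the adapter is reached can still return stored responses.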

sdmx.util.requests.save_response(session: CachedSession, method: str, url: str, content: bytes, headers: dict) → None[source]

Store a response in the cache of session.

Inline TODOs

Todo

Support passing a URN.

(The original entry is located in sdmx/message.py: docstring of sdmx.message.StructureMessage.get, line 7.)

Todo

Store as Annotation or temporary attribute values on obs.

(The original entry is located in sdmx/reader/csv.py: docstring of sdmx.reader.csv.Custom, line 5.)

Todo

Support selection of language for conversion of InternationalString.

(The original entry is located in doc/api/writer.rst, line 80.)

Todo

Currently other functions in writer.xml all pass the style argument to this function. As an enhancement, allow user or automatic selection of different reference styles.

(The original entry is located in sdmx/writer/xml.py: docstring of sdmx.writer.xml.reference, line 3.)