Development¶
This page describes the development of sdmx.
Contributions are welcome!
For current development priorities, see the list of GitHub milestones and issues/PRs targeted to each.
For wishlist features, see issues on GitHub tagged ‘enh’ or ‘wishlist’.
Code style¶
This project uses, via pre-commit:
These must be applied to new or modified code. This can be done manually, or through code editor plug-ins. Pre-commit hooks for git can be installed via:
$ pip install pre-commit
$ pre-commit install
These will ensure that each commit is compliant with the code style.
The pytest.yaml GitHub Actions workflow checks code quality for pull requests and commits. This check must pass for pull requests to be merged.
Write docstrings in the numpydoc style.
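As an illustration of the numpydoc layout, here is a minimal sketch. The function name capitalize_first is hypothetical and chosen only for this example; it is not part of sdmx.

```python
def capitalize_first(value: str) -> str:
    """Return `value` with its first character in upper case.

    Parameters
    ----------
    value : str
        Text to transform. An empty string is returned unchanged.

    Returns
    -------
    str
        A copy of `value` whose first character is upper-cased.

    Examples
    --------
    >>> capitalize_first("dataflow")
    'Dataflow'
    """
    # Slicing (rather than indexing) keeps the empty string safe.
    return value[:1].upper() + value[1:]
```

The section headers (Parameters, Returns, Examples) and their dashed underlines are what numpydoc-aware tools look for when rendering the docstring.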
Testing¶
Specimens and data¶
Added in version 2.0.
A variety of specimens (example files from real web services, or published with the standards) are used to test that sdmx correctly reads and writes the different SDMX message formats.
Specimens are stored in the separate sdmx-test-data repository.
Running the test suite requires these files. The simplest way to obtain them is to give the --sdmx-fetch-data option when invoking pytest:
$ pytest --sdmx-fetch-data
This invokes SpecimenCollection.fetch(), which uses git (via GitPython) to retrieve and unpack the files to a directory like $HOME/.cache/sdmx/test-data/.
See below for more advanced options.
Contents and layout¶
Specimen files are:
- Arranged in directories with names matching particular sources in sources.json.
- Named with:
  - Certain keywords:
    - -structure: a structure message, often associated with a file with a similar name containing a data message.
    - ts: time-series data, i.e. with a TimeDimension at the level of individual Observations.
    - xs: cross-sectional data arranged in other ways.
    - flat: flat DataSets with all Dimensions at the Observation level.
    - ss: structure-specific data messages.
  - In some cases, the query string or data flow/structure ID as the file name.
  - Hyphens (-) instead of underscores (_).
The recorded/ directory contains recorded HTTP responses from certain SDMX-REST web services. These files are stored using the requests_cache file system backend; see those docs for the name and format of the files.
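To make the naming convention concrete, the following sketch splits a specimen file name on hyphens and reports which of the keywords listed above it contains. The helper describe_specimen, the KEYWORDS mapping, and the example paths are hypothetical; the test suite itself may identify specimens differently.

```python
from pathlib import PurePosixPath

# Keywords from the list above, mapped to shortened descriptions.
KEYWORDS = {
    "structure": "structure message",
    "ts": "time-series data",
    "xs": "cross-sectional data",
    "flat": "flat data sets",
    "ss": "structure-specific data",
}


def describe_specimen(path: str) -> list[str]:
    """Return descriptions of the keywords found in a specimen file name."""
    # Use only the file name, without its directory or extension(s).
    stem = PurePosixPath(path).name.split(".")[0]
    # File names use hyphens, so each hyphen-separated part is a candidate.
    return [KEYWORDS[part] for part in stem.split("-") if part in KEYWORDS]


# Hypothetical paths, for illustration only.
print(describe_specimen("SOURCE/example-ts-ss.xml"))
print(describe_specimen("SOURCE/example-structure.xml"))
```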
Custom test data directory¶
It is also possible to place the test data in a specific directory; for instance, in order to commit new files to the specimen collection. Follow these steps:
1. Obtain the files by one of two methods:
   - Clone sdmx-test-data:
     $ git clone git@github.com:khaeru/sdmx-test-data.git
   - Download https://github.com/khaeru/sdmx-test-data/archive/main.zip
2. Indicate where pytest can find the files, by one of two methods:
   - Set the SDMX_TEST_DATA environment variable:
     # Set the variable only for one command
     $ SDMX_TEST_DATA=/path/to/files pytest
     # Export the variable to the environment
     $ export SDMX_TEST_DATA=/path/to/files
     $ pytest
   - Give the option --sdmx-test-data=<PATH> when invoking pytest:
     $ pytest --sdmx-test-data=/path/to/files
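The lookup of the test data directory can be pictured with the sketch below. The function resolve_test_data_dir is hypothetical, and the precedence shown (command-line option first, then environment variable, then the default cache directory) is an assumption made for illustration, not a statement of how the sdmx pytest plugin is implemented.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_test_data_dir(
    option: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Choose a directory for test data (assumed precedence, see above)."""
    environ = os.environ if environ is None else environ
    if option:
        # Value that would come from --sdmx-test-data=<PATH>.
        return Path(option)
    if environ.get("SDMX_TEST_DATA"):
        # Value of the SDMX_TEST_DATA environment variable.
        return Path(environ["SDMX_TEST_DATA"])
    # Fall back to a directory like $HOME/.cache/sdmx/test-data/.
    return Path.home() / ".cache" / "sdmx" / "test-data"


print(resolve_test_data_dir(None, {"SDMX_TEST_DATA": "/path/to/files"}))
```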
Network vs. offline tests¶
Tests related to particular SDMX-REST web services can be categorized as:
1. Ensuring sdmx can interact with the service as-is.
   These include the full matrix of source-endpoint tests, which run on a nightly schedule because they are slow. They also include other tests (for instance, of code snippets appearing in this documentation) marked with the custom pytest mark @pytest.mark.network that make actual network requests. These tests may appear ‘flaky’: they are vulnerable to network interruptions, or temporary downtime/incapacity of the targeted service(s).
2. Ensuring sdmx can handle certain SDMX messages or HTTP responses returned by services. This should remain true whether or not those services actually return the same content as they did at the moment the tests were written.
   These are handled using recorded responses, as described above. This makes the test outcomes deterministic, even if the services are periodically unavailable.
   These tests use session_with_stored_responses(), which is an in-memory CachedSession prepared using:
   - The recorded/stored responses from sdmx-test-data.
   - Other responses generated by add_responses() / save_response().
   - offline() / OfflineAdapter. This ensures that only the cached URLs/requests can be queried; all other queries raise RuntimeError.
Releasing¶
Before releasing, check:
https://github.com/khaeru/sdmx/actions?query=workflow:test+branch:main to ensure that the push and scheduled builds are passing.
https://readthedocs.org/projects/sdmx1/builds/ to ensure that the docs build is passing.
Address any failures before releasing.
1. Create a new branch:
   $ git checkout -b release/X.Y.Z
2. Edit doc/whatsnew.rst. Comment the heading “Next release”, then insert another heading below it, at the same level, with the version number and date.
   Make a commit with a message like “Mark vX.Y.Z in doc/whatsnew”.
3. Tag the version as a release candidate, i.e. with an rcN suffix, and push:
   $ git tag vX.Y.ZrcN
   $ git push --tags --set-upstream origin release/X.Y.Z
4. Open a pull request with the title “Release vX.Y.Z” using this branch. Check:
   - at https://github.com/khaeru/sdmx/actions?query=workflow:publish that the workflow completes: the package builds successfully and is published to TestPyPI.
   - at https://test.pypi.org/project/sdmx1/ that:
     - The package can be downloaded, installed and run.
     - The README is rendered correctly.
5. If needed, address any warnings or errors that appear and then continue from step (3), i.e. make (a) new commit(s) and tag, incrementing the release candidate number, e.g. from rc1 to rc2.
6. Merge the PR using the “rebase and merge” method.
7. (optional) Tag the release itself and push:
   $ git tag vX.Y.Z
   $ git push --tags origin main
   This step (but not step (3)) can also be performed directly on GitHub; see (8), next.
8. Visit https://github.com/khaeru/sdmx/releases and mark the new release: either using the pushed tag from (7), or by creating the tag and release simultaneously.
9. Check at https://github.com/khaeru/sdmx/actions?query=workflow:publish and https://pypi.org/project/sdmx1/ that the distributions are published.
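Incrementing the release candidate number when a candidate needs to be re-tagged can be sketched as follows. The helper next_rc_tag is hypothetical and not part of the project's tooling; it only assumes the vX.Y.ZrcN tag pattern used in the release steps.

```python
import re

# Matches tags of the form vX.Y.ZrcN used for release candidates.
_RC_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)rc(\d+)$")


def next_rc_tag(tag: str) -> str:
    """Return the tag for the next release candidate of the same version."""
    match = _RC_TAG.match(tag)
    if match is None:
        raise ValueError(f"not a release candidate tag: {tag!r}")
    major, minor, patch, rc = match.groups()
    # Only the release candidate number changes; the version stays the same.
    return f"v{major}.{minor}.{patch}rc{int(rc) + 1}"


print(next_rc_tag("v2.1.0rc1"))
```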
Internal code reference¶
testing: Testing utilities¶
- class sdmx.testing.MessageTest[source]¶
Bases: object
Base class for tests of specific specimen files.
- sdmx.testing.XFAIL = {'unsupported': MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={'strict': True, 'reason': 'Not implemented by service', 'raises': (<class 'requests.exceptions.HTTPError'>, <class 'NotImplementedError'>, <class 'ValueError'>)})), 503: MarkDecorator(mark=Mark(name='xfail', args=(), kwargs={'raises': <class 'requests.exceptions.HTTPError'>, 'reason': '503 Server Error: Service Unavailable'}))}[source]¶
Marks for use below.
- sdmx.testing.generate_endpoint_tests(metafunc)[source]¶
pytest hook for parametrizing tests that need an “endpoint” fixture.
This function relies on the DataSourceTest base class defined in test_sources. It:
- Generates one parametrization for every Resource (= REST API endpoint).
- Applies pytest “xfail” (expected failure) marks according to:
  - Source.supports, i.e. if the particular source is marked as not supporting certain endpoints, the test is expected to fail.
  - DataSourceTest.xfail, any other failures defined on the source test class (e.g. DataSourceTest subclass).
  - DataSourceTest.xfail_common, common failures.
- sdmx.testing.installed_schemas(worker_id, mock_gh_api, tmp_path_factory)[source]¶
Fixture that ensures schemas are installed locally in a temporary directory.
- sdmx.testing.mock_gh_api()[source]¶
Mock GitHub API responses to avoid hitting rate limits.
For each API endpoint URL queried by _gh_zipball(), return a pared-down JSON response that contains the required “zipball_url” key.
- sdmx.testing.pytest_generate_tests(metafunc)[source]¶
Generate tests.
Calls both parametrize_specimens() and generate_endpoint_tests().
- sdmx.testing.pytest_sessionstart(session: Session) None [source]¶
Create session-wide objects.
These are used by the fixtures specimen(), testsource().
- sdmx.testing.session_with_pytest_cache(pytestconfig)[source]¶
Fixture: A Session that caches within .pytest_cache.
This subdirectory is ephemeral, and tests must pass whether or not it exists and is populated.
- sdmx.testing.session_with_stored_responses(pytestconfig)[source]¶
Fixture: A Session that returns only stored responses from sdmx-test-data.
This session…
- uses the ‘memory’ requests_cache backend;
- contains the responses from testing.data.add_responses(); and
- is treated with offline(), so that only stored responses can be returned.
Code for working with sdmx-test-data.
- sdmx.testing.data.DEFAULT_DIR = PosixPath('/home/docs/.cache/sdmx/test-data')[source]¶
Default directory for local copy of sdmx-test-data.
- sdmx.testing.data.EXPECTED = {'flat-json': {'index_col': [0, 1, 2, 3, 4, 5]}, 'ng-flat-xml': {'index_col': [0, 1, 2, 3, 4, 5]}, 'ng-ts-gf-xml': {'use': 'ng-flat-xml'}, 'ng-ts-xml': {'use': 'ng-flat-xml'}, 'ng-xs-xml': {'index_col': [0, 1, 2, 3, 4, 5]}, 'ts-json': {'use': 'flat-json'}, 'xs-json': {'index_col': [0, 1, 2, 3, 4, 5]}}[source]¶
Expected to_pandas() results for data files; see SpecimenCollection.expected_data().
- Keys are the file name (see add_specimens()) with ‘.’ -> ‘-’: ‘foo.xml’ -> ‘foo-xml’.
- Data is stored in sdmx-test-data/expected/KEY.txt.
- Values are either arguments to pandas.read_csv(); or a dict(use='other-key'), in which case the info for other-key is used instead.
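The key derivation and the ‘use’ indirection can be sketched as below. The names EXPECTED_SAMPLE, specimen_key and lookup_expected are hypothetical; the sample mapping copies two entries from EXPECTED. That the data file of the referenced key is also the one read is an assumption of this sketch.

```python
from typing import Any, Optional

# Two entries copied from EXPECTED, for illustration.
EXPECTED_SAMPLE: dict[str, dict[str, Any]] = {
    "flat-json": {"index_col": [0, 1, 2, 3, 4, 5]},
    "ts-json": {"use": "flat-json"},
}


def specimen_key(file_name: str) -> str:
    """Derive the key from a file name: 'foo.xml' -> 'foo-xml'."""
    return file_name.replace(".", "-")


def lookup_expected(
    file_name: str, table: dict[str, dict[str, Any]] = EXPECTED_SAMPLE
) -> Optional[tuple[str, dict[str, Any]]]:
    """Return (key, read_csv arguments) for a specimen, or None if not listed."""
    key = specimen_key(file_name)
    info = table.get(key)
    if info is None:
        return None
    if "use" in info:
        # Follow the reference: the info for the other key is used instead.
        key = info["use"]
        info = table[key]
    return key, info


print(lookup_expected("ts.json"))
```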
- sdmx.testing.data.REMOTE_URL = 'https://github.com/khaeru/sdmx-test-data.git'[source]¶
Git remote URL for cloning test data.
- class sdmx.testing.data.SpecimenCollection(path: Path, fetch: bool)[source]¶
Bases: object
Collection of test specimens.
- Parameters:
- as_params(format: str | None = None, kind: str | None = None, marks: dict | None = None)[source]¶
Generate pytest.param() from specimens.
One param() is generated for each specimen that matches the format and kind arguments (if any). Marks are attached to each param from marks, wherein the keys are partial paths.
- expected_data(path)[source]¶
Return the expected to_pandas() result for the specimen path.
Data is retrieved from EXPECTED.
- sdmx.testing.data.add_responses(session: CachedSession, file_cache_path: Path, source: Source) None [source]¶
Populate cached responses for session.
Two sources are used:
- Responses stored in sdmx-test-data/responses/, as indicated by file_cache_path.
- For the TEST source as indicated by source, responses generated by this function. These are not stored in sdmx-test-data.
- sdmx.testing.data.add_specimens(target: list[tuple[Path, str, str | None]], base: Path) None [source]¶
Populate the target collection with specimens from sdmx-test-data.
util: Utilities¶
- sdmx.util.compare(attr, a, b, strict: bool) bool [source]¶
Return True if a.attr == b.attr.
If strict is False, None is permissible as a or b; otherwise,
- sdmx.util.direct_fields(cls) Iterable[Field] [source]¶
Return the data class fields defined on cls or its class.
This is like the __fields__ attribute, but excludes the fields defined on any parent class(es).
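The idea of excluding inherited fields can be sketched with the standard dataclasses module. The function own_fields and the classes Parent and Child are hypothetical; this is not the implementation of direct_fields(), and it does not handle a subclass that redefines a field of its parent.

```python
from dataclasses import dataclass, fields, is_dataclass


def own_fields(cls):
    """Return the dataclass fields of `cls` that no parent class defines."""
    inherited = set()
    # Skip cls itself (first entry of the MRO) and collect parent field names.
    for base in cls.__mro__[1:]:
        if is_dataclass(base):
            inherited.update(f.name for f in fields(base))
    return [f for f in fields(cls) if f.name not in inherited]


@dataclass
class Parent:
    id: str = ""


@dataclass
class Child(Parent):
    name: str = ""


print([f.name for f in own_fields(Child)])
```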
- sdmx.util.parse_content_type(value: str) tuple[str, dict[str, Any]] [source]¶
Return content type and parameters from value.
Modified from requests.utils.
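A content type header can be split into its type and parameters roughly as follows. The function split_content_type is a hypothetical, simplified sketch and not the code of parse_content_type(); for instance, it does not handle semicolons inside quoted parameter values.

```python
from typing import Any


def split_content_type(value: str) -> tuple[str, dict[str, Any]]:
    """Split a Content-Type header value into (type, parameters)."""
    content_type, *items = value.split(";")
    params: dict[str, Any] = {}
    for item in items:
        item = item.strip()
        if not item:
            continue  # Tolerate a trailing semicolon.
        key, sep, val = item.partition("=")
        # A parameter without "=" is recorded as a flag.
        params[key.strip().lower()] = val.strip().strip("\"'") if sep else True
    return content_type.strip(), params


print(split_content_type("application/vnd.sdmx.genericdata+xml; version=2.1"))
```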
- sdmx.util.ucfirst(value: str) str [source]¶
Return value with its first character transformed to upper-case.
Utilities for working with requests and related packages.
- class sdmx.util.requests.CacheMixin(cache_name: Path | str = 'http_cache', backend: str | BaseCache | None = None, serializer: str | SerializerPipeline | Stage | None = None, expire_after: None | int | float | str | datetime | timedelta = -1, urls_expire_after: Dict[str | Pattern, None | int | float | str | datetime | timedelta] | None = None, cache_control: bool = False, allowable_codes: Iterable[int] = (200,), allowable_methods: Iterable[str] = ('GET', 'HEAD'), always_revalidate: bool = False, ignored_parameters: Iterable[str] = ('Authorization', 'X-API-KEY', 'access_token', 'api_key'), match_headers: Iterable[str] | bool = False, filter_fn: Callable[[Response], bool] | None = None, key_fn: Callable[[...], str] | None = None, stale_if_error: bool | int = False, **kwargs)[source]¶
Bases: object
Mixin class that extends requests.Session with caching features. See CachedSession for usage details.
- cache_disabled()[source]¶
Context manager for temporarily disabling the cache
Warning
This method is not thread-safe.
Example
>>> s = CachedSession()
>>> with s.cache_disabled():
...     s.get('https://httpbin.org/ip')
- delete(url: str, **kwargs) OriginalResponse | CachedResponse [source]¶
- get(url: str, params=None, **kwargs) OriginalResponse | CachedResponse [source]¶
- head(url: str, **kwargs) OriginalResponse | CachedResponse [source]¶
- options(url: str, **kwargs) OriginalResponse | CachedResponse [source]¶
- patch(url: str, data=None, **kwargs) OriginalResponse | CachedResponse [source]¶
- post(url: str, data=None, **kwargs) OriginalResponse | CachedResponse [source]¶
- put(url: str, data=None, **kwargs) OriginalResponse | CachedResponse [source]¶
- request(method: str, url: str, *args, headers: MutableMapping[str, str] | None = None, expire_after: None | int | float | str | datetime | timedelta = None, only_if_cached: bool = False, refresh: bool = False, force_refresh: bool = False, **kwargs) OriginalResponse | CachedResponse [source]¶
This method prepares and sends a request while automatically performing any necessary caching operations. This will be called by any other method-specific requests functions (get, post, etc.). This is not used by PreparedRequest objects, which are handled by send().
See requests.Session.request() for base parameters. Additional parameters:
- Parameters:
expire_after – Expiration time to set only for this request. See Expiration for details.
only_if_cached – Only return results from the cache. If not cached, return a 504 response instead of sending a new request.
refresh – Revalidate with the server before using a cached response, and refresh if needed (e.g., a “soft refresh,” like F5 in a browser)
force_refresh – Always make a new request, and overwrite any previously cached response (e.g., a “hard refresh”, like Ctrl-F5 in a browser)
- Returns:
Either a new or cached response
- send(request: PreparedRequest, expire_after: None | int | float | str | datetime | timedelta = None, only_if_cached: bool = False, refresh: bool = False, force_refresh: bool = False, **kwargs) OriginalResponse | CachedResponse [source]¶
Send a prepared request, with caching. See requests.Session.send() for base parameters, and see request() for extra parameters.
Order of operations: For reference, a request will pass through the following methods:
1. requests.get(), CachedSession.get(), etc. (optional)
2. CachedSession.request()
3. CachedSession.send()
4. BaseCache.get_response()
5. requests.Session.send() (if not using a cached response)
6. BaseCache.save_response() (if not using a cached response)
- property settings: CacheSettings[source]¶
Settings that affect cache behavior
- classmethod wrap(original_session: Session, **kwargs) CacheMixin [source]¶
Add caching to an existing Session object, while retaining all original session settings.
- Parameters:
original_session – Session object to wrap
kwargs – Keyword arguments for CachedSession
- class sdmx.util.requests.OfflineAdapter[source]¶
Bases: BaseAdapter
A request Adapter that raises RuntimeError for every request.
See also
- send(request, **kwargs)[source]¶
Sends PreparedRequest object. Returns Response object.
- Parameters:
request – The PreparedRequest being sent.
stream – (optional) Whether to stream the request content.
timeout (float or tuple) – (optional) How long to wait for the server to send data before giving up, as a float, or a (connect timeout, read timeout) tuple.
verify – (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use
cert – (optional) Any user-provided SSL certificate to be trusted.
proxies – (optional) The proxies dictionary to apply to the request.
- class sdmx.util.requests.SessionAttrs[source]¶
Bases: TypedDict
Attributes of requests.Session.
These are not available from requests itself, thus recorded here for use in sdmx.session.Session.__init__().
- cookies: http.cookiejar.CookieJar[source]¶
- sdmx.util.requests.offline(s) None [source]¶
Make session s behave as if offline.
Replace all of the Session.adapters of s with instances of OfflineAdapter. This has the effect that any request made through s that is not handled in some other way (for instance, by requests_cache) will raise RuntimeError.
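The adapter-replacement technique described above can be sketched with plain requests. The names RaisingAdapter and make_offline are hypothetical stand-ins, not the sdmx classes; unlike offline(), make_offline returns the session for convenience. No network request is made, because the adapter raises before any connection is attempted.

```python
import requests
from requests.adapters import BaseAdapter


class RaisingAdapter(BaseAdapter):
    """Hypothetical adapter that refuses every request."""

    def send(self, request, **kwargs):
        raise RuntimeError(f"offline; cannot query {request.url}")

    def close(self):
        pass  # Nothing to release.


def make_offline(session: requests.Session) -> requests.Session:
    """Replace every mounted adapter of `session` with a RaisingAdapter."""
    # Copy the prefixes first, because mount() modifies session.adapters.
    for prefix in list(session.adapters):
        session.mount(prefix, RaisingAdapter())
    return session


session = make_offline(requests.Session())
try:
    session.get("https://example.org/data")
except RuntimeError as exc:
    print(exc)
```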
Inline TODOs¶
Todo
Support passing a URN.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/sdmx1/envs/stable/lib/python3.13/site-packages/sdmx/message.py:docstring of sdmx.message.StructureMessage.get, line 7.)
Todo
Store as Annotation or temporary attribute values on obs.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/sdmx1/envs/stable/lib/python3.13/site-packages/sdmx/reader/csv.py:docstring of sdmx.reader.csv.Custom, line 5.)
Todo
Support selection of language for conversion of InternationalString.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/sdmx1/checkouts/stable/doc/api/writer.rst, line 80.)
Todo
Currently other functions in writer.xml all pass the style argument to this function. As an enhancement, allow user or automatic selection of different reference styles.
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/sdmx1/envs/stable/lib/python3.13/site-packages/sdmx/writer/xml.py:docstring of sdmx.writer.xml.reference, line 3.)