Ska Package Helpers

Ska_helpers is a collection of utilities for the Ska3 runtime environment.

ska_helpers.test(*args, **kwargs)[source]

Run py.test unit tests.

Chandra Models Data

Get data from chandra_models repository.

ska_helpers.chandra_models.chandra_models_cache(func)[source]

Decorator to cache outputs for a function that gets chandra_models data.

The key used for caching the function output includes the passed arguments and keyword arguments, as well as the values of the environment variables below. This ensures that the cache is invalidated if any of these environment variables change:

  • CHANDRA_MODELS_REPO_DIR

  • CHANDRA_MODELS_DEFAULT_VERSION

  • THERMAL_MODELS_DIR_FOR_MATLAB_TOOLS_SW

Example:

@chandra_models_cache
def get_aca_spec_info(version=None):
    _, info = get_data("chandra_models/xija/aca/aca_spec.json", version=version)
    return info
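
For illustration, a hedged sketch of how the cache interacts with the environment variables above, using the get_aca_spec_info function from the example and the temp_env_var helper documented below (the version "3.30" is illustrative):

from ska_helpers.utils import temp_env_var

info_1 = get_aca_spec_info()  # first call: data is read from the repository
info_2 = get_aca_spec_info()  # same arguments and environment: served from the cache
with temp_env_var("CHANDRA_MODELS_DEFAULT_VERSION", "3.30"):
    info_3 = get_aca_spec_info()  # environment changed: new cache key, data re-read
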
ska_helpers.chandra_models.get_data(file_path: str | Path, version: str | None = None, repo_path: str | Path | None = None, require_latest_version: bool = False, timeout: int | float = 5, read_func: Callable | None = None, read_func_kwargs: dict | None = None) → tuple[source]

Get data from chandra_models repository.

There are three environment variables that impact the behavior:

  • CHANDRA_MODELS_REPO_DIR or THERMAL_MODELS_DIR_FOR_MATLAB_TOOLS_SW: override the default root for the chandra_models repository

  • CHANDRA_MODELS_DEFAULT_VERSION: override the default repo version. You can set this to a fixed version in unit tests (e.g. with monkeypatch; see the sketch below), or set it to a development branch to test a model file update with applications like yoshi, where specifying a version explicitly would require a long chain of API updates.

THERMAL_MODELS_DIR_FOR_MATLAB_TOOLS_SW defines the chandra_models repository location when running in the MATLAB tools software environment. If this environment variable is set, the git is_dirty() check of the chandra_models directory is skipped, since the chandra_models repository is verified via SVN in that environment. Users of the FOT MATLAB tools should exercise caution when testing with locally-modified files, as the version information reported by this function will then not be correct.
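
As a hedged sketch of the unit-test use case mentioned above (the test name, pinned version, and assertion are illustrative):

import json

from ska_helpers import chandra_models

def test_aca_spec_with_pinned_version(monkeypatch):
    # Pin the chandra_models version for this test only
    monkeypatch.setenv("CHANDRA_MODELS_DEFAULT_VERSION", "3.30")
    txt, info = chandra_models.get_data("chandra_models/xija/aca/aca_spec.json")
    model_spec = json.loads(txt)
    assert model_spec["name"] == "aacccdpt"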

Parameters:
file_path : str, Path

Name of model data file (e.g. "chandra_models/xija/aca/aca_spec.json")

version : str

Tag, branch or commit of chandra_models to use (default=latest tag from repo). If the CHANDRA_MODELS_DEFAULT_VERSION environment variable is set then this is used as the default. This is useful for testing.

repo_path : str, Path

Path to directory or URL containing chandra_models repository (default is $SKA/data/chandra_models or either of the CHANDRA_MODELS_REPO_DIR or THERMAL_MODELS_DIR_FOR_MATLAB_TOOLS_SW environment variables if set).

require_latest_version : bool

Require that version matches the latest release on GitHub

timeout : int, float

Timeout (sec) for querying GitHub for the expected chandra_models version. Default = 5 sec.

read_func : callable

Optional function to read the data file. This function must take the file path as its first argument. If not provided then read the file as a text file.

read_func_kwargs : dict

Optional dict of kwargs to pass to read_func.

Returns:
tuple of dict, str

Xija model specification dict, chandra_models version

Examples

First we read the model specification for the ACA model. The get_data() function returns the text of the model spec so we need to use json.loads() to convert it to a dict.

>>> import json
>>> from astropy.io import fits
>>> from ska_helpers import chandra_models

>>> txt, info = chandra_models.get_data("chandra_models/xija/aca/aca_spec.json")
>>> model_spec = json.loads(txt)
>>> model_spec["name"]
'aacccdpt'

Next we read the acquisition probability model image. Since the image is a gzipped FITS file we need to use a helper function to read it.

>>> def read_fits_image(file_path):
...     with fits.open(file_path) as hdus:
...         out = hdus[1].data
...     return out, file_path
...
>>> acq_model_image, info = chandra_models.get_data(
...     "chandra_models/aca_acq_prob/grid-floor-2018-11.fits.gz",
...     read_func=read_fits_image
... )
>>> acq_model_image.shape
(141, 31, 7)

Now let’s get the version of the chandra_models repository:

>>> chandra_models.get_repo_version()
'3.47'

Finally get version 3.30 of the ACA model spec from GitHub. The use of a lambda function to read the JSON file is compact but not recommended for production code.

>>> model_spec_3_30, info = chandra_models.get_data(
...     "chandra_models/xija/aca/aca_spec.json",
...     version="3.30",
...     repo_path="https://github.com/sot/chandra_models.git",
...     read_func=lambda fn: (json.load(open(fn, "rb")), fn),
... )
>>> model_spec_3_30 == model_spec
False
ska_helpers.chandra_models.get_github_version(url: str = 'https://api.github.com/repos/sot/chandra_models/releases/latest', timeout: int | float = 5) → str | None[source]

Get latest chandra_models GitHub repo release tag (version).

This queries GitHub for the latest release of chandra_models.

Parameters:
urlstr

URL for latest chandra_models release on GitHub API

timeoutint, float

Request timeout (sec, default=5)

Returns:
str, None

Tag name (str) of latest chandra_models release on GitHub. None if the request timed out, indicating indeterminate answer.
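
A brief hedged usage sketch (the timeout value is illustrative):

from ska_helpers import chandra_models

latest = chandra_models.get_github_version(timeout=10)
if latest is None:
    print("GitHub query timed out; latest release version is indeterminate")
else:
    print(f"Latest chandra_models release: {latest}")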

ska_helpers.chandra_models.get_repo_version(repo_path: Path | None = None, repo: Repo | None = None) → str[source]

Return version (most recent tag) of models repository.

Returns:
str

Version (most recent tag) of models repository

Environment

The ska_helpers.environment module provides a function to configure the Ska3 runtime environment at the point of import of every Ska3 package.

ska_helpers.environment.configure_ska_environment()[source]

Configure environment for Ska3 runtime.

This is called by ska_helpers.version.get_version() and thus gets called upon import of every Ska3 package.

This includes setting NUMBA_CACHE_DIR to $HOME/.ska3/cache/numba if that env var is not already defined. This is to avoid problems with read-only filesystems.
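
A minimal sketch illustrating the NUMBA_CACHE_DIR behavior described above (the printed value depends on your environment):

import os

from ska_helpers.environment import configure_ska_environment

configure_ska_environment()
# Set to $HOME/.ska3/cache/numba only if it was not already defined
print(os.environ.get("NUMBA_CACHE_DIR"))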

Git helpers

Helper functions for using git.

ska_helpers.git_helpers.make_git_repo_safe(path: str | Path) → None[source]

Ensure git repo at path is a safe git repository.

A “safe” repo is one which is owned by the user calling this function. See: https://github.blog/2022-04-12-git-security-vulnerability-announced/#cve-2022-24765

If an unsafe repo is detected then this command issues a warning to that effect and then updates the user’s git config to add this repo as a safe directory.

This function should only be called for known safe git repos such as $SKA/data/chandra_models.

Parameters:

path – str, Path. Path to the top level of a git repository.
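
A hedged usage sketch, assuming $SKA is set and points at a standard Ska3 data root:

import os
from pathlib import Path

from ska_helpers.git_helpers import make_git_repo_safe

# Only call this for known safe repositories, per the note above
make_git_repo_safe(Path(os.environ["SKA"]) / "data" / "chandra_models")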

Logging

ska_helpers.logging.basic_logger(name, format='%(asctime)s %(funcName)s: %(message)s', propagate=False, **kwargs)[source]

Create logger name using logging.basicConfig.

This is a thin wrapper around logging.basicConfig, except:

  • Uses logger named name instead of the root logger

  • Defaults to a standard format for Ska applications. Specify format=None to use the default basicConfig format.

  • Not recommended for multithreaded or multiprocess applications due to using a temporary monkey-patch of a global variable to create the logger. It will probably work but it is not guaranteed.

This function does nothing if the name logger already has handlers configured, unless the keyword argument force is set to True. It is a convenience method intended to do one-shot creation of a logger.

The default behaviour is to create a StreamHandler which writes to sys.stderr, set a formatter using the format string "%(asctime)s %(funcName)s: %(message)s", and add the handler to the name logger with a level of WARNING.

By default the created logger will not propagate to parent loggers. This is to prevent unexpected logging from other packages that set up a root logger. To propagate to parent loggers, set propagate=True. See https://docs.python.org/3/howto/logging.html#logging-flow, in particular how the log level of parent loggers is ignored in message handling.

Example:

# In __init__.py for a package or in any module
from ska_helpers.logging import basic_logger
logger = basic_logger(__name__, level='INFO')

# In other submodules within a package the normal usage is to inherit
# the package logger.
import logging
logger = logging.getLogger(__name__)

A number of optional keyword arguments may be specified, which can alter the default behaviour.

filename

Specifies that a FileHandler be created, using the specified filename, rather than a StreamHandler.

filemode

Specifies the mode to open the file, if filename is specified (if filemode is unspecified, it defaults to ‘a’).

format

Use the specified format string for the handler.

datefmt

Use the specified date/time format.

style

If a format string is specified, use this to specify the type of format string (possible values ‘%’, ‘{’, ‘$’, for %-formatting, str.format() and string.Template - defaults to ‘%’).

level

Set the name logger level to the specified level. This can be a number (10, 20, …) or a string (‘NOTSET’, ‘DEBUG’, ‘INFO’, ‘WARNING’, ‘ERROR’, ‘CRITICAL’) or logging.DEBUG, etc.

stream

Use the specified stream to initialize the StreamHandler. Note that this argument is incompatible with ‘filename’ - if both are present, ‘stream’ is ignored.

handlers

If specified, this should be an iterable of already created handlers, which will be added to the name logger. Any handler in the list which does not have a formatter assigned will be assigned the formatter created in this function.

force

If this keyword is specified as true, any existing handlers attached to the name logger are removed and closed, before carrying out the configuration as specified by the other arguments.

Note that you could specify a stream created using open(filename, mode) rather than passing the filename and mode in. However, it should be remembered that StreamHandler does not close its stream (since it may be using sys.stdout or sys.stderr), whereas FileHandler closes its stream when the handler is closed.

Note this function is probably not thread-safe.

Parameters:
name : str

logger name

format : str

format string for handler

propagate: bool

propagate to parent loggers (default=False)

**kwargs : dict

other keyword arguments for logging.basicConfig

Returns:
logger : Logger object
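
A further hedged sketch of the keyword pass-through described above (the logger name and file name are illustrative):

from ska_helpers.logging import basic_logger

logger = basic_logger(
    "my_package",
    filename="my_package.log",  # create a FileHandler instead of a StreamHandler
    level="INFO",
    force=True,  # replace any handlers already attached to this logger
)
logger.info("run started")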

Retry

Retry package initially copied from https://github.com/invl/retry.

That project appears to be abandoned, so the code has been moved into ska_helpers.

LICENSE:

Copyright 2014 invl

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
exception ska_helpers.retry.RetryError(failures)[source]

Keep track of the stack of exceptions when trying multiple times.

Parameters:
failures : list of dict, each with keys ‘type’, ‘value’, ‘trace’.
ska_helpers.retry.retry(exceptions=<class 'Exception'>, tries=-1, delay=0, max_delay=None, backoff=1, jitter=0, logger=<Logger ska_helpers.retry.api (WARNING)>, mangle_alert_words=False)[source]

Returns a retry decorator.

Parameters:
  • exceptions – an exception or a tuple of exceptions to catch. default: Exception.

  • tries – the maximum number of attempts. default: -1 (infinite).

  • delay – initial delay between attempts. default: 0.

  • max_delay – the maximum value of delay. default: None (no limit).

  • backoff – multiplier applied to delay between attempts. default: 1 (no backoff).

  • jitter – extra seconds added to delay between attempts. default: 0. fixed if a number, random if a range tuple (min, max)

  • logger – logger.warning(fmt, error, delay) will be called on failed attempts. default: retry.logging_logger. if None, logging is disabled.

  • mangle_alert_words – if True, mangle alert words “warning”, “error”, “fatal”, “exception” when issuing a logger warning message. Default: False.

Returns:

a retry decorator.
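
A hedged usage sketch of the decorator (the wrapped function and choice of exception are illustrative):

from ska_helpers.retry import retry

@retry(exceptions=OSError, tries=4, delay=1, backoff=2, jitter=(0, 1))
def read_flaky_file(path):
    # A transient OSError (e.g. from a busy network filesystem) triggers a retry
    with open(path) as fh:
        return fh.read()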

ska_helpers.retry.retry_call(f, args=None, kwargs=None, exceptions=<class 'Exception'>, tries=-1, delay=0, max_delay=None, backoff=1, jitter=0, logger=<Logger ska_helpers.retry.api (WARNING)>, mangle_alert_words=False)[source]

Calls a function and re-executes it if it failed.

Parameters:
  • f – the function to execute.

  • args – the positional arguments of the function to execute.

  • kwargs – the named arguments of the function to execute.

  • exceptions – an exception or a tuple of exceptions to catch. default: Exception.

  • tries – the maximum number of attempts. default: -1 (infinite).

  • delay – initial delay between attempts. default: 0.

  • max_delay – the maximum value of delay. default: None (no limit).

  • backoff – multiplier applied to delay between attempts. default: 1 (no backoff).

  • jitter – extra seconds added to delay between attempts. default: 0. fixed if a number, random if a range tuple (min, max)

  • logger – logger.warning(fmt, error, delay) will be called on failed attempts. default: retry.logging_logger. if None, logging is disabled.

  • mangle_alert_words – if True, mangle alert words “warning”, “error”, “fatal”, “exception”, “fail” when issuing a logger warning message. Default: False.

Returns:

the result of the f function.
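
The equivalent call-style sketch using retry_call (names are again illustrative):

from ska_helpers.retry import retry_call

text = retry_call(
    read_flaky_file,  # hypothetical function that may fail transiently
    args=["/data/archive/obs.dat"],
    exceptions=OSError,
    tries=3,
    delay=2,
    backoff=2,
)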

ska_helpers.retry.tables_open_file(*args, **kwargs)[source]

Call tables.open_file(*args, **kwargs) with retry up to 3 times.

This only catches tables.exceptions.HDF5ExtError. After an initial failure it will try again after 2 seconds and once more after 4 seconds.

Parameters:
  • *args

    args passed through to tables.open_file()

  • mangle_alert_words – (keyword-only) if True, mangle alert words “warning”, “error”, “fatal”, “exception”, “fail” when issuing a logger warning message. Default: True.

  • retry_delay – (keyword-only) initial delay between attempts. default: 2.

  • retry_tries – (keyword-only) the maximum number of attempts. default: 3.

  • retry_backoff – (keyword-only) multiplier applied to delay between attempts. default: 2.

  • retry_logger – (keyword-only) logger.warning(msg) will be called.

  • **kwargs

    additional kwargs passed through to tables.open_file()

Returns:

tables file handle
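
A hedged usage sketch (the file name and node access are illustrative; assumes the tables package is installed):

from ska_helpers.retry import tables_open_file

h5 = tables_open_file("/data/archive/telemetry.h5", mode="r")
try:
    node_names = [node._v_name for node in h5.root]
finally:
    h5.close()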

Setup Helpers

ska_helpers.setup_helper.duplicate_package_info(vals, name_in, name_out)[source]

Duplicate a list or dict of values in place, replacing name_in with name_out.

Normally used in setup.py for making a namespace package that copies a flat one. For an example see setup.py in the ska_sun or Ska.Sun repo.

Parameters:
  • vals – list or dict of values

  • name_in – string to replace at start of each value

  • name_out – output string
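
A hedged sketch of the setup.py use case described above, assuming the function appends a copy of each entry with name_in replaced by name_out:

from ska_helpers.setup_helper import duplicate_package_info

name_in, name_out = "ska_sun", "Ska.Sun"
packages = ["ska_sun", "ska_sun.tests"]
package_dir = {"ska_sun": "ska_sun"}

duplicate_package_info(packages, name_in, name_out)
duplicate_package_info(package_dir, name_in, name_out)
# packages is expected to also contain "Ska.Sun" and "Ska.Sun.tests",
# and package_dir to gain a corresponding "Ska.Sun" key.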

Utilities

class ska_helpers.utils.LRUDict(capacity=128)[source]

Dict that maintains a fixed capacity and evicts least recently used item when full.

Inherits from collections.OrderedDict to maintain the order of insertion.
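
A minimal sketch of the eviction behavior, assuming standard dict item assignment:

from ska_helpers.utils import LRUDict

cache = LRUDict(capacity=2)
cache["a"] = 1
cache["b"] = 2
cache["c"] = 3  # capacity exceeded: the least recently used item is evicted
len(cache)  # remains 2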

class ska_helpers.utils.LazyDict(load_func, *args, **kwargs)[source]

Dict which is lazy-initialized using supplied function load_func.

This class allows defining a module-level dict that is expensive to initialize, where the initialization is done lazily (only when actually needed).

Parameters:
load_func : function

Reference to a function that returns a dict to init this dict object

*args

Arguments list for load_func

**kwargs

Keyword arguments for load_func

Examples

from ska_helpers.utils import LazyDict

def load_func(a, b):
    # Some expensive function in practice
    print('Here in load_func')
    return {'a': a, 'b': b}

ONE = LazyDict(load_func, 1, 2)

print('ONE is defined but not yet loaded')
print(ONE['a'])
copy() → a shallow copy of D
get(key, default=None, /)

Return the value for key if key is in the dictionary, else default.

items() → a set-like object providing a view on D's items
keys() → a set-like object providing a view on D's keys
pop(k[, d]) → v, remove specified key and return the corresponding value.

If the key is not found, return the default if given; otherwise, raise a KeyError.

popitem()

Remove and return a (key, value) pair as a 2-tuple.

Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.

setdefault(key, default=None, /)

Insert key with a value of default if key is not in the dictionary.

Return the value for key if key is in the dictionary, else default.

values() → an object providing a view on D's values
class ska_helpers.utils.LazyVal(load_func, *args, **kwargs)[source]

Value which is lazy-initialized using supplied function load_func.

This class allows defining a module-level value that is expensive to initialize, where the initialization is done lazily (only when actually needed).

The lazy value is accessed using the val property.

Parameters:
load_func : function

Reference to a function that returns the value used to initialize this object

*args

Arguments list for load_func

**kwargs

Keyword arguments for load_func

Examples

from ska_helpers.utils import LazyVal

def load_func(a):
    # Some expensive function in practice
    print('Here in load_func')
    return a

ONE = LazyVal(load_func, 1)

print('ONE is defined but not yet loaded')
print(ONE.val)
class ska_helpers.utils.TypedDescriptor(*, default=None, required=False, cls=None)[source]

Class to create a descriptor for a dataclass attribute that is cast to a type.

This is a base class for creating a descriptor that can be used to define an attribute on a dataclass that is cast to a specific type. The type is specified by setting the cls class attribute on the descriptor class.

Most commonly cls is a class like CxoTime or Quat, but it could also be a built-in like int or float or any callable function.

This descriptor can be used either as a base class with the cls class attribute set accordingly, or as a descriptor with the cls keyword argument set.

Warning

This descriptor class is recommended for use within a dataclass. In a normal class, the default value must be set to the correct type since it will not be coerced to the correct type automatically.

The default value cannot be list, dict, or set since these are mutable and are disallowed by the dataclass machinery. In most cases a list can be replaced by a tuple and a dict can be replaced by an OrderedDict.

Parameters:
default : optional

Default value for the attribute. If specified and not None, it will be coerced to the correct type via cls(default). If not specified, the default for the attribute is None.

required : bool, optional

If True, the attribute is required to be set explicitly when the object is created. If False the default value is used if the attribute is not set.

Examples

>>> from dataclasses import dataclass
>>> from ska_helpers.utils import TypedDescriptor

Here we make a dataclass with an attribute that is cast to an int.

>>> @dataclass
... class SomeClass:
...     int_val: int = TypedDescriptor(required=True, cls=int)
>>> obj = SomeClass(10.5)
>>> obj.int_val
10

Here we define a QuatDescriptor class that can be used repeatedly for any quaternion attribute.

>>> from Quaternion import Quat
>>> class QuatDescriptor(TypedDescriptor):
...     cls = Quat
>>> @dataclass
... class MyClass:
...     att1: Quat = QuatDescriptor(required=True)
...     att2: Quat = QuatDescriptor(default=[10, 20, 30])
...     att3: Quat | None = QuatDescriptor()
...
>>> obj = MyClass(att1=[0, 0, 0, 1])
>>> obj.att1
<Quat q1=0.00000000 q2=0.00000000 q3=0.00000000 q4=1.00000000>
>>> obj.att2.equatorial
array([10., 20., 30.])
>>> obj.att3 is None
True
>>> obj.att3 = [10, 20, 30]
>>> obj.att3.equatorial
array([10., 20., 30.])
ska_helpers.utils.convert_to_int_float_str(val: str) → int | float | str[source]

Convert an input string into an int, float, or string.

This tries to convert the input string into an int using the built-in int() function. If that fails then it tries float(), and finally if that fails it returns the original string.

This function is often useful when parsing text representations of structured data where the data types are implicit.

Parameters:
val : str

The input string to convert

Returns:
int, float, or str

The input value as an int, float, or string.

Notes

An input string like “01234” is interpreted as a decimal integer and will be returned as the integer 1234. In some contexts a leading 0 indicates an octal number and to avoid confusion in Python a leading 0 is not allowed in a decimal integer literal.
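
For illustration, following the conversion rules above:

>>> from ska_helpers.utils import convert_to_int_float_str
>>> convert_to_int_float_str("01234")
1234
>>> convert_to_int_float_str("1.5e3")
1500.0
>>> convert_to_int_float_str("N/A")
'N/A'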

ska_helpers.utils.lru_cache_timed(maxsize=128, typed=False, timeout=3600)[source]

LRU cache decorator where the cache expires after timeout seconds.

This wraps the functools.lru_cache decorator so that the entire cache gets cleared if the cache is older than timeout seconds.

This is mostly copied from this gist, with no license specified: https://gist.github.com/helix84/05ee246d6c80bc7bacdfa6a62fbff3fa

The cachetools package provides a way to apply the timeout per-item, if that is required.

Parameters:
maxsize : int

functools.lru_cache maxsize parameter

typed : bool

functools.lru_cache typed parameter

timeout : int, float

Clear cache after timeout seconds from last clear
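
A hedged usage sketch (the wrapped function is illustrative):

import time

from ska_helpers.utils import lru_cache_timed

@lru_cache_timed(maxsize=32, timeout=60)
def load_catalog(name):
    # Stand-in for an expensive load
    return {"name": name, "loaded_at": time.time()}

first = load_catalog("acq_stars")  # computed and cached
second = load_catalog("acq_stars")  # served from the cache within 60 seconds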

ska_helpers.utils.temp_env_var(name, value)[source]

A context manager that temporarily sets an environment variable.

Example:

>>> import os
>>> from ska_helpers.utils import temp_env_var
>>> os.environ.get("MY_VARIABLE") is None
True
>>> with temp_env_var("MY_VARIABLE", "my_value"):
...     os.environ.get("MY_VARIABLE")
...
'my_value'
>>> os.environ.get("MY_VARIABLE") is None
True
Parameters:
  • name – str Name of the environment variable to set.

  • value – str Value to set the environment variable to.

Version Info

The ska_helpers.version module provides utilities to handle package versions. The version of a package is determined using importlib if it is installed, and setuptools_scm otherwise.

ska_helpers.version.get_version(package, distribution=None)[source]

Get version string for package with optional distribution name.

If the package is not from an installed distribution then get version from git using setuptools_scm.

Parameters:
package

package name, typically __package__

distribution

name of distribution if different from package (Default value = None)

Returns:
str

Version string
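
Typical usage in a package __init__.py (the distribution argument is only needed when it differs from the package name):

from ska_helpers.version import get_version

__version__ = get_version(__package__)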

ska_helpers.version.parse_version(version)[source]

Parse version string and return a dictionary with version information. This only handles the default scheme.

Parameters:
version

str

Returns:
dict

version information

Default versioning scheme

What follows is the scheme as described in setuptools_scm’s documentation.

In the standard configuration setuptools_scm takes a look at three things:

  1. latest tag (with a version number)

  2. the distance to this tag (e.g. number of revisions since latest tag)

  3. workdir state (e.g. uncommitted changes since latest tag)

and uses roughly the following logic to render the version:

no distance and clean:

{tag}

distance and clean:

{next_version}.dev{distance}+{scm letter}{revision hash}

no distance and not clean:

{tag}+dYYYYMMDD

distance and not clean:

{next_version}.dev{distance}+{scm letter}{revision hash}.dYYYYMMDD

The next version is calculated by adding 1 to the last numeric component of the tag.

For Git projects, the version relies on git describe, so you will see an additional g prepended to the {revision hash}.

Due to the default behavior it’s necessary to always include a patch version (the 3 in 1.2.3), or else the automatic guessing will increment the wrong part of the SemVer (e.g. tag 2.0 results in 2.1.devX instead of 2.0.1.devX). So please make sure to tag accordingly.

Run time information

The ska_helpers.run_info module provides convenience functions to get and print relevant run time information such as machine name, user name, date, program version, and so on. This is aimed at executable scripts and cron jobs.

ska_helpers.run_info.get_run_info(opt=None, *, version=None, stack_level=1)[source]

Get run time information as dict.

Parameters:
opt

argparse options (Default value = None)

version

program version (default=__version__ in calling module)

stack_level

stack level for getting calling module (Default value = 1)

Returns:
dict

run information

ska_helpers.run_info.get_run_info_lines(opt=None, *, version=None, stack_level=2)[source]

Get run time information as formatted lines.

Parameters:
opt

argparse options (Default value = None)

version

program version (default=__version__ in calling module)

stack_level

stack level for getting calling module (Default value = 2)

Returns:
list

formatted information lines

ska_helpers.run_info.log_run_info(log_func, opt=None, *, version=None, stack_level=3)[source]

Output run time information as formatted lines via log_func.

Each formatted line is passed to log_func.

Parameters:
log_func

logger output function (e.g. logger.info)

opt

argparse options (Default value = None)

version

program version (default=__version__ in calling module)

stack_level

stack level for getting calling module (Default value = 3)
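
A hedged end-to-end sketch of a script entry point using these helpers together with basic_logger (the argument parsing and version string are illustrative):

import argparse

from ska_helpers.logging import basic_logger
from ska_helpers.run_info import log_run_info

logger = basic_logger(__name__, level="INFO")

def main():
    parser = argparse.ArgumentParser(description="Example Ska3 script")
    parser.add_argument("--obsid", type=int)
    opt = parser.parse_args()
    # version is passed explicitly here for illustration
    log_run_info(logger.info, opt, version="1.0")

if __name__ == "__main__":
    main()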
