ska_numpy

Provide useful utilities for numpy.

ska_numpy.Numpy.add_column(recarray, name, val, index=None)

Add a column name with value val to recarray and return a new record array.

Parameters
  • recarray – Input record array

  • name – Name of the new column

  • val – Value of the new column (np.array or list)

  • index – Add column before index (default: append at end)

Return type

New record array with the column added
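
The behavior described above can be approximated with plain numpy. This is a minimal sketch, not the library's implementation: the helper name add_column_sketch is hypothetical, and it assumes val is one-dimensional with one entry per row.

```python
import numpy as np


def add_column_sketch(recarray, name, val, index=None):
    """Return a new record array with column `name` placed before `index`.

    Hypothetical helper for illustration; assumes `val` is 1-D.
    """
    val = np.asarray(val)
    # Existing (name, dtype) pairs in their current order.
    descr = [(n, recarray.dtype[n]) for n in recarray.dtype.names]
    if index is None:
        index = len(descr)  # default: append at the end
    descr.insert(index, (name, val.dtype))

    out = np.empty(len(recarray), dtype=descr)
    for n in recarray.dtype.names:
        out[n] = recarray[n]  # copy the original columns
    out[name] = val
    return out.view(np.recarray)


arec = np.rec.fromrecords([(1, 2.0), (3, 4.0)], names=("a", "b"))
new = add_column_sketch(arec, "c", [10, 20], index=1)
print(new.dtype.names)
```

The input array is left unchanged; the new column lands before the column that previously sat at position index.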

ska_numpy.Numpy.compress(recarray, delta=None, indexcol=None, diff=None, avg=None, colnames=None)

Compress recarray rows into intervals where adjacent rows are similar.

In addition to the original column names, the output recarray will have these columns:

<indexcol>_start

start value of the indexcol column.

<indexcol>_stop

stop value of the indexcol column (inclusive up to the next interval).

samples

number of samples in interval

If indexcol is None (default) then the table row index will be used and the output columns will be row_start and row_stop.

delta is a dict mapping column names to a delta value defining whether a column is sufficiently different to break the interval. These are used when generating the default diff functions for numerical columns (i.e. those for which abs(x) succeeds).

diff is a dict mapping column names to functions that take as input two values and return a boolean indicating whether the values are sufficiently different to break the interval. Default diff functions will be generated if diff is None or for columns without an entry.

avg is a dict mapping column names to functions that calculate the average of a numpy array of values for that column. Default avg functions will be generated if avg is None or for columns without an entry.

Example:

import numpy
from ska_numpy import compress

a = ((1, 2, 'hello', 2.),
     (1, 4, 'hello', 3.),
     (1, 2, 'hello', 4.),
     (1, 2, 'hi there', 5.),
     (1, 2, 'hello', 6.),
     (3, 2, 'hello', 7.),
     (1, 2, 'hello', 8.),
     (2, 2, 'hello', 9.))
arec = numpy.rec.fromrecords(a, names=('col1', 'col2', 'greet', 'time'))
acomp = compress(arec, indexcol='time', delta={'col1': 1.5})

Parameters
  • delta – dict of delta thresholds defining when to break interval

  • indexcol – name of column to report start and stop values for interval.

  • diff – dict of functions defining the diff of 2 vals for that column name.

  • avg – dict of functions defining the average value for that column name.

  • colnames – list of column names to include (default = all).

Return type

record array of compressed values
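
To make the roles of delta, diff and avg concrete, here is a simplified single-column sketch. It is not the library's algorithm: the names make_diff and compress_sketch are hypothetical, it assumes each row is compared with the row immediately before it, and it reports the last row of an interval as an inclusive stop index.

```python
import numpy as np


def make_diff(delta):
    """Diff function for a numeric column: True when values differ by more than delta."""
    return lambda x, y: abs(x - y) > delta


def compress_sketch(values, diff, avg=np.mean):
    """Group consecutive values into (row_start, row_stop, samples, average) tuples.

    A new interval starts whenever diff(previous, current) is True.
    """
    intervals = []
    start = 0
    for i in range(1, len(values) + 1):
        # Close the current interval at the end of the data or at a break.
        if i == len(values) or diff(values[i - 1], values[i]):
            chunk = values[start:i]
            intervals.append((start, i - 1, len(chunk), float(avg(chunk))))
            start = i
    return intervals


# The col1 values from the example above, with the same delta of 1.5.
col1 = np.array([1, 1, 1, 1, 1, 3, 1, 2])
intervals = compress_sketch(col1, make_diff(1.5))
for row in intervals:
    print(row)
```

A custom entry in diff or avg plays the same part as the diff and avg arguments here: the first decides where an interval breaks, the second reduces the rows of an interval to one value.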

ska_numpy.Numpy.filter(recarray, filters)

Apply the list of filters to the numpy record array recarray and return the filtered recarray. See match for a description of the filter syntax.

Parameters
  • recarray – Input numpy record array

  • filters – List of filters

Return type

Filtered record array

ska_numpy.Numpy.interpolate(yin, xin, xout, method='linear', sorted=False, cython=True)

Interpolate the curve defined by (xin, yin) at points xout. The array xin must be monotonically increasing. The output has the same data type as the input yin.

Parameters
  • yin – y values of input curve

  • xin – x values of input curve

  • xout – x values of output interpolated curve

  • method – interpolation method (‘linear’ | ‘nearest’)

  • sorted – xout values are sorted, so use search_both_sorted

  • cython – use Cython interpolation code if possible (default=True)

Return type

numpy array with interpolated curve
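
The two methods can be illustrated with plain numpy. This is a sketch under stated assumptions rather than the library's code: interpolate_sketch is a hypothetical name, points outside the range of xin are clamped to the end values, and ties in 'nearest' go to the left neighbour.

```python
import numpy as np


def interpolate_sketch(yin, xin, xout, method="linear"):
    """Interpolate (xin, yin) at xout; the result is cast back to yin's dtype."""
    yin, xin, xout = (np.asarray(a) for a in (yin, xin, xout))
    if method == "linear":
        yout = np.interp(xout, xin, yin)
    elif method == "nearest":
        # i1 is the first index with xin[i1] >= xout, clipped so that
        # a left neighbour i0 = i1 - 1 always exists.
        i1 = np.clip(np.searchsorted(xin, xout), 1, len(xin) - 1)
        i0 = i1 - 1
        use_right = (xin[i1] - xout) < (xout - xin[i0])
        yout = yin[np.where(use_right, i1, i0)]
    else:
        raise ValueError("method must be 'linear' or 'nearest'")
    return yout.astype(yin.dtype)


xin = np.array([0, 10, 20])
yin = np.array([0.0, 100.0, 200.0])
xout = np.array([4, 12, 20])

lin = interpolate_sketch(yin, xin, xout)
near = interpolate_sketch(yin, xin, xout, method="nearest")
# Integer input gives integer output because of the final cast.
as_int = interpolate_sketch(np.array([0, 100, 200]), xin, [4.55])
```

The final cast is what makes the output share the data type of yin, so integer inputs lose the fractional part of a linear interpolation.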

ska_numpy.Numpy.match(recarray, filters)

Apply the list of filters to the numpy record array recarray and return the corresponding boolean mask array.

Each filter is a string with a simple boolean comparison of the form:

colname op value

where colname is a column name in recarray, op is an operator (e.g. == or < or >= etc), and value is a value. String values can optionally be enclosed in single or double quotes.

The pseudo-column name ‘_row_’ can be used to filter on the row number.

Parameters
  • recarray – Input numpy record array

  • filters – List of filters or string with one filter

Return type

boolean mask array
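
The filter syntax above can be sketched with a small parser. This is an illustration, not the library's parser: match_sketch is a hypothetical name, multiple filters are assumed to be combined with a logical AND, and the value is converted to the column's own type.

```python
import operator
import re

import numpy as np

# Two-character operators are listed first so that '<=' is not read as '<'.
OPS = {"==": operator.eq, "!=": operator.ne, "<=": operator.le,
       ">=": operator.ge, "<": operator.lt, ">": operator.gt}
FILTER_RE = re.compile(r"\s*(\w+)\s*(==|!=|<=|>=|<|>)\s*(.+?)\s*$")


def match_sketch(recarray, filters):
    """Return a boolean mask of the rows that pass every 'colname op value' filter."""
    if isinstance(filters, str):
        filters = [filters]
    mask = np.ones(len(recarray), dtype=bool)
    for filt in filters:
        m = FILTER_RE.match(filt)
        if m is None:
            raise ValueError(f"cannot parse filter {filt!r}")
        colname, op, value = m.groups()
        # '_row_' is the pseudo-column holding the row number.
        column = np.arange(len(recarray)) if colname == "_row_" else recarray[colname]
        # Drop optional quotes, then convert to the column's scalar type.
        value = column.dtype.type(value.strip("'\""))
        mask &= OPS[op](column, value)
    return mask


arec = np.rec.fromrecords(
    [(1, "hello", 2.0), (3, "hi there", 5.0), (2, "hello", 9.0)],
    names=("col1", "greet", "time"),
)
mask = match_sketch(arec, ["greet == 'hello'", "time > 3"])
first_two = match_sketch(arec, "_row_ < 2")
filtered = arec[mask]  # what a filter-style call would return
```

Indexing the record array with the mask, as in the last line, gives the filtered rows that filter returns directly.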

ska_numpy.Numpy.pformat(recarray, fmt=None)

Light wrapper around ska_numpy.pprint to return a string instead of printing to a file.

Parameters
  • recarray – input record array

  • fmt – dict of format specifiers (optional)

Return type

string

ska_numpy.Numpy.pprint(recarray, fmt=None, out=sys.stdout)

Print a nicely-formatted version of recarray to out file-like object. If fmt is provided it should be a dict of colname:fmt_spec pairs where fmt_spec is a format specifier (e.g. ‘%5.2f’).

Parameters
  • recarray – input record array

  • fmt – dict of format specifiers (optional)

  • out – output file-like object

Return type

None
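
The relationship between pprint and pformat, and the role of the fmt dict, can be sketched as follows. The names pprint_sketch and pformat_sketch are hypothetical, and the column layout (right-justified, single-space separator) is an assumption that may differ from the library's actual output.

```python
import io
import sys

import numpy as np


def pprint_sketch(recarray, fmt=None, out=None):
    """Write a column-aligned table of `recarray` to the file-like object `out`."""
    fmt = fmt or {}
    out = sys.stdout if out is None else out
    names = recarray.dtype.names
    # Format every value, using '%s' for columns without an entry in fmt.
    cols = [[fmt.get(n, "%s") % (v,) for v in recarray[n]] for n in names]
    widths = [max([len(n)] + [len(s) for s in col]) for n, col in zip(names, cols)]
    print(" ".join(n.rjust(w) for n, w in zip(names, widths)), file=out)
    for row in zip(*cols):
        print(" ".join(s.rjust(w) for s, w in zip(row, widths)), file=out)


def pformat_sketch(recarray, fmt=None):
    """Return the table as a string by printing into an in-memory buffer."""
    buf = io.StringIO()
    pprint_sketch(recarray, fmt, buf)
    return buf.getvalue()


arec = np.rec.fromrecords([(1, 2.0), (30, 4.5)], names=("a", "b"))
text = pformat_sketch(arec, fmt={"b": "%5.2f"})
print(text, end="")
```

Capturing the output in an io.StringIO buffer is one straightforward way to build a string-returning wrapper around a function that writes to a file-like object.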

ska_numpy.Numpy.search_both_sorted(a, v)

Find indices where elements should be inserted to maintain order.

Find the indices into a sorted float array a such that, if the corresponding elements in float array v were inserted before the indices, the order of a would be preserved.

Similar to np.searchsorted but BOTH a and v must be sorted in ascending order. If len(v) < len(a) / 100 then the normal np.searchsorted is called. Otherwise both v and a are cast to np.float64 internally and a Cython function is called to compute the indices in a fast way.

Parameters
  • a – input float array, sorted in ascending order

  • v – float values to insert into a, sorted in ascending order

Returns

indices as int np.array
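
When both arrays are sorted, the indices can be found in a single pass over a and v. The pure-Python sketch below shows the idea only: search_both_sorted_sketch is a hypothetical name, and it assumes the same convention as np.searchsorted with side='left'.

```python
import numpy as np


def search_both_sorted_sketch(a, v):
    """Insertion indices of sorted `v` into sorted `a`, found in one merged pass."""
    a = np.asarray(a, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    out = np.empty(len(v), dtype=np.int64)
    i = 0
    for j, val in enumerate(v):
        # Because v is ascending, the position in `a` never moves backwards.
        while i < len(a) and a[i] < val:
            i += 1
        out[j] = i
    return out


a = [1.0, 2.0, 4.0, 8.0]
v = [0.0, 2.0, 3.0, 9.0]
idx = search_both_sorted_sketch(a, v)
print(idx, np.searchsorted(a, v))
```

The pass costs on the order of len(a) + len(v) steps, whereas a binary search per element costs about len(v) * log(len(a)). That is consistent with falling back to np.searchsorted when v is much shorter than a.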

ska_numpy.Numpy.smooth(x, window_len=10, window='hanning')

Smooth the data using a window with requested size.

This method is based on the convolution of a scaled window with the signal. The signal is prepared by adding reflected copies of the signal (of the window size) at both ends, so that transients are minimized at the beginning and end of the output signal.

Example:

import numpy as np
import matplotlib.pyplot as plt
import ska_numpy

t = np.linspace(-2, 2, 50)
y = np.sin(t) + np.random.randn(len(t)) * 0.1
ys = ska_numpy.smooth(y)
plt.plot(t, y, t, ys)

See also:

numpy.hanning, numpy.hamming, numpy.bartlett, numpy.blackman, numpy.convolve, scipy.signal.lfilter

Parameters
  • x – input signal

  • window_len – dimension of the smoothing window

  • window – type of window (‘flat’, ‘hanning’, ‘hamming’, ‘bartlett’, ‘blackman’)

Return type

smoothed signal
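
The windowed-convolution approach described above can be sketched with plain numpy. This is not the library's code: smooth_sketch is a hypothetical name, the padding uses np.pad with mode='reflect' and half a window on each side, and the output is trimmed to the input length, which may differ from what the library does at the edges.

```python
import numpy as np

WINDOWS = ("flat", "hanning", "hamming", "bartlett", "blackman")


def smooth_sketch(x, window_len=10, window="hanning"):
    """Smooth 1-D `x` by convolving it with a normalized window."""
    x = np.asarray(x, dtype=float)
    if window not in WINDOWS:
        raise ValueError(f"window must be one of {WINDOWS}")
    if window_len < 3:
        return x.copy()
    pad = window_len // 2
    if x.size <= pad:
        raise ValueError("input is too short for this window")
    # Reflect the signal at both ends to reduce edge transients.
    s = np.pad(x, pad, mode="reflect")
    w = np.ones(window_len) if window == "flat" else getattr(np, window)(window_len)
    # Scale the window to unit sum so that a constant signal is preserved.
    y = np.convolve(s, w / w.sum(), mode="same")
    return y[pad:-pad]  # drop the padded samples


rng = np.random.default_rng(0)
t = np.linspace(-2, 2, 50)
y = np.sin(t) + rng.normal(scale=0.1, size=t.size)
ys = smooth_sketch(y)
```

Dividing the window by its sum keeps the overall level of the signal unchanged, and the reflected padding means the first and last output samples are computed from a full window.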

ska_numpy.Numpy.structured_array(vals, colnames=None)

Create a numpy structured array (ndarray) given a dict of numpy arrays. The arrays can be multidimensional but must all have the same length (same size of the first dimension).

Parameters
  • vals – dict of numpy ndarrays

  • colnames – column names (default=sorted vals keys)
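
A minimal numpy sketch of this construction follows. The name structured_array_sketch is hypothetical and this is not the library's code; it shows how a multidimensional input becomes a sub-array field whose shape is everything after the first dimension.

```python
import numpy as np


def structured_array_sketch(vals, colnames=None):
    """Build a structured ndarray from a dict of equal-length numpy arrays."""
    if colnames is None:
        colnames = sorted(vals)  # default: sorted dict keys
    lengths = {len(vals[n]) for n in colnames}
    if len(lengths) != 1:
        raise ValueError("all arrays must have the same length")
    # Each field keeps the trailing shape of its source array.
    dtype = [(n, vals[n].dtype, vals[n].shape[1:]) for n in colnames]
    out = np.empty(lengths.pop(), dtype=dtype)
    for n in colnames:
        out[n] = vals[n]
    return out


vals = {"t": np.arange(3.0), "xy": np.zeros((3, 2))}
sa = structured_array_sketch(vals)
print(sa.dtype)
```

Here sa['t'] has shape (3,) and sa['xy'] has shape (3, 2), so each row of the structured array carries one scalar and one length-2 vector.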