traitschema

Create serializable, type-checked schema using traits and Numpy. A typical use case involves saving several Numpy arrays of varying shape and type.

Defining schema

Note

The following assumes a basic familiarity with the traits package. See its documentation for details.

To serialize data properly, declare non-scalar traits as traits.api.Array. For example:

import numpy as np
from traits.api import Array, String
from traitschema import Schema

class NamedMatrix(Schema):
    name = String()
    data = Array(dtype=np.float64)

matrix = NamedMatrix(name="name", data=np.random.random((8, 8)))

For other demos, see the demos directory.

Saving and loading

Data can be stored in the following formats:

  • HDF5 via h5py
  • JSON via the standard library json module
  • Numpy npz format

Multiple schema can be saved at once to a zip file via traitschema.bundle_schema and loaded with traitschema.load_bundle.

Reference

class traitschema.Schema(**kwargs)[source]

Extension to HasTraits to add methods for automatically saving and loading typed data.

Examples

Create a new data class:

import numpy as np
from traits.api import Array
from traitschema import Schema

class Matrix(Schema):
    data = Array(dtype=np.float64)

matrix = Matrix(data=np.random.random((8, 8)))

Serialize to HDF5 using h5py:

matrix.to_hdf("out.h5")

Load from HDF5:

matrix_copy = Matrix.from_hdf("out.h5")

classmethod from_hdf(filename, decode_string_arrays=True, encoding='utf-8')[source]

Deserialize from HDF5 using h5py.

Parameters:
  • filename (str) – Path of the HDF5 file to read.
  • decode_string_arrays (bool) – When True, decode arrays of bytes into strings. Default: True.
  • encoding (str) – Encoding to use when decoding. Default: 'utf-8'.
Returns: Deserialized instance.

classmethod from_json(data)[source]

Deserialize from a JSON string or file.

Parameters: data (str or file-like) – JSON string or file to read from.
Returns: Deserialized instance.

classmethod from_npz(filename)[source]

Load data from numpy’s npz format.

Parameters: filename (str) – Path of the npz file to load.

classmethod load(filename)[source]

Counterpart to save(): deserialize using the format determined by the file extension.

save(filename)[source]

Serialize using the type determined by the file extension.

Parameters: filename (str) – Full output path.

Notes

Only default saving options are used, so this method is less flexible than calling the to_xyz methods directly.
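
As a rough illustration of extension-based dispatch, the sketch below maps a file suffix to the name of a to_xyz method. The SAVERS table, the pick_saver helper, and the local UnsupportedArchiveFormat stand-in are hypothetical names introduced here, and the suffix mapping is an assumption; none of this is traitschema's actual implementation.

```python
from pathlib import Path


class UnsupportedArchiveFormat(Exception):
    """Local stand-in for traitschema.io.UnsupportedArchiveFormat."""


# Assumed mapping from file extension to serializer method name.
SAVERS = {".h5": "to_hdf", ".json": "to_json", ".npz": "to_npz"}


def pick_saver(filename):
    """Return the name of the to_xyz method implied by the file extension."""
    ext = Path(filename).suffix.lower()
    try:
        return SAVERS[ext]
    except KeyError:
        # Unknown extensions are rejected rather than guessed at.
        raise UnsupportedArchiveFormat(f"unsupported extension: {ext!r}") from None


print(pick_saver("out.h5"))
print(pick_saver("results/run1.npz"))
```

The sketch only resolves a method name. Since to_json returns a string according to the reference, a real save() would also have to write that string to disk.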

to_dict()[source]

Return all visible traits as a dictionary.

to_hdf(filename, mode='w', compression=None, compression_opts=None, encode_string_arrays=True, encoding='utf8')[source]

Serialize to HDF5 using h5py.

Parameters:
  • filename (str) – Path to save HDF5 file to.
  • mode (str) – Default: 'w'
  • compression (str or None) – Compression to use with arrays (see h5py documentation for valid choices).
  • compression_opts (int or None) – Compression options, generally a number specifying compression level (see h5py documentation for details).
  • encode_string_arrays (bool) – When True, force encoding of arrays of unicode strings using the encoding keyword argument. Not setting this will result in errors if using arrays of unicode strings. Default: True.
  • encoding (str) – Encoding to use when forcing encoding of unicode string arrays. Default: 'utf8'.
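
To illustrate what encoding an array of unicode strings involves, here is a minimal sketch using only numpy. It is an assumption about the kind of conversion that encode_string_arrays performs, not the library's actual code; the array contents are made up.

```python
import numpy as np

# A unicode string array has dtype kind 'U'.
names = np.array(["alpha", "β", "gamma"])

# Encode each element to bytes; the result has dtype kind 'S'.
encoded = np.char.encode(names, "utf8")

# Decoding with the same encoding recovers the original strings.
decoded = np.char.decode(encoded, "utf8")

print(names.dtype.kind, encoded.dtype.kind, decoded.dtype.kind)
```

The same encoding must be used in both directions, which is why from_hdf also takes an encoding argument.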

Notes

Each stored dataset will also have a desc attribute which uses the desc attribute of each trait.

The root node also has attributes:

  • classname - the class name of the instance being serialized
  • python_module - the Python module in which the class is defined

to_json(json_kwargs={})[source]

Serialize to JSON.

Parameters: json_kwargs (dict) – Keyword arguments to pass to json.dumps().
Returns: JSON string.

Notes

A custom JSON encoder handles numpy arrays, which could conceivably lose precision; if precision matters, consider serializing to HDF5 instead. Because a custom encoder is used, any cls keyword argument passed in json_kwargs is ignored.
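
For readers unfamiliar with custom JSON encoders, the sketch below shows one common way to make numpy arrays JSON-serializable. ArrayEncoder is a hypothetical class written for this example; traitschema's own encoder may differ.

```python
import json

import numpy as np


class ArrayEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts numpy arrays and scalars to JSON types."""

    def default(self, obj):
        if isinstance(obj, np.ndarray):
            return obj.tolist()  # nested Python lists
        if isinstance(obj, np.generic):
            return obj.item()  # plain Python scalar
        return super().default(obj)


payload = {"name": "m", "data": np.arange(6, dtype=np.float64).reshape(2, 3)}
text = json.dumps(payload, cls=ArrayEncoder)

# The JSON text holds nested lists only, so the dtype is reapplied on load.
restored = np.array(json.loads(text)["data"], dtype=np.float64)
print(restored.shape, restored.dtype)
```

In this sketch the array's dtype is not recorded in the JSON output, so the loader has to know which type to restore.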

to_npz(filename, compress=False)[source]

Save in numpy’s npz archive format.

Parameters:
  • filename (str) – Path to save the npz file to.
  • compress (bool) – Save as a compressed archive (default: False)

Notes

To ensure loading of scalar values works as expected, casting traits should be used (e.g., CStr instead of String or Str). See the traits documentation for details.
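
The reason casting traits help can be seen with numpy alone: scalars written to an npz archive come back as zero-dimensional arrays rather than plain Python values. The sketch below uses an in-memory buffer and made-up values, and does not involve traitschema itself.

```python
import io

import numpy as np

buf = io.BytesIO()
np.savez(buf, name="matrix-a", data=np.zeros((2, 2)))
buf.seek(0)

with np.load(buf) as archive:
    name = archive["name"]  # a 0-d array, not a str
    data = archive["data"]

print(type(name).__name__, name.shape)
print(str(name))  # an explicit cast recovers the string
```

A casting trait such as CStr performs this kind of conversion on assignment, whereas a strict string trait would be handed a numpy array.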

exception traitschema.io.UnsupportedArchiveFormat[source]

Raised when a file extension doesn’t match up with a supported archive format.

traitschema.io.bundle_schema(outfile, schema, format='npz')[source]

Bundle several Schema objects into a single archive.

Parameters:
  • outfile (str) – Output bundle filename. Only zip archives are supported.
  • schema (Dict[str, Schema]) – Dictionary of Schema objects to bundle together. Keys are names to give each schema and are used when loading a bundle.
  • format (str) – Format to save individual schema as (default: 'npz').

Notes

Default options are used with all saving functions (e.g., no compression is used for individual serialized schema).

traitschema.io.load_bundle(filename)[source]

Loads a bundle of schema saved with bundle_schema().

Parameters: filename (str) – Path to bundled schema archive.
Returns: schema – A dictionary of stored schema whose keys are the keys used when bundling. An additional __meta__ key holds other information stored at save time (e.g., the bundling format version number).
Return type: dict
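
The general idea of bundling can be sketched with the standard library: several serialized payloads go into one zip archive alongside a metadata entry. The bundle and load helpers, the entry names, the JSON member format, and the version field below are all assumptions made for illustration; the actual archive layout used by bundle_schema is not described here.

```python
import io
import json
import zipfile


def bundle(outfile, members, version=1):
    """Write already-serialized JSON strings into a single zip archive."""
    with zipfile.ZipFile(outfile, "w") as zf:
        for key, payload in members.items():
            zf.writestr(key + ".json", payload)
        # Hypothetical metadata entry recording a format version.
        zf.writestr("__meta__.json", json.dumps({"version": version}))


def load(infile):
    """Read every entry back into a dictionary keyed by entry name."""
    out = {}
    with zipfile.ZipFile(infile) as zf:
        for entry in zf.namelist():
            key = entry[: -len(".json")]
            out[key] = json.loads(zf.read(entry))
    return out


buf = io.BytesIO()
bundle(buf, {
    "first": json.dumps({"data": [1.0, 2.0]}),
    "second": json.dumps({"data": [[0.0]]}),
})
buf.seek(0)
loaded = load(buf)
print(sorted(loaded))
```

Keying each entry by name is what lets a loader return a dictionary with the same keys that were used when bundling.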