datasafe.client module

Client components of the LabInform datasafe.

Clients of the datasafe connect to a server component of the datasafe and are responsible for a series of different tasks:

  • Deposit and retrieve items (currently: datasets) to and from the datasafe.

  • Prepare items (currently: datasets) for depositing in the datasafe.

    For datasets, this means that a manifest needs to be written. To write a manifest, at least the format of the actual data needs to be provided. If the metadata files are info or YAML files, the datasafe.manifest.Manifest class should be able to auto-detect the metadata format and its version using datasafe.manifest.FormatDetector. See there for details.

Types of clients

Currently, there are two different clients implemented:

  • LocalClient

    A client connecting locally directly within Python to a local server.

  • HTTPClient

    A client connecting via HTTP to a remote server.

    The remote server this client connects to needs to provide an API conforming to that implemented by datasafe.server.HTTPServerAPI.

    HTTP status codes returned from the server are handled correctly and converted into exceptions.

Furthermore, there is a base class, Client, that both concrete client implementations inherit from. This base class deals with all aspects of a client that can be performed entirely locally, such as creating manifests and checking LOIs for overall validity. Thus, implementing concrete clients is rather straightforward.
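The division of labour between base class and concrete clients can be illustrated with a minimal sketch. It is not the datasafe implementation: the class names SketchClient and InMemoryClient, the hook name _server_upload, and the use of ValueError are hypothetical and only show the pattern of a public method doing the local checks and delegating the server interaction to a non-public method.

```python
class SketchClient:
    """Hypothetical stand-in for a client base class (not datasafe code)."""

    def upload(self, loi=""):
        # Local part: validate the input before any server is contacted.
        if not loi:
            raise ValueError("No LOI provided")
        # Server part: delegated to a non-public hook method.
        return self._server_upload(loi)

    def _server_upload(self, loi):
        # Concrete clients are expected to override this hook.
        raise NotImplementedError


class InMemoryClient(SketchClient):
    """Hypothetical concrete client that only implements the hook."""

    def __init__(self):
        self.stored = []

    def _server_upload(self, loi):
        # Pretend to talk to a server by remembering the LOI.
        self.stored.append(loi)
        return loi
```

With this layout, a new concrete client only has to supply the non-public methods; all local checks are inherited unchanged.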

Working with a client

A datasafe client operates as an interface to the datasafe. Hence the user (be it a human user or other code) need not care about where and how data are stored.

The functionality of the datasafe client can be split into two categories:

  • Tasks entirely local

  • Tasks interacting with a datasafe server component

Both will briefly be described below.

Local tasks

There is currently one task in this category:

  • Create a manifest file (Client.create_manifest())

Manifests contain information on the data and metadata and allow the datasafe to operate independently of special-purpose functions such as importers for the diverse set of file formats. For details of manifests, see the datasafe.manifest module.

Tasks interacting with a datasafe server

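The following tasks are currently implemented:

  • Create a new LOI (Client.create())

  • Upload data (Client.upload())

  • Download data (Client.download())

  • Update data (Client.update())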

Of the four basic operations of persistent storage, create, read, update, and delete (CRUD), the implemented tasks cover create (Client.create() and Client.upload()), read (Client.download()), and update (Client.update()). The last, delete, is supported by the storage backend the server components connect to, but not (yet) exposed by the servers and hence not accessible to the clients. The reason is that data should usually not be deleted, but archived long-term. Therefore, while there may be legitimate use cases for deleting items in the datasafe, this will most probably be an administrative task not available to regular clients.

Module documentation

class datasafe.client.Client

Bases: object

Client part of the datasafe.

The client interacts with a server to transfer data from and to the datasafe.

This class provides all functionality that is local to the client. The actual connection to the server is done in non-public methods that need to be implemented in concrete classes.

There are currently two concrete client classes available:

  • LocalClient

    A client connecting locally directly within Python to a local server.

  • HTTPClient

    A client connecting via HTTP protocol to an HTTP server.

loi_parser

Parser for lab object identifiers (LOIs). Used to check whether a given LOI complies with certain criteria and is a valid LOI.

Type:

datasafe.loi.Parser

metadata_extensions

File extensions that are regarded as metadata files.

Used when automatically creating manifest files to distinguish between data and metadata files of a dataset.

Default: (‘.info’, ‘.yaml’)

Type:

tuple

create(loi='')

Create new LOI.

Useful to “reserve” and register a LOI in the datasafe, e.g. at the start of a new measurement.

The storage corresponding to the LOI will be created and the LOI returned if successful. This does not, however, add any data to the datasafe. Therefore, calling create() will usually be followed by calling upload() at some later point. Conversely, before calling upload(), you need to call create() to create the storage space for the new LOI.

Parameters:

loi (str) – LOI the storage should be created for

Returns:

loi – LOI the storage has been created for

Return type:

str

Raises:
  • datasafe.loi.MissingLoiError – Raised if no LOI is provided

  • datasafe.loi.InvalidLoiError – Raised if LOI is not valid (for the given operation)
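The order of calls described above, create() first and upload() later, can be sketched as follows. The helper name reserve_and_upload is hypothetical; the client argument is assumed to be any object offering create() and upload() with the signatures documented here, for example a LocalClient or HTTPClient.

```python
def reserve_and_upload(client, loi, path=""):
    """Reserve storage for a LOI, then upload the dataset found in path."""
    # create() returns the LOI the storage has been created for.
    reserved_loi = client.create(loi=loi)
    # upload() returns the integrity dict with the fields "data" and "all".
    integrity = client.upload(loi=reserved_loi, path=path)
    return reserved_loi, integrity
```

In practice the two calls will often be separated in time, with create() issued at the start of a measurement and upload() once the data exist.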

create_manifest(filename='', path='')

Create a manifest file for a given dataset.

There are different scenarios for determining which files belong to the dataset and for distinguishing between data and metadata files:

  • Neither parameter filename nor path given

    All files in the current directory are assumed to belong to the dataset.

  • Parameter filename given

    Only files starting with the value of filename are considered. Note that the value is used as a pattern.

  • Parameter path, but no parameter filename given

    Only files in the directory given by path are considered.

  • Both parameters, filename and path, given

    Only files starting with the value of filename and located in the directory given by path are considered. Note that the value is used as a pattern.

Metadata files are identified using the metadata_extensions attribute of the class. See there for details.
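The scenarios above can be approximated with a short sketch based on glob.glob(). It is not the datasafe implementation: the function name select_files and the constant METADATA_EXTENSIONS are hypothetical, and the fallback pattern "*" used when no filename is given is an assumption. Only the extension with ".*" and the default metadata extensions are taken from this documentation.

```python
import glob
import os

# Default value documented for the metadata_extensions attribute.
METADATA_EXTENSIONS = ('.info', '.yaml')


def select_files(filename='', path=''):
    """Return (data, metadata) file names for a dataset (sketch only)."""
    # With a filename, the value is extended with ".*" and used as pattern;
    # without one, every visible file in the directory is a candidate
    # (assumption).
    pattern = filename + '.*' if filename else '*'
    directory = path or os.curdir
    # Escape the directory so only the pattern part is interpreted by glob.
    matches = sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
    files = [os.path.basename(m) for m in matches if os.path.isfile(m)]
    # Files with a metadata extension count as metadata, all others as data.
    metadata = [f for f in files if f.endswith(METADATA_EXTENSIONS)]
    data = [f for f in files if not f.endswith(METADATA_EXTENSIONS)]
    return data, metadata
```

Because the pattern is built by appending ".*", a filename of "scan" matches "scan.dat" and "scan.info", but not "scanner.dat".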

Note

As the manifest file always has the same name, it is generally a good idea to have one dataset per directory. Otherwise, only one manifest file (for one dataset) can be created at a time.

Things to decide about and implement:

  • How to define or detect the file format?

Parameters:
  • filename (str) –

    Name of the file(s) belonging to a dataset.

    This is taken as a pattern, extended with “.*”, and used with glob.glob() if given.

  • path (str) – File system path where to look for files belonging to a dataset

upload(loi='', filename='', path='')

Upload data belonging to a dataset to the datasafe.

If no manifest file exists, it will automatically be created.

There are different scenarios for determining which files belong to the dataset and for distinguishing between data and metadata files:

  • Neither parameter filename nor path given

    All files in the current directory are assumed to belong to the dataset.

  • Parameter filename given

    Only files starting with the value of filename are considered. Note that the value is used as a pattern.

  • Parameter path, but no parameter filename given

    Only files in the directory given by path are considered.

  • Both parameters, filename and path, given

    Only files starting with the value of filename and located in the directory given by path are considered. Note that the value is used as a pattern.

Parameters:
  • loi (str) – LOI the data should be uploaded for

  • filename (str) –

    Name of the file(s) belonging to a dataset.

    This is taken as a pattern, extended with “.*”, and used with glob.glob() if given.

  • path (str) – File system path where to look for files belonging to a dataset

Returns:

integrity – dict with fields data and all containing boolean values

For details see datasafe.manifest.Manifest.check_integrity().

Return type:

dict

Raises:

datasafe.loi.MissingLoiError – Raised if no LOI is provided
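The returned integrity dict can be inspected before continuing with further processing. The following is a minimal sketch assuming only the two documented boolean fields, data and all. The function name describe_integrity and the message texts are hypothetical, and reading "all" as covering data plus metadata is an assumption; see datasafe.manifest.Manifest.check_integrity() for the authoritative meaning.

```python
def describe_integrity(integrity):
    """Summarise an integrity dict with the boolean fields 'data' and 'all'."""
    if integrity['all']:
        # Assumption: 'all' is only True if every checked file is intact.
        return 'all files intact'
    if integrity['data']:
        # Data passed, so the problem is assumed to lie elsewhere.
        return 'data intact, overall check failed'
    return 'data check failed'
```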

download(loi='')

Download data from the datasafe.

The LOI is checked for whether it belongs to the datasafe. Further checks are done on the server side and raise exceptions if problems occur.

Upon successful download, the data are checked for integrity, and a warning is issued in case of possible data or metadata corruption. Make sure to handle this warning appropriately downstream.

Parameters:

loi (str) – LOI the data should be downloaded for

Returns:

download_dir – Directory the data obtained from the datasafe have been saved to

Return type:

str

Warns:

UserWarning – Issued if the consistency check fails, i.e. data or metadata may be corrupted

Raises:

datasafe.loi.MissingLoiError – Raised if no LOI is provided
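One way of handling the warning downstream is to record warnings around the call using the standard library warnings module. This is a sketch, not part of the datasafe API: the helper name download_checked is hypothetical, and the client argument is assumed to be any object offering download() as documented above.

```python
import warnings


def download_checked(client, loi):
    """Download a dataset and collect the UserWarnings issued meanwhile."""
    with warnings.catch_warnings(record=True) as caught:
        # Make sure no warning is suppressed by an existing filter.
        warnings.simplefilter('always')
        download_dir = client.download(loi=loi)
    # Keep only UserWarnings, the category documented for failed checks.
    messages = [
        str(w.message) for w in caught
        if issubclass(w.category, UserWarning)
    ]
    return download_dir, messages
```

An empty list of messages means that no UserWarning was issued during the download; a non-empty list should prompt a closer look at the files in download_dir.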

update(loi='', filename='', path='')

Update data belonging to a dataset in the datasafe.

If no manifest file exists, it will automatically be created.

There are different scenarios for determining which files belong to the dataset and for distinguishing between data and metadata files:

  • Neither parameter filename nor path given

    All files in the current directory are assumed to belong to the dataset.

  • Parameter filename given

    Only files starting with the value of filename are considered. Note that the value is used as a pattern.

  • Parameter path, but no parameter filename given

    Only files in the directory given by path are considered.

  • Both parameters, filename and path, given

    Only files starting with the value of filename and located in the directory given by path are considered. Note that the value is used as a pattern.

Parameters:
  • loi (str) – LOI the data should be updated for

  • filename (str) –

    Name of the file(s) belonging to a dataset.

    This is taken as a pattern, extended with “.*”, and used with glob.glob() if given.

  • path (str) – File system path where to look for files belonging to a dataset

Returns:

integrity – dict with fields data and all containing boolean values

For details see datasafe.manifest.Manifest.check_integrity().

Return type:

dict

Raises:

datasafe.loi.MissingLoiError – Raised if no LOI is provided

class datasafe.client.LocalClient

Bases: Client

Client connecting locally directly within Python to a local server.

server

Datasafe server component to talk to. The server itself will communicate with a backend to do the actual storage.

Type:

datasafe.server.Server

class datasafe.client.HTTPClient

Bases: Client

Client connecting via HTTP to a remote server.

The remote server this client connects to needs to provide an API conforming to that implemented by datasafe.server.HTTPServerAPI.

HTTP status codes returned from the server are handled correctly and converted into exceptions.

server_url

URL of a datasafe HTTP server to connect to.

Default: ‘http://127.0.0.1:5000/’

Type:

str

url_prefix

Prefix of the URLs appended to server_url

Default: ‘api/’

Type:

str
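How the two attributes presumably combine into the base URL of the API can be sketched with urllib.parse.urljoin(). The function name api_base_url is hypothetical, and the exact way HTTPClient assembles its URLs is an assumption; only the two default values are taken from this documentation.

```python
from urllib.parse import urljoin

# Default values documented for HTTPClient.server_url and url_prefix.
SERVER_URL = 'http://127.0.0.1:5000/'
URL_PREFIX = 'api/'


def api_base_url(server_url=SERVER_URL, url_prefix=URL_PREFIX):
    """Join server URL and prefix into the base URL of the API (sketch)."""
    # urljoin resolves the relative prefix against the server URL.
    return urljoin(server_url, url_prefix)


print(api_base_url())  # http://127.0.0.1:5000/api/
```

The trailing slash of url_prefix matters when further relative paths are joined onto the result, since urljoin() replaces the last path segment if the base does not end with a slash.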