datasafe.client module
Client components of the LabInform datasafe.
Clients of the datasafe connect to a server component of the datasafe and are responsible for a series of different tasks:
- Deposit and retrieve items (currently: datasets) to and from the datasafe.
- Prepare items (currently: datasets) for depositing in the datasafe.
For datasets, this means that a manifest needs to be written. For a manifest to be written, at least the format of the actual data needs to be provided. In case of info or YAML files as metadata files, the datasafe.manifest.Manifest class should be able to auto-detect the format and version of the metadata format, using datasafe.manifest.FormatDetector. See there for details.
Types of clients
Currently, there are two different clients implemented:
- LocalClient: A client connecting locally directly within Python to a local server.
- HTTPClient: A client connecting via HTTP to a remote server.
  The remote server connected to by this client needs to provide an API conforming to the one implemented by datasafe.server.HTTPServerAPI. HTTP status codes returned from the server are handled correctly and converted into exceptions.
Furthermore, there is a base class Client that both concrete client
implementations inherit from. This base class deals with all aspects of a
client that can be performed completely locally, such as manifest creation and
checking LOIs for overall validity. Thus, implementing concrete clients is
rather straightforward.
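The division of labour between the base class and the concrete clients can be sketched as follows. This is a minimal, hypothetical illustration, not the actual datasafe code: the class names SketchClient and InMemoryClient, the non-public hook _server_download() and the plain ValueError are assumptions made for the example, and a dict stands in for a real server.

```python
from abc import ABC, abstractmethod


class SketchClient(ABC):
    """Hypothetical base class: local logic here, server access in subclasses."""

    def download(self, loi=''):
        # Local task, identical for every concrete client: make sure an LOI
        # was given at all (a stand-in for the real LOI checks).
        if not loi:
            raise ValueError('No LOI provided')
        # Non-public hook: only this part differs between concrete clients.
        return self._server_download(loi)

    @abstractmethod
    def _server_download(self, loi):
        """Fetch data for the LOI from a server; must be overridden."""


class InMemoryClient(SketchClient):
    """Hypothetical concrete client backed by a plain dict instead of a server."""

    def __init__(self, storage=None):
        self.storage = storage if storage is not None else {}

    def _server_download(self, loi):
        return self.storage[loi]


client = InMemoryClient({'example-loi': b'data'})
print(client.download('example-loi'))  # b'data'
```

Because the abstract hook is the only transport-specific part, a new client type only has to implement the server communication, while all local checks are inherited unchanged.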
Working with a client
A datasafe client acts as the interface to the datasafe. Hence, the user (be it a human user or other code) need not care about where and how data are stored.
The functionality of the datasafe client can be split into two categories:
- Tasks entirely local
- Tasks interacting with a datasafe server component
Both will briefly be described below.
Local tasks
There is currently one task in this category:
Creating a manifest file
Manifests contain information on the data and metadata and allow the
datasafe to operate independently of special-purpose functions such as
importers for the diverse set of file formats. For details of manifests,
see the datasafe.manifest module.
Tasks interacting with a datasafe server
The following tasks are currently implemented:
- Create a resource in the datasafe
- Upload data to a resource in the datasafe
- Download data from a resource in the datasafe
- Update data of a resource in the datasafe
Of the four basic operations of persistent storage, create, read, update,
and delete (CRUD), this covers create (Client.create() and Client.upload()),
read (Client.download()), and update (Client.update()). The last, delete, is
covered by the storage backend the server components connect to, but is not
(yet) exposed by the servers (and hence not accessible to the clients). The
reason for this is that data should usually not be deleted at all, but
archived long-term. Therefore, while there may be legitimate use cases for
actually deleting items in the datasafe, this will most probably be an
administrative task not available to the regular client.
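A toy, in-memory model may help to see which CRUD operations a client gets to use. The class ToyDatasafe and its behaviour are assumptions for illustration only: it enforces that create() precedes upload(), and it simply has no delete method, mirroring that deletion is not exposed to clients.

```python
class ToyDatasafe:
    """Hypothetical in-memory model of the operations exposed to clients."""

    def __init__(self):
        self._items = {}  # LOI -> data (None until something is uploaded)

    def create(self, loi):
        # Create: reserve storage space for a new LOI, without any data.
        if loi in self._items:
            raise KeyError(f'LOI exists already: {loi}')
        self._items[loi] = None
        return loi

    def upload(self, loi, data):
        # Create (cont.): deposit data for an LOI that has been created before.
        if loi not in self._items:
            raise KeyError(f'Unknown LOI, call create() first: {loi}')
        self._items[loi] = data

    def download(self, loi):
        # Read: return the stored data for the LOI.
        return self._items[loi]

    def update(self, loi, data):
        # Update: replace the data of an existing LOI.
        if loi not in self._items:
            raise KeyError(f'Unknown LOI: {loi}')
        self._items[loi] = data

    # Delete is intentionally missing: deleting is left to the storage
    # backend and is not exposed to regular clients.


safe = ToyDatasafe()
loi = safe.create('example-loi')
safe.upload(loi, b'raw data')
safe.update(loi, b'corrected data')
print(safe.download(loi))  # b'corrected data'
```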
Module documentation
- class datasafe.client.Client
Bases: object
Client part of the datasafe.
The client interacts with a server to transfer data from and to the datasafe.
This class provides all functionality that is local to the client. The actual connection to the server is done in non-public methods that need to be implemented in concrete classes.
There are currently two concrete client classes available:
- LocalClient: A client connecting locally directly within Python to a local server.
- HTTPClient: A client connecting via HTTP to an HTTP server.
- loi_parser
Parser for lab object identifiers (LOIs). Used to check a given LOI for complying with certain criteria (and being a valid LOI).
- Type:
- metadata_extensions
File extensions that are regarded as metadata files.
Used when automatically creating manifest files to distinguish between data and metadata files of a dataset.
Default: (‘.info’, ‘.yaml’)
- Type: tuple
- create(loi='')
Create a new LOI.
Useful to “reserve” and register an LOI in the datasafe, e.g. at the start of a new measurement.
The storage corresponding to the LOI will be created and the LOI returned if successful. This does not, however, add any data to the datasafe. Therefore, calling create() will usually be followed by calling upload() at some later point. Conversely, before calling upload(), you need to call create() to create the new LOI storage space.
- create_manifest(filename='', path='')
Create a manifest file for a given dataset.
Different scenarios for determining which files belong to the dataset and for distinguishing between data and metadata files are:
  - Neither parameter filename nor path given: All files of the current directory will be assumed to belong to the dataset.
  - Parameter filename given: Only files starting with the value of filename will be considered. Note that the value is used as pattern.
  - Parameter path, but no parameter filename given: Only files in the directory given by path will be considered.
  - Both parameters, filename and path, given: Only files starting with the value of filename and located in the directory given by path will be considered. Note that the value is used as pattern.
Metadata will be identified using the metadata_extensions attribute of the class. For details, see there.
Note: As the manifest file always has the same name, it is generally a good idea to have one dataset per directory. Otherwise, only one manifest file (for one dataset) can be created at a time.
Things to decide about and implement:
How to define or detect the file format?
- Parameters:
filename (str) – Name of the file(s) belonging to a dataset. If given, this is taken as pattern, extended with “.*”, and used with glob.glob().
path (str) – File system path where to look for files belonging to a dataset
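The four file-selection scenarios and the split into data and metadata files can be sketched with glob.glob(). The helper select_dataset_files() is hypothetical; it assumes that matching the pattern “filename.*” is what “files starting with the value of filename” amounts to, and it uses the default metadata_extensions. The real implementation may differ in details, e.g. in excluding the manifest file itself, which this sketch does not do.

```python
import glob
import os

# Default of Client.metadata_extensions as documented above.
METADATA_EXTENSIONS = ('.info', '.yaml')


def select_dataset_files(filename='', path=''):
    """Hypothetical helper mirroring the four filename/path scenarios."""
    # filename is used as a pattern and extended with ".*";
    # without it, every file is considered.
    pattern = filename + '.*' if filename else '*'
    # os.path.join('', pattern) == pattern, i.e. the current directory.
    candidates = sorted(glob.glob(os.path.join(path, pattern)))
    files = [name for name in candidates if os.path.isfile(name)]
    # str.endswith() accepts a tuple of suffixes.
    metadata = [name for name in files if name.endswith(METADATA_EXTENSIONS)]
    data = [name for name in files if not name.endswith(METADATA_EXTENSIONS)]
    return data, metadata
```

For a directory containing meas.dat, meas.info and other.dat, calling select_dataset_files(filename='meas', path=...) would classify meas.dat as data and meas.info as metadata, ignoring other.dat.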
- upload(loi='', filename='', path='')
Upload data belonging to a dataset to the datasafe.
If no manifest file exists, it will automatically be created.
Different scenarios for determining which files belong to the dataset and for distinguishing between data and metadata files are:
  - Neither parameter filename nor path given: All files of the current directory will be assumed to belong to the dataset.
  - Parameter filename given: Only files starting with the value of filename will be considered. Note that the value is used as pattern.
  - Parameter path, but no parameter filename given: Only files in the directory given by path will be considered.
  - Both parameters, filename and path, given: Only files starting with the value of filename and located in the directory given by path will be considered. Note that the value is used as pattern.
- Parameters:
loi (str) – LOI the data should be uploaded for
filename (str) – Name of the file(s) belonging to a dataset. If given, this is taken as pattern, extended with “.*”, and used with glob.glob().
path (str) – File system path where to look for files belonging to a dataset
- Returns:
integrity – dict with fields data and all containing boolean values. For details, see datasafe.manifest.Manifest.check_integrity().
- Return type: dict
- Raises:
datasafe.loi.MissingLoiError – Raised if no LOI is provided
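How a caller might act on the returned integrity dict is sketched below. The helper describe_integrity() is hypothetical, and it assumes that the field data reflects the data files only, while all covers data and metadata files; consult datasafe.manifest.Manifest.check_integrity() for the authoritative meaning of both fields.

```python
def describe_integrity(integrity):
    """Hypothetical helper turning an integrity dict into a short message."""
    # Assumption: 'data' checks the data files only, 'all' data plus metadata.
    if integrity['all']:
        return 'data and metadata intact'
    if integrity['data']:
        return 'data intact, metadata may be corrupted'
    return 'data may be corrupted'


print(describe_integrity({'data': True, 'all': False}))
# data intact, metadata may be corrupted
```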
- download(loi='')
Download data from the datasafe.
The LOI is checked for belonging to the datasafe. Further checks will be done on the server side, resulting in exceptions being raised if there are any problems.
Upon successful download, the data are checked for integrity, and in case of possible data or metadata corruption, a warning is issued. Take care to handle this warning downstream accordingly.
- Parameters:
loi (str) – LOI the data should be downloaded for
- Returns:
download_dir – Directory the data obtained from the datasafe have been saved to
- Return type:
- Warns:
UserWarning – Issued if the consistency check fails, i.e. data or metadata may be corrupted
- Raises:
datasafe.loi.MissingLoiError – Raised if no LOI is provided
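As the integrity problem is reported only as a warning, code that must not continue with possibly corrupted data needs to react to it explicitly. A minimal sketch using the standard warnings module follows; fake_download() is a hypothetical stand-in for Client.download(), and turning the warning into a RuntimeError is merely one possible policy.

```python
import warnings


def fake_download(loi):
    """Hypothetical stand-in for Client.download() whose integrity check fails."""
    warnings.warn('Integrity check failed, data or metadata may be corrupted',
                  UserWarning)
    return 'download_dir'


def download_strict(download, loi):
    """Call a download function and escalate integrity warnings to errors."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even if an identical one was issued before.
        warnings.simplefilter('always')
        download_dir = download(loi)
    problems = [w for w in caught if issubclass(w.category, UserWarning)]
    if problems:
        raise RuntimeError(f'Download of {loi} may be corrupted: '
                           f'{problems[0].message}')
    return download_dir
```

Recording the warnings instead of setting the filter to "error" lets the download finish before deciding what to do, so the caller could, for instance, still inspect or clean up the download directory.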
- update(loi='', filename='', path='')
Update data belonging to a dataset in the datasafe.
If no manifest file exists, it will automatically be created.
Different scenarios for determining which files belong to the dataset and for distinguishing between data and metadata files are:
  - Neither parameter filename nor path given: All files of the current directory will be assumed to belong to the dataset.
  - Parameter filename given: Only files starting with the value of filename will be considered. Note that the value is used as pattern.
  - Parameter path, but no parameter filename given: Only files in the directory given by path will be considered.
  - Both parameters, filename and path, given: Only files starting with the value of filename and located in the directory given by path will be considered. Note that the value is used as pattern.
- Parameters:
loi (str) – LOI the data should be updated for
filename (str) – Name of the file(s) belonging to a dataset. If given, this is taken as pattern, extended with “.*”, and used with glob.glob().
path (str) – File system path where to look for files belonging to a dataset
- Returns:
integrity – dict with fields data and all containing boolean values. For details, see datasafe.manifest.Manifest.check_integrity().
- Return type: dict
- Raises:
datasafe.loi.MissingLoiError – Raised if no LOI is provided
- class datasafe.client.LocalClient
Bases: Client
Client connecting locally directly within Python to a local server.
- server
Datasafe server component to talk to. The server itself will communicate with a backend to do the actual storage.
- Type:
- class datasafe.client.HTTPClient
Bases: Client
Client connecting via HTTP to a remote server.
The remote server connected to by this client needs to provide an API conforming to the one implemented by datasafe.server.HTTPServerAPI. HTTP status codes returned from the server are handled correctly and converted into exceptions.
- server_url
URL of a datasafe HTTP server to connect to.
Default: ‘http://127.0.0.1:5000/’
- Type: str
- url_prefix
Prefix of the URLs, appended to server_url.
Default: ‘api/’
- Type: str
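Assuming that url_prefix is resolved relative to server_url, the defaults yield the API base URL shown below. The helper names and the exception class are hypothetical, and the status check only illustrates the idea of converting HTTP error codes into exceptions in the simplest possible way, not how HTTPClient actually does it.

```python
from urllib.parse import urljoin

DEFAULT_SERVER_URL = 'http://127.0.0.1:5000/'
DEFAULT_URL_PREFIX = 'api/'


class HypotheticalServerError(Exception):
    """Raised for HTTP error status codes (illustrative only)."""


def api_base_url(server_url=DEFAULT_SERVER_URL, url_prefix=DEFAULT_URL_PREFIX):
    # urljoin resolves the relative prefix against the server URL; for server
    # URLs containing a path, the trailing slash decides whether the last
    # path segment is kept or replaced.
    return urljoin(server_url, url_prefix)


def check_status(status_code):
    # Treat client (4xx) and server (5xx) errors as exceptions.
    if 400 <= status_code < 600:
        raise HypotheticalServerError(f'HTTP status {status_code}')
    return status_code


print(api_base_url())  # http://127.0.0.1:5000/api/
```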