datasafe.server module

Server components of the LabInform datasafe.

Different server components can be distinguished:

  • user-facing components (frontends)

  • storage components (backends)

Note that “user” is a broad term here, meaning any person or program accessing the datasafe. In this respect, the clients contained in datasafe.client are users as well.

The backend components deal with the actual storage of data (in the file system) and with access to the stored data.

Frontends

Frontends allow a “user” (mostly another program) to access the datasafe without needing to know how the data are actually stored.

Currently, two frontends are implemented, each with a different use case:

  • Server

    General frontend that can be used locally with datasafe.client.LocalClient.

  • HTTPServerAPI

    API for the HTTP server running via flask.

    HTTP frontend, e.g. for use with the datasafe.client.HTTPClient class. Using HTTP makes it possible to completely separate client and server in terms of their locations and to access data remotely. However, keep in mind that remote access comes with security implications that are currently not dealt with.

    The actual HTTP server is created with the function create_http_server(), but the API class is the interesting part here.

Backends

Backends deal with actually storing the data.

Currently, there is only one backend implemented:

  • StorageBackend

    File system backend for the datasafe, actually handling directories.

Things to decide

Some things that still need to be decided:

  • Where to store configuration?

    At least the base directory for the datasafe needs to be defined in some way.

    Other configuration values could be the issuer (the number after the “42.” of a LOI).

Perhaps one could store the configuration in a separate configuration class to start with and see how this goes…
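A separate configuration class along these lines might look as follows. This is only a sketch: the class name DatasafeConfig, its field names, the loi_prefix() helper, and all default values are hypothetical and not part of the datasafe package.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasafeConfig:
    """Hypothetical container for datasafe settings (not part of datasafe)."""

    # Base directory of the datasafe in the file system (assumed default).
    root_directory: str = "datasafe-root"
    # Issuer: the number following "42." in a LOI (assumed example value).
    issuer: str = "1001"

    def loi_prefix(self) -> str:
        """Return the common start of all LOIs for this issuer."""
        return f"42.{self.issuer}"


# Example values are made up for illustration.
config = DatasafeConfig(root_directory="/tmp/datasafe", issuer="1001")
prefix = config.loi_prefix()
```

Making the dataclass frozen keeps the configuration read-only once created, which may or may not be desirable for the actual implementation.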

Module documentation

class datasafe.server.Server[source]

Bases: object

Server part of the datasafe.

The server interacts with the storage backend to store and retrieve contents and provides the user interface.

It retrieves datasets, stores them, and should check whether their content is complete and not compromised.

The transfer occurs as bytes of the zipped dataset that is received by the server, decoded, unzipped, and archived into the correct directory.

storage
Type:

StorageBackend

loi
Type:

datasafe.loi.Parser

new(loi='')[source]

Create new LOI.

The storage corresponding to the LOI will be created and the LOI returned if successful. This does, however, not add any data to the datasafe. Therefore, calling new() will usually be followed by calling upload() at some later point. On the other hand, before calling upload(), you need to call new() to create the new LOI storage space.

Parameters:

loi (str) – LOI for which the resource should be created

Returns:

loi – LOI the resource has been created for

Return type:

str

Raises:
  • datasafe.exceptions.MissingLoiError – Raised if no LOI is provided

  • datasafe.exceptions.InvalidLoiError – Raised if LOI is not valid (for the given operation)

upload(loi='', content=None)[source]

Upload data to the datasafe.

Data are uploaded as bytes of the zipped content (dataset).

Parameters:
  • loi (str) – LOI the storage should be created for

  • content (bytes) – byte representation of a ZIP archive containing the contents to be stored via the backend

Returns:

integrity – dict with fields data and all containing boolean values

For details see datasafe.manifest.Manifest.check_integrity().

Return type:

dict

Raises:
  • datasafe.exceptions.MissingLoiError – Raised if no LOI is provided

  • datasafe.exceptions.LoiNotFoundError – Raised if resource corresponding to LOI does not exist

  • datasafe.exceptions.ExistingFileError – Raised if resource corresponding to LOI is not empty

download(loi='')[source]

Download data from the datasafe.

Parameters:

loi (str) – LOI the data should be downloaded for

Returns:

content – byte representation of a ZIP archive containing the contents of the resource corresponding to the LOI

Return type:

bytes

Raises:
  • datasafe.exceptions.MissingLoiError – Raised if no LOI is provided

  • datasafe.exceptions.LoiNotFoundError – Raised if resource corresponding to LOI cannot be found

  • datasafe.exceptions.MissingContentError – Raised if resource corresponding to LOI has no content

update(loi='', content=None)[source]

Update data in the datasafe.

Data are uploaded as bytes of the zipped content (dataset).

Parameters:
  • loi (str) – LOI the resource should be updated for

  • content (bytes) – byte representation of a ZIP archive containing the contents to be updated via the backend

Returns:

integrity – dict with fields data and all containing boolean values

For details see datasafe.manifest.Manifest.check_integrity().

Return type:

dict

Raises:
  • datasafe.exceptions.MissingLoiError – Raised if no LOI is provided

  • datasafe.exceptions.LoiNotFoundError – Raised if resource corresponding to LOI does not exist

  • datasafe.exceptions.NoFileError – Raised if resource corresponding to LOI is empty, i.e. has no content that could be updated
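
The call order of new(), upload(), and download() described above can be illustrated with a small in-memory stand-in. This sketch does not use the datasafe package: ToyServer and the locally defined exception classes are hypothetical and only mimic the documented contract. LOI validation and the integrity check are omitted, and the toy new() returns the LOI unchanged.

```python
class MissingLoiError(Exception):
    """Stand-in for datasafe.exceptions.MissingLoiError."""


class LoiNotFoundError(Exception):
    """Stand-in for datasafe.exceptions.LoiNotFoundError."""


class ExistingFileError(Exception):
    """Stand-in for datasafe.exceptions.ExistingFileError."""


class MissingContentError(Exception):
    """Stand-in for datasafe.exceptions.MissingContentError."""


class ToyServer:
    """In-memory model of the documented call order (hypothetical)."""

    def __init__(self):
        # Maps each LOI to its content; None marks "created, but empty".
        self._resources = {}

    def _lookup(self, loi):
        if not loi:
            raise MissingLoiError("No LOI provided")
        if loi not in self._resources:
            raise LoiNotFoundError(f"LOI {loi} does not exist")
        return self._resources[loi]

    def new(self, loi=""):
        if not loi:
            raise MissingLoiError("No LOI provided")
        self._resources[loi] = None  # reserve the storage, add no data
        return loi

    def upload(self, loi="", content=None):
        if self._lookup(loi) is not None:
            raise ExistingFileError(f"LOI {loi} already has content")
        self._resources[loi] = content

    def download(self, loi=""):
        content = self._lookup(loi)
        if content is None:
            raise MissingContentError(f"LOI {loi} has no content")
        return content


server = ToyServer()
loi = server.new("42.1001/ds/example/1")  # made-up LOI for illustration
server.upload(loi, b"bytes of a ZIP archive")
content = server.download(loi)
```

Calling upload() a second time for the same LOI raises ExistingFileError in this model, and calling it for a LOI that was never created raises LoiNotFoundError, mirroring the exceptions listed for upload().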

class datasafe.server.StorageBackend[source]

Bases: object

File system backend for the datasafe, actually handling directories.

The storage backend does not care at all about LOIs, but only operates on paths within the file system. As far as datasets are concerned, the backend requires a manifest file to accompany each dataset. However, it does not create such a file. Furthermore, data are deposited (using deposit()) and retrieved (using retrieve()) as streams containing the contents of ZIP archives.

root_directory

base directory for the datasafe

Type:

str

manifest_filename

name of manifest file

Type:

str

working_path(path='')[source]

Full path to working directory in datasafe

Returns:

working_path – full path to work on

Return type:

str

create(path='')[source]

Create directory for given path.

Parameters:

path (str) – path to create directory for

Raises:

datasafe.exceptions.MissingPathError – Raised if no path is provided

exists(path='')[source]

Check whether given path exists

Parameters:

path (str) – path to check

isempty(path='')[source]

Check whether directory corresponding to path is empty

Parameters:

path (str) – path to check

Returns:

result – True if the directory corresponding to path is empty

Return type:

bool

Raises:

datasafe.exceptions.NoFileError – Raised if no path is provided

remove(path='', force=False)[source]

Remove directory corresponding to path.

Usually, non-empty directories will not be removed but raise an OSError exception.

Parameters:
  • path (str) – path that should be removed

  • force (bool) –

    set to True when non-empty directory should be removed

    default: False

Raises:

OSError – Raised if a non-empty directory should be removed and force is set to False
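
The effect of the force flag can be sketched with the standard library alone. The function name remove_directory is hypothetical, and whether the backend uses os.rmdir() and shutil.rmtree() internally is an assumption; the sketch only reproduces the documented behaviour.

```python
import os
import shutil
import tempfile


def remove_directory(path, force=False):
    """Remove a directory; remove non-empty ones only if force is True."""
    if force:
        shutil.rmtree(path)  # deletes the directory and everything in it
    else:
        os.rmdir(path)  # raises OSError if the directory is not empty


# Demonstration in a throwaway directory.
base = tempfile.mkdtemp()
target = os.path.join(base, "dataset")
os.mkdir(target)
with open(os.path.join(target, "data.txt"), "w") as file:
    file.write("42")

try:
    remove_directory(target)
    refused = False
except OSError:
    refused = True  # the non-empty directory was left untouched

remove_directory(target, force=True)
removed = not os.path.exists(target)
shutil.rmtree(base)
```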

get_highest_id(path='')[source]

Get the highest numeric ID among the subdirectories of the directory corresponding to path

Return last element of a sorted list of directory contents, assuming the directory to only contain subdirectories with numeric IDs.

In case there is no numeric ID yet in the directory, it returns 0.

Todo

Handle directories whose names are not convertible to integers

Parameters:

path (str) – path to get subdirectory with highest number for

Returns:

id – subdirectory with the highest number in the directory corresponding to path

Return type:

int
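
A minimal sketch of such a lookup, using only the standard library, is given below. The function name highest_numeric_id is hypothetical. Unlike the behaviour described above, this variant compares the names as integers rather than sorting them as strings (where "10" would sort before "9") and skips names that are not numeric, which is one possible way to address the Todo.

```python
import os
import tempfile


def highest_numeric_id(directory):
    """Return the highest integer-named subdirectory, or 0 if none exists."""
    ids = [
        int(entry.name)
        for entry in os.scandir(directory)
        if entry.is_dir() and entry.name.isdecimal()
    ]
    return max(ids, default=0)


with tempfile.TemporaryDirectory() as directory:
    empty_result = highest_numeric_id(directory)
    # "notes" stands for a directory whose name is not an integer.
    for name in ("1", "2", "10", "notes"):
        os.mkdir(os.path.join(directory, name))
    result = highest_numeric_id(directory)
```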

create_next_id(path='')[source]

Create next subdirectory in directory corresponding to path

Parameters:

path (str) – path the subdirectory should be created in

deposit(path='', content=None)[source]

Deposit data provided as content in directory corresponding to path.

Content is the byte representation of a ZIP archive containing the actual content. This byte representation is saved in a temporary file and afterwards unpacked in the directory corresponding to path.

After depositing the content (including unzipping), the checksums in the manifest are checked for consistency with newly generated checksums, and in case of inconsistencies, an exception is raised.

Parameters:
  • path (str) – path to deposit content to

  • content (bytes) – byte representation of a ZIP archive containing the contents to be extracted in the directory corresponding to path

Returns:

integrity – dict with fields data and all containing boolean values

For details see datasafe.manifest.Manifest.check_integrity().

Return type:

dict

Raises:
  • datasafe.exceptions.MissingPathError – Raised if no path is provided

  • datasafe.exceptions.MissingContentError – Raised if no content is provided
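
The first part of this procedure (bytes to temporary file to unpacked directory) can be sketched as follows. The function name unpack_zip_bytes is hypothetical, a plain ValueError stands in for MissingContentError, and the subsequent checksum comparison is left out; how the backend implements these steps internally is not shown here.

```python
import io
import os
import shutil
import tempfile
import zipfile


def unpack_zip_bytes(content, target_directory):
    """Write ZIP bytes to a temporary file and unpack it into a directory."""
    if not content:
        raise ValueError("No content provided")
    with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as tmp:
        tmp.write(content)
        tmp_name = tmp.name
    try:
        shutil.unpack_archive(tmp_name, target_directory, format="zip")
    finally:
        os.remove(tmp_name)  # the temporary archive is no longer needed


# Build a small ZIP archive in memory to act as the transferred content.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.txt", "42")

with tempfile.TemporaryDirectory() as target:
    unpack_zip_bytes(buffer.getvalue(), target)
    unpacked = sorted(os.listdir(target))
    with open(os.path.join(target, "data.txt")) as file:
        unpacked_text = file.read()
```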

retrieve(path='')[source]

Obtain data from directory corresponding to path

The data are compressed as a ZIP archive, and the contents of the ZIP file are returned as bytes.

Parameters:

path (str) – path the data should be retrieved for

Returns:

content – byte representation of a ZIP archive containing the contents of the directory corresponding to path

Return type:

bytes

Raises:
  • datasafe.exceptions.MissingPathError – Raised if no path is provided

  • OSError – Raised if path does not exist
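
Packing a directory into the bytes of a ZIP archive can be done entirely in memory. The following is a sketch under assumptions: the function name zip_directory_to_bytes is hypothetical, and the backend may well create the archive differently (for instance via a temporary file).

```python
import io
import os
import tempfile
import zipfile


def zip_directory_to_bytes(directory):
    """Return the files of a directory as the bytes of a ZIP archive."""
    if not os.path.isdir(directory):
        raise OSError(f"Directory {directory} does not exist")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(directory):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the directory being archived.
                archive.write(full_path, os.path.relpath(full_path, directory))
    return buffer.getvalue()


with tempfile.TemporaryDirectory() as source:
    with open(os.path.join(source, "data.txt"), "w") as file:
        file.write("42")
    content = zip_directory_to_bytes(source)

# The bytes can be read back as an archive without touching the disk.
with zipfile.ZipFile(io.BytesIO(content)) as archive:
    names = archive.namelist()
    payload = archive.read("data.txt")
```

Note that this sketch only adds files, so empty subdirectories do not appear in the archive.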

get_manifest(path='')[source]

Retrieve manifest of a dataset stored in path.

Parameters:

path (str) – path to the dataset the manifest should be retrieved for

Returns:

content – contents of the manifest file

Return type:

str

get_index()[source]

Return list of paths to datasets

Such a list of paths to datasets is pretty useful if one intends to check locally for existing LOIs (corresponding to paths in the datasafe).

If a path has been created already, but no data yet saved in there, as may happen during an experiment to reserve the corresponding LOI, this path will nevertheless be included.

Returns:

paths – list of paths to datasets

Return type:

list

check_integrity(path='')[source]

Check integrity of dataset, comparing stored with generated checksums.

To check the integrity of a dataset, the checksums stored within the manifest file will be compared to newly generated checksums over data and metadata together as well as over data alone.

Parameters:

path (str) – path to the dataset the integrity should be checked for

Returns:

integrity – dict with fields data and all containing boolean values

Return type:

dict
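
The idea of comparing stored with newly generated checksums, once over the data alone and once over data and metadata together, can be sketched as below. Everything specific here is an assumption made for illustration: the use of SHA-256, the representation of files as a dict of name to bytes, and the layout of the stored checksums. The actual manifest format and hash algorithm used by datasafe are not described in this section.

```python
import hashlib


def checksum(parts):
    """Return one SHA-256 hex digest over byte strings in sorted-name order."""
    digest = hashlib.sha256()
    for name in sorted(parts):
        digest.update(parts[name])
    return digest.hexdigest()


def check_integrity(data_files, metadata_files, stored):
    """Compare stored checksums with newly generated ones."""
    return {
        "data": checksum(data_files) == stored["data"],
        "all": checksum({**data_files, **metadata_files}) == stored["all"],
    }


# Made-up file contents for illustration.
data = {"data.txt": b"42"}
metadata = {"data.info": b"operator: someone"}
stored = {"data": checksum(data), "all": checksum({**data, **metadata})}

intact = check_integrity(data, metadata, stored)
# Changing only the metadata invalidates "all" but leaves "data" valid.
tampered = check_integrity(data, {"data.info": b"operator: nobody"}, stored)
```

Keeping two checksums makes it possible to tell a change in the metadata apart from a change in the data themselves.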

datasafe.server.create_http_server(test_config=None)[source]

Create an HTTP server for accessing the datasafe.

Parameters:

test_config (dict) – Configuration for HTTP server

Returns:

app – WSGI application created via flask

Return type:

flask.Flask

class datasafe.server.HTTPServerAPI[source]

Bases: MethodView

API view used in the HTTP server.

The actual server is created via create_http_server() and operates via flask. This API view provides the actual API functionality to access the datasafe and its underlying storage backend via HTTP.

The API provides one handler method for each supported HTTP method, currently GET, POST, PUT, and PATCH.

Furthermore, exceptions are converted into the appropriate HTTP status codes and the message of the exception is contained in the response body. Thus, clients such as datasafe.client.HTTPClient can convert the HTTP status codes back into Python exceptions.
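
How such a conversion might work is sketched below for GET and PUT. The exception classes are local stand-ins for those in datasafe.exceptions, the pairing of exceptions with status codes is inferred from the tables of the individual methods and is an assumption, and the fallback to status code 500 as well as the function name to_response are hypothetical.

```python
class LoiNotFoundError(Exception):
    """Stand-in for datasafe.exceptions.LoiNotFoundError."""


class InvalidLoiError(Exception):
    """Stand-in for datasafe.exceptions.InvalidLoiError."""


class MissingContentError(Exception):
    """Stand-in for datasafe.exceptions.MissingContentError."""


class ExistingFileError(Exception):
    """Stand-in for datasafe.exceptions.ExistingFileError."""


# Assumed mapping; the same exception maps to different codes per method.
STATUS_CODES = {
    "GET": {
        MissingContentError: 204,
        LoiNotFoundError: 404,
        InvalidLoiError: 404,
    },
    "PUT": {
        LoiNotFoundError: 400,
        MissingContentError: 400,
        InvalidLoiError: 404,
        ExistingFileError: 405,
    },
}


def to_response(method, error):
    """Return (body, status code) for an exception raised in a handler."""
    status = STATUS_CODES[method].get(type(error), 500)
    return str(error), status


get_response = to_response("GET", LoiNotFoundError("LOI does not exist"))
put_response = to_response("PUT", LoiNotFoundError("LOI does not exist"))
```

Because the exception message travels in the response body, a client can rebuild the original exception from the status code and the body.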

server

Server backend that communicates with the storage backend.

Type:

datasafe.server.Server

get(loi='')[source]

Handle GET requests.

The following responses are currently returned, depending on the status the request resulted in:

Status      Code   Data
---------   ----   ------------------------------
success     200    dataset contents (ZIP archive)
no data     204    message
not found   404    error message
invalid     404    error message

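The status “no data” results from querying a LOI that has been created (using POST), but to which no data have been uploaded so far.

Although both return status code 404, the status “invalid” differs from “not found” in that the requested LOI itself is not valid, rather than being a valid LOI for which no resource exists.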

Parameters:

loi (str) – LOI of get request

Returns:

response – Response object

Return type:

flask.Response

post(loi='')[source]

Handle POST requests.

A POST request only creates a new, empty resource connected to the LOI, but never uploads data. For uploading, use put(). While this may seem not to conform to the typical usage of POST requests, the reason is simple: post() returns the newly created LOI, while put() returns the JSON representation of the integrity check dict. Hence, to be able to check that the data have successfully arrived at the datasafe storage backend, it is essential to separate POST and PUT requests.

The following responses are currently returned, depending on the status the request resulted in:

Status    Code   Data
-------   ----   -----------------
created   201    newly created LOI
invalid   404    error message

Parameters:

loi (str) – LOI of post request

Returns:

response – Response object

Return type:

flask.Response

put(loi='')[source]

Handle PUT requests.

PUT requests are used to transfer data to an existing resource of the datasafe. To create a new resource, use post() beforehand. If data exist already at the resource, this will result in an error (status code 405, see table below).

The following responses are currently returned, depending on the status the request resulted in:

Status             Code   Data
----------------   ----   -------------------------------------------
success            200    JSON representation of integrity check dict
does not exist     400    error message
missing content    400    error message
invalid            404    error message
existing content   405    error message

The status “does not exist” means that the LOI the data should be put to does not exist (in this case, you need to create it first using POST). Therefore, in this particular case, status code 400 is returned instead of 404 (“not found”).

The status “missing content” refers to the request missing data.

The status “existing content” refers to data already being present at the storage referred to by the LOI. Since you could, in general, update the content using another method, status code 405 (“method not allowed”) is returned in this case.

Parameters:

loi (str) – LOI of put request

Returns:

response – Response object

Return type:

flask.Response

patch(loi='')[source]

Handle PATCH requests.

PATCH requests are used to update data at an existing resource of the datasafe. To upload new data to an existing resource, use put(). If no data exist at the resource, this will result in an error (status code 405, see table below).

The following responses are currently returned, depending on the status the request resulted in:

Status                Code   Data
-------------------   ----   -------------------------------------------
success               200    JSON representation of integrity check dict
does not exist        400    error message
missing content       400    error message
invalid               404    error message
no resource content   405    error message

The status “does not exist” means that the LOI whose data should be updated does not exist (in this case, you need to create it first using POST). Therefore, in this particular case, status code 400 is returned instead of 404 (“not found”).

The status “missing content” refers to the request missing data.

The status “no resource content” refers to no data being present at the storage referred to by the LOI. Since you could, in general, upload new content using another method, status code 405 (“method not allowed”) is returned in this case.

Parameters:

loi (str) – LOI of patch request

Returns:

response – Response object

Return type:

flask.Response

methods: t.ClassVar[t.Collection[str] | None] = {'GET', 'PATCH', 'POST', 'PUT'}

The methods this view is registered for. Uses the same default (["GET", "HEAD", "OPTIONS"]) as route and add_url_rule by default.