Assemblies

Assemblies are the key element of Pinyto as they empower the user to do with her data as she likes. Users can write their own assemblies, but they can also install assemblies from other users. If those other users update their assembly, the user automatically uses the new version without having to change anything.

Warning

It is crucial that users can trust the authors of the assemblies they have installed. They can check the source code of all assemblies when they install them, but if an update changes an assembly the user gets no notification, and the changed assembly may leak or delete data.

Users can prevent this by forking the assemblies of other users, which turns the assembly in its current state into one of their own. A forked assembly does not change when the original author changes her assembly, so the user can be sure that no harmful code gets executed. The downside of this method is that updates with bug fixes or new features are not installed automatically. After a good update, the user may delete the forked assembly and fork the original again. The usability of this procedure could be improved in future versions.

Installing assemblies

If you run your own Pinyto server, there are two ways to install new assemblies. The first is to store them in your database. You may want to create user accounts so that the assemblies have meaningful names. For example, for the assembly pinyto/Todo a user with the name “pinyto” must be present.

If you check out a new version of Pinyto there might be new default assemblies in the api module. With this version come three bundled assemblies:

  • pinyto/DocumentsAdmin is used for the backoffice to let you browse all your documents there.
  • pinyto/Todo is the assembly for the example app for Pinyto which is a simple TODO-list.
  • bborsalino/Librarian is an assembly for an app used to manage the books in your flat. This assembly uses a job which completes incomplete data for books by querying a publicly available database. This could be a good example for your next assembly because the TODO-app does not need a job.

The assemblies in the api module provide data migrations which insert the user and the assembly itself into the database.

Normally the code of every assembly gets executed in a seccomp-secured sandbox. Using the sandbox might decrease the performance of assembly execution, so Pinyto gives administrators the opportunity to install assemblies as Django apps inside the api module. If an assembly is called and a directly installed assembly is found in the api module, the code from there gets executed without a sandbox. For this to work, an assembly with the same name has to exist in the database.

Warning

A bad admin could trick users into thinking that an assembly is not harmful by having different code saved in the database than in the Django app in api. We considered this a minor threat because if your admin wants to harm you she could do all sorts of bad things to your data. It is certainly best to have an admin you can trust. If you do not trust your admin, become your own admin on your own server.

If you write an assembly inside of api make sure to insert the correct code of the assembly in the data migration.

For users there is no visible difference between assemblies executed in the sandbox and assemblies executed directly. If you have benchmarks showing how big the difference is please share them.

Calling Assemblies

Assemblies are called with the api_call view in the api_prototype module.

api_call checks in the api module if there is a directly executable version of the assembly. If there is none load_api is called.

Executing Jobs

API calls can save documents of different types. Documents with “type”: “job” are special because they describe a scheduled job. A job scheduled this way gets executed immediately after the request that saved the document has finished. The scheduling document must have the following structure:

  • “type”: “job”
  • “data”: A dictionary containing the following attributes and data:
    • “assembly_user”: The username of the author of the assembly.
    • “assembly_name”: The name of the assembly.
    • “job_name”: The name of the job.
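As a minimal sketch of the structure listed above, a scheduling document could look like the following Python dictionary. The concrete values (the user, the assembly and especially the job name) are illustrative assumptions, and is_job_document is a hypothetical helper that is not part of Pinyto.

```python
# Hypothetical values: user, assembly and job names are illustrative only.
job_document = {
    "type": "job",
    "data": {
        "assembly_user": "bborsalino",     # username of the assembly author
        "assembly_name": "Librarian",      # name of the assembly
        "job_name": "job_complete_data",   # assumed job name
    },
}


def is_job_document(document):
    """Return True if the document has the structure described above.

    This helper is an illustration and not part of the Pinyto code base.
    """
    data = document.get("data")
    return (
        document.get("type") == "job"
        and isinstance(data, dict)
        and all(
            isinstance(data.get(key), str)
            for key in ("assembly_user", "assembly_name", "job_name")
        )
    )
```

An assembly would save such a document like any other document; the check above only verifies that the three required attributes are present.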

After each finished request Django calls check_for_jobs.

The Sandbox

The Pinyto sandbox is used if safely_exec is called.

safely_exec starts a process and executes sandbox there.

The process executing sandbox creates a new instance of SecureHost. The initialization of this class forks the sandbox process into two parts:

  1. The host process, which has access to the database, the request and the services.
  2. The child, which has only a pipe to communicate with the host process. All open file descriptors in the child process get closed, and the database, request and service objects get replaced by sandbox versions. Those sandbox versions of the services have the same signatures but communicate only with the host process, which asks the real services for answers and returns them to the child process. This is done because the sandbox is secured using seccomp, and seccomp only allows reading and writing to already open file descriptors. Access to any other kernel function is blocked by the kernel. If the exec call in the child process tries to do anything other than calculations with the data in its memory or communication with the services over the pipe to the host process, it gets terminated by the kernel.
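The split into host and child can be sketched roughly as follows. This is a simplified illustration and not the SecureHost implementation: it does not enable seccomp or close other file descriptors, and the names run_in_child, ask_host and host_services as well as the newline-delimited JSON messages are assumptions made for this example. It requires a platform that provides os.fork.

```python
import json
import os
import socket


def run_in_child(child_code, host_services):
    """Fork; the child can reach the services only through one socket."""
    host_end, child_end = socket.socketpair()
    pid = os.fork()
    if pid == 0:
        # Child process: keep only its end of the pipe. A real sandbox
        # would also close all other descriptors and enable seccomp here.
        host_end.close()
        try:
            child_reader = child_end.makefile("r")

            def ask_host(service, argument):
                # Stand-in for a sandbox version of a service: same call,
                # but the work is done by the host process.
                request = json.dumps({"service": service, "arg": argument})
                child_end.sendall(request.encode("utf-8") + b"\n")
                return json.loads(child_reader.readline())["result"]

            result = child_code(ask_host)
            done = json.dumps({"done": result})
            child_end.sendall(done.encode("utf-8") + b"\n")
        finally:
            os._exit(0)  # never return into the caller's code in the child
    # Host process: answer service requests until the child reports a result.
    child_end.close()
    reader = host_end.makefile("r")
    try:
        for line in reader:
            message = json.loads(line)
            if "done" in message:
                return message["done"]
            answer = host_services[message["service"]](message["arg"])
            reply = json.dumps({"result": answer})
            host_end.sendall(reply.encode("utf-8") + b"\n")
    finally:
        reader.close()
        host_end.close()
        os.waitpid(pid, 0)
    return None  # the child exited without reporting a result


# The "real" service lives in the host; the child only sees ask_host.
services = {"db_count": lambda query: 3}
result = run_in_child(lambda ask: ask("db_count", {"type": "book"}) + 1, services)
```

The child never touches the real service object. It only computes with the answers the host sends back, which is the property the seccomp restriction enforces in the actual sandbox.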

There are some helper functions and classes which may be relevant for understanding how the sandbox works:

api_prototype.sandbox_helpers.libc_exit(n=1)[source]

Invoke _exit(2) system call.

Parameters:n (int) –
api_prototype.sandbox_helpers.read_exact(fp, n)[source]

Read only the specified number of bytes

Parameters:
  • fp (file) – file pointer
  • n (int) – number of bytes to read
Return type:

bytes

api_prototype.sandbox_helpers.write_exact(fp, s)[source]

Write only the specified number of bytes

Parameters:
  • fp (file) – file pointer
  • s (bytes) – string to write and not a byte more than that
api_prototype.sandbox_helpers.write_to_pipe(pipe, data_dict)[source]

Writes the data_dict to the given pipe.

Parameters:
  • pipe (socket.Socket) – one part of socket.socketpair()
  • data_dict (dict) –
api_prototype.sandbox_helpers.read_from_pipe(pipe)[source]

Reads a json string from the pipe and decodes the json of that string.

Parameters:pipe (socket.Socket) – one part of socket.socketpair()
Return type:dict
api_prototype.sandbox_helpers.escape_all_objectids_and_datetime(conv_dict)[source]

This function escapes all ObjectId objects to make the dict json serializable.

Parameters:conv_dict (dict) –
api_prototype.sandbox_helpers.unescape_all_objectids_and_datetime(conv_dict)[source]

This function reverses the escape of all ObjectId objects done by escape_all_objectids_and_datetime.

Parameters:conv_dict (dict) –
api_prototype.sandbox_helpers.piped_command(pipe, command_dict)[source]

Writes the command_dict to the pipe and reads the answer.

Parameters:
  • pipe (socket.Socket) – one part of socket.socketpair()
  • command_dict (dict) –
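To illustrate why helpers for exact reads and writes are useful on a stream socket, here is a minimal sketch of exchanging JSON dictionaries over socket.socketpair(). The 4-byte length prefix and the names recv_exact, send_json and recv_json are assumptions for this example; the actual wire format used by write_to_pipe and read_from_pipe may differ.

```python
import json
import socket
import struct


def recv_exact(sock, n):
    """Read exactly n bytes from the socket or raise EOFError."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def send_json(sock, data_dict):
    """Send a dict as JSON, prefixed with its length (assumed framing)."""
    payload = json.dumps(data_dict).encode("utf-8")
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_json(sock):
    """Read one length-prefixed JSON message and decode it into a dict."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return json.loads(recv_exact(sock, length).decode("utf-8"))


# Both ends live in one process here only to keep the example short.
host_end, child_end = socket.socketpair()
send_json(child_end, {"command": "count", "query": {"type": "book"}})
received = recv_json(host_end)
host_end.close()
child_end.close()
```

A stream socket does not preserve message boundaries, so the reader has to know how many bytes belong to one message. The length prefix is one common way to do that.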
class api_prototype.sandbox_helpers.NoResponseFromHostException[source]

This is a custom exception which gets returned if no valid response is returned.

class api_prototype.models.SandboxCollectionWrapper(child_pipe)[source]

This wrapper is used to expose the db to the user's assemblies. This is the class with the same methods to be used in the sandbox.

count(query)[source]

Use this function to get a count from the database.

Parameters:query (dict) –
Returns:The number of documents matching the query
Return type:int
find(query, skip=0, limit=0, sorting=None, sort_direction='asc')[source]

Use this function to read from the database. This method encodes all fields beginning with _ for returning a valid json response.

Parameters:
  • query (dict) –
  • skip (int) – Count of documents which should be skipped in the query. This is useful for pagination.
  • limit (int) – Number of documents which should be returned. This number is of course the maximum.
  • sorting (str) – String identifying the key which is used for sorting.
  • sort_direction (str) – ‘asc’ or ‘desc’
Returns:

The list of found documents. If no document is found the list is empty.

Return type:

list
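As a sketch of how skip and limit can be combined for pagination, the helper below turns a page number into a find call. The name fetch_page is an assumption, and FakeCollection is a hypothetical in-memory stand-in that only mimics the documented signature so the example can be tried outside Pinyto. It treats limit=0 as "no limit", which is also an assumption.

```python
def fetch_page(db, query, page, page_size=20, sorting=None, sort_direction="asc"):
    """Return one page of results; pages are numbered starting at 0."""
    return db.find(
        query,
        skip=page * page_size,
        limit=page_size,
        sorting=sorting,
        sort_direction=sort_direction,
    )


class FakeCollection:
    """Hypothetical stand-in for the collection wrapper (not Pinyto code)."""

    def __init__(self, documents):
        self.documents = documents

    def find(self, query, skip=0, limit=0, sorting=None, sort_direction="asc"):
        # Match documents whose fields equal every key/value pair of the query.
        matches = [
            doc for doc in self.documents
            if all(doc.get(key) == value for key, value in query.items())
        ]
        if sorting is not None:
            matches.sort(key=lambda doc: doc[sorting],
                         reverse=(sort_direction == "desc"))
        end = skip + limit if limit else None  # assumption: 0 means no limit
        return matches[skip:end]


db = FakeCollection([{"type": "todo", "priority": i} for i in range(5)])
second_page = fetch_page(db, {"type": "todo"}, page=1, page_size=2,
                         sorting="priority")
```

Inside an assembly the db object would be provided by Pinyto, so only a helper like fetch_page would be needed there.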

find_distinct(query, attribute)[source]

Return a list representing the diversity of a given attribute in the documents matched by the query.

Parameters:
  • query (str) – json
  • attribute (str) – String describing the attribute
Returns:

A list of values the attribute can have in the set of documents described by the query

Return type:

list

find_document_for_id(document_id)[source]

Find the document with the given ID in the database. On success this returns a single document.

Parameters:document_id (string) –
Returns:The document with the given _id
Return type:dict
find_documents(query, skip=0, limit=0, sorting=None, sort_direction='asc')[source]

Use this function to read from the database. This method returns complete documents with _id fields. Do not use this to construct json responses!

Parameters:
  • query (dict) –
  • skip (int) – Count of documents which should be skipped in the query. This is useful for pagination.
  • limit (int) – Number of documents which should be returned. This number is of course the maximum.
  • sorting (str) – String identifying the key which is used for sorting.
  • sort_direction (str) – ‘asc’ or ‘desc’
Returns:

The list of found documents. If no document is found the list is empty.

Return type:

list

insert(document)[source]

Inserts a document. If the given document has an ID, the ID is removed and a new ID will be generated. Time will be set to now.

Parameters:document (dict) –
Returns:The ObjectId of the inserted document
Return type:str
remove(document)[source]

Deletes the document. The document must have a valid _id

Parameters:document (dict) –
save(document)[source]

Saves the document. The document must have a valid _id

Parameters:document (dict) –
Returns:The ObjectId of the inserted document
Return type:str
class api_prototype.models.SandboxRequestPost(child_pipe)[source]

This wrapper is used to expose Django’s request object to the user's assemblies. This class implements the most used methods of the request object.

This class is used to emulate request.POST

get(param)[source]

Returns the specified param

Parameters:param (str) –
class api_prototype.models.SandboxRequest(child_pipe)[source]

This wrapper is used to expose Django’s request object to the user's assemblies. This class implements the most used methods of the request object.

init_body()[source]

This needs to be called after the seccomp process is initialized to fill in valid body data for the request.

class api_prototype.models.CanNotCreateNewInstanceInTheSandbox(class_name)[source]

This Exception is thrown if a script wants to create an object of a class that can not be created in the sandbox.

class api_prototype.models.Factory(pipe_child_end)[source]

Use this factory to create objects in the sandboxed process. Just pass the class name to the create method.

create(class_name, *args)[source]

This method will create an object of the class of classname with the arguments supplied after that. If the class can not be created in the sandbox it throws an Exception.

Parameters:
  • class_name (str) –
  • args – additional arguments
Returns:

Objects of the type specified in class_name

Return type:

Object

class api_prototype.models.SandboxParseHtml(pipe_child_end, html)[source]

This wrapper is used to expose html parsing functionality to the sandbox. This is the ParseHtml class with the same methods to be used in the sandbox.

contains(descriptions)[source]

Use this function to check if the html contains the described tag. The descriptions must be a list of python dictionaries with {'tag': 'tagname', 'attrs': dict}

Parameters:descriptions (dict) –
Return type:boolean
find_element_and_collect_table_like_information(descriptions, searched_information)[source]

If you are retrieving data from websites you might need to get the contents of a table or a similar structure. This is the function to get that information. The descriptions must be a list of python dictionaries with {'tag': 'tag name', 'attrs': dict}. The last description in this list will be used for a findAll of that element. This should select all the rows of the table you want to read. Specify all the information you are searching for in searched_information in the following format: {'name': {'search tag': 'td', 'search attrs': dict, 'captions': ['list', 'of', 'captions'], 'content tag': 'td', 'content attrs': dict}, 'next name': ...}

Parameters:
  • descriptions (dict) –
  • searched_information (dict) –
Return type:

dict
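The two arguments of find_element_and_collect_table_like_information could be filled in as in the following sketch. The tag names, attribute values, captions and the result names "isbn" and "publisher" are made-up examples for a hypothetical book details page; only the dictionary keys follow the format described above.

```python
# Path to the rows of the table: the last description selects all rows.
# The id "book-details" is a hypothetical value for illustration.
descriptions = [
    {"tag": "table", "attrs": {"id": "book-details"}},
    {"tag": "tr", "attrs": {}},
]

# One entry per piece of information to collect from the selected rows.
searched_information = {
    "isbn": {
        "search tag": "td",                  # cell holding the caption
        "search attrs": {"class": "label"},
        "captions": ["ISBN", "ISBN-13"],     # accepted caption texts
        "content tag": "td",                 # cell holding the value
        "content attrs": {"class": "value"},
    },
    "publisher": {
        "search tag": "td",
        "search attrs": {"class": "label"},
        "captions": ["Publisher"],
        "content tag": "td",
        "content attrs": {"class": "value"},
    },
}

# With a parser object available in the sandbox (here called `parser`,
# an assumed variable name), the call would be:
# result = parser.find_element_and_collect_table_like_information(
#     descriptions, searched_information)
```

Each name in searched_information is presumably the key under which the matching content appears in the returned dict.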

find_element_and_get_attribute_value(descriptions, attribute)[source]

Use this function to find the described tag and return the value from attribute if the tag is found. Returns empty string if the tag or the attribute is not found. The descriptions must be a list of python dictionaries with {'tag': 'tag name', 'attrs': dict}

Parameters:
  • descriptions (dict) –
  • attribute (str) –
Returns:

string or list if attribute is class

class api_prototype.models.SandboxHttp(pipe_child_end)[source]

This wrapper is used to expose http requests to the sandbox. This is the Http class with the same methods to be used in the sandbox.

get(url)[source]

This issues a http request to the supplied url and returns the response as a string. If the request fails an empty string is returned.

Parameters:url – Url with http:// or https:// at the beginning
Type:str
Return type:str
post(url, data)[source]

This issues a http request to the supplied url and returns the response as a string. If the request fails an empty string is returned.

Parameters:
Return type:

str

class api_prototype.sandbox_helpers.EmptyRequest[source]

This class is used for processing jobs. They need request.body but it can be empty.