Thursday, July 23, 2020

Anatomy of asynchronous tasks

Anatomy of asynchronous interaction

When a parallel thread needs more data, it calls an interface function, which returns the data as its result. If the data is not ready, the function blocks the caller, together with its stack. Asynchronous tasks must not block, so the communication protocol is different: the task calls a request procedure with a callback parameter. The request procedure returns immediately, and the calling task also exits, freeing the worker thread. Then, when the data is ready (which may be at that very moment), the callback procedure is called with the data as a parameter. This call resubmits the task to a thread pool, provided there are no other blockers. For example, if the task is still running, processing the previous portion of data, submitting it again may be dangerous and should be prevented. Tasks that process incoming data sequentially in this way are called actors; the concept was first proposed by Carl Hewitt.
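The callback protocol described above can be sketched in plain Java. This is only a minimal illustration: `AsyncSource`, `request`, `put` and `CallbackDemo` are hypothetical names invented for this example and do not come from any library.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical data source: either data waits for a request, or a request waits for data. */
class AsyncSource<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final Queue<Consumer<T>> waiting = new ArrayDeque<>();

    /** Request procedure: never blocks. It either fires the callback at once or stores it. */
    void request(Consumer<T> callback) {
        T item;
        synchronized (this) {
            item = items.poll();
            if (item == null) {          // data not ready: remember the callback and return
                waiting.add(callback);
                return;
            }
        }
        callback.accept(item);           // data was already there
    }

    /** Called by the data provider when a (non-null) item becomes available. */
    void put(T item) {
        Consumer<T> callback;
        synchronized (this) {
            callback = waiting.poll();
            if (callback == null) {      // nobody asked yet: keep the item for later
                items.add(item);
                return;
            }
        }
        callback.accept(item);
    }
}

class CallbackDemo {
    static String run() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AsyncSource<String> source = new AsyncSource<>();
        CountDownLatch done = new CountDownLatch(1);
        String[] result = new String[1];
        // The task issues the request and exits, freeing its worker thread.
        // The callback resubmits the continuation to the thread pool.
        pool.execute(() -> source.request(data -> pool.execute(() -> {
            result[0] = "processed " + data;
            done.countDown();
        })));
        source.put("hello");
        boolean finished = false;
        try {
            finished = done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return finished ? result[0] : null;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Only the main thread waits here, and only to collect the demo result. No pool thread blocks while the data is missing.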

Details of the interaction protocol can differ. One important feature is the number of callback invocations per single request invocation. There are 3 variants:
 - single callback call - this is how tasks generated by CompletableFuture factory methods work. They get a single data item, process it, compute the result and never return to execution.
 - unlimited callback calls - Hewitt's actors. For each incoming data item, a user-defined procedure is invoked.
 - Reactive Streams protocol - the request procedure is split in 2: Publisher.subscribe and Subscription.request. Subscribe only informs the publisher that a new client wants to receive data. The request procedure has a parameter: the number of tokens the client is able to accept. This procedure can be called multiple times, each time the client has finished processing the next portion of data.
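The first and the third variants can be tried with JDK classes alone. The sketch below assumes Java 9 or later and uses `java.util.concurrent.Flow` and `SubmissionPublisher` as a stand-in for a Reactive Streams implementation. The class names `OneAtATimeSubscriber` and `VariantsDemo` are made up for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

/** Subscriber that asks for one token at a time, after it has processed the previous one. */
class OneAtATimeSubscriber implements Flow.Subscriber<Integer> {
    final List<Integer> received = new CopyOnWriteArrayList<>();
    final CountDownLatch completed = new CountDownLatch(1);
    private Flow.Subscription subscription;

    @Override public void onSubscribe(Flow.Subscription s) {
        subscription = s;
        s.request(1);                 // "I can accept 1 token"
    }
    @Override public void onNext(Integer item) {
        received.add(item);           // process the data token
        subscription.request(1);      // ready for the next portion
    }
    @Override public void onError(Throwable t) { completed.countDown(); }
    @Override public void onComplete() { completed.countDown(); }
}

class VariantsDemo {
    /** Variant 1: a single callback call; the task runs once and never returns to execution. */
    static int singleCallback() {
        return CompletableFuture.supplyAsync(() -> 20)
                .thenApply(x -> x * 2 + 2)
                .join();
    }

    /** Variant 3: subscribe, then repeated request(n) calls. */
    static List<Integer> reactive() {
        OneAtATimeSubscriber sub = new OneAtATimeSubscriber();
        try (SubmissionPublisher<Integer> pub = new SubmissionPublisher<>()) {
            pub.subscribe(sub);
            for (int i = 1; i <= 3; i++) {
                pub.submit(i);
            }
        } // close() signals normal completion to the subscriber
        try {
            sub.completed.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sub.received;
    }

    public static void main(String[] args) {
        System.out.println(singleCallback());
        System.out.println(reactive());
    }
}
```

Requesting one token at a time is the simplest choice. A subscriber with a larger buffer could pass a bigger number to `request`.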


Taxonomy of actor's ports

Another protocol attribute is whether the protocol allows completion signals. Completion signals can be of 2 kinds: normal and exceptional. The CompletableFuture protocol supports only the exceptional signal; there is no need for normal completion, as each data interaction passes only a single data token. Akka actors support neither signal. Reactive streams support both.
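For CompletableFuture, the two outcomes look as follows. This is a small sketch using only JDK calls; the class and method names are invented for the illustration.

```java
import java.util.concurrent.CompletableFuture;

class CompletionSignalDemo {
    /** Reports whether the future delivered its single data token or an exceptional signal. */
    static String describe(CompletableFuture<Integer> source) {
        return source
                .handle((value, error) ->
                        error == null ? "value " + value : "error " + error.getMessage())
                .join();
    }

    static String normalCase() {
        CompletableFuture<Integer> f = new CompletableFuture<>();
        f.complete(7);                // the single data token; no separate "completed" signal
        return describe(f);
    }

    static String exceptionalCase() {
        CompletableFuture<Integer> f = new CompletableFuture<>();
        f.completeExceptionally(new IllegalStateException("boom")); // exceptional signal
        return describe(f);
    }

    public static void main(String[] args) {
        System.out.println(normalCase());
        System.out.println(exceptionalCase());
    }
}
```

In Reactive Streams the same two outcomes arrive through separate callbacks, `onComplete` and `onError`.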

Yet another feature is the colour of tokens. Black tokens are indistinguishable, so only the number of tokens needs to be passed. Colour tokens are normal data messages. All 3 protocols mentioned above support colour tokens. Black tokens are supported by the request procedure of the reactive protocol. In synchronous parallel programming, however, black tokens have been in use for a long time, in the form of Dijkstra's semaphores.
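The difference between the two colours can be shown with synchronous JDK classes: a `Semaphore` holds only a count, while a queue holds the messages themselves. The class name `TokenColourDemo` is made up for this sketch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.Semaphore;

class TokenColourDemo {
    /** Black tokens: only their number matters. */
    static int blackTokens() {
        Semaphore place = new Semaphore(0);
        place.release(3);             // put 3 indistinguishable tokens
        place.tryAcquire(2);          // take 2 of them, it does not matter which
        return place.availablePermits();
    }

    /** Colour tokens: each token is a distinct data message. */
    static String colourTokens() {
        BlockingQueue<String> place = new LinkedBlockingQueue<>();
        place.add("first");
        place.add("second");
        return place.poll();          // the consumer gets a specific message
    }

    public static void main(String[] args) {
        System.out.println(blackTokens());
        System.out.println(colourTokens());
    }
}
```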

Functioning of actor's body

An actor's functionality can be described in Petri net terms as follows: the actor consists of 2 places of its own and a transition. The control place is internal and is not accessible from outside the actor. It can contain at most 1 black token. The data place can contain an unlimited number of data tokens, sent by connected data providers. When both places are non-empty, the actor fires: one token is extracted from each place and saved in the actor's body. Firing is implemented as submitting the actor to a thread pool. When the actor gets a processor, the user-defined procedure is called with a single parameter, the data token. The processing of the control token is hidden from the user. When the user-defined procedure returns, the control token is returned to its place, which may cause another submission to a thread pool if more data tokens exist.
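The description above translates into a few dozen lines of Java. The following is a simplified sketch, not the DF4J implementation; `PetriActor`, `post`, `tryFire` and `onNext` are names chosen for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

abstract class PetriActor<T> {
    private final Executor pool;
    private final Queue<T> dataPlace = new ArrayDeque<>(); // colour tokens from providers
    private boolean controlToken = true;                    // at most 1 black token

    PetriActor(Executor pool) { this.pool = pool; }

    /** Called by data providers: puts a token into the data place. */
    public void post(T token) {
        synchronized (this) { dataPlace.add(token); }
        tryFire();
    }

    /** The transition: fires only when both places are non-empty. */
    private void tryFire() {
        T token;
        synchronized (this) {
            if (!controlToken || dataPlace.isEmpty()) return;
            controlToken = false;          // extract the control token
            token = dataPlace.poll();      // extract one data token
        }
        pool.execute(() -> {
            try {
                onNext(token);             // user-defined procedure
            } finally {
                synchronized (this) { controlToken = true; } // return the control token
                tryFire();                 // may cause another submission
            }
        });
    }

    /** User-defined procedure; never runs concurrently with itself. */
    protected abstract void onNext(T token);
}

class PetriActorDemo {
    static List<Integer> run() {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Integer> seen = new ArrayList<>();   // not thread-safe; the actor serializes access
        CountDownLatch done = new CountDownLatch(5);
        PetriActor<Integer> actor = new PetriActor<Integer>(pool) {
            @Override protected void onNext(Integer token) {
                seen.add(token);
                done.countDown();
            }
        };
        for (int i = 1; i <= 5; i++) actor.post(i);
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the control token is absent while `onNext` runs, the actor is never submitted twice at the same time, even on a pool with several threads.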


Actor's toolkit 

Nothing prevents us from constructing an actor with multiple token places (usually called ports) of various types. In fact, the output connection of a reactive Publisher consists of 2 ports: one coloured port for references to subscribers and one black port for the request counter. But none of the implementations I looked at provides access to port classes, so they do not allow creating actors with an arbitrary port configuration. So I developed a generic actor library, DF4J, which has base classes for various actors and ports. Ports can be freely combined in a single actor. Actors can be aggregated into an arbitrary dataflow graph with unified error handling. See https://github.com/akaigoro/df4j.

