1. About schedula¶
schedula is a dynamic flow-based programming environment for python, that handles automatically the control flow of the program. The control flow generally is represented by a Directed Acyclic Graph (DAG), where nodes are the operations/functions to be executed and edges are the dependencies between them.
The algorithm of schedula dates back to 2014, when a colleague asked for a method to automatically populate the missing data of a database. The imputation method chosen to complete the database was a system of interdependent physical formulas - i.e., the inputs of a formula are the outputs of other formulas. The current library has been developed in 2015 to support the design of the CO2MPAS tool - a CO2 vehicle simulator. During the developing phase, the physical formulas (more than 700) were known on the contrary of the software inputs and outputs.
1.1. Why schedula?¶
The design of flow-based programs begins with the definition of the control flow graph, and implicitly of its inputs and outputs. If the program accepts multiple combinations of inputs and outputs, you have to design and code all control flow graphs. With normal schedulers, it can be very demanding.
While with schedula, giving whatever set of inputs, it automatically calculates any of the desired computable outputs, choosing the most appropriate DAG from the dataflow execution model.
The DAG is determined at runtime and it is extracted using the shortest path from the provided inputs. The path is calculated based on a weighted directed graph (dataflow execution model) with a modified Dijkstra algorithm.
schedula makes the code easy to debug, to optimize, and to present it to a non-IT audience through its interactive graphs and charts. It provides the option to run a model asynchronously or in parallel managing automatically the Global Interpreter Lock (GIL), and to convert a model into a web API service.
1.2. Dataflow Execution Model¶
Dispatcher is the main model of schedula
and it represents the dataflow execution model of your code. It is defined by
a weighted directed graph. The nodes are the operations to be executed.
The arcs between the nodes represent their dependencies. The weights are used to
determine the control flow of your model (i.e. operations’ invocation order).
Conceptually, when the model is executed, input-data flows as tokens along the
arcs. When the execution/
begins, a special node (
START) places the data onto
key input arcs, triggering the computation of the control flow. The latter is
represented by a Directed Acyclic Graph (DAG) and it is defined as the shortest
path from the provided inputs. It is computed using the weighted directed graph
and a modified Dijkstra algorithm. A node is executed when its inputs and domain
are satisfied. After the node execution, new data are placed on some or all of
its output arcs. In presence of cycles in the graph, to avoid undesired infinite
loops, the nodes are computed only once. In case of an execution failure of a
node, the algorithm searches automatically for an alternative path to compute
the desired outputs. The nodes are differentiated according to their scope.
schedula defines three node’s types:
- data node: stores the data into the solution. By default, it is executable when it receives one input arch.
- function node: invokes the user defined function and place the results onto its output arcs. It is executable when all inputs are satisfied and it has at least one data output to be computed.
- sub-dispatcher node: packages particular dataflow execution model as sub component of the parent dispatcher. Practically, it creates a bridge between two dispatchers (parent and child) linking some data nodes. It allows to simplify your model, reusing some functionality defined in other models.
The key advantage is that, by this method, the scheduling is not affected by the operations’ execution times. Therefore, it is deterministic and reproducible. Moreover, since it is based on flow-based programming, it inherits the ability to execute more than one operation at the same time, making the program executable in parallel. The following video shows an example of a runtime dispatch.