schedula: An intelligent function scheduler¶
| release: | 0.2.8 |
|---|---|
| date: | 2018-10-09 00:00:00 |
| repository: | |
| pypi-repo: | |
| docs: | |
| wiki: | |
| download: | |
| keywords: | scheduling, dispatch, dataflow, processing, calculation, dependencies, scientific, engineering, simulink, graph theory |
| developers: | |
| license: | |
What is schedula?¶
Schedula implements an intelligent function scheduler, which selects and executes functions. The order of execution (the workflow) is calculated from the provided inputs and the requested outputs. A function is executed when all its dependencies (i.e., inputs, input domain) are satisfied and at least one of its outputs has to be calculated.
Note
Schedula performs the runtime selection of the minimum workflow to be invoked. A workflow describes the overall process, i.e., the order of function execution, and it is defined by a directed acyclic graph (DAG). The minimum workflow is the DAG where each output is calculated using the shortest path from the provided inputs. The path is calculated on a weighted directed graph (the data-flow diagram) with a modified Dijkstra algorithm.
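As a rough illustration of the shortest-path idea in the note above, here is a minimal sketch of the standard Dijkstra algorithm on a small weighted directed graph. It is not schedula's modified variant; the node names, the edge weights, and the helper `shortest_distances` are hypothetical and exist only for this example.

```python
import heapq


def shortest_distances(graph, sources):
    """Standard Dijkstra: distance of every reachable node from the sources.

    `graph` maps a node to a dict {successor: non-negative edge weight}.
    """
    dist = {node: 0 for node in sources}
    queue = [(0, node) for node in sources]
    heapq.heapify(queue)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float('inf')):
            continue  # Stale queue entry: a shorter path was already found.
        for succ, weight in graph.get(node, {}).items():
            new_d = d + weight
            if new_d < dist.get(succ, float('inf')):
                dist[succ] = new_d
                heapq.heappush(queue, (new_d, succ))
    return dist


# Hypothetical data-flow graph: data nodes 'a', 'b', 'c' and function
# nodes 'f', 'g', 'h'; 'c' can be produced either by 'g' or by 'h'.
graph = {
    'a': {'f': 1, 'h': 5},
    'f': {'b': 1},
    'b': {'g': 1},
    'g': {'c': 1},
    'h': {'c': 1},
}
distances = shortest_distances(graph, ['a'])
```

In this toy graph the cheapest way to reach `c` goes through `f` and `g` (total weight 4) rather than through `h` (total weight 6), which is the kind of choice a minimum workflow makes.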
Installation¶
To install it, use (with root privileges):
$ pip install schedula
Or download the latest git version and use (with root privileges):
$ python setup.py install
Install extras¶
Some additional functionality is enabled by installing the following extras:
- plot: enables plotting of the Dispatcher model and workflow (see plot()).
- web: enables building a dispatcher Flask app (see web()).
- sphinx: enables the sphinx extension directives (i.e., autosummary and dispatcher).
To install schedula and all extras, do:
$ pip install schedula[all]
Why use schedula?¶
Imagine a system of interdependent functions, i.e., the inputs of one function are the outputs of one or more other functions, where we do not know in advance which inputs the user will provide or which outputs will be requested. With a normal scheduler you would have to code every possible implementation. Thinking through and coding every combination of inputs and outputs of a model is tedious.
Solution¶
Schedula lets you write a simple model (Dispatcher()) with
just the basic functions; the Dispatcher() then selects and
executes the proper functions for the given inputs and the requested outputs.
Moreover, schedula provides a flexible framework for structuring code. It
also lets you extract sub-models from a larger one.
Note
A successful application is CO2MPAS, where schedula has been used
to model an entire vehicle.
Very simple example¶
Let’s assume that we have to extract some filesystem attributes and we do not
know which inputs the user will provide. The code below shows how to create a
Dispatcher() by adding the functions that define the system.
Note that even with this simple system the maximum number of input combinations is
31 (\(2^n - 1\), where \(n = 5\) is the number of data nodes).
>>> import schedula
>>> import os.path as osp
>>> dsp = schedula.Dispatcher()
>>> dsp.add_data(data_id='dirname', default_value='.', initial_dist=2)
'dirname'
>>> dsp.add_function(function=osp.split, inputs=['path'],
...                  outputs=['dirname', 'basename'])
'split'
>>> dsp.add_function(function=osp.splitext, inputs=['basename'],
...                  outputs=['fname', 'suffix'])
'splitext'
>>> dsp.add_function(function=osp.join, inputs=['dirname', 'basename'],
...                  outputs=['path'])
'join'
>>> dsp.add_function(function_id='union', function=lambda *a: ''.join(a),
...                  inputs=['fname', 'suffix'], outputs=['basename'])
'union'
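The count of 31 input combinations mentioned above can be checked by enumerating every non-empty subset of the five data nodes. This is a plain-Python sketch that does not use schedula; the variable names are chosen only for illustration.

```python
from itertools import combinations

# The five data nodes of the filesystem example.
data = ['path', 'dirname', 'basename', 'fname', 'suffix']

# Every non-empty subset of the data nodes is a possible set of user inputs.
input_sets = [
    subset
    for size in range(1, len(data) + 1)
    for subset in combinations(data, size)
]

print(len(input_sets))  # 31
```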
Note
For more details on how to create a Dispatcher(), see:
add_data(),
add_function(),
add_dispatcher(),
SubDispatch(),
SubDispatchFunction(),
SubDispatchPipe(), and
DFun().
The next step, calculating the outputs, is simply to run the
dispatch() method. You can invoke it with just the
inputs, in which case it calculates all reachable outputs:
>>> inputs = {'path': 'schedula/_version.py'}
>>> o = dsp.dispatch(inputs=inputs)
>>> o
Solution([('path', 'schedula/_version.py'), ('basename', '_version.py'), ('dirname', 'schedula'), ('fname', '_version'), ('suffix', '.py')])
Alternatively, you can also specify the outputs, so the dispatch stops as soon as all of them have been found:
>>> o = dsp.dispatch(inputs=inputs, outputs=['basename'])
>>> o
Solution([('path', 'schedula/_version.py'), ('basename', '_version.py')])
Advanced example (circular system)¶
Systems of interdependent functions can be described by “graphs”, and these graphs may contain cycles (circles). This kind of system cannot be resolved by a normal scheduler.
Suppose we have a system of sequential functions arranged in a circle, i.e., the input of each function is the output of the previous one. The maximum number of input and output permutations is \((2^n - 1)^2\), where n is the number of functions. Thus, with a normal scheduler you would have to code all possible implementations, that is \((2^n - 1)^2\) functions, which is impractical.
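To get a feel for the size of that number, here is the arithmetic for a circle of \(n = 6\) functions (the value of n is chosen only for illustration):

```latex
(2^n - 1)^2 = (2^6 - 1)^2 = 63^2 = 3969
```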
Schedula simplifies this. You just create a
Dispatcher() that contains all the functions linking your data:
>>> import schedula
>>> dsp = schedula.Dispatcher()
>>> plus, minus = lambda x: x + 1, lambda x: x - 1
>>> n = j = 6
>>> for i in range(1, n + 1):
...     func = plus if i < (n / 2 + 1) else minus
...     f = dsp.add_function('f%d' % i, func, ['v%d' % j], ['v%d' % i])
...     j = i
It then handles all possible combinations of inputs and outputs
(\((2^n - 1)^2\)) simply by invoking the dispatch()
method, as follows:
>>> out = dsp.dispatch(inputs={'v1': 0, 'v4': 1}, outputs=['v2', 'v6'])
>>> out
Solution([('v1', 0), ('v4', 1), ('v2', 1), ('v5', 0), ('v6', -1)])
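To see where these values come from, the sketch below reproduces the same numbers in plain Python, without schedula. It is not how the Dispatcher() works internally (there is no shortest-path selection here); it only applies each function whose input is known and whose output is still missing, until nothing changes. The names `functions` and `propagate` are hypothetical helpers for this illustration.

```python
plus, minus = lambda x: x + 1, lambda x: x - 1
n = 6

# Same circular system as above: f_i maps v_j -> v_i, with j the previous index.
functions = {}  # output name -> (function, input name)
j = n
for i in range(1, n + 1):
    func = plus if i < (n / 2 + 1) else minus
    functions['v%d' % i] = (func, 'v%d' % j)
    j = i


def propagate(inputs):
    """Apply functions whose input is known until no new value appears."""
    values = dict(inputs)
    changed = True
    while changed:
        changed = False
        for out, (func, inp) in functions.items():
            if inp in values and out not in values:
                values[out] = func(values[inp])
                changed = True
    return values


values = propagate({'v1': 0, 'v4': 1})
```

The values of `v2`, `v5`, and `v6` match the Solution shown above. Unlike dispatch() with outputs=['v2', 'v6'], this naive version also computes `v3`, because it does not stop once the requested outputs have been found.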
Sub-system extraction¶
Schedula lets you extract sub-models from a model. This can be done with the
shrink_dsp() method, as follows:
>>> sub_dsp = dsp.shrink_dsp(('v1', 'v3', 'v5'), ('v2', 'v4', 'v6'))
Note
For more details on how to extract a sub-model, see:
get_sub_dsp(),
get_sub_dsp_from_workflow(),
SubDispatch(),
SubDispatchFunction(), and
SubDispatchPipe().
Next moves¶
Things yet to do include a mechanism to allow the execution of functions in parallel.
API Reference¶
The core of the library is composed of the following modules. This reference
contains a comprehensive list of all modules and classes within schedula.
Docstrings should provide sufficient understanding of any individual function.
Modules:
| dispatcher | It provides the Dispatcher class. |
|---|---|
| utils | It contains utility classes and functions. |
| ext | It provides sphinx extensions. |
Changelog¶
v0.2.6 (2018-09-13)¶
Feat¶
- (setup): Patch to use sphinxcontrib.restbuilder in setup long_description.
v0.2.5 (2018-09-13)¶
Fix¶
- (doc): Correct link docs_status.
- (setup): Use text instead of rst to compile long_description + add logging.
v0.2.4 (2018-09-13)¶
Fix¶
- (sphinx): Correct bug sphinx==1.8.0.
- (sphinx): Remove all sphinx warnings.
v0.2.2 (2018-08-02)¶
Fix¶
- (des): Correct bug of get_id when tuple node ids are given as inputs or outputs of a sub_dsp.
- (des): Correct bug when tuple ids are given as inputs or outputs of add_dispatcher method.
v0.2.1 (2018-07-24)¶
Feat¶
- (setup): Update Development Status to 5 - Production/Stable.
- (setup): Add additional project_urls.
- (doc): Add changelog to rtd.
Fix¶
- (doc): Correct link docs_status.
- (des): Correct bugs get_des.
v0.2.0 (2018-07-19)¶
Feat¶
- (doc): Add changelog.
- (travis): Test extras.
- (des): Avoid using sphinx for getargspec.
- (setup): Add extras_require to setup file.
Fix¶
- (setup): Correct bug in get_long_description.
v0.1.19 (2018-06-05)¶
Fix¶
- (dsp): Add missing content block in note directive.
- (drw): Make sure to plot same sol as function and as node.
- (drw): Correct format of started attribute.
v0.1.18 (2018-05-28)¶
Feat¶
- (dsp): Add DispatchPipe class (faster pipe execution, it overwrites the existing solution).
- (core): Improve performance by replacing datetime.today() with time.time().
v0.1.17 (2018-05-18)¶
Feat¶
- (travis): Run coveralls in python 3.6.
Fix¶
- (web): Skip Flask logging for the doctest.
- (ext.dispatcher): Update to the latest Sphinx 1.7.4.
- (des): Use the proper dependency (i.e., sphinx.util.inspect) for getargspec.
- (drw): Set socket option to reuse the address (host:port).
- (setup): Correct dill requirements dill>=0.2.7.1 –> dill!=0.2.7.
v0.1.14 (2017-07-11)¶
Fix¶
- (io): pin dill version <=0.2.6.
- (abort): abort was setting Exception.args instead of sol attribute.
v0.1.11 (2017-05-04)¶
Feat¶
Fix¶
- (doc): Replace type function with callable.
- (drw): Folder name without ext.
- (test): Avoid Documentation of DspPlot.
- (doc): fix docstrings types.
v0.1.10 (2017-04-03)¶
Feat¶
- (sol): Close sub-dispatcher solution when all outputs are satisfied.
Fix¶
- (drw): Log error when dot is not able to render a graph.
v0.1.8 (2017-02-09)¶
Feat¶
- (drw): Update plot index + function code highlight + correct plot outputs.
v0.1.6 (2017-02-08)¶
Fix¶
- (setup): Avoid setup failure due to get_long_description.
- (drw): Avoid plotting unneeded weight edges.
- (dispatcher): get_sub_dsp_from_workflow set correctly the remote links.
v0.1.5 (2017-02-06)¶
Feat¶
- (exl): Drop exl module because of formulas.
- (sol): Add input value of filters in solution.
Fix¶
- (drw): Plot the filter attribute only once in workflow +filers|solution_filters.
v0.1.4 (2017-01-31)¶
Feat¶
- (drw): Save autoplot output.
- (sol): Add filters and function solutions to the workflow nodes.
- (drw): Add filters to the plot node.
Fix¶
- (dispatcher): Add missing function data inputs edge representation.
- (sol): Correct value when apply filters on setting the node output.
- (core): get_sub_dsp_from_workflow blockers can be applied to the sources.
v0.1.3 (2017-01-29)¶
Fix¶
- (dsp): Raise a DispatcherError when the pipe workflow is not respected instead of KeyError.
- (dsp): Unresolved references.
v0.1.2 (2017-01-28)¶
Feat¶
- (dsp): add_args _set_doc.
- (dsp): Remove parse_args class.
- (readme): Appveyor badge status == master.
- (dsp): Add _format option to get_unused_node_id.
- (dsp): Add wildcard option to SubDispatchFunction and SubDispatchPipe.
- (drw): Create sub-package drw.
Fix¶
- (dsp): combine nested dicts with different length.
- (dsp): are_in_nested_dicts return false if nested_dict is not a dict.
- (sol): Remove defaults when setting wildcards.
- (drw): Misspelling outpus –> outputs.
- (directive): Add exception on graphviz patch for sphinx 1.3.5.
v0.1.1 (2017-01-21)¶
Fix¶
- (site): Fix ResourceWarning: unclosed socket.
- (setup): Not log sphinx warnings for long_description.
- (travis): Wait until the server is up.
- (rtd): Missing requirement dill.
- (travis): Install first - pip install -r dev-requirements.txt.
- (directive): Tagname from _img to img.
- (directive): Update minimum sphinx version.
- (readme): Badge svg links.
Other¶
- Add project descriptions.
- (directive): Rename schedula.ext.dsp_directive –> schedula.ext.dispatcher.
- Update minimum sphinx version and requests.