This package contains a lightweight data transformation framework with a focus on transparency and complexity reduction.

Features

- Data integration pipelines as code: pipelines, tasks and commands are created with declarative Python code
- PostgreSQL as the data processing engine
- Extensive web UI: the web browser as the main tool for inspecting, running and debugging pipelines
- GNU make semantics: nodes depend on the completion of their upstream nodes; no data dependencies or data flows
- No in-app data processing: command-line tools as the main way of interacting with databases and data
- Single-machine pipeline execution based on Python's multiprocessing; no need for distributed task queues
- Easy debugging and output logging
- Cost-based priority queues: nodes with higher cost (based on recorded run times) are run first
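The "pipelines as code" and GNU-make-semantics points can be sketched together: a pipeline is assembled from plain Python objects, and a node becomes runnable only once all of its upstream nodes have completed. The `Pipeline` and `Task` names below are illustrative stand-ins, not this package's actual API:

```python
class Task:
    """A pipeline node; `upstreams` lists node ids that must complete first."""
    def __init__(self, id, upstreams=()):
        self.id = id
        self.upstreams = set(upstreams)

class Pipeline:
    """A declaratively assembled set of tasks with make-style semantics."""
    def __init__(self):
        self.tasks = {}

    def add(self, task):
        self.tasks[task.id] = task
        return self  # allow chained, declarative assembly

    def run_order(self):
        """Order tasks so that each runs only after its upstreams completed.

        Only completion is tracked, mirroring GNU make semantics:
        no data flows between nodes.
        """
        done, order, pending = set(), [], dict(self.tasks)
        while pending:
            ready = sorted(t for t, task in pending.items()
                           if task.upstreams <= done)
            if not ready:
                raise ValueError('cycle in pipeline dependencies')
            for t in ready:
                order.append(t)
                done.add(t)
                del pending[t]
        return order

# The pipeline is just Python code; no external DSL or config files.
pipeline = (Pipeline()
            .add(Task('extract'))
            .add(Task('transform', upstreams=['extract']))
            .add(Task('load', upstreams=['transform'])))
print(pipeline.run_order())  # → ['extract', 'transform', 'load']
```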
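The cost-based scheduling feature can be illustrated with Python's `heapq`: recorded run times serve as node costs, negated because `heapq` is a min-heap, so the most expensive node is popped first. This is a minimal sketch of the scheduling policy only; the node names and run times are made up:

```python
import heapq

def run_order(node_costs):
    """Return node names in scheduling order: highest recorded cost first.

    `node_costs` maps node name -> recorded run time in seconds.
    Costs are negated so the min-heap pops the most expensive node first.
    """
    heap = [(-cost, name) for name, cost in node_costs.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order

# Starting the slowest node first keeps it from becoming the tail
# that delays the end of the whole pipeline run.
print(run_order({'load_orders': 120.0,
                 'load_customers': 30.0,
                 'rebuild_index': 300.0}))
# → ['rebuild_index', 'load_orders', 'load_customers']
```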