Name                                  Modified     Size
README.md                             2023-07-19   1.4 kB
v0.13.0-alpha.4 source code.tar.gz    2023-07-19   606.9 kB
v0.13.0-alpha.4 source code.zip       2023-07-19   651.4 kB
Totals: 3 items, 1.3 MB

WIP 0.13.0

Changes

Core

  • Add datapipe.metastore.TransformMetaTable. Each transform now gets its own meta table that tracks the status of each transformation
  • Generalize BatchTransform and DatatableBatchTransform through BaseBatchTransformStep
  • Add transform_keys argument to *BatchTransform
  • Move computation of changed indices out of DataStore into BaseBatchTransformStep
  • Add a priority column to the transform meta table; work is sorted by priority
  • Switch from vanilla tqdm to tqdm_loggable for better display in logs
  • TableStoreFiledir constructor accepts a new argument, fsspec_kwargs
  • Add filters, order_by and order arguments to *BatchTransformStep
  • Add automatic injection of ds, idx and run_config into the transform function, based on introspection of its parameters
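
The parameter-based injection of ds, idx and run_config can be illustrated with the standard inspect module. This is a minimal sketch, not Datapipe's implementation: the helper call_transform and the placeholder values passed for ds, idx and run_config are hypothetical. It only shows the general idea that an extra argument is supplied when, and only when, the transform function's signature names it.

```python
import inspect
from typing import Any, Callable, Dict

# Names that may be injected into a transform function (per the changelog).
# The values passed for them below are placeholders, not real Datapipe objects.
INJECTABLE = ("ds", "idx", "run_config")


def call_transform(func: Callable[..., Any], *inputs: Any, **context: Any) -> Any:
    """Hypothetical helper: call func with the input data plus any available
    context values whose names appear among func's parameters."""
    params = inspect.signature(func).parameters
    extra: Dict[str, Any] = {
        name: context[name]
        for name in INJECTABLE
        if name in params and name in context
    }
    return func(*inputs, **extra)


def plain(df):
    # Declares no injectable parameters, so it receives only the input.
    return f"plain({df})"


def with_idx(df, idx, run_config=None):
    # Declares idx and run_config, so both are injected by name when available.
    return f"with_idx({df}, idx={idx}, run_config={run_config})"


if __name__ == "__main__":
    ctx = {"ds": "<DataStore>", "idx": [1, 2], "run_config": {"labels": {}}}
    print(call_transform(plain, "df", **ctx))     # plain(df)
    print(call_transform(with_idx, "df", **ctx))  # with_idx(df, idx=[1, 2], run_config={'labels': {}})
```

Because injection is driven by the signature, existing transforms that do not mention these names keep working unchanged.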

CLI

  • Add step reset-metadata CLI command
  • Add step fill-metadata CLI command that populates the transform meta table with all indices to process
  • Add step run-idx CLI command
  • step run_changelist CLI command accepts a new argument, --chunk-size
  • Add table migrate_transform_tables CLI command for migrating to 0.13
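
These notes do not say exactly what --chunk-size controls. Assuming it caps how many changed indices step run_changelist handles per batch, the splitting itself can be sketched in plain Python. iter_chunks and the sample changelist are hypothetical and do not come from Datapipe.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_chunks(items: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most chunk_size items.

    Hypothetical helper illustrating how a changelist could be split
    according to a --chunk-size value."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    changelist = [{"id": i} for i in range(5)]  # hypothetical changed indices
    for chunk in iter_chunks(changelist, chunk_size=2):
        print([row["id"] for row in chunk])  # [0, 1], then [2, 3], then [4]
```

Smaller chunks keep memory use per batch bounded at the cost of more round trips; the right value depends on the transform.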

Execution

  • Add executors: datapipe.executor.SingleThreadExecutor and datapipe.executor.ray.RayExecutor
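
Datapipe's actual executor interface is not described in these notes. The sketch below is a hypothetical illustration of the pluggable-executor idea: a step hands its batches to an executor object, which decides how they run. ToyExecutor, run_batches and run_step are invented names. A thread pool from concurrent.futures stands in for a distributed backend such as Ray, which is not used here.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

B = TypeVar("B")
R = TypeVar("R")


class ToyExecutor(ABC):
    """Hypothetical executor interface: decides how batches of work are run."""

    @abstractmethod
    def run_batches(self, process: Callable[[B], R], batches: Sequence[B]) -> List[R]:
        ...


class ToySingleThreadExecutor(ToyExecutor):
    # Processes batches one after another in the calling thread.
    def run_batches(self, process, batches):
        return [process(b) for b in batches]


class ToyThreadPoolExecutor(ToyExecutor):
    # Stand-in for a distributed backend: same interface, but batches
    # run concurrently (here on local threads rather than a Ray cluster).
    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def run_batches(self, process, batches):
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # Executor.map returns results in the order of the inputs.
            return list(pool.map(process, batches))


def run_step(executor: ToyExecutor, batches):
    # The step logic stays the same whichever executor is plugged in.
    return executor.run_batches(lambda batch: sum(batch), batches)


if __name__ == "__main__":
    batches = [[1, 2], [3, 4], [5]]
    print(run_step(ToySingleThreadExecutor(), batches))              # [3, 7, 5]
    print(run_step(ToyThreadPoolExecutor(max_workers=2), batches))   # [3, 7, 5]
```

Keeping the step unaware of how batches are scheduled is what allows a single-threaded run for debugging and a distributed run in production to share the same pipeline code.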

Deployment

  • Add a Helm chart for running regular loops in Kubernetes (k8s) as a CronJob

Bugfixes

  • Fix QdrantStore.read_rows when no idx is specified
Source: README.md, updated 2023-07-19