Pipelines

If you find yourself repeating a sequence of actions to get or update the results of your project, then you may already have a pipeline. For example, a data science workflow could involve:

Gathering data for training and validation
Extracting useful features from the training dataset
(Re)training an ML model
Evaluating the results against the validation set

DVC helps you define these stages in a standard YAML format (.dvc and dvc.yaml files), making your pipeline easier to manage and to reproduce consistently.
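
As a rough illustration, the four steps listed above could be written as stages in a dvc.yaml file along these lines. This is only a sketch: the stage names, script paths, and data paths (src/gather.py, data/train, model.pkl, and so on) are hypothetical placeholders, not files that DVC provides or expects.

```yaml
# dvc.yaml (illustrative sketch; all script and data paths are hypothetical)
stages:
  gather:
    # Collect the training and validation data
    cmd: python src/gather.py
    deps:
      - src/gather.py
    outs:
      - data/train
      - data/validation
  featurize:
    # Extract features from the training dataset
    cmd: python src/featurize.py data/train data/features
    deps:
      - src/featurize.py
      - data/train
    outs:
      - data/features
  train:
    # (Re)train the model on the extracted features
    cmd: python src/train.py data/features model.pkl
    deps:
      - src/train.py
      - data/features
    outs:
      - model.pkl
  evaluate:
    # Score the model against the validation set
    cmd: python src/evaluate.py model.pkl data/validation metrics.json
    deps:
      - src/evaluate.py
      - model.pkl
      - data/validation
    metrics:
      - metrics.json:
          cache: false
```

Because each stage lists its dependencies (deps) and outputs (outs), DVC can infer the order of the stages from those paths. With a file like this in place, running dvc repro would re-execute only the stages whose dependencies have changed.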

See Get Started: Data Pipelines for a hands-on introduction to this topic.
