DATA VERSION CONTROL

Installation

Download the DVC package and install the VS Code extension from https://dvc.org/

Then install DVC with Google Drive support for Python:

# The quotes keep shells such as zsh from interpreting the square brackets
uv pip install "dvc[gdrive]"
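
To confirm that the installation worked, a quick check along these lines may help. This is a minimal sketch: the function name `check_dvc` is hypothetical, and it assumes only that the `dvc` executable should be reachable from the active environment.

```shell
# Hypothetical helper: report whether the dvc executable is reachable.
check_dvc() {
    if command -v dvc >/dev/null 2>&1; then
        # Print the installed DVC version and environment details
        dvc version
    else
        echo "dvc not found on PATH; is the virtual environment active?" >&2
        return 1
    fi
}
```

Run `check_dvc` after activating the environment that `uv` installed into.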

Usage

Example Folder Structure:

root/
├── data/inputs
│   ├── BOM.xlsx
│   ├── clientes.xlsx
│   ├── EDIs.xlsx
│   └── originals/
│       ├── BOM_original.xlsx
│       ├── coproduccion.json
│       └── HdCMang.xlsx

Initialize DVC (run this in the root of an existing Git repository)

dvc init

Track the data directory. DVC creates a single data/inputs.dvc file that covers the whole directory.

dvc add data/inputs/
# Note: if the files are already tracked by Git, first remove them from the
# Git index (they stay on disk), then run dvc add again:
# git rm -r --cached data/inputs/

Track the changes with Git

git add data/inputs.dvc data/.gitignore
# Alternatively, add '/data/inputs/' to the main .gitignore instead of committing the new data/.gitignore.
# Commit to Git
git commit -am "Change on checksums / .dvc"
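
The same steps repeat whenever the data changes, and an older version can be restored from the Git history of the .dvc file. The following is a minimal sketch of both operations, not a definitive workflow: the function names `snapshot_inputs` and `restore_inputs` are hypothetical, and the sketch assumes the data/inputs layout from the example above and that the requested version is still available in the local DVC cache (or has been pulled from a remote).

```shell
# Hypothetical helper: record a new version of the tracked data directory.
# $1: commit message
# Steps: recompute checksums (updates data/inputs.dvc), stage only the
# pointer file, then commit it.
snapshot_inputs() {
    dvc add data/inputs &&
    git add data/inputs.dvc &&
    git commit -m "$1"
}

# Hypothetical helper: restore the data as it was at a given Git revision.
# $1: commit hash, tag, or branch name
# Steps: bring back the old pointer file, then let DVC sync the workspace
# data so that it matches that pointer.
restore_inputs() {
    git checkout "$1" -- data/inputs.dvc &&
    dvc checkout data/inputs.dvc
}
```

For example, `snapshot_inputs "Update BOM"` records the current state, and `restore_inputs HEAD~1` brings back the data from the previous commit.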

Setup Remote

Google Drive

Create a Google Cloud project with the Google Drive API enabled for DVC (for now, only service accounts work)

https://dvc.org/doc/user-guide/data-management/remote-storage/google-drive#using-a-custom-google-cloud-project-recommended

https://console.cloud.google.com/iam-admin/serviceaccounts
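
Once the service account exists and its JSON key has been downloaded, the remote can be registered with DVC. The following is a minimal sketch under stated assumptions: the function name `setup_gdrive_remote` and the remote name `gdrive_remote` are hypothetical, the folder ID is taken from the URL of the Google Drive folder, and that folder is assumed to be shared with the service account's email address.

```shell
# Hypothetical helper: register a Google Drive folder as the default DVC
# remote and authenticate with a service account key.
# $1: Google Drive folder ID (the last part of the folder's URL)
# $2: path to the service account JSON key file
# "gdrive_remote" is an arbitrary remote name chosen for this sketch.
# The key path is stored with --local, so it goes to .dvc/config.local,
# which is not committed to Git.
setup_gdrive_remote() {
    dvc remote add -d gdrive_remote "gdrive://$1" &&
    dvc remote modify gdrive_remote gdrive_use_service_account true &&
    dvc remote modify --local gdrive_remote gdrive_service_account_json_file_path "$2"
}
```

After calling it with a folder ID and key path, `dvc push` uploads the tracked data to the remote and `dvc pull` downloads it on another machine.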