dvc cache push is slow for many files

Created on 1 Mar 2018 · 9 Comments · Source: iterative/dvc

Pushing a 1.5 GB DVC repository with 8K small data files takes more than 6 hours inside AWS.

bug

All 9 comments

Correction: there were 83K very small files, not 8K.
So the priority of this issue is not too high.

Data directories are a big part of this issue, since DVC mirrors the directory structure and creates a corresponding file for each file in the directory.

@efiop is there a way to create a single file to store the data directory structure?
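
For illustration, the "single file" idea could look something like the sketch below: walk the data directory once and record every file's relative path and MD5 in one JSON manifest. This is only an assumption about what such a file might contain, not DVC's actual format; the names `build_dir_manifest` and `write_manifest` are hypothetical.

```python
import hashlib
import json
import os


def build_dir_manifest(root):
    """Return a sorted list of {"relpath", "md5"} entries for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Files are assumed to be small, so each one is read in a single call.
            with open(path, "rb") as fobj:
                digest = hashlib.md5(fobj.read()).hexdigest()
            relpath = os.path.relpath(path, root).replace(os.sep, "/")
            entries.append({"relpath": relpath, "md5": digest})
    return sorted(entries, key=lambda entry: entry["relpath"])


def write_manifest(root, manifest_path):
    """Write the manifest for root to manifest_path as JSON (hypothetical layout)."""
    manifest = build_dir_manifest(root)
    with open(manifest_path, "w") as fobj:
        json.dump(manifest, fobj, indent=2)
    return manifest
```

With a manifest like this, a directory of 83K files would be described by one tracked file instead of 83K mirrored ones.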

I have a few thoughts on the alternative ways to store dirs. I'll take a closer look soon.

Actually, the main problem here is that we don't reuse AWS connections and instead establish a new one every time we push/pull a file. This is a known issue. I am already working on it.
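
A minimal sketch of the connection-reuse idea, using plain HTTP from the standard library as a stand-in for the real AWS client (this is not DVC's code, and `put_many` is a hypothetical helper): open one connection and send every file over it, instead of reconnecting per file.

```python
import http.client


def put_many(host, port, items):
    """PUT every (path, body) pair over a single persistent HTTP connection."""
    conn = http.client.HTTPConnection(host, port)
    statuses = []
    try:
        for path, body in items:
            conn.request("PUT", path, body=body)
            response = conn.getresponse()
            response.read()  # drain the response so the connection can be reused
            statuses.append(response.status)
    finally:
        conn.close()
    return statuses
```

For many tiny files the per-connection setup (TCP and, over HTTPS, the TLS handshake) can easily dominate the transfer time, which is why reuse matters so much here.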

OK, found the issue. Cached directories are currently processed in a single thread in the cloud logic code. I'm already working on a fix.
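
As a rough sketch of what moving from one thread to several might look like (not the actual fix; `push_files` and the `upload` callable are hypothetical), the files of a cached directory can be handed to a thread pool so that uploads overlap:

```python
from concurrent.futures import ThreadPoolExecutor


def push_files(paths, upload, jobs=8):
    """Call upload(path) for every path using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() keeps results in input order and re-raises an upload error
        # when its result is consumed.
        return list(pool.map(upload, paths))
```

Uploads are I/O-bound, so threads help even under the GIL; `jobs=8` is an arbitrary default chosen for the example.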

Managed to get a 7x speedup for a few hundred small files with https://github.com/dataversioncontrol/dvc/pull/504 . Still, dvc is not really optimized for an extreme number of small files, so you might want to consider packing them into a tar archive or similar. Closing for now.
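
One possible way to do the packing suggested above, sketched with Python's standard `tarfile` module (`pack_dir` is a hypothetical helper, and gzip compression is an arbitrary choice):

```python
import os
import tarfile


def pack_dir(src_dir, archive_path):
    """Pack src_dir into a gzip-compressed tar archive; return the number of files packed."""
    top_level = os.path.basename(os.path.normpath(src_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src_dir, arcname=top_level)
    # Reopen the archive to count regular files (directory entries are skipped).
    with tarfile.open(archive_path, "r:gz") as tar:
        return sum(1 for member in tar.getmembers() if member.isfile())
```

The resulting archive can then be tracked as a single file (for example with `dvc add`), at the cost of re-uploading the whole archive whenever any file inside it changes.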

x7 is awesome! Thank you.

Yeah, we need to invent some file packing tricks like Git does.

Note that the group shared between users has to be their primary group, not a secondary one. This can be changed by running: usermod -g groupName $USER

@mazzma12 Looks like you've mixed up the dvc and dvc.org repos :) You meant to link https://github.com/iterative/dvc.org/issues/497 . I'll copy your comment there. Thanks!
