dvc --version
0.22.0
Installed from the macOS package on macOS Mojave.
I followed this tutorial and ran into a problem.
According to the tutorial, the dvc add command should not copy cached data. But as you can see below, the total folder size has doubled.
aiml/tutorial_dvc/classify master
▶ du -sh data
41M data
aiml/tutorial_dvc/classify master
▶ du -sh .dvc/cache
41M .dvc/cache
aiml/tutorial_dvc/classify master
▶ du -sh .
82M .
The inode numbers of the workspace file and the cached file are different. Maybe that is the cause. Let me know if you need any extra information about my environment.
aiml/tutorial_dvc/classify master
▶ ls -i data/Posts.xml.zip
4690717 data/Posts.xml.zip
aiml/tutorial_dvc/classify master
▶ ls -i .dvc/cache/ec/
4688793 1d2935f811b77cc49b031b999cbf17
I've checked issue #942, but the fix in 0.14.0 does not seem to be working.
Hi @TezRomacH !
Your system supports reflinks, so dvc used them to create a link from the cache to your workspace. No data duplication has occurred. Unlike a hardlink, a reflink to a file has a different inode, so it is a bit harder to see it working for yourself. Also, the du utility still counts them as two separate full-blown files, even though there is no duplication at the filesystem level. We should definitely make this clearer in the documentation. Created https://github.com/iterative/dvc.org/issues/139 .
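To illustrate the inode point, here is a minimal Python sketch (not part of dvc; the helper names `same_inode` and `demo` are made up for this example). It shows that a hardlink shares its inode with the original while a separately created file does not. A reflink falls into the second group as far as inode numbers go, which is why `ls -i` cannot tell it apart from a plain copy.

```python
import os
import shutil
import tempfile


def same_inode(a, b):
    """Return True when both paths refer to the same inode on the same device."""
    sa, sb = os.stat(a), os.stat(b)
    return (sa.st_dev, sa.st_ino) == (sb.st_dev, sb.st_ino)


def demo():
    """Create a file, a hardlink to it, and a plain copy, then compare inodes."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original.bin")
        hardlink = os.path.join(tmp, "hardlink.bin")
        copy = os.path.join(tmp, "copy.bin")

        with open(original, "wb") as f:
            f.write(b"x" * 1024)

        os.link(original, hardlink)       # hardlink: second name for the same inode
        shutil.copyfile(original, copy)   # plain copy: a new file with its own inode

        return same_inode(original, hardlink), same_inode(original, copy)


if __name__ == "__main__":
    print(demo())
```

So differing inode numbers for data/Posts.xml.zip and the cache entry rule out a hardlink, but they do not tell you whether the data blocks are shared.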
Also, to be sure that no duplication occurs, you could take a look at the free space on your drive using the df utility, which will show that your free space didn't go down again by the size of that file after you dvc add-ed it.
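If you would rather script the free-space check than read df output by eye, something along these lines could work. It is only a rough sketch: `free_bytes` and `free_space_consumed` are hypothetical helpers written for this example, and the result is approximate because other processes writing to the same drive will also move the number.

```python
import shutil


def free_bytes(path="."):
    """Free space, in bytes, on the filesystem that contains `path`."""
    return shutil.disk_usage(path).free


def free_space_consumed(path, action):
    """Run `action()` and return how many bytes of free space disappeared.

    A value close to zero suggests the action did not duplicate data on disk.
    """
    before = free_bytes(path)
    action()
    after = free_bytes(path)
    return before - after


if __name__ == "__main__":
    # Example usage (assumes dvc is on PATH and the file from the tutorial exists):
    # import subprocess
    # used = free_space_consumed(
    #     ".", lambda: subprocess.run(["dvc", "add", "data/Posts.xml.zip"], check=True)
    # )
    # print(f"free space went down by {used} bytes")
    print(free_bytes("."))
```

With a reflink in place, the reported drop should be far smaller than the 41M file size, whereas a real copy would reduce free space by roughly that amount.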
Thanks for the feedback!
Oh, thanks, now it's clearer!