and implement throughout the docs.
And consider changing the md5 field name (also in DVC-files)?
Notes:
If it is just dos2unix <datafile> && md5sum <datafile> (as described here: https://github.com/iterative/dvc.org/issues/68#issuecomment-520301930),
then it actually is pure MD5, but the datafile is normalized first to avoid mismatches across operating systems.
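To illustrate the normalize-then-hash idea, here is a minimal Python sketch. It assumes that normalization means only converting CRLF line endings to LF (roughly what dos2unix does to a text file); it is not DVC's actual implementation, and the function name normalized_md5 is made up for this example.

```python
import hashlib


def normalized_md5(data: bytes) -> str:
    """Return the MD5 hex digest of data after converting CRLF to LF.

    Hypothetical helper: mimics `dos2unix file && md5sum file` for text
    content; it does not try to detect or skip binary files.
    """
    normalized = data.replace(b"\r\n", b"\n")
    return hashlib.md5(normalized).hexdigest()


if __name__ == "__main__":
    windows_text = b"first line\r\nsecond line\r\n"
    unix_text = b"first line\nsecond line\n"

    # Plain MD5 differs between the two encodings of the same text...
    print(hashlib.md5(windows_text).hexdigest() == hashlib.md5(unix_text).hexdigest())
    # ...while the normalized hash is the same on both.
    print(normalized_md5(windows_text) == normalized_md5(unix_text))
```

The hash function itself is still plain MD5; only its input changes, which is why a short note on end-of-line normalization may be enough when the docs go into internals.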
I think that when the explanation is meant for the users (who don't need to know the internal details) the terms hash/checksum are ok.
When explaining internal details, then MD5 is better, and also accurate. However, a note on end-of-line normalization may also be useful.
I agree with @dashohoxha .
Checksum vs. hash: hash is a better term because of the primary goal we use these MD5s for, which is to store and find binary files in our storage (though we check consistency as well).
I'm fine with using checksum everywhere, especially considering that it won't be hard to switch to a new version.
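The "store and find" use of a hash can be sketched as a tiny content-addressed store. This is only an illustration under assumptions: the directory layout (first two hex characters as a subdirectory) and the names store and load are hypothetical choices for this example, not a description of DVC's code.

```python
import hashlib
import tempfile
from pathlib import Path


def store(cache_dir: Path, data: bytes) -> str:
    """Save data under a path derived from its MD5 and return the digest."""
    digest = hashlib.md5(data).hexdigest()
    target = cache_dir / digest[:2] / digest[2:]  # hypothetical layout
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        target.write_bytes(data)
    return digest


def load(cache_dir: Path, digest: str) -> bytes:
    """Find previously stored data using nothing but its digest."""
    return (cache_dir / digest[:2] / digest[2:]).read_bytes()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = Path(tmp)
        key = store(cache, b"binary payload")
        # The digest acts as a lookup key, not only as an integrity check.
        print(load(cache, key) == b"binary payload")
```

Here the digest is the address of the file, which is the sense in which "hash" describes the primary purpose, while re-hashing the loaded bytes and comparing against the key would be the "checksum" use.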
MD5, etag, other checksums make sense when we go into details.
I also vote for "checksum". In fact I don't see how it's the wrong concept at all, other than the purpose we use it for. I think "hash" is too general, TBH.

UPDATE: I made sure every instance of MD5 is upper case unless it's a file or command code block as part of #669.
Hi! Notice that per some relatively recent conversation with @shcheklein, we stopped using "checksum" in docs, in favor of terms like (MD5/md5) "file hash" and "hash values" (in DVC-files). See #962
Should we apply the same in docs?
Note that there is somewhat of a collision between that term and (Git) "commit hash" for which we're using "commit SHA hash" in the docs PR right now (may review that one). Cc @efiop
@jorgeorpinel I would reconsider using MD5. As https://github.com/iterative/dvc/issues/1676 and https://github.com/iterative/dvc/issues/3069 are still open, and we do not intend to drop them, there is a possibility that at some point MD5 will become an obsolete term. That would trigger a lot of changes on the docs side.
Agree. Right now we are reducing the usage of MD5 in docs because AFAIK our file hashes are not always direct MD5 (and this is still not properly explained, see #68)
This issue is to decide what to do with all these terms, so if you guys tell me we should remove MD5 as much as possible, then we'll do that.