Should we restrict stage name to certain characters? Maybe [a-z][A-Z][0-9]{hyphen, underscore}.
YAML keys can contain any thing, even spaces.
Allowing every kind of characters might create strange errors (which we might be unaware of).
If we disallow, what characters should we disallow?
The stricter - the better. We can always ease up on it.
Hey DVC team, not sure if this is the best place to do this, but as a DVC user I'd like to cast a vote for including periods as allowed characters. They are allowed in filenames in all modern OS that would run DVC, and while aren't the most common punctuation in file systems, are widely used.
I bring this up because I'm converting from 0.94 to 1.0+ and I commonly use periods in the names of .dvc files, which are now becoming stage names. Manually adding them to a dvc.yaml works, as does dvc repro, but dvc run complains.
The only compromise I can see (without being familiar with DVC implementation) is that periods are used in REGEX, but I generally never run into issues. Like I said dvc repro works fine with periods. Not sure if there are any other downsides though.
If you agree, I could potentially get my hands dirty and send a pull request. If one user's vote isn't enough, maybe in time others will chime in here as well.
@skshetry what do you think? ^
Manually adding them to a dvc.yaml works, as does dvc repro, but dvc run complains
@michael-ford, yes, you are right that we only complain on dvc run. It's unlikely that we are going to start enforcing them.
You just have to add "." (period) if you want to in the following line, but I'll suggest you slugify them instead if you are migrating.
https://github.com/iterative/dvc/blob/bc5611968f44979016bf87904e0abc2d35a046c2/dvc/stage/__init__.py#L33
We are working on new features that will affect the dvc.yaml and we'll be relaxing some of the restrictions and/or enforcing them if we use these characters for some _features_. However, the period on dvc run won't be affected, and should be safe to use as-is, irrespective of the change on INVALID_STAGENAME_CHARS.
@skshetry Should we reopen/create new issue?
@skshetry thanks for the reply. From what I understand:
dvc run) will continue to work in the near-future updates@michael-ford, I am a bit conflicted about it, and unsure about what stage name even means or _should_ mean. We disallowed stage name to not contain periods because most of the files for the DVC tracked dependencies and outputs will have one and might conflict during some operations (eg: push/pull/fetch/checkout; there's already a conflict if the files don't have any extensions).
Therefore, we decided to not support creating it but didn't go one step further to enforce it as that might mean users are generating themselves and will at least try to separate the stage name and name of the outputs. Another reason to not enforce is that it might be annoying to see an error during checkouts, for example. So, it will continue working in the future.
I can see it being useful with autocompletion, but we are moving away from stage name to the outputs name for most of the commands (repro does not _yet_) for the same reasons.
Another reason, that is already mentioned, we are working on the next updates with _dvc.yaml_ that might change things here and might add some clarifications to it. So, I am a bit hesitant to support to change it yet.
Saying that we are open to relax the restriction, for which we'd love to hear what the pros/cons of this rule for you are, how will you ensure they don't conflict, and what DVC can do to prevent those conflicts. That'd help us make a better decision around this. :slightly_smiling_face:
@skshetry Should we reopen/create new issue?
Reopening for further discussions.
Closing due to inactivity. @michael-ford, please feel free to reopen anytime.