I have a stream that has an audio stream with a language tag equal to "sv-tal", which we use to identify a certain accessibility feature (this stream is sent to many platforms whose video players do not necessarily parse accessibility or supplementary label tags, which makes it impractical to change the tag currently).
However, the new ExoPlayer doesn't just parse the manifest tag and transcribe it directly to the Format.language field, but implements a normalization of this tag in the constructor of the Format class, so the tag is transcribed as a new, different string.
Not only that, but the Util.normalizeLanguageTag method that is used can take one string input, like "sv-tal", and output totally different "normalized" tags depending on which OS version is on the device. On Android 7 and up, it parses to "tal", on Android 5 and 6, it parses to "sv", and on Android 4 and below, it parses to "sv-tal". This surely has to be changed, since outputs, especially of util methods, should be fully dependent on inputs, and not on global state unknowns such as device OS. This means that we can't distinguish in the app what the specifications of the stream are.
Is it really the job of the manifest parser to normalize tags? Shouldn't it rather just transcribe tags as is and provide the option to normalize tags to the apps? It is also not clear that we always would want to relate the language tag to the Android locale language, because the code standard is different and we lose specificity along the way, specificity we might want to expose in our apps.
Right now ExoPlayer is so tightly coupled to this normalization (by including the util method in the constructor of a model class) that it becomes impossible to opt out of it.
Suggestion:
-Parse tag strings as is without processing/normalizing them along the way
-Offer normalization as an opt-in to the client
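To illustrate the suggestion, here is a minimal sketch of a model object that keeps the manifest tag verbatim and leaves normalization to the caller. `TrackLanguage`, `raw()` and `normalized(...)` are hypothetical names invented for this example; they are not part of ExoPlayer.

```java
import java.util.function.UnaryOperator;

/** Hypothetical model class (not an ExoPlayer type): stores the tag verbatim. */
final class TrackLanguage {
    private final String rawTag;

    TrackLanguage(String rawTag) {
        // No normalization in the constructor: the value is kept as parsed.
        this.rawTag = rawTag;
    }

    /** Returns the tag exactly as it appeared in the manifest, e.g. "sv-tal". */
    String raw() {
        return rawTag;
    }

    /** Opt-in normalization: the caller supplies the normalizer it wants. */
    String normalized(UnaryOperator<String> normalizer) {
        return rawTag == null ? null : normalizer.apply(rawTag);
    }
}
```

With this shape, an app that needs the original "sv-tal" reads `raw()`, while code that wants a canonical form passes its own normalizer to `normalized(...)`.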
I think the issue here is "sv-tal" is not a valid language tag. Our language tag normalization requires valid tags according to IETF BCP 47 to work consistently across API versions.
Note that this is also a requirement for DASH manifests, see ISO/IEC 23009-1, clause 5.3.3.2 Table 5: "@lang: Declares the language code for this Adaptation Set. The syntax and semantics according to IETF RFC 5646 shall be used." (IETF RFC 5646 is the latest revision of the BCP47 tags).
Yeah, I get it, and we have flagged it to our video packaging team to move towards tagging accessibility features in another way, but even so the normalization is not done consistently. The output of the manifest parser depends on which Android API version is on the device. Surely a parser's responsibility is firstly to transcribe what is in the manifest to a model object, not to clean up the contents. I can understand the necessity to do normalization for specific purposes, but right now the normalization makes it impossible to see what the manifest actually contained.
Even then it seems strange that the parse will then contain different content depending on what Android API is on the device parsing it. Consistency in the parse seems like a priority.
I can see your issue and the reasons to leave the input tag untouched. However, doing that also means that all users of the language tag need to normalize it themselves because they can't rely on a certain format. So this is a trade-off between normalizing at one point at the input side, or at many points on the output side.
Surely a parser's responsibility is firstly to transcribe what is in the manifest to a model object, not to clean up the contents
The parser's responsibility is to transcribe the manifest contents into our format model. This means values are renamed, merged, reordered and transcribed to different formats all over the parser. The reasoning behind this is to ensure that the player sees a normalized view of the media and its formats, no matter if it's a DASH, HLS, SmoothStreaming stream or anything else.
it seems strange that the parse will then contain different content depending on what Android API is on the device parsing it
This is only true for non-spec compliant tags, so I'm not sure it is a problem in general?
to see what the manifest actually contained
It might be possible to add the original tag to the AdaptationSet class that is only used as part of the DashManifest and not for further processing. It also already contains other properties we don't necessarily need but are part of the manifest.
It might be possible to add the original tag to the AdaptationSet class that is only used as part of the DashManifest and not for further processing. It also already contains other properties we don't necessarily need but are part of the manifest.
This would be a great trade-off actually. If you are reliant on a stream that doesn't follow guidelines in its manifest, then you can be expected to dig in strange places for your data, but at least the original is accessible. Could we request this change?
This is only true for non-spec compliant tags, so I'm not sure it is a problem in general?
Maybe not a big problem, but it's still preferable to have consistent behaviour across API levels. If it's easy for us to achieve that, then it's a good idea to do so.
Adding the original tag to the AdaptationSet sounds very specific (to this particular use case). If we add it anywhere*, shouldn't it be in the Format and be filled in for all other media types as well?
* I'm on the fence about whether it's necessary at all. It sounds like a workaround for something one particular provider is doing, which is known to be in violation of the spec. It should be relatively easy for them just to generate compliant manifests.
The issue for me here is more that Util.normalizeLanguageTag is a problematic function. If, as you say, the manifest complies with the standards, then the function doesn't do anything: it just returns the same value, so it might as well not exist for that scenario. It only normalizes the tag if it is not compliant, but in that case, it doesn't do it in a consistent manner. It generates a new tag that depends on variables that are not even passed in as parameters. It's unacceptable that the manifest parser can return a different parse based on OS version, or whatever.
We decided to replace our API level dependent normalization code with a non-API-level dependent normalization. This should avoid any problems with different behaviour on different API levels.
While doing this, we also needed to remove some checks that were previously performed, including the one that removed the invalid subtag "tal" for your specific example. That means it shouldn't be necessary to add a separate value to DashManifest anymore.
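For readers wondering what an API-level independent normalization could look like, here is a minimal sketch. It is not the code from the actual commit; `LanguageTags.normalize` is a hypothetical helper, and the choice of steps (trim, replace underscores, lowercase) is an assumption made for illustration. The point is that the result depends only on the input string, and unknown subtags such as "tal" are left in place.

```java
import java.util.Locale;

/** Hypothetical helper, not ExoPlayer's implementation. */
final class LanguageTags {
    private LanguageTags() {}

    /**
     * Normalizes a language tag using plain string operations only.
     * No platform Locale parsing is involved, so the output is the same
     * on every API level, and no subtag is ever dropped.
     */
    static String normalize(String tag) {
        if (tag == null) {
            return null;
        }
        // Locale.ROOT keeps lowercasing independent of the device locale.
        return tag.trim().replace('_', '-').toLowerCase(Locale.ROOT);
    }
}
```

Under these assumptions, "sv-tal" is returned unchanged and "SV_TAL" becomes "sv-tal" regardless of the Android version.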
Fixed in the commit above.