In webhint we've recently started integrating BCD to automatically warn users about potential cross-browser support issues.
One place we currently have a blind spot is on support for features which can make up part of a CSS value (e.g. 3D transforms). We know other efforts have previously maintained their own lists to work around this, but we'd like to try doing this in a way that other tools can benefit from as well.
As such I'm proposing adding a matches annotation (or similar) containing a regular expression that can be run against a CSS value to identify if that value uses a given feature. For 3D transforms this might look something like:
{
"css": {
"properties": {
"transform": {
...
"3d": {
"__compat": {
"description": "3D support",
"matches": "\\b(matrix3d|perspective|rotate3d|rotateZ|scale3d|scaleZ|translate3d|translateZ)\\b",
...
Alternatively (or additionally), perhaps even just a list of keywords to look for in a value would be sufficient (depending on what other CSS properties might need):
"keywords": [
"matrix3d",
"perspective",
...
]
I think this would be a great addition and would maybe help us make more features machine-readable / machine-friendly.
@eduardoboucas tried to work around this by introducing "transforms" https://github.com/eduardoboucas/compat-report#transforms. See also https://github.com/eduardoboucas/compat-report/issues/4 for CSS sub-features that have this problem.
Looks like webhint.io runs into a similar problem now, so we should fix this in our data schema so that hopefully data consumers like these two can work with a solution here.
@eduardoboucas do you have thoughts on the "matches" proposal above?
Hi! The proposal seems quite similar to the concept I put together at the time – which, sadly, I haven't had a chance to work on since. The main difference appears to be the format. I was going with individual JS files at the time, because I was worried that a regular expression just wouldn't cut it for certain edge cases, but maybe that's not the case.
If we're talking about adding this information to BCD instead of keeping it somewhere else, then a regular expression probably is the best approach, as it keeps things simple and language-agnostic.
At the time, I made a list of all the sub-features that would require some form of detection. Perhaps it's worth going through it and seeing if a regular expression would do the job for most cases?
TL;DR – Looks great! 👍
I have one concern about this. You need a way to ensure that the regex is
both valid and returns the expected results. In other words, it sounds like
you need tests.
Joe Medley | Technical Writer, Chrome DevRel | [email protected] |
816-678-7195
If an API's not documented it doesn't exist.
On Mon, Feb 4, 2019 at 3:00 AM Eduardo Bouças notifications@github.com
wrote:
Hi! The proposal seems quite similar to the concept I put together at the
time – which, sadly, I haven't had a chance to work on since. The main
difference appears to be the format. I was going with individual JS files
at the time, because I was worried that a regular expression just wouldn't
cut it for certain edge cases, but maybe that's not the case.If we're talking about adding this information to BCD instead of keeping
it somewhere else, then a regular expression probably is the best approach,
as it keeps things simple and language-agnostic.At the time, I made a list
https://github.com/eduardoboucas/compat-report/issues/4 of all the
sub-features that would require some form of detection. Perhaps it's worth
going through it and seeing if a regular expression would do the job for
most cases?TL;DR – Looks great! 👍
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
https://github.com/mdn/browser-compat-data/issues/3366#issuecomment-460206931,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH0vi94w4sIKYKzl86pYqyr30_IRXmz1ks5vKBLMgaJpZM4afQ21
.
@eduardoboucas Thanks for sharing the list!
Looking through I do see that a regular expression against full values would have certain edge cases for situations where a user-defined string can be included (and CSS variables likely extend this issue to most properties).
That said, I'd expect to continue to use something like postcss-value-parser as you do in compat-report and look at specific sub-values contained there as opposed to the entire value string (in which case the keywords part of my proposal is probably a better fit for most properties except things like alpha hex values).
@jpmedley I agree including tests for regular expressions would be a good idea, though if we preferred using the simpler list of keywords syntax and only used matches where absolutely necessary (e.g. alpha hex values) we'd probably only need tests for the latter.
I should note some items would still be difficult to cover (e.g. those which depend on the context in which they are applied such as transform-origin in SVG), but I think increasing the amount of coverage would be worth it even if some cases are still left out.
TL;DR I'd like to use keywords for simpler cases with the expectation that machine consumers are tokenizing values to match keywords, and use matches only when needed for cases like alpha hex values in background-color or the three-value-syntax for transform-origin.
Hi @antross, @Elchi3 and I discussed this last week and we think this looks a promising addition to BCD. He asked me to take steps forward on this.
While I work on a more formal proposal for changing the schema, I wanted to ask: can you share any more example regexes or keyword lists that you're using with webhint? I'm guessing this is in the webhint source somewhere, I'm just not sure where to look.
In the meantime, I'm going to take another look at @eduardoboucas's prior art and start making a plan to bring this to BCD. Thanks everybody!
I wanted to ask: can you share any more example regexes or keyword lists that you're using with webhint?
Currently webhint only analyses property names and ignores values so it doesn't include any regexes or keyword lists for this scenario. My hope was to help get the data exposed in BCD before we had to start maintaining our own lists on the side.
matches propertyOK, I've been mulling this over for a couple days (it turns out writing something down really brings some issues to light) and here's what I've come up with. I've leaned heavily on your proposal, @antross, but thinking of BCD as a whole, I've given it a bit of a twist. I'd love to have feedback (especially from @antross and @Elchi3)!
I propose we change the schema to allow simple support statements to have an optional property, matches, that defines an object containing properties containing an array of keywords or a regular expression. In JSON, it’d look like this:
"description": "3D support",
"matches": {
"keywords": [
"matrix3d",
"perspective",
"and-so-on"
]
},
"description": "3D support",
"matches": {
"regex": "\\b(matrix3d|perspective|and-so-on)\\b"
},
Formally, simple support statements may have a property named matches. If it exists, its value must be an object. The matches object must have one property, either keywords or regex. The value of keywords must be an array of one or more strings (and those strings shouldn’t be regular expressions). The value of regex must be a string containing a regular expression.
At least initially, I’d like to restrict the matches property to the css.properties.* hierarchy, until we’ve had some experience creating and reviewing this data.
The meaning of the matches property should be understood (and documented) to be a hint about what source might correspond to the feature. Source matched against the information in the matches object should be considered best effort (i.e., we’re not making any guarantees, about matching or not matching the feature).
More specifically, we should define keywords or regexes that make sense at the narrowest scope for a given feature. For CSS, the matching data for a feature that corresponds to some possible values of a property should match against the list of tokens in a list of values, not a whole declaration (e.g., the regex above, not something like transform: (matrix3d|perspective|and-so-on)).
Finally, matches shouldn't be a substitute for just looking up the name of the feature itself. For example, it feels pretty pointless to do something like this:
{
"css": {
"properties": {
"background-image": {
"__compat": {
"matches": {
"keywords": ["background-image"]
}
Let's not do that.
To sort of avoid the problem, we should probably avoid regexes where possible. In the event we must add a regex, I think we should require tests. Those tests should live in the /test/ directory. I think this requires less planning and may be easier to deduplicate (e.g., you only need one set of tests for the regex for #RRGGBBAA color values, but that regex will appear in multiple features).
matches (or regex) and keywords as top-level simple support statement propertiesThis was tempting, as it's a flatter hierarchy, but I figured adding multiple top-level options to simple support statements would make it harder to find out if a feature has any matching data (i.e., you have to check for the existence of two properties). Also, if we wanted to add some additional way of matching, we'd have to add yet another top-level property.
matches valueThis would, again, be flatter, though at the expense of being slightly less self-descriptive. Also I think it might be slightly error prone for reviewing (imagine a single regex accidentally put inside an array).
Additionally, specifying a type (with the key in matches) gives us the opportunity to introduce additional ways of matching source to BCD features. For example, we could add hints for JavaScript BCD features with ESTree interface names or ESLint selectors without overloading strings or arrays with additional meaning (I’m not saying we should necessarily do those things but it seems easy to not immediately foreclose on the option).
I like it.
Do you think we might need two different types of regexes: one that's intended to match a full value (e.g. three-value syntax for transform-origin) and one that's intended to just match a token within a value (e.g. #RRGGBBAA)?
The latter could still pair with something like postcss-value-parser and would give tools more flexibility to avoid false positives when custom strings like CSS variable names may also be present in the full value.
Hmm, that's a good point. And I like the idea of being specific about the scope of the regex. Then we'd have three options for matches objects: the two regexes and keywords.
I suppose the hard part is naming them. 😆 What do you think of regex_value and regex_token?
What do you think of
regex_valueandregex_token?
Sounds good to me. 👍
I predict people will have trouble keeping them straight. As an exercise,
can you write down the actual definitions that will be in
compat-data-schema.md so that we clarify this distinction before the
confusion arrises. And maybe this exercise churns up better names. (I'm not
saying they're bad. I just believe there's always something better no
matter how good the current thing is.)
Joe Medley | Technical Writer, Chrome DevRel | [email protected] |
816-678-7195
If an API's not documented it doesn't exist.
On Wed, Feb 20, 2019 at 9:21 AM Tony Ross notifications@github.com wrote:
What do you think of regex_value and regex_token?
Sounds good to me. 👍
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mdn/browser-compat-data/issues/3366#issuecomment-465673928,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH0vi7_1z8dX3qJWeHm-W78y3hxBKVBDks5vPYQVgaJpZM4afQ21
.
@jpmedley, I'd probably write up the currently discussed allowed properties of the matches object something like this:
matchesA matches object may have one of the following properties (in order of preference):
keywords: an array of literal strings that correspond to the feature.
In CSS, these may be, for example, a list of values for a property (matrix3d, perspective).
regex_to_be_named_1: a string containing a regular expression that matches a single token (i.e., text delimited by characters that are excluded from the text to be matched) corresponding to the feature.
For example, in CSS, these may be regular expressions that match component value types (1px, #00ff00, 2fr).
regex_to_be_named_2: a string containing a regular expression that matches a value corresponding to the feature.
For example, in CSS, these may be regular expressions that match declaration values (1px solid #00ff00) or media query expressions (max-width: 1000px).
Having written this out, I can see a few more possibilities for naming these:
keywords, regex_token, regex_valuekeywords, regex_token, regex (showing increasing generality)keywords, regex_word, regex_value (borrowing from postcss-value-parser)Or we could go really narrow and name for the specific scope: component_value_keywords, component_value_regex, value_regex, media_query_keywords, media_query_expression_regex, etc. I don't love this idea—it comes at the expense of having a lot more options for the matches object and partially reiterating the hierarchy of the data itself—but I thought I'd mention it for completeness.
Thanks for indulging me on this. I myself am having trouble with the
distinction and I have to review PRs on behalf of Google. The extra clarity
on day 1 will save a lot of back and forth in the future.
I think the examples might look too much alike. This concerns me because I
expect that many decisions about what to include will be based on too brief
a glance at the definitions and not enough thought about what they mean. I
don't often recommend more words as the solution to a writing problem, but
this is one of those cases. I would change the first example to something
like: "for example, 1px. Other examples include #00ff00 or 2fr." This
visual difference will make it hard to mix up the two definitions.
Joe Medley | Technical Writer, Chrome DevRel | [email protected] |
816-678-7195
If an API's not documented it doesn't exist.
On Thu, Feb 21, 2019 at 4:39 AM Daniel D. Beck notifications@github.com
wrote:
@jpmedley https://github.com/jpmedley, I'd probably write up the
currently discussed allowed properties of the matches object something
like this:
matchesA matches object may have one of the following properties (in order of
preference):-
keywords: an array of literal strings that correspond to the feature.
In CSS, these may be, for example, a list of values for a property (
matrix3d, perspective).
-regex_to_be_named_1: a string containing a regular expression that
matches a single token (i.e., text delimited by characters that are
excluded from the text to be matched) corresponding to the feature.For example, in CSS, these may be regular expressions that match
component value types (1px, #00ff00, 2fr).
-regex_to_be_named_2: a string containing a regular expression that
matches a value corresponding to the feature.For example, in CSS, these may be regular expressions that match
declaration values (1px solid #00ff00) or media query expressions (max-width:
1000px).
Having written this out, I can see a few more possibilities for naming
these:
- (as before) keywords, regex_token, regex_value
- keywords, regex_token, regex (showing increasing generality)
- keywords, regex_word, regex_value (borrowing from
postcss-value-parser)Or we could go really narrow and name for the specific scope:
component_value_keywords, component_value_regex, value_regex,
media_query_keywords, media_query_expression_regex, etc. I don't love
this idea—it comes at the expense of having a lot more options for the
matches object and partially reiterating the hierarchy of the data
itself—but I thought I'd mention it for completeness.—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
https://github.com/mdn/browser-compat-data/issues/3366#issuecomment-465985057,
or mute the thread
https://github.com/notifications/unsubscribe-auth/AH0vi2EiP18eAwaiM9TCSWrQYjtaWEhcks5vPpNogaJpZM4afQ21
.
I'm happy to discuss this stuff in advance, @jpmedley. LIke you, I'd rather find problems at this stage than after we've added to the schema.
This concerns me because I expect that many decisions about what to include will be based on too brief a glance at the definitions and not enough thought about what they mean.
Yep, that's a good point and we can certainly refine the wording here. On top of that, I plan to cross reference real data, once we have some.
Also, on the reviewing side of things, it should help that we'll require tests for regular expressions. That will force submitters to illustrate the regex's scope with some example input to match against (and _not_ match against).
In any case, I'd like your position on this: is this OK to proceed to a PR? If you think that there is a distinction between the two regex forms that I just haven't written out well enough, then the answer is probably yes (and we can work out the final wording in the PR). If you think we have not yet made the case that there is or should be a distinction, then some more discussion is probably in order.
If you plan to change the wording of the component example, and I hope you are, I'd like to see that. If you plan to reflect it in the PR that will be fine.
OK, thank you! I definitely plan to improve the wording on the example (in fact, I was thinking that the examples might be helped by showing the actual regexes, but I didn't want to get bogged down in writing regexes for non-existent features). And of course, I'll be interested in your feedback on the PR, when it's ready.
And unless there are no further questions at this stage, I think the next step is to get started on that PR. Thanks everyone. 😄
A heads up for anyone following this issue: I've opened PR #3631 to carry out the proposal I put forward before.
The schema changes from #3631 landed. Thanks so much for making that happen, @ddbeck :+1:
@eduardoboucas has a list of features that he thinks require matches (transforms in his project). See https://github.com/eduardoboucas/compat-report/issues/4. Should we file a BCD issue so that we can work on getting matches into our data? Do you have a different list for WebHint, @antross? How can we prioritize these?
Fantastic! 🎉
We're not tracking anything for webhint that's not already covered by the list @eduardoboucas put together, so that seems like the right one to base an issue on.
I've filed webhintio/hint#2114 to track work to start consuming the data in webhint.
Thanks @antross! I've filed https://github.com/mdn/browser-compat-data/issues/3709 for adding more matches. It is based on the issue by @eduardoboucas.
Closing this issue.
Most helpful comment
OK, thank you! I definitely plan to improve the wording on the example (in fact, I was thinking that the examples might be helped by showing the actual regexes, but I didn't want to get bogged down in writing regexes for non-existent features). And of course, I'll be interested in your feedback on the PR, when it's ready.
And unless there are no further questions at this stage, I think the next step is to get started on that PR. Thanks everyone. 😄