THREE.PropertyBinding.parseTrackName() throws Error when I attempt to do Skeletal Animation with THREE.AnimationMixer if bone names are Japanese.
I think this problem is caused by that the regular expression in THREE.PropertyBinding.parseTrackName() doesn't match Japanese characters.
var re = /^(([\w]+\/)*)([\w-\d]+)?(.([\w]+)([([\w\d[]_.:- ]+)])?)?(.([\w.]+)([([\w\d[]_. ]+)])?)$/;
var matches = re.exec(trackName);if( ! matches ) {
throw new Error( "cannot parse trackName at all: " + trackName );
}
The root issue seems that \w doesn't match Japanese character.
On console.
var re = /^(([\w]+\/)*)([\w-\d]+)?(.([\w]+)([([\w\d[]_.:- ]+)])?)?(.([\w.]+)([([\w\d[]_. ]+)])?)$/;
-> undefined
re.exec('.bone[Armature.DEF_cog].position')
-> [".bone[Armature.DEF_cog].position", "", ...]
re.exec('.bone[ใใใใใ].position')
-> null
I know names shouldn't be Japanese character but I'd be very pleased if it supports because generally MMD bone names are Japanese.
[ ] ...
[x] All of them (maybe)
[ ] Internet Explorer
[x] All of them (maybe)
/ping @tschw
Does /\w+/u match a japanese word? Then should be as simple as adding the unicode flag (note the u) to the RegExp...
Sadly no on my Chrome
/\w+/u.exec('abc')
-> ["abc"]
/\w+/u.exec('ใใใ')
-> null
Is this standard compliant or a bug in Chrome?
Prolly it's standard.
I got the same result on Firefox nightly.
This'll help?
https://gist.github.com/ryanmcgrath/982242
It seems like we need uxxxx with target character code range.
Easier would be to do an inverted search for the delimiter, I guess :).
E.g. instead of \w+\/ go like [^/]+\/. That help?
Woks.
/[^/]+\//.exec('abc/')
-> ["abc/"]
/[^/]+\//.exec('ใใใ/')
-> ["ใใใ/"]
So I'm considering how that regular expression should be....
https://github.com/mrdoob/three.js/blob/master/src/animation/PropertyBinding.js#L546
var re = /^(([\w]+\/)*)([\w-\d]+)?(.([\w]+)([([\w\d[]_.:- ]+)])?)?(.([\w.]+)([([\w\d[]_. ]+)])?)$/;
If I can rewrite this expression with inverted one, the issue would be solved...
([([\w\d[]_.:- ]+)])
@takahirox
If I can rewrite this expression with inverted one, the issue would be solved...
It might be possible to simplify that reg ex before rewriting it and that should solve all your problems. Reading the comment in the source code, I'd go like:
var re = /([^/.[]+)(?:[/.[]([^/.[\]]+)\]?)?(?:[/.[]([^/.[\]]+)\]?)?(?:[/.[]([^/.[\]]+)\]?)?/;
re.exec('foo.bar.baz')
// Array [ "foo.bar.baz", "foo", "bar", "baz", undefined ]
re.exec('foo[bar].baz')
// Array [ "foo[bar].baz", "foo", "bar", "baz", undefined ]
re.exec('foo/bar[baz]')
// Array [ "foo/bar[baz]", "foo", "bar", "baz", undefined ]
re.exec('foo/bar[baz].boo')
// Array [ "foo/bar[baz].boo", "foo", "bar", "baz", "boo" ]
Note that (?:[/.[]([^/.[\]]+)\]?)? just repeats three times. It allows some false positives, but I guess might be close enough and (though still cryptic by nature) is a lot easier to grasp than the generated code.
Translation of the whole thing:
Oh crap, we'd actually need yet another repetition, but I'm sure you get the principle. Or maybe piecemeal parsing would be the better option (use /g suffix on the RegExp literal and call .exec repeatedly - I did this for WebGL uniform names elsewhere in threejs).
HTH to turn this issue one into a PR :)
FYI this is a great tool for testing a regex http://regexr.com/ since no sane human can understand a regex that has some form of complexity to it
As long as it's for testing... This issue actually exists _because_ a tool has been used to generate the super tight and pretty much black-boxed expression.
A cryptic _notation_ can never be an excuse for cryptic _logic_, though can hide it quite well. No tool can replace thorough understanding of what you are doing, and once you do, testing trivial code becomes less important. IOW, when I post stuff here, I know what I'm doing - more than any tool possibly ever can :).
Thanks guys.
I made a new expression. Looks good to you?
var re = /^(?:(.+)\/)?([^/.]+)?(?:\.([^/.\[]+)(?:\[([^/\[]+)\])?)?\.(?:([^/.\[]+)(?:\[([^/\[]+)\])?)$/;
According to the code, we seem to need to extract
I didn't simplify and just made the use of inverted search because
it was hard to distinguish *Name and *Index with simplified expression.
This expression works correctly for the following examples written in the code as comment.
nodeName.property
nodeName.property[accessor]
nodeName.material.property[accessor]
uuid.property[accessor]
uuid.objectName[objectIndex].propertyName[propertyIndex]
parentName/nodeName.property
parentName/parentName/nodeName.property[index]
.bone[Armature.DEF_cog].position
BTW this expression doesn't accept [] in the *Index while the current expression does, like this
nodeName.propertyName[something[index]]
// the current expression -> matches
// new expression -> no match
Is that ok?
You should get rid of the wildcard prefix match: .+ eats everything. If it works at all, it means it has to backtrack a lot to get the rest of the input covered and thus performs rather poorly.
That taken care of, and given all tests pass, I'd say it's an improvement.
I personally wouldn't try to model grammars with regular expressions and just use them for forward matching. So I'd look at the whole problem as decomposing a path with different types of delimiters (and optional ] characters, matching [/.[]([^/.[\]]+)\] in a loop pretty much). This will not be super strict, but therefore end up both fast and easily maintainable. Semantic association would then happen in the rest of the code. I guess that route might have ended up easier on your brain, and anyone's who might come after you ;-).
I think the problem would be simpler if we first extract directoryName, like this.
var index = trackName.lastIndexOf( '/' );
var directoryName = trackName.slice( 0, Math.max( 0, index ) );
trackName = trackName.slice( index + 1 );
Now, we don't need to care '/' delimiter.
Then, the regex will be,
var re = /^([^.[\]]+)?(?:\.([^.[\]]+)(?:\[([^[\]]+)\])?)?\.(?:([^.[\]]+)(?:\[([^[\]]+)\])?)$/;
I think this's good enough because it isn't too complex.
Another regex option.
If we use simple regex loop style,
the code with the following semantic association would be like this
var re = /(?:^|\.)([^.[\]]+)(?:\[([^[\]]+)\])?/g;
var directoryName, nodeName, objectName, objectIndex, propertyName, propertyIndex;
var index = trackName.lastIndexOf( '/' );
directoryName = trackName.slice( 0, Math.max( 0, index ) );
trackName = trackName.slice( index + 1 );
var matches = [];
var match;
while ( ( match = re.exec( trackName ) ) !== null ) {
matches.push( match );
}
if ( matches.length === 0 ) {
throw new Error( "cannot parse trackName at all " + trackName );
}
if ( trackName.indexOf( '.' ) === 0 ) {
nodeName = null;
} else {
nodeName = matches.shift()[ 1 ];
}
if ( matches.length === 0 || matches.length > 2 ) {
throw new Error( "grammar error " + trackName );
}
if ( mtches.length === 1 ) {
objectName = null;
objectIndex = null;
} else {
var objectMatch = matches.shift();
objectName = objectMatch[ 1 ];
objectIndex = objectMatch[ 2 ];
}
var propertyMatch = matches.shift();
propertyName = propertyMatch[ 1 ];
propertyIndex = propertyMatch[ 2 ];
Because dot(.) can be in [] like this
.bone[Armature.DEF_cog].position
we should explicitly indicate [ ] in regex I think.
Anyway, which regex do you prefer?
The first one
The second one
@takahirox
Why don't you use one first regex to identify the position of the first delimiter and then .exec the repetitive part from there? No need to slice, you can just set the current position of the RegExp object...
semantic association necessity
I guess it might be possible to greatly simplify this stuff by looking at the rest of the code: Most of it should be simple property traversal. There are some special cases, but I'm not even sure you have to look at the delimiters at all; the number of path segments and the strings may do to identify them.
Then there is an additional advantage, namely being able to process property paths with any number of segments. Consider animation code in a custom vertex shader, for instance...
Then there is an additional advantage, namely being able to process property paths with any number of segments. Consider animation code in a custom vertex shader, for instance...
Should PropertyBinding accept the path with any number of segments?
I think we should first specify the path strings PropertyBinding can accepts.
I expected what it should accept are only the variations written in the comment.
// matches strings in the form of:
// nodeName.property
// nodeName.property[accessor]
// nodeName.material.property[accessor]
// uuid.property[accessor]
// uuid.objectName[objectIndex].propertyName[propertyIndex]
// parentName/nodeName.property
// parentName/parentName/nodeName.property[index]
// .bone[Armature.DEF_cog].position
If it accepts only the variations above, I prefer
var re = /^([^.[\]]+)?(?:\.([^.[\]]+)(?:\[([^[\]]+)\])?)?\.(?:([^.[\]]+)(?:\[([^[\]]+)\])?)$/;
If it accepts the path with any number of segments, regex should be like this
var re = /(?:^|\.)([^.[\]]+)(?:\[([^[\]]+)\])?/g;
@tschw
@mrdoob
So far I'm gonna make PR with the regex
var re = /^([^.[\]]+)?(?:\.([^.[\]]+)(?:\[([^[\]]+)\])?)?\.(?:([^.[\]]+)(?:\[([^[\]]+)\])?)$/;
to support Japanese characters.
And let's make another PR with the following regex when we wanna let PropertyBinding accept the path strings with any number of segments
var re = /(?:^|\.)([^.[\]]+)(?:\[([^[\]]+)\])?/g;
How does this sound to you guys?
@takahirox
Sounds fantastic!
My focus was not sooo much on the added feature (although it's really nice to have) but to achieve a straightforward implementation (guessing the segmentation approach will lead us there). Although the first regular expression is technically rather simple - just repetitive - it's not all too easy to read and to actually notice it :).
If @tschw is happy, I'm happy ๐
Thanks guys.
BTW do you know what these lines are for?
if ( matches.index === re.lastIndex ) {
re.lastIndex++;
}
I can't understand why these lines are necessary...
Nope. Looks like leftover stuff that can be removed. Is .lastIndex respected without /g flag on the RegExp?
No
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex
I think we should remove the lines.