Three.js: THREE.PropertyBinding.parseTrackName() throws Error if bone name is Japanese

Created on 5 Jun 2016 · 25Comments · Source: mrdoob/three.js

Description of the problem

THREE.PropertyBinding.parseTrackName() throws Error when I attempt to do Skeletal Animation with THREE.AnimationMixer if bone names are Japanese.

I think this problem is caused by that the regular expression in THREE.PropertyBinding.parseTrackName() doesn't match Japanese characters.

var re = /^(([\w]+\/)*)([\w-\d]+)?(.([\w]+)([([\w\d[]_.:- ]+)])?)?(.([\w.]+)([([\w\d[]_. ]+)])?)$/;
var matches = re.exec(trackName);

if( ! matches ) {
throw new Error( "cannot parse trackName at all: " + trackName );
}

The root issue seems that \w doesn't match Japanese character.

On console.

var re = /^(([\w]+\/)*)([\w-\d]+)?(.([\w]+)([([\w\d[]_.:- ]+)])?)?(.([\w.]+)([([\w\d[]_. ]+)])?)$/;
-> undefined
re.exec('.bone[Armature.DEF_cog].position')
-> [".bone[Armature.DEF_cog].position", "", ...]
re.exec('.bone[あいうえお].position')
-> null

I know names shouldn't be Japanese character but I'd be very pleased if it supports because generally MMD bone names are Japanese.

Three.js version

[ ] Dev
[x] r77
[ ] ...

Browser
[x] All of them (maybe)
[ ] Chrome
[ ] Firefox
[ ] Internet Explorer

OS
[x] All of them (maybe)
[ ] Windows
[ ] Linux
[ ] Android
[ ] IOS

Hardware Requirements (graphics card, VR Device, ...)

Source

takahirox

All 25 comments

/ping @tschw

mrdoob on 5 Jun 2016

Does /\w+/u match a japanese word? Then should be as simple as adding the unicode flag (note the u) to the RegExp...

tschw on 8 Jun 2016

Sadly no on my Chrome

/\w+/u.exec('abc')
-> ["abc"]
/\w+/u.exec('あいう')
-> null

takahirox on 8 Jun 2016

Is this standard compliant or a bug in Chrome?

tschw on 8 Jun 2016

Prolly it's standard.
I got the same result on Firefox nightly.

takahirox on 8 Jun 2016

This'll help?

https://gist.github.com/ryanmcgrath/982242

It seems like we need uxxxx with target character code range.

takahirox on 8 Jun 2016

Easier would be to do an inverted search for the delimiter, I guess :).

tschw on 10 Jun 2016

E.g. instead of \w+\/ go like [^/]+\/. That help?

tschw on 10 Jun 2016

Woks.

/[^/]+\//.exec('abc/')
-> ["abc/"]
/[^/]+\//.exec('あいう/')
-> ["あいう/"]

So I'm considering how that regular expression should be....

https://github.com/mrdoob/three.js/blob/master/src/animation/PropertyBinding.js#L546

var re = /^(([\w]+\/)*)([\w-\d]+)?(.([\w]+)([([\w\d[]_.:- ]+)])?)?(.([\w.]+)([([\w\d[]_. ]+)])?)$/;

If I can rewrite this expression with inverted one, the issue would be solved...

([([\w\d[]_.:- ]+)])

takahirox on 14 Jun 2016

@takahirox

If I can rewrite this expression with inverted one, the issue would be solved...

It might be possible to simplify that reg ex before rewriting it and that should solve all your problems. Reading the comment in the source code, I'd go like:

var re = /([^/.[]+)(?:[/.[]([^/.[\]]+)\]?)?(?:[/.[]([^/.[\]]+)\]?)?(?:[/.[]([^/.[\]]+)\]?)?/;

re.exec('foo.bar.baz')
// Array [ "foo.bar.baz", "foo", "bar", "baz", undefined ]
re.exec('foo[bar].baz')
// Array [ "foo[bar].baz", "foo", "bar", "baz", undefined ]
re.exec('foo/bar[baz]')
// Array [ "foo/bar[baz]", "foo", "bar", "baz", undefined ]
re.exec('foo/bar[baz].boo')
// Array [ "foo/bar[baz].boo", "foo", "bar", "baz", "boo" ]

Note that (?:[/.[]([^/.[\]]+)\]?)? just repeats three times. It allows some false positives, but I guess might be close enough and (though still cryptic by nature) is a lot easier to grasp than the generated code.

Translation of the whole thing:

Capture a string of non-delimiters,
optionally followed by a single delimiter followed by a captured string of non-delimiters and an optional, non-captured, closing bracket,
optionally followed by a single delimiter followed by a captured string of non-delimiters and an optional, non-captured, closing bracket, and
optionally followed by a single delimiter followed by a captured string of non-delimiters and an optional, non-captured, closing bracket.

Oh crap, we'd actually need yet another repetition, but I'm sure you get the principle. Or maybe piecemeal parsing would be the better option (use /g suffix on the RegExp literal and call .exec repeatedly - I did this for WebGL uniform names elsewhere in threejs).

HTH to turn this issue one into a PR :)

tschw on 14 Jun 2016

FYI this is a great tool for testing a regex http://regexr.com/ since no sane human can understand a regex that has some form of complexity to it

GGAlanSmithee on 15 Jun 2016

👍1

As long as it's for testing... This issue actually exists _because_ a tool has been used to generate the super tight and pretty much black-boxed expression.

A cryptic _notation_ can never be an excuse for cryptic _logic_, though can hide it quite well. No tool can replace thorough understanding of what you are doing, and once you do, testing trivial code becomes less important. IOW, when I post stuff here, I know what I'm doing - more than any tool possibly ever can :).

tschw on 16 Jun 2016

Thanks guys.

I made a new expression. Looks good to you?

var re = /^(?:(.+)\/)?([^/.]+)?(?:\.([^/.\[]+)(?:\[([^/\[]+)\])?)?\.(?:([^/.\[]+)(?:\[([^/\[]+)\])?)$/;

According to the code, we seem to need to extract

parentName (can be null)
nodeName (can be null)
objectName (optional)
objectIndex (optional)
propertyName
propertyIndex (optional)

I didn't simplify and just made the use of inverted search because
it was hard to distinguish *Name and *Index with simplified expression.

This expression works correctly for the following examples written in the code as comment.

nodeName.property
nodeName.property[accessor]
nodeName.material.property[accessor]
uuid.property[accessor]
uuid.objectName[objectIndex].propertyName[propertyIndex]
parentName/nodeName.property
parentName/parentName/nodeName.property[index]
.bone[Armature.DEF_cog].position

BTW this expression doesn't accept [] in the *Index while the current expression does, like this

nodeName.propertyName[something[index]]
// the current expression -> matches
// new expression -> no match

Is that ok?

takahirox on 1 Jul 2016

You should get rid of the wildcard prefix match: .+ eats everything. If it works at all, it means it has to backtrack a lot to get the rest of the input covered and thus performs rather poorly.

That taken care of, and given all tests pass, I'd say it's an improvement.

I personally wouldn't try to model grammars with regular expressions and just use them for forward matching. So I'd look at the whole problem as decomposing a path with different types of delimiters (and optional ] characters, matching [/.[]([^/.[\]]+)\] in a loop pretty much). This will not be super strict, but therefore end up both fast and easily maintainable. Semantic association would then happen in the rest of the code. I guess that route might have ended up easier on your brain, and anyone's who might come after you ;-).

tschw on 1 Jul 2016

I think the problem would be simpler if we first extract directoryName, like this.

var index = trackName.lastIndexOf( '/' );
var directoryName = trackName.slice( 0, Math.max( 0, index ) );
trackName = trackName.slice( index + 1 );

Now, we don't need to care '/' delimiter.

Then, the regex will be,

var re = /^([^.[\]]+)?(?:\.([^.[\]]+)(?:\[([^[\]]+)\])?)?\.(?:([^.[\]]+)(?:\[([^[\]]+)\])?)$/;

I think this's good enough because it isn't too complex.

takahirox on 2 Jul 2016

👍1

Another regex option.

If we use simple regex loop style,
the code with the following semantic association would be like this

var re = /(?:^|\.)([^.[\]]+)(?:\[([^[\]]+)\])?/g;

var directoryName, nodeName, objectName, objectIndex, propertyName, propertyIndex;

var index = trackName.lastIndexOf( '/' );
directoryName = trackName.slice( 0, Math.max( 0, index ) );
trackName = trackName.slice( index + 1 );

var matches = [];
var match;

while ( ( match = re.exec( trackName ) ) !== null ) {

    matches.push( match );

}

if ( matches.length === 0 ) {

    throw new Error( "cannot parse trackName at all " + trackName );

}

if ( trackName.indexOf( '.' ) === 0 ) {

    nodeName = null;

} else {

    nodeName = matches.shift()[ 1 ];

}

if ( matches.length === 0 || matches.length > 2 ) {

    throw new Error( "grammar error " + trackName );

}

if ( mtches.length === 1 ) {

    objectName = null;
    objectIndex = null;

} else {

    var objectMatch = matches.shift();
    objectName = objectMatch[ 1 ];
    objectIndex = objectMatch[ 2 ];

}

var propertyMatch = matches.shift();
propertyName = propertyMatch[ 1 ];
propertyIndex = propertyMatch[ 2 ];

Because dot(.) can be in [] like this

.bone[Armature.DEF_cog].position

we should explicitly indicate [ ] in regex I think.

Anyway, which regex do you prefer?

takahirox on 2 Jul 2016

The first one

advantage: no semantic association, more strict
disadvantage: complex regex

The second one

advantage: easy regex
disadvantage: semantic association necessity, less strict

takahirox on 2 Jul 2016

@takahirox
Why don't you use one first regex to identify the position of the first delimiter and then .exec the repetitive part from there? No need to slice, you can just set the current position of the RegExp object...

semantic association necessity

I guess it might be possible to greatly simplify this stuff by looking at the rest of the code: Most of it should be simple property traversal. There are some special cases, but I'm not even sure you have to look at the delimiters at all; the number of path segments and the strings may do to identify them.

Then there is an additional advantage, namely being able to process property paths with any number of segments. Consider animation code in a custom vertex shader, for instance...

tschw on 5 Jul 2016

Then there is an additional advantage, namely being able to process property paths with any number of segments. Consider animation code in a custom vertex shader, for instance...

Should PropertyBinding accept the path with any number of segments?
I think we should first specify the path strings PropertyBinding can accepts.
I expected what it should accept are only the variations written in the comment.

    // matches strings in the form of:
    //    nodeName.property
    //    nodeName.property[accessor]
    //    nodeName.material.property[accessor]
    //    uuid.property[accessor]
    //    uuid.objectName[objectIndex].propertyName[propertyIndex]
    //    parentName/nodeName.property
    //    parentName/parentName/nodeName.property[index]
    //    .bone[Armature.DEF_cog].position

If it accepts only the variations above, I prefer

var re = /^([^.[\]]+)?(?:\.([^.[\]]+)(?:\[([^[\]]+)\])?)?\.(?:([^.[\]]+)(?:\[([^[\]]+)\])?)$/;

If it accepts the path with any number of segments, regex should be like this

var re = /(?:^|\.)([^.[\]]+)(?:\[([^[\]]+)\])?/g;

takahirox on 6 Jul 2016

👍1

@tschw
@mrdoob

So far I'm gonna make PR with the regex

var re = /^([^.[\]]+)?(?:\.([^.[\]]+)(?:\[([^[\]]+)\])?)?\.(?:([^.[\]]+)(?:\[([^[\]]+)\])?)$/;

to support Japanese characters.

And let's make another PR with the following regex when we wanna let PropertyBinding accept the path strings with any number of segments

var re = /(?:^|\.)([^.[\]]+)(?:\[([^[\]]+)\])?/g;

How does this sound to you guys?

takahirox on 10 Jul 2016

@takahirox
Sounds fantastic!

My focus was not sooo much on the added feature (although it's really nice to have) but to achieve a straightforward implementation (guessing the segmentation approach will lead us there). Although the first regular expression is technically rather simple - just repetitive - it's not all too easy to read and to actually notice it :).

tschw on 11 Jul 2016

👍1

If @tschw is happy, I'm happy 😊

mrdoob on 12 Jul 2016

Thanks guys.

BTW do you know what these lines are for?

https://github.com/mrdoob/three.js/blob/1c03bd6687c53230c9adab06db735993b05d4219/src/animation/PropertyBinding.js#L555-L559

if ( matches.index === re.lastIndex ) {

    re.lastIndex++;

}

I can't understand why these lines are necessary...

takahirox on 13 Jul 2016

Nope. Looks like leftover stuff that can be removed. Is .lastIndex respected without /g flag on the RegExp?

tschw on 18 Jul 2016

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/lastIndex

I think we should remove the lines.

takahirox on 18 Jul 2016

Was this page helpful?

0 / 5 - 0 ratings

Related issues

Avoid setting crossOrigin when dealing with data urls

konijn · 3Comments

sourceFile is no longer used in 76, no?

makc · 3Comments

RawShaderMaterial does not use opacity

clawconduce · 3Comments

Round / circular points via Three.PointsMaterial with WebGL

Bandit · 3Comments

What is three.js's normal map handedness convention?

donmccurdy · 3Comments

Three.js: THREE.PropertyBinding.parseTrackName() throws Error if bone name is Japanese

Description of the problem

Three.js version

Browser

OS

Hardware Requirements (graphics card, VR Device, ...)

All 25 comments

Related issues