Just curious: does three. js utilize the optimizations for calculating square roots described here:
http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
?
I believe this has more to do with optimizing compilers and the assembler
level instructions interractions with math coprocessors. It would probably
be more apt to ask if
On Sun, Mar 23, 2014 at 10:10 AM, loldrup [email protected] wrote:
Just curious: does three. js utilize the optimizations for calculating
square roots described here:http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/
?
Reply to this email directly or view it on GitHubhttps://github.com/mrdoob/three.js/issues/4601
.
@loldrup nope. these kind of optimizations were more relevant in the 486 days.
I got some speed using this
You can override Math directly.
// This is usually bad to do, but make them global and reusable for speed.
var __k = new ArrayBuffer(4),
__fl= new Float32Array(__k),
__ui= new Uint32Array(__k),
__x2= 0,
__y= 0;
// Override the sh** function directly
Math.sqrt = function(x){
//__x2 = 0.5 * (__fl[0] = x);
__fl[0] = x;
__ui[0] = (0x5f375a86 - (__ui[0] >> 1));
__y = __fl[0];
//return (__y * ( 1.5 - ( __x2 * __y * __y ) ))*x;
return __y*x;
}
And these also helped.
Just place before anything ThreeJS does.
var PI_FLOAT = 3.14159265;
var PIBY2_FLOAT = 1.5707963;
Math.atan2 = function(y,x){
if ( x == 0 ){
if ( y > 0 ) return PIBY2_FLOAT;
if ( y == 0 ) return 0;
return (-1 * PIBY2_FLOAT);
}
var atn;
var z = y/x;
if ( Math.abs( z ) < 1.0 ){
atn = z/(1.0 + 0.28*z*z);
if ( x < 0 ){
if ( y < 0 ) return (atn - PI_FLOAT);
return atn + PI_FLOAT;
}
}else{
atn = PIBY2_FLOAT - z/(z*z + 0.28);
if ( y < 0 ) return (atn - PI_FLOAT);
}
return atn;
}
Math.abs = function(value) {
value = +value;
return value < 0 ? -value : value;
}
Math.floor = function(value){
value = +value;
return ~~value === value ? value : (value > 0) ? ~~value : (~~value - 1);
}
Math.ceil = function(value){
value = +value;
return ~~value === value ? value : (value > 0) ? (~~value + 1) : ~~value;
}
Math.round = function(value) {
value = +value;
return value + (value < 0 ? -0.5 : 0.5) >> 0;
}
Math.sign = function(value) {
value = +value;
return value ? value < 0 ? -1 : 1 : 0;
}
Math.acos = function(x) {
let negate = x ;
x = Math.abs(x);
let ret = -0.0187293;
ret = ret * x;
ret = ret + 0.0742610;
ret = ret * x;
ret = ret - 0.2121144;
ret = ret * x;
ret = ret + 1.5707288;
ret = ret * Math.sqrt(1.0-x);
ret = ret - 2 * negate * ret;
return negate * 3.14159265358979 + ret;
}
You can also improve the SQRT function by the following. Fewer instructions is always better.
// This is usually bad to do, but make them global and reusable for speed.
var __k = new ArrayBuffer(4),
__fl= new Float32Array(__k),
__ui= new Uint32Array(__k);
// Override the sh** function directly
Math.sqrt = function(x){
__fl[0] = x;
__ui[0] = (0x1fbb4f30 + (__ui[0] >> 1));
return __fl[0];
}
Some such "optimizations", such as the Quake square root, come at a cost of accuracy. And modern processors are more optimized to compute exact square root fast, than the mentioned 486 and other Quake-era processors.
Quoting the Wikipedia article:
The algorithm generates reasonably accurate results using a unique first approximation for Newton's method; however, it is much slower and less accurate than using the SSE instruction rsqrtss on x86 processors also released in 1999.
@emanuelfludd do you have performance numbers?
Square Root Approximation Visual Test
Of course it is not an accurate number. It's an approximation.
I do not have performance numbers.
EliasHasle - Do you have performance numbers on native Square Root?
Also. Where in the code does the v8 engine convert Square Root from the Math object in Javascript to the C++ compiled ASM result.
For example in C I can write this:
/*
* SSE2 Optimized INTEL Square Root Method
*/
double asquare(register double m){
asm ("sqrtsd %1, %0" : "=x" (m) : "x" (m));
return m;
}
And that will give me the Square Root calculated via the processors SSE2 instruction set additions.
But it is not clear where in V8 this is all happening. And I cannot verify. Any help appreciated :)
Where in the code does the v8 engine convert Square Root from the Math object in Javascript to the C++ compiled ASM result.
Here you have some pointers:
https://github.com/v8/v8/blob/master/src/builtins/math.tq#L212-L219
https://github.com/v8/v8/blob/master/src/compiler/backend/arm64/instruction-selector-arm64.cc#L1211
https://github.com/v8/v8/blob/master/src/compiler/backend/arm64/code-generator-arm64.cc#L1416-L1418
https://github.com/v8/v8/blob/master/src/compiler/backend/x64/code-generator-x64.cc
Also:
https://cs.chromium.org/chromium/src/v8/src/codegen/ia32/assembler-ia32.cc?dr=C&sq=package:chromium&g=0&l=129
https://cs.chromium.org/chromium/src/v8/src/codegen/x64/assembler-x64.cc?sq=package:chromium&dr=C&g=0&l=75
I do not have performance numbers.
That's unfortunate. Unless we have some numbers we can't start considering adopting these optimisations.
EliasHasle - Do you have performance numbers on native Square Root?
No. And I have nothing to add regarding JS implementations either. Just recalling from reading about this before, that the utility of square root approximations is much less than in the Quake days, because of optimized sqrt
cirquits. The Quake square root does multiple operations to calculate an approximate square root, whereas modern circuits basically do the accurate calculation in one operation (not invoking an interpolation algorithm or Newton's method or anything, as would be necessary in the old days). Now I'm on thin ice regarding CPU implementation too, so maybe I should just shut my keyboard.
Dear Mr. Doob:
I cannot with sound conscience, as a scientist, provide you results of experiments of said optimizations without first being confided with the tools and specifications of testing rig/requirements. There are varying outcomes depending upon testing environment.
I can, however, provide you with the exact level of error you will achieve when using this specific approximation, as I outlined previously in the above method code.
I provide you this proof, as it is a worthy scientific cause to investigate further. Processing an arithmetic Bitshift is always going to be faster, cycle wise, than a more intense calculation such as Square Root.
Whether or not the compiler has pushed the workload off to the Microcoded implementation is besides the point of this exercise.
Bitshift >> is fast.
Try using https://jsperf.com/ to compare native vs your proposed bitwise based methods.
Thank you so much for the V8 insights! Very useful information.
I wrote this small little program to test the speed. I suggest anyone run this and consider the underlying hardware implementations. Systems with SSE2 will most likely pick up more speed on native.
// This is usually bad to do, but make them global and reusable for speed.
var __k = new ArrayBuffer(4),
__fl= new Float32Array(__k),
__ui= new Uint32Array(__k),
itr=66666666;
// Override the sh** function directly
console.log("Processing "+itr+" iterations of Square Root.");
var sqrt2 = function(x){
__fl[0] = x;
__ui[0] = 0x1fbb4f30 + (__ui[0] >> 1);
return __fl[0];
}
var start = +new Date();
var i=0;
var res = 0;
for(;i<itr;i++){
res = Math.sqrt(i);
}
var end = +new Date();
console.log("Native Square Root Speed - "+(end-start)+" ms");
var start2 = +new Date();
var i=0;
var res = 0;
for(;i<itr;i++){
res = sqrt2(i);
}
var end2 = +new Date();
console.log("Approximation Speed - "+(end2-start2)+" ms");
/*
OUTPUT AS PROVIDED ON MY SYSTEM in NODEJS
Processing 66666666 iterations of Square Root.
Native Square Root Speed - 1794 ms
Approximation Speed - 308 ms
*/
Chrome 76, MacOS 10.14.6
Processing 66666666 iterations of Square Root.
Native Square Root Speed - 822 ms
Approximation Speed - 867 ms
Safari 12.1.2 , MacOS 10.14.6
Processing 66666666 iterations of Square Root.
Native Square Root Speed - 194 ms
Approximation Speed - 1473 ms
Fascinating! The results likely vary depending upon architecture.
My runtime was: NodeJS v8.16.0, Debian 9.6(x86-64), Intel Pentium J2900 @ 2.41Ghz x 4, 3.7 GiB Ram
This is that approximation at higher ranges.
ChromeOS 76, Pixelbook
Processing 66666666 iterations of Square Root.
Native Square Root Speed - 1321 ms
Approximation Speed - 1374 ms
I'm still getting this solid result from: NodeJS v8.16.0, Debian 9.6(x86-64), Intel Pentium J2900 @ 2.41Ghz x 4, 3.7 GiB Ram
Processing 666666666 iterations of Square Root.
Native Square Root Speed - 17958 ms
Approximation Speed - 3018 ms
I can, however, provide you with the exact level of error
@emanuelfludd what's the maximum error for low numbers? It's not clear from your graph but it looks like ~180^2 might be off by around +5?
Chrome 76 (V8 7.6.303.29) 64-bit on Windows 10. Intel Core i7 8th Gen:
Native Square Root Speed - 672 ms
Approximation Speed - 756 ms
(Node 12.0.0 runs the older V8 7.4.288.21)
@emanuelfludd I tried to google the constant 0x1fbb4f30, and nothing was returned. Maybe Google fails me. Or is there some original research on that one?
With such different benchmark results and large errors, I am not sure I see where to use this function in three.js.
Yeah. I don't want to say where the constant came from. But it works. Better. If you disregard the supposed constraints of this Javascript runtime environment, which obviously has widely varying efficiency amongst different platforms and installations.
Here is a clip of me running this on my desktop here. The test.
From My Environment This Is The Result
I don't want to say where the constant came from.
In the Wikipedia article on the related fast inverse square root (or "Quake invSqrt"), there are mentions of researchers who have worked on optimizing the integer constant for minimal error. If you need a "legit" optimal constant for sqrt2
, I suppose you may follow a similar procedure. (A better article on square root algorithms, including approximate ones, is https://en.wikipedia.org/wiki/Methods_of_computing_square_roots .)
But it works. Better.
That is debatable. Maybe your environment just doesn't exploit capabilities that exist in your and all other modern CPUs.
@emanuelfludd Please, try it in other browsers, other machines. If it only runs faster in your setup but nowhere else, there isn't much we can do.
Debatable? Going from 2billion polys to 12 billion polys is not debatable.
You asked for numbers. I kindly provided.
You asked for more numbers. I kindly provided.
You asked where I derived my methods. I won't bother expending any more energy where obviously feelings override experiment.
Science is not democratic. Allow anyone to try this themselves. In the same environment I described. And they will arrive at the same result. That is science.
I am a Freemason. We know geometry.
Science is not democratic. Allow anyone to try this themselves. In the same environment I described.
three.js is used in many different environments and we need to ensure that any optimisations that we apply are optimisations for all the most commonly used environments - for three.js, that's mainly JavaScript running in the browser.
You have shown that this optimisation works in a node environment, however that's probably the least common use case for three.js. We would need to demonstrate that this is an improvement in browser for at least Chrome and Firefox, and so far the tests that people have run don't show that. We haven't run any tests at all for mobile and we would need to do that too. So far, the numbers from the tests people have run don't appear to warrant further effort here though.
@emanuelfludd what's the maximum error for low numbers? It's not clear from your graph but it looks like ~180^2 might be off by around +5?
The plot seems to show only a few selected numbers, not necessarily capturing the maximum error. Note that @emanuelfludd's implementations omit the Newton's method step found in the Quake III version, instead sticking only to the first guess.
The plot graph contains all the data points from 0 to 82,849.
5.323692 × 10^8 <-- Is the constant.
You'll have to delve into segmented functions to understand.
Going from 2billion polys to 12 billion polys
“... in your specific test environment that is not representative of the real environments that ThreeJS is used in.”
I appreciate you effort here, but if you need these performance gains that only work in your runtime environment _and comes at a cost of all others_, please proceed as you did originally, and overwrite Math.sqrt()
.
I just ran the test from this very web browser I am using. And not inside NodeJS.
That exact code outputs the following:
Processing 66666666 iterations of Square Root.
Native Square Root Speed - 1010 ms
Approximation Speed - 630 ms
FireFox 60.9.0esr (64-bit)
@emanuelfludd What if you divide the times by the respective number of correct bits? 😉
It is an approximation. That is in the name. This is a known known.
Minus the snarky, nasty, unhelpful comments.
This is the result from Chromium.
Processing 66666666 iterations of Square Root.
Native Square Root Speed - 4572 ms
Approximation Speed - 3882 ms
Version 73.0.3683.75 (Developer Build) built on Debian 9.8, running on Debian 9.6 (64-bit)
There's a level of unprofessionalism by many on this website. I do not wish to continue discussion.
The constant is without a single Newtonian iteration. Which, given its relative and still useful proximation, I am able to render 3D. I will not bother ever contributing to any projects here if this is how this ship is run.
Very disappointing.
There's a level of unprofessionalism by many on this website.
I think you will need to clarify that, both who you mean and how they are unprofessional. Just to be clear, I am just another user (and minor contributor), not in any decision-making position in the project. I think the questions and objections raised by the maintainers (and @looeee and me) are perfectly valid and reasonable, and neither rude nor unprofessional.
Regardless of whether your method of approximating square roots can be fast/useful, it cannot replace the native square root for all purposes, since it is inexact. Maybe it will be more welcome in the Resources category on the forum, though. It is definitely interesting. :)
Most helpful comment
three.js is used in many different environments and we need to ensure that any optimisations that we apply are optimisations for all the most commonly used environments - for three.js, that's mainly JavaScript running in the browser.
You have shown that this optimisation works in a node environment, however that's probably the least common use case for three.js. We would need to demonstrate that this is an improvement in browser for at least Chrome and Firefox, and so far the tests that people have run don't show that. We haven't run any tests at all for mobile and we would need to do that too. So far, the numbers from the tests people have run don't appear to warrant further effort here though.