I have an i7, 16 GB RAM, and a 2 GB Nvidia card.
What settings can I change on DepthMap to speed things up a bit, assuming a lower resolution output?
While slow and fast are relative terms, I agree that it might need some optimization. I ran a set of photographs through Zephyr 3D (free version) and Meshroom, and it took significantly more time with Meshroom. I didn't measure the exact time, but I can run the same set again on both if it is of any use.
I am using i7-5820K, 48GB RAM, Titan X (Maxwell) 12GB.
I used the 3D Zephyr example files (the cherub statue) and after two hours the depthmap bar had hardly moved. I get a warning "Low GPU memory volume step Z: 3"
I'm keen to play with the attributes to speed things up, and not fussed about the resolution of the final mesh at the moment.
The problem is that I don't understand most of the attributes and haven't found a decent explanation of how they affect the complexity of the computation.
You get some documentation when you run each of the steps from the command line; however, it's not much more than what you get in the UI.
I have the same question. What can I do to speed up the _DepthMap_ stage?
Please share if you learn anything more about this.
My machine is a laptop with both integrated graphics and a discrete NVidia GTX960M, 2GB GDDR5. I was able to get AliceVision to utilise the faster GPU for the DepthMap stage, which it would not do by default.
I want to achieve much shorter reconstruction times and I am willing to trade some quality for performance, if I just know what settings to tweak.
I posted my question to the AliceVision Google Group, but perhaps GitHub Issues is the more active forum.
Increase the downscale factor to directly reduce the precision.
Reducing the number of T cameras (sgmMaxTCams, refineMaxTCams) reduces the computation time linearly, so if you change from 10 to 5 you will get a 2x speedup. A minimum value of 3 is necessary; 4 already gives decent results in many cases if the density of your acquisition process is regular enough. The default value is necessary in large-scale environments, where it is difficult to have 4 images that cover the same area.
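To make the trade-off above concrete, here is a rough back-of-the-envelope sketch, not anything taken from AliceVision itself. It takes the linear scaling with the number of T cameras from the comment above. The rest is assumption: that the cost also scales with the number of pixels (so with the inverse square of the downscale factor), that the baseline downscale is 2, and the function name `relative_depthmap_cost` is made up for illustration.

```python
def relative_depthmap_cost(t_cams, downscale, base_t_cams=10, base_downscale=2):
    """Estimate DepthMap cost relative to a baseline run (1.0 = same cost).

    Assumptions (not measured):
      - cost is linear in the number of T cameras (stated in the thread);
      - cost is proportional to pixel count, i.e. 1 / downscale**2;
      - base_downscale=2 is only a placeholder baseline.
    """
    if t_cams < 3:
        raise ValueError("at least 3 T cameras are needed")
    if downscale < 1:
        raise ValueError("downscale must be >= 1")
    cam_ratio = t_cams / base_t_cams
    # Doubling the downscale halves width and height, so 4x fewer pixels.
    pixel_ratio = (base_downscale / downscale) ** 2
    return cam_ratio * pixel_ratio


# Halving the T cameras at the same downscale: about half the work.
print(relative_depthmap_cost(t_cams=5, downscale=2))  # 0.5
# Fewer T cameras and a coarser downscale combine multiplicatively.
print(relative_depthmap_cost(t_cams=4, downscale=4))
```

Treat the numbers as a guide for which setting to try first; real timings will also depend on GPU memory and the other DepthMap attributes.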
@fabiencastan Wow thank you.
While this information may seem obvious, there is no documentation, and the description (sgmMaxTCams : Semi Global Matching: Number of neighbour cameras.) is too short to really grasp the idea behind those parameters.
Yes, I know that I need to update most of the tooltips with more precise descriptions.
Good info from @fabiencastan. That did reduce the workload.
I have looked closer at the _DepthMap_ stage and to my surprise the GPU is mostly idle during these long calculations. This strikes me as very strange.
At the start of the DepthMap process the console output clearly shows that a CUDA GPU is detected:
[14:54:06.662787][info] Supported CUDA-Enabled GPU detected.
[14:54:06.663779][info] Found 1 image dimension(s):
[14:54:06.663905][info] - [4032x3024]
[14:54:06.740162][info] Overall maximum dimension: [252x189]
[14:54:06.740661][info] Create depth maps.
number of CUDA devices: 1
0: GeForce GTX 960M
[14:54:06.742183][info] Number of GPU devices: 1, number of CPU threads: 8
[14:54:06.742678][info] PSSGM autoScaleStep: scale: 1, step: 1
[14:54:06.742678][info] PlaneSweepingCuda:
- nImgsInGPUAtTime: 42
- scales: 1
- subPixel: Yes
- varianceWSH:
CUDA device no 0 for 0
Device 0 memory - used: 367.884369, free: 1680.115601, total: 2048.000000
Device 0 memory - used: 379.884369, free: 1668.115601, total: 2048.000000
[14:53:01.226908][info] Compute depth map:
...
Each depth map calculation takes about 5 minutes and all cores of the CPU are under heavy load the whole time, but the GPU remains idle. The _Python_ process causes a short spike of GPU activity, but the _aliceVision_depthMapEstimation_ process holds the GPU activity at 0% the whole time.
I have monitored the GPU using Windows' Task Manager > Performance tab and a program called GPU-Z by TechPowerUp.
Shouldn't the monitoring show the GPU working hard during these calculations?
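For a second opinion alongside Task Manager and GPU-Z, one option is to poll `nvidia-smi`, the command-line tool installed with the NVIDIA driver. The sketch below assumes `nvidia-smi` is on the PATH and that only the first GPU is of interest; the helper names `parse_sample` and `poll` are made up for illustration.

```python
import subprocess
import time

# Ask nvidia-smi for GPU utilisation (%) and used memory (MiB) as bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_sample(line):
    """Turn one CSV line such as '37, 1520' into (utilisation, memory_used)."""
    util, mem = (field.strip() for field in line.split(","))
    return int(util), int(mem)


def poll(samples=10, interval=1.0):
    """Print utilisation of the first GPU once per `interval` seconds."""
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        first_gpu = out.strip().splitlines()[0]  # assumption: GPU 0 only
        util, mem = parse_sample(first_gpu)
        print(f"GPU util {util}%  memory used {mem} MiB")
        time.sleep(interval)
```

Calling `poll()` from a second terminal while _aliceVision_depthMapEstimation_ is running gives readings to compare against what the other tools report.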
@adgbu see: alicevision/AliceVision#481