...and non-standard.
Possible solution 1: Modify the voice stealing algorithm so that it respects the max polyphony parameter.
Possible solution 2: Rename the Max polyphony parameter to Scene polyphony and make it separately adjustable per scene, and also add a proper voice counter with actual max poly number (which would be scene A polyphony + scene B polyphony).
Bonus: Max polyphony should then also be streamed to patch rather than being a Surge instance parameter.
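A minimal sketch of what solution 2 might look like, assuming one limit and one voice count per scene. The names `ScenePolyphony`, `totalMaxPoly`, `totalActive` and `mustSteal` are hypothetical and are not taken from the Surge code base; the default limits are arbitrary placeholders.

```cpp
#include <array>
#include <cassert>

// Hypothetical per-scene polyphony bookkeeping (illustrative names only).
struct ScenePolyphony
{
    std::array<int, 2> limit{16, 16}; // user-set limit for scene A and scene B (placeholder values)
    std::array<int, 2> active{0, 0};  // voices currently sounding in each scene

    // The "actual max poly number": scene A polyphony + scene B polyphony.
    int totalMaxPoly() const { return limit[0] + limit[1]; }

    // What a voice counter in the UI would display.
    int totalActive() const { return active[0] + active[1]; }

    // True if the scene has reached its own limit, so a new note
    // has to steal an existing voice in that scene first.
    bool mustSteal(int scene) const
    {
        assert(scene == 0 || scene == 1);
        return active[scene] >= limit[scene];
    }
};
```

With this layout each scene enforces only its own limit, and the counter reports `totalActive()` against `totalMaxPoly()`.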
This is absolutely non-standard; no other synth (hardware or software) I know of works like this. When a voice is stolen, it should be stolen quickly, on the order of single-digit milliseconds.
Solution: Obviously, for backwards compatibility reasons we cannot outright change how voice stealing works, but we can add a voice stealing mode to the Menu, with two options: soft (current way) and hard (proper way).
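A rough sketch of how such a mode switch could select the fade-out length of a stolen voice. Everything here is an assumption for illustration: `StealMode`, `stealFadeSamples` and `stealGain` are hypothetical names, 5 ms is simply one value in the single-digit range mentioned above, and 100 ms is a stand-in for the current soft behaviour, not a measured figure.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical menu option: Soft keeps the existing behaviour, Hard steals fast.
enum class StealMode { Soft, Hard };

// Length of the steal fade-out in samples for the chosen mode.
// Both durations are assumed placeholder values.
inline int stealFadeSamples(StealMode mode, double sampleRate)
{
    assert(sampleRate > 0.0);
    const double fadeMs = (mode == StealMode::Hard) ? 5.0 : 100.0;
    return static_cast<int>(std::lround(fadeMs * 0.001 * sampleRate));
}

// Linear gain applied to a voice that is being stolen:
// 1.0 when the fade starts, 0.0 when the voice can be freed.
inline float stealGain(int samplesLeft, int totalSamples)
{
    assert(totalSamples > 0 && samplesLeft >= 0 && samplesLeft <= totalSamples);
    return static_cast<float>(samplesLeft) / static_cast<float>(totalSamples);
}
```

At 48 kHz the hard mode would free a stolen voice after 240 samples, while the assumed soft value would hold it for 4800 samples.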
Let's discuss!
Another aspect of polyphony is that Surge has a limit of 64 in the UI. If this is a fixed hard limit in the engine, then there will be a harder 'take the weakest slot now' reallocation at that point, which is closer to the conventional type you seek, and the user-defined limit is 'soft'.
If instances continue to be created dynamically beyond 64, then your suggestion carries more significance.
I should probably have looked at the code before saying this :)
Worth noting that the motivation for the user-defined polyphony limit is to put a reasonable limit on CPU usage, so the solution should address that.
Surge has at most 64 voices available per scene. This is a compile-time constant in the code.

Set that 3 to a 0 and see what happens
If that makes it seem more right, we can make that an off-by-default option in 1.9.
@baconpaul That's weird, I cannot hear any difference between having it at 3 or at 0.
In fact, I cannot hear any difference even if that enforcePolyphonyLimit method is completely commented out. Seems like everything is happening in softkillVoice(), except freeing the voice.
If this is true, that means point 2 from my opening post is not correct (I think I was just looking at softkillVoice() and assumed what happens based on that code). However point 1 is still quite valid. And I'd say it also probably makes sense to have 0 instead of 3 there anyways in enforcePolyphonyLimit...