While reviewing #30481 I noticed an unrelated problem in norm(::Number):
julia> norm(1.0, 0)
1.0
julia> norm(0.0, 0)
0.0
julia> norm(NaN, 0)
1.0
Surely norm(NaN, 0) should give NaN?
Isn't it norm(0.0, 0) that's the odd man out? 0.0 ^ 0.0 is 1.0, which seems questionable to me, but given those floating-point semantics and the docstring definition:
norm(x::Number, p::Real=2)
For numbers, return \left( |x|^p \right)^{1/p}.
then it should be 1.0:
julia> 0.0 ^ 0.0 ^ (1/0.0)
1.0
julia> NaN ^ 0.0 ^ (1/0.0)
1.0
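One caveat about the expressions above: in Julia, ^ is right-associative, so x ^ p ^ (1/p) parses as x ^ (p ^ (1/p)) rather than the docstring's (|x|^p)^(1/p). A quick check with explicit parentheses, using only Base floating-point arithmetic, suggests the conclusion is unchanged at p = 0. The helper name docstring_norm is hypothetical and not part of LinearAlgebra:

```julia
# Evaluate the docstring formula (|x|^p)^(1/p) with explicit parentheses.
# `docstring_norm` is a hypothetical helper, not part of LinearAlgebra.
docstring_norm(x, p) = (abs(x)^p)^(1/p)

docstring_norm(0.0, 0.0)  # (0.0^0.0)^Inf, i.e. 1.0^Inf
docstring_norm(NaN, 0.0)  # (NaN^0.0)^Inf, i.e. 1.0^Inf

# The unparenthesized form groups from the right:
0.0 ^ 0.0 ^ (1/0.0) == 0.0 ^ (0.0 ^ (1/0.0))
```

Both groupings evaluate to 1.0 for these inputs, so the argument does not depend on the missing parentheses.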
It's currently special-cased (introduced in https://github.com/JuliaLang/julia/pull/6057):
norm(x::Number, p::Real=2) = p == 0 ? (x==0 ? zero(abs(float(x))) : oneunit(abs(float(x)))) : abs(float(x))
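To see why this special case yields 1.0 for NaN, it may help to copy the line under a different name and trace the p == 0 branch. Since NaN == 0 is false, the x==0 test falls through to oneunit. The name current_norm is hypothetical; the body is the definition quoted above:

```julia
# Copy of the special-cased definition quoted above, under a hypothetical name
# so that it does not clash with LinearAlgebra.norm.
current_norm(x::Number, p::Real=2) =
    p == 0 ? (x == 0 ? zero(abs(float(x))) : oneunit(abs(float(x)))) : abs(float(x))

NaN == 0               # false: equality comparisons with NaN are never true
current_norm(0.0, 0)   # takes the x == 0 branch
current_norm(NaN, 0)   # falls through to oneunit, so no NaN is produced
```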
The current norm(NaN, 0.0) follows the docstring definition.
julia> NaN^0.0^(1/0.0)
1.0
Or maybe the docstring should be changed? If norm(0.0, 0) is 0.0 or NaN, then I agree with norm(NaN, 0.0) == NaN
norm(0, 0) = 0 is correct. This is a standard definition, not up for debate.
The reason here is that the "0 norm" is a bit funny, and the standard definition is the number of nonzero components of x (an earlier version of this comment described it as essentially the limit p⟶0 of norm(x, p)). So for x == 0 it gives zero. For any other finite value it is 1. There are a couple of funny cases:
For x == Inf what do you do? I think the right viewpoint is that for x == ±Inf you should take the x⟶±∞ limit of norm(x, 0), which gives 1.
For an x of NaN, what do you do? The usual approach for any f(NaN) is to return NaN unless the result of f(x) is independent of the value of x, in which case you return that result. norm(x,0) is not independent of the value of x (it can be 0 or 1), so you should return NaN.
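The rule described above could be sketched for scalars as follows. This only illustrates the reasoning and is not necessarily the change that ended up in LinearAlgebra; zero_norm is a hypothetical name:

```julia
# Hypothetical scalar "0 norm": 0 for zero, 1 for any other non-NaN value
# (including ±Inf), and NaN when the input is NaN.
function zero_norm(x::Number)
    a = abs(float(x))
    isnan(a) && return a   # the answer depends on the unknown value, so propagate NaN
    return iszero(a) ? zero(a) : oneunit(a)
end

zero_norm(0.0)
zero_norm(2.5)
zero_norm(-Inf)
zero_norm(NaN)
```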
> The reason here is that "0 norm" is a bit funny, and the standard definition is essentially the limit p⟶ 0 of norm(x, p).
That is not the case though. The limit diverges whenever 2 or more elements are non-zero. It seems that our norm(x,0) is the limit of norm(x,p)^p so I'm wondering if norm should really be in the zero-"norm" business. It's really simple to write count(!iszero, x). (The other language for numerical computations return Inf for p=0.)
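The distinction can be checked numerically. The sketch below uses plain Base functions and an arbitrary example vector (both chosen only for illustration): for small p, sum(abs.(x).^p) approaches the number of nonzero entries, while raising that sum to 1/p blows up once two or more entries are nonzero.

```julia
x = [3.0, 0.0, 4.0]   # arbitrary example with two nonzero entries
p = 1e-3              # a small positive exponent standing in for p → 0

s = sum(abs.(x) .^ p)   # norm(x, p)^p, close to the nonzero count
n = s ^ (1 / p)         # norm(x, p) itself, which grows without bound as p → 0
c = count(!iszero, x)   # the conventional "0 norm"
```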
> That is not the case though. The limit diverges whenever 2 or more elements are non-zero.
Whoops, sorry, you are quite right, I gave the wrong definition. The conventional definition is simply the number of non-zero elements of x. The rest of my conclusions (for Inf and NaN) stand.
We should either allow norm(x, 0) or not. Since there seems to be one universally accepted definition of the "L0 norm" (at least in ℝⁿ), it seems reasonable to support this definition. We also seem committed to supporting it in Julia 1.x by backwards compatibility. Given that, I feel that the current NaN behavior is a bug.
> The conventional definition is simply the number of non-zero elements of x.
Indeed but I just don't think it makes much sense to include this definition in a p-norm function. It's a pretty drastic discontinuity in p at zero. Restricting to p>0 or maybe even p>=1 (to make it a norm) seems fair to me and would solve this particular issue as well.
> Restricting to p>0 or maybe even p>=1 (to make it a norm) seems fair to me and would solve this particular issue as well.
Yes, but those are both breaking changes, so not an option until a 2.0 release of LinearAlgebra, which can't be done until we can change stdlib versions independent of Julia versions.
I've opened https://github.com/JuliaLang/julia/issues/30810 to track the possible deprecation of norm(x,0) and https://github.com/JuliaLang/julia/pull/30809 to handle the short term bugfix.