Here is an issue I ran into during my work: a problem with the mask when it is not set.
>>> a = np.arange(6, dtype=np.float64).reshape((2, 3))
>>> a
array([[ 0., 1., 2.],
[ 3., 4., 5.]])
>>> am = np.ma.masked_array(a)
>>> am
masked_array(data =
[[ 0. 1. 2.]
[ 3. 4. 5.]],
mask =
False,
fill_value = 1e+20)
>>> am[0][1]=np.ma.masked
>>> am
masked_array(data =
[[ 0. 1. 2.]
[ 3. 4. 5.]],
mask =
False,
fill_value = 1e+20)
###--------- doesn't work----------------
>>> am[0,1]=np.ma.masked
>>> am
masked_array(data =
[[0.0 -- 2.0]
[3.0 4.0 5.0]],
mask =
[[False True False]
[False False False]],
fill_value = 1e+20)
###-------this way it works---------
>>> am[1][1]=np.ma.masked
>>> am
masked_array(data =
[[0.0 -- 2.0]
[3.0 -- 5.0]],
mask =
[[False True False]
[False True False]],
fill_value = 1e+20)
###--------now it surprisingly works again--
Linux 3.19.8-100.fc20.x86_64
Python 2.7.5
>>> np.__version__
'1.8.2'
I know my system is not up to date, but I asked a friend whose system is, and he confirms that the issue exists there too.
Confirmed in master (1.12). This is like what #5580 hoped to fix, but as discussed there, this particular case is not possible to fix without an overhaul of MaskedArray to remove np.nomask.
Here's the problem: MaskedArrays sometimes store the mask as an array of booleans, and sometimes (if there are no masked values) store the mask simply as the value False (and np.nomask == False).
When you slice a MaskedArray (and get a view), the mask can only be "viewed" if it is currently an array of booleans, not if it is the constant "False". So the first time you try am[0][1] = ..., the mask is the constant "False" and can't be viewed, so the original's mask doesn't get updated. The second time you try, the mask is stored as an array of booleans, so it can be viewed and does get updated.
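For anyone landing here looking for a workaround, here is a minimal sketch, assuming it is acceptable to allocate the full boolean mask up front. The names plain and am are only for illustration. Passing an explicit all-False array as the mask keeps it from being stored as nomask, so the view returned by am[0] shares its mask with am.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape((2, 3))

# Default construction: no mask array is allocated, the mask is the nomask constant.
plain = np.ma.masked_array(a)
print(np.ma.getmask(plain) is np.ma.nomask)

# An explicit all-False boolean array forces a real mask array from the start.
am = np.ma.masked_array(a, mask=np.zeros(a.shape, dtype=bool))

# am[0] now carries a view of the first mask row, so this write reaches am.
am[0][1] = np.ma.masked
print(am.mask)
```

Indexing with a single tuple, am[0, 1] = np.ma.masked, avoids the intermediate view entirely and works either way, as the transcript above shows.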
Add this to the long list of bugs caused by this nomask design, e.g. #7588.
Every time I think of nomask, I also think of that Donald Knuth quote, "Premature optimization is the root of all evil."
It seems to me that nomask is no exception. The number of unusual cases one encounters from this behavior is quite large and they are made more difficult to fix because of it.
If it were really so useful to know whether the mask was trivial or not, we could just as easily have a method like has_mask, which caches its result.
Is there any interest in moving towards removing nomask outright?
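To make the has_mask idea concrete, here is a rough sketch written as a free function. has_mask is hypothetical and not part of NumPy. The sketch assumes a plain (non-structured) dtype and leaves out caching, since a cached result would have to be invalidated on every write to the mask.

```python
import numpy as np

def has_mask(arr):
    """Hypothetical helper: True if at least one element of arr is masked."""
    mask = np.ma.getmask(arr)
    # With the current design, nomask means no mask array was ever allocated.
    if mask is np.ma.nomask:
        return False
    # If a full mask array were always present, this scan would be the whole check.
    return bool(mask.any())

print(has_mask(np.ma.masked_array([1, 2, 3])))
print(has_mask(np.ma.masked_array([1, 2, 3], mask=[False, True, False])))
```

For the "is anything masked" question, the existing np.ma.is_masked function already answers it without a new method.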
I would like to chime in that I've also run into this issue. It might be worthwhile to add a note in the documentation about it. The following suggests to me that modifying the mask of a view will modify the mask of the original.
"When accessing a slice, the output is a masked array whose data attribute is a view of the original data, and whose mask is either nomask (if there was no invalid entries in the original array) or a view of the corresponding slice of the original mask. The view is required to ensure propagation of any modification of the mask to the original."
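The two cases in that passage can be checked directly. This is a small sketch, with no_mask, full_mask, row_a and row_b as illustrative names, that uses np.shares_memory to test whether a slice's mask is a view of the parent's mask.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape((2, 3))

no_mask = np.ma.masked_array(a)  # mask stored as nomask
full_mask = np.ma.masked_array(a, mask=np.zeros(a.shape, dtype=bool))

row_a = no_mask[0]
row_b = full_mask[0]

# In both cases the data of the slice is a view of the original data.
data_is_view = np.shares_memory(row_a.data, a)

# Case 1: the parent has no mask array, so the slice's mask is nomask
# and there is nothing through which a change could propagate.
mask_is_nomask = np.ma.getmask(row_a) is np.ma.nomask

# Case 2: the parent has a mask array, so the slice's mask is a view of it.
mask_is_view = np.shares_memory(np.ma.getmask(row_b), np.ma.getmask(full_mask))

print(data_is_view, mask_is_nomask, mask_is_view)
```

Only in the second case does the last sentence of the passage, about propagating modifications to the original, actually apply.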
"Is there any interest in moving towards removing nomask outright?"
Maybe, also masked. This probably comes down to either making some big changes to masked arrays, or implementing a new class altogether. Might be worth putting together an NEP. I seldom use masked arrays, so this is something best done by people who need the functionality.