Astropy: Possible precision loss bug when reading FITS files with astropy.io

Created on 21 Sep 2016 · 59 comments · Source: astropy/astropy

My colleague @wafels and I encountered a possible bug when attempting to read modified Julian day (MJD) values out of a FITS file using astropy.io.fits. A working example is below. For reference, the file in question can be found here. It contains a catalogue of triggers marked 'SFLARE' detected by the Fermi mission.

In [2]: from astropy.io import fits
In [3]: from astropy.time import Time
In [4]: result = fits.open('browse_results.fits')
In [5]: hdu1 = result[1]
In [6]: hdu1.data['TRIGGER_TIME'][108]
Out[6]: 55719.266
In [7]: tt = Time(hdu1.data['TRIGGER_TIME'][108],format = 'mjd')

In [8]: tt.datetime
Out[8]: datetime.datetime(2011, 6, 7, 6, 22, 30)

For reference, the correct time (verified from manually viewing the file, and also from the original Fermi catalogue online) for this trigger is actually: 2011-06-07 06:23:06:652

As a further test, the correct trigger time is read out by MRDFITS in IDL. As best I can tell, there seems to be some sort of precision loss when reading this file with astropy.io.fits. This can be seen here:

In [9]: print repr(np.float64(hdu1.data['TRIGGER_TIME'][108]))
55719.265625

Whereas from IDL:

IDL> result = mrdfits('browse_results.fits',1)
IDL> print,result[108].trigger_time,format='(d20.13)' 
 55719.2660492089999

Putting these numbers into a time conversion, e.g. xTIME, gives the same difference in times: 55719.265625 --> 06:22:30 UT, and 55719.26604921 --> 06:23:06.652 UT

Although this example is specific to times in MJD format, it could be a more general problem - I have not tested this behaviour on other data types. Another possibility is that something is slightly non-standard regarding how the FITS file is formatted, but I am not sure what would cause this.
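As a quick standalone check (added here for illustration, using the numbers from the session above), the ~37 second discrepancy is exactly what a float32 round-trip of this MJD value produces:

```python
import numpy as np

mjd = 55719.266049209           # trigger time as stored in the file
mjd32 = float(np.float32(mjd))  # what a float32 read-back gives

# float32 keeps only ~7 significant digits, so the fractional day is mangled
print(mjd32)                    # 55719.265625
print((mjd - mjd32) * 86400)    # ~36.65 -- the seconds lost, matching the 06:22:30 vs 06:23:06.652 gap
```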

Labels: Bug, io.fits


I just want to confirm that in Mathematica I get the same value as @aringlis gets in IDL.

That's weird. The dtype is f8 and the value saved is 55719.266049209. I don't know (yet) what happened but it seems they were converted to float32 (but f8 should be float64):

with fits.open('browse_results.fits') as hdus:
    hdus.info()
    hdu = hdus[1]
    print(hdu.data['TRIGGER_TIME'][108], hdu.data['TRIGGER_TIME'].dtype)
55719.3 float32

if it were converted to float64 it would be "correct":

import numpy as np

'{:.20f}'.format(np.float32(55719.266049209))
'55719.26562500000000000000'

'{:.20f}'.format(np.float64(55719.266049209))
'55719.26604920899990247563'

I think it is because TFORM6 = F15.0. I can't find any entry for F in http://docs.astropy.org/en/stable/io/fits/usage/table.html#creating-a-fits-table . When I change that to TFORM6 = D15.0, that column is read in as float64 (I used tab = table.read(filename, format='fits')).

>>> tab[108]['TRIGGER_TIME']  # IDL is 55719.2660492089999
55719.266049209
>>> tt = Time(tab[108]['TRIGGER_TIME'], format='mjd')
>>> tt.datetime.strftime('%Y-%m-%d %H:%M:%S.%f')  # Should be 2011-06-07 06:23:06:652
'2011-06-07 06:23:06.651658'

I removed the bug tag for now because I don't know if this is bad code or bad data. That said, fitscheck (full command below) does not throw any errors.

fitscheck browse_results.fits --ignore-missing

By the way, I also cannot find any documentation on setting TFORM to F in FITSIO (https://heasarc.gsfc.nasa.gov/docs/software/fitsio/quick/node10.html). So this could be bad data?

The format keywords listed in the doc ( @pllim 's link) are for binary tables. For ascii tables, F15.0 seems valid, see Table 15 in http://www.aanda.org/articles/aa/pdf/2010/16/aa15362-10.pdf
Also a bit later (page 20):

There is no difference between the F, D, and E formats; the content of the string determines its interpretation. Numbers requiring more precision and/or range than the local computer can support may be represented. It is good form to specify a D format in TFORMn for a column of an ASCII table when that column will contain numbers that cannot be accurately represented in 32-bit IEEE binary format (see Appendix E).

I would say this is a bug in the code (though I am no expert in FITS so quite possibly I may be wrong). Based on https://heasarc.gsfc.nasa.gov/docs/software/fitsio/quick/node10.html :

    ASCII Table Column Format Codes
    -------------------------------
    (w = column width, d = no. of decimal places to display)
        Aw   - character string
        Iw   - integer
        Fw.d - fixed floating point
        Ew.d - exponential floating point
        Dw.d - exponential floating point

    Binary Table Column Format Codes
    --------------------------------
    (r = vector length, default = 1)
        rA  - character string
        rAw - array of strings, each of length w
        rL  - logical
        rX  - bit
        rB  - unsigned byte
        rS  - signed byte **
        rI  - signed 16-bit integer
        rU  - unsigned 16-bit integer **
        rJ  - signed 32-bit integer
        rV  - unsigned 32-bit integer **
        rK  - signed 64-bit integer
        rE  - 32-bit floating point
        rD  - 64-bit floating point
        rC  - 32-bit complex pair
        rM  - 64-bit complex pair

The fact that a "format" is F (fixed point) does not imply that it is a single precision number - it is just the format in which the number is written. So, for ASCII tables it would be wiser to read such numbers as float64 (as does Mathematica and IDL).

But isn't a FITS table a binary table? So why does the ASCII format even apply here?

@pllim The table here is an ASCII one.

I am confused. So I am invoking a magical genie... @embray

@pllim According to http://fits.gsfc.nasa.gov/xtension.html

Standard Extensions:

  • IMAGE - This extension type provides a means of storing a multidimensional array similar to that of the FITS primary header and data unit. Approved as a standard extension in 1994. The format of IMAGE extensions is defined in the A&AS, 105, 53 (1994) journal article.
  • TABLE - This ASCII table extension type contains rows and columns of data entries expressed as ASCII characters. Approved as a standard extension in 1988. The format of TABLE extensions is defined in the A&AS, 73, 365 (1988) journal article.
  • BINTABLE - This binary table extension type provides a more flexible and efficient means of storing data structures than is provided by the TABLE extension type. The table rows may contain a mixture of numerical, logical and character data entries. In addition, each entry is allowed to be a single dimensioned array. Numeric data are kept in binary formats. Approved as a standard extension in 1994. The format of BINTABLE extensions is defined in the A&AS, 113, 159 (1995) journal article.

@pllim http://docs.astropy.org/en/stable/io/fits/usage/unfamiliar.html#ascii-tables
See the line Fw.d Single precision real. In this case, even with F15.0 it should be parsed as double.

@aringlis table has:

XTENSION= 'TABLE ' / ASCII TABLE EXTENSION

So, it is an ASCII table

Ah, okay... I found it... http://docs.astropy.org/en/stable/io/fits/usage/unfamiliar.html#ascii-tables

As documented in Astropy, it clearly states:

Fw.d       Single precision real

So, according to the doc anyway, it is "working as intended". Why it is this way, I can't explain. I didn't write it.

Ops, @saimn ... Didn't see your comment above. :sweat_smile:

The fact that a "format" is F (fixed point) does not imply that it is a single precision number - it is just the format in which the number is written. So, for ASCII tables it would be wiser to read such numbers as float64 (as does Mathematica and IDL).

@mcara, that is a good point. In fact, the existing precedent from other software makes a strong argument. But still, I would categorize this as an API change rather than a bug.

Quoting from http://www.aanda.org/articles/aa/pdf/2010/16/aa15362-10.pdf (end of section 7.2.5) (see also https://github.com/astropy/astropy/issues/5353#issuecomment-248731885)

There is no difference between the F, D, and E formats; the content of the string determines its interpretation. Numbers requiring more precision and/or range than the local computer can support may be represented. It is good form to specify a D format in TFORMn for a column of an ASCII table when that column will contain numbers that cannot be accurately represented in 32-bit IEEE binary format (see Appendix E).

I don't know what this means in practice (interpret everything as float64?) but given that the number cannot be accurately represented in 32-bit format, they violated the "good form". :smile:

@pllim I still think this is a bug: the way astropy decided to interpret TFORM is different from the FITS standard. The same paper reported by @MSeifert04 clearly says: _"There is no difference between the F, D, and E formats..."_ and thus astropy took it upon itself to change the meaning of this interpretation in a way that is not compliant with the FITS standard. Moreover, continuing the previous quote, _"...the content of the string determines its interpretation."_ Thus, if a number in a string has, e.g., 10 digits, then obviously software should use double-precision numbers to represent it.

I think a bug is a bug regardless of whether it was unintentional (some silly mistake in the code due to which the code does not work as intended) or due to misunderstanding (=incorrect interpretation) of the standard.

@aringlis Thanks for noticing this!

Thanks everyone for the quick response!

@mcara I think that's where it becomes tricky: I understand F15.0 to mean a 15-character floating point number with 0 decimals of precision. Does the width or the decimal precision characterize the type of float (is that what the standard meant by "content of the string")?

In my previous https://github.com/astropy/astropy/issues/5353#issuecomment-248751079 I should correct my last paragraph:

I think a bug is a bug regardless of whether it was unintentional (some silly mistake in the code due to which the code does not work as intended) or due to misunderstanding (=incorrect interpretation) of the standard, _unless the developers of astropy.io.fits never intended for this package to adhere to FITS standards._

@MSeifert04 I am working on a longer answer to your previous comment (about tricky part) but F15.0 does mean that the number is 1) floating point; and 2) it can have 13 digits (plus sign and dot). d has almost no effect when reading a number (I'll come back to this issue in my longer answer).

@MSeifert04 It has been quite some time since I wrote my last Fortran program so I had to look for references to answer to your last comment (interpretation of the Fw.d - https://github.com/astropy/astropy/issues/5353#issuecomment-248755634). This is indeed quite tricky and it would be nice if an expert in FITS standard could contribute to this interpretation. So, I'll try to provide my own non-expert understanding of things... In doing this I will refer to:

Intel Fortran: https://software.intel.com/en-us/intel-fortran-compiler-17.0-user-and-reference-guide (especially, see section on "Real and Complex Editing" in "I/O formatting")
Oracle Fortran: https://docs.oracle.com/cd/E19957-01/805-4939/index.html (especially see the section on "Editing REAL Data (D, E, F, G)")
Definition of FITS, v3.0: http://www.aanda.org/articles/aa/pdf/2010/16/aa15362-10.pdf

Let's start with Fortran. According to both Intel and Oracle reference manuals, there is a distinction between input and output formats. While everything that I say next equally applies to all real editing formats, I will specifically refer to "F editing" as this is the format used in @aringlis tables. In general, F format specifier is of the form Fw.d. According to Oracle, Chapter 5, F Editing:

d indicates that the fractional part of the number (the part to the right of the decimal point) has d digits. However, if the _input_ datum contains a decimal point, that _decimal point overrides the d value_. [My emphasis]

Similarly, Intel's guide, section on F Editing says:

If the _input field_ contains a decimal point, _the location of that decimal point overrides the location specified by the F descriptor_. [My emphasis]

That is why 0 in F15.0 has no practical effect when reading data.
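A toy sketch of this input-editing rule, written here for illustration (`parse_f_field` is a hypothetical helper, not Fortran or astropy code; it only mimics what the manuals above describe):

```python
def parse_f_field(text, d):
    """Toy Fortran-style Fw.d input editing: if the field already
    contains a decimal point, that point wins and d is ignored;
    otherwise the rightmost d digits form the fraction."""
    text = text.strip()
    if '.' in text:
        return float(text)       # explicit decimal point overrides d
    return int(text) / 10 ** d   # implied decimal point

# the string in the file carries its own point, so F15.0 still
# yields the full-precision value
print(parse_f_field(' 55719.266049209', 0))  # 55719.266049209
print(parse_f_field('55719266049209', 9))    # implied point gives the same value
```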

The situation is different when writing data, but I just discovered that it is controlled by FITS standard in a more complicated way than it would appear from TFORM. So, now let's discuss FITS...

From the section 7.2.1 of the FITS standard paper, see paragraph on "TFORMn keywords", it says:

  1. _"Only the formats in Table 15, interpreted as Fortran (ISO 2004) input formats and discussed in more detail in Sect. 7.2.5, are permitted for encoding."_
  2. _"The TDISPn keyword, defined in Sect. 7.2.2, may be used to recommend that a decimal integer value in an ASCII table be displayed as the equivalent binary, octal, or hexadecimal value."_

So, now let's take a look at the TDISPn keyword (section 7.2.2 in FITS paper):

TDISPn keywords. The value field of this indexed keyword shall contain a character string describing the _format recommended for displaying an ASCII text representation of the contents of field n_. _This keyword overrides the default display format given by the TFORMn keyword._

@aringlis table has TDISP6 = 'F15.7' and it overrides TFORM6 = 'F15.0 ' which explains why his MJD value has digits after the point when it was written out. However, what is strange to me is that his table value has too many digits compared to format specification F15.7:

write(*, "(F15.7)") 55719.266049209D0
! should have produced 55719.2660492 without trailing "09"

This may be an indication that the software used by @aringlis for creating FITS files is also non-compliant with the FITS standard, or maybe it is an indication that I am missing something in my understanding of the FITS standard. Maybe "recommended" is the key word in the description of TDISPn 😄. It appears that Fermi is using the CFITSIO library, so maybe it would be useful to discuss the issue of too many digits written with TDISP6 = 'F15.7' with the CFITSIO group (if you know anyone over there).
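For comparison, Python's fixed-point formatting behaves the same way as the Fortran F15.7 edit descriptor here (a quick sanity check added for illustration, not part of the original comment):

```python
# an F15.7-style output field: width 15, 7 digits after the point
value = 55719.266049209
field = '{:15.7f}'.format(value)

print(repr(field))  # '  55719.2660492' -- the trailing "09" is rounded away
```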

@MSeifert04 Ah, I forgot to answer another issue mentioned in your comment https://github.com/astropy/astropy/issues/5353#issuecomment-248755634. In my opinion, (from FITS paper, section 7.2.5)

There is no difference between the F, D, and E formats; the content of the string determines its interpretation. Numbers requiring more precision and/or range than the local computer can support may be represented.

suggests that it is reader's responsibility to analyze input data and decide what is the most suitable numerical representation based on "string content". That is, if all numbers in a table column are of the form xx.xxx then maybe a 32-bit IEEE binary format is sufficient to hold such data. If data in a table are of the form x.xxxxxxxx then maybe a double precision numbers should be used, and so on...

It _seems_ that it would not be wrong to use double precision for all floating-point numbers.
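One way to read that suggestion is a reader that inspects the column content before choosing a dtype. A hypothetical sketch (the helper name `infer_float_dtype` is invented for this example; astropy does no such content scan):

```python
import numpy as np

def infer_float_dtype(strings):
    """Pick 'f4' only when every value in the column survives a
    float32 round-trip unchanged; otherwise fall back to 'f8'."""
    values = [float(s) for s in strings]
    if all(float(np.float32(v)) == v for v in values):
        return 'f4'
    return 'f8'

print(infer_float_dtype(['1.5', '2.25']))       # f4 -- float32 is lossless here
print(infer_float_dtype(['55719.266049209']))   # f8 -- needs double precision
```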

@mcara Thank you for the detailed explanations. I actually never suspected that there is such a thing as "F editing" :smile:

I'll try to summarize it (let me know if I got it wrong):

  • "F editing" is pretty strict but with the TDISPn recommendation that's something one shouldn't apply to real FITS files.
  • No one gives a definite limit (i.e. on the width) for when to use doubles instead of single-precision floating point values?
  • The "good form" in the FITS standard provides an unmistakable hint (using D instead of E or F) that the number should be interpreted as double.

I just checked and Wikipedia states that singles have 6-9 (with decimal point and sign this can be 6-11) and doubles have 15-17 significant digits. But in my fits table experience (very limited!) generally widths of 8 and 15 (one had 25) are common. This seems to me like there is some, let's call it informal, standard that a width of 8 indicates single and 15 indicates doubles (and 25 maybe float128?). -- I don't know if that's true, it's just a guess!
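The significant-digit counts quoted from Wikipedia can be checked directly with numpy (a small check added for illustration):

```python
import numpy as np

# number of decimal digits each float type can reliably round-trip
print(np.finfo(np.float32).precision)  # 6
print(np.finfo(np.float64).precision)  # 15
```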

Possible solutions would be:

  • to promote F and E if the width is bigger than xxx to float64
  • or print a precision-loss-warning if the width is bigger than yyy but it's not D (but keep everything else as-is)
  • or some combination of 1 and 2 but make xxx and yyy configurable.
  • or simply to always convert all float columns to doubles.

The last one would be pretty easy to implement and only double the memory. :smile: I haven't checked if any of the other "options" are reasonably possible.

It seems like something like option 1 is already supported but _not done_ in this case.
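The first option above could be sketched like this (illustrative only: the helper name `ascii_float_dtype` and the width cutoff of 9 are invented for the example, not astropy's API):

```python
def ascii_float_dtype(tform):
    """Map an ASCII-table TFORM such as 'F15.0' or 'E9.2' to a numpy
    dtype string, promoting wide fields to float64."""
    code, rest = tform[0], tform[1:]
    width = int(rest.split('.')[0])
    if code == 'D':
        return 'f8'
    # float32 carries roughly 7 significant decimal digits, so a field
    # wider than that may encode values float32 cannot hold exactly
    return 'f4' if width <= 9 else 'f8'

print(ascii_float_dtype('F15.0'))  # f8 -- the column from this issue
print(ascii_float_dtype('E9.2'))   # f4
```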

@MSeifert04 Let me first reply to your summary:

I'll try to summarize it (let me know if I got it wrong):

  • "F editing" is pretty strict but with the TDISPn recommendation that's something one shouldn't apply to real FITS files.

Sorry, it could be me, but I cannot follow this statement, especially _"...with the TDISPn recommendation that's something one shouldn't apply to real FITS files."_ Could you please clarify what you mean here?

  • No one gives a definite limit (i.e. on the width) for when to use doubles instead of single-precision floating point values?

I do not think that width (w in Fw.d) should be a guide. For example, if I write a list of numbers such as [0.12, 1.34, 2.34, 3.67] using F15.7, I will get strings similar to F4.2 (without all the white space in front of them). Thus, when reading back such a table, it should be enough to use 32-bit floats (even though TFORMn is F15.7). Thus the determination of the format should be done by reading the entire table and then analyzing the strings and trying to determine the smallest format that can hold the data. Of course, if one does not desire to do such a "sophisticated" analysis, one can use w instead.

  • The "good form" in the FITS standard provides an unmistakable hint (using D instead of E or F) that the number should be interpreted as double.

I think the "good form" clause is reasonable but it should not be used for deciding the precision of floats. I think this sentence from the FITS paper: _"There is no difference between the F, D, and E formats; the content of the string determines its interpretation"_ clearly implies that we cannot (or should not) rely on format field descriptors to determine the precision (single/double/quad) of floats.

@MSeifert04 Now, let's discuss solutions...

Possible solutions would be:

  • to promote F and E if the width is bigger than xxx to float64
  • or print a precision-loss-warning if the width is bigger than yyy but it's not D (but keep everything else as-is)
  • or some combination of 1 and 2 but make xxx and yyy configurable.
  • or simply to always convert all float columns to doubles.
    The last one would be pretty easy to implement and only double the memory.

Assuming that implementation of a more sophisticated analysis of strings (as described in my previous comment) is too difficult and time consuming, first option seems most reasonable, except that it needs to be modified to include format field descriptor "D" since _"There is no difference between the F, D, and E formats..."_ I think a warning would be useful only when we cannot represent table data without losing precision, such as when TFORMn is, e.g., something crazy like F512.480 (for which even numpy.float128 wouldn't be enough).

Now I would like to make some general comments about the code that you have pointed to as well as the entire column.py. Let me start by saying that I am not familiar _at all_ with this code and my comments are based on a quick look at the file (and what I see in there is so weird - I am surprised it works).

  1. First, it is not clear why the code is not working. You say: _"It seems like something like option 1 is already supported but not done in this case."_ Why (How) is it "not done"?
  2. I think that assumptions used in the code:

``` python
# mapping from ASCII table TFORM data type to numpy data type
#   A: Character
#   I: Integer (32-bit)
#   J: Integer (64-bit; non-standard)
#   F: Float (32-bit; fixed decimal notation)
#   E: Float (32-bit; exponential notation)
#   D: Float (64-bit; exponential notation, always 64-bit by convention)
ASCII2NUMPY = {'A': 'a', 'I': 'i4', 'J': 'i8', 'F': 'f4', 'E': 'f4', 'D': 'f8'}
```

are not in agreement with FITS standard as D should be treated the same as F and E. Yes, it is "good form" to _save_ double precision (numpy.float64) data using "D". However, when _reading_ tables, we have no control over how a table was written and so we should not make such assumptions.

  3. I can't figure out what _scalar_to_format() is doing... This is just a personal curiosity... For example:

``` python
>>> fits.column._scalar_to_format(1)
'B'
>>> fits.column._scalar_to_format(1.287906432876532)
'B'
>>> fits.column._scalar_to_format(128790643287653.2)
'K'
>>> fits.column._scalar_to_format(128790643287653.2e1)
'K'
>>> fits.column._scalar_to_format(128790643287653.2e10)

TypeError                                 Traceback (most recent call last)
<ipython-input-...> in <module>()
----> 1 fits.column._scalar_to_format(128790643287653.2e10)

/.../lib/python2.7/site-packages/astropy/io/fits/column.pyc in _scalar_to_format(value)
   2114         return FITSUPCONVERTERS.get(fits_format, fits_format)
   2115     except KeyError:
-> 2116         return "A" + str(len(value))
   2117
   2118 def _cmp_recformats(f1, f2):

TypeError: object of type 'long' has no len()
```

I am surprised that both 1 and 1.287906432876532 have the same format "B" - unsigned byte according to http://docs.astropy.org/en/stable/io/fits/usage/table.html#column-creation. At the same time, 128790643287653.2 has format "K" (64-bit integer) while 128790643287653.2e10 crashes the code. Maybe everything works as intended but it is weird.

Ah, forgot one more thing: _convert_ascii_format() should also analyze TDISPn in addition to TFORMn since TDISPn overrides TFORMn.

@MSeifert04 wrote:

It seems like something like option 1 is already supported but not done in this case.
(https://github.com/astropy/astropy/blob/master/astropy/io/fits/column.py#L2295)

Yep, the format is correct here. I think the issue happens later, in FITS_rec._convert_ascii, which guesses the numpy format directly from ASCII2NUMPY without using the result from the above function.
https://github.com/astropy/astropy/blob/master/astropy/io/fits/fitsrec.py#L840
So my guess is that recformat = ASCII2NUMPY[format[0]] should be replaced by recformat = format.recformat. It seems to work with this change.

@mcara : TDISP is just for display (after possible scaling) so it's not relevant here I think.

@saimn I thought so too but that produces two failures (might be unrelated, haven't had time to investigate): https://travis-ci.org/MSeifert04/astropy/builds/161876288

@mcara The _scalar_to_format function is weird indeed. But I think it only expects strings (then int(value) fails!), though I'm not totally sure about that!

I don't think the FITS standard meant the actual values in the column when they said "contents of the string". I assumed that was supposed to mean the {w}.{d} part of the TFORMn string. But for all practical purposes I think it's just not feasible to parse the whole column to determine the dtype. Also I am no FITS standard expert so this is just my interpretation.

I think that assumptions used in the code [...] are not in agreement with FITS standard as D should be treated the same as F and E. Yes, it is "good form" to save double precision (numpy.float64) data using "D". However, when reading tables, we have no control over how a table was written and so we should not make such assumptions.

I think these "assumptions" only give the minimal dtype for these formats and as such these are perfectly fine (my opinion). At least as soon as there is one mechanism that upcasts them if they fulfill certain conditions (like the mentioned _convert_ascii_format function).

I agree that in this case the values should have been interpreted as float64 (which is what IDL and Mathematica already do), and since they are not, this issue should be classified as a bug.

Sorry, it could be me, but I cannot follow this statement, especially "...with the TDISPn recommendation that's something one shouldn't apply to real FITS files." Could you, please clarify what do you mean here?

I think I have to retract that statement, I guess I don't understand what TDISPn means.

The "good form" in the FITS standard provides an unmistakable hint (using D instead of E or F) that the number should be interpreted as double.

I think that "good form" clause is reasonable but it should not be used for deciding precision of floats. I think this sentence from FITS paper: "There is no difference between the F, D, and E formats; the content of the string determines its interpretation" clearly implies that we cannot (or should not) rely format field descriptors to determine precision (single/double/quad) of floats.

I'm confused, do you mean that D should not be interpreted as double or is this more about that E and F default to float32?

@MSeifert04 - Indeed, in both cases some float values are saved to a FITS file, with a E15.7 format, which is then read as float64 with the change instead of float32 before. This is consistent with what we should expect from the _convert_ascii_format function, so either the data should not be written with E15.7, or the test values must be updated.

@saimn wrote:

@mcara : TDISP is just for display (after possible scaling) so it's not relevant here I think.

I did not know that there is a separate format used exclusively for displaying purpose. Thanks for clarifying this.

@MSeifert04 wrote:

I think these "assumptions" only give the minimal dtype for these formats and as such these are perfectly fine (my opinion).

I'm confused, do you mean that D should not be interpreted as double or is this more about that E and F default to float32?

I disagree here with you. When _saving/writing-out_ a table, I think we should follow the "good form" clause and save float64 data using D format. On the other hand, when _reading_ data, I think D format should be treated the same way as F or E formats because a table could have been written using D format for float32. Therefore, when reading, D should be treated in a similar manner as E. That's my interpretation of _"There is no difference between the F, D, and E formats; the content of the string determines its interpretation."_ Thus, I am suggesting to modify _convert_ascii_format case for E with:

elif format in ['E', 'D'] and precision > 6:

@saimn Do you want to open a PR for the change (https://github.com/astropy/astropy/issues/5353#issuecomment-248849801) including the test updates/fixes (https://github.com/astropy/astropy/issues/5353#issuecomment-248892320)?

@mcara I think the oversight of not using _convert_ascii_format when reading a table is definitely a bug, while the additional change to make D equivalent to E is an additional feature (even if it's just to be more FITS-compliant).
Also there might be problems with that approach, because as far as I can see there is no column-content scanning (just a guess!) and we might run into trouble if the precision is low but the exponent is too big for float32 (e.g. 1.3e300); there it might make sense to explicitly use D to be sure it's parsed as double even if you don't need/want precision > 6. At the very least I don't think both issues should be addressed in a single PR.

@MSeifert04 wrote:

...we might run into trouble if the precision is low but the exponent is too big for float32 (i.e. float 1.3e300)...

Well, I am glad you have mentioned this, because if the format (TFORMn) is, for example, E10.2 and the actual number in the string is 1.3e300, then the current code, since d is 2, will try to read this number into a numpy.float32 array, because it does not analyze the "content of the string" to determine the best representation.
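That failure mode is easy to demonstrate (a small check added for illustration):

```python
import numpy as np

# float32 tops out near 3.4e38, so a value with only two significant
# digits but a large exponent is lost entirely on a float32 read
with np.errstate(over='ignore'):
    as_f32 = np.float32(1.3e300)
as_f64 = np.float64(1.3e300)

print(as_f32)  # inf
print(as_f64)  # 1.3e+300 survives
```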

Or the person who generates the file should just use "D" to begin with to avoid all this mess.

_Slowly backs out towards the exit..._

Exactly, that's why using a TFORM of D10.2 might make sense for those.

That example was meant as an argument against making D behave like E (At least against the "simple" change mentioned in https://github.com/astropy/astropy/issues/5353#issuecomment-248911462)

That example was meant as an argument against making D behave like E.

That is something that we can control on writing but it is not something that we can control on reading as we do not know how a table was created.

In any case, since all this discussion refers to reading tables, I think making sure code _is_ using _convert_ascii_format should fix the problem for now.

One more thing that I do not understand... If @aringlis's tables were written using TFORM=F15.0, why are his actual "strings" more like F15.9? Shouldn't his writer have trimmed the string to "55719."? If his tables were created using CFITSIO, can anyone verify the behavior of that package? Maybe it is a bug worth reporting to CFITSIO...

@mcara wrote:

If @aringlis's tables were written using TFORM=F15.0, why are his actual "strings" more like F15.9? Shouldn't his writer have trimmed the string to "55719."?

I thought the presence of a decimal point overrides the given {d}-value?! Or was that just the general F-value-editing and not incorporated into FITS? I lost track ... :smile:

FWIW, I would suggest emailing the fitsbits mailing list to just ask how this column should be interpreted, since the people who wrote the standard are on there.

It's also important to point out the standard is unclear, since they are currently drafting v4.0 of the standard, so this is an opportunity to clarify it: http://fits.gsfc.nasa.gov/fits_standard.html

@MSeifert04

I thought the presence of a decimal point overrides the given {d}-value?!

This is true only for input editing and it is not true for output editing. That is, decimal point overrides a given d value _only when reading_ a string.

I would highly suggest the following test to check whether, when using astropy.io.fits, the TFORMn format is enforced on writing:

>>> # first load original @aringlis file:
>>> from astropy.io import fits
>>> from astropy.time import Time
>>> h = fits.open('browse_results.fits')
>>> h[1].data['TRIGGER_TIME'][108]
55719.266
>>> # now save/write out this HDUList to the disk using astropy.io.fits:
>>> h.writeto("browse_results_astropy.fits")
>>> # Load this new table and check trigger time again:
>>> h2 = fits.open('browse_results_astropy.fits')
>>> h2[1].data['TRIGGER_TIME'][108]
55719.0

That is, astropy.io.fits writer correctly formats output string (unlike the writer used to produce @aringlis table).

I have also tried to read both files in Mathematica. It is interesting that, when loading the re-saved table, Mathematica detects that table strings for trigger time do not have any significant digits after the dot and so returns integers:

In[1]:= t1=Import["browse_results.fits","TableData"][[1]];
triggerTime=t1[[109]][[6]];
SetAccuracy[triggerTime,15]
IntegerQ[triggerTime]
InexactNumberQ[triggerTime]
Accuracy[triggerTime]
Precision[triggerTime]
Out[3]= 55719.26604920899990
Out[4]= False
Out[5]= True
Out[6]= 11.2086
Out[7]= MachinePrecision

In[8]:= t2=Import["browse_results_astropy.fits","TableData"][[1]];
triggerTime=t2[[109]][[6]]
IntegerQ[triggerTime]
InexactNumberQ[triggerTime]
Accuracy[triggerTime]
Precision[triggerTime]
Out[9]= 55719
Out[10]= True
Out[11]= False
Out[12]= \[Infinity]
Out[13]= \[Infinity]

By the way, I would not assume Mathematica is necessarily doing the right thing here, and urge y'all to check with the FITS standard writers what the correct interpretation is!

I haven't read this entire thread but I think this might be the same issue I pointed out a few days ago here: https://groups.google.com/d/msg/astropy-dev/0Cc7IrBqUaA/G46aro2zAgAJ

Okay, having read all this I can confirm you're just talking about the bug I brought up in the above e-mail. I would suggest the one-character fix that I mentioned there :)

@embray The function _convert_ascii_format seemed to implement some kind of processing based on "cfitsio"-based values (but wasn't called and I wasn't able to determine the origin in cfitsio).

I wouldn't mind just setting the formats to float64 but then shouldn't we also remove this linked "else"-branch.

@mcara

I am surprised that both 1 and 1.287906432876532 have the same format "B" - unsigned byte according to http://docs.astropy.org/en/stable/io/fits/usage/table.html#column-creation. At the same time, 128790643287653.2 has format "K" (64-bit integer) while 128790643287653.2e10 crashes the code. Maybe everything works as intended but it is weird.

I think this is a bug. First it performs the step described in a comment as "First, if value is a string, try to convert to the appropriate scalar". Except it doesn't actually test if the value is a string, and the result is to immediately convert a float value to an int. Oops!

I think @MSeifert04 might be right though that it was only expecting strings to begin with. I think this was for guessing what format to use given a big list of string-formatted values. I think the purpose was for guessing what format to use when converting ASCII column data to a binary column without any format information given, which can happen in some corner cases. Yuck.

@MSeifert04 Yes, I think you're right. I forgot about that. I think part of the problem is that many FITS files are not specifying TFORM correctly when updating the data contained in them, but I see no reason not to just use 'f8' always. I see now that I took this logic from CFITSIO so I don't feel too bad; at the time it was written it probably made sense for some large files to prevent doubling of the data size if possible.

@MSeifert04 That said, if you (or someone) think they can fix the logic in _convert_to_ascii and make sure that it's used in the right place I'm fine with that too.

So what is the most sensible solution?

  • Fix the code to use the .recformat attribute computed by _convert_ascii_format (#5362).
  • Or change the ASCII2NUMPY values to always use float64, which is simpler and safer. The benefit of being able to read float32 columns for ASCII tables is probably small.

I also think there's little benefit to reading values into float32 just because they _can_ be represented. As soon as you start doing arithmetic with those values you're going to want double precision anyways.
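For instance (a small illustration added here, using the MJD value from this issue):

```python
import numpy as np

# the error baked in at read time propagates straight into any
# quantity derived from the float32 value
t32 = np.float32(55719.266049209)  # reads back as 55719.265625
seconds32 = (t32 - np.float32(55719.0)) * np.float32(86400.0)

t64 = 55719.266049209              # full double precision
seconds64 = (t64 - 55719.0) * 86400.0

print(seconds32)  # 22950.0
print(seconds64)  # ~22986.65 -- about 37 seconds apart
```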

:+1: to always using float64

To be fair, if a table is large enough that this starts to matter memory-wise, it should not be stored as an ASCII FITS table!

I do not have strong feelings either way; however, I want to mention a possible consistency issue (at least in the future). Let's say users use 25-digit numbers. To store numbers of this kind, we should return numpy.float128 (or float96 on Windows). So, for higher-precision numbers, different float types could be returned. It seems strange (=inconsistent) to return float64 instead of float32 for low-precision numbers if we wanted to be consistent in returning "appropriately sized" floats to hold the requested precision.

This is not a fundamental problem - just something to think about. I could not find anywhere in the FITS paper anything that would suggest how FITS readers should behave.

I am for either solution.

Updated #5362 to always use float64 as everyone seems to be happy with this solution :)
