Phpspreadsheet: Cell::getValue() does not return raw value with numbers

Created on 29 Jan 2018  Â·  9Comments  Â·  Source: PHPOffice/PhpSpreadsheet

This is:

- [x] a bug report
- [ ] a feature request
- [ ] **not** a usage question (ask them on https://stackoverflow.com/questions/tagged/phpspreadsheet or https://gitter.im/PHPOffice/PhpSpreadsheet)

What is the expected behavior?

In any case, the method Cell::getValue() should not cast the value of the cell but should return only string value as stored in the file.

The int/float value should be given only in the method Cell::getFormattedValue().

What is the current behavior?

Cell::getValue() casts the value of cell as integer/float even if the cell contents digits that not represent a number.

That causes also a lost of precision with big int.

What are the steps to reproduce?

<?php
require __DIR__ . '/vendor/autoload.php';

$value = '10210303648605466'; //this is not a number but a string

$filePath = __DIR__ . '/example.csv';
file_put_contents($filePath, $value);

$spreadSheet = \PhpOffice\PhpSpreadsheet\IOFactory::load($filePath);

var_dump($spreadSheet->getActiveSheet()->getCell('A1')->getValue() === $value); //should be bool(true) but display bool(false)

Which versions of PhpSpreadsheet and PHP are affected?

My env:

- PHP:7.2.1-1+ubuntu14.04.1+deb.sury.org+1PHP
- phpoffice/phpspreadsheet:1.0.0

Most helpful comment

Ok thanks for this point of the doc.

So, there is a mistake in the doc: http://phpspreadsheet.readthedocs.io/en/develop/topics/accessing-cells/#retrieving-a-cell-value-by-coordinate indicates that "getValue" returns a "raw" value ("This will retrieve the raw, unformatted value contained in the cell."). But it is not the case because "getValue" returns a casted value as a number if digits are detected (raw value from a file must always be a string).

Maybe it should be interesting in the future to implement a complementary method "Cell::getRawValue()" that always returns the string value as found in the file.

All 9 comments

The current behavior is meant for the most common use-cases. If you need advanced needs, you should rely on a custom value binder as described in the docs: http://phpspreadsheet.readthedocs.io/en/develop/topics/reading-files/#a-brief-word-about-the-advanced-value-binder

Ok thanks for this point of the doc.

So, there is a mistake in the doc: http://phpspreadsheet.readthedocs.io/en/develop/topics/accessing-cells/#retrieving-a-cell-value-by-coordinate indicates that "getValue" returns a "raw" value ("This will retrieve the raw, unformatted value contained in the cell."). But it is not the case because "getValue" returns a casted value as a number if digits are detected (raw value from a file must always be a string).

Maybe it should be interesting in the future to implement a complementary method "Cell::getRawValue()" that always returns the string value as found in the file.

"raw" might be up to debate, but in this case it is "raw" as opposed to "calculated" as seen in getCalculatedValue(). Please free to suggest a PR if you see a way to improve it.

I found this issue because I was constantly reading big integers in a file that should not contain any numbers at all. Turned out that product numbers like "530E08" are recognized as scientific numbers and converted into 53000000000.

In my eyes this is really unexpected behaviour, if one only sticks to the documentation. (And besides I doubt that this should be converted at all. Nobody would ever write 530E08 but rather 5.3E10.

Perhaps a very simple handler (interpreting everything as a string) could be added and optionally be used at file opening?

The Problem CANNOT be fixed with a custom value binder as it already happens in the XLSX reader!

\PhpOffice\PhpSpreadsheet\Readermd5-7ab919ac3e70e680f2de3b07c6ba2f7flsx:980

// read empty cells or the cells are not empty
if ($this->readEmptyCells || ($value !== null && $value !== '')) {
    // Check for numeric values
    if (is_numeric($value) && $cellDataType != 's') {
        if ($value == (int) $value) {
             $value = (int) $value;
        } elseif ($value == (float) $value) {
             $value = (float) $value;
        } elseif ($value == (float) $value) {
             $value = (float) $value;
        }
}

… and if you want to create a custom version of the XLSX reader YOU HAVE TO COPY THE WHOLE CLASS because WAY TOO MUCH used methods are … PRIVATE!!! 😂

But yeah - it works!

Actually this is not a problem with XLSX, because it comes with a type definition - if I mark a cell as "text" in Excel it is read as text here => fine.

There is a problem with CSV files which by definition cannot contain any meta information. If I write "530E08" into a CSV file nobody an know whether it should be a string or a number.

Problem is: If the content is treated as a number, there is no way for the user of the package to revert to the string representation again (as this is ambiguous, "5300E7", "530E08", "53E09" etc). If anything (but perhaps [1-9][0-9]+) would be treated as a string, the user could still decide what to do with the result.

Therefore I consider this to be a bug in the CSV reader.

@SlowFox71 your issue is described in detail in #1132

@riconeitzel is correct - the error already happens in the reader. I do not know how to fix this with the current phpoffice. It CHANGES DATA upon reading. This completly destroys any use i had for this tool. Why is there no way to get the actual raw data?

Is there a way to ensure that no data is lost? Or can numbers with a lot of decimal places be cast to a sufficient data type during reading?

Was this page helpful?
0 / 5 - 0 ratings

Related issues

huichen2017 picture huichen2017  Â·  4Comments

alexbog8 picture alexbog8  Â·  4Comments

PowerKiKi picture PowerKiKi  Â·  5Comments

PowerKiKi picture PowerKiKi  Â·  4Comments

cheinle picture cheinle  Â·  3Comments