Cakephp: An empty string should be NULL

Created on 31 Oct 2016  路  27Comments  路  Source: cakephp/cakephp

I faced this issue multiple times already. You can work around this issue but I think it should be a part of the ORM.

Table:

CREATE TABLE categories
(
    id INTEGER PRIMARY KEY NOT NULL,
    name VARCHAR(128) NOT NULL,
    meta_title VARCHAR(255)
);

ORM usage:

$entity = $this->Categories->newEntity($this->request->data);
$this->Categories->save($entity);

Current result:

| id | name | meta_title |
| -------- |:------------:| ---------:|
| 1 | Test | |

Should be:

| id | name | meta_title |
| -------- |:------------:| ---------:|
| 1 | Test | _NULL_ |

As I mentioned above you can work around this behavior with Behaviors/Callbacks but I think this should be something which is handled directly inside the ORM. Saving an empty string inside a NULLable field isn't the proper usage of NULL.

enhancement

Most helpful comment

If someone is interested in a work-around would be:

<?php

namespace App\Model\Behavior;

use Cake\Event\Event;
use Cake\ORM\Association;
use Cake\ORM\Behavior;
use Cake\ORM\Table;
use Cake\Utility\Hash;

class PreserveNullBehavior extends Behavior
{
    /**
     * @param \Cake\Event\Event $event
     * @param \ArrayObject $data
     * @param \ArrayObject $options
     * @return void
     */
    public function beforeMarshal(Event $event, \ArrayObject $data, \ArrayObject $options)
    {
        $data = $this->_process($data, $this->_table);
    }

    /**
     * @param \ArrayObject|array $data
     * @param \Cake\ORM\Table $table
     * @return \ArrayObject|array
     */
    protected function _process($data, Table $table)
    {
        $associations = [];
        foreach ($table->associations() as $association) { /* @var $association \Cake\ORM\Association */
            $associations[$association->property()] = $association->name();
        }

        foreach ($data as $key => $value) {
            if (array_key_exists($key, $associations)) {
                $data[$key] = $this->_process($data[$key], $table->association($associations[$key])->target());
                continue;
            }
            $nullable = Hash::get((array)$table->schema()->column($key), 'null');
            if ($nullable !== true) {
                continue;
            }
            if (is_string($value) && $value === '') {
                $data[$key] = null;
            }
        }

        return $data;
    }
}

All 27 comments

I had this issue recently and I think we need a treatment of NULL vs '' (null string).

I am also in favour of the common case of a (truncated) string being empty resulting into NULL by default. If that's not BC both options could be supported and ''-string could then - for now - be the default - at least for nullable fields if not for any (However: saving ''-string to a non-nullable field out of the box, without setting some handling config sounds weird to me, too)

This is especially important with foreign keys (which are supposed to be nullable if optional), but also with string fields and alike.
If an empty string is posted one would want to have null here saved in most cases.
Otherwise you risk data inconsistency and false results on find().

You either make it not nullable and can use != '' for checking (which I did in 2.x always), or you use null and IS/ISNOT null check.
But having to allow both means you will end up with both in your app (from different ways of input) - and as a consequence you have always conditions like where field != '' AND field IS NOT NULL etc.
if you ever forget to add both you miss out some records on finding, which introduces bugs.
I have seen so much of that error prone code everywhere in apps..

:+1: For a more core internal solution in the long run, but I am fine with any behavior opt-in for 3.3/3.4.

Maybe something along these lines: http://book.cakephp.org/3.0/en/orm/schema-system.html

public function initialize(array $config)
{
    parent::initialize($config);

    $this->table('users');
    $this->schema()->alterColumn('profile_id', array(
        'nullStrategy' => 'null'
    ));

    $this->table('users');
    $this->schema()->alterColumn('most_loved_color', array(
        'nullStrategy' => 'string'
    ));

    //...

  • By default, this should work for strings (e.g. without even setting the option explicitly)
  • This should be allowed to work for booleans: For instance we realise row flagging through a unique constraing (UNIQUE TINYINT(1) is_default NULL, for instance.
  • Not sure if it should be enable-able for number datatypes

If '' becomes NULL how does one represent ''? Having '' mean very different things based on the schema makes me concerned that determining how the ORM is going to behave is not as transparent as it could be. The current implementation has consistent behavior and if you really want/need null you can use entity getters, or omit the field from the form.

Coming from forms both mean the same thing usually. Or you couldn't separate the two semantically.
Once you leave out certain fields or get it from some other source than CRUD you immediately start to get the mentioned inconsistencies.

Thus, by saying default null you mean this should be stored as null. And the empty string in DB does simply not exist. You never want to check on both here in those mentioned cases above, or you get into maintenance hell and bug land.
I think at least that is the general opinion here from most. Since this is a BC break, I totally understand to make this more data consistent behavior an opt-in thing for 3.x so far.

Please note that '0' would of course not be transformed to null.
We must add tests and not use !empty(), better strict === '' checks to avoid false positive transformation.

I think they do mean the same thing in one context. Either '', null, false, 0 mean NULL or they mean, '' false or null.

@markstory Entity setters/getters are actually a good idea. Maybe we should simply spice up the docs for now?

Edit: However checking is_default_record IS NULL is something entirely different than checking against 0.

No, 0 or false are not part of this as I just added at the end.
This is important for either numeric or bool fields to be properly respected, or 0 to be not treated as empty value for string fields.
We are only talking about the scope of empty string here right now.

You are. But a (unique true) boolean can be very well represented by NULL (=false) and 1 (=true).
Anyway, that can both be mostly configured through entity setters. I'll setup a PR for that.

What about nullable INT and BOOLEAN fields? How would 0 and '' be handled there?

Isn't a nullable int with empty '' coming in already stored as null? Same for bool with incoming ''?
That would be the "intuitive" expectation.
Because in that case someone put the form field to "empty=>true" which means optional, which means nullable.

0 and 1 for bool should be treated as always, same for 0 for int.

It should be, but I wanted to make sure you didn't want to change that behavior as well.

馃憥 Empty string is quite different from null. If you don't want to save an empty string, set allowEmpty to false in your validation rules.

@josegonzalez You are missing the whole point. allowEmpty false would only create the mess and inconsistency we just talked about the last 10+ comments :)

Isn't a nullable int with empty '' coming in already stored as null? Same for bool?

If that's the case, that's great! (e.g. boolean '' save resulting in TINYINT NULL rather than TINYINT 0)

@ionas Yes it is and the behavior shouldn't be changed as @markstory already mentioned.

Actually, BoolType and empty string (nullable meaning of empty=>true coming from form) also behaves wrong as it looks like:

$this->assertFalse($this->type->marshal(''));

should be

$this->assertNull($this->type->marshal(''));

To change this inside 3.x would be out of scope for now probably.

IntegerType works as expected:

    $result = $this->type->marshal('', $this->driver);
    $this->assertNull($result);

If someone is interested in a work-around would be:

<?php

namespace App\Model\Behavior;

use Cake\Event\Event;
use Cake\ORM\Association;
use Cake\ORM\Behavior;
use Cake\ORM\Table;
use Cake\Utility\Hash;

class PreserveNullBehavior extends Behavior
{
    /**
     * @param \Cake\Event\Event $event
     * @param \ArrayObject $data
     * @param \ArrayObject $options
     * @return void
     */
    public function beforeMarshal(Event $event, \ArrayObject $data, \ArrayObject $options)
    {
        $data = $this->_process($data, $this->_table);
    }

    /**
     * @param \ArrayObject|array $data
     * @param \Cake\ORM\Table $table
     * @return \ArrayObject|array
     */
    protected function _process($data, Table $table)
    {
        $associations = [];
        foreach ($table->associations() as $association) { /* @var $association \Cake\ORM\Association */
            $associations[$association->property()] = $association->name();
        }

        foreach ($data as $key => $value) {
            if (array_key_exists($key, $associations)) {
                $data[$key] = $this->_process($data[$key], $table->association($associations[$key])->target());
                continue;
            }
            $nullable = Hash::get((array)$table->schema()->column($key), 'null');
            if ($nullable !== true) {
                continue;
            }
            if (is_string($value) && $value === '') {
                $data[$key] = null;
            }
        }

        return $data;
    }
}

Thanks @steefaan
Based on your input I made a test case here for all kind of types:
https://github.com/dereuromark/cakephp-shim/commit/e8794cf2340502129dace06d96593c23136f3bf2#diff-e4ba1b1125865b7f834ad66a4059c6f0R35
Looks like this behavior approach indeed fixes the issue for both bool and string type.

I even tried to add back the empty string as possible nullish value - when explicitly set as default value in the schema: https://github.com/dereuromark/cakephp-shim/pull/9
But maybe this is too much? I wonder why is not really working out? The schema default is still null here for some reason.

What about adding such a behavior to the core so that it can be opted in by app developers?
Other than that and alterColumn, it could also be configured through the datasource setting (e.g. app.php).

I also found a fixture issue here due to some testing with nullable columns: https://github.com/cakephp/cakephp/issues/9686

OK, so I released the Nullable behavior in Shim 1.4.0 as proof of concept and along with some tests that show the usually expected behavior.
With the 3.3.7 core release those tests will then all pass I assume.
People interested in getting their DB clean can already use that right away.

We should probably now find a way to get those fixes into the core somehow.

People doing

    'conditions' => [
        'secondary IS NOT' => null, // secondary is a MySQL TEXT NULL field
        'secondary IS NOT' => ''
    ]

which obviously need to be

    'conditions' => [
        'secondary IS NOT' => null, // secondary is a MySQL TEXT NULL field
        'secondary !=' => ''
    ]

only shows quite clearly the symtoms of these issues around two nullable types so to speak and the unclean defaults on ORM level here IMO.

That shouldnt happen in the first place for most default use cases.

  • Either you disable NULL and empty string is the empty value (requires ORM to force null into '')
  • you allow NULL and empty string becomes null (provided by shim behavior currently)

so you only have to change for a single nullish value in queries. Less false positives on result sets after updates from either form (often times empty string pollution) or internal stuff (often times null injection that can lead to the ORM exception in the first case).

The only 3rd use case is when you default to null but empty string is a non-empty value.
But in 99% of all use cases I have not seen this as expectation. I do however suspect this is the motivation behind the current behavior.
So is there a way we could easily toggle those somehow? To not prevent this, but make this the configurable non-default (which requires some additional data integrity checks then of course). Default should be single nullish value (based on schema NULL vs NOT NULL setting ideally).

'conditions' => [ 'secondary IS NOT' => null, // secondary is a MySQL TEXT NULL field 'secondary IS NOT' => '' ]

is just wrong... and hopefully fails with an error (if not it should).

The 2nd example is all right - where is the problem?

That shouldnt happen in the first place for most default use cases.

Either you disable NULL and empty string is the empty value (requires ORM to force null into '')
you allow NULL and empty string becomes null (provided by shim behavior currently)

I don't think there should be any automagic here. SQL NULL is not the same as SQL '' nor vice versa and we should not make any decisions. NULL in SQL - to me - feels more like void in PHP 7.1+

Actually, BoolType and empty string (nullable meaning of empty=>true coming from form) also behaves wrong

I agree, for BoolType empty string should be marshalled to null not false. We can change this in 4.0.

But for StringType auto converting empty string to null is not something that core should do. Those who need this behavior can easily handle it at app level with a behavior.

I dont agree on the StringType part as this is dangerous without having my Shim.Nullable in place.
This should be fixed as per schema in a future version IMO. Or we need to document the need in default cases more clearly. Right now, 90% of users aren't aware of the data inconsistency issues!

Hi, In Cake 3.8,

I have this In the model:

$validator
    ->requirePresence('field_guid', 'create')
    ->allowEmptyString('field_guid');

This in the DB:

`field_guid` char(36) NOT NULL,

When I do:

$user->field_guid = "";
$this->Users->save($user);

I get this (first query log, then DB error):

2020-06-20 14:36:28 Debug: duration=0 rows=0 INSERT INTO users (field_guid) VALUES (NULL)
2020-06-20 14:46:33 Error: [PDOException] SQLSTATE[23000]: Integrity constraint violation: 1048 Column 'field_guid' cannot be null (/app/vendor/cakephp/cakephp/src/Database/Statement/MysqlStatement.php:38)

Can I make Cake send an empty string?

Found my own answer: char(36) casts "" to NULL, varchar(36) leaves "" as "".

Was this page helpful?
0 / 5 - 0 ratings

Related issues

Code-Working picture Code-Working  路  3Comments

Erwane picture Erwane  路  3Comments

tersmitten picture tersmitten  路  3Comments

marpe picture marpe  路  3Comments

inoas picture inoas  路  3Comments