Silverstripe-framework: DataObject::get_one() return value (boolean vs. null)

Created on 3 May 2016  路  26Comments  路  Source: silverstripe/silverstripe-framework

Hi, this may be an interesting discussion... but since I upgraded to SS 3.3 from 3.1 (since there used to be a bug in the way type hinting was being handled, i.e. ignored) I've been encountering bugs that would otherwise have arisen before due to the return value from this method. Basically if it doesn't find anything it returns either a boolean (only ever false) or an instance of the particular DataObject child in question.

Primary line responsible: https://github.com/silverstripe/silverstripe-framework/blob/master/model/DataObject.php#L3204

DataObject::$_cache_get_one[$callerClass][$cacheKey] = false;

I'd propose that we do one of the following:

  1. Change the return (when not found), and thus the type, from false to null and document this as a nullable return value, e.g. @return DataObject|null or
  2. At least document the fact that the return value has two types and clarify in the documentation that bool is always false, i.e. @return DataObject|bool False if not found.

What do you think? Is my intuition off by assuming that returning null is better in this circumstances since then type-hints will not fail if they're nullable?

affectv3 affectv4 changmajor impacmedium typbug

Most helpful comment

@silverstripe/core-team can I get a steer on whether a fixing an API-change regression from 3.1 -> 3.3 should be fixed in the 3 branch, rather than master? Please vote:

  • 馃憤 for 3
  • 馃憥 for master
  • :-/ for something else

All 26 comments

I expect that the change from null to false for an unfound value is an oversight.

The main thing is to ensure that such misses are cached. I think that putting false into the cache makes it easier to distinguish from no cached data than null.

I'd leave line 3204 as-is, but have the return value of DataObject::get_one() be null if the cache value is false.

Alternatively I guess you could use array_key_exists() rather than isset() on line 3197, and set the cache to null.

At any rate, I would add a type-sensitive test for this behaviour, so we don't miss it again.

I鈥檇 be happy to see this changed to null in master, it鈥檇 match DataList::first() and co (as an aside: the docs for those methods should probably be changed to indicate that null will be returned if no results are found).

Personal preference is for array_key_exists() and storing null in the cache. IIRC it鈥檚 marginally slower, but better than maintaining extra logic to swap between null and false.

@kinglozzer Yeah, worrying about speed here should be least of our concerns, as that'd be a micro-optimization. On my slow VM, I just tested the difference between isset() and array_key_exists() on an array of 5k keys, iterated over them 100k times (once with each function) and while isset() was faster, both took less than 0.1s to execute (0.01s vs 0.06) so it really shouldn't matter here. What matters more is a consistent and usable API with readable and easily comprehensible code.

Also, on those other methods (on DataList for example) I wanted to point out two issues with the code there:

  1. Several methods have execution paths which have a return value but have execution paths which may not return anything at all (resulting in an implicit null, as PHP technically doesn't have a void type).
  2. None of those that I've found document this second possible return value.

i.e. if you're using an IDE like PhpStorm (like myself) and you use the hinting (or click through to see the method to verify it will always return some type or another) you may end up accidentally assuming it will always return a DataObject when in fact it could return null. Granted, you should use common sense as well, like ->first() may result in a non-result, but it'd make sense to me to clean up this API to have ALL queries/results return null (explicitly via return null not just implicitly) if there is no result when one object is expected (or empty lists when multiple are expected, which is already happening) or the actual object instance itself.

I think this is important since while _some_ of these methods (on DataList especially) are pretty obvious, like ->first(), ->last(), and ->removeByID(), there may still be others which return values several methods removed which aren't documented properly which could bubble up and result in some type-related errors (i.e. null pointer references, calling a method on a non-object).

I'll do a quick PR on master for this, by the way.

I鈥檇 be happy to see this changed to null in master.

If this is behaviour that changed 3.1->3.3, do we want to change it back as a bugfix?

None of those that I've found document this second possible return value.

Improving type-hinting in docblocks would be great. I believe that there are some tools that can correct this for you based on static analysis, which would be great.

But alas, I am only human.

Isn't array_key_exists slower than isset? We could do return $result ?: null to ensure all empty values are nulled.

Isn't array_key_exists slower than isset?

I think that the impact is trivial in this case.

The speed difference is completely trivial. Using isset() over array_key_exists() for speed is a complete over-optimization. Not to mention they do different things (one checks to see if a key exists, the other also checks to see if the value is null).

@patricknelson any chance of a PR? :-)

@silverstripe/core-team can I get a steer on whether a fixing an API-change regression from 3.1 -> 3.3 should be fixed in the 3 branch, rather than master? Please vote:

  • 馃憤 for 3
  • 馃憥 for master
  • :-/ for something else

Ugh... I'd hate to say it but I think this would belong in master :disappointed: As much as I like the idea, and as eager as I am to have it happen, it really isn't very semver-friendly to drop something like this into a minor release.

Also, if this went into master I'd almost want to go ahead and wade through every place a DataList returns one-or-none and define it as DataObject|null and be explicit with it. I think these return types are important and since technically this'd be a potentially breaking API change, it'd a good chance to try and "get it right".

OK, if you're doing the fix and submitting it to master, I'm not going to stop you. :-)

Also, if this went into master I'd almost want to go ahead and wade through every place a DataList returns one-or-none and define it as DataObject|null and be explicit with it.

Agreed - that's a good idea.

I'm just not a fan of returning Object|null - I'd rather return an object always and use ->exists() to check it. We do this with has_one relations now so I'd like to see it elsewhere.

On a side note, are we not going to drop DataObject::get() and the like from 4? Isn't it just legacy 2.x support?

I'm just not a fan of returning Object|null - I'd rather return an object always and use ->exists() to check it. We do this with has_one relations now so I'd like to see it elsewhere.

This is a massive change and I'd rather not block this small fix by a massive change. You raise a valid issue but let's raise that as a separate issues. My suggestion for addressing that was here: #5452

On a side note, are we not going to drop DataObject::get() and the like from 4? Isn't it just legacy 2.x support?

This issue is about get_one(), which still provides critical caching functionality. We shouldn't make the same mistake as 3.0/3.1 did and remove this without a clear replacement. Which, again, is a massive issue.

@dhensby I'm not yet familiar with this pattern in SilverStripe -- can you link me to a quick example of it in use? Nullable object return types aren't exactly uncommon per se. It's a lot less verbose, it's intuitive and works well as long as A.) It's documented properly (which it's not currently) and B.) Developer is a good developer and does what a good developer should do and reads documentation and pays attention to return types (or potentially checks ->exists()).

can I get a steer on whether a fixing an API-change regression from 3.1 -> 3.3 should be fixed in the 3 branch, rather than master?

Think this questions is missing some context - regression was either introduced from 3.1 -> 3.2 or 3.2 -> 3.3 surely? In case of former I'd say fix on master at this point but in case of latter I'd do a patch release on 3.3 I think. Don't think there's a perfect answer.

I'm just not a fan of returning Object|null - I'd rather return an object always and use ->exists() to check it. We do this with has_one relations now so I'd like to see it elsewhere.

The only reason we do this for has_one is that it allows multi-opject creation on the fly.

$record = new Record();
$record->Title = 'My Record';
$record->Bob()->Title = 'My Bob';

// This will write a Bob and Record object with the above fields, and safely link them up.
$record->write(false, false, false, true);

There is no reason that static methods (that are expected to return a single object) should ever return "empty" objects.

I spent a fair amount of time reading DataObject::get_one. That function deserves more attention as it is so useful for speed.

It was unclear why we have the = false in there at first. Having read some of the above I get it.

And to respond to @dhensby's point about returning an empty object, I say Nay!, because if you even just look at the method documentation:

Return the first item matching the given query.

Emphasis on "matching the given query." Everyone expects it to be a non-object if there is no _match_ to your _query_. So, either a matching object, or not object at all. That being said, null is probably a better practice than empty objects or using false (and not to mention more consistent with our existing architecture) if you don't use exceptions, that is, since in a way it's much more type-safe.

On this topic, our other elders have spoken:
Sam: https://github.com/silverstripe/silverstripe-framework/issues/5441#issuecomment-219566610
Damian: https://github.com/silverstripe/silverstripe-framework/issues/5441#issuecomment-223185612

What next?

Would like to move forward, but it's been over a year @sminnee. Per votes here https://github.com/silverstripe/silverstripe-framework/issues/5441#issuecomment-219558142 people preferred v3 and I was the one downvote opting for master. I still think master may be best for this.

I think the marching orders on this are clear, to paraphrase:

Update the return type to null (intead of false) if no object can be found and apply this change to the master branch.

Anyone volunteering to do this? I've got a huge TODO list and this isn't high on my priorities but clearly it should be done and there's plenty of demand 馃槄

I think some thought should be given about promoting this DataObject::get_one, because the "multiple" creation of the same Object could be costly (no hard data, more a guess).

I could even imagine that when the same objects are iterated / used in several places:

$dos = MyDataObject::get();
foreach($dos as $do) {
    //do something
}

it may be faster to do:

$dos = MyDataObject::get();
$array = $dos->column('ID');
foreach($array as $id) {
    $do = DataObject::get_one('MyDataObject', array('ID' => $id));
    // do something
}

@sunnysideup how you decide to use it is totally up to you. I think the point I'm making is that if there is no MyDataObject matching the provided $id it'd pass null. Obviously in this case that wouldn't happen.

@patricknelson you are right - sorry, that was off topic ...

Was this page helpful?
0 / 5 - 0 ratings