Jira issue originally created by user fchris82:
I created fixtures and some data was inserted many times without calling the Task entity PrePersist event listener.
I printed the used and generated hash and I saw a Proxies\*_CG_*\Asitly\ProjectManagementBundle\Entity\User hash equal a Task entity hash!
Comment created by @ocramius:
Please provide either a code example or a test case. As it stands, this issue is incomplete
Comment created by @beberlei:
Are you calling EntityManager#clear() inbetween? Because PHP reuses the hashes. The ORM accounts for this.
Comment created by fchris82:
Hi!
http://diginin.hu/splobject_hashbug.zip
Chris
Comment created by @beberlei:
This is not a reproduce case, i don't want to execute your whole project.
I want to know, what is the actual bug that you see? Can you just print a list of all the hashes? Because the hashes dont differ at the end, bu tjust somewhere in the middle.
Comment created by fchris82:
I attached a hashlogs.txt file. The last Task class hash is 0000000050ab4aba0000000058e1cb12 ( line 3 129 )
This is not unique, view the line 2 760 . The Task is not being saved and the program don't call the prePersist listener. The "UnitOfWork" believe the entity has been saved because the isset($this->entityStates[$oid]) is true. But it is an other entity.
Comment created by fchris82:
The EntityManager::clear() fix the problem, but this is not "good" and "beautiful" solution. Shows no sign of that conflicts were and this is causing the problem. I was looking for the problem 7 hours.
Comment created by @ocramius:
One possible issue here is that a listener registers an entity as managed while a proxy is being loaded.
The given data is still insufficient to actually verify the problem.
I can confirm having this issue. My case is working on quite big collection of entities, and code (simplified) looks like this:
//$idsArr is about 2k elements
foreach ($idsArr as $id) {
$entityA = new A($id);
$em->persist($entityA);
}
$em->flush();
//some code happening in between but everything is happening during single request
//$anotherIdsArr is about 2k items
foreach ($anotherIdsArr as $anotherId) {
$entityB = new B($someConstValue, $anotherId);
$em->persis($entityB);
}
$em->flush();
Class A definition:
/**
* @ORM\Entity()
* @ORM\Table(name="a")
*/
class A
{
/**
* @var integer
*
* @ORM\Id
* @ORM\Column(type="integer")
*/
protected $l1;
/**
* @param integer $l1
*/
public function __construct($l1)
{
$this->l1 = $l1;
}
}
Class B definition
/**
* @ORM\Entity()
* @ORM\Table(name="b")
*/
class B
{
/**
* @var integer
*
* @ORM\Id
* @ORM\Column(type="integer")
*/
protected $l1;
/**
* @var integer
*
* @ORM\Id
* @ORM\Column(type="integer")
*/
protected $l2;
/**
* @param integer $l1
* @param integer $l2
*/
public function __construct($l1, $l2)
{
$this->l1 = $l1;
$this->l2 = $l2;
}
}
While trying to persist B instance
$entityB = new B(12, 8995);
I get
$oid == '0000000013204a0200007fc5ffc03559'
and such element already exists in $this->entityIdentifiers so getEntityState method returns STATE_MANAGED so $entityB is not persisted.
$this->entityIdentifiers['0000000013204a0200007fc5ffc03559'] stores instance of A entity: $a = new A(6148)
I can also confirm that calling $em->clear(A::class) before second loop fixes the issue but it's rather a workaround.
I think the issue here is that according to doc spl_object_hash can return same hash if object is destroyed and this is what is happening in my case
entityIdentifiers keeps a reference of type A, therefore that A wasn't garbage-collected. This means that the issue comes from somewhere else.
As I already stated above, this issue needs a reproducible test case.
the thing is it looks like A was garbage collected - as I stated in comment there's quite a lot happening between these two loops so it might have been garbage collected
@Ocramius I believe I have a reproducible test case here ~~> https://github.com/thebuccaneersden/doctrine-bug
@thebuccaneersden no, that is not a test case, that is an application that I'd have to set up and run, and that's not going to happen, sorry :-\
Please refer to https://github.com/doctrine/doctrine2/tree/1c2b7c968561f2e319117d7860c88e8c0cad3c7a/tests/Doctrine/Tests/ORM/Functional/Ticket for a list of existing functional test cases.
I just had the same issue: the spl_object_hash of an entity was the same than a previous one (completely different class), the $uow->getEntityIdentifier() was returning a wrong ID (the one of the previous one) and therefore the BasicEntityPersister::prepareUpdateData method was failing.
Calling $em->clear() worked around the bug.
This test triggers the problem consistently for me.
namespace Doctrine\Tests\ORM\Functional;
use Doctrine\Tests\Models\CMS\CmsArticle;
use Doctrine\Tests\Models\CMS\CmsUser;
use Doctrine\Tests\OrmFunctionalTestCase;
class HashCollisionTest extends OrmFunctionalTestCase
{
protected function setUp()
{
$this->useModelSet('cms');
parent::setUp();
}
public function testHashCollision() {
$user = new CmsUser();
$user->username = "Test";
$user->name = "Test";
$this->_em->persist($user);
$this->_em->flush();
$articles = [];
for($i=0;$i<100;$i++) {
$article = new CmsArticle();
$article->topic = "Test";
$article->text = "Test";
$article->setAuthor($this->_em->merge($user));
$this->_em->persist($article);
$this->_em->flush();
$this->_em->clear();
$articles [] = $article;
}
$user = $this->_em->merge($user);
foreach($articles as $article) {
$article = $this->_em->merge($article);
$article->setAuthor($user);
}
unset($article);
gc_collect_cycles();
$keep = [];
for($x=0;$x<1000;$x++) {
$keep[] = $article = new CmsArticle();
$article->topic = "Test";
$article->text = "Test";
$article->setAuthor($this->_em->merge($user));
$this->_em->persist($article);
$this->_em->flush();
$this->assertNotNull($article->id, "Article wasn't persisted on iteration $x");
}
}
}
The problem is in the loop that merges the articles. Every article->user is its own seperate instance of a proxy object. The spl_object_hash of these objects get added to entityStates/entityIdentifiers in UnitOfWork->registerManaged(). However, no reference to the object is kept here. RegisterManaged calls UnitOfWork->addToIdentityMap(). The identityMap does keep a reference to the object, but only for the first object with the same id. This means that for most of the CmsUser Proxy objects, no reference is kept from within doctrine, so if I break the last reference in my code by calling $article->setAuthor($user), the proxy object may be garbage collected, but UnitOfWork->entityStates and UnitOfWork->entityIdentifiers still contain the spl_object_hash of the now non existent object.
If I now try to create a new CmsArticle, php may use the same memory location, generating the same object hash. In this case that causes doctrine to think it's already managing the entity, and not persisting it.
We're also reproducing the issue. From our point of view, it appears to be because entities are kept into $identityMap using a strategy that is different than others maps ($entityIdentifiers, $originalEntityData, etc. which are oid based). When an entity is created using the same ids as a previous one, the new entity is tracked by maps oid-based, but it's NOT tracked by the $identityMap (because the check is based on keys, not on oid) . If the previous one is garbage collected (this is eventually happening anytime), the $identityMap will be confused when it comes to find the associated entity, resulting into a problematic skip. We're about to provide a fix, if someone can validate the way we're understanding the issue, this could save us time before writing code. We're about to create something like an IdentityMapManager that will track ALL entities:
something like that in addToIdentityMap(): $this->identityMap[$className][$idHash][$oid] = $entity;
@sandvige We worked around it by implementing an EntityMangerDecorator that keeps a reference to every entity that passes through it; only releasing it when clear() is called. I'd guess that ends up being functionally the same as your suggestion.
Keeping a reference to all entities in this way is far from ideal but it's the best we could do.
An anonimized version of our implementation is:
class OurEntityManager extends EntityManagerDecorator {
/** @var array */
private $references = [];
public function __construct(EntityManagerInterface $wrapped) {
parent::__construct($wrapped);
}
public function clear($objectName = null) {
if($objectName === null) {
$this->references = [];
}
parent::clear($objectName);
}
public function detach($object){
unset($this->references[spl_object_hash($object)]);
parent::detach($object);
}
public function remove($object) {
$this->references[spl_object_hash($object)] = $object;
parent::remove($object);
}
public function persist($object) {
$this->references[spl_object_hash($object)] = $object;
parent::persist($object);
}
public function merge($object) {
$newObject = parent::merge($object);
$this->references[spl_object_hash($newObject)] = $newObject;
foreach($this->getClassProperties($newObject) as $property) {
$value = $property->getValue($newObject);
if(is_object($value) && !($value instanceof \DateTime)) {
$this->references[spl_object_hash($value)] = $value;
}
}
return $newObject;
}
private $classPropertiesCache = [];
private function getClassProperties($entity){
$class = get_class($entity);
unset($this->classPropertiesCache[$class]);
if(!isset($this->classPropertiesCache[$class])) {
$arr = [];
$this->__getClassproperties(new \ReflectionClass($entity), $arr);
$this->classPropertiesCache[$class] = $arr;
}
return $this->classPropertiesCache[$class];
}
private function __getClassproperties(\ReflectionClass $reflectionClass, &$results) {
$metaData = null;
try {
$metaData = $this->wrapped->getMetadataFactory()->getMetadataFor($reflectionClass->getName());
}catch(MappingException $mex) {
// Skip when a subclass is not a valid entity
}
if($metaData !== null) {
foreach($reflectionClass->getProperties() as $property) {
$propertyName = $property->getName();
if(isset($metaData->associationMappings[$propertyName])) {
if(isset($metaData->associationMappings[$propertyName])) {
$property->setAccessible(true);
$results[] = $property;
}
}
}
}
if($reflectionClass->getParentClass() !== false) {
$this->__getClassProperties($reflectionClass->getParentClass(), $results);
}
return $results;
}
}
If I understand the problem correctly, we are getting an entity garbage collected while we still have a reference to its OID: does the problem present itself also on master?
@petervf Thanks for the code! Indeed, this is also a valid approach. @Ocramius As long as we're only keeping one entity here: https://github.com/doctrine/doctrine2/blob/master/lib/Doctrine/ORM/UnitOfWork.php#L1355
By the way, the function returns false when this is happening, but every piece of code using this method is giving approximately 0 sh*ts :D
@Ocramius I added code using the provided unit tests (this is failing without a fix), and I also added a dumb fix showing unit tests going green. I will work on it in the next day, I just wanted to bump this issue as I'll provide a way to handle the issue by keeping an array that maps entities with oids (the missing part I was explaining a few comments above)
Really bad choice to use spl_object_hash(), since it is not reliable when dealing with too much garbage collection. It just doesnt provide an unique representation of all object, forcing us to have to call $entityManager->clear() all the time.
@carlospauluk there is no other concept of object identity in php-src. As long as objects are not GC'd, the identifiers are also not reused, hence the bug must be in a retained identifier in the UoW.
From the tests in https://github.com/doctrine/doctrine2/pull/7407, it seems like the bug is when merging entities, where an object is GC'd, while the identity map is not correctly cleaned.
If you want to help out, please try and debug the scenario to find the exact location of the retained identifier. Regardless if spl_object_hash() or spl_object_id() are used, that's our only way to retain object identity in a map in this programming language.
@Ocramius & @carlospauluk spl_object_hash() and spl_object_id() are ok if we manage to keep a reference to the associated objects (which is exactly what we do in the provided fix, with the oidMap attribute). If the bug is correctly understood, why don't we merge it? Because of the tests? We cannot provide an exact test, because the GC is running across ALL the running tests, making it impossible to predict its behavior. This is why the provided unit test is trying to "saturate" the memory to maximize our chance for the GC to run.
why don't we merge it? Because of the tests?
Precisely. No test => no regression verification => no way to check if we don't break this accidentally further down the line.
Let me be very clear that i do not merge nor approve patches that do not have automated verification. There are exclusions to this, but this isn't the case.
the GC is running across ALL the running tests
Then let's use @runInSeparateProcess, or a separate CLI script that is only there to verify this specific scenario, but it is required to run in CI (with low -dmemory_limit=).
No reliable automation => no merge.
If I'm getting it right, we can see this divergence when we look at the functions isInIdentityMap () and getEntityState () in the UnitOfWork.php.
I'm really beginning with the doctrine internals code, but I see that getEntityState () relies only on the $ oid = spl_object_hash ($ entity) to check the state of an entity, while
isInIdentityMap () relies on the spl_object_hash plus the "$ classMetadata-> rootEntityName".
This is exactly the problem that I'm getting here with my code.
Indeed @Ocramius, after I made a bunch of merges with detached entities, someway the GC seems to rip off the references just after the finish of my "mergeAll" function, and then when I try to persist a fresh new entity (a few functions later in the same thread), the getEntityState () gets "confused" and returns that this new entity is already persisted and does not schedule the INSERT.
Why do not use the double check (rootEntityName + spl_object_hash) to ensure that we're talking about the same object instead of only using the weak spl_object_hash () logic?
@Ocramius You're right, you're the man in charge of quality, you're doing it right. But I'm providing tests. What is wrong with the tests I'm providing except the resources its using? Finally, your last comment is just saying "everything is good". I was just arguing why we need to use 1000+ entities to reproduce, and why we cannot reproduce with only 2. What is wrong?
@Ocramius Oh! Ok, I will remove this fix from this PR, and add it to another with the "Missing Tests" flag. Thanks for your input
@Ocramius I updated the PR :)
I hope this will be fixed soon!
My temporary fix until this issue has been resolved.
interface UniqueIdentifier
{
/**
* Returns a unique entity id.
*
* @return string
*/
public function uniqueId(): string;
}
/**
* Calculates a unique id.
*
* @param mixed $entity
* @return string
*/
public static function calculateUniqueId($entity)
{
if ($entity instanceof UniqueIdentifier) {
return $entity->uniqueId();
}
// Risk of collisions
return \spl_object_hash($entity);
}
````
abstract class Entity implements UniqueIdentifier {
/**
* @var string
*/
private $uniqueId;
/**
* @inheritDoc
*/
public function uniqueId(): string
{
if ($this->uniqueId === null) {
$this->uniqueId = uniqid();
}
return $this->uniqueId;
}
}
Got bit by this today, couldn't figure out why an entity wasn't persisting/flushing, hash collision...
As a solution a weakmap can be used (php 8).
See https://wiki.php.net/rfc/weak_maps for more details and how it relates to spl_object_hash.
Most helpful comment
I hope this will be fixed soon!