Hangfire: Large amount of data in State table causes connection timeout and application lock exceptions

Created on 2 Dec 2015 · 5 Comments · Source: HangfireIO/Hangfire

I have spent the whole day today trying to find out why one of our environments (UAT) was slow to the point of being unusable.
I did notice that our logs were full of instances of both the exceptions mentioned in #471 and #273.
I had noticed the connection timeout in another environment (production) as well. However, that environment was still usable and did not seem to be affected beyond having a large number of entries in the logs.
After a long and winding road of discussions with our host, looking at the code, and doubting my own existence, I ended up at the following:
I looked at the State table and noticed that it had reached the 2GB mark!
I decided to truncate the table.
To my surprise (and to my disappointment at the time I had wasted before trying the truncate statement), the environment was rejuvenated and everything started working normally again!
I'm not quite sure what to suggest as I'm a bit brain dead after a day of trying to resolve this issue, but I'm happy to wait for your questions @odinserj and start from there :smile:
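For anyone wanting to check whether they are in the same situation, the table size can be inspected with a query along these lines. This is only a sketch: it assumes Hangfire's default SQL Server schema name (`HangFire`), which may differ if the storage was configured with a custom schema.

```sql
-- Assumption: Hangfire uses its default schema name, HangFire.
-- Reports row count plus reserved, data and index space for the State table.
EXEC sp_spaceused N'HangFire.State';
```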

sql-server bug

All 5 comments

+1, having the same issue. Queues stopped processing, and on my local machine connected to the production database I'm getting the same issue as the original poster. I can't remove records or truncate the table altogether, because it contains data we need.

this saved me.... deleted all data from the State table and things went back to normal. I'm so pissed at Hangfire right now. Wasted a lot of time trying to track this issue down. I'm using Azure SQL, fyi. Our queues got backed up, everything was sluggish, kept getting "cannot release application lock" errors (even though we upgraded to 1.5.3), and Hangfire timeouts trying to create jobs. Our SQL server was pegged at 100% utilization. As soon as we deleted data from the State table, everything went back to normal.

Any news on this?

The huge number of records in the State table is related to a problem with unhandled OperationCanceledException and its derived classes, which caused background jobs to be re-queued again and again, leading to duplicate _Processing_ state entries. This issue was fixed in 1.5.0 and 1.5.9.
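To see whether a database is affected by this kind of repeated re-queueing, a query like the following may help. It is a rough sketch that assumes the default `HangFire` schema and the `JobId` and `Name` columns of the `State` table; verify both against your own installation before relying on it.

```sql
-- Assumptions: default HangFire schema; State table has JobId and Name columns.
-- Lists the jobs with the most Processing state entries; a job that was
-- re-queued over and over should stand out at the top of the result.
SELECT TOP (20)
    JobId,
    COUNT(*) AS ProcessingEntries
FROM HangFire.State
WHERE Name = N'Processing'
GROUP BY JobId
ORDER BY ProcessingEntries DESC;
```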

The problem with timeouts is related to #628, where we found that ExpirationManager's queries use INDEX SCAN operators that block records in the State and JobParameter tables. This issue will be fixed in the upcoming 1.6.2.

Problems with application locks were fixed in 1.5.3 and 1.5.6.

I've added a number of further optimizations for SQL Azure, but most of the problems with timeouts may be solved by turning on the AUTO_UPDATE_STATISTICS_ASYNC option for your database, because the majority of them are related to blocking caused by statistics updates.

ALTER DATABASE YourDBName SET AUTO_UPDATE_STATISTICS_ASYNC ON
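
Whether the option took effect can be checked afterwards with something like the query below. `YourDBName` is the same placeholder as in the statement above. Note that asynchronous statistics updates only apply when AUTO_UPDATE_STATISTICS itself is ON, so the sketch returns both flags.

```sql
-- YourDBName is a placeholder; substitute the actual database name.
-- Both columns should show 1 for asynchronous statistics updates to be active.
SELECT
    name,
    is_auto_update_stats_on,
    is_auto_update_stats_async_on
FROM sys.databases
WHERE name = N'YourDBName';
```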