Ardupilot: Watchdog reset - Cube Black using Copter-3.6.4 and 3.6.10

Created on 4 Nov 2019 · 17 Comments · Source: ArduPilot/ardupilot

Bug report

Issue details
We have now had two large hexes (clients' aircraft) come down due to watchdog resets. I believe the issues are the same as https://github.com/ArduPilot/ardupilot/issues/11296, but in our case they occurred on PH2.1 Black Cubes on hexes.

The first was a seasoned aircraft which recently had its engines upgraded, running AC3.6.4 (other than increasing the motors by 10KV there were no other changes to the aircraft). The aircraft completed a number of flights (20-30) before the Cube restarted in flight, and it fell 20m into a car park. An investigation was conducted and no hardware issues were found. Attached are both the flight log and the Tlog (Flight 1). In the Tlog the flight controller's messages can be seen as the FC restarted. The Bin file shows the flight up to the failure (no logging while disarmed).

The second incident involved a brand new hex with about 10 hours on it. The aircraft was the same model as the previous one but with a few upgrades (F9P GPS and new power system), running firmware AC3.6.10. The flight controller reset with the aircraft at 160m, and it fell to the ground. In this case the Tlog (2019-11-01 07-22-36) reported the watchdog as shown in the picture below. Logging while disarmed was turned on; Bin logs 37 and 38 show the lead-up and the ride down. Note that Baro2 is not present in log 38, whereas it was in log 37; the same was noted in the issues with Solos.

image

It seems like the I2C issue reported by the modded Solo owners; however, these were large, expensive craft using trusted hardware. Both aircraft showed no issues leading up to the failure, and both incidents resulted in total loss of the aircraft.

Version
AC3.6.4
AC3.6.10

Platform
[ ] All
[ ] AntennaTracker
[X] Copter
[ ] Plane
[ ] Rover
[ ] Submarine

Airframe type
Hex

Hardware type
Black Cube standard carrier

Logs
Logs - Watchdog.zip

DevCallTopic

All 17 comments

Ok, thanks for the report. This is the first report I have seen of a CubeBlack being affected but an I2C storm is theoretically possible on any flight controller. I strongly recommend upgrading to 3.6.11 which has a fix.

We so far haven't seen any cases of this from 3.6.11 users.

I have grounded the aircraft of this particular model that are in service for the time being, until I test 3.6.11. I also haven't seen the storm issue on the Black Cube before. If you need any other info, let me know.

@ScottySpooner, we discussed this on the dev call yesterday and the consensus was that it's most likely the I2C storm issue and that upgrading to 3.6.11 is the best first step. We haven't seen any reports like this on 3.6.11 so we think there is a very good chance that it will protect against the problem.

We will do some measurements on the I2C bus to confirm that. Do you have any advice to help measure or reproduce this I2C storm? Is there a mode in which the I2C bus is more heavily used?

@tilaktilak,

It's very hard to reproduce but we've sometimes had luck using QMC compasses with long I2C lines. Also putting a source of interference near the line helps. I think @tridge used a power drill once but I imagine a powerful telemetry radio (like an RFD900 or an analog video system) might also work.

Could be related to #12862.
First tests with 2m long I2C lines do not reproduce the bug (the signal does not look noisy on the scope).
Shortly before the incident we observed a "Baro no Healthy" message; not sure that's enough to conclude that it's an I2C storm?

Just a question: In your report you say that you were using AC3.6.4 for the first crash, but the attached logs of that flight (Flight1) show AC3.5.3 with NuttX OS.
If that is true, the issue might not only be in ChibiOS or it might be unrelated to the ChibiOS I2C storm problem and be something different entirely.

Ahh, that is an interesting development. I believe the two crashes are unrelated. The aircraft from the first crash was meant to be running 3.6.6 according to the build sheet. Running NuttX would not have been compatible with the hardware setup.

Logs of what appears to be exactly the same "I2C storm" reboot, on 3.6.10. This is a 50kg octocopter with as much redundancy as physically/electrically plausible.

https://drive.google.com/open?id=1xiVtLoIUfmZNYh_GbWcq_5abe2ml9aGq

I'm curious as to why all these I2C storm events seem to feature Baro 2 missing on watchdog reboot.

There are definitely important fixes for the I2C storm issue in 3.6.11 and we highly recommend upgrading (to 3.6.11). We've also found another (rare) I2C failure issue that can lead to a board locking which we have fixed in Copter-3.6.12-rc1. 3.6.12 will become the default version either tomorrow (Friday) or Monday.

Really sorry about these crashes, we're fixing them as quickly as we possibly can after finding them. We think they are quite rare but I know that's no consolation if it's your vehicle that hit the problem and came down.

Are there any telltale signs of the I2C storm visible in the logs, to differentiate it from any other reason for the flight controller to crash? Without a way to replicate the issue and then prove that 3.6.11 has solved it, I am in a best-guess scenario as to both the cause of the incident and the fix itself. This was a customer's aircraft, and it will be a reportable event to the regulator.

From the result of this investigation: with the previous version of Copter, there is no way to identify it from the logs. We can only conclude that the drone was rebooted by the watchdog.
@tridge can you confirm that? Also, is there any more logging added to 3.6.12 as well?

The 3.6.xx series has very little information logged after a watchdog. We greatly expanded the logging of information in 4.0.x. I'll have a look at the log and see if I can spot anything useful.
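
Since the logs cannot identify the cause, the most a Tlog can do is show when the reboot happened. A minimal Python sketch of that idea follows. It assumes you have already extracted (ground station time, `time_boot_ms`) pairs from MAVLink messages that carry the `time_boot_ms` field (for example ATTITUDE). The function name `find_reboots` and the `min_drop_ms` tolerance are hypothetical, chosen for illustration only. It does not distinguish an I2C storm from any other cause of a reset.

```python
from typing import Iterable, List, Tuple


def find_reboots(
    samples: Iterable[Tuple[float, int]],
    min_drop_ms: int = 1000,
) -> List[Tuple[float, int, int]]:
    """Locate flight-controller reboots in a telemetry stream.

    samples: (ground_station_time_s, time_boot_ms) pairs in arrival order.
    min_drop_ms: hypothetical tolerance; backward steps smaller than this
        are treated as packet reordering rather than a reboot.

    Returns one (ground_station_time_s, uptime_before_ms, uptime_after_ms)
    tuple for every point where the reported uptime jumps backwards.
    """
    reboots = []
    prev_boot_ms = None
    for gcs_time, boot_ms in samples:
        # Uptime only ever increases on a running board, so a large drop
        # means the board restarted between these two samples.
        if prev_boot_ms is not None and prev_boot_ms - boot_ms >= min_drop_ms:
            reboots.append((gcs_time, prev_boot_ms, boot_ms))
        prev_boot_ms = boot_ms
    return reboots
```

The ground station time of each hit can then be lined up with the status text messages in the same Tlog to see what the flight controller reported as it came back up.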

@tridge , @rmackay9 Is there already an idea about when Copter-4.0 will be released?

@RoelHelsen, Copter-3.6.12 is out and I think it resolves all the known cases of I2C lock-ups.

We discussed the timing for rolling out 4.0 as well, and our plan is to release 4.0.0-rc3 for beta testing on Monday (Dec 16th), test it for a week and, if no critical issues are discovered, release it on Dec 23rd. This assumes we don't find any critical issues. If we find some, then it will be another week, etc.

4.0 is now out.

We believe it (and the 3.6.12 release) address these issues.

Should we close this?

Yes, I agree we should close this because we've addressed all the known causes and implemented (in Copter-4.0.0) the general fix as well. We can re-open or open a new issue if we get new reports.
