I've recently been seeing drops in temperature during printing. My hotend temperature will be set at say 200 and will be being maintained at that +/- a few degrees. However from time to time I will see a reading of around 176.
I've spent some time digging into this by adding some debug code into the temperature.cpp file. What I see is that normally over a period of a second or so the min and max raw ADC readings for the temperature sensor are 86/87 however from time to time I get a range of 86/937. Digging into this further I added code to record the min and max time from the call to start the ADC to reading it. Normally these values are around 944/1140 uS. However when I get the above "odd" reading I get a minimum period of only 28uS. My understanding is that the Atmega 2560 ADC conversion time minimum is around 100uS so I suspect this is what is causing the glitch. My guess is that at times other interrupt routines are delaying the start of the ADC conversion resulting in the short conversion time.
This will almost certainly be Arduino specific and probably depends upon what options you have enabled (and hence how long the other interrupt calls run for).
I'm going to add code to ensure that there is always a minimum delta of 100uS between successive calls to the temperature interrupt routine. But is there a better way to fix this?
Run a print job and closely monitor the hotend temperature reports.
Expected behavior:
Temperature should remain constant within a few degrees of the set point
Actual behavior:
Temperature occasionally drops by 25 or so degrees.
Configuration files are here:
https://github.com/gloomyandy/Marlin/tree/bugfix-2.0.x/Marlin
There aren't so many things what could delay/interrupt the temperature interrupt. About the only one coming to my mind is the stepper-IRQ Try to move slower in z or go down with steps/mm.
A bit more challenging would be to investigate/change the temp-IRQ. Are the new endstop tests inserted before or after the ADC handling. Moving ADC handling before everything else would give the stepper-IRQ less chance to interrupt the temp-IRQ before the ADCs are handled - should result in less jitter - on cost of more jitter in the soft-PWM, endstops, ... .
All ADC peripherals have CONVERSION_DONE bit. This bit should ALWAYS be polled to ensure the read value is the result of a full conversion.
Yes. using the timer ISR as "delay" between phases has the risk of something delaying the execution of one temperature ISR (for example, the temperature ISR is preempted by the stepper ISR and the UART isr) and then the delay btween starting the ADC conversion and the read of the result will not be respected. There is an easy fix: As the ADC is being driven by a state machine, if the ADC result is not ready, just instead of reading the value, keep the state machine at the read state, and let the next ISR read the value
I added a simple check on the time since the last call, if it was less than 100uS then I simply did a return right at the start of the isr and in effect waited for the next isr tick. What was surprising was that I think this is happening far more often than I originally thought. Before the above change I was seeing a fluctuation in hotend temperature during printing of around 5 degrees or so on a reasonably frequent basis. After adding the above change not only do I not see the occasional big dip, but I now see hardly any fluctuation in temperature at all, so I suspect that "missing" the conversion time must be happening reasonably often though probably resulting in smaller dips in temperature readings. These dips are probably triggering an incorrect increase in the hotend temperature which is not actually required.
@ejtagle & @thinkyhead is the ADC complete bit currently exposed via the HAL in some way? If so I can try testing for this rather than my rather crude time based approach. If the complete bit is not exposed then I guess we will either need to expose it, or perhaps define the "ADC" macro in some way such that it can return a not ready indication?
There is currently no HAL exposed method for retrieving the conversion state of the ADC peripherals, but it can easily be added, The MACROS are HAL_READ_ADC and HAL_START_ADC in HAL.h, HAL_READY_ADC (or something) would work.
edit: Although I did add the HAL_adc_finished() function to the LPC176x platform it appears, so this must have crossed my mind at the time.
Hmm, I was reasonably happy to produce a PR for a crude time based fix for this, but I'm not sure I fancy creating one that requires adding new functionality to all of the HALs. What is the process for such a change? It seems unlikely that anyone could easily test them all?
I'm also a little concerned about some of the other code in this isr, will all of the other functionality (like the soft PWM code) be OK when you can have a call to call delta that is as low as 28uS? Does anything that hangs off that code require any sort of minimum pulse width?
I suspect that a lot of people will hit this issue as I really don't have anything very extreme in my configuration or machine setup. I'm just using junction deviation, linear advance and UBL with pretty standard step rates (only 16 microsteps, standard belts for X/Y and a standard screw thread on Z), operated at pretty modest speeds. I've had a few thermal runaway halts that I'm pretty sure have been caused by this.
@gloomyandy: Just add for each hal a dummy function HAL_IS_ADC_CONVERSION_DONE that returns true, and properly implement it for AVR. I can implement the DUE version. Do not skip the whole timer interrupt, just the temperature measurement part :+1:
i think i'm hit by this issue too


what i tried to "fix" it:
PID tune
New wires for thermistor all the way from Ramps to Thermistor
New thermistor cartridge from [e3d-online.com]
New heater cartridge also from [e3d-online.com]
rechecked that all wires in screw therminal are secured in place and no wires are sticking out.
rechecked that all screws in screw terminals are tight
tried to switch arround the thermistor to plug with no noise on
tried new mega board
tried another PSU
tried with a noname 100K thermistor directly to ramps board, wires are just long enough
checked that thermistor pin headers make good contact by bending them just slightly apart
tried to drive extruder heater from bed output to rule out a bad FET
@gloomyandy
to test this am i correct that the only 2 files i have to overwrite are
Marlin/src/HAL/HAL_AVR/HAL.h
Marlin/src/module/temperature.cpp
?
Yes that should do it!
Assuming you are using AVR of course!
oki... you will know shortly if it did the trick for me
i just need to move wires for nozzle and bed back to where they belong
and btw the 2 graphs i put in are not the worst ones :-P
i'm.... RAmps+Mega
it helped a LOT for the extruder heater, but printing in PLA i have the bed off and now that gets crazy amount of positive spikes, and extruder 2 only have a temp sensor connected for now and that also gets spikes even thou its just hanging outside far away

both blue and orange should be flat at this point, blue will rise slowly in the start as the hot plastic is put down
Perhaps you should try without my changes but exactly the same hardware setup and see if you still get the spikes on the other sensors, they may not be anything to do with the firmware changes.
without your changes but the same setup i dont get spikes on the bed, look at my graph just a few post up
what is your setup?
would it make sense if i had the excat same setup but just did the same print without your changes?
But that graph was produced before you moved around various wires/connections (you said above that you needed to change some things back before you tested the updated firmware), perhaps in making these changes you are now picking up noise. Best to only change one thing at a time if possible, so worth doing the test with and without the firmware changes with exactly the same hardware setup.
yep i had the extruder heater on the bed output to test if the FET's was going bad and making noise
that was all
but yes... i will let this print run to end, then undo ONLY your changes and do it again with the same gcode

same gcode and no wires touched - without your changes
with your changes

and still with no changes to hardware but i printed a fitment/tolrance test and turned the bed on

what is strange is that the orange line has spikes and its just a dummy sensor i have to get ambient temp
Thanks for trying that I've just fixed a couple of stupid copy and paste errors one of which may have been causing the problems you are seeing. Could you give the updated version a try?
yep sure
i'm trying to get my tolerances right (square box with circle hole and a matching disc) so i will be doing a lot of those small prints anyway
still about the same

I'm surprised at that can you check to make sure that you have the following lines
if (HAL_IS_CONVERSION_DONE()) \
var += HAL_READ_ADC; \
else \
adc_sensor_state = (ADCSensorState)(int(adc_sensor_state) - 1);
and in particular that the last line is not
adc_sensor_state = (ADCSensorState)(int(SensorsReady) - 1);
Thanks
/*
* This gives each ADC 0.9765ms to charge up.
*/
#define ADD_ADC_SAMPLE_IF_READY(var) \
if (HAL_IS_CONVERSION_DONE()) \
var += HAL_READ_ADC; \
else \
adc_sensor_state = (ADCSensorState)(int(adc_sensor_state) - 1);
switch (adc_sensor_state) {
case SensorsReady: {
i copied the whole file so everything is ok...
what i do on github is viewing the whole file in raw and i just select everything and copy... and in notepad++ i select everything and hit insert
but yeah its a strange one
whole file as is here: https://pastebin.com/0XdCdaWg
Hmm that is strange. I'll see if I can reproduce the temp glitches on the bed. Could you post the section from your configuration file that has the various temp sensor configurations...
the whole configuration.h: https://pastebin.com/SXvUbXLW
partner is home and asked if i could go help with shopping bags hence the link to whole file
No problem!
I've just fixed a couple of other mistakes (that I missed in the last change), I doubt they will make any difference. But it is probably worth updating. I'm going to try and set things up to match how you have things and see if I get the same glitches.
might be worth noting that i have a dummy extruder in my setup to allow me to see ambient/chamber temp directly in octoprint
the chamber temp feature only sends the temp on serial and i cant see that in octoprint, only if i look at the terminal..
chamber is the orange line
what i did was adding a stepper with free pins and then i changed settings to say i have 2 extruders
so yeah i used free pins to create another extruder stepper and there was also free pins for a 2nd heater and fan.... my goal was to be able to use that and have an electrical heater at some point for ABS prints, but that would prob not work as it would be to slow in heating up. so for now i just use it to check temp
for the fun of it my configuration_adv.h: https://pastebin.com/SL3nMPBi
Thanks! I don't think I can recreate your setup as I have my extruder cooling fan on the mosfet that would be used for E2. But I think I've been able to reproduce the issue with just a single extruder. Will let you know!
I dont have a 2nd extruder heater or fan connected, only the temp sensor for a 2nd extruder,
the rest are just dummies created with free pin numbers
:-D
OK I've just pushed a couple of changes that I think may help things. You will need temperature.cpp and temperature.h. If you get chance could you give it a try?
I will do the first thing in the morning while its still reasonble cold
all this heat makes my brain go very slow :-(
i had a cold drink and did a test

tomorrow i will go back to using my 48A ATX psu but i expect no changes other than psu config :-D
someone on reprap.org suggested that temp readings should be an avg of the last 10 readings
i dont know if that would have any negative effects?
but lets say that we have 60 readings every sec (maybe overkill) so instead of reporting 60 readings we report the avg of every 10 and only report back 6 readings... just an example as i dont know how many readings are actually taken
The reading you see is already an average of 16 readings a reading is taken every 10mS and a new averaged sample is produced every 160mS, so you do indeed get approx 6 readings per second. There may be additional processing on this reading before you see it, but I've not looked at that.
my thought was just that do we need that much?
Well I guess you need a reasonably high frequency for the PID controller to be able to maintain the temperature. The averaging will help with any noise issues. Perhaps you could sample less often, but I suspect the cpu saving would not be that great.
I think I'm seeing the same problem. Most of the time the hotend temp is very stable, but then I get times where the temperature becomes very erratic and sometimes causes a runaway thermal reset. Like @boelle I've replaced the entire heater block with the latest from e3d which has the thermistor cartridge with the nice plastic connector that hooks to a single run of wire to the RAMPS board. I'm using a version of bugfix-2.0.x by tcm0116 that fixes another bug involving retraction. I'll probably try this fix and see if I see things being more stable.
@thinkyhead, has this fix been merged into the main Marlin bugfix-2.0.x branch?
I assume with all this gabbing that someone actually did something. The last comment was July 29. If it hasn't been addressed by now then someone who has a clue needs to get on it.
@gloomyandy, @boelle can you confirm that this fix got into the main bugfix branch? Today, I had the temp jump to about 450C in a couple seconds. I was really lucky it came down before the thermal protection kicked in.
Yes the code has been in for some time, I don't think the problem you are seeing is this one. I think you need to open a new issue and provide a lot more details.
It could still be the same problem. I've been on a branch from around the time you started this thread due to another issue that was fixed by that fork. Once I verify that other fix has also been merged, then I can go back on the main bugfix-2.0.x branch. But either way, if the problem still exists after I go back on the main branch, I'll put in another ticket.
@brucehvn
at least provide a temp graph like i did, it will tell a lot more than words :-D
Ok, luckily I was able to just grab one when I noticed the temp was fluctuating again. I'm using Repetier Server and it doesn't show a large time frame in the graph. This is a fairly small fluctuation compared to some I get.

Another example during the same print after about 20 minutes of pretty stable temps.

this is a little more than a "glitch" or spike
sure its not a thermistor that has one leg in the coffin?
@brucehvn
What processor do you use?
The above 'patch' does only work for the AVR-processors.
If i'v seen it right the other platforms all do use analogRead() storing in a variable to get the value later and fake the conversion ready test. Because analogRead() is busy waiting for the result this should work nicely, but is poisoning the timing.
Got surprising results.
Testet the duration of analogRead()
// MEGA time per ADC sample = 112.0000000000µs = 8.9285717010 Khz
// DUE time per ADC sample = 4.0000000000µs = 250.0000000000 Khz 30 times faster
// F429ZI time per ADC sample = 53.0000000000µs = 18.8679245283 Khz 2,113207547169811 times faster That's disappointing
For the MEGA this is the expected bad value. For good reason we don't use anologRead() at the AVRs, but start the conversion and pick up the result later.
At the DUE analoRead() is about 30 times faster. So we can probably use it.
The F429ZI is an 180MHz ARM and for that surprisingly slow in analogeRead() with stm32duino/Arduino_Core_STM32 kernel.
How does your board perform?
const int analogInPin = A0;
int sensorValue = 0;
void setup() {
Serial.begin(9600);
while (!Serial) {}; // wait for serial port to connect. Needed for native USB port only
}
#define NSAMPLES 100000
void loop() {
unsigned long time1 = micros();
for (long long i = NSAMPLES; i >= 0; i--)
sensorValue += analogRead(analogInPin);
unsigned long time2 = micros();
Serial.print("time per ADC sample = ");
double msps = (time2 - time1) / (NSAMPLES);
Serial.print(msps ,10);
Serial.print("µs = ");
Serial.print(1000.0/msps,10);
Serial.println(" Khz");
delay(2);
}
// MEGA time per ADC sample = 112.0000000000µs = 8.9285717010 Khz
// DUE time per ADC sample = 4.0000000000µs = 250.0000000000 Khz 30 times faster
// F429ZI time per ADC sample = 53.0000000000µs = 18.8679245283 Khz 2,113207547169811 times faster That's disapointing
@AnHardt : Not all 32bit processors have integrated fast ADC converters. Some of them could be using a ramp based (delta) ADC, that are quite precise, but very slow...
@AnHardt I'm using the typical Mega 2560 AVR with Ramps 1.4. I've just installed the latest bugfix-2.0.x code and will give it a try first before pursuing this any further.
@boelle I would tend to agree with you except I've completely replaced the heater block with the latest e3d v6 heater block with the thermistor cartridge assembly. It could be a problem with that, but I saw the same behavior with the old heater block and thermistor, so if it's hardware related, it would probably be something with the analog functionality of the ramps or mega board. I had another issue where the z motors were turning a little bit when a retract/unretract was executed. I spent 2 days swapping out all the hardware, ramps, mega, lcd display, etc., only to find out it was a firmware issue that was being worked on. So nowadays, I check the firmware first, then move on to the hardware if needed.
@ejtagle
Got some more numbers.
// MEGA/UNO/NANO time per ADC sample = 112.0000000000µs = 8.9285717010 Khz
// DUE time per ADC sample = 4.0000000000µs = 250.0000000000 Khz
// ESP32 time per ADC sample = 9.0000000000µs = 111.1111111111 Khz
// Teensy++ 2.0 time per ADC sample = 60.0000000000µs = 16.6666666667 Khz
// Teensy 3.5 time per ADC sample = 8.0000000000µs = 125.0000000000 Khz
// Teensy 3.6 time per ADC sample = 6.0000000000µs = 166.6666666667 Khz
// STM32F401RE time per ADC sample = 56.0000000000µs = 17.8571428571 Khz
// STM32F411RE time per ADC sample = 53.0000000000µs = 18.8679245283 Khz
// STM32F429ZI time per ADC sample = 53.0000000000µs = 18.8679245283 Khz
The STM32s ADCs are fast - 1Msample/s should be reachable with the right use. It's the current implementation of analogRead() what takes that long (stm32duino/Arduino_Core_STM32 kernel). For every analogRead() the ADC is - completely set up, start a calibration, wait for calibration, start a single measurement, wait for measurement, shut down ADC.
@AnHardt : That is the problem with Arduino core--- It is a general purpose framework, and when trying to get maximum performance from the hw, it starts to interfere.
All the digitalXXX() and analogXXX() functions reinitialize everything, just in case, so they are pretty slow. And that is the reason of the fastio.h macros.
I guess that if analogXXX() functions are getting in the way of performance, they should be reimplemented (hey!, performing an ADC conversion is not that hard at all ... ;)
(And BTW, yes, i HATE polling on embeded systems... on STM we should be doing it the same way than in AVR, using an state machine...)
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.