Reset a program from a ‘freeze’

While doing a test run on software for an incubator, the software did fine until suddenly, on the ninth day, weird things started to happen: the LCD showed some odd data and, in spite of a low temperature, the heating had not kicked in. I reset the software, wondering what it could be, but only an hour later something went wrong again. This time, in spite of the temperature being well over the upper limit, the heating was not switched off.
I checked my code, which really wasn’t so complicated: read the DS18B20, read the DHT11, compare with the high and low limits, switch a pin on or off and write the values to an LCD. I couldn’t find a single mistake and, don’t forget, it had run flawlessly for 9 days. I suspected my display. This was a 20×4 LCD with an I2C module and 4k7 pull-ups. Nevertheless there might still have been some ‘static’ on the SDA and SCL lines, which supposedly can cause the Arduino to freeze.

Obviously that is not good if you are running an incubator, as you don’t want to check it and find your eggs boiled.
So, other than maybe straightening out the LCD cables a bit, I decided that I needed some software protection against ‘freezes’.
The usual (and possibly best) way to do this is with the watchdog timer. I don’t want to go into the specifics and the background of the watchdog timer, but just keep it on a practical level.
What we do is set up the watchdog timer to initiate a system reset after, say, 4 seconds. Then in our loop we reset the watchdog timer so it starts counting from zero again. So as long as the program tells the watchdog timer “I am still running”, nothing will happen. Should the program freeze up, it will not reset the watchdog timer, and after 4 seconds the watchdog timer will reset the entire system.
It is very well possible to use the watchdog timer by manipulating the various registers yourself, but it is much simpler to use the watchdog library (avr/wdt.h) that is part of the AVR libraries.
We do this as follows:

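#include <avr/wdt.h>

void setup() {
	// Enable exactly one timeout; if the watchdog is not reset within
	// this time, it resets the whole board.
	//wdt_enable(WDTO_1S); // 1 sec
	wdt_enable(WDTO_2S);   // 2 sec
	//wdt_enable(WDTO_4S); // 4 sec
	//wdt_enable(WDTO_8S); // 8 sec
}

void loop() {
	// your program
	wdt_reset(); // tell the watchdog "I am still running"
}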
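Put together with the incubator logic described earlier (compare the temperature with a low and a high limit, switch a pin), a fuller sketch might look roughly like the one below. It is an illustration under assumptions, not the actual incubator program: the pin number, the limit values, the 4 second timeout and the names `nextHeaterState` and `readTemperatureC` are all made up for the example, and the sensor read is a placeholder. The Arduino-specific part is wrapped in a preprocessor guard only so that the decision function can also be compiled and checked on a PC.

```cpp
#include <stdint.h>

// Decide the heater state with hysteresis: on below the low limit,
// off above the high limit, unchanged in between.
bool nextHeaterState(float tempC, float lowC, float highC, bool heaterOn) {
  if (tempC < lowC) return true;
  if (tempC > highC) return false;
  return heaterOn;
}

#if defined(__AVR__) && defined(ARDUINO)
#include <Arduino.h>
#include <avr/wdt.h>

const uint8_t HEATER_PIN = 8;      // assumption: heater relay on pin 8
const float LOW_LIMIT_C = 37.2f;   // assumption: example limits only
const float HIGH_LIMIT_C = 37.8f;
bool heaterOn = false;

// Placeholder: a real sketch would read the DS18B20 here.
float readTemperatureC() {
  return 37.5f;
}

void setup() {
  pinMode(HEATER_PIN, OUTPUT);
  wdt_enable(WDTO_4S);  // reset the board if loop() stalls for 4 seconds
}

void loop() {
  heaterOn = nextHeaterState(readTemperatureC(), LOW_LIMIT_C, HIGH_LIMIT_C, heaterOn);
  digitalWrite(HEATER_PIN, heaterOn ? HIGH : LOW);
  wdt_reset();  // reached only if the work above completed
}
#endif
```

Calling `wdt_reset()` once, at the end of `loop()`, means the watchdog is only satisfied when a full pass has completed. The timeout should therefore be comfortably longer than the slowest pass, including sensor reads and LCD updates.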

15 thoughts on “Reset a program from a ‘freeze’”

  1. Good advice. I do this on all my “perpetually running” machines such as home automation, weather station receivers etc.

    1. Thanks Jeroen. In the meantime I may have located what was causing the freezing in my stripboard ATtiny, most likely the 7805: if I feed it through the 7805 I get occasional problems; if I feed it with 5 volts directly to Vcc and ground, it is OK.

      1. I might be insulting you Arduino, but aren’t 7805’s considered the most stable and indestructible chips around, provided you add a decoupling capacitor on the in and out lines? Very weird if despite that the processor would trip indeed!

      2. you are not insulting me, but I am narrowing down a problem I have and the 7805 came up as a suspect.
        I have a simple thermostat program that reads a DS18B20 and switches a relay on or off depending on the temperature. It also reads a DHT11 and prints the DS18B20 temperature and DHT11 humidity to an LCD.
        When I supply my stripboard Arduino with say 9 volts through the 7805, it is bound to crash. It may take a few hours, but if I leave it overnight I am pretty sure it has crashed by the next morning. I notice that because the LCD is lit up but shows no text anymore, and if the heating (a lamp) happened to be switched on, it stays on, even if the temperature is well over the limit.
        As I thought it might be the switching of the relay, I switched to a solid-state relay, with no effect.
        I noticed however that when I fed the circuit from my computer, via USB and an FTDI module… I got no crash at all.
        So I suspected the PSU (a wall wart) and changed that. Still crashes.
        So, I thought, if powering through my FTDI module works but through my 7805 does not, it might be the 7805 or the circuit around it (2 elcos and a polarity protection diode). So I took a PSU I built (with a 7805) and supplied 5 volts directly to Vcc and ground. That seemed to work, as in no crashes, until I came down the next morning… you see, it had crashed…. basically clearing the suspicion on the 7805. Anyway, I connected my FTDI module again, which supplies 5 volts from my computer’s USB, and it proves stable again.
        So……. currently the only thing I can think of is the location. That is the only thing different when I feed it through my computer/FTDI.

        So.. my next test (tomorrow) will be using another wall socket as that seems the only difference, though I can hardly imagine what problem that wall socket has. If it has irregular dropouts, I’d expect the Arduino just to reset.

      3. Thank you for the long description of your experiments. Seems to be a thorough exercise indeed. The only thing that sprang to mind was: with a 7805 and two elcos, I would at least add 100 nF capacitors as close as possible to the 7805’s input and output, and also across the processor, again as close as possible. It really does help against transients.

        On the software side, as said, I usually add a watchdog, but I also add a few other small goodies. Maybe it helps.

        1) I add a simpleTimer and let its callback routine toggle an am-I-alive LED. That tells me straight away if it’s really the processor (or my software!!!!!) that went to lala-land or if some peripheral screwed up. If you need an example I will send it to you.

        2) When the project is at least nano based, at startup, I print the free memory. It’s mighty easy to go over stack space and then lala-land is unavoidable. Careful memory management is needed especially if you’re using an ENC ethernet controller in the project, which I like to do!
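The two goodies above could be sketched roughly as follows. This is not the commenter’s code: it swaps the simpleTimer callback for a plain `millis()` comparison, and the pin number, blink interval and the name `heartbeatDue` are assumptions made for illustration. `freeRam()` is the widely circulated avr-libc idiom based on `__heap_start` and `__brkval`, so it applies to AVR boards only and gives an approximate figure.

```cpp
#include <stdint.h>

// True when at least intervalMs has elapsed since lastMs.
// Unsigned subtraction keeps this correct across the millis() rollover.
bool heartbeatDue(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs) {
  return (uint32_t)(nowMs - lastMs) >= intervalMs;
}

#if defined(__AVR__) && defined(ARDUINO)
#include <Arduino.h>

// Approximate free RAM between the top of the heap and the stack (AVR only).
int freeRam() {
  extern int __heap_start, *__brkval;
  int v;
  return (int)&v - (__brkval == 0 ? (int)&__heap_start : (int)__brkval);
}

const uint8_t ALIVE_LED_PIN = 13;         // assumption: on-board LED
const uint32_t BLINK_INTERVAL_MS = 500;   // assumption: toggle twice a second
uint32_t lastToggleMs = 0;
bool ledOn = false;

void setup() {
  pinMode(ALIVE_LED_PIN, OUTPUT);
  Serial.begin(9600);
  Serial.print(F("Free RAM at startup: "));
  Serial.println(freeRam());
}

void loop() {
  uint32_t now = millis();
  if (heartbeatDue(now, lastToggleMs, BLINK_INTERVAL_MS)) {
    lastToggleMs = now;
    ledOn = !ledOn;
    digitalWrite(ALIVE_LED_PIN, ledOn ? HIGH : LOW);
  }
  // rest of the program
}
#endif
```

Because the LED is toggled from `loop()` itself, a blinking LED with a dead LCD points at the peripheral, while a frozen LED points at the processor or the software.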

      4. Yes, the 7805, 2 elcos and a 4001 for polarity protection were my suspects. I already have 100 nF close to the chip, but as I got the same problem with a new PSU that gave 5 volts, it wasn’t my 7805 or elcos.

        So: direct 5V (from USB) in one location, no problem;
        direct 5V from another source in another location… freezes.
        I had an ‘I am awake’ signal on my LCD, but the LCD was blank.
        Anyway, it seems the difference is in the location (as I can’t imagine a 2nd PSU, now giving 5V, is faulty too),
        so now I test in the same location as where my computer is and see how that goes. Could be there is a loose wire in the wall socket I was using. Will know tomorrow. … and yes, seems that was the problem.

      5. Finally pinpointed the problem. nothing to do with my circuit or software. It seems to be a problem in the wall socket I was using.
        Hours of tinkering, checking the code, checking the wiring, replacing parts, changing the code….. and then it is the socket.
        Well got that solved, now I have to see and find what the problem with the socket is

  2. Depending on the sketch you are running, sometimes it is better to run an external watchdog. Now I am using a kind of Arduino without a watchdog unit (it just has an internal one) and sometimes it can’t recover from a freeze. I implemented an external watchdog based on a lm 7555 (I am running at 3.3 V); you can also use a 555 at 5 volts. It is easy and you can forget about other kinds of problems.

    1. Interesting approach Paul. Do you let the timer simply reset the processor? Because if not, I would be interested in how you’ve implemented the “tap the watchdog” part in the loop(), unless of course you do that through a hardware line. In the latter case (yes, I am stating the obvious, sorry!) you’d have to ensure the tapping is done on a flank (an edge), not on a level, as you never know where the processor dashes off to lala-land.
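The ‘flank, not level’ point can be illustrated from the watching side. The sketch below is only an assumption about how a software monitor could look (for instance on a second microcontroller); it is not the 555 circuit under discussion, and `KickMonitor` and `kickTimedOut` are hypothetical names. Only a change of level counts as a sign of life, so a processor frozen with its pin stuck high or low still times out.

```cpp
#include <stdint.h>

// State kept by the monitor between samples of the kick line.
struct KickMonitor {
  bool lastLevel;       // level seen at the previous sample
  uint32_t lastEdgeMs;  // time of the most recent level change
};

// Sample the kick line. Only an edge (a change of level) restarts the
// timeout; a line stuck at either level does not.
// Returns true when no edge has been seen for timeoutMs or longer.
bool kickTimedOut(KickMonitor &m, bool level, uint32_t nowMs, uint32_t timeoutMs) {
  if (level != m.lastLevel) {
    m.lastLevel = level;
    m.lastEdgeMs = nowMs;
  }
  return (uint32_t)(nowMs - m.lastEdgeMs) >= timeoutMs;
}
```

On the watched processor the matching action would be to invert an output pin once per pass through `loop()`, rather than holding it at a fixed level.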

    2. True, I have had some instances in which the internal watchdog was not able to fully restore the program. I presume you use 1 pin of the uP to reset your watchdog timer, sort of like this: [image: watchdog circuit]

      1. I think you would need to add a small capacitor in the line to the base of T1. If your processor freezes with the D17 high you’re still stuck.

        And why the “complicated” circuit with the diodes? Wouldn’t it be easier to let the discharge pin short out C1 and use a 1M resistor to charge it? I must be missing something!

  3. Running two quite mission-critical domotica systems using an Ethernet shield, with both DHT11 and DS18B20 plus much more, I found the best and only solution to provide system stability is to use a secondary Arduino Nano (2 dollars on eBay) connected to the primary, which sends it sync signals. If something goes wrong, the secondary does a hardware reset on the reset pin. Software reset is not the way to go. The watchdog did not prove to be 100% reliable in the sense of the non valuable reset it could provide. That said, I believe the strange things happening are caused by a buffer overflow in your sketch. In my case, precisely with the DS18B20, the problem was with the text string converted to float when the temperature went below 10 degrees C. Once you can replicate the problem at will, you can be sure when it is fixed. Check the text conversion dealing with the results of the DS18B20 readouts. That is most likely your problem.
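Whatever the exact conversion was in that case, one way to rule out this class of overflow is to format readings with a length-checked call into a fixed buffer. The sketch below is an illustration only, not the commenter’s code: `formatTenths` and the choice to carry the temperature as an integer number of tenths of a degree are assumptions. It uses `snprintf`, which never writes more than the buffer size given and reports how long the full text would have been.

```cpp
#include <stdio.h>
#include <stdlib.h>

// Format a temperature given in tenths of a degree C (e.g. -105 for -10.5)
// into buf. Never writes more than bufSize bytes, including the terminator.
// Returns true only if the complete text fitted.
bool formatTenths(int tenths, char *buf, size_t bufSize) {
  const char *sign = (tenths < 0) ? "-" : "";
  int magnitude = abs(tenths);
  int written = snprintf(buf, bufSize, "%s%d.%d", sign, magnitude / 10, magnitude % 10);
  return written >= 0 && (size_t)written < bufSize;
}
```

With an 8 byte buffer, -105 comes out as "-10.5" and 95 as "9.5". With a buffer that is too small the text is truncated and the function returns false instead of writing past the end.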

    1. Paolo, I agree that a software reset is not ideal. A secondary Arduino would often be a better solution.
      With regard to my ‘problem’: nope, not a buffer overrun, as the program would be completely stable when fed from my computer’s USB. The problem eventually turned out to be in the wall socket of the mains grid. If I use that particular socket I get problems within the hour. When I plug into another socket, no problems. I could replicate that: socket 1, problems within the hour; other socket, no problems (almost a month now of uninterrupted use).
      Your ‘below 10 degrees’ issue is quite interesting as well. I didn’t experience that even though I went down to -10.
      Tnx for yr input
