Discord.py: discord.ext.tasks.loop doesn't handle exceptions properly

Created on 12 May 2019 · 5Comments · Source: Rapptz/discord.py

Summary

An exception in the looped function causes the function to fail silently and prevents it from being called in the future, after X seconds/hours/days.

Reproduction Steps

from discord.ext import commands, tasks

bot = commands.Bot(command_prefix='!')

@tasks.loop(seconds=3)
async def fun():
    print('Looping')
    1/0  # Raise a division by zero exception

fun.start()

bot.run('<token>')

Expected Results

To see an actual stacktrace ~~on the screen printed with print or~~ using a default logger
The function should be restarted (and, obviously, fail again) in the requested 3 seconds

Actual Results

The function stops running forever. Moreover, there is no notification that any exception happened at all. If the exception happens because of a very rare corner case, you may not even notice that something happened until a few days later, depending on what that function was supposed to be doing.

Note: I'm aware of add_exception_type but using it:

Requires you to know of every possible exception that can be raised in advance [and we're not talking about the handling of expected exceptions but about being notified of an unexpected exception ]
Makes the function restart using the exponential backoff (so after 1 second, at the begining!) and completely ignores the function's interval that may be set to 1 hour, for example.

System Information

discord.py version: 1.1.0
Python version: 3.5.4
Operating system: Windows 7

Source

overfl0

👍2

Most helpful comment

Thanks, I'm looking forward to seeing the stacktrace!

Just to get this out of the way: the except Exception I mentioned was only to print the error and re-raise afterwards. It won't be needed if the stacktrace will be printed automatically, in the future.

Regarding the second point, the unconditional ZeroDivisionError was obviously here just to provide a simple reproduction for the github issue.
What I have in mind when I say I want retrying after the original delay are real corner cases that happen sparingly and may be caused by either 3rd-party libraries and/or user input.

Example from the top of my head, may not be the best one: your bot is iterating over the player names on a server using the steam protocol and dumps some statistics based on them. It will work correctly 99.9% of the time, until one player decides to use a username with an invalid utf8 character on one single server of all the ones you're monitoring (not sure what it is, but I've seen some python-valve libraries throw exceptions because of that in the past).
Now stumbling onto that situation once will prevent your background script from running forever, when it could have just been failing for the 30 minutes while the player was online (and dumping stacktraces for the developer to analyze and fix in the future) and then start working again for the next day or maybe even week.

I know that in an ideal world your software should be 100% bug-free and you should strive to fix all your bugs ASAP, but not re-running a background task after it has failed once because of a corner-case in one if clause, would be the equivalent of doing a sys.exit(1) right away, on a bot that failed handling a single command.

I'd suggest to reconsider your decision or at least to make it configurable by the user :).

overfl0 on 12 May 2019

👍3

All 5 comments

I don't see why printing to the screen would be default behavior, these are background tasks after all...nor do I see why "Requires you to know of every possible exception that can be raised in advance" is a bad thing. If something unexpected happens in your loop then typically you do not want it to continue looping. If you know that something can raise inside it but you don't care, well then that is precisely what add_exception_type is for.

You can include logging/printing inside of your task if you want that; the only thing in this issue that I see being valid is the very last point you make. I can definitely see cases where a "valid exception" happens but you still want to continue with the delay you've given.

Phxntxm on 12 May 2019

I don't see why printing to the screen would be default behavior

It's not about printing things with print. It's about getting any feedback that your code has failed. Right now, it's just silenced and stopped. You're not even notified that anything went wrong! [removed the print part in the description to not confuse people anymore]

then typically you do not want it to continue looping.

Tools like celerybeat or even crontab always ignore whether the task has finished successfully or not and just do what they are asked to do: run the task at a given interval. It's the programmer's or admin's job to notice the errors and act upon them. That's what I would normally expect of a tool running background tasks :)

EDIT: Additionally, tools like sentry can notice that some exceptions are raised repeatedly (meaning it's a more important issue) and inform you of that fact!

You can include logging/printing inside of your task

I would then have to encompass my whole function with a big "try/except Exception" to see where and why my function fails. And remember that we're now talking about unexpected errors, like a HTTP/500 error happening on the 3rd party site, once a week. That's why python programs usually print stacktraces on your screen and don't silently discard them.

That's definitely not the behavior a programmer is expecting to see.

overfl0 on 12 May 2019

👍1

This task extension encompasses asyncio's tasks which do not give feedback to stdout/logging...this mimics this behavior (because well, it's using it). You have the task object returned when you call start, you have the task object via the get_task method. You can use this to retrieve the exception of the task. This is how asyncio tasks function outside of this scope, stacktraces are printed to your screen, they are accessible via this lookup on the task object.

The only beneficial things surrounding this would perhaps be an on_task_error type method that is called on on things not included in the valid exceptions....as well as something I mentioned in the first reply to allow the delay to trigger on valid exceptions. Just know that you do have feedback of this error, it is via the exception method on the task as mentioned above.

Phxntxm on 12 May 2019

Your first point about printing to a logger is fine and something I was planning on doing anyway.

Your second point on retrying in case of any failure is not fine and I don't want to do it. Just because other libraries ignore exceptions by swallowing them and then unconditionally retrying doesn't mean that I have to follow suit on a design that I consider bad.

If some code is going to raise ZeroDivisionError or NameError, well it's going to keep doing it and I see no point in constantly retrying just to satisfy some exception that you didn't expect to exist. This is precisely why I designed it with a limited set of exceptions in mind and with the ability to add more to handle dynamically.

Also, just so you know doing things like try except Exception: is generally a bad smell unless you know what you're doing. If an exception can't be handled then don't handle it.