Tasks are no longer completing after an abrupt shutdown (power to the server unplugged) while an update task was ongoing.
The tasks were completing and had even cleared them prior to the abrupt shutdown.
Tasks should complete.
Can't accomplish any task in the Device -> Channels module.
Herein are attached the logs of the last few days ever since the problem started.
kolibri-2020-01-25.txt
kolibri-2020-01-26.txt
kolibri-2020-01-27.txt
kolibri-2020-01-23.txt
kolibri-2020-01-24.txt
Kolibri server - 0.3.6
Kolibri - 0.13.0
Ubuntu 18.04.3
Thanks for the report @intelliant01!
I'm a bit confused though. I believe the expected behavior for both 0.12.x and 0.13.0 is that any transfers that are outstanding during a hard shutdown will "fail" and need to be manually re-initiated.
Would you mind clarifying a couple things?
Tasks are no longer completing after an abrupt shutdown (power to the server unplugged) while an update task was ongoing.
The tasks were completing and had even cleared them prior to the abrupt shutdown.
Are you saying that in 0.12.x, content transfers would resume after a hard shutdown of Kolibri? I just tested on 0.12.9, and the behavior was that the transfer had disappeared completely, which I believe is the same behavior as in 0.13.0.
Can't accomplish any task in the Device -> Channels module.
Are you saying that after a hard shutdown, content transfers working altogether and cannot be re-initiated?
Are you saying that in 0.12.x, content transfers would resume after a hard shutdown of Kolibri?
No. They don't resume. They disappear.
Are you saying that after a hard shutdown, content transfers (stop) working altogether and cannot be re-initiated?
Yes, they stop working i.e. all content tasks can be re-initiated but they never complete.
If there is still any doubt then shall create a gif and share.
Thank you! Yes, a gif would definitely be helpful. It also sounds somewhat similar but slightly different to a problem reported in the forums here: https://community.learningequality.org/t/import-export-issues/1396
My experience is similar to point 1 therein, though I have not tried the uninstall and reinstall step -
When I import several resources at the same time the Task Manager gets stuck at 100%. Cancelling the import doesn’t work. Only after restarting the computer does the task disappear. But afterwards I get the same problem with every task. So completely unable to import anything. I have to uninstall (deleting all imports) and reinstall Kolibri to be able to import again.
However I do remember being able to start and complete multiple imports. They did execute sequentially but they did get enlisted in the tasks list simultaneously and did complete. It was only after a hard shutdown that I experienced the problem.
Shall try to provide a gif soon as I may be away from the problem system for the next two days.
Hi @intelliant01 -
We've tried unsuccessfully to reproduce. Are you still seeing this?
Pasting the main error I saw in the logs provided below:
ERROR 2020-01-23 17:04:02,347 exception Internal Server Error: /api/content/remotechannel/7426fe78415147308e81e2004b987c52/
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connection.py", line 159, in _new_conn
(self._dns_host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/util/connection.py", line 57, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connectionpool.py", line 600, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connectionpool.py", line 343, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connectionpool.py", line 839, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connection.py", line 301, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connection.py", line 168, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fd42ef42898>: Failed to establish a new connection: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/kolibri/dist/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/connectionpool.py", line 638, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/kolibri/dist/urllib3/util/retry.py", line 399, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='studio.learningequality.org', port=443): Max retries exceeded with url: /api/public/v1/channels/lookup/7426fe78415147308e81e2004b987c52 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fd42ef42898>: Failed to establish a new connection: [Errno -2] Name or service not known',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/kolibri/dist/django/core/handlers/exception.py", line 41, in inner
response = get_response(request)
File "/usr/lib/python3/dist-packages/kolibri/dist/django/core/handlers/base.py", line 187, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/usr/lib/python3/dist-packages/kolibri/dist/django/core/handlers/base.py", line 185, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/django/views/decorators/csrf.py", line 58, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/rest_framework/viewsets.py", line 116, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/rest_framework/views.py", line 495, in dispatch
response = self.handle_exception(exc)
File "/usr/lib/python3/dist-packages/kolibri/dist/rest_framework/views.py", line 455, in handle_exception
self.raise_uncaught_exception(exc)
File "/usr/lib/python3/dist-packages/kolibri/dist/rest_framework/views.py", line 492, in dispatch
response = handler(request, *args, **kwargs)
File "./kolibri/core/content/api.py", line 986, in retrieve
identifier=pk, baseurl=baseurl, keyword=keyword, language=language
File "./kolibri/core/content/api.py", line 906, in _make_channel_endpoint_request
resp = requests.get(url)
File "/usr/lib/python3/dist-packages/kolibri/dist/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/kolibri/dist/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='studio.learningequality.org', port=443): Max retries exceeded with url: /api/public/v1/channels/lookup/7426fe78415147308e81e2004b987c52 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fd42ef42898>: Failed to establish a new connection: [Errno -2] Name or service not known',))
Hi @indirectlylit,
Yes am facing this issue even after updating to Kolibri 0.13.1rc1 . Rest of the environment is the same.
Herein are 2 gifs (Had hit pause in the middle of recording to avoid nagging wait time) -


Few changes noticed from 0.13.0 -
After some live debugging with the @intelliant01 installation, the problem was on this line https://github.com/learningequality/kolibri/blob/release-v0.13.x/kolibri/core/content/management/commands/importchannel.py#L179
importchannel tries to acquire a lock to do the importation and could not do it because the database said a lock was acquired by another process with a different pid. Diskcache does not return any error in this case, it just keeps the acquire routine in an infinite loop: https://github.com/grantjenks/python-diskcache/blob/master/diskcache/recipes.py#L145-L154 until the lock is released.
The problem appears because when the abrupt shutdown happened, kolibri was interrupted while doing the import and had a pid that was inserted in the cache.db to lock the process. After kolibri restarted the pid was not the same and we were in a deadlock.
I think the solution should be deleting cache.dbwhenever kolibri starts, to avoid this kind of problems, but I'd like to hear more opinions, specially from @rtibbles who implemented the locking here.
btw, this bug may happen with all the tasks using db_task_write_lock, not only importing channels, also deleting them, trying to vacuum, importing content, etc.
I am attaching here the bad db to help debugging
Yes - I think regenerating it when Kolibri starts seems wise.
Hi @intelliant01, @jredrejo has a fix in #6567, which will be included in release 0.13.1.
I'm going to close this issue now, but please let us know if you still see the error after upgrading to 0.13.1. Thank you!
@jredrejo @lyw07 thanks for the update. Have removed the problem cache.db as of now. However, will keep an eye for recurrence of this issue.
Most helpful comment
After some live debugging with the @intelliant01 installation, the problem was on this line https://github.com/learningequality/kolibri/blob/release-v0.13.x/kolibri/core/content/management/commands/importchannel.py#L179
importchanneltries to acquire a lock to do the importation and could not do it because the database said a lock was acquired by another process with a differentpid. Diskcache does not return any error in this case, it just keeps the acquire routine in an infinite loop: https://github.com/grantjenks/python-diskcache/blob/master/diskcache/recipes.py#L145-L154 until the lock is released.The problem appears because when the abrupt shutdown happened, kolibri was interrupted while doing the import and had a pid that was inserted in the cache.db to lock the process. After kolibri restarted the pid was not the same and we were in a deadlock.
I think the solution should be deleting
cache.dbwhenever kolibri starts, to avoid this kind of problems, but I'd like to hear more opinions, specially from @rtibbles who implemented the locking here.btw, this bug may happen with all the tasks using db_task_write_lock, not only importing channels, also deleting them, trying to vacuum, importing content, etc.
I am attaching here the bad db to help debugging