On stock MIUI China (MIUI 10 9.9.26) with Magisk 19.4-736729f5 (19309), after the device has been powered on for a day or two, Magisk's su daemon process gets killed, and root access cannot be obtained again until the next reboot. Running the Stable or Canary version, or disabling Magisk Hide, does not seem to resolve the problem.
By making use of SELinux (magiskpolicy --live 'auditallow * magisk process sigkill'), I have basically figured out that MIUI's /init program sent SIGKILL to the Magisk daemon process, as indicated by the following dmesg messages:
[28367.472166] type=1400 audit(1569771138.568:43671): avc: granted { sigkill } for pid=1 comm="init" scontext=u:r:init:s0 tcontext=u:r:magisk:s0 tclass=process
[28369.135257] type=1400 audit(1569771138.568:43675): avc: granted { sigkill } for pid=1 comm="init" scontext=u:r:init:s0 tcontext=u:r:magisk:s0 tclass=process duplicate messages suppressed
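For anyone checking their own logs, the audit lines above can be picked apart on a host machine. This is a minimal sketch, assuming the AVC line layout is exactly the one shown in this thread; `parse_avc` and `AVC_RE` are hypothetical names introduced here for illustration, not part of Magisk or any Android tool.

```python
import re

# Matches the kernel audit layout quoted in this thread; other AVC layouts may differ.
AVC_RE = re.compile(
    r"avc:\s+(?P<result>granted|denied)\s+\{\s*(?P<perms>[^}]*?)\s*\}\s+for\s+"
    r"pid=(?P<pid>\d+)\s+comm=\"(?P<comm>[^\"]*)\".*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def parse_avc(line):
    """Return the fields of one AVC audit line as a dict, or None if it does not match."""
    m = AVC_RE.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["pid"] = int(fields["pid"])        # PID of the sender
    fields["perms"] = fields["perms"].split()  # e.g. ["sigkill"]
    return fields
```

Feeding it the first line above yields a sender with `pid` 1 and `comm` "init", and a target context of `u:r:magisk:s0`, which is what identifies init as the process sending the signal.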
I'm sorry, but I have no idea how to analyze this further. Also, I have not found any way to trigger this behavior other than waiting.
The full dmesg log around the SIGKILLs is attached here.
This could be due to your device's overly aggressive battery optimization. Have a look here:
https://dontkillmyapp.com/xiaomi
Try removing Magisk Manager from battery optimization.
Let's see what topjohnwu has to say.
Havoc 2.9 based on AOSP has this problem too :(
Try removing Magisk Manager from battery optimization.
I tried, and this time it died after about a week. It seemed to last longer, but I'm not sure whether that is due to battery optimization. IMO, it is not very likely, since it is init killing a system service with root privileges.
magiskd has not been killed so far since I disabled battery management.
I see. May I ask whether you tried disabling battery optimization for Magisk Manager and other root apps (before disabling overall battery management)? Did that help? I'm sorry that I cannot find those settings in MIUI, so I am not able to try and see if that is the solution for me.
@chen-456 Sorry, I haven't. And that approach is useless anyway: magiskd was killed just now even though battery management is disabled. So I have deleted that post, very sorry. Your issue is still a mystery for now :(
@chen-456 Using the Magisk release (not debug) version may fix this problem. The Canary release build is also OK.
I never use the Canary Debug version. I have performed some major changes to my system, including updating Magisk to 20.1 and uninstalling some modules. I will wait and see if anything will change.
I'm sorry, but with the latest Magisk Canary version (20.1 20003), init still kills magiskd as before. Changes in Canary will eventually merge into Stable, so I think I'm not going to give the Stable version a try.
As far as I understand, battery optimization is applicable only to processes running under zygote, i.e. apps' DVMs. Native daemons' resource usage is controlled through cgroups; two of those, acct and memcg, might be relevant here. Both are used through libprocessgroup by init and zygote when starting/stopping processes.
Your attached dmesg log doesn't provide enough information, but what I conclude is that magiskd is being killed for two reasons:
1. dpmd is malfunctioning (seems buggy on Xiaomi devices; this and this), and
2. magiskd doesn't add itself to the right subgroup of the control group (acct or memcg).
dpmd exits with code 1 repeatedly, and init sends signal 9 to all processes forked by this service by making use of cgroups:
init: Service 'dpmd' (pid ???) exited with status 1
init: Sending signal 9 to service 'dpmd' (pid ???) process group...
libprocessgroup: Successfully killed process cgroup uid 0 pid ??? in 0ms
Refer to this commit: when a service is started by Android init, it calls createProcessGroup, which writes its PID to one of the two cgroups:
/acct/uid_<UID>/pid_<PID>/cgroup.procs
/dev/memcg/apps/uid_<UID>/pid_<PID>/cgroup.procs
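To see which of these per-service groups a given PID currently sits in, the `cgroup.procs` files can be scanned. The following is a rough sketch, assuming the `uid_<UID>/pid_<PID>` layout quoted above; `find_process_groups` is a hypothetical helper, and the `root` parameter exists only so the same code can be pointed at a copy of the tree pulled from the device.

```python
import glob
import os


def find_process_groups(pid, root="/acct"):
    """Return the group directories under root whose cgroup.procs lists pid."""
    matches = []
    pattern = os.path.join(root, "uid_*", "pid_*", "cgroup.procs")
    for procs_file in sorted(glob.glob(pattern)):
        try:
            with open(procs_file) as f:
                members = {int(tok) for tok in f.read().split()}
        except (OSError, ValueError):
            continue  # group vanished, unreadable, or not a plain PID list
        if pid in members:
            matches.append(os.path.dirname(procs_file))
    return matches
```

If the daemon's PID shows up under a `pid_<N>` directory where `N` is not its own PID, the daemon is living in a group named after some other, possibly long-gone, process.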
init expects its services to be running in the foreground, which isn't the case with magiskd. What happened here: the /sbin/magisk --post-fs-data service was started by init with PID 648 on post-fs-data. This process forked magiskd into the background with, say, PID 649, and then itself exited successfully. Since the service was oneshot, init didn't take any action. So far so good.
But while starting the service, init added 648 to /acct/uid_0/pid_648/cgroup.procs, which also came to include 649 on forking. When the former exits, only the latter is left, i.e. /acct/uid_0/pid_648/cgroup.procs now contains 649 (plus any further forked PIDs, e.g. logcat -s Magisk). Now the kernel is free to assign PID 648 to any new process. If it is assigned to a non-root non-service process, a non-root service process, or a non-service root process, even for months, that's not a problem. But, as bad luck would have it, it was assigned to dpmd, a self-killing root init service. When starting it, init added its PID to /acct/uid_0/pid_648/cgroup.procs, and when it exited with status 1, init triggered killProcessGroup, which "Successfully killed process cgroup uid 0 pid 648", i.e. magiskd and all of its forked processes.
init: Service 'dpmd' (pid 648) exited with status 1
init: Sending signal 9 to service 'dpmd' (pid 648) process group...
type=1400 audit(1569771138.568:43671): avc: granted { sigkill } for pid=1 comm="init" scontext=u:r:init:s0 tcontext=u:r:magisk:s0 tclass=process
libprocessgroup: Successfully killed process cgroup uid 0 pid 648 in 50ms
init: Untracked pid 650 received signal 9
init: Untracked pid 6160 received signal 9
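The sequence described above can be replayed with a toy model. This is only an illustration of the PID-reuse argument, not how init or libprocessgroup are implemented: `ToyInit` and `replay_thread_scenario` are made-up names, groups are plain Python sets keyed by (uid, initial PID), and the PIDs are the illustrative ones used in the explanation.

```python
class ToyInit:
    """Toy model of per-service process groups, keyed by (uid, initial pid)."""

    def __init__(self):
        self.groups = {}  # (uid, initial_pid) -> set of member PIDs
        self.alive = {}   # pid -> process name

    def start_service(self, name, uid, pid):
        # Stand-in for createProcessGroup: an existing group with this key is reused.
        self.groups.setdefault((uid, pid), set()).add(pid)
        self.alive[pid] = name

    def fork(self, parent, child, name):
        # A forked child lands in whatever group its parent is in.
        for members in self.groups.values():
            if parent in members:
                members.add(child)
        self.alive[child] = name

    def exit(self, pid):
        # The process leaves its group; the group itself stays.
        for members in self.groups.values():
            members.discard(pid)
        del self.alive[pid]

    def kill_process_group(self, uid, initial_pid):
        # Stand-in for killProcessGroup: everything left in the group dies.
        victims = sorted(self.groups.pop((uid, initial_pid), set()))
        return [self.alive.pop(p) for p in victims if p in self.alive]


def replay_thread_scenario():
    init = ToyInit()
    init.start_service("magisk --post-fs-data", uid=0, pid=648)
    init.fork(648, 649, "magiskd")
    init.fork(649, 650, "logcat -s Magisk")
    init.exit(648)                             # oneshot service finishes
    init.start_service("dpmd", uid=0, pid=648)  # kernel reuses PID 648
    init.exit(648)                             # dpmd exits with status 1
    return init.kill_process_group(0, 648)     # init cleans up "dpmd's" group


print(replay_thread_scenario())
```

In this model the cleanup after dpmd takes out magiskd and its logcat child, because they are still members of the group named after PID 648.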
So either fix your init services so that they don't misbehave, or @topjohnwu could make magisk --post-fs-data run as a foreground init service that doesn't free the parent PID, or add code to magiskd so that it does:
mv /acct/uid_0/pid_PARENT /acct/uid_0/pid_CHILD
I'm also having issues with Magisk ceasing to work after a few days of phone uptime. I'm using Magisk 20 on Oxygen OS 9.
Did you get the logs via adb?
Executing su from Termux gives some message like no daemons running and the Manager tells me Magisk not installed
Yes, I patched adbd to make it run as root. You can do that yourself, or switch to a userdebug ROM (e.g. most custom ROMs) and do adb root, or look for other ways to run adbd insecurely.
But for now, my suggestion is to simply check whether the situation you encountered is the same as mine. Run dmesg while Magisk is still alive, and check if some init service keeps stopping and restarting (in my case, it's dpmd). If that is the case for you, then the mystery has been solved by mirfatif, and I think a fix will come out soon. If not, then you probably have a different issue, and you will have to do the research yourself.
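That check can be done on a saved dmesg dump. A small sketch, assuming init's exit messages look exactly like the `init: Service 'dpmd' (pid 648) exited with status 1` lines quoted earlier in this thread (other Android versions may word them differently); `count_failed_exits` is a hypothetical helper name.

```python
import re
from collections import Counter

# Layout taken from the init lines quoted in this thread.
EXIT_RE = re.compile(
    r"init: Service '(?P<name>[^']+)' \(pid (?P<pid>\d+)\) "
    r"exited with status (?P<status>\d+)"
)


def count_failed_exits(dmesg_text):
    """Count, per service name, how often init logged a non-zero exit status."""
    counts = Counter()
    for m in EXIT_RE.finditer(dmesg_text):
        if int(m.group("status")) != 0:
            counts[m.group("name")] += 1
    return counts
```

A service that appears here many times within one uptime is a candidate for the kind of repeatedly crashing init service described above.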