Ubuntu user hits thread number limit preventing SSH login
Recently I investigated quite an interesting issue: our testers run their tests on an Ubuntu-based VM, and they reported being unable to log into it.
After a brief investigation it became clear that the issue was not network or SSH key related.
These are the relevant records from the auth log:
/var/log/auth.log
Feb 28 20:21:39 test-instance sshd[21954]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
Feb 28 20:21:39 test-instance systemd-logind[756]: New session 75 of user ubuntu.
Feb 28 20:21:39 test-instance sshd[21954]: fatal: fork of unprivileged child failed
Feb 28 20:21:39 test-instance systemd-logind[756]: Removed session 75.
Quite an obscure error message, but it smells like a cgroup problem. Indeed:
journalctl -xe --no-pager | grep cgroup
Feb 28 20:21:39 test-instance kernel: cgroup: fork rejected by pids controller in /user.slice/user-1000.slice/session-75.scope
The number of processes on the system was not that high, so naturally the next suspect was the number of threads.
The PIDs of the most thread-heavy processes can be found using the following one-liner:
for prc in $(ps -A -o pid); do grep -s Threads /proc/${prc}/status | awk -v prc="${prc}" '{print prc, $2}'; done | sort -n -r -k 2 | head
10925 5156
10971 5138
11193 506
764 11
831 7
802 4
854 3
821 3
853 2
19109 2
The first column is the PID, the second is the number of threads.
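To compare these per-process numbers against a per-user limit, it can help to sum the threads of everything a user owns. Below is a minimal sketch, assuming the procps version of ps (which provides the nlwp column, the number of threads per process). The function name user_threads is made up for illustration.

```shell
# Sum the thread counts (nlwp) of every process owned by the given user.
# Assumes procps ps; "nlwp=" prints the column without a header.
user_threads() {
    ps -u "$1" -o nlwp= | awk '{ total += $1 } END { print total + 0 }'
}

# Example: user_threads ubuntu
```

The result is only roughly comparable to the cgroup counter, because the cgroup counts tasks in the user's slice, and processes started outside that slice are not included.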
Next, I needed to find out which limit was being hit. Honestly, it was quite a discovery for me that the pids.max cgroup limit covers threads as well: the pids controller counts tasks, and every thread is a task.
The limit is set in the following file:
cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
10813
Current usage can be found here:
cat /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.current
10809
user-1000 here is ubuntu, confirmed by id ubuntu.
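The two reads above can be combined into a small helper that shows how close a cgroup is to its limit. This is a sketch only: the function name pids_usage is hypothetical, and it expects a directory that contains pids.current and pids.max, such as the cgroup v1 path used in this post (on a cgroup v2 system the pids directory component is absent from the path).

```shell
# Print usage of the pids controller for one cgroup directory.
# pids.max may contain the literal string "max", meaning no limit.
pids_usage() {
    dir="$1"
    current="$(cat "$dir/pids.current")" || return 1
    max="$(cat "$dir/pids.max")" || return 1
    if [ "$max" = "max" ]; then
        echo "$current of unlimited"
    else
        # Integer percentage, rounded down.
        echo "$current of $max ($(( current * 100 / max ))%)"
    fi
}

# Example: pids_usage /sys/fs/cgroup/pids/user.slice/user-1000.slice
```

With the values shown above, this would print "10809 of 10813 (99%)".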
As you can see, the limit was almost exhausted. Once the limit was increased
echo '32768' > /sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
it became possible to log in as user ‘ubuntu’ again. The testers were able to identify the cause of the excessive thread spawning, so the issue should not reoccur.
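Writing to pids.max changes only the live cgroup, so the value is not expected to survive a reboot. If a higher limit had to stay, one option on a systemd-managed host would be the TasksMax property of the user slice. The sketch below is an assumption on my side rather than what was done here. The function name set_user_tasks_max and the DRY_RUN variable are made up for illustration, and the real command needs root.

```shell
# Set TasksMax for a user's slice via systemd (persists by default).
# With DRY_RUN set to a non-empty value, only print the command.
set_user_tasks_max() {
    uid="$1"
    limit="$2"
    ${DRY_RUN:+echo} systemctl set-property "user-${uid}.slice" "TasksMax=${limit}"
}

# Example (as root): set_user_tasks_max 1000 32768
```

Running it with DRY_RUN=1 first shows the exact systemctl command without touching the system.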