Observed for configuration `linux', not
`linux-pthread', with GNU libc 2.2.93 under Red Hat
Linux 8.0.
gethostbyaddr() may call _nss_dns_gethostbyaddr_r() in
libnss_dns-2.2.93.so. That function has a stack frame
of 0x1047c bytes (66684 decimal), which overflows the
50KB stack and thus clobbers dynamic memory. In this particular
server, it happened to wipe out a return address in the
EventHandler thread, which then `returns' to cloud nine
right after it is scheduled next.
Stack frame size of _nss_dns_gethostbyaddr_r() is 0x81c (2076 bytes)
in libnss_dns-2.2.5.so; that version of libc does not
overflow the stack as far as I know.
Work-around: undefine RESOLVE_IPADDRESS in include/misc.h.
Proposed fix: Increase the thread's stack size to 128KB
(attached). I don't know whether that's sufficient.
Even if I did, it's impossible to ensure it remains
sufficient. The LWP thread package cannot extend a
thread's stack, and it cannot detect stack overflow.
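
The numbers above can be checked with a little arithmetic. This is only
a sketch: it assumes "KB" means 1024 bytes, it counts just the one
frame of _nss_dns_gethostbyaddr_r() and none of its callers or callees,
and the names are made up for illustration.

```c
/* Frame sizes of _nss_dns_gethostbyaddr_r() quoted in the report */
enum {
    FRAME_LIBC_2_2_93 = 0x1047c,    /* 66684 bytes */
    FRAME_LIBC_2_2_5 = 0x81c,       /* 2076 bytes */
    STACK_OLD = 50 * 1024,          /* assumes 1KB = 1024 bytes */
    STACK_PROPOSED = 128 * 1024
};

/*
 * Hypothetical helper: bytes left on a stack of STACK bytes after a
 * single frame of FRAME bytes.  A negative result means overflow.
 */
long
stack_headroom(long stack, long frame)
{
    return stack - frame;
}
```

stack_headroom(STACK_OLD, FRAME_LIBC_2_2_93) is negative, i.e. the
frame alone overruns the old stack.  With STACK_PROPOSED, a bit less
than half of the stack is left for everything else, which is why the
proposed size may or may not be sufficient.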
The code to resolve player IP addresses was disabled in commit 32fac04
(v4.2.13) and deleted in commit 9ef4f1b (v4.3.33).
It was disabled to work around this bug, i.e. insufficient stack.
Stack use wasn't its only problem, though. The code still used
obsolete gethostbyaddr() rather than getnameinfo(), and provided only
512 bytes for host names instead of the customary NI_MAXHOST (1025)
bytes.
All three issues would have been easy enough to fix. What's not so
easy is to avoid blocking on the synchronous DNS lookup.  As long as
the lookup blocks, connecting repeatedly from a range of addresses
with slow reverse lookup could conceivably be employed as a denial of
service attack.
Related: commit 030b374 (v4.2.33) enlarged the stack to fix a
different stack overrun bug.