Hello VMware Community,
we are seeing the following error:
For our VMware environment, we recently built a new Log Insight cluster (size Large, 3 nodes) behind the integrated load balancer (ILB). It is the syslog target for roughly 1,000 sources (ESXi hosts/vSphere). During the basic configuration of the ESXi hosts, we noticed that they report the following error:
[root@esxi:~] less /var/log/.vmsyslogd.err
vmsyslog.loggers.network : ERROR ] vRLI.ilb.dns:514 - socket error : [Errno 32] Broken pipe
vmsyslog.loggers.network : ERROR ] Error shutting down socket.
Error message in the Web UI: The host "vRLI.ilb.dns:514" has become unreachable. Remote logging to this host has stopped.
We have already checked the network and the firewall configuration; unfortunately, neither is the source of the error.
[root@esxi:~] nc -z vRLI.ilb.dns 514
Connection to vRLI.ilb.dns 514 port [tcp/shell] succeeded!
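Note that `nc -z` only confirms that the TCP handshake to port 514 completes; it sends no data, so it cannot show whether the connection through the ILB stays usable while messages flow. Below is a minimal sketch of a probe that also writes a few test lines over one connection. It assumes Python 3 on a management host (not on ESXi itself), newline-framed messages, and the placeholder hostname from above; `probe_tcp_syslog` is a hypothetical helper, not part of any VMware tool.

```python
import socket
import time
from typing import List


def probe_tcp_syslog(host: str, port: int = 514, count: int = 5,
                     interval: float = 1.0, timeout: float = 5.0) -> List[str]:
    """Open one TCP connection and write `count` newline-terminated test lines.

    Returns a list of error strings; an empty list means every write succeeded.
    Newline framing is an assumption; adjust if the receiver expects
    octet-counted frames (RFC 6587).
    """
    errors: List[str] = []
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return [f"connect failed: {exc}"]
    with sock:
        for i in range(count):
            # <14> = facility user (1) * 8 + severity info (6)
            line = f"<14>tcp-probe test message {i}\n".encode()
            try:
                sock.sendall(line)
            except OSError as exc:  # BrokenPipeError / ConnectionResetError end up here
                errors.append(f"write {i} failed: {exc}")
                break
            time.sleep(interval)
    return errors


# Usage (placeholder hostname from this post):
# print(probe_tcp_syslog("vRLI.ilb.dns", 514, count=30, interval=2.0))
```

Because a write to a socket the peer has already closed often succeeds once (the data is only buffered), the probe keeps writing at an interval so that a later write can surface the broken pipe.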
We therefore suspected the kernel's TCP stack configuration and added the following settings under /etc/sysctl.d on the Log Insight nodes:
# Max buffer size 2^28 = 268,435,456 bytes (256 MiB, ~268 MB)
# Provide adequate buffer memory.
# rmem_max and wmem_max are TCP max buffer size
# settable with setsockopt(), in bytes
# tcp_rmem and tcp_wmem are per socket in bytes.
# tcp_mem is for all TCP streams, in 4096-byte pages.
net.core.rmem_max = 268435456
net.core.wmem_max = 268435456
net.core.rmem_default = 1638400
net.core.wmem_default = 1638400
net.ipv4.tcp_rmem = 4096 1638400 268435456
net.ipv4.tcp_wmem = 4096 1638400 268435456
# This server might have 1500 clients simultaneously, so:
net.ipv4.tcp_mem = 4096 1638400 268435456
# Disable TCP SACK (TCP Selective Acknowledgement),
# DSACK (duplicate TCP SACK), and FACK (Forward Acknowledgement)
net.ipv4.tcp_sack = 0
net.ipv4.tcp_dsack = 0
net.ipv4.tcp_fack = 0
#
net.ipv4.tcp_max_syn_backlog = 100000
net.core.somaxconn = 100000
net.core.netdev_max_backlog = 100000
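Since `net.ipv4.tcp_mem` is counted in pages rather than bytes (as the comment in the file says), the same three numbers mean something very different there than in `tcp_rmem`/`tcp_wmem`. A small sketch of the conversion, assuming a 4096-byte page size as stated in that comment:

```python
PAGE_SIZE = 4096    # bytes per page (assumed; `getconf PAGESIZE` shows the real value)
MIB = 1024 ** 2     # bytes per MiB


def to_mib(values, unit_bytes):
    """Convert a sysctl triple (min, default/pressure, max) to MiB."""
    return tuple(v * unit_bytes / MIB for v in values)


tcp_rmem = (4096, 1638400, 268435456)   # per socket, in bytes
tcp_mem = (4096, 1638400, 268435456)    # all TCP sockets together, in pages

print("tcp_rmem MiB:", to_mib(tcp_rmem, 1))
# tcp_rmem MiB: (0.00390625, 1.5625, 256.0)
print("tcp_mem  MiB:", to_mib(tcp_mem, PAGE_SIZE))
# tcp_mem  MiB: (16.0, 6400.0, 1048576.0)
```

So the per-socket maximum is 256 MiB, while the third `tcp_mem` value allows roughly 1 TiB of TCP buffer memory system-wide, which is far beyond the RAM of any node.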
Even after changing the buffer sizes and the other TCP parameters on the Log Insight nodes, we still see packet drops and the socket error on the ESXi hosts:
root@LogInight[ / ]# netstat -s | grep "SYNs to LISTEN"
59848 SYNs to LISTEN sockets dropped
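The `netstat -s` counter is cumulative since boot, so a single reading does not show whether drops are still occurring. `netstat` takes this line from the `ListenDrops` field of the `TcpExt` section in /proc/net/netstat; the related `ListenOverflows` field counts how often a listen queue overflowed. A minimal sketch that samples both twice and reports the increase, assuming Python 3 is available on the node; the function names and the 10-second interval are illustrative:

```python
import time


def read_tcpext(text: str) -> dict:
    """Parse the TcpExt header/value line pair from /proc/net/netstat content."""
    lines = text.splitlines()
    # The file alternates a header line and a value line per section.
    for header, values in zip(lines[::2], lines[1::2]):
        if header.startswith("TcpExt:"):
            names = header.split()[1:]
            nums = [int(v) for v in values.split()[1:]]
            return dict(zip(names, nums))
    return {}


def listen_drop_delta(interval: float = 10.0, path: str = "/proc/net/netstat") -> dict:
    """Return how much ListenOverflows and ListenDrops grew during `interval` seconds."""
    with open(path) as fh:
        before = read_tcpext(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = read_tcpext(fh.read())
    return {k: after.get(k, 0) - before.get(k, 0)
            for k in ("ListenOverflows", "ListenDrops")}


# Usage on a Log Insight node:
# print(listen_drop_delta(10.0))
```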
Is this issue already known, and if so, is there a workaround?
Regards
Garimos