VMware Workstation

  • 1.  CentOS Stream 9 host recent kernels hang due to RCU changes

    Posted Dec 29, 2024 11:55 AM

    I've used Workstation Pro successfully on CentOS Stream 9 for years, using external "vmware-host-modules" projects by mkubecek@github and now the fork by bytium@github.

    Beginning with kernel 5.14.0-522.el9.x86_64, the vmnet module uses 'rcu_read_lock()' and 'rcu_read_unlock()' instead of 'read_lock()' and 'read_unlock()' (in vmnet-only/vmnetInt.h). Workstation Pro 17.5.1 through 17.6.2 run well on 522 with this change. However, they do not run well on any kernel release after 522.

    After booting a kernel later than 522, rebuilding and installing the modules, and starting the vmware service, everything initially seems fine. But as soon as I start a VM, all applications on the host start to hang, including Workstation itself. Ultimately the host cannot be shut down properly and has to be powered off manually.

    Early in the process, before things become completely unusable, the systemd journal contains entries like these:

    Dec 17 11:48:30 dln kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 11-...D } 60240 jiffies s: 2261 root: 0x2/.
    Dec 17 11:48:30 dln kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:10-19:0x2/.
    Dec 17 11:50:25 dln kernel: INFO: task P2P_DISCOVER:8835 blocked for more than 122 seconds.
    Dec 17 11:50:25 dln kernel:       Tainted: P        W  OE     -------  ---  5.14.0-542.el9.x86_64 #1
    Dec 17 11:50:25 dln kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 17 11:50:25 dln kernel: INFO: task DNS Res~ver #34:50875 blocked for more than 122 seconds.
    Dec 17 11:50:25 dln kernel:       Tainted: P        W  OE     -------  ---  5.14.0-542.el9.x86_64 #1
    Dec 17 11:50:25 dln kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 17 11:50:25 dln kernel: INFO: task vmware:67502 blocked for more than 122 seconds.
    Dec 17 11:50:25 dln kernel:       Tainted: P        W  OE     -------  ---  5.14.0-542.el9.x86_64 #1
    Dec 17 11:50:25 dln kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 17 11:50:25 dln kernel: INFO: task kworker/u80:1:67285 blocked for more than 122 seconds.
    Dec 17 11:50:25 dln kernel:       Tainted: P        W  OE     -------  ---  5.14.0-542.el9.x86_64 #1
    Dec 17 11:50:25 dln kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 17 11:50:25 dln kernel: INFO: task vmx-vmem:67452 blocked for more than 122 seconds.
    Dec 17 11:50:25 dln kernel:       Tainted: P        W  OE     -------  ---  5.14.0-542.el9.x86_64 #1
    Dec 17 11:50:25 dln kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Dec 17 11:51:31 dln kernel: rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 11-...D } 240463 jiffies s: 2261 root: 0x2/.
    Dec 17 11:51:31 dln kernel: rcu: blocking rcu_node structures (internal RCU debug): l=1:10-19:0x2/.
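
    When triaging a journal like the one above, it can help to pull out just the hung-task and RCU-stall reports. The following is a minimal Python sketch, not part of the original report; the function name `summarize` and both regular expressions are assumptions written against the exact message formats shown in this excerpt, and other kernels may word these messages differently.

```python
import re

# Assumed pattern for hung-task reports such as:
#   "INFO: task vmware:67502 blocked for more than 122 seconds."
# The task name may itself contain spaces or colons (e.g. "kworker/u80:1").
HUNG_TASK = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

# Assumed pattern for RCU expedited-stall reports; captures the jiffies count.
RCU_STALL = re.compile(
    r"rcu: INFO: \S+ detected expedited stalls on CPUs/tasks: "
    r"\{(?P<cpus>[^}]*)\} (?P<jiffies>\d+) jiffies"
)


def summarize(lines):
    """Return (hung_tasks, stall_jiffies) found in an iterable of journal lines.

    hung_tasks is a list of (task_name, pid) tuples in order of appearance;
    stall_jiffies is a list of the jiffies values from each RCU stall report.
    """
    hung_tasks = []
    stall_jiffies = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hung_tasks.append((match.group("name"), int(match.group("pid"))))
            continue
        match = RCU_STALL.search(line)
        if match:
            stall_jiffies.append(int(match.group("jiffies")))
    return hung_tasks, stall_jiffies
```

    The function could be fed the saved output of `journalctl -k`, one line per list element. A stall counter that keeps growing between reports, as it does in the two stall lines above, suggests the same stall is still in progress rather than a new one.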

    I've already reported this to Red Hat, but I'm not sure how quickly they can act on it, since reproducing the problem also requires launching a VM with Workstation. I hope posting the issue here will help. Otherwise I seem to be stuck on kernel release 522.



  • 2.  RE: CentOS Stream 9 host recent kernels hang due to RCU changes

    Broadcom Employee
    Posted Dec 31, 2024 02:19 AM

    dnadle A ticket has been raised internally; the relevant team will look into it.




  • 3.  RE: CentOS Stream 9 host recent kernels hang due to RCU changes

    Posted Jan 03, 2025 02:18 AM

    Red Hat released kernel 5.14.0-547.el9.x86_64, which appears to have fixed the problem.




  • 4.  RE: CentOS Stream 9 host recent kernels hang due to RCU changes

    Broadcom Employee
    Posted Jan 03, 2025 04:28 AM

    dnadle Are you able to launch Workstation after updating to kernel version 5.14.0-547.el9.x86_64?




  • 5.  RE: CentOS Stream 9 host recent kernels hang due to RCU changes

    Posted Jan 03, 2025 07:49 AM

    Yes. Running 2 VMs since last night after updating to 547. So far so good.

    Red Hat made this comment on my bug report:

    A x86 specific bug was introduced in the -540 release due to the backport of CVE-2024-50102. The fix was merged into the -547 release today. I would suggest you try out a -547 or later releases when you have access to them to see if it can fix this issue.

    That specific bug can crash the kernel.
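
    For anyone who wants to check whether a given host is running one of the builds named in that comment, here is a small Python sketch. It is illustrative only: the helper names `el9_build` and `is_affected` are hypothetical, and the range (540 up to but not including 547) is taken solely from the Red Hat comment quoted above. The original post reports hangs on every kernel after 522, so treat the lower bound as an assumption rather than a confirmed fact.

```python
import re

# Build numbers taken from the Red Hat comment quoted in this thread.
FIRST_BAD_BUILD = 540    # regression reportedly introduced in -540
FIRST_FIXED_BUILD = 547  # fix reportedly merged in -547


def el9_build(release):
    """Return the build number from a release string such as
    '5.14.0-542.el9.x86_64', or None if the string has another shape."""
    match = re.fullmatch(r"5\.14\.0-(\d+)\.el9\.\w+", release)
    return int(match.group(1)) if match else None


def is_affected(release):
    """True if the release falls in the assumed affected range [540, 547)."""
    build = el9_build(release)
    return build is not None and FIRST_BAD_BUILD <= build < FIRST_FIXED_BUILD
```

    On a running host, `is_affected(platform.release())` (after `import platform`) would apply the check to the current kernel.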