CPU hang with Fling 2.1

6. RE: CPU hang with Fling 2.1


Broadcom Employee

Cyprien Laplace

Posted Jun 16, 2025 05:19 PM
Edited by Cyprien Laplace 10 days ago

Hi Xeroxxx, can you try adding monitor_control.disable_mmu_largepages = "TRUE" in your .vmx (or /etc/vmware/config)?

EDIT: fixed the global config file path.
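For anyone scripting this change across several VMs, here is a minimal sketch of adding the option to the text of a .vmx file. It assumes the usual `key = "value"` line format and that keys can be matched case-insensitively (my assumption, not something confirmed in this thread). The helper name `set_vmx_option` is made up for illustration. Apply the result only while the VM is powered off.

```python
def set_vmx_option(vmx_text: str, key: str, value: str) -> str:
    """Return vmx_text with `key = "value"` set exactly once.

    Hypothetical helper: replaces an existing entry for `key` if one is
    present, otherwise appends a new line at the end.
    """
    new_line = f'{key} = "{value}"'
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        name = line.split("=", 1)[0].strip()
        # Assumption: key names are compared case-insensitively.
        if "=" in line and name.lower() == key.lower():
            if not replaced:
                out.append(new_line)
                replaced = True
            # Any further duplicate of the same key is dropped.
        else:
            out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = '.encoding = "UTF-8"\nnumvcpus = "4"\n'
    print(set_vmx_option(sample, "monitor_control.disable_mmu_largepages", "TRUE"))
```

The function works on a string rather than a file path so it can be tried on a copy of the .vmx first and diffed before anything is written back to the datastore.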


Original Message:
Sent: Jan 14, 2025 08:55 AM
From: Xeroxxx
Subject: CPU hang with Fling 2.1

I might have found the issue.

It seems to be related to Mem.ShareForceSalting = 0. Setting it back to 2 on one host solves the issue.

I used it to save on precious memory with a lot of small machines.

Can you replicate the issue with setting it to 0?

EDIT: Never mind, it still happens.

[102941.193995] CPU: 0 UID: 0 PID: 8465 Comm: kworker/0:0 Tainted: G             L     6.12.6-arm64 #1  Debian 6.12.6-1
[102941.194004] Tainted: [L]=SOFTLOCKUP
[102941.194006] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24405116.BA64.2411261552 11/26/2024
[102941.194012] Workqueue: events drm_fb_helper_damage_work [drm_kms_helper]
[102941.194036] pstate: a0000005 (NzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[102941.194040] pc : __memcpy+0x128/0x240
[102941.194053] lr : vmw_diff_memcpy+0x348/0x670 [vmwgfx]

[100163.653874] CPU: 1 PID: 227739 Comm: ib_tpool_worker Not tainted 5.15.0-130-generic #140-Ubuntu
[100163.653883] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24405116.BA64.2411261552 11/26/2024
[100163.653886] pstate: 40400005 (nZcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[100163.653910] pc : arch_local_irq_enable+0xc/0x2c
[100163.653956] lr : copy_process+0xb3c/0x12b0

Cheers

Xeroxxx

Original Message:
Sent: Jan 08, 2025 10:51 AM
From: Cyprien Laplace
Subject: CPU hang with Fling 2.1

Hi Xeroxxx,

I will have to try to reproduce it to get a better idea of what's going on. How many vCPUs and how much memory does your VM have?

I suppose the Linux distribution doesn't matter. It was OK with Fling v1, and it is happening on both Debian and Ubuntu.

Cyprien

Original Message:
Sent: Jan 07, 2025 04:19 AM
From: Xeroxxx
Subject: CPU hang with Fling 2.1

Same on Ubuntu (kernel 5.15.x).

[30392.792642] watchdog: BUG: soft lockup - CPU#0 stuck for 38s! [kcompactd0:32]
[30392.797306] Modules linked in: tls xt_nat xt_tcpudp nf_conntrack_netlink veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink overlay vmw_vsock_vmci_transport vsock binfmt_misc nls_iso8859_1 joydev input_leds vmw_vmci sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid vmwgfx ttm crct10dif_ce drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm xhci_pci ahci xhci_pci_renesas vmxnet3 aes_neon_bs aes_neon_blk crypto_simd cryptd
[30392.797619] CPU: 0 PID: 32 Comm: kcompactd0 Not tainted 5.15.0-130-generic #140-Ubuntu
[30392.797629] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24405116.BA64.2411261552 11/26/2024
[30392.797633] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[30392.797643] pc : isolate_freepages_block+0x3ac/0x4b0
[30392.797688] lr : isolate_freepages_block+0x328/0x4b0
[30392.797692] sp : ffff80000afab9f0
[30392.797694] x29: ffff80000afab9f0 x28: 0000000000000800 x27: ffff80000afabd38
[30392.797701] x26: 0000000000095800 x25: 0000000000000001 x24: 0000000000000000
[30392.797708] x23: 0000000000000001 x22: ffff80000afabb30 x21: 0000000000000006
[30392.797714] x20: fffffc000055a800 x19: 00000000000956a1 x18: 0000000000000000
[30392.797720] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[30392.797725] x14: ffff80000a96be60 x13: ffff80000a96b948 x12: ffff00007fbfdf80
[30392.797731] x11: ffff00007fbfdf80 x10: 0000000000000001 x9 : 00000000f0000080
[30392.797737] x8 : ffff80000aa95cf8 x7 : 0000000000000020 x6 : ffff0000001fa080
[30392.797743] x5 : ffff80000aa95a70 x4 : 0000000000000001 x3 : ffff80000afabd38
[30392.797749] x2 : 0000000000000000 x1 : ffff00007fbfe5d0 x0 : 0000000000000000
[30392.797755] Call trace:
[30392.797759]  isolate_freepages_block+0x3ac/0x4b0
[30392.797764]  isolate_freepages+0x1c4/0x360
[30392.797767]  compaction_alloc+0x74/0x90
[30392.797771]  unmap_and_move+0x6c/0x3fc
[30392.797779]  migrate_pages+0x364/0x61c
[30392.797783]  compact_zone+0x2b8/0x684
[30392.797787]  proactive_compact_node+0x90/0xdc
[30392.797791]  kcompactd+0x208/0x4d4
[30392.797794]  kthread+0x110/0x114
[30392.797799]  ret_from_fork+0x10/0x20

Original Message:
Sent: Jan 05, 2025 04:12 PM
From: Xeroxxx
Subject: CPU hang with Fling 2.1

Upgraded in place from the last 1.0 Fling to the latest 2.1.

We're getting a lot of CPU hangs and stack traces on Debian 12 / testing.

Is there a known issue?

3 x RPI4 8GB, 07.12.24 EEPROM, RPI 1.38 EFI

[ 5736.409823] watchdog: Watchdog detected hard LOCKUP on cpu 2

[ 5736.410110] Modules linked in: veth nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c overlay nls_ascii nls_cp437 crct10dif_ce vfat vmwgfx fat drm_ttm_helper ttm drm_kms_helper sg drm efi_pstore configfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sr_mod sd_mod cdrom ahci libahci libata scsi_mod scsi_common vmxnet3

[ 5736.410240] Sending NMI from CPU 1 to CPUs 2:

[ 5736.410292] NMI backtrace for cpu 2

[ 5736.410337] CPU: 2 UID: 996 PID: 5205 Comm: postgres Not tainted 6.12.6-arm64 #1 Debian 6.12.6-1

[ 5736.410346] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24405116.BA64.2411261552 11/26/2024

[ 5736.410350] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)

[ 5736.410359] pc : __rmqueue_pcplist+0x58c/0xd70

[ 5736.410405] lr : __rmqueue_pcplist+0x548/0xd70

[ 5736.410411] sp : ffff8000832e35f0

[ 5736.410414] x29: ffff8000832e36a0 x28: 000000000000003f x27: ffff00013f587f30

[ 5736.410422] x26: fffffdffc4822980 x25: 0000000000000000 x24: ffff00013f603640

[ 5736.410430] x23: ffff00013f587f00 x22: 0000000000000001 x21: fffffdffc2e78788

[ 5736.410437] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000

[ 5736.410458] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000

[ 5736.410465] x14: 0000000000000100 x13: 0000000000000000 x12: 0000000000000000

[ 5736.410472] x11: 0000000000000040 x10: 000000000000003f x9 : ffff8000803a8ea8

[ 5736.410479] x8 : 00000000ffffffff x7 : ffff00013f587f30 x6 : ffff8000832e35f0

[ 5736.410486] x5 : ffff00013f603640 x4 : ffff00013f587f30 x3 : 000000000000001a

[ 5736.410492] x2 : fffffdffc2e78788 x1 : ffff00013f587f30 x0 : ffff00013f603bc0

[ 5736.410500] Call trace:

[ 5736.410503] __rmqueue_pcplist+0x58c/0xd70

[ 5736.410522] get_page_from_freelist+0x6b0/0x1b30

[ 5736.410526] __alloc_pages_noprof+0x170/0xf20

[ 5736.410529] alloc_pages_mpol_noprof+0x98/0x208

[ 5736.410547] alloc_pages_noprof+0x50/0xd0

[ 5736.410551] folio_alloc_noprof+0x1c/0x70

[ 5736.410556] filemap_alloc_folio_noprof+0x144/0x160

[ 5736.410567] __filemap_get_folio+0x21c/0x3f0

[ 5736.410572] ext4_da_write_begin+0x118/0x2a8 [ext4]

[ 5736.410632] generic_perform_write+0xd8/0x268

[ 5736.410636] ext4_buffered_write_iter+0x74/0x140 [ext4]

[ 5736.410657] ext4_file_write_iter+0x70/0x8c0 [ext4]

[ 5736.410676] vfs_write+0x24c/0x3b8

[ 5736.410689] __arm64_sys_pwrite64+0xb4/0xf0

[ 5736.410693] invoke_syscall+0x6c/0x100

[ 5736.410716] el0_svc_common.constprop.0+0x48/0xf0

[ 5736.410722] do_el0_svc+0x24/0x38

[ 5736.410727] el0_svc+0x38/0x120

[ 5736.410752] el0t_64_sync_handler+0x120/0x130

[ 5736.410758] el0t_64_sync+0x190/0x198

getconf PAGE_SIZE

4096

6.12.6-arm64 #1 SMP Debian 6.12.6-1 (2024-12-21) aarch64 GNU/Linux

7. RE: CPU hang with Fling 2.1


Xeroxxx

Posted Jun 17, 2025 05:01 PM

Hello Cyprien,

I set it in the .vmx file of the virtual machine while the VM was stopped. It did not solve the problem.

watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [containerd-shim:2848]
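To compare how often and where the lockups hit before and after a config change, a small sketch like the one below can pull the soft-lockup events out of saved `dmesg` text. The regular expression is based only on the message format seen in this thread; other kernel versions may word it differently, and the names `SoftLockup` and `find_soft_lockups` are made up for illustration.

```python
import re
from typing import List, NamedTuple


class SoftLockup(NamedTuple):
    cpu: int      # CPU number reported by the watchdog
    seconds: int  # how long the CPU was reported stuck
    task: str     # task name from the bracketed suffix
    pid: int      # PID from the bracketed suffix


# Pattern derived from the lines quoted in this thread (an assumption,
# not an exhaustive description of the kernel's message format).
_SOFT_LOCKUP = re.compile(
    r"watchdog: BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^\]]+):(\d+)\]"
)


def find_soft_lockups(log_text: str) -> List[SoftLockup]:
    """Return every soft-lockup event found in log_text, in order."""
    return [
        SoftLockup(int(m.group(1)), int(m.group(2)), m.group(3), int(m.group(4)))
        for m in _SOFT_LOCKUP.finditer(log_text)
    ]


if __name__ == "__main__":
    line = "watchdog: BUG: soft lockup - CPU#2 stuck for 26s! [containerd-shim:2848]"
    for event in find_soft_lockups(line):
        print(event)
```

Because it uses `finditer` over the whole text, it also works on logs where several entries were pasted onto one line.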


Original Message:
Sent: Jun 16, 2025 05:18 PM
From: Cyprien Laplace
Subject: CPU hang with Fling 2.1

Hi Xeroxxx, can you try adding monitor_control.disable_mmu_largepages = "TRUE" in your .vmx (or /etc/vmware/esx.conf)?
