I might have found the issue.
It seems to be related to Mem.ShareForceSalting = 0. Setting it back to 2 on one host solves the issue.
I had set it to 0 to save precious memory, since I run a lot of small machines.
EDIT: Never mind, it still happens.
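For anyone who wants to check or revert this option on their own host, here is a minimal sketch using `esxcli` from the ESXi shell. It assumes shell (SSH or console) access is enabled on the host, and it uses the generic `system settings advanced` namespace rather than anything specific to the Fling. As far as I know, 0 allows identical pages to be shared between VMs, while 2 (the default) salts pages per VM so sharing effectively stays within each VM. As the EDIT says, going back to 2 did not stop the lockups, so this only helps rule the setting in or out.

```shell
# Show the current and default value of the option on this host.
esxcli system settings advanced list -o /Mem/ShareForceSalting

# Restore the default salting behaviour (2).
esxcli system settings advanced set -o /Mem/ShareForceSalting -i 2

# Alternatively, reset the option to its built-in default value.
esxcli system settings advanced set -o /Mem/ShareForceSalting -d
```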
Original Message:
Sent: Jan 08, 2025 10:51 AM
From: Cyprien Laplace
Subject: CPU hang with Fling 2.1
Hi Xeroxxx,
I will have to try to reproduce it to get a better idea of what's going on. How many vCPUs and how much memory does your VM have?
I suppose the Linux distribution doesn't matter: it was OK with Fling v1, and it is happening on both Debian and Ubuntu.
Cyprien
Original Message:
Sent: Jan 07, 2025 04:19 AM
From: Xeroxxx
Subject: CPU hang with Fling 2.1
Same on Ubuntu with kernel 5.15.x:
[30392.792642] watchdog: BUG: soft lockup - CPU#0 stuck for 38s! [kcompactd0:32]
[30392.797306] Modules linked in: tls xt_nat xt_tcpudp nf_conntrack_netlink veth xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfrm_algo nft_counter xt_addrtype nft_compat nf_tables nfnetlink overlay vmw_vsock_vmci_transport vsock binfmt_misc nls_iso8859_1 joydev input_leds vmw_vmci sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua efi_pstore ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor xor_neon raid6_pq libcrc32c raid1 raid0 multipath linear hid_generic usbhid hid vmwgfx ttm crct10dif_ce drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm xhci_pci ahci xhci_pci_renesas vmxnet3 aes_neon_bs aes_neon_blk crypto_simd cryptd
[30392.797619] CPU: 0 PID: 32 Comm: kcompactd0 Not tainted 5.15.0-130-generic #140-Ubuntu
[30392.797629] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24405116.BA64.2411261552 11/26/2024
[30392.797633] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[30392.797643] pc : isolate_freepages_block+0x3ac/0x4b0
[30392.797688] lr : isolate_freepages_block+0x328/0x4b0
[30392.797692] sp : ffff80000afab9f0
[30392.797694] x29: ffff80000afab9f0 x28: 0000000000000800 x27: ffff80000afabd38
[30392.797701] x26: 0000000000095800 x25: 0000000000000001 x24: 0000000000000000
[30392.797708] x23: 0000000000000001 x22: ffff80000afabb30 x21: 0000000000000006
[30392.797714] x20: fffffc000055a800 x19: 00000000000956a1 x18: 0000000000000000
[30392.797720] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[30392.797725] x14: ffff80000a96be60 x13: ffff80000a96b948 x12: ffff00007fbfdf80
[30392.797731] x11: ffff00007fbfdf80 x10: 0000000000000001 x9 : 00000000f0000080
[30392.797737] x8 : ffff80000aa95cf8 x7 : 0000000000000020 x6 : ffff0000001fa080
[30392.797743] x5 : ffff80000aa95a70 x4 : 0000000000000001 x3 : ffff80000afabd38
[30392.797749] x2 : 0000000000000000 x1 : ffff00007fbfe5d0 x0 : 0000000000000000
[30392.797755] Call trace:
[30392.797759] isolate_freepages_block+0x3ac/0x4b0
[30392.797764] isolate_freepages+0x1c4/0x360
[30392.797767] compaction_alloc+0x74/0x90
[30392.797771] unmap_and_move+0x6c/0x3fc
[30392.797779] migrate_pages+0x364/0x61c
[30392.797783] compact_zone+0x2b8/0x684
[30392.797787] proactive_compact_node+0x90/0xdc
[30392.797791] kcompactd+0x208/0x4d4
[30392.797794] kthread+0x110/0x114
[30392.797799] ret_from_fork+0x10/0x20
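Both traces in this thread stop in kernel memory-management code: the compaction path of kcompactd here, and `__rmqueue_pcplist` in the page allocator in the earlier hard-lockup report. To compare call traces from several hosts side by side, here is a minimal sketch of a helper that prints only the function names from a pasted log. `extract_call_trace` is a hypothetical name, not an existing tool, and it assumes the `[seconds.micros]` timestamps and `func+0xOFF/0xLEN` frame format seen in these logs. It also copes with entries run together on one line, as sometimes happens when a log is pasted into a forum.

```shell
#!/bin/sh
# Hypothetical helper: read a kernel log on stdin and print the function
# name of every frame listed after a "Call trace:" marker.
extract_call_trace() {
    awk '
    {
        # Start a new record at every "[ 1234.567890]" timestamp so that
        # run-together log text is split into one entry per record.
        line = $0
        gsub(/\[ *[0-9]+\.[0-9]+]/, "\n&", line)
        n = split(line, recs, "\n")
        for (i = 1; i <= n; i++) {
            rec = recs[i]
            # Drop the timestamp prefix and the blanks after it.
            sub(/^\[ *[0-9]+\.[0-9]+] */, "", rec)
            if (rec ~ /Call trace:/) { in_trace = 1; continue }
            if (!in_trace) continue
            # A frame looks like "func+0xOFF/0xLEN", optionally followed by "[module]".
            if (rec !~ /^[A-Za-z0-9_.]+\+0x[0-9a-f]+\/0x[0-9a-f]+/) {
                if (rec ~ /[^ ]/) in_trace = 0   # non-empty, non-frame text ends the trace
                continue
            }
            split(rec, parts, "+")
            print parts[1]
        }
    }'
}
```

For example, `dmesg | extract_call_trace` on an affected VM, or `extract_call_trace < saved.log` on a log copied from each host.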
Original Message:
Sent: Jan 05, 2025 04:12 PM
From: Xeroxxx
Subject: CPU hang with Fling 2.1
Upgraded in place from the last 1.0 Fling to the latest 2.1.
We're getting a lot of CPU hangs and stack traces on Debian 12 / testing.
Is there a known issue?
3 x RPi 4 (8 GB), EEPROM 07.12.24, RPi EFI firmware 1.38
[ 5736.409823] watchdog: Watchdog detected hard LOCKUP on cpu 2
[ 5736.410110] Modules linked in: veth nf_conntrack_netlink xt_nat xt_tcpudp xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c overlay nls_ascii nls_cp437 crct10dif_ce vfat vmwgfx fat drm_ttm_helper ttm drm_kms_helper sg drm efi_pstore configfs nfnetlink vsock_loopback vmw_vsock_virtio_transport_common vsock efivarfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic sr_mod sd_mod cdrom ahci libahci libata scsi_mod scsi_common vmxnet3
[ 5736.410240] Sending NMI from CPU 1 to CPUs 2:
[ 5736.410292] NMI backtrace for cpu 2
[ 5736.410337] CPU: 2 UID: 996 PID: 5205 Comm: postgres Not tainted 6.12.6-arm64 #1 Debian 6.12.6-1
[ 5736.410346] Hardware name: VMware, Inc. VMware20,1/VBSA, BIOS VMW201.00V.24405116.BA64.2411261552 11/26/2024
[ 5736.410350] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5736.410359] pc : __rmqueue_pcplist+0x58c/0xd70
[ 5736.410405] lr : __rmqueue_pcplist+0x548/0xd70
[ 5736.410411] sp : ffff8000832e35f0
[ 5736.410414] x29: ffff8000832e36a0 x28: 000000000000003f x27: ffff00013f587f30
[ 5736.410422] x26: fffffdffc4822980 x25: 0000000000000000 x24: ffff00013f603640
[ 5736.410430] x23: ffff00013f587f00 x22: 0000000000000001 x21: fffffdffc2e78788
[ 5736.410437] x20: 0000000000000000 x19: 0000000000000000 x18: 0000000000000000
[ 5736.410458] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 5736.410465] x14: 0000000000000100 x13: 0000000000000000 x12: 0000000000000000
[ 5736.410472] x11: 0000000000000040 x10: 000000000000003f x9 : ffff8000803a8ea8
[ 5736.410479] x8 : 00000000ffffffff x7 : ffff00013f587f30 x6 : ffff8000832e35f0
[ 5736.410486] x5 : ffff00013f603640 x4 : ffff00013f587f30 x3 : 000000000000001a
[ 5736.410492] x2 : fffffdffc2e78788 x1 : ffff00013f587f30 x0 : ffff00013f603bc0
[ 5736.410500] Call trace:
[ 5736.410503] __rmqueue_pcplist+0x58c/0xd70
[ 5736.410522] get_page_from_freelist+0x6b0/0x1b30
[ 5736.410526] __alloc_pages_noprof+0x170/0xf20
[ 5736.410529] alloc_pages_mpol_noprof+0x98/0x208
[ 5736.410547] alloc_pages_noprof+0x50/0xd0
[ 5736.410551] folio_alloc_noprof+0x1c/0x70
[ 5736.410556] filemap_alloc_folio_noprof+0x144/0x160
[ 5736.410567] __filemap_get_folio+0x21c/0x3f0
[ 5736.410572] ext4_da_write_begin+0x118/0x2a8 [ext4]
[ 5736.410632] generic_perform_write+0xd8/0x268
[ 5736.410636] ext4_buffered_write_iter+0x74/0x140 [ext4]
[ 5736.410657] ext4_file_write_iter+0x70/0x8c0 [ext4]
[ 5736.410676] vfs_write+0x24c/0x3b8
[ 5736.410689] __arm64_sys_pwrite64+0xb4/0xf0
[ 5736.410693] invoke_syscall+0x6c/0x100
[ 5736.410716] el0_svc_common.constprop.0+0x48/0xf0
[ 5736.410722] do_el0_svc+0x24/0x38
[ 5736.410727] el0_svc+0x38/0x120
[ 5736.410752] el0t_64_sync_handler+0x120/0x130
[ 5736.410758] el0t_64_sync+0x190/0x198
getconf PAGE_SIZE
4096
6.12.6-arm64 #1 SMP Debian 6.12.6-1 (2024-12-21) aarch64 GNU/Linux