
Kernel panic when QAT is used
Resolved (N/A), Public, BUG

Description

This issue is related to the new kernel and the QAT driver. To reproduce it, enable QAT and pass some traffic through a tunnel/VTI:

[  182.257269] kernel BUG at mm/slub.c:304!
[  182.305889] invalid opcode: 0000 [#1] SMP NOPTI
[  345.567183] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G           O      5.10.28-amd64-vyos #1
[  345.668898] Hardware name: Dell EMC VEP1445-V220/VEP1445-V220-CPU, BIOS 3.48.0.9-4 06/26/2019
[  345.772705] RIP: 0010:__slab_free+0x18b/0x340
[  345.826505] Code: 1f 44 00 00 eb 9c 41 f7 46 08 00 0d 21 00 0f 85 26 ff ff ff 4d 85 ed 0f 85 1d ff ff ff 80 4c 24 5b 80 45 31 ff e9 54 ff ff ff <0f> 0b 49 3b 5c 24 28 75 c4 48 8b 44 24 28 49 89 4c 24 28 49 89 44
[  346.053221] RSP: 0018:ffffaed800118e00 EFLAGS: 00010246
[  346.117440] RAX: ffff8f4ee3d45d00 RBX: 000000008020001e RCX: ffff8f4ee3d45c00
[  346.204574] RDX: ffff8f4ee3d45c00 RSI: ffffde6a048f5100 RDI: ffff8f4ec0043600
[  346.291709] RBP: ffffaed800118e98 R08: 0000000000000001 R09: ffffffffc09efa48
[  346.378845] R10: ffff8f4ee3d45c00 R11: 0000000000000001 R12: ffffde6a048f5100
[  346.465981] R13: ffff8f4ee3d45c00 R14: ffff8f4ec0043600 R15: ffffffffb50060c0
[  346.553116] FS:  0000000000000000(0000) GS:ffff8f522fc80000(0000) knlGS:0000000000000000
[  346.651710] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  346.722178] CR2: 000056184b1a89b8 CR3: 000000005020a000 CR4: 00000000003506e0
[  346.809314] Call Trace:
[  346.840202]  <IRQ>
[  346.865880]  ? skb_release_all+0x9/0x20
[  346.913429]  ? xfrm_input+0x2d8/0x1110
[  346.959940]  ? kmem_cache_free+0x39c/0x3c0
[  347.010620]  esp_input_done2+0x258/0x3a0 [esp4]
[  347.066503]  esp_input_done+0xd/0x20 [esp4]
[  347.118232]  adf_handle_response+0x40/0xc0 [intel_qat]
[  347.181408]  adf_response_handler+0x78/0xd0 [intel_qat]
[  347.245618]  tasklet_action_common.isra.21+0x54/0xc0
[  347.306712]  __do_softirq+0xd2/0x227
[  347.351138]  asm_call_irq_on_stack+0x12/0x20
[  347.403897]  </IRQ>
[  347.430618]  do_softirq_own_stack+0x32/0x40
[  347.482336]  irq_exit_rcu+0x98/0xa0
[  347.525721]  common_interrupt+0x73/0x130
[  347.574315]  asm_common_interrupt+0x1e/0x40
[  347.626034] RIP: 0010:cpuidle_enter_state+0xc6/0x2c0
[  347.687127] Code: 89 c6 e8 9d e9 ba ff 45 84 ff 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 ac 01 00 00 31 ff e8 81 d9 bf ff fb 66 0f 1f 44 00 00 <85> db 0f 88 b3 00 00 00 48 63 c3 4c 2b 34 24 48 8d 14 40 48 8d 14
[  347.913845] RSP: 0018:ffffaed80008fe80 EFLAGS: 00000246
[  347.978064] RAX: ffff8f522fca2800 RBX: 0000000000000003 RCX: 000000000000001f
[  348.065199] RDX: 000000506f2e3e8b RSI: 000000003a2e8ba3 RDI: 0000000000000000
[  348.152334] RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000022080
[  348.239470] R10: 000000cb911aaa18 R11: ffff8f522fca18c4 R12: ffff8f522fcab200
[  348.326605] R13: ffffffffb50b4a60 R14: 000000506f2e3e8b R15: 0000000000000000
[  348.413743]  cpuidle_enter+0x24/0x40
[  348.458169]  do_idle+0x24b/0x2a0
[  348.498430]  cpu_startup_entry+0x14/0x20
[  348.547023]  start_secondary+0x110/0x150
[  348.595617]  secondary_startup_64_no_verify+0xb0/0xbb
[  348.657753] Modules linked in: ip_vti jitterentropy_rng drbg ansi_cprng echainiv af_packet twofish_generic twofish_x86_64_3way twofish_x86_64 twofish_common serpent_sse2_x86_64 serpent_generic blowfish_generic blowfish_x86_64 blowfish_common cast5_generic cast_common ctr ecb des_generic libdes cbc algif_skcipher camellia_generic camellia_x86_64 xcbc sha512_ssse3 sha512_generic md4 algif_hash af_alg xfrm_user xfrm4_tunnel tunnel4 ipcomp xfrm_ipcomp esp4 ah4 af_key usdm_drv(O) qat_c3xxx(O) intel_qat(O) dh_generic uio authenc fuse nft_chain_nat xt_CT xt_tcpudp nft_compat nfnetlink_cthelper nft_counter nf_tables nfnetlink nf_nat_pptp nf_conntrack_pptp nf_nat_h323 nf_conntrack_h323 nf_nat_sip nf_conntrack_sip nf_nat_tftp nf_nat_ftp nf_nat nf_conntrack_tftp nf_conntrack_ftp nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ath10k_pci ath10k_core ath mac80211 cfg80211 pnd2_edac x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel libarc4
[  348.657825]  iTCO_wdt iTCO_vendor_support tpm_crb pcspkr tpm_tis tpm_tis_core tpm evdev crypto_simd cryptd glue_helper rng_core rapl button intel_cstate acpi_cpufreq mpls_iptunnel mpls_router ip_tunnel mpls_gso br_netfilter bridge stp llc ip_tables x_tables autofs4 nls_cp437 vfat fat ohci_hcd uhci_hcd ehci_hcd squashfs zstd_decompress lz4_decompress loop overlay ext4 crc32c_generic crc16 mbcache jbd2 nls_ascii usb_storage sd_mod t10_pi mmc_block ahci libahci sdhci_pci cqhci crc32c_intel sdhci xhci_pci ixgbe libata i2c_i801 xfrm_algo i2c_smbus mmc_core mdio scsi_mod xhci_hcd i2c_ismt igb i2c_algo_bit thermal
[  350.389334] ---[ end trace d859569d05404950 ]---
[  350.448482] RIP: 0010:__slab_free+0x18b/0x340
[  350.516714] Code: 1f 44 00 00 eb 9c 41 f7 46 08 00 0d 21 00 0f 85 26 ff ff ff 4d 85 ed 0f 85 1d ff ff ff 80 4c 24 5b 80 45 31 ff e9 54 ff ff ff <0f> 0b 49 3b 5c 24 28 75 c4 48 8b 44 24 28 49 89 4c 24 28 49 89 44
[  350.743439] RSP: 0018:ffffaed800118e00 EFLAGS: 00010246
[  350.807650] RAX: ffff8f4ee3d45d00 RBX: 000000008020001e RCX: ffff8f4ee3d45c00
[  350.894787] RDX: ffff8f4ee3d45c00 RSI: ffffde6a048f5100 RDI: ffff8f4ec0043600
[  350.998588] RBP: ffffaed800118e98 R08: 0000000000000001 R09: ffffffffc09efa48
[  351.085726] R10: ffff8f4ee3d45c00 R11: 0000000000000001 R12: ffffde6a048f5100
[  351.172859] R13: ffff8f4ee3d45c00 R14: ffff8f4ec0043600 R15: ffffffffb50060c0
[  351.259995] FS:  0000000000000000(0000) GS:ffff8f522fc80000(0000) knlGS:0000000000000000
[  351.358589] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  351.429058] CR2: 000056184b1a89b8 CR3: 000000005020a000 CR4: 00000000003506e0
[  351.516193] Kernel panic - not syncing: Fatal exception in interrupt
[  351.594004] Kernel Offset: 0x33000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  351.727083] Rebooting in 60 seconds..
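
For triage, a trace like the one above can be reduced to its call-trace frames and the module each frame belongs to. The following is only a rough sketch in Python, not part of the original report: the names `FRAME_RE`, `parse_frames` and `unrelocate` are made up for illustration, and the regular expression assumes the `[ timestamp ]  symbol+0xoff/0xlen [module]` layout seen in this particular log. The `unrelocate` helper simply subtracts the value printed on the "Kernel Offset" line from a runtime kernel-image address, which is the usual way to get back to the address listed in System.map.

```python
import re

# Matches oops call-trace frames such as
#   "[  347.010620]  esp_input_done2+0x258/0x3a0 [esp4]"
# The leading "? " (unreliable frame) and the trailing "[module]" are optional.
# Assumption: the log uses the layout shown in this task; other kernels may differ.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(?P<unreliable>\? )?"
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?\s*$"
)


def parse_frames(log_text):
    """Return a list of (symbol, module, reliable) for each call-trace frame."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            module = m["module"] or "vmlinux"  # no [module] tag: core kernel
            frames.append((m["symbol"], module, m["unreliable"] is None))
    return frames


def unrelocate(addr, kaslr_offset):
    """Map a runtime kernel-image address back to its link-time address."""
    return addr - kaslr_offset


# A few lines copied from the trace in this task.
sample = """\
[  346.913429]  ? xfrm_input+0x2d8/0x1110
[  347.010620]  esp_input_done2+0x258/0x3a0 [esp4]
[  347.118232]  adf_handle_response+0x40/0xc0 [intel_qat]
"""

for symbol, module, reliable in parse_frames(sample):
    print(f"{module:10} {symbol}{'' if reliable else '  (unreliable)'}")

# Kernel Offset from the log: 0x33000000
print(hex(unrelocate(0xFFFFFFFFB50060C0, 0x33000000)))
```

Run against the full log, this makes it easy to see that the reliable frames leading up to `__slab_free` come from `esp4` and `intel_qat`, i.e. the free happens in the ESP completion callback invoked from the QAT response tasklet.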

Details

Difficulty level
Hard (possibly days)
Version
1.4-rolling-202104091411
Why the issue appeared?
Will be filled on close
Is it a breaking change?
Unspecified (possibly destroys the router)

Related Objects

Status          Subtype          Assigned  Task
Resolved (N/A)  FEATURE REQUEST  None
Resolved (N/A)  BUG              None

Event Timeline

Unknown Object (User) created this task. Apr 18 2021, 8:09 PM
Unknown Object (User) updated the task description. Apr 18 2021, 8:13 PM
dmbaturin added a subscriber: dmbaturin.

I presume the issue is no longer relevant, since people do successfully use QAT now, but feel free to reopen it if the problem comes up again.