The following commit has been merged in the master branch:

commit 268325bda5299836a6ad4c3952474a2be125da5f
Merge: ca1443c7e75a28c6fde5c67cb1904b624cf43c36 3e6743e28b9b43d37ced234bdf8e19955d0216f8
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Dec 12 16:22:22 2022 -0800
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it; there is now a new family of functions that uses fast rejection sampling to choose properly uniform random numbers within an interval:

      get_random_u32_below(ceil)            -  [0, ceil)
      get_random_u32_above(floor)           -  (floor, U32_MAX]
      get_random_u32_inclusive(floor, ceil) -  [floor, ceil]
Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree; a short sketch of typical conversions follows this item.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week.
This is a treewide change that touches many files throughout.
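A minimal sketch of the kind of conversion this covers, using made-up driver helpers around the real API (only get_random_u32_below/above/inclusive come from this series; the surrounding functions and values are illustrative):

    #include <linux/random.h>
    #include <linux/types.h>

    /*
     * Old patterns removed or converted by this series:
     *
     *     delay = prandom_u32_max(100);          // [0, 100)
     *     delay = 10 + prandom_u32() % 90;       // open-coded, slightly biased
     *
     * New interval helpers, uniform via fast rejection sampling:
     */
    static u32 pick_retry_delay_ms(void)
    {
            return get_random_u32_inclusive(10, 99);        /* [10, 99] */
    }

    static u32 pick_backoff_slot(u32 nslots)
    {
            return get_random_u32_below(nslots);            /* [0, nslots) */
    }

    static u32 pick_nonzero_tag(void)
    {
            return get_random_u32_above(0);                 /* (0, U32_MAX] */
    }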
- More consistent use of get_random_canary().
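Roughly, "more consistent use" means the per-architecture canary setup now funnels through one helper; a simplified, non-architecture-accurate sketch (real boot_init_stack_canary() implementations also update per-CPU/TLS copies of the canary, omitted here):

    #include <linux/sched.h>
    #include <linux/stackprotector.h>  /* get_random_canary() lives here after this series */

    static __always_inline void sketch_init_stack_canary(void)
    {
            /* get_random_canary() is get_random_long() masked with CANARY_MASK. */
            current->stack_canary = get_random_canary();
    }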
- Updates to comments, documentation, tests, headers, and simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage.
These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing a sleep-loop wart.
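The shape of that change, mirroring the lib/vsprintf.c hunk further down in this diff (simplified; the real code also publishes a "key is filled" flag behind a write barrier):

    #include <linux/random.h>
    #include <linux/notifier.h>
    #include <linux/siphash.h>
    #include <linux/cache.h>
    #include <linux/init.h>

    static siphash_key_t ptr_key __read_mostly;

    /* Runs once, as soon as the RNG is initialized (or immediately if it
     * already is), instead of re-queueing a delayed work every 2 seconds. */
    static int fill_ptr_key(struct notifier_block *nb, unsigned long action, void *data)
    {
            get_random_bytes(&ptr_key, sizeof(ptr_key));
            return NOTIFY_DONE;
    }

    static int __init vsprintf_init_hashval(void)
    {
            static struct notifier_block fill_ptr_key_nb = { .notifier_call = fill_ptr_key };

            execute_with_initialized_rng(&fill_ptr_key_nb);
            return 0;
    }
    subsys_initcall(vsprintf_init_hashval);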
- The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable.
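In header-sketch form (close to, though not guaranteed identical to, the resulting include/linux/random.h logic):

    #ifdef LATENT_ENTROPY_PLUGIN
    static inline void add_latent_entropy(void)
    {
            add_device_randomness((const void *)&latent_entropy, sizeof(latent_entropy));
    }
    #else
    static inline void add_latent_entropy(void)
    {
            /*
             * No latent entropy variable exists without the plugin, but
             * add_device_randomness() still mixes in a cycle counter, so the
             * fork path keeps contributing something.
             */
            add_device_randomness(NULL, 0);
    }
    #endif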
- The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2).
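A minimal sketch of the delayed-worker pattern being described; the names and the interval are illustrative, not the actual random.c values:

    #include <linux/workqueue.h>
    #include <linux/jiffies.h>

    static void crng_reseed_workfn(struct work_struct *work);
    static DECLARE_DELAYED_WORK(crng_reseed_work, crng_reseed_workfn);

    static void crng_reseed_workfn(struct work_struct *work)
    {
            /* ... extract from the input pool and rekey the base CRNG here ... */

            /*
             * Re-arm unconditionally: reseeding now happens on a schedule in
             * process context, where slow CPU RNG instructions (e.g. on s390x)
             * are fine to use, rather than on demand from readers.
             */
            queue_delayed_work(system_unbound_wq, &crng_reseed_work, 60 * HZ);
    }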
- The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies.
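A sketch of the two structural points in that item, with illustrative names (the real entropy_timer_state and its timer callback in drivers/char/random.c differ in detail):

    #include <linux/timer.h>
    #include <linux/atomic.h>
    #include <linux/cache.h>
    #include <linux/timex.h>   /* random_get_entropy() */

    /* Keep everything the sampler touches on one cache line, so running the
     * timer callback on another CPU reliably produces cross-CPU traffic. */
    struct jitter_timer_state {
            unsigned long entropy;
            struct timer_list timer;
            atomic_t samples;
    } ____cacheline_aligned_in_smp;

    static void jitter_timer_fired(struct timer_list *t)
    {
            struct jitter_timer_state *state = from_timer(state, t, timer);

            /* Mix in a cycle counter taken at fire time, in addition to what
             * the main loop mixes in; the real code then re-arms the timer
             * with add_timer_on() on a different CPU each round. */
            state->entropy ^= random_get_entropy();
            atomic_inc(&state->samples);
    }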
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
  random: include <linux/once.h> in the right header
  random: align entropy_timer_state to cache line
  random: mix in cycle counter when jitter timer fires
  random: spread out jitter callback to different CPUs
  random: remove extraneous period and add a missing one in comments
  efi: random: refresh non-volatile random seed when RNG is initialized
  vsprintf: initialize siphash key using notifier
  random: add back async readiness notifier
  random: reseed in delayed work rather than on-demand
  random: always mix cycle counter in add_latent_entropy()
  hw_random: use add_hwgenerator_randomness() for early entropy
  random: modernize documentation comment on get_random_bytes()
  random: adjust comment to account for removed function
  random: remove early archrandom abstraction
  random: use random.trust_{bootloader,cpu} command line option only
  stackprotector: actually use get_random_canary()
  stackprotector: move get_random_canary() into stackprotector.h
  treewide: use get_random_u32_inclusive() when possible
  treewide: use get_random_u32_{above,below}() instead of manual loop
  treewide: use get_random_u32_below() instead of deprecated function
  ...
diff --combined Documentation/admin-guide/kernel-parameters.txt index 75f6d485df8e,78493797460f..b36c0e0fbc83 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@@ -703,17 -703,6 +703,17 @@@ condev= [HW,S390] console device conmode=
+ con3215_drop= [S390] 3215 console drop mode. + Format: y|n|Y|N|1|0 + When set to true, drop data on the 3215 console when + the console buffer is full. In this case the + operator using a 3270 terminal emulator (for example + x3270) does not have to enter the clear key for the + console output to advance and the kernel to continue. + This leads to a much faster boot time when a 3270 + terminal emulator is active. If no 3270 terminal + emulator is used, this parameter has no effect. + console= [KNL] Output console device and options.
tty<n> Use the virtual console device <n>. @@@ -842,7 -831,7 +842,7 @@@ memory region [offset, offset + size] for that kernel image. If '@offset' is omitted, then a suitable offset is selected automatically. - [KNL, X86-64] Select a region under 4G first, and + [KNL, X86-64, ARM64] Select a region under 4G first, and fall back to reserve region above 4G when '@offset' hasn't been specified. See Documentation/admin-guide/kdump/kdump.rst for further details. @@@ -862,23 -851,26 +862,23 @@@ available. It will be ignored if crashkernel=X is specified. crashkernel=size[KMG],low - [KNL, X86-64] range under 4G. When crashkernel=X,high + [KNL, X86-64, ARM64] range under 4G. When crashkernel=X,high is passed, kernel could allocate physical memory region above 4G, that cause second kernel crash on system that require some amount of low memory, e.g. swiotlb requires at least 64M+32K low memory, also enough extra low memory is needed to make sure DMA buffers for 32-bit devices won't run out. Kernel would try to allocate - at least 256M below 4G automatically. + default size of memory below 4G automatically. The default + size is platform dependent. + --> x86: max(swiotlb_size_or_default() + 8MiB, 256MiB) + --> arm64: 128MiB This one lets the user specify own low range under 4G for second kernel instead. 0: to disable low allocation. It will be ignored when crashkernel=X,high is not used or memory reserved is below 4G.
- [KNL, ARM64] range in low memory. - This one lets the user specify a low range in the - DMA zone for the crash dump kernel. - It will be ignored when crashkernel=X,high is not used - or memory reserved is located in the DMA zones. - cryptomgr.notests [KNL] Disable crypto self-tests
@@@ -4574,17 -4566,15 +4574,15 @@@
ramdisk_start= [RAM] RAM disk image start address
- random.trust_cpu={on,off} - [KNL] Enable or disable trusting the use of the - CPU's random number generator (if available) to - fully seed the kernel's CRNG. Default is controlled - by CONFIG_RANDOM_TRUST_CPU. - - random.trust_bootloader={on,off} - [KNL] Enable or disable trusting the use of a - seed passed by the bootloader (if available) to - fully seed the kernel's CRNG. Default is controlled - by CONFIG_RANDOM_TRUST_BOOTLOADER. + random.trust_cpu=off + [KNL] Disable trusting the use of the CPU's + random number generator (if available) to + initialize the kernel's RNG. + + random.trust_bootloader=off + [KNL] Disable trusting the use of the a seed + passed by the bootloader (if available) to + initialize the kernel's RNG.
randomize_kstack_offset= [KNL] Enable or disable kernel stack offset @@@ -6967,14 -6957,3 +6965,14 @@@ memory, and other data can't be written using xmon commands. off xmon is disabled. + + amd_pstate= [X86] + disable + Do not enable amd_pstate as the default + scaling driver for the supported processors + passive + Use amd_pstate as a scaling driver, driver requests a + desired performance on this abstract scale and the power + management firmware translates the requests into actual + hardware states (core frequency, data fabric and memory + clocks etc.) diff --combined arch/arm64/kernel/process.c index 19cd05eea3f0,1395a1638427..269ac1c25ae2 --- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@@ -331,8 -331,6 +331,8 @@@ int arch_dup_task_struct(struct task_st clear_tsk_thread_flag(dst, TIF_SME); }
+ dst->thread.fp_type = FP_STATE_FPSIMD; + /* clear any pending asynchronous tag fault raised by the parent */ clear_tsk_thread_flag(dst, TIF_MTE_ASYNC_FAULT);
@@@ -593,7 -591,7 +593,7 @@@ unsigned long __get_wchan(struct task_s unsigned long arch_align_stack(unsigned long sp) { if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) - sp -= prandom_u32_max(PAGE_SIZE); + sp -= get_random_u32_below(PAGE_SIZE); return sp & ~0xf; }
diff --combined arch/loongarch/kernel/process.c index ddb8ba4eb399,77952d814bad..d61c9f465b95 --- a/arch/loongarch/kernel/process.c +++ b/arch/loongarch/kernel/process.c @@@ -152,7 -152,7 +152,7 @@@ int copy_thread(struct task_struct *p, childregs->csr_crmd = p->thread.csr_crmd; childregs->csr_prmd = p->thread.csr_prmd; childregs->csr_ecfg = p->thread.csr_ecfg; - return 0; + goto out; }
/* user thread */ @@@ -171,15 -171,14 +171,15 @@@ */ childregs->csr_euen = 0;
+ if (clone_flags & CLONE_SETTLS) + childregs->regs[2] = tls; + +out: clear_tsk_thread_flag(p, TIF_USEDFPU); clear_tsk_thread_flag(p, TIF_USEDSIMD); clear_tsk_thread_flag(p, TIF_LSX_CTX_LIVE); clear_tsk_thread_flag(p, TIF_LASX_CTX_LIVE);
- if (clone_flags & CLONE_SETTLS) - childregs->regs[2] = tls; - return 0; }
@@@ -294,7 -293,7 +294,7 @@@ unsigned long stack_top(void unsigned long arch_align_stack(unsigned long sp) { if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) - sp -= prandom_u32_max(PAGE_SIZE); + sp -= get_random_u32_below(PAGE_SIZE);
return sp & STACK_ALIGN; } diff --combined arch/s390/kernel/vdso.c index d6df7169c01f,119328e1e2b3..ff7bf4432229 --- a/arch/s390/kernel/vdso.c +++ b/arch/s390/kernel/vdso.c @@@ -44,6 -44,21 +44,6 @@@ struct vdso_data *arch_get_vdso_data(vo return (struct vdso_data *)(vvar_page); }
-static struct page *find_timens_vvar_page(struct vm_area_struct *vma) -{ - if (likely(vma->vm_mm == current->mm)) - return current->nsproxy->time_ns->vvar_page; - /* - * VM_PFNMAP | VM_IO protect .fault() handler from being called - * through interfaces like /proc/$pid/mem or - * process_vm_{readv,writev}() as long as there's no .access() - * in special_mapping_vmops(). - * For more details check_vma_flags() and __access_remote_vm() - */ - WARN(1, "vvar_page accessed remotely"); - return NULL; -} - /* * The VVAR page layout depends on whether a task belongs to the root or * non-root time namespace. Whenever a task changes its namespace, the VVAR @@@ -69,6 -84,11 +69,6 @@@ int vdso_join_timens(struct task_struc mmap_read_unlock(mm); return 0; } -#else -static inline struct page *find_timens_vvar_page(struct vm_area_struct *vma) -{ - return NULL; -} #endif
static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, @@@ -207,7 -227,7 +207,7 @@@ static unsigned long vdso_addr(unsigne end -= len;
if (end > start) { - offset = prandom_u32_max(((end - start) >> PAGE_SHIFT) + 1); + offset = get_random_u32_below(((end - start) >> PAGE_SHIFT) + 1); addr = start + (offset << PAGE_SHIFT); } else { addr = start; diff --combined arch/x86/entry/vdso/vma.c index 3c6b488b2f11,d45c5fcfeac2..b8f3f9b9e53c --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@@ -98,6 -98,24 +98,6 @@@ static int vdso_mremap(const struct vm_ }
#ifdef CONFIG_TIME_NS -static struct page *find_timens_vvar_page(struct vm_area_struct *vma) -{ - if (likely(vma->vm_mm == current->mm)) - return current->nsproxy->time_ns->vvar_page; - - /* - * VM_PFNMAP | VM_IO protect .fault() handler from being called - * through interfaces like /proc/$pid/mem or - * process_vm_{readv,writev}() as long as there's no .access() - * in special_mapping_vmops(). - * For more details check_vma_flags() and __access_remote_vm() - */ - - WARN(1, "vvar_page accessed remotely"); - - return NULL; -} - /* * The vvar page layout depends on whether a task belongs to the root or * non-root time namespace. Whenever a task changes its namespace, the VVAR @@@ -122,6 -140,11 +122,6 @@@ int vdso_join_timens(struct task_struc
return 0; } -#else -static inline struct page *find_timens_vvar_page(struct vm_area_struct *vma) -{ - return NULL; -} #endif
static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, @@@ -187,10 -210,11 +187,10 @@@ pgprot_decrypted(vma->vm_page_prot)); } } else if (sym_offset == image->sym_hvclock_page) { - struct ms_hyperv_tsc_page *tsc_pg = hv_get_tsc_page(); + pfn = hv_get_tsc_pfn();
- if (tsc_pg && vclock_was_used(VDSO_CLOCKMODE_HVCLOCK)) - return vmf_insert_pfn(vma, vmf->address, - virt_to_phys(tsc_pg) >> PAGE_SHIFT); + if (pfn && vclock_was_used(VDSO_CLOCKMODE_HVCLOCK)) + return vmf_insert_pfn(vma, vmf->address, pfn); } else if (sym_offset == image->sym_timens_page) { struct page *timens_page = find_timens_vvar_page(vma);
@@@ -303,7 -327,7 +303,7 @@@ static unsigned long vdso_addr(unsigne end -= len;
if (end > start) { - offset = prandom_u32_max(((end - start) >> PAGE_SHIFT) + 1); + offset = get_random_u32_below(((end - start) >> PAGE_SHIFT) + 1); addr = start + (offset << PAGE_SHIFT); } else { addr = start; diff --combined arch/x86/kernel/module.c index c8a25aed604d,a98687642dd0..d85a6980e263 --- a/arch/x86/kernel/module.c +++ b/arch/x86/kernel/module.c @@@ -53,7 -53,7 +53,7 @@@ static unsigned long int get_module_loa */ if (module_load_offset == 0) module_load_offset = - (prandom_u32_max(1024) + 1) * PAGE_SIZE; + get_random_u32_inclusive(1, 1024) * PAGE_SIZE; mutex_unlock(&module_kaslr_mutex); } return module_load_offset; @@@ -251,12 -251,14 +251,12 @@@ int module_finalize(const Elf_Ehdr *hdr const Elf_Shdr *sechdrs, struct module *me) { - const Elf_Shdr *s, *text = NULL, *alt = NULL, *locks = NULL, + const Elf_Shdr *s, *alt = NULL, *locks = NULL, *para = NULL, *orc = NULL, *orc_ip = NULL, *retpolines = NULL, *returns = NULL, *ibt_endbr = NULL; char *secstrings = (void *)hdr + sechdrs[hdr->e_shstrndx].sh_offset;
for (s = sechdrs; s < sechdrs + hdr->e_shnum; s++) { - if (!strcmp(".text", secstrings + s->sh_name)) - text = s; if (!strcmp(".altinstructions", secstrings + s->sh_name)) alt = s; if (!strcmp(".smp_locks", secstrings + s->sh_name)) @@@ -300,13 -302,12 +300,13 @@@ void *iseg = (void *)ibt_endbr->sh_addr; apply_ibt_endbr(iseg, iseg + ibt_endbr->sh_size); } - if (locks && text) { + if (locks) { void *lseg = (void *)locks->sh_addr; - void *tseg = (void *)text->sh_addr; + void *text = me->core_layout.base; + void *text_end = text + me->core_layout.text_size; alternatives_smp_module_add(me, me->name, lseg, lseg + locks->sh_size, - tseg, tseg + text->sh_size); + text, text_end); }
if (orc && orc_ip) diff --combined arch/x86/kernel/process.c index e436c9c1ef3b,62671ccf0404..40d156a31676 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@@ -600,7 -600,7 +600,7 @@@ static __always_inline void __speculati }
if (updmsr) - write_spec_ctrl_current(msr, false); + update_spec_ctrl_cond(msr); }
static unsigned long speculation_ctrl_update_tif(struct task_struct *tsk) @@@ -965,7 -965,7 +965,7 @@@ early_param("idle", idle_setup) unsigned long arch_align_stack(unsigned long sp) { if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space) - sp -= prandom_u32_max(8192); + sp -= get_random_u32_below(8192); return sp & ~0xf; }
diff --combined arch/x86/xen/enlighten_pv.c index 038da45f057a,745420853a7c..1a2ba31635a5 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@@ -23,7 -23,6 +23,7 @@@ #include <linux/start_kernel.h> #include <linux/sched.h> #include <linux/kprobes.h> +#include <linux/kstrtox.h> #include <linux/memblock.h> #include <linux/export.h> #include <linux/mm.h> @@@ -33,6 -32,7 +33,7 @@@ #include <linux/edd.h> #include <linux/reboot.h> #include <linux/virtio_anchor.h> + #include <linux/stackprotector.h>
#include <xen/xen.h> #include <xen/events.h> @@@ -65,7 -65,6 +66,6 @@@ #include <asm/pgalloc.h> #include <asm/tlbflush.h> #include <asm/reboot.h> - #include <asm/stackprotector.h> #include <asm/hypervisor.h> #include <asm/mach_traps.h> #include <asm/mwait.h> @@@ -114,7 -113,7 +114,7 @@@ static __read_mostly bool xen_msr_safe static int __init parse_xen_msr_safe(char *str) { if (str) - return strtobool(str, &xen_msr_safe); + return kstrtobool(str, &xen_msr_safe); return -EINVAL; } early_param("xen_msr_safe", parse_xen_msr_safe); diff --combined drivers/mmc/core/core.c index de1cc9e1ae57,a1efda85c6f2..f0d19356ad76 --- a/drivers/mmc/core/core.c +++ b/drivers/mmc/core/core.c @@@ -97,8 -97,8 +97,8 @@@ static void mmc_should_fail_request(str !should_fail(&host->fail_mmc_request, data->blksz * data->blocks)) return;
- data->error = data_errors[prandom_u32_max(ARRAY_SIZE(data_errors))]; - data->bytes_xfered = prandom_u32_max(data->bytes_xfered >> 9) << 9; + data->error = data_errors[get_random_u32_below(ARRAY_SIZE(data_errors))]; + data->bytes_xfered = get_random_u32_below(data->bytes_xfered >> 9) << 9; }
#else /* CONFIG_FAIL_MMC_REQUEST */ @@@ -1134,13 -1134,7 +1134,13 @@@ u32 mmc_select_voltage(struct mmc_host mmc_power_cycle(host, ocr); } else { bit = fls(ocr) - 1; - ocr &= 3 << bit; + /* + * The bit variable represents the highest voltage bit set in + * the OCR register. + * To keep a range of 2 values (e.g. 3.2V/3.3V and 3.3V/3.4V), + * we must shift the mask '3' with (bit - 1). + */ + ocr &= 3 << (bit - 1); if (bit != host->ios.vdd) dev_warn(mmc_dev(host), "exceeding card's volts\n"); } @@@ -1484,11 -1478,6 +1484,11 @@@ void mmc_init_erase(struct mmc_card *ca card->pref_erase = 0; }
+static bool is_trim_arg(unsigned int arg) +{ + return (arg & MMC_TRIM_OR_DISCARD_ARGS) && arg != MMC_DISCARD_ARG; +} + static unsigned int mmc_mmc_erase_timeout(struct mmc_card *card, unsigned int arg, unsigned int qty) { @@@ -1771,7 -1760,7 +1771,7 @@@ int mmc_erase(struct mmc_card *card, un !(card->ext_csd.sec_feature_support & EXT_CSD_SEC_ER_EN)) return -EOPNOTSUPP;
- if (mmc_card_mmc(card) && (arg & MMC_TRIM_ARGS) && + if (mmc_card_mmc(card) && is_trim_arg(arg) && !(card->ext_csd.sec_feature_support & EXT_CSD_SEC_GB_CL_EN)) return -EOPNOTSUPP;
@@@ -1801,7 -1790,7 +1801,7 @@@ * identified by the card->eg_boundary flag. */ rem = card->erase_size - (from % card->erase_size); - if ((arg & MMC_TRIM_ARGS) && (card->eg_boundary) && (nr > rem)) { + if ((arg & MMC_TRIM_OR_DISCARD_ARGS) && card->eg_boundary && nr > rem) { err = mmc_do_erase(card, from, from + rem - 1, arg); from += rem; if ((err) || (to <= from)) diff --combined drivers/net/phy/at803x.c index d49965907561,b07513c61c35..22f4458274aa --- a/drivers/net/phy/at803x.c +++ b/drivers/net/phy/at803x.c @@@ -870,10 -870,8 +870,10 @@@ static int at803x_probe(struct phy_devi .wolopts = 0, };
- if (ccr < 0) + if (ccr < 0) { + ret = ccr; goto err; + } mode_cfg = ccr & AT803X_MODE_CFG_MASK;
switch (mode_cfg) { @@@ -1760,7 -1758,7 +1760,7 @@@ static int qca808x_phy_fast_retrain_con
static int qca808x_phy_ms_random_seed_set(struct phy_device *phydev) { - u16 seed_value = prandom_u32_max(QCA808X_MASTER_SLAVE_SEED_RANGE); + u16 seed_value = get_random_u32_below(QCA808X_MASTER_SLAVE_SEED_RANGE);
return at803x_debug_reg_mask(phydev, QCA808X_PHY_DEBUG_LOCAL_SEED, QCA808X_MASTER_SLAVE_SEED_CFG, diff --combined drivers/scsi/scsi_debug.c index bebda917b138,a86db9761d00..a0797101a8a0 --- a/drivers/scsi/scsi_debug.c +++ b/drivers/scsi/scsi_debug.c @@@ -5702,16 -5702,16 +5702,16 @@@ static int schedule_resp(struct scsi_cm u64 ns = jiffies_to_nsecs(delta_jiff);
if (sdebug_random && ns < U32_MAX) { - ns = prandom_u32_max((u32)ns); + ns = get_random_u32_below((u32)ns); } else if (sdebug_random) { ns >>= 12; /* scale to 4 usec precision */ if (ns < U32_MAX) /* over 4 hours max */ - ns = prandom_u32_max((u32)ns); + ns = get_random_u32_below((u32)ns); ns <<= 12; } kt = ns_to_ktime(ns); } else { /* ndelay has a 4.2 second max */ - kt = sdebug_random ? prandom_u32_max((u32)ndelay) : + kt = sdebug_random ? get_random_u32_below((u32)ndelay) : (u32)ndelay; if (ndelay < INCLUSIVE_TIMING_MAX_NS) { u64 d = ktime_get_boottime_ns() - ns_from_boot; @@@ -7323,12 -7323,8 +7323,12 @@@ static int sdebug_add_host_helper(int p dev_set_name(&sdbg_host->dev, "adapter%d", sdebug_num_hosts);
error = device_register(&sdbg_host->dev); - if (error) + if (error) { + spin_lock(&sdebug_host_list_lock); + list_del(&sdbg_host->host_list); + spin_unlock(&sdebug_host_list_lock); goto clean; + }
++sdebug_num_hosts; return 0; diff --combined fs/ceph/inode.c index bad9eeb6a1a5,fb255988dee8..d53399966a2d --- a/fs/ceph/inode.c +++ b/fs/ceph/inode.c @@@ -362,7 -362,7 +362,7 @@@ static int ceph_fill_fragtree(struct in if (nsplits != ci->i_fragtree_nsplits) { update = true; } else if (nsplits) { - i = prandom_u32_max(nsplits); + i = get_random_u32_below(nsplits); id = le32_to_cpu(fragtree->splits[i].frag); if (!__ceph_find_frag(ci, id)) update = true; @@@ -2492,7 -2492,7 +2492,7 @@@ int ceph_getattr(struct user_namespace struct inode *parent;
parent = ceph_lookup_inode(sb, ceph_ino(inode)); - if (!parent) + if (IS_ERR(parent)) return PTR_ERR(parent);
pci = ceph_inode(parent); diff --combined kernel/fork.c index cfb09ca1b1bc,ec57cae58ff1..89b8b6c08592 --- a/kernel/fork.c +++ b/kernel/fork.c @@@ -75,7 -75,6 +75,6 @@@ #include <linux/freezer.h> #include <linux/delayacct.h> #include <linux/taskstats_kern.h> - #include <linux/random.h> #include <linux/tty.h> #include <linux/fs_struct.h> #include <linux/magic.h> @@@ -97,6 -96,7 +96,7 @@@ #include <linux/scs.h> #include <linux/io_uring.h> #include <linux/bpf.h> + #include <linux/stackprotector.h>
#include <asm/pgalloc.h> #include <linux/uaccess.h> @@@ -535,9 -535,6 +535,9 @@@ void put_task_stack(struct task_struct
void free_task(struct task_struct *tsk) { +#ifdef CONFIG_SECCOMP + WARN_ON_ONCE(tsk->seccomp.filter); +#endif release_user_cpus_ptr(tsk); scs_release(tsk);
@@@ -2046,6 -2043,15 +2046,6 @@@ static __latent_entropy struct task_str return ERR_PTR(-EINVAL); }
- /* - * If the new process will be in a different time namespace - * do not allow it to share VM or a thread group with the forking task. - */ - if (clone_flags & (CLONE_THREAD | CLONE_VM)) { - if (nsp->time_ns != nsp->time_ns_for_children) - return ERR_PTR(-EINVAL); - } - if (clone_flags & CLONE_PIDFD) { /* * - CLONE_DETACHED is blocked so that we can potentially @@@ -2400,6 -2406,12 +2400,6 @@@
spin_lock(¤t->sighand->siglock);
- /* - * Copy seccomp details explicitly here, in case they were changed - * before holding sighand lock. - */ - copy_seccomp(p); - rv_task_fork(p);
rseq_fork(p, clone_flags); @@@ -2416,14 -2428,6 +2416,14 @@@ goto bad_fork_cancel_cgroup; }
+ /* No more failure paths after this point. */ + + /* + * Copy seccomp details explicitly here, in case they were changed + * before holding sighand lock. + */ + copy_seccomp(p); + init_task_pid_links(p); if (likely(p->pid)) { ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace); diff --combined lib/fault-inject.c index adb2f9355ee6,9f53408c545d..1421818c9ef7 --- a/lib/fault-inject.c +++ b/lib/fault-inject.c @@@ -41,6 -41,9 +41,6 @@@ EXPORT_SYMBOL_GPL(setup_fault_attr)
static void fail_dump(struct fault_attr *attr) { - if (attr->no_warn) - return; - if (attr->verbose > 0 && __ratelimit(&attr->ratelimit_state)) { printk(KERN_NOTICE "FAULT_INJECTION: forcing a failure.\n" "name %pd, interval %lu, probability %lu, " @@@ -100,7 -103,7 +100,7 @@@ static inline bool fail_stacktrace(stru * http://www.nongnu.org/failmalloc/ */
-bool should_fail(struct fault_attr *attr, ssize_t size) +bool should_fail_ex(struct fault_attr *attr, ssize_t size, int flags) { if (in_task()) { unsigned int fail_nth = READ_ONCE(current->fail_nth); @@@ -136,26 -139,20 +136,26 @@@ return false; }
- if (attr->probability <= prandom_u32_max(100)) + if (attr->probability <= get_random_u32_below(100)) return false;
if (!fail_stacktrace(attr)) return false;
fail: - fail_dump(attr); + if (!(flags & FAULT_NOWARN)) + fail_dump(attr);
if (atomic_read(&attr->times) != -1) atomic_dec_not_zero(&attr->times);
return true; } + +bool should_fail(struct fault_attr *attr, ssize_t size) +{ + return should_fail_ex(attr, size, 0); +} EXPORT_SYMBOL_GPL(should_fail);
#ifdef CONFIG_FAULT_INJECTION_DEBUG_FS diff --combined lib/test_printf.c index d6a5d4b5f884,f098914a48d5..d34dc636b81c --- a/lib/test_printf.c +++ b/lib/test_printf.c @@@ -126,7 -126,7 +126,7 @@@ __test(const char *expect, int elen, co * be able to print it as expected. */ failed_tests += do_test(BUF_SIZE, expect, elen, fmt, ap); - rand = 1 + prandom_u32_max(elen+1); + rand = get_random_u32_inclusive(1, elen + 1); /* Since elen < BUF_SIZE, we have 1 <= rand <= BUF_SIZE. */ failed_tests += do_test(rand, expect, elen, fmt, ap); failed_tests += do_test(0, expect, elen, fmt, ap); @@@ -179,6 -179,18 +179,6 @@@ test_number(void * behaviour. */ test("00|0|0|0|0", "%.2d|%.1d|%.0d|%.*d|%1.0d", 0, 0, 0, 0, 0, 0); -#ifndef __CHAR_UNSIGNED__ - { - /* - * Passing a 'char' to a %02x specifier doesn't do - * what was presumably the intention when char is - * signed and the value is negative. One must either & - * with 0xff or cast to u8. - */ - char val = -16; - test("0xfffffff0|0xf0|0xf0", "%#02x|%#02x|%#02x", val, val & 0xff, (u8)val); - } -#endif }
static void __init @@@ -692,29 -704,31 +692,29 @@@ flags(void
static void __init fwnode_pointer(void) { - const struct software_node softnodes[] = { - { .name = "first", }, - { .name = "second", .parent = &softnodes[0], }, - { .name = "third", .parent = &softnodes[1], }, - { NULL /* Guardian */ } - }; - const char * const full_name = "first/second/third"; + const struct software_node first = { .name = "first" }; + const struct software_node second = { .name = "second", .parent = &first }; + const struct software_node third = { .name = "third", .parent = &second }; + const struct software_node *group[] = { &first, &second, &third, NULL }; const char * const full_name_second = "first/second"; + const char * const full_name_third = "first/second/third"; const char * const second_name = "second"; const char * const third_name = "third"; int rval;
- rval = software_node_register_nodes(softnodes); + rval = software_node_register_node_group(group); if (rval) { pr_warn("cannot register softnodes; rval %d\n", rval); return; }
- test(full_name_second, "%pfw", software_node_fwnode(&softnodes[1])); - test(full_name, "%pfw", software_node_fwnode(&softnodes[2])); - test(full_name, "%pfwf", software_node_fwnode(&softnodes[2])); - test(second_name, "%pfwP", software_node_fwnode(&softnodes[1])); - test(third_name, "%pfwP", software_node_fwnode(&softnodes[2])); + test(full_name_second, "%pfw", software_node_fwnode(&second)); + test(full_name_third, "%pfw", software_node_fwnode(&third)); + test(full_name_third, "%pfwf", software_node_fwnode(&third)); + test(second_name, "%pfwP", software_node_fwnode(&second)); + test(third_name, "%pfwP", software_node_fwnode(&third));
- software_node_unregister_nodes(softnodes); + software_node_unregister_node_group(group); }
static void __init fourcc_pointer(void) diff --combined lib/vsprintf.c index 5b0611c00956,2d11541ee561..be71a03c936a --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@@ -41,6 -41,7 +41,7 @@@ #include <linux/siphash.h> #include <linux/compiler.h> #include <linux/property.h> + #include <linux/notifier.h> #ifdef CONFIG_BLOCK #include <linux/blkdev.h> #endif @@@ -752,26 -753,21 +753,21 @@@ early_param("debug_boot_weak_hash", deb
static bool filled_random_ptr_key __read_mostly; static siphash_key_t ptr_key __read_mostly; - static void fill_ptr_key_workfn(struct work_struct *work); - static DECLARE_DELAYED_WORK(fill_ptr_key_work, fill_ptr_key_workfn);
- static void fill_ptr_key_workfn(struct work_struct *work) + static int fill_ptr_key(struct notifier_block *nb, unsigned long action, void *data) { - if (!rng_is_initialized()) { - queue_delayed_work(system_unbound_wq, &fill_ptr_key_work, HZ * 2); - return; - } - get_random_bytes(&ptr_key, sizeof(ptr_key));
/* Pairs with smp_rmb() before reading ptr_key. */ smp_wmb(); WRITE_ONCE(filled_random_ptr_key, true); + return NOTIFY_DONE; }
static int __init vsprintf_init_hashval(void) { - fill_ptr_key_workfn(NULL); + static struct notifier_block fill_ptr_key_nb = { .notifier_call = fill_ptr_key }; + execute_with_initialized_rng(&fill_ptr_key_nb); return 0; } subsys_initcall(vsprintf_init_hashval) @@@ -866,7 -862,7 +862,7 @@@ char *restricted_pointer(char *buf, cha * kptr_restrict==1 cannot be used in IRQ context * because its test for CAP_SYSLOG would be meaningless. */ - if (in_irq() || in_serving_softirq() || in_nmi()) { + if (in_hardirq() || in_serving_softirq() || in_nmi()) { if (spec.field_width == -1) spec.field_width = 2 * sizeof(ptr); return error_string(buf, end, "pK-error", spec); diff --combined mm/slub.c index 891df05a4d45,7cd2c657030a..89b0d962f357 --- a/mm/slub.c +++ b/mm/slub.c @@@ -187,12 -187,6 +187,12 @@@ do { #define USE_LOCKLESS_FAST_PATH() (false) #endif
+#ifndef CONFIG_SLUB_TINY +#define __fastpath_inline __always_inline +#else +#define __fastpath_inline +#endif + #ifdef CONFIG_SLUB_DEBUG #ifdef CONFIG_SLUB_DEBUG_ON DEFINE_STATIC_KEY_TRUE(slub_debug_enabled); @@@ -247,7 -241,6 +247,7 @@@ static inline bool kmem_cache_has_cpu_p /* Enable to log cmpxchg failures */ #undef SLUB_DEBUG_CMPXCHG
+#ifndef CONFIG_SLUB_TINY /* * Minimum number of partial slabs. These will be left on the partial * lists even if they are empty. kmem_cache_shrink may reclaim them. @@@ -260,10 -253,6 +260,10 @@@ * sort the partial list by the number of objects in use. */ #define MAX_PARTIAL 10 +#else +#define MIN_PARTIAL 0 +#define MAX_PARTIAL 0 +#endif
#define DEBUG_DEFAULT_FLAGS (SLAB_CONSISTENCY_CHECKS | SLAB_RED_ZONE | \ SLAB_POISON | SLAB_STORE_USER) @@@ -309,7 -298,7 +309,7 @@@ struct track
enum track_item { TRACK_ALLOC, TRACK_FREE };
-#ifdef CONFIG_SYSFS +#ifdef SLAB_SUPPORTS_SYSFS static int sysfs_slab_add(struct kmem_cache *); static int sysfs_slab_alias(struct kmem_cache *, const char *); #else @@@ -343,12 -332,10 +343,12 @@@ static inline void stat(const struct km */ static nodemask_t slab_nodes;
+#ifndef CONFIG_SLUB_TINY /* * Workqueue used for flush_cpu_slab(). */ static struct workqueue_struct *flushwq; +#endif
/******************************************************************** * Core slab cache functions @@@ -394,12 -381,10 +394,12 @@@ static inline void *get_freepointer(str return freelist_dereference(s, object + s->offset); }
+#ifndef CONFIG_SLUB_TINY static void prefetch_freepointer(const struct kmem_cache *s, void *object) { prefetchw(object + s->offset); } +#endif
/* * When running under KMSAN, get_freepointer_safe() may return an uninitialized @@@ -844,17 -829,6 +844,17 @@@ static inline void set_orig_size(struc if (!slub_debug_orig_size(s)) return;
+#ifdef CONFIG_KASAN_GENERIC + /* + * KASAN could save its free meta data in object's data area at + * offset 0, if the size is larger than 'orig_size', it will + * overlap the data redzone in [orig_size+1, object_size], and + * the check should be skipped. + */ + if (kasan_metadata_size(s, true) > orig_size) + orig_size = s->object_size; +#endif + p += get_info_end(s); p += sizeof(struct track) * 2;
@@@ -874,11 -848,6 +874,11 @@@ static inline unsigned int get_orig_siz return *(unsigned int *)p; }
+void skip_orig_size_check(struct kmem_cache *s, const void *object) +{ + set_orig_size(s, (void *)object, s->object_size); +} + static void slab_bug(struct kmem_cache *s, char *fmt, ...) { struct va_format vaf; @@@ -941,7 -910,7 +941,7 @@@ static void print_trailer(struct kmem_c if (slub_debug_orig_size(s)) off += sizeof(unsigned int);
- off += kasan_metadata_size(s); + off += kasan_metadata_size(s, false);
if (off != size_from_object(s)) /* Beginning of the filler is the free pointer */ @@@ -997,28 -966,17 +997,28 @@@ static __printf(3, 4) void slab_err(str static void init_object(struct kmem_cache *s, void *object, u8 val) { u8 *p = kasan_reset_tag(object); + unsigned int poison_size = s->object_size;
- if (s->flags & SLAB_RED_ZONE) + if (s->flags & SLAB_RED_ZONE) { memset(p - s->red_left_pad, val, s->red_left_pad);
+ if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) { + /* + * Redzone the extra allocated space by kmalloc than + * requested, and the poison size will be limited to + * the original request size accordingly. + */ + poison_size = get_orig_size(s, object); + } + } + if (s->flags & __OBJECT_POISON) { - memset(p, POISON_FREE, s->object_size - 1); - p[s->object_size - 1] = POISON_END; + memset(p, POISON_FREE, poison_size - 1); + p[poison_size - 1] = POISON_END; }
if (s->flags & SLAB_RED_ZONE) - memset(p + s->object_size, val, s->inuse - s->object_size); + memset(p + poison_size, val, s->inuse - poison_size); }
static void restore_bytes(struct kmem_cache *s, char *message, u8 data, @@@ -1112,7 -1070,7 +1112,7 @@@ static int check_pad_bytes(struct kmem_ off += sizeof(unsigned int); }
- off += kasan_metadata_size(s); + off += kasan_metadata_size(s, false);
if (size_from_object(s) == off) return 1; @@@ -1162,7 -1120,6 +1162,7 @@@ static int check_object(struct kmem_cac { u8 *p = object; u8 *endobject = object + s->object_size; + unsigned int orig_size;
if (s->flags & SLAB_RED_ZONE) { if (!check_bytes_and_report(s, slab, object, "Left Redzone", @@@ -1172,17 -1129,6 +1172,17 @@@ if (!check_bytes_and_report(s, slab, object, "Right Redzone", endobject, val, s->inuse - s->object_size)) return 0; + + if (slub_debug_orig_size(s) && val == SLUB_RED_ACTIVE) { + orig_size = get_orig_size(s, object); + + if (s->object_size > orig_size && + !check_bytes_and_report(s, slab, object, + "kmalloc Redzone", p + orig_size, + val, s->object_size - orig_size)) { + return 0; + } + } } else { if ((s->flags & SLAB_POISON) && s->object_size < s->inuse) { check_bytes_and_report(s, slab, p, "Alignment padding", @@@ -1417,7 -1363,7 +1417,7 @@@ static inline int alloc_consistency_che return 1; }
-static noinline int alloc_debug_processing(struct kmem_cache *s, +static noinline bool alloc_debug_processing(struct kmem_cache *s, struct slab *slab, void *object, int orig_size) { if (s->flags & SLAB_CONSISTENCY_CHECKS) { @@@ -1429,7 -1375,7 +1429,7 @@@ trace(s, slab, object, 1); set_orig_size(s, object, orig_size); init_object(s, object, SLUB_RED_ACTIVE); - return 1; + return true;
bad: if (folio_test_slab(slab_folio(slab))) { @@@ -1442,7 -1388,7 +1442,7 @@@ slab->inuse = slab->objects; slab->freelist = NULL; } - return 0; + return false; }
static inline int free_consistency_checks(struct kmem_cache *s, @@@ -1695,17 -1641,17 +1695,17 @@@ static inline void setup_object_debug(s static inline void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
-static inline int alloc_debug_processing(struct kmem_cache *s, - struct slab *slab, void *object, int orig_size) { return 0; } +static inline bool alloc_debug_processing(struct kmem_cache *s, + struct slab *slab, void *object, int orig_size) { return true; }
-static inline void free_debug_processing( - struct kmem_cache *s, struct slab *slab, - void *head, void *tail, int bulk_cnt, - unsigned long addr) {} +static inline bool free_debug_processing(struct kmem_cache *s, + struct slab *slab, void *head, void *tail, int *bulk_cnt, + unsigned long addr, depot_stack_handle_t handle) { return true; }
static inline void slab_pad_check(struct kmem_cache *s, struct slab *slab) {} static inline int check_object(struct kmem_cache *s, struct slab *slab, void *object, u8 val) { return 1; } +static inline depot_stack_handle_t set_track_prepare(void) { return 0; } static inline void set_track(struct kmem_cache *s, void *object, enum track_item alloc, unsigned long addr) {} static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n, @@@ -1730,13 -1676,11 +1730,13 @@@ static inline void inc_slabs_node(struc static inline void dec_slabs_node(struct kmem_cache *s, int node, int objects) {}
+#ifndef CONFIG_SLUB_TINY static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab, void **freelist, void *nextfree) { return false; } +#endif #endif /* CONFIG_SLUB_DEBUG */
/* @@@ -1856,8 -1800,6 +1856,8 @@@ static inline struct slab *alloc_slab_p
slab = folio_slab(folio); __folio_set_slab(folio); + /* Make the flag visible before any changes to folio->mapping */ + smp_wmb(); if (page_is_pfmemalloc(folio_page(folio, 0))) slab_set_pfmemalloc(slab);
@@@ -1939,7 -1881,7 +1939,7 @@@ static bool shuffle_freelist(struct kme return false;
freelist_count = oo_objects(s->oo); - pos = prandom_u32_max(freelist_count); + pos = get_random_u32_below(freelist_count);
page_limit = slab->objects * s->size; start = fixup_red_left(s, slab_address(slab)); @@@ -2057,11 -1999,17 +2057,11 @@@ static void __free_slab(struct kmem_cac int order = folio_order(folio); int pages = 1 << order;
- if (kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) { - void *p; - - slab_pad_check(s, slab); - for_each_object(p, s, slab_address(slab), slab->objects) - check_object(s, slab, p, SLUB_RED_INACTIVE); - } - __slab_clear_pfmemalloc(slab); - __folio_clear_slab(folio); folio->mapping = NULL; + /* Make the mapping reset visible before clearing the flag */ + smp_wmb(); + __folio_clear_slab(folio); if (current->reclaim_state) current->reclaim_state->reclaimed_slab += pages; unaccount_slab(slab, order, s); @@@ -2077,17 -2025,9 +2077,17 @@@ static void rcu_free_slab(struct rcu_he
static void free_slab(struct kmem_cache *s, struct slab *slab) { - if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) { + if (kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) { + void *p; + + slab_pad_check(s, slab); + for_each_object(p, s, slab_address(slab), slab->objects) + check_object(s, slab, p, SLUB_RED_INACTIVE); + } + + if (unlikely(s->flags & SLAB_TYPESAFE_BY_RCU)) call_rcu(&slab->rcu_head, rcu_free_slab); - } else + else __free_slab(s, slab); }
@@@ -2274,7 -2214,7 +2274,7 @@@ static void *get_partial_node(struct km if (!pfmemalloc_match(slab, pc->flags)) continue;
- if (kmem_cache_debug(s)) { + if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) { object = alloc_single_from_partial(s, n, slab, pc->orig_size); if (object) @@@ -2389,8 -2329,6 +2389,8 @@@ static void *get_partial(struct kmem_ca return get_any_partial(s, pc); }
+#ifndef CONFIG_SLUB_TINY + #ifdef CONFIG_PREEMPTION /* * Calculate the next globally unique transaction for disambiguation @@@ -2404,7 -2342,7 +2404,7 @@@ * different cpus. */ #define TID_STEP 1 -#endif +#endif /* CONFIG_PREEMPTION */
static inline unsigned long next_tid(unsigned long tid) { @@@ -2473,7 -2411,7 +2473,7 @@@ static void init_kmem_cache_cpus(struc static void deactivate_slab(struct kmem_cache *s, struct slab *slab, void *freelist) { - enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE, M_FULL_NOLIST }; + enum slab_modes { M_NONE, M_PARTIAL, M_FREE, M_FULL_NOLIST }; struct kmem_cache_node *n = get_node(s, slab_nid(slab)); int free_delta = 0; enum slab_modes mode = M_NONE; @@@ -2549,6 -2487,14 +2549,6 @@@ redo * acquire_slab() will see a slab that is frozen */ spin_lock_irqsave(&n->list_lock, flags); - } else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) { - mode = M_FULL; - /* - * This also ensures that the scanning of full - * slabs from diagnostic functions will not see - * any frozen slabs. - */ - spin_lock_irqsave(&n->list_lock, flags); } else { mode = M_FULL_NOLIST; } @@@ -2558,7 -2504,7 +2558,7 @@@ old.freelist, old.counters, new.freelist, new.counters, "unfreezing slab")) { - if (mode == M_PARTIAL || mode == M_FULL) + if (mode == M_PARTIAL) spin_unlock_irqrestore(&n->list_lock, flags); goto redo; } @@@ -2572,6 -2518,10 +2572,6 @@@ stat(s, DEACTIVATE_EMPTY); discard_slab(s, slab); stat(s, FREE_SLAB); - } else if (mode == M_FULL) { - add_full(s, n, slab); - spin_unlock_irqrestore(&n->list_lock, flags); - stat(s, DEACTIVATE_FULL); } else if (mode == M_FULL_NOLIST) { stat(s, DEACTIVATE_FULL); } @@@ -2853,13 -2803,6 +2853,13 @@@ static int slub_cpu_dead(unsigned int c return 0; }
+#else /* CONFIG_SLUB_TINY */ +static inline void flush_all_cpus_locked(struct kmem_cache *s) { } +static inline void flush_all(struct kmem_cache *s) { } +static inline void __flush_cpu_slab(struct kmem_cache *s, int cpu) { } +static inline int slub_cpu_dead(unsigned int cpu) { return 0; } +#endif /* CONFIG_SLUB_TINY */ + /* * Check if the objects in a per cpu structure fit numa * locality expectations. @@@ -2885,28 -2828,38 +2885,28 @@@ static inline unsigned long node_nr_obj }
/* Supports checking bulk free of a constructed freelist */ -static noinline void free_debug_processing( - struct kmem_cache *s, struct slab *slab, - void *head, void *tail, int bulk_cnt, - unsigned long addr) +static inline bool free_debug_processing(struct kmem_cache *s, + struct slab *slab, void *head, void *tail, int *bulk_cnt, + unsigned long addr, depot_stack_handle_t handle) { - struct kmem_cache_node *n = get_node(s, slab_nid(slab)); - struct slab *slab_free = NULL; + bool checks_ok = false; void *object = head; int cnt = 0; - unsigned long flags; - bool checks_ok = false; - depot_stack_handle_t handle = 0; - - if (s->flags & SLAB_STORE_USER) - handle = set_track_prepare(); - - spin_lock_irqsave(&n->list_lock, flags);
if (s->flags & SLAB_CONSISTENCY_CHECKS) { if (!check_slab(s, slab)) goto out; }
- if (slab->inuse < bulk_cnt) { + if (slab->inuse < *bulk_cnt) { slab_err(s, slab, "Slab has %d allocated objects but %d are to be freed\n", - slab->inuse, bulk_cnt); + slab->inuse, *bulk_cnt); goto out; }
next_object:
- if (++cnt > bulk_cnt) + if (++cnt > *bulk_cnt) goto out_cnt;
if (s->flags & SLAB_CONSISTENCY_CHECKS) { @@@ -2928,22 -2881,61 +2928,22 @@@ checks_ok = true;
out_cnt: - if (cnt != bulk_cnt) + if (cnt != *bulk_cnt) { slab_err(s, slab, "Bulk free expected %d objects but found %d\n", - bulk_cnt, cnt); - -out: - if (checks_ok) { - void *prior = slab->freelist; - - /* Perform the actual freeing while we still hold the locks */ - slab->inuse -= cnt; - set_freepointer(s, tail, prior); - slab->freelist = head; - - /* - * If the slab is empty, and node's partial list is full, - * it should be discarded anyway no matter it's on full or - * partial list. - */ - if (slab->inuse == 0 && n->nr_partial >= s->min_partial) - slab_free = slab; - - if (!prior) { - /* was on full list */ - remove_full(s, n, slab); - if (!slab_free) { - add_partial(n, slab, DEACTIVATE_TO_TAIL); - stat(s, FREE_ADD_PARTIAL); - } - } else if (slab_free) { - remove_partial(n, slab); - stat(s, FREE_REMOVE_PARTIAL); - } + *bulk_cnt, cnt); + *bulk_cnt = cnt; }
- if (slab_free) { - /* - * Update the counters while still holding n->list_lock to - * prevent spurious validation warnings - */ - dec_slabs_node(s, slab_nid(slab_free), slab_free->objects); - } - - spin_unlock_irqrestore(&n->list_lock, flags); +out:
if (!checks_ok) slab_fix(s, "Object at 0x%p not freed", object);
- if (slab_free) { - stat(s, FREE_SLAB); - free_slab(s, slab_free); - } + return checks_ok; } #endif /* CONFIG_SLUB_DEBUG */
-#if defined(CONFIG_SLUB_DEBUG) || defined(CONFIG_SYSFS) +#if defined(CONFIG_SLUB_DEBUG) || defined(SLAB_SUPPORTS_SYSFS) static unsigned long count_partial(struct kmem_cache_node *n, int (*get_count)(struct slab *)) { @@@ -2957,12 -2949,12 +2957,12 @@@ spin_unlock_irqrestore(&n->list_lock, flags); return x; } -#endif /* CONFIG_SLUB_DEBUG || CONFIG_SYSFS */ +#endif /* CONFIG_SLUB_DEBUG || SLAB_SUPPORTS_SYSFS */
+#ifdef CONFIG_SLUB_DEBUG static noinline void slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid) { -#ifdef CONFIG_SLUB_DEBUG static DEFINE_RATELIMIT_STATE(slub_oom_rs, DEFAULT_RATELIMIT_INTERVAL, DEFAULT_RATELIMIT_BURST); int node; @@@ -2993,11 -2985,8 +2993,11 @@@ pr_warn(" node %d: slabs: %ld, objs: %ld, free: %ld\n", node, nr_slabs, nr_objs, nr_free); } -#endif } +#else /* CONFIG_SLUB_DEBUG */ +static inline void +slab_out_of_memory(struct kmem_cache *s, gfp_t gfpflags, int nid) { } +#endif
static inline bool pfmemalloc_match(struct slab *slab, gfp_t gfpflags) { @@@ -3007,7 -2996,6 +3007,7 @@@ return true; }
+#ifndef CONFIG_SLUB_TINY /* * Check the slab->freelist and either transfer the freelist to the * per cpu freelist or deactivate the slab. @@@ -3295,13 -3283,45 +3295,13 @@@ static void *__slab_alloc(struct kmem_c return p; }
-/* - * If the object has been wiped upon free, make sure it's fully initialized by - * zeroing out freelist pointer. - */ -static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s, - void *obj) -{ - if (unlikely(slab_want_init_on_free(s)) && obj) - memset((void *)((char *)kasan_reset_tag(obj) + s->offset), - 0, sizeof(void *)); -} - -/* - * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc) - * have the fastpath folded into their functions. So no function call - * overhead for requests that can be satisfied on the fastpath. - * - * The fastpath works by first checking if the lockless freelist can be used. - * If not then __slab_alloc is called for slow processing. - * - * Otherwise we can simply pick the next object from the lockless free list. - */ -static __always_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru, +static __always_inline void *__slab_alloc_node(struct kmem_cache *s, gfp_t gfpflags, int node, unsigned long addr, size_t orig_size) { - void *object; struct kmem_cache_cpu *c; struct slab *slab; unsigned long tid; - struct obj_cgroup *objcg = NULL; - bool init = false; - - s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags); - if (!s) - return NULL; - - object = kfence_alloc(s, orig_size, gfpflags); - if (unlikely(object)) - goto out; + void *object;
redo: /* @@@ -3371,95 -3391,22 +3371,95 @@@ stat(s, ALLOC_FASTPATH); }
+ return object; +} +#else /* CONFIG_SLUB_TINY */ +static void *__slab_alloc_node(struct kmem_cache *s, + gfp_t gfpflags, int node, unsigned long addr, size_t orig_size) +{ + struct partial_context pc; + struct slab *slab; + void *object; + + pc.flags = gfpflags; + pc.slab = &slab; + pc.orig_size = orig_size; + object = get_partial(s, node, &pc); + + if (object) + return object; + + slab = new_slab(s, gfpflags, node); + if (unlikely(!slab)) { + slab_out_of_memory(s, gfpflags, node); + return NULL; + } + + object = alloc_single_from_new_slab(s, slab, orig_size); + + return object; +} +#endif /* CONFIG_SLUB_TINY */ + +/* + * If the object has been wiped upon free, make sure it's fully initialized by + * zeroing out freelist pointer. + */ +static __always_inline void maybe_wipe_obj_freeptr(struct kmem_cache *s, + void *obj) +{ + if (unlikely(slab_want_init_on_free(s)) && obj) + memset((void *)((char *)kasan_reset_tag(obj) + s->offset), + 0, sizeof(void *)); +} + +/* + * Inlined fastpath so that allocation functions (kmalloc, kmem_cache_alloc) + * have the fastpath folded into their functions. So no function call + * overhead for requests that can be satisfied on the fastpath. + * + * The fastpath works by first checking if the lockless freelist can be used. + * If not then __slab_alloc is called for slow processing. + * + * Otherwise we can simply pick the next object from the lockless free list. + */ +static __fastpath_inline void *slab_alloc_node(struct kmem_cache *s, struct list_lru *lru, + gfp_t gfpflags, int node, unsigned long addr, size_t orig_size) +{ + void *object; + struct obj_cgroup *objcg = NULL; + bool init = false; + + s = slab_pre_alloc_hook(s, lru, &objcg, 1, gfpflags); + if (!s) + return NULL; + + object = kfence_alloc(s, orig_size, gfpflags); + if (unlikely(object)) + goto out; + + object = __slab_alloc_node(s, gfpflags, node, addr, orig_size); + maybe_wipe_obj_freeptr(s, object); init = slab_want_init_on_alloc(gfpflags, s);
out: - slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init); + /* + * When init equals 'true', like for kzalloc() family, only + * @orig_size bytes might be zeroed instead of s->object_size + */ + slab_post_alloc_hook(s, objcg, gfpflags, 1, &object, init, orig_size);
return object; }
-static __always_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru, +static __fastpath_inline void *slab_alloc(struct kmem_cache *s, struct list_lru *lru, gfp_t gfpflags, unsigned long addr, size_t orig_size) { return slab_alloc_node(s, lru, gfpflags, NUMA_NO_NODE, addr, orig_size); }
-static __always_inline +static __fastpath_inline void *__kmem_cache_alloc_lru(struct kmem_cache *s, struct list_lru *lru, gfp_t gfpflags) { @@@ -3501,67 -3448,6 +3501,67 @@@ void *kmem_cache_alloc_node(struct kmem } EXPORT_SYMBOL(kmem_cache_alloc_node);
+static noinline void free_to_partial_list( + struct kmem_cache *s, struct slab *slab, + void *head, void *tail, int bulk_cnt, + unsigned long addr) +{ + struct kmem_cache_node *n = get_node(s, slab_nid(slab)); + struct slab *slab_free = NULL; + int cnt = bulk_cnt; + unsigned long flags; + depot_stack_handle_t handle = 0; + + if (s->flags & SLAB_STORE_USER) + handle = set_track_prepare(); + + spin_lock_irqsave(&n->list_lock, flags); + + if (free_debug_processing(s, slab, head, tail, &cnt, addr, handle)) { + void *prior = slab->freelist; + + /* Perform the actual freeing while we still hold the locks */ + slab->inuse -= cnt; + set_freepointer(s, tail, prior); + slab->freelist = head; + + /* + * If the slab is empty, and node's partial list is full, + * it should be discarded anyway no matter it's on full or + * partial list. + */ + if (slab->inuse == 0 && n->nr_partial >= s->min_partial) + slab_free = slab; + + if (!prior) { + /* was on full list */ + remove_full(s, n, slab); + if (!slab_free) { + add_partial(n, slab, DEACTIVATE_TO_TAIL); + stat(s, FREE_ADD_PARTIAL); + } + } else if (slab_free) { + remove_partial(n, slab); + stat(s, FREE_REMOVE_PARTIAL); + } + } + + if (slab_free) { + /* + * Update the counters while still holding n->list_lock to + * prevent spurious validation warnings + */ + dec_slabs_node(s, slab_nid(slab_free), slab_free->objects); + } + + spin_unlock_irqrestore(&n->list_lock, flags); + + if (slab_free) { + stat(s, FREE_SLAB); + free_slab(s, slab_free); + } +} + /* * Slow path handling. This may still be called frequently since objects * have a longer lifetime than the cpu slabs in most processing loads. @@@ -3587,8 -3473,8 +3587,8 @@@ static void __slab_free(struct kmem_cac if (kfence_free(head)) return;
- if (kmem_cache_debug(s)) { - free_debug_processing(s, slab, head, tail, cnt, addr); + if (IS_ENABLED(CONFIG_SLUB_TINY) || kmem_cache_debug(s)) { + free_to_partial_list(s, slab, head, tail, cnt, addr); return; }
@@@ -3688,7 -3574,6 +3688,7 @@@ slab_empty discard_slab(s, slab); }
+#ifndef CONFIG_SLUB_TINY /* * Fastpath with forced inlining to produce a kfree and kmem_cache_free that * can perform fastpath freeing without additional function calls. @@@ -3763,18 -3648,8 +3763,18 @@@ redo } stat(s, FREE_FASTPATH); } +#else /* CONFIG_SLUB_TINY */ +static void do_slab_free(struct kmem_cache *s, + struct slab *slab, void *head, void *tail, + int cnt, unsigned long addr) +{ + void *tail_obj = tail ? : head; + + __slab_free(s, slab, head, tail_obj, cnt, addr); +} +#endif /* CONFIG_SLUB_TINY */
-static __always_inline void slab_free(struct kmem_cache *s, struct slab *slab, +static __fastpath_inline void slab_free(struct kmem_cache *s, struct slab *slab, void *head, void *tail, void **p, int cnt, unsigned long addr) { @@@ -3907,13 -3782,18 +3907,13 @@@ void kmem_cache_free_bulk(struct kmem_c } EXPORT_SYMBOL(kmem_cache_free_bulk);
-/* Note that interrupts must be enabled when calling this function. */ -int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, - void **p) +#ifndef CONFIG_SLUB_TINY +static inline int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, + size_t size, void **p, struct obj_cgroup *objcg) { struct kmem_cache_cpu *c; int i; - struct obj_cgroup *objcg = NULL;
- /* memcg and kmem_cache debug support */ - s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags); - if (unlikely(!s)) - return false; /* * Drain objects in the per cpu slab, while disabling local * IRQs, which protects against PREEMPT and interrupts @@@ -3967,72 -3847,19 +3967,72 @@@ local_unlock_irq(&s->cpu_slab->lock); slub_put_cpu_ptr(s->cpu_slab);
- /* - * memcg and kmem_cache debug support and memory initialization. - * Done outside of the IRQ disabled fastpath loop. - */ - slab_post_alloc_hook(s, objcg, flags, size, p, - slab_want_init_on_alloc(flags, s)); return i; + error: slub_put_cpu_ptr(s->cpu_slab); - slab_post_alloc_hook(s, objcg, flags, i, p, false); + slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size); + kmem_cache_free_bulk(s, i, p); + return 0; + +} +#else /* CONFIG_SLUB_TINY */ +static int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, + size_t size, void **p, struct obj_cgroup *objcg) +{ + int i; + + for (i = 0; i < size; i++) { + void *object = kfence_alloc(s, s->object_size, flags); + + if (unlikely(object)) { + p[i] = object; + continue; + } + + p[i] = __slab_alloc_node(s, flags, NUMA_NO_NODE, + _RET_IP_, s->object_size); + if (unlikely(!p[i])) + goto error; + + maybe_wipe_obj_freeptr(s, p[i]); + } + + return i; + +error: + slab_post_alloc_hook(s, objcg, flags, i, p, false, s->object_size); kmem_cache_free_bulk(s, i, p); return 0; } +#endif /* CONFIG_SLUB_TINY */ + +/* Note that interrupts must be enabled when calling this function. */ +int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, + void **p) +{ + int i; + struct obj_cgroup *objcg = NULL; + + if (!size) + return 0; + + /* memcg and kmem_cache debug support */ + s = slab_pre_alloc_hook(s, NULL, &objcg, size, flags); + if (unlikely(!s)) + return 0; + + i = __kmem_cache_alloc_bulk(s, flags, size, p, objcg); + + /* + * memcg and kmem_cache debug support and memory initialization. + * Done outside of the IRQ disabled fastpath loop. + */ + if (i != 0) + slab_post_alloc_hook(s, objcg, flags, size, p, + slab_want_init_on_alloc(flags, s), s->object_size); + return i; +} EXPORT_SYMBOL(kmem_cache_alloc_bulk);
@@@ -4056,8 -3883,7 +4056,8 @@@ * take the list_lock. */ static unsigned int slub_min_order; -static unsigned int slub_max_order = PAGE_ALLOC_COSTLY_ORDER; +static unsigned int slub_max_order = + IS_ENABLED(CONFIG_SLUB_TINY) ? 1 : PAGE_ALLOC_COSTLY_ORDER; static unsigned int slub_min_objects;
/* @@@ -4188,12 -4014,10 +4188,12 @@@ init_kmem_cache_node(struct kmem_cache_ #endif }
+#ifndef CONFIG_SLUB_TINY static inline int alloc_kmem_cache_cpus(struct kmem_cache *s) { BUILD_BUG_ON(PERCPU_DYNAMIC_EARLY_SIZE < - KMALLOC_SHIFT_HIGH * sizeof(struct kmem_cache_cpu)); + NR_KMALLOC_TYPES * KMALLOC_SHIFT_HIGH * + sizeof(struct kmem_cache_cpu));
/* * Must align to double word boundary for the double cmpxchg @@@ -4209,12 -4033,6 +4209,12 @@@
return 1; } +#else +static inline int alloc_kmem_cache_cpus(struct kmem_cache *s) +{ + return 1; +} +#endif /* CONFIG_SLUB_TINY */
static struct kmem_cache *kmem_cache_node;
@@@ -4277,9 -4095,7 +4277,9 @@@ static void free_kmem_cache_nodes(struc void __kmem_cache_release(struct kmem_cache *s) { cache_random_seq_destroy(s); +#ifndef CONFIG_SLUB_TINY free_percpu(s->cpu_slab); +#endif free_kmem_cache_nodes(s); }
@@@ -4386,8 -4202,7 +4386,8 @@@ static int calculate_sizes(struct kmem_ */ s->inuse = size;
- if ((flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || + if (slub_debug_orig_size(s) || + (flags & (SLAB_TYPESAFE_BY_RCU | SLAB_POISON)) || ((flags & SLAB_RED_ZONE) && s->object_size < sizeof(void *)) || s->ctor) { /* @@@ -5057,10 -4872,8 +5057,10 @@@ void __init kmem_cache_init(void
void __init kmem_cache_init_late(void) { +#ifndef CONFIG_SLUB_TINY flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0); WARN_ON(!flushwq); +#endif }
struct kmem_cache * @@@ -5111,7 -4924,7 +5111,7 @@@ int __kmem_cache_create(struct kmem_cac return 0; }
-#ifdef CONFIG_SYSFS
+#ifdef SLAB_SUPPORTS_SYSFS
  static int count_inuse(struct slab *slab)
  {
  	return slab->inuse;
@@@ -5369,7 -5182,7 +5369,7 @@@ static void process_slab(struct loc_tra
  #endif /* CONFIG_DEBUG_FS */
  #endif /* CONFIG_SLUB_DEBUG */
-#ifdef CONFIG_SYSFS
+#ifdef SLAB_SUPPORTS_SYSFS
  enum slab_stat_type {
  	SL_ALL,			/* All slabs */
  	SL_PARTIAL,		/* Only partially allocated slabs */
@@@ -5689,13 -5502,11 +5689,13 @@@ static ssize_t cache_dma_show(struct km
  SLAB_ATTR_RO(cache_dma);
  #endif
+#ifdef CONFIG_HARDENED_USERCOPY
  static ssize_t usersize_show(struct kmem_cache *s, char *buf)
  {
  	return sysfs_emit(buf, "%u\n", s->usersize);
  }
  SLAB_ATTR_RO(usersize);
+#endif
  static ssize_t destroy_by_rcu_show(struct kmem_cache *s, char *buf)
  {
@@@ -5775,21 -5586,7 +5775,21 @@@ static ssize_t failslab_show(struct kme
  {
  	return sysfs_emit(buf, "%d\n", !!(s->flags & SLAB_FAILSLAB));
  }
-SLAB_ATTR_RO(failslab);
+
+static ssize_t failslab_store(struct kmem_cache *s, const char *buf,
+			size_t length)
+{
+	if (s->refcount > 1)
+		return -EINVAL;
+
+	if (buf[0] == '1')
+		WRITE_ONCE(s->flags, s->flags | SLAB_FAILSLAB);
+	else
+		WRITE_ONCE(s->flags, s->flags & ~SLAB_FAILSLAB);
+
+	return length;
+}
+SLAB_ATTR(failslab);
  #endif
  static ssize_t shrink_show(struct kmem_cache *s, char *buf)
@@@ -6006,9 -5803,7 +6006,9 @@@ static struct attribute *slab_attrs[]
  #ifdef CONFIG_FAILSLAB
  	&failslab_attr.attr,
  #endif
+#ifdef CONFIG_HARDENED_USERCOPY
  	&usersize_attr.attr,
+#endif
  #ifdef CONFIG_KFENCE
  	&skip_kfence_attr.attr,
  #endif
@@@ -6125,6 -5920,11 +6125,6 @@@ static int sysfs_slab_add(struct kmem_c
  	struct kset *kset = cache_kset(s);
  	int unmergeable = slab_unmergeable(s);
- 	if (!kset) {
- 		kobject_init(&s->kobj, &slab_ktype);
- 		return 0;
- 	}
-
  	if (!unmergeable && disable_higher_order_debug &&
  			(slub_debug & DEBUG_METADATA_FLAGS))
  		unmergeable = 1;
@@@ -6254,8 -6054,9 +6254,8 @@@ static int __init slab_sysfs_init(void
  	mutex_unlock(&slab_mutex);
  	return 0;
  }
-
-__initcall(slab_sysfs_init);
-#endif /* CONFIG_SYSFS */
+late_initcall(slab_sysfs_init);
+#endif /* SLAB_SUPPORTS_SYSFS */
  #if defined(CONFIG_SLUB_DEBUG) && defined(CONFIG_DEBUG_FS)
  static int slab_debugfs_show(struct seq_file *seq, void *v)
diff --combined mm/swapfile.c
index 72e481aacd5d,4ee31056d3f8..3eedf7ae957f
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@@ -772,8 -772,7 +772,7 @@@ static void set_cluster_next(struct swa
  	/* No free swap slots available */
  	if (si->highest_bit <= si->lowest_bit)
  		return;
- 	next = si->lowest_bit +
- 	       prandom_u32_max(si->highest_bit - si->lowest_bit + 1);
+ 	next = get_random_u32_inclusive(si->lowest_bit, si->highest_bit);
  	next = ALIGN_DOWN(next, SWAP_ADDRESS_SPACE_PAGES);
  	next = max_t(unsigned int, next, si->lowest_bit);
  }
@@@ -973,23 -972,23 +972,23 @@@ done
  scan:
  	spin_unlock(&si->lock);
  	while (++offset <= READ_ONCE(si->highest_bit)) {
- 		if (swap_offset_available_and_locked(si, offset))
- 			goto checks;
  		if (unlikely(--latency_ration < 0)) {
  			cond_resched();
  			latency_ration = LATENCY_LIMIT;
  			scanned_many = true;
  		}
+ 		if (swap_offset_available_and_locked(si, offset))
+ 			goto checks;
  	}
  	offset = si->lowest_bit;
  	while (offset < scan_base) {
- 		if (swap_offset_available_and_locked(si, offset))
- 			goto checks;
  		if (unlikely(--latency_ration < 0)) {
  			cond_resched();
  			latency_ration = LATENCY_LIMIT;
  			scanned_many = true;
  		}
+ 		if (swap_offset_available_and_locked(si, offset))
+ 			goto checks;
  		offset++;
  	}
  	spin_lock(&si->lock);
@@@ -3089,7 -3088,7 +3088,7 @@@ SYSCALL_DEFINE2(swapon, const char __us
  	 */
  	for_each_possible_cpu(cpu) {
  		per_cpu(*p->cluster_next_cpu, cpu) =
- 			1 + prandom_u32_max(p->highest_bit);
+ 			get_random_u32_inclusive(1, p->highest_bit);
  	}
  	nr_cluster = DIV_ROUND_UP(maxpages, SWAPFILE_CLUSTER);
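The swapfile.c hunks above follow the general conversion rule for closed ranges: an open-coded "lo + prandom_u32_max(hi - lo + 1)" becomes get_random_u32_inclusive(lo, hi). A minimal sketch of that equivalence (illustrative only; pick_inclusive() is a hypothetical wrapper, not part of the patch):

	#include <linux/random.h>

	/* Returns a uniformly distributed value in [lo, hi]; assumes lo <= hi. */
	static inline u32 pick_inclusive(u32 lo, u32 hi)
	{
		/* Previously open-coded as: lo + prandom_u32_max(hi - lo + 1) */
		return get_random_u32_inclusive(lo, hi);
	}
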
diff --combined net/core/neighbour.c
index 952a54763358,ba92762de525..f00a79fc301b
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@@ -111,7 -111,7 +111,7 @@@ static void neigh_cleanup_and_release(s
  unsigned long neigh_rand_reach_time(unsigned long base)
  {
- 	return base ? prandom_u32_max(base) + (base >> 1) : 0;
+ 	return base ? get_random_u32_below(base) + (base >> 1) : 0;
  }
  EXPORT_SYMBOL(neigh_rand_reach_time);
@@@ -307,31 -307,7 +307,31 @@@ static int neigh_del_timer(struct neigh
  	return 0;
  }
-static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net)
+static struct neigh_parms *neigh_get_dev_parms_rcu(struct net_device *dev,
+						   int family)
+{
+	switch (family) {
+	case AF_INET:
+		return __in_dev_arp_parms_get_rcu(dev);
+	case AF_INET6:
+		return __in6_dev_nd_parms_get_rcu(dev);
+	}
+	return NULL;
+}
+
+static void neigh_parms_qlen_dec(struct net_device *dev, int family)
+{
+	struct neigh_parms *p;
+
+	rcu_read_lock();
+	p = neigh_get_dev_parms_rcu(dev, family);
+	if (p)
+		p->qlen--;
+	rcu_read_unlock();
+}
+
+static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net,
+			       int family)
  {
  	struct sk_buff_head tmp;
  	unsigned long flags;
@@@ -345,7 -321,13 +345,7 @@@
  		struct net_device *dev = skb->dev;
  		if (net == NULL || net_eq(dev_net(dev), net)) {
- 			struct in_device *in_dev;
-
- 			rcu_read_lock();
- 			in_dev = __in_dev_get_rcu(dev);
- 			if (in_dev)
- 				in_dev->arp_parms->qlen--;
- 			rcu_read_unlock();
+ 			neigh_parms_qlen_dec(dev, family);
  			__skb_unlink(skb, list);
  			__skb_queue_tail(&tmp, skb);
  		}
@@@ -427,8 -409,7 +427,8 @@@ static int __neigh_ifdown(struct neigh_
  	write_lock_bh(&tbl->lock);
  	neigh_flush_dev(tbl, dev, skip_perm);
  	pneigh_ifdown_and_unlock(tbl, dev);
- 	pneigh_queue_purge(&tbl->proxy_queue, dev ? dev_net(dev) : NULL);
+ 	pneigh_queue_purge(&tbl->proxy_queue, dev ? dev_net(dev) : NULL,
+ 			   tbl->family);
  	if (skb_queue_empty_lockless(&tbl->proxy_queue))
  		del_timer_sync(&tbl->proxy_timer);
  	return 0;
@@@ -1640,8 -1621,13 +1640,8 @@@ static void neigh_proxy_process(struct
  		if (tdif <= 0) {
  			struct net_device *dev = skb->dev;
- 			struct in_device *in_dev;
- 			rcu_read_lock();
- 			in_dev = __in_dev_get_rcu(dev);
- 			if (in_dev)
- 				in_dev->arp_parms->qlen--;
- 			rcu_read_unlock();
+ 			neigh_parms_qlen_dec(dev, tbl->family);
  			__skb_unlink(skb, &tbl->proxy_queue);
  			if (tbl->proxy_redo && netif_running(dev)) {
@@@ -1666,7 -1652,7 +1666,7 @@@ void pneigh_enqueue(struct neigh_table
  		   struct sk_buff *skb)
  {
  	unsigned long sched_next = jiffies +
- 			prandom_u32_max(NEIGH_VAR(p, PROXY_DELAY));
+ 			get_random_u32_below(NEIGH_VAR(p, PROXY_DELAY));
  	if (p->qlen > NEIGH_VAR(p, PROXY_QLEN)) {
  		kfree_skb(skb);
@@@ -1835,7 -1821,7 +1835,7 @@@ int neigh_table_clear(int index, struc
  	cancel_delayed_work_sync(&tbl->managed_work);
  	cancel_delayed_work_sync(&tbl->gc_work);
  	del_timer_sync(&tbl->proxy_timer);
- 	pneigh_queue_purge(&tbl->proxy_queue, NULL);
+ 	pneigh_queue_purge(&tbl->proxy_queue, NULL, tbl->family);
  	neigh_ifdown(tbl, NULL);
  	if (atomic_read(&tbl->entries))
  		pr_crit("neighbour leakage\n");
@@@ -3553,6 -3539,18 +3553,6 @@@ static int proc_unres_qlen(struct ctl_t
  	return ret;
  }
-static struct neigh_parms *neigh_get_dev_parms_rcu(struct net_device *dev,
-						   int family)
-{
-	switch (family) {
-	case AF_INET:
-		return __in_dev_arp_parms_get_rcu(dev);
-	case AF_INET6:
-		return __in6_dev_nd_parms_get_rcu(dev);
-	}
-	return NULL;
-}
-
  static void neigh_copy_dflt_parms(struct net *net, struct neigh_parms *p,
  				  int index)
  {
diff --combined net/ipv4/inet_hashtables.c
index 3cec471a2cd2,a879ec1a267d..d039b4e732a3
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@@ -858,80 -858,34 +858,80 @@@ inet_bhash2_addr_any_hashbucket(const s
  	return &hinfo->bhash2[hash & (hinfo->bhash_size - 1)];
  }
-int inet_bhash2_update_saddr(struct inet_bind_hashbucket *prev_saddr, struct sock *sk)
+static void inet_update_saddr(struct sock *sk, void *saddr, int family)
+{
+	if (family == AF_INET) {
+		inet_sk(sk)->inet_saddr = *(__be32 *)saddr;
+		sk_rcv_saddr_set(sk, inet_sk(sk)->inet_saddr);
+	}
+#if IS_ENABLED(CONFIG_IPV6)
+	else {
+		sk->sk_v6_rcv_saddr = *(struct in6_addr *)saddr;
+	}
+#endif
+}
+
+static int __inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family, bool reset)
  {
  	struct inet_hashinfo *hinfo = tcp_or_dccp_get_hashinfo(sk);
+ 	struct inet_bind_hashbucket *head, *head2;
  	struct inet_bind2_bucket *tb2, *new_tb2;
  	int l3mdev = inet_sk_bound_l3mdev(sk);
- 	struct inet_bind_hashbucket *head2;
  	int port = inet_sk(sk)->inet_num;
  	struct net *net = sock_net(sk);
+ 	int bhash;
+
+ 	if (!inet_csk(sk)->icsk_bind2_hash) {
+ 		/* Not bind()ed before. */
+ 		if (reset)
+ 			inet_reset_saddr(sk);
+ 		else
+ 			inet_update_saddr(sk, saddr, family);
+
+ 		return 0;
+ 	}
  	/* Allocate a bind2 bucket ahead of time to avoid permanently putting
  	 * the bhash2 table in an inconsistent state if a new tb2 bucket
  	 * allocation fails.
  	 */
  	new_tb2 = kmem_cache_alloc(hinfo->bind2_bucket_cachep, GFP_ATOMIC);
- 	if (!new_tb2)
+ 	if (!new_tb2) {
+ 		if (reset) {
+ 			/* The (INADDR_ANY, port) bucket might have already
+ 			 * been freed, then we cannot fixup icsk_bind2_hash,
+ 			 * so we give up and unlink sk from bhash/bhash2 not
+ 			 * to leave inconsistency in bhash2.
+ 			 */
+ 			inet_put_port(sk);
+ 			inet_reset_saddr(sk);
+ 		}
  		return -ENOMEM;
+ 	}
+ 	bhash = inet_bhashfn(net, port, hinfo->bhash_size);
+ 	head = &hinfo->bhash[bhash];
  	head2 = inet_bhashfn_portaddr(hinfo, sk, net, port);
- 	if (prev_saddr) {
- 		spin_lock_bh(&prev_saddr->lock);
- 		__sk_del_bind2_node(sk);
- 		inet_bind2_bucket_destroy(hinfo->bind2_bucket_cachep,
- 					  inet_csk(sk)->icsk_bind2_hash);
- 		spin_unlock_bh(&prev_saddr->lock);
- 	}
+ 	/* If we change saddr locklessly, another thread
+ 	 * iterating over bhash might see corrupted address.
+ 	 */
+ 	spin_lock_bh(&head->lock);
- 	spin_lock_bh(&head2->lock);
+ 	spin_lock(&head2->lock);
+ 	__sk_del_bind2_node(sk);
+ 	inet_bind2_bucket_destroy(hinfo->bind2_bucket_cachep, inet_csk(sk)->icsk_bind2_hash);
+ 	spin_unlock(&head2->lock);
+
+ 	if (reset)
+ 		inet_reset_saddr(sk);
+ 	else
+ 		inet_update_saddr(sk, saddr, family);
+
+ 	head2 = inet_bhashfn_portaddr(hinfo, sk, net, port);
+
+ 	spin_lock(&head2->lock);
  	tb2 = inet_bind2_bucket_find(head2, net, port, l3mdev, sk);
  	if (!tb2) {
  		tb2 = new_tb2;
@@@ -939,40 -893,26 +939,40 @@@
  	}
  	sk_add_bind2_node(sk, &tb2->owners);
  	inet_csk(sk)->icsk_bind2_hash = tb2;
- 	spin_unlock_bh(&head2->lock);
+ 	spin_unlock(&head2->lock);
+
+ 	spin_unlock_bh(&head->lock);
  	if (tb2 != new_tb2)
  		kmem_cache_free(hinfo->bind2_bucket_cachep, new_tb2);
  	return 0;
  }
+
+int inet_bhash2_update_saddr(struct sock *sk, void *saddr, int family)
+{
+	return __inet_bhash2_update_saddr(sk, saddr, family, false);
+}
  EXPORT_SYMBOL_GPL(inet_bhash2_update_saddr);
+void inet_bhash2_reset_saddr(struct sock *sk)
+{
+	if (!(sk->sk_userlocks & SOCK_BINDADDR_LOCK))
+		__inet_bhash2_update_saddr(sk, NULL, 0, true);
+}
+EXPORT_SYMBOL_GPL(inet_bhash2_reset_saddr);
+
  /* RFC 6056 3.3.4. Algorithm 4: Double-Hash Port Selection Algorithm
   * Note that we use 32bit integers (vs RFC 'short integers')
   * because 2^16 is not a multiple of num_ephemeral and this
   * property might be used by clever attacker.
+  *
   * RFC claims using TABLE_LENGTH=10 buckets gives an improvement, though
-  * attacks were since demonstrated, thus we use 65536 instead to really
-  * give more isolation and privacy, at the expense of 256kB of kernel
-  * memory.
+  * attacks were since demonstrated, thus we use 65536 by default instead
+  * to really give more isolation and privacy, at the expense of 256kB
+  * of kernel memory.
   */
-#define INET_TABLE_PERTURB_SHIFT	16
-#define INET_TABLE_PERTURB_SIZE		(1 << INET_TABLE_PERTURB_SHIFT)
+#define INET_TABLE_PERTURB_SIZE		(1 << CONFIG_INET_TABLE_PERTURB_ORDER)
  static u32 *table_perturb;
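As a side note on the comment above: with an order of 16, the table holds 1 << 16 = 65536 u32 entries, which is 262144 bytes, i.e. the 256kB figure quoted. A small userspace sketch of the same arithmetic (illustrative only; the default value of CONFIG_INET_TABLE_PERTURB_ORDER is assumed to be 16):

	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		unsigned int order = 16;                   /* assumed default CONFIG_INET_TABLE_PERTURB_ORDER */
		size_t entries = (size_t)1 << order;       /* 65536 perturbation slots */
		size_t bytes = entries * sizeof(uint32_t); /* 4 bytes per u32 slot */

		printf("%zu entries, %zu bytes (%zu KiB)\n", entries, bytes, bytes >> 10);
		return 0;
	}
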
  int __inet_hash_connect(struct inet_timewait_death_row *death_row,
@@@ -1097,7 -1037,7 +1097,7 @@@ ok
  	 * on low contention the randomness is maximal and on high contention
  	 * it may be inexistent.
  	 */
- 	i = max_t(int, i, prandom_u32_max(8) * 2);
+ 	i = max_t(int, i, get_random_u32_below(8) * 2);
  	WRITE_ONCE(table_perturb[index], READ_ONCE(table_perturb[index]) + i + 2);
  	/* Head lock still held and bh's disabled */
diff --combined net/netfilter/nf_conntrack_core.c
index 23b3fedd619a,8703812405eb..8006ca862551
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@@ -891,7 -891,7 +891,7 @@@ nf_conntrack_hash_check_insert(struct n
  	zone = nf_ct_zone(ct);
  	if (!nf_ct_ext_valid_pre(ct->ext)) {
- 		NF_CT_STAT_INC(net, insert_failed);
+ 		NF_CT_STAT_INC_ATOMIC(net, insert_failed);
  		return -ETIMEDOUT;
  	}
@@@ -906,7 -906,7 +906,7 @@@
  				   nf_ct_zone_id(nf_ct_zone(ct), IP_CT_DIR_REPLY));
  	} while (nf_conntrack_double_lock(net, hash, reply_hash, sequence));
- 	max_chainlen = MIN_CHAINLEN + prandom_u32_max(MAX_CHAINLEN);
+ 	max_chainlen = MIN_CHAINLEN + get_random_u32_below(MAX_CHAINLEN);
  	/* See if there's one in the list already, including reverse */
  	hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[hash], hnnode) {
@@@ -938,7 -938,7 +938,7 @@@
  	if (!nf_ct_ext_valid_post(ct->ext)) {
  		nf_ct_kill(ct);
- 		NF_CT_STAT_INC(net, drop);
+ 		NF_CT_STAT_INC_ATOMIC(net, drop);
  		return -ETIMEDOUT;
  	}
@@@ -1227,7 -1227,7 +1227,7 @@@ __nf_conntrack_confirm(struct sk_buff *
  		goto dying;
  	}
- 	max_chainlen = MIN_CHAINLEN + prandom_u32_max(MAX_CHAINLEN);
+ 	max_chainlen = MIN_CHAINLEN + get_random_u32_below(MAX_CHAINLEN);
  	/* See if there's one in the list already, including reverse:
  	   NAT could have grabbed it without realizing, since we're
  	   not in the hash.  If there is, we lost race. */
@@@ -1275,7 -1275,7 +1275,7 @@@ chaintoolong
  	 */
  	if (!nf_ct_ext_valid_post(ct->ext)) {
  		nf_ct_kill(ct);
- 		NF_CT_STAT_INC(net, drop);
+ 		NF_CT_STAT_INC_ATOMIC(net, drop);
  		return NF_DROP;
  	}
@@@ -1781,7 -1781,7 +1781,7 @@@ init_conntrack(struct net *net, struct
  	}
  #ifdef CONFIG_NF_CONNTRACK_MARK
- 		ct->mark = exp->master->mark;
+ 		ct->mark = READ_ONCE(exp->master->mark);
  #endif
  #ifdef CONFIG_NF_CONNTRACK_SECMARK
  		ct->secmark = exp->master->secmark;
diff --combined net/packet/af_packet.c
index 1ab65f7f2a0a,51a47ade92e8..96fea8afc004
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@@ -1350,7 -1350,7 +1350,7 @@@ static bool fanout_flow_is_huge(struct
  		if (READ_ONCE(history[i]) == rxhash)
  			count++;
- 	victim = prandom_u32_max(ROLLOVER_HLEN);
+ 	victim = get_random_u32_below(ROLLOVER_HLEN);
  	/* Avoid dirtying the cache line if possible */
  	if (READ_ONCE(history[victim]) != rxhash)
@@@ -1386,7 -1386,7 +1386,7 @@@ static unsigned int fanout_demux_rnd(st
  					  struct sk_buff *skb,
  					  unsigned int num)
  {
- 	return prandom_u32_max(num);
+ 	return get_random_u32_below(num);
  }
  static unsigned int fanout_demux_rollover(struct packet_fanout *f,
@@@ -2293,7 -2293,8 +2293,7 @@@ static int tpacket_rcv(struct sk_buff *
  		if (skb->ip_summed == CHECKSUM_PARTIAL)
  			status |= TP_STATUS_CSUMNOTREADY;
  		else if (skb->pkt_type != PACKET_OUTGOING &&
- 			 (skb->ip_summed == CHECKSUM_COMPLETE ||
- 			  skb_csum_unnecessary(skb)))
+ 			 skb_csum_unnecessary(skb))
  			status |= TP_STATUS_CSUM_VALID;
  		if (snaplen > res)
@@@ -3519,7 -3520,8 +3519,7 @@@ static int packet_recvmsg(struct socke
  		if (skb->ip_summed == CHECKSUM_PARTIAL)
  			aux.tp_status |= TP_STATUS_CSUMNOTREADY;
  		else if (skb->pkt_type != PACKET_OUTGOING &&
- 			 (skb->ip_summed == CHECKSUM_COMPLETE ||
- 			  skb_csum_unnecessary(skb)))
+ 			 skb_csum_unnecessary(skb))
  			aux.tp_status |= TP_STATUS_CSUM_VALID;
aux.tp_len = origlen;
linux-merge@lists.open-mesh.org