The following commit has been merged in the master branch:
commit 3a36281a17199737b468befb826d4a23eb774445
Merge: 7c70f3a7488d2fa62d32849d138bf2b8420fe788 3027ce36ccbae74f2e7c1afbfc3f69fee0c2a996
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Feb 22 13:59:43 2021 -0800
Merge tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"New features:
- Support instruction latency in 'perf report'. With both memory latency (weight) and instruction latency information, users can locate expensive load instructions and understand time spent in different stages.
- Extend 'perf c2c' to display the number of loads which were blocked by data or address conflict.
- Add 'perf stat' support for L2 topdown events in systems such as Intel's Sapphire Rapids server.
- Add support for PERF_SAMPLE_CODE_PAGE_SIZE in various tools, as a sort key, for instance:
perf report --stdio --sort=comm,symbol,code_page_size
- New 'perf daemon' command to run long running sessions while providing a way to control the enablement of events without restarting a traditional 'perf record' session.
- Enable counting events for BPF programs in 'perf stat' just like for other targets (tid, cgroup, cpu, etc), e.g.:
    # perf stat -e ref-cycles,cycles -b 254 -I 1000
       1.487903822            115,200      ref-cycles
       1.487903822             86,012      cycles
       2.489147029             80,560      ref-cycles
       2.489147029             73,784      cycles
       ^C
The example above counts 'cycles' and 'ref-cycles' of the BPF program with id 254. It is similar to the bpftool-prog-profile command, but more flexible.
- Support the new layout for PERF_RECORD_MMAP2 to carry the DSO build-id using infrastructure generalised from the eBPF subsystem, removing the need for traversing the perf.data file to collect build-ids at the end of 'perf record' sessions and helping with long running sessions where binaries can get replaced in updates, leading to possible mis-resolution of symbols.
- Support filtering by hex address in 'perf script'.
- Support DSO filter in 'perf script', like in other perf tools.
- Add namespaces support to 'perf inject'.
- Add support for SDT (DTrace Style Markers) events on ARM64.
perf record:
- Fix handling of eventfd() when draining a buffer in 'perf record'.
- Improvements to the generation of metadata events for pre-existing threads (mmaps, comm, etc), speeding up the work done at the start of system wide or per CPU 'perf record' sessions.
Hardware tracing:
- Initial support for tracing KVM with Intel PT.
- Intel PT fixes for IPC.
- Support Intel PT PSB (synchronization packets) events.
- Automatically group aux-output events to overcome --filter syntax.
- Enable PERF_SAMPLE_DATA_SRC on ARM's SPE.
- Update ARM's CoreSight hardware tracing OpenCSD library to v1.0.0.
perf annotate TUI:
- Fix handling of 'k' ("show line number") hotkey.
- Fix jump parsing for C++ code.
perf probe:
- Add protection to avoid endless loop.
cgroups:
- Avoid reading cgroup mountpoint multiple times, caching it.
- Fix handling of cgroup v1/v2 in mixed hierarchy.
Symbol resolving:
- Add OCaml symbol demangling.
- Further fixes for handling PE executables when using perf with Wine and .exe/.dll files.
- Fix 'perf unwind' DSO handling.
- Resolve symbols against debug file first, to deal with artifacts related to LTO.
- Fix gap between kernel end and module start on powerpc.
Reporting tools:
- The DSO filter shouldn't show samples in unresolved maps.
- Improve debuginfod support in various tools.
build ids:
- Fix 16-byte build ids in 'perf buildid-cache', add a 'perf test' entry for that case.
perf test:
- Support for PERF_SAMPLE_WEIGHT_STRUCT.
- Add test case for PERF_SAMPLE_CODE_PAGE_SIZE.
- Shell based tests for 'perf daemon's commands ('start', 'stop', 'reconfig', 'list', etc).
- ARM cs-etm 'perf test' fixes.
- Add parse-metric memory bandwidth testcase.
Compiler related:
- Fix 'perf probe' kretprobe issue caused by gcc 11 bug when used with -fpatchable-function-entry.
- Fix ARM64 build with gcc 11's -Wformat-overflow.
- Fix unaligned access in sample parsing test.
- Fix printf conversion specifier for IP addresses on arm64, s390 and powerpc.
Arch specific:
- Support exposing Performance Monitor Counter SPRs as part of extended regs on powerpc.
- Add JSON 'perf stat' metrics for ARM64's imx8mp, imx8mq and imx8mn DDR, fix imx8mm ones.
- Fix common and uarch events for ARM64's A76 and Ampere eMag"
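The interval output in the 'perf stat -b' example from the pull message can be post-processed with a few lines of Python. The sketch below is not part of perf: the names `parse_interval_output` and `totals` are made up for illustration, and it assumes the three-column shape (timestamp, count, event) shown in that example. Real 'perf stat -I' output may carry header lines and extra columns, which this sketch skips or ignores.

```python
from collections import defaultdict

# Sample rows copied from the 'perf stat -b 254 -I 1000' example above.
SAMPLE = """\
1.487903822 115,200 ref-cycles
1.487903822 86,012 cycles
2.489147029 80,560 ref-cycles
2.489147029 73,784 cycles
"""


def parse_interval_output(text):
    """Return {event: [(timestamp, count), ...]} for 'time count event' rows."""
    series = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # blank lines, '^C', or anything too short to be a row
        try:
            timestamp = float(fields[0])
            count = int(fields[1].replace(",", ""))  # drop thousands separators
        except ValueError:
            continue  # header lines or rows without a numeric count
        series[fields[2]].append((timestamp, count))
    return dict(series)


def totals(series):
    """Sum the per-interval counts of each event."""
    return {event: sum(count for _, count in rows) for event, rows in series.items()}


if __name__ == "__main__":
    for event, total in totals(parse_interval_output(SAMPLE)).items():
        print(f"{event}: {total}")
```

Keeping the per-interval rows, rather than summing while parsing, leaves room for deriving ratios such as cycles per ref-cycle for each interval.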
* tag 'perf-tools-for-v5.12-2020-02-19' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (148 commits)
  perf buildid-cache: Don't skip 16-byte build-ids
  perf buildid-cache: Add test for 16-byte build-id
  perf symbol: Remove redundant libbfd checks
  perf test: Output the sub testing result in cs-etm
  perf test: Suppress logs in cs-etm testing
  perf tools: Fix arm64 build error with gcc-11
  perf intel-pt: Add documentation for tracing virtual machines
  perf intel-pt: Split VM-Entry and VM-Exit branches
  perf intel-pt: Adjust sample flags for VM-Exit
  perf intel-pt: Allow for a guest kernel address filter
  perf intel-pt: Support decoding of guest kernel
  perf machine: Factor out machine__idle_thread()
  perf machine: Factor out machines__find_guest()
  perf intel-pt: Amend decoder to track the NR flag
  perf intel-pt: Retain the last PIP packet payload as is
  perf intel_pt: Add vmlaunch and vmresume as branches
  perf script: Add branch types for VM-Entry and VM-Exit
  perf auxtrace: Automatically group aux-output events
  perf test: Fix unaligned access in sample parsing test
  perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
  ...
diff --combined tools/bpf/bpftool/Makefile
index 8ced1655fea6,e3292a3a0c46..b3073ae84018
--- a/tools/bpf/bpftool/Makefile
+++ b/tools/bpf/bpftool/Makefile
@@@ -75,6 -75,8 +75,6 @@@ endi
  
  INSTALL ?= install
  RM ?= rm -f
 -CLANG ?= clang
 -LLVM_STRIP ?= llvm-strip
  
  FEATURE_USER = .bpftool
  FEATURE_TESTS = libbfd disassembler-four-args reallocarray zlib libcap \
@@@ -146,6 -148,8 +146,8 @@@ VMLINUX_BTF_PATHS ?= $(if $(O),$(O)/vml
  	/boot/vmlinux-$(shell uname -r)
  VMLINUX_BTF ?= $(abspath $(firstword $(wildcard $(VMLINUX_BTF_PATHS))))
  
+ bootstrap: $(BPFTOOL_BOOTSTRAP)
+ 
  ifneq ($(VMLINUX_BTF)$(VMLINUX_H),)
  ifeq ($(feature-clang-bpf-co-re),1)
  
@@@ -164,7 -168,7 +166,7 @@@ $(OUTPUT)%.bpf.o: skeleton/%.bpf.c $(OU
  		-I$(srctree)/tools/include/uapi/ \
  		-I$(LIBBPF_PATH) \
  		-I$(srctree)/tools/lib \
- 		-g -O2 -target bpf -c $< -o $@ && $(LLVM_STRIP) -g $@
+ 		-g -O2 -Wall -target bpf -c $< -o $@ && $(LLVM_STRIP) -g $@
  
  $(OUTPUT)%.skel.h: $(OUTPUT)%.bpf.o $(BPFTOOL_BOOTSTRAP)
  	$(QUIET_GEN)$(BPFTOOL_BOOTSTRAP) gen skeleton $< > $@
diff --combined tools/perf/Makefile.perf
index f4df7534026d,8c4e039c3b81..5345ac70cd83
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@@ -126,6 -126,8 +126,8 @@@ include ../scripts/utilities.ma
  #
  # Define NO_LIBDEBUGINFOD if you do not want support debuginfod
  #
+ # Define BUILD_BPF_SKEL to enable BPF skeletons
+ #
  
  # As per kernel Makefile, avoid funny character set dependencies
  unexport LC_ALL
@@@ -175,7 -177,14 +177,13 @@@ ende
  
  LD += $(EXTRA_LDFLAGS)
  
+ HOSTCC ?= gcc
+ HOSTLD ?= ld
+ HOSTAR ?= ar
+ CLANG ?= clang
+ LLVM_STRIP ?= llvm-strip
+ 
  PKG_CONFIG = $(CROSS_COMPILE)pkg-config
 -LLVM_CONFIG ?= llvm-config
  
  RM = rm -f
  LN = ln -f
@@@ -730,7 -739,8 +738,8 @@@ prepare: $(OUTPUT)PERF-VERSION-FILE $(O
  	$(x86_arch_prctl_code_array) \
  	$(rename_flags_array) \
  	$(arch_errno_name_array) \
- 	$(sync_file_range_arrays)
+ 	$(sync_file_range_arrays) \
+ 	bpf-skel
  
  $(OUTPUT)%.o: %.c prepare FORCE
  	$(Q)$(MAKE) -f $(srctree)/tools/build/Makefile.build dir=$(build-dir) $@
@@@ -1003,7 -1013,43 +1012,43 @@@ config-clean
  python-clean:
  	$(python-clean)
  
- clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean
+ SKEL_OUT := $(abspath $(OUTPUT)util/bpf_skel)
+ SKEL_TMP_OUT := $(abspath $(SKEL_OUT)/.tmp)
+ SKELETONS := $(SKEL_OUT)/bpf_prog_profiler.skel.h
+ 
+ ifdef BUILD_BPF_SKEL
+ BPFTOOL := $(SKEL_TMP_OUT)/bootstrap/bpftool
+ LIBBPF_SRC := $(abspath ../lib/bpf)
+ BPF_INCLUDE := -I$(SKEL_TMP_OUT)/.. -I$(BPF_PATH) -I$(LIBBPF_SRC)/..
+ 
+ $(SKEL_TMP_OUT):
+ 	$(Q)$(MKDIR) -p $@
+ 
+ $(BPFTOOL): | $(SKEL_TMP_OUT)
+ 	CFLAGS= $(MAKE) -C ../bpf/bpftool \
+ 		OUTPUT=$(SKEL_TMP_OUT)/ bootstrap
+ 
+ $(SKEL_TMP_OUT)/%.bpf.o: util/bpf_skel/%.bpf.c $(LIBBPF) | $(SKEL_TMP_OUT)
+ 	$(QUIET_CLANG)$(CLANG) -g -O2 -target bpf $(BPF_INCLUDE) \
+ 		-c $(filter util/bpf_skel/%.bpf.c,$^) -o $@ && $(LLVM_STRIP) -g $@
+ 
+ $(SKEL_OUT)/%.skel.h: $(SKEL_TMP_OUT)/%.bpf.o | $(BPFTOOL)
+ 	$(QUIET_GENSKEL)$(BPFTOOL) gen skeleton $< > $@
+ 
+ bpf-skel: $(SKELETONS)
+ 
+ .PRECIOUS: $(SKEL_TMP_OUT)/%.bpf.o
+ 
+ else # BUILD_BPF_SKEL
+ 
+ bpf-skel:
+ 
+ endif # BUILD_BPF_SKEL
+ 
+ bpf-skel-clean:
+ 	$(call QUIET_CLEAN, bpf-skel) $(RM) -r $(SKEL_TMP_OUT) $(SKELETONS)
+ 
+ clean:: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean $(LIBBPF)-clean $(LIBSUBCMD)-clean $(LIBPERF)-clean config-clean fixdep-clean python-clean bpf-skel-clean
  	$(call QUIET_CLEAN, core-objs) $(RM) $(LIBPERF_A) $(OUTPUT)perf-archive $(OUTPUT)perf-with-kcore $(LANG_BINDINGS)
  	$(Q)find $(if $(OUTPUT),$(OUTPUT),.) -name '*.o' -delete -o -name '.*.cmd' -delete -o -name '.*.d' -delete
  	$(Q)$(RM) $(OUTPUT).config-detected
diff --combined tools/scripts/Makefile.include
index 4255e71f72b7,62119ce69ad9..a402f32a145c
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@@ -69,13 -69,6 +69,13 @@@ HOSTCC ?= gc
  HOSTLD ?= ld
  endif
  
 +# Some tools require Clang, LLC and/or LLVM utils
 +CLANG ?= clang
 +LLC ?= llc
 +LLVM_CONFIG ?= llvm-config
 +LLVM_OBJCOPY ?= llvm-objcopy
 +LLVM_STRIP ?= llvm-strip
 +
  ifeq ($(CC_NO_CLANG), 1)
  EXTRA_WARNINGS += -Wstrict-aliasing=3
  endif
@@@ -134,6 -127,7 +134,7 @@@ ifneq ($(silent),1
  	$(MAKE) $(PRINT_DIR) -C $$subdir
  	QUIET_FLEX = @echo ' FLEX '$@;
  	QUIET_BISON = @echo ' BISON '$@;
+ 	QUIET_GENSKEL = @echo ' GEN-SKEL '$@;
  
  	descend = \
  		+@echo ' DESCEND '$(1); \