han*_*ing 6 intel performancecounter msr perf
我在Haswell CPU(Intel Core i7-4790)上安装了perf.但是"性能列表"不包括"停滞 - 循环 - 前端"和"停滞循环 - 后端".我检查了http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html,但没有找到与表19中的停滞循环后端相关的性能事件 - 7(第四代英特尔酷睿处理器的处理器核心中的非架构性能事件).
所以我的问题是:如何使用Haswell CPU内核中的perf或其他工具来测量停滞循环后端.内核是3.19,perf版本也是3.19.
谢谢
perf_events是的,对于 Ivy Bridge 或 Haswell 等较新的处理器,内核子系统中没有“停顿周期前端”和“停顿周期后端”合成事件的映射。并且在较旧的 Core 2 上没有映射。可能,这个名称/概念/想法不适合现代乱序 CPU 的变化和复杂的微体系结构,而无需对全局“失速”进行简单的标量测量。
代码位于 中arch/x86/events/intel/core.c,合成事件名称为PERF_COUNT_HW_STALLED_CYCLES_FRONTEND和 PERF_COUNT_HW_STALLED_CYCLES_BACKEND:
__init int intel_pmu_init(void)
{...
Run Code Online (Sandbox Code Playgroud)
两者都是从 Nehalem、Westmere、Sandy Bridge 开始定义的:
case INTEL_FAM6_NEHALEM:
case INTEL_FAM6_NEHALEM_EP:
case INTEL_FAM6_NEHALEM_EX:
/* UOPS_ISSUED.STALLED_CYCLES */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
/* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);
case INTEL_FAM6_WESTMERE:
case INTEL_FAM6_WESTMERE_EP:
case INTEL_FAM6_WESTMERE_EX:
/* UOPS_ISSUED.STALLED_CYCLES */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
/* UOPS_EXECUTED.CORE_ACTIVE_CYCLES,c=1,i=1 */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x3f, .inv=1, .cmask=1);
case INTEL_FAM6_SANDYBRIDGE:
case INTEL_FAM6_SANDYBRIDGE_X:
/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
/* UOPS_DISPATCHED.THREAD,c=1,i=1 to count stall cycles*/
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_BACKEND] =
X86_CONFIG(.event=0xb1, .umask=0x01, .inv=1, .cmask=1);
Run Code Online (Sandbox Code Playgroud)
仅为 Ivy Bridge 定义了前端停顿
case INTEL_FAM6_IVYBRIDGE:
case INTEL_FAM6_IVYBRIDGE_X:
/* UOPS_ISSUED.ANY,c=1,i=1 to count stall cycles */
intel_perfmon_event_map[PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] =
X86_CONFIG(.event=0x0e, .umask=0x01, .inv=1, .cmask=1);
Run Code Online (Sandbox Code Playgroud)
对于较新的 CPU 桌面(Haswell、Broadwell、Skylake、Kaby Lake)和 Phi(KNL、KNM),没有前端和后端停顿的映射:
case INTEL_FAM6_HASWELL_CORE:
case INTEL_FAM6_HASWELL_X:
case INTEL_FAM6_HASWELL_ULT:
case INTEL_FAM6_HASWELL_GT3E:
case INTEL_FAM6_BROADWELL_CORE:
case INTEL_FAM6_BROADWELL_XEON_D:
case INTEL_FAM6_BROADWELL_GT3E:
case INTEL_FAM6_BROADWELL_X:
case INTEL_FAM6_XEON_PHI_KNL:
case INTEL_FAM6_XEON_PHI_KNM:
case INTEL_FAM6_SKYLAKE_MOBILE:
case INTEL_FAM6_SKYLAKE_DESKTOP:
case INTEL_FAM6_SKYLAKE_X:
case INTEL_FAM6_KABYLAKE_MOBILE:
case INTEL_FAM6_KABYLAKE_DESKTOP:
Run Code Online (Sandbox Code Playgroud)
也没有为旧 Core2 定义(没有检查 Atoms):
http://elixir.free-electrons.com/linux/v4.11/source/arch/x86/events/intel/core.c#L27
static u64 intel_perfmon_event_map[PERF_COUNT_HW_MAX] __read_mostly =
{
[PERF_COUNT_HW_CPU_CYCLES] = 0x003c,
[PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
[PERF_COUNT_HW_CACHE_REFERENCES] = 0x4f2e,
[PERF_COUNT_HW_CACHE_MISSES] = 0x412e,
[PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c4,
[PERF_COUNT_HW_BRANCH_MISSES] = 0x00c5,
[PERF_COUNT_HW_BUS_CYCLES] = 0x013c,
[PERF_COUNT_HW_REF_CPU_CYCLES] = 0x0300, /* pseudo-encoding */
};
Run Code Online (Sandbox Code Playgroud)
| 归档时间: |
|
| 查看次数: |
420 次 |
| 最近记录: |