VTune Study Notes 1: Finding Hotspots

2023-11-20 15:20


Source: the product manual.

 

Workflow Steps to Identify and Analyze Hotspots


You can use the Intel® VTune™ Amplifier XE to identify and analyze hotspot functions in your serial or parallel application by performing a series of steps in a workflow. This tutorial guides you through these workflow steps while using a sample ray-tracer application named tachyon.

 

 

 

clip_image002

  1. Choose a target to analyze for hotspots.
  2. Configure environment and project settings and build your target.
  3. Choose and run the Hotspots analysis.
  4. Interpret the result data.
  5. View and analyze code of the performance-critical function.
  6. Modify the code to tune the algorithms or rebuild the code with Intel® Compiler.

 

clip_image003

 

66: The project used here is extracted from the development package.

 

 

Build Target


After choosing the analysis target, do the following to ensure the Intel® VTune™ Amplifier XE provides the most accurate information on the performance of your application:


NOTE

The steps below are provided for Microsoft Visual Studio 2005. They may differ slightly for other versions of Visual Studio.

 

 

Enable Downloading the Debug Information for System Libraries

  1. Go to Tools > Options....
    The Options dialog box opens.
  2. From the left pane, select Debugging > Symbols.
  3. In the Symbol file (.pdb) locations field, click the  button and specify the following address: http://msdl.microsoft.com/download/symbols.
  4. Make sure the added address is checked.
  5. In the Cache symbols from symbol servers to this directory field, specify a directory where the downloaded symbol files will be stored.
  6. For Microsoft Visual Studio* 2005, check the Load symbols using the updated settings when this dialog is closed box.
  7. Click OK.

Enable Generating Debug Information for Your Binary Files

  1. Select the find_hotspots project and go to Project > Properties.
  2. From the find_hotspots Property Pages dialog box, select Configuration Properties > General and make sure the selected Configuration (top of the dialog) is Active(Release).
  3. From the find_hotspots Property Pages dialog box, select C/C++ > General pane and specify the Debug Information Format as Program Database (/Zi).
  4. From the find_hotspots Property Pages dialog box, select Linker > Debugging and set the Generate Debug Info option to Yes (/DEBUG).

Choose a Build Mode and Build a Target

  1. Go to the Build > Configuration Manager... dialog box and select the Release mode for your target project.
  2. From the Visual Studio menu, select Build > Build find_hotspots.
    The tachyon_find_hotspots.exe application is built.


NOTE

The build configuration for tachyon may initially be set to Debug, which is typically used for development. When analyzing performance issues with the VTune Amplifier XE, it is recommended that you use the Release build with normal optimizations. This way, the VTune Amplifier XE analyzes the realistic performance of your application.

Create a Performance Baseline

  1. From the Visual Studio menu, select Debug > Start Without Debugging.
    The tachyon_find_hotspots.exe application starts running.
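A baseline is only useful if the number is written down, because the tuning steps later in these notes compare execution time before and after the change. As a minimal sketch for recording such a number in your own code, assuming you can wrap the work in a callable (the name measure_elapsed_seconds is hypothetical and is not part of the tachyon sample or of VTune):

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: runs `workload` once and returns the elapsed
// (wall-clock) time in seconds. steady_clock is used because it is
// monotonic and is not affected by system clock adjustments.
double measure_elapsed_seconds(const std::function<void()>& workload) {
    const auto start = std::chrono::steady_clock::now();
    workload();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling it a few times on the Release build and keeping the values gives a rough reference point; a single run can be noisy.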

Run Hotspots Analysis


In this tutorial, you run the Hotspots analysis to identify the functions that took the most time to execute.

 

 

The most important part

 

Interpret Result Data


When the sample application exits, the Intel® VTune™ Amplifier XE finalizes the results and opens the Hotspots viewpoint that consists of the Summary, Bottom-up, and Top-down Tree windows. To interpret the data on the sample code performance, do the following:

  • Understand the basic performance metrics provided by the Hotspots analysis.
  • Analyze the most time-consuming functions.
  • Analyze CPU usage per function.

 

 


NOTE

The screenshots and execution time data provided in this tutorial are created on a system with four CPU cores. Your data may vary depending on the number and type of CPU cores on your system.

 

Understand the Basic Hotspots Metrics

Start analysis with the Summary window. To interpret the data, hover over the question mark icons to read the pop-up help and better understand what each performance metric means.

clip_image006

Note that CPU Time for the sample application is equal to 64.907 seconds. It is the sum of CPU time for all application threads. Total Thread Count is 3, so the sample application is multi-threaded.

clip_image007

The Top Hotspots section provides data on the most time-consuming functions (hotspot functions) sorted by CPU time spent on their execution. For the sample application, the initialize_2D_buffer function, which took 27.671 seconds to execute, shows up at the top of the list as the hottest function.

The [Others] entry at the bottom shows the sum of CPU time for all functions not listed in the table.
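The way such a table can be put together is simple to sketch. The following is only an illustration of the aggregation described above (sort by CPU time, keep the top entries, fold the rest into [Others]); it is not VTune's implementation, and the names FunctionTime and top_hotspots are hypothetical:

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (function name, CPU time in seconds); a hypothetical record type.
typedef std::pair<std::string, double> FunctionTime;

// Returns the `top_n` functions with the largest CPU time, in descending
// order, followed by one "[Others]" row holding the sum of the remainder.
std::vector<FunctionTime> top_hotspots(std::vector<FunctionTime> functions,
                                       std::size_t top_n) {
    std::sort(functions.begin(), functions.end(),
              [](const FunctionTime& a, const FunctionTime& b) {
                  return a.second > b.second;
              });
    if (functions.size() <= top_n) {
        return functions;  // nothing left over to aggregate
    }
    double others = 0.0;
    for (std::size_t i = top_n; i < functions.size(); ++i) {
        others += functions[i].second;
    }
    functions.resize(top_n);
    functions.push_back(FunctionTime("[Others]", others));
    return functions;
}
```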

 

Analyze the Most Time-consuming Functions

 

 

Click the Bottom-up tab to explore the Bottom-up pane. By default, the data in the grid is grouped by Function. You may change the grouping level using the Grouping drop-down menu at the top of the grid.

 

Analyze the CPU Time column values. This column is marked with a yellow star as the Data of Interest column. It means that the VTune Amplifier XE uses this type of data for some calculations (for example, filtering, stack contribution, and others). Functions that took the most CPU time to execute are listed at the top.

 

 

The initialize_2D_buffer function took 27.671 seconds to execute. Click the plus sign next to the initialize_2D_buffer function to expand the stacks calling this function. You see that it was called only by the setup_2D_buffer function.

 

From the Bottom-up pane.

If the list is sorted by the first entry, does that mean we simply optimize in order of time spent?

 

 

clip_image009

 

Select the initialize_2D_buffer function in the grid and explore the data provided in the Call Stack pane on the right.

 

The Call Stack pane displays full stack data for each hotspot function and enables you to navigate between function call stacks and understand the impact of each stack on the function CPU time. The stack functions in the Call Stack pane are represented in the following format:

<module>!<function> - <file>:<line number>, where the line number corresponds to the line calling the next function in the stack.
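To make the format concrete, here is a small sketch that splits one such line into its four parts. It assumes the text follows the `<module>!<function> - <file>:<line number>` layout exactly; StackFrame and parse_frame are hypothetical names, not part of any VTune API:

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical record for one entry of the Call Stack pane.
struct StackFrame {
    std::string module;
    std::string function;
    std::string file;
    int line;
};

// Parses "<module>!<function> - <file>:<line number>" into `out`.
// Returns false if the text does not match that layout.
bool parse_frame(const std::string& text, StackFrame& out) {
    const std::size_t bang = text.find('!');
    if (bang == std::string::npos) return false;
    const std::size_t dash = text.find(" - ", bang + 1);
    if (dash == std::string::npos) return false;
    // Search from the right so a drive letter such as "C:" stays in the file path.
    const std::size_t colon = text.rfind(':');
    if (colon == std::string::npos || colon < dash + 3) return false;

    int line = 0;
    try {
        line = std::stoi(text.substr(colon + 1));
    } catch (const std::logic_error&) {
        return false;  // missing, non-numeric, or out-of-range line number
    }
    out.module = text.substr(0, bang);
    out.function = text.substr(bang + 1, dash - bang - 1);
    out.file = text.substr(dash + 3, colon - dash - 3);
    out.line = line;
    return true;
}
```

For a string such as "tachyon_find_hotspots.exe!setup_2D_buffer - global.cpp:86" (the module name here is an assumption), the function yields the module, the function setup_2D_buffer, the file global.cpp, and line 86.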

 

 

clip_image010

 

For the sample application, the hottest function initialize_2D_buffer is called at line 86 of the setup_2D_buffer function in the global.cpp file.

 

 

Analyze CPU Usage per Function

clip_image011

VTune Amplifier XE enables you to analyze the collected data from different perspectives by using multiple viewpoints.

 

For the Hotspots analysis result, you may switch to the Hotspots by CPU Usage viewpoint to understand how your hotspot function performs in terms of the CPU usage. Explore this viewpoint to determine how your application utilized available cores and identify the most serial code.

 

If you go back to the Summary window, you can see the CPU Usage Histogram that represents the Elapsed time and usage level for the available logical processors. Ideally, the highest bar of your chart should match the Target level.

The tachyon_find_hotspots application ran mostly on one logical CPU. If you hover over the highest bar, you see that it spent 62.491 seconds using one core only, which is classified by the VTune Amplifier XE as a Poor utilization for a dual-core system. To understand what prevented the application from using all available logical CPUs effectively, explore the Bottom-up pane.

clip_image012

To get the detailed CPU usage information per function, use the button (shown as an icon in the manual) in the Bottom-up window to expand the CPU Time column.

where??

Note that initialize_2D_buffer is the function with the longest poor CPU utilization (red bars). This means that the processor cores were underutilized most of the time spent on executing this function.

 

clip_image015

 

 

 

 

 

 

If you change the grouping level (highlighted in the figure above) in the Bottom-up pane from Function/Call Stack to Thread/Function/Call Stack, you see that the initialize_2D_buffer function belongs to the thread_video thread. This thread is also identified as a hotspot and shows up at the top in the Bottom-up pane. To get detailed information on the hotspot thread performance, explore the Timeline pane.

clip_image016

clip_image017

Timeline area. When you hover over a graph element, the timeline tooltip displays the time passed since the application was launched.

clip_image018

Threads area that shows the distribution of CPU time utilization per thread. Hover over a bar to see the CPU time utilization in percent for this thread at each moment of time. Green zones show the time threads are active.

clip_image019

CPU Usage area that shows the distribution of CPU time utilization for the whole application. Hover over a bar to see the application-level CPU time utilization in percent at each moment of time.

VTune Amplifier XE calculates the overall CPU Usage metric as the sum of the CPU time of each thread in the Threads area. The maximum CPU Usage value is equal to [number of processor cores] x 100%.

 

The Timeline analysis also identifies the thread_video thread as the most active. The tooltip shows that CPU time values rarely exceed 100% whereas the maximum CPU time value for dual-core systems is 200%. This means that the processor cores were half-utilized for most of the time spent on executing the tachyon_find_hotspots application.
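The arithmetic behind that statement can be written out as a short sketch. The function names and the per-thread values used in the usage note are assumptions for illustration only; the formulas are the ones quoted above (sum over threads, maximum of cores x 100%):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Upper bound of the CPU Usage metric: [number of processor cores] x 100%.
double max_cpu_usage_percent(int cores) {
    assert(cores > 0);
    return cores * 100.0;
}

// Overall CPU usage at one moment: the sum of the per-thread utilization
// values (each given in percent of one core).
double overall_cpu_usage_percent(const std::vector<double>& per_thread_percent) {
    return std::accumulate(per_thread_percent.begin(),
                           per_thread_percent.end(), 0.0);
}

// Share of the machine actually in use, between 0.0 and 1.0.
double utilization_fraction(const std::vector<double>& per_thread_percent,
                            int cores) {
    return overall_cpu_usage_percent(per_thread_percent) /
           max_cpu_usage_percent(cores);
}
```

With one thread at 100% and two idle threads on a dual-core machine, the overall usage is 100% out of a possible 200%, a fraction of 0.5, which is the "half-utilized" situation described above.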

 

 

Recap

You identified a function that took the most CPU time and could be a good candidate for algorithm tuning.

 

 

Analyze Code


You identified initialize_2D_buffer as the hottest function. In the Bottom-up pane, double-click this function to open the Source window and analyze the source code:

  • Understand basic options provided in the Source window.
  • Identify the hottest code lines.

 

66: Does a single click on the first entry open the function's call stack, and a double-click open the code??

 

Understand Basic Source Window Options

clip_image020

 

 

The table below explains some of the features available in the Source window when viewing the Hotspots analysis data.

clip_image017

Source pane displaying the source code of the application if the function symbol information is available. The code line that took the most CPU time to execute is highlighted. The source code in the Source pane is not editable.

If the function symbol information is not available, the Assembly pane opens displaying assembler instructions for the selected hotspot function. To enable the Source pane, make sure to build the target properly.

 

 

clip_image018

Assembly pane displaying the assembler instructions for the selected hotspot function. Assembler instructions are grouped by basic blocks. The assembler instructions for the selected hotspot function are highlighted. To get help on an assembler instruction, right-click the instruction and select Instruction Reference.


NOTE

To get the help on a particular instruction, make sure to have the Adobe* Acrobat Reader* 9 (or later) installed. If an earlier version of the Adobe Acrobat Reader is installed, the Instruction Reference opens but you need to locate the help on each instruction manually.

clip_image019

Processor time attributed to a particular code line. If the hotspot is a system function, its time, by default, is attributed to the user function that called this system function.

 

clip_image021

Source window toolbar. Use the hotspot navigation buttons to switch between most performance-critical code lines. Hotspot navigation is based on the metric column selected as a Data of Interest. For the Hotspots analysis, this is CPU Time. Use the Source/Assembly buttons to toggle the Source/Assembly panes (if both of them are available) on/off.

 

 

clip_image022

Heat map markers to quickly identify performance-critical code lines (hotspots). The bright blue markers indicate hot lines for the function you selected for analysis. Light blue markers indicate hot lines for other functions. Scroll to a marker to locate the hot code line it identifies.

 

The largest cost is directly visible here; see step 5.

 

 

 

 

Tune Algorithms


In the Source window, you identified that in the initialize_2D_buffer hotspot function the code line 84 took the most CPU time. Focus on this line and do the following:

  • Open the code editor.
  • Resolve the performance problem using any of these options:
    • Optimize the algorithm used in this code section.
    • Recompile the code with the Intel® Compiler.

Open the Code Editor

In the Source window, click the Source Editor button to open the find_hotspots.cpp file in the default code editor at the hotspot line:

clip_image024

 

 

66: The example the author gives is about whether addresses are aligned during assignment... heh.

 

Hotspot line 84 is used to initialize a memory array using non-sequential memory locations. For demonstration purposes, the code lines are commented as a slower method of filling the array.

 

Resolve the Problem

To resolve this issue, use one of the following methods:

Option 1: Optimize your algorithm

  1. Edit line 79 to comment out code lines 82-88 marked as a "First (slower) method".
  2. Edit line 95 to uncomment code lines 98-104 marked as a "Faster method".
    In this step, you interchange the for loops to initialize the array in sequential memory locations.
  3. From the Visual Studio menu, select Build > Rebuild find_hotspots.
    The project is rebuilt.
  4. From the Visual Studio Debug menu, select Start Without Debugging to run the application.

clip_image025

Visual Studio runs the tachyon_find_hotspots.exe. Note that the execution time has decreased from 63.609 seconds to 57.282 seconds.
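The find_hotspots.cpp source is not reproduced in these notes, so the following is only a minimal sketch of the loop interchange idea, assuming a two-dimensional buffer stored row by row in one contiguous array. The function names, the element type, and the layout are assumptions, not the sample's actual code:

```cpp
#include <cstddef>
#include <vector>

// Assumed layout: element (r, c) lives at index r * cols + c (row-major).

// Slower order: the inner loop walks down a column, so consecutive writes
// are `cols` elements apart and keep jumping through memory.
void fill_column_first(std::vector<unsigned>& buf, std::size_t rows,
                       std::size_t cols, unsigned value) {
    for (std::size_t c = 0; c < cols; ++c) {
        for (std::size_t r = 0; r < rows; ++r) {
            buf[r * cols + c] = value;
        }
    }
}

// Faster order: the loops are interchanged, so the inner loop walks along
// a row and consecutive writes touch adjacent memory locations.
void fill_row_first(std::vector<unsigned>& buf, std::size_t rows,
                    std::size_t cols, unsigned value) {
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            buf[r * cols + c] = value;
        }
    }
}
```

Both functions produce the same buffer contents; only the order of memory accesses differs, which is why the change is safe to make and why it tends to help cache behavior on large buffers.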

Option 2: Recompile the code with Intel® Compiler

This option assumes that you have Intel® Composer XE installed. Composer XE is part of Intel® Parallel Studio XE. By default, the Intel® Compiler, one of the Composer components, uses powerful optimization switches, which typically provide some gain in performance. For more details on the Intel compiler, see the Intel Composer documentation.

As an alternative, you may consider running the default Microsoft Visual Studio compiler applying more aggressive optimization switches.

To recompile the code with the Intel compiler:

  1. From the Visual Studio Project menu, select Intel Composer XE > Use Intel C++....
  2. In the Confirmation window, click OK to confirm your choice.
    The project in Solution Explorer appears with the ComposerXE icon:

clip_image026

  3. From the Visual Studio menu, select Build > Rebuild find_hotspots.
    The project is rebuilt with the Intel compiler.
  4. From the Visual Studio menu, select Debug > Start Without Debugging.
    Visual Studio runs the tachyon_find_hotspots.exe. Note that the execution time is reduced.

Reposted from: https://www.cnblogs.com/titer1/archive/2011/12/31/2309155.html
