Use the CUDA Warp Watch

2023-11-30 19:58


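The Locals and Watch tool windows in Visual Studio 2010 can only show a variable's value for one thread at a time. The Nsight debugger evaluates these expressions using the current focus thread and stack frame.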

Nsight adds CUDA Warp Watch tool windows, which show the evaluated values of expressions for every lane of a single focused warp.

To use the CUDA Debugger Warp Watch feature: 
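  1. Start a debugging session in Visual Studio.
  2. From the Nsight menu, choose Windows > CUDA Warp Watch.
  3. Select the Warp Watch window you want to use.
  4. Once a window is open, you can add your own expressions to it. The expressions are evaluated whenever the debugger stops at a breakpoint or on an exception.
  5. Right-click in the tool window to edit expressions from the Warp Watch context menu.

    The context menu includes the following: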
    • Add Watch - Adds a new expression to the Watch window. (You can use F2 to edit the expression in the current column.) 
    • Copy column - This will copy elements to the clipboard so they may be pasted into another document (e.g., a spreadsheet).
    • Delete Watch / Clear All - Deletes the current expression, or deletes all expressions that have been entered.
    • Hexadecimal Display - This menu item controls the Visual Studio global hexadecimal display setting. It is the same setting used in the Visual Studio watch, locals, and autos windows.
  6. The CUDA focus can be changed in any of the following four ways:
    • CUDA Focus Picker
    • CUDA Info Pages
    • The Next / Previous warp commands
    • A suspend event

    The view updates when the current CUDA focus changes, and always shows the warp that contains the current focus thread.
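The steps above need a running kernel to watch. As a minimal sketch, assuming a debug build (nvcc -G -g, which generates device debug information) and a hypothetical kernel scaleKernel with host helper runScale that are not part of the original article, the code below gives every lane its own lane and value to add as Warp Watch expressions:

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: gives each lane its own values to inspect.
// Set a breakpoint on the out[i] line, then add "lane", "value" and
// "threadIdx.x" as Warp Watch expressions; each row is one lane.
__global__ void scaleKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;  // lane index in a 1D block (warp size 32)
    if (i < n) {
        float value = in[i] * 2.0f + lane;
        out[i] = value;           // breakpoint here
    }
}

// Hypothetical host helper; error checking omitted for brevity.
std::vector<float> runScale(const std::vector<float>& hostIn)
{
    int n = static_cast<int>(hostIn.size());
    size_t bytes = n * sizeof(float);
    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, bytes);
    cudaMalloc(&dOut, bytes);
    cudaMemcpy(dIn, hostIn.data(), bytes, cudaMemcpyHostToDevice);
    scaleKernel<<<(n + 63) / 64, 64>>>(dIn, dOut, n);  // 64 threads = 2 warps per block
    std::vector<float> hostOut(n);
    // cudaMemcpy waits for the kernel to finish before copying
    cudaMemcpy(hostOut.data(), dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return hostOut;
}
```

Calling runScale with 64 inputs of 1.0f launches one block of two warps; at the breakpoint, each of the 32 rows of the focus warp should show value equal to 2 + lane.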


Example Scenarios

Example: Diverged Warp Watch

In this scenario, some lanes are at a different PC (program counter) than the focus lane. Lanes that have diverged from the focus lane have a gray background.

Note that this is NOT the same as inactive lanes. If you change the focus to an inactive lane, the other lanes will be shown as diverged.

Example 1. An example of a successful evaluation, diverged at lane 16. In this figure, the focus is lane 16.

Example 2. Changing the focus shows the other lanes as diverged from it. (This is represented by swapping the white and gray backgrounds.) In this instance, the focus was changed to lane 0, so lanes 16-31 are now gray; the PC is at line 54.
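A layout like the one in Examples 1 and 2 only requires the two halves of each warp to take different branches. This is a minimal sketch with hypothetical names (divergeKernel, runDiverge), not the program used for the screenshots; building with nvcc -G keeps the branches, since with optimization on the compiler may turn a branch this short into predicated code:

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel that splits every warp at lane 16.
// Lanes 0-15 and lanes 16-31 execute different branches, so while the
// warp is stopped, one half is at a different PC than the other.
__global__ void divergeKernel(int* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    int result;
    if (lane < 16) {
        result = lane * 10;   // breakpoint A: reached by lanes 0-15
    } else {
        result = -lane;       // breakpoint B: reached by lanes 16-31
    }
    out[i] = result;
}

// Hypothetical host helper: one block of `threads` threads.
std::vector<int> runDiverge(int threads)
{
    int* dOut = nullptr;
    cudaMalloc(&dOut, threads * sizeof(int));
    divergeKernel<<<1, threads>>>(dOut);
    std::vector<int> hostOut(threads);
    cudaMemcpy(hostOut.data(), dOut, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hostOut;
}
```

Stopping at breakpoint B with focus on lane 16 would correspond to Example 1; switching the focus to lane 0 should then show lanes 16-31 as diverged, as in Example 2.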

Example 3. Here, you can see that it is possible to have a variable that is valid in some lanes, but not in others.
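A variable declared inside a branch is one simple way to get this. In the sketch below (hypothetical names scopeKernel and runScope), evenOnly exists only in the block taken by even lanes, so a watch on evenOnly would be expected to evaluate in lanes stopped inside that block and fail in the odd lanes, which are at a PC where it is out of scope:

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: evenOnly is only in scope for even lanes.
__global__ void scopeKernel(int* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    int acc;
    if (lane % 2 == 0) {
        int evenOnly = lane / 2;   // declared only inside this block
        acc = evenOnly + 100;      // breakpoint: watch "evenOnly" with focus on an even lane
    } else {
        acc = lane;                // odd lanes: evenOnly is out of scope here
    }
    out[i] = acc;
}

// Hypothetical host helper: one block of `threads` threads.
std::vector<int> runScope(int threads)
{
    int* dOut = nullptr;
    cudaMalloc(&dOut, threads * sizeof(int));
    scopeKernel<<<1, threads>>>(dOut);
    std::vector<int> hostOut(threads);
    cudaMemcpy(hostOut.data(), dOut, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hostOut;
}
```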

Example: Error Types

Errors can occur in the Warp Watch window for various reasons. For example:

  • A lane may be at a different PC, so the evaluation scope of a given expression can differ from that of the focus lane.
  • A shadowed variable can have a different type in another lane than it has in the focus lane.

Example 4. This illustrates a shadow variable error. Here, the focus variable's type is float, but it is shadowing an int.
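Code of the following shape produces that kind of shadowing. In this sketch (hypothetical names shadowKernel and runShadow), lanes 0-15 stop inside the inner block where x is a float, while lanes 16-31 only see the outer int x, so a watch on x evaluated with the focus lane's float type would not match in the upper lanes:

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// Hypothetical kernel: the inner float x shadows the outer int x.
__global__ void shadowKernel(float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    int x = lane;                        // outer x has type int
    if (lane < 16) {
        float x = lane * 0.5f;           // inner x has type float, shadows the int
        out[i] = x;                      // breakpoint: focus here and watch "x"
    } else {
        out[i] = static_cast<float>(x);  // lanes 16-31 only see the int x
    }
}

// Hypothetical host helper: one block of `threads` threads.
std::vector<float> runShadow(int threads)
{
    float* dOut = nullptr;
    cudaMalloc(&dOut, threads * sizeof(float));
    shadowKernel<<<1, threads>>>(dOut);
    std::vector<float> hostOut(threads);
    cudaMemcpy(hostOut.data(), dOut, threads * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hostOut;
}
```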


Another common cause of errors is a lane that has diverged from the focus lane and is in a different stack frame. The CUDA Warp Watch feature does not evaluate expressions in other stack frames.

Example 5. This illustrates a stack frame error. Here, the frame "SubFrame" does not exist in the even lanes, so the expression cannot be evaluated there.
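A kernel of roughly this shape leads to the situation in Example 5. Only odd lanes call the device function SubFrame (a hypothetical stand-in for the one in the screenshot, marked __noinline__ so it stays a real call with its own frame), so with the focus inside SubFrame, the even lanes have no such frame and a watch on its local variable cannot be evaluated for them:

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <vector>

// __noinline__ asks the compiler to keep SubFrame as a real call,
// so lanes executing it have an extra stack frame.
__device__ __noinline__ int SubFrame(int lane)
{
    int local = lane * 3;
    return local + 1;          // breakpoint: focus on an odd lane, watch "local"
}

// Hypothetical kernel: only odd lanes enter SubFrame.
__global__ void frameKernel(int* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    int r;
    if (lane % 2 == 1) {
        r = SubFrame(lane);    // odd lanes: extra SubFrame frame
    } else {
        r = lane;              // even lanes: stay in frameKernel's frame
    }
    out[i] = r;
}

// Hypothetical host helper: one block of `threads` threads.
std::vector<int> runFrames(int threads)
{
    int* dOut = nullptr;
    cudaMalloc(&dOut, threads * sizeof(int));
    frameKernel<<<1, threads>>>(dOut);
    std::vector<int> hostOut(threads);
    cudaMemcpy(hostOut.data(), dOut, threads * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dOut);
    return hostOut;
}
```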


Warp Exceptions

The CUDA Memory Checker shows its results on the Warps page of the CUDA Info window. Select the Warp Exceptions bookmark to filter the page so that it shows only warps that are currently stopped at an exception. These exceptions match the output in the Output window.

When an exception is detected, the CUDA Debugger stops in the first CUDA thread that triggered it. Use the Set Focus command on the Lanes and Warps pages to switch to other threads.


Figure 1. A misaligned load in Global memory. The exception is being hit on the first 16 lanes in 5 warps.


Figure 2. The focus lane shows that 16 of the lanes have hit the exception.
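The fault in Figure 1 can be approximated with a deliberately misaligned load. This sketch (hypothetical names misalignedKernel and runMisaligned, not the program behind the figures) launches 5 warps in which lanes 0-15 read a float from an address that is not 4-byte aligned. Under the debugger, execution is expected to stop at the exception; without a debugger attached, the fault is reported as cudaErrorMisalignedAddress by the next synchronizing call:

```cuda
#include <cuda_runtime.h>
#include <cassert>

// Hypothetical kernel: lanes 0-15 of every warp perform a misaligned load.
__global__ void misalignedKernel(const char* bytes, float* out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int lane = threadIdx.x % 32;
    float v = 0.0f;
    if (lane < 16) {
        // bytes + 1 is not a multiple of 4, so this 4-byte load faults
        v = *reinterpret_cast<const float*>(bytes + 1);
    }
    out[i] = v;   // store the result so the load is not optimized away
}

// Hypothetical host helper: launches 5 warps (160 threads) and
// returns the error reported when the kernel completes.
cudaError_t runMisaligned()
{
    char* dBytes = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dBytes, 64);
    cudaMalloc(&dOut, 160 * sizeof(float));
    misalignedKernel<<<1, 160>>>(dBytes, dOut);
    cudaError_t err = cudaDeviceSynchronize();
    cudaFree(dBytes);
    cudaFree(dOut);
    return err;
}
```

This is a sticky error: after it occurs, the CUDA context of the process can no longer be used, so run the sketch in its own process.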


http://www.chinasem.cn/article/438343