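I recently wanted to debug some CUDA kernel code, but for some reason cuda-gdb would never step into the kernel. I asked the experts around me, and they said they had never used cuda-gdb to debug a program either. They told me the simplest approach is to print intermediate variables with printf. That puzzled me: functions marked __device__ are not supposed to call host functions, so how can printf be called from one?
I tried adding printf() to the kernel code and compiled: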
map.cu(28): error: calling a host function("printf") from a __device__/__global__ function("map_count") is not allowed
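Why the error? I went back to searching online, and some people suggested using the cuPrintf() function instead, so I changed every printf to cuPrintf. The build still failed. What now?
Finally, I came across the following question on Stack Overflow: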
I am writing a cuda program and trying to print something inside the cuda kernels using the printf function. But when I am compiling the program then I am getting an error
I am using the card GTX 560 ti having a compute capability greater than 2.0 and when I have searched a bit about the printing from cuda kernels I also saw that I need to change the compiler from sm_10 to sm_2.0 to take the full advantage of the card. Also some suggested for cuPrintf to serve the purpose. I am bit confused what should I do and what should be the simplest and quickest way to get the printouts on my console screen. If I need to change the nvcc compiler from 1.0 to 2.0 then what should I do? One more thing I would like to mention that I am using windows 7.0 and programming in visual studio 2010. Thanks for all your help.
This issue comes down to CUDA compute capability: printf inside a kernel is supported only on compute capability 2.0 and above (I still do not know why printf can be called there).
So I took a careful look at my makefile and modified it: I commented out some of the options and kept only -gencode=arch=compute_20,code=\"sm_20,compute_20\" (that is, the option that compiles for compute capability 2.0).
Finally I ran make dbg=1, and the build succeeded.
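As a rough illustration of the fix, here is a minimal sketch of a kernel that calls printf. The kernel name print_thread_ids and the file name demo.cu are made up for this example and have nothing to do with the map_count kernel in map.cu. As far as I understand it, the printf available in device code is a separate device-side implementation supplied by CUDA for compute capability 2.0 and above, not the host C library function, which would explain why the compiler accepts it once the target architecture is high enough. Its output is buffered on the device, so the sketch synchronizes before exiting to make sure the text reaches the console.

```cuda
// demo.cu (hypothetical file name)
// Possible build command, reusing the flag from the makefile above:
//   nvcc -gencode=arch=compute_20,code=\"sm_20,compute_20\" demo.cu -o demo
// Assumption: the toolkit still supports sm_20. Newer CUDA toolkits have
// dropped it, in which case substitute an architecture your GPU supports.
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only.
__global__ void print_thread_ids()
{
    // Device-side printf: rejected when compiling for compute capability 1.x.
    printf("block %u, thread %u\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // Launch 2 blocks of 4 threads; each thread prints one line.
    print_thread_ids<<<2, 4>>>();

    // Kernel printf output is buffered on the device; synchronizing
    // with the device flushes the buffer to stdout.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

The order in which the lines appear is not guaranteed, since the threads run concurrently.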