Non-OK-status: CudaLaunchKernel(FillPhiloxRandomKernelLaunch<Distribution>, num_blocks, block_size,

A note on an error I ran into
2020-12-07 21:43:46.522997: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla V100-SXM2-32GB, Compute Capability 7.0
2020-12-07 21:43:46.575889: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 25.44M (26673152 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-12-07 21:43:46.579195: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 22.89M (24005888 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-12-07 21:43:46.580910: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 20.60M (21605376 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-12-07 21:43:46.582638: E tensorflow/stream_executor/cuda/cuda_driver.cc:828] failed to allocate 18.54M (19444992 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY: out of memory
2020-12-07 21:43:46.639245: F ./tensorflow/core/kernels/random_op_gpu.h:227] Non-OK-status: CudaLaunchKernel(FillPhiloxRandomKernelLaunch, num_blocks, block_size, 0, d.stream(), gen, data, size, dist) status: Internal: out of memory
Aborted (core dumped)
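I hit this error while training a binary classification model with Keras: training failed with an OOM (out-of-memory) error.
The model was trained on Linux, pinned to a specific GPU: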
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   # number GPUs in PCI bus order
os.environ["CUDA_VISIBLE_DEVICES"] = "5"         # expose only GPU 5 to this process
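These two variables only take effect if they are set before TensorFlow initializes CUDA, so they usually go at the very top of the script. A minimal sketch, assuming a hypothetical helper named `pin_gpu` (not from the original post), that also warns when TensorFlow has already been imported:

```python
import os
import sys
import warnings


def pin_gpu(index):
    """Restrict this process to a single GPU, numbered in PCI bus order.

    Hypothetical helper: call it before TensorFlow touches the GPU,
    ideally before `import tensorflow`.
    """
    if "tensorflow" in sys.modules:
        # TensorFlow may already have initialized CUDA, in which case
        # the new value can be ignored; warn instead of failing silently.
        warnings.warn("tensorflow was imported before pin_gpu(); "
                      "CUDA_VISIBLE_DEVICES may have no effect")
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)


pin_gpu(5)  # same GPU index as in the snippet above
```

Inside the process, the selected card then appears as GPU 0, because CUDA renumbers the visible devices starting from zero.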
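The model itself is simple: convolution + pooling + fully connected layers.
The same model had trained without problems some time ago. Today I reloaded it to train on a different dataset (a small one, about 200 MB) and hit the OOM error.
1. Switching to an idle GPU gave the same error.
2. I suspected the dataset, but even after cutting it down to 10 samples the error persisted.
Searching online, I found the suggestion to limit how much GPU memory TensorFlow takes by default. (This is what fixed it.)
I didn't try that at first: the model and everything else were unchanged and only the dataset was different, so I assumed the cause was something else.
The fix: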
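import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "5"
import tensorflow as tf  # TensorFlow 1.x API; in 2.x these classes live under tf.compat.v1
# Reserve at most 83% of the GPU's memory instead of the default (nearly all of it)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.83)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))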
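The snippet above uses the TensorFlow 1.x session API. In TensorFlow 2.x, a comparable limit is set through `tf.config` before any op runs on the GPU. A minimal sketch, assuming a recent 2.x release; `limit_gpu_memory` is a hypothetical helper, and its memory-growth branch is an alternative technique, not the one this post used:

```python
def limit_gpu_memory(memory_limit_mb=None):
    """Cap, or grow on demand, TensorFlow's GPU memory use (TF 2.x style).

    Hypothetical helper. Must be called before any op runs on the GPU;
    TensorFlow raises RuntimeError once the devices are initialized.
    """
    # Imported here so CUDA_VISIBLE_DEVICES can be set before TensorFlow loads.
    import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        if memory_limit_mb is None:
            # Allocate memory incrementally instead of reserving it up front.
            tf.config.experimental.set_memory_growth(gpu, True)
        else:
            # Expose one logical device with a fixed memory cap, in MB.
            tf.config.set_logical_device_configuration(
                gpu,
                [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)],
            )
    return gpus
```

For example, `limit_gpu_memory()` enables on-demand growth, while `limit_gpu_memory(8192)` caps each visible GPU at about 8 GB. Either call belongs right after setting CUDA_VISIBLE_DEVICES and before the Keras model is built.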

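Cause: the amount of GPU memory TensorFlow tries to reserve by default exceeded what the current GPU could provide. In other words, the default reservation was too large, so I had to cap it manually at 0.83 (an arbitrary value; a somewhat smaller one also lets the model train).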
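The fraction is relative to the card's total memory, so 0.83 on the 32 GB V100 from the log asks for roughly 83% of it. Instead of picking a value by hand, one option is to derive it from the free memory that nvidia-smi reports. A sketch, assuming nvidia-smi is on PATH and a 5% safety margin (both assumptions, as are the helper names):

```python
import subprocess


def parse_memory_line(line):
    """Parse 'free, total' in MiB, as printed with --format=csv,noheader,nounits."""
    free_mib, total_mib = (int(v) for v in line.split(","))
    return free_mib, total_mib


def safe_fraction(free_mib, total_mib, margin=0.05):
    """Fraction of total memory to request, leaving `margin` of the card unused."""
    fraction = free_mib / total_mib - margin
    # Clamp into [0, 1] so the value is always a valid fraction.
    return max(0.0, min(1.0, fraction))


def query_gpu_memory(index):
    """Ask nvidia-smi for free and total memory (MiB) of one physical GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--id=" + str(index),
         "--query-gpu=memory.free,memory.total",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_memory_line(out.strip().splitlines()[0])
```

The result of `safe_fraction(*query_gpu_memory(5))` would then be passed as `per_process_gpu_memory_fraction`. nvidia-smi numbers GPUs by physical index, which should line up with CUDA_DEVICE_ORDER=PCI_BUS_ID, not with the renumbered indices seen inside the process.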

Original post: https://blog.csdn.net/weixin_45728409/article/details/110845683