2022-04-27 17:16:35.834265: I tensorflow/core/common_runtime/bfc_allocator.cc:1074] total_region_allocated_bytes_: 22727688192 memory_limit_: 22727688192 available bytes: 0 curr_region_allocation_bytes_: 45455376384
2022-04-27 17:16:35.834667: I tensorflow/core/common_runtime/bfc_allocator.cc:1080] Stats:
Limit: 22727688192
InUse: 22719993344
MaxInUse: 22727555072
NumAllocs: 1984779
MaxAllocSize: 3804246016
Reserved: 0
PeakReserved: 0
LargestFreeBlock: 0
2022-04-27 17:16:35.835488: W tensorflow/core/common_runtime/bfc_allocator.cc:468] ****************************************************************************************************
2022-04-27 17:16:35.835746: W tensorflow/core/framework/op_kernel.cc:1680] Resource exhausted: failed to allocate memory
Traceback (most recent call last):
File "D:/python-workspace/Mask_RCNN-master16/train333.py", line 320, in <module>
augmentation=augment_seq)
File "D:\python-workspace\Mask_RCNN-master16\mrcnn\model.py", line 2376, in train
use_multiprocessing=False
File "D:\Anaconda3\envs\py36_maskrcnn_env_bak\lib\site-packages\keras\engine\training_v1.py", line 796, in fit
use_multiprocessing=use_multiprocessing)
File "D:\Anaconda3\envs\py36_maskrcnn_env_bak\lib\site-packages\keras\engine\training_generator_v1.py", line 586, in fit
steps_name='steps_per_epoch')
File "D:\Anaconda3\envs\py36_maskrcnn_env_bak\lib\site-packages\keras\engine\training_generator_v1.py", line 252, in model_iteration
batch_outs = batch_function(*batch_data)
File "D:\Anaconda3\envs\py36_maskrcnn_env_bak\lib\site-packages\keras\engine\training_v1.py", line 1076, in train_on_batch
outputs = self.train_function(ins) # pylint: disable=not-callable
File "D:\Anaconda3\envs\py36_maskrcnn_env_bak\lib\site-packages\keras\backend.py", line 4032, in __call__
run_metadata=self.run_metadata)
File "D:\Anaconda3\envs\py36_maskrcnn_env_bak\lib\site-packages\tensorflow\python\client\session.py", line 1480, in __call__
run_metadata_ptr)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
(0) Resource exhausted: OOM when allocating tensor with shape[8,128,128,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node fpn_p4upsampled/resize/ResizeNearestNeighbor}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
[[Func/training_4/SGD/gradients/gradients/mrcnn_bbox_fc_dropout/dropout_1/cond_grad/StatelessIf/then/_414/input/_955/_7447]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
(1) Resource exhausted: OOM when allocating tensor with shape[8,128,128,256] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node fpn_p4upsampled/resize/ResizeNearestNeighbor}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.
0 successful operations.
0 derived errors ignored.
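Process finished with exit code 1
While running train.py for Mask R-CNN on a Windows 10 server with an RTX 3090 GPU, training aborted with an OOM (out of memory) error and the program exited.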
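To make the numbers in the log easier to read, the short sketch below compares the memory still free under the allocator limit with the size of the tensor that failed to allocate. All values are copied from the log above; the only assumption is that "type float" means 4-byte float32.

```python
# Values copied from the allocator stats and the OOM message in the log.
limit_bytes = 22727688192        # "Limit"
in_use_bytes = 22719993344       # "InUse"
oom_shape = (8, 128, 128, 256)   # shape of the tensor that could not be allocated
bytes_per_float32 = 4            # assumption: "type float" is 32-bit

# Memory that is nominally still free under the allocator limit.
free_bytes = limit_bytes - in_use_bytes

# Size of the requested tensor: product of the dimensions times the element size.
tensor_bytes = bytes_per_float32
for dim in oom_shape:
    tensor_bytes *= dim

MIB = 1024 ** 2
print(f"free under the limit: {free_bytes / MIB:.1f} MiB")
print(f"requested tensor:     {tensor_bytes / MIB:.1f} MiB")
```

The request is for 128 MiB while only about 7.3 MiB remain under the limit, and LargestFreeBlock is 0, so the GPU was effectively full before this tensor was requested.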
Solution:
Note that batch_size is IMAGES_PER_GPU.
1. Reduce the following three parameters:
IMAGES_PER_GPU = 8
IMAGE_MIN_DIM = 400
IMAGE_MAX_DIM = 512
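As a rough illustration of why these parameters matter, the sketch below estimates the size of a single feature map like the one named in the OOM message. The function name, the stride of 8, and the 256 channels are assumptions chosen to match the shape [8,128,128,256] in the log, which would be consistent with 8 images of 1024 x 1024 pixels. It covers one tensor only, not the whole model.

```python
def feature_map_bytes(images_per_gpu, image_dim, stride=8, channels=256, bytes_per_value=4):
    """Estimate the bytes of one float32 feature map (hypothetical helper).

    stride and channels are assumptions picked to reproduce the logged shape.
    """
    side = image_dim // stride  # spatial size of the feature map
    return images_per_gpu * side * side * channels * bytes_per_value


MIB = 1024 ** 2
for images_per_gpu, image_dim in [(8, 1024), (8, 512), (2, 512)]:
    size = feature_map_bytes(images_per_gpu, image_dim)
    print(f"IMAGES_PER_GPU={images_per_gpu}, image size={image_dim}: {size / MIB:.0f} MiB")
```

The estimate scales linearly with IMAGES_PER_GPU and quadratically with the image side, so halving the image size saves more than halving the batch.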
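2. Configure how TensorFlow uses GPU memory:
(1) TensorFlow 1.x
① Limit the process to 90% of GPU memory
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.9  # fraction of each visible GPU's memory
session = tf.Session(config=config)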
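② Start with a minimal allocation and grow GPU memory on demand
import tensorflow as tf
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed instead of all at once
session = tf.Session(config=config)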
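The traceback above passes through keras\engine\training_v1.py, which suggests a TensorFlow 2.x install running in TF1 compatibility mode. In that kind of setup the TF1 names live under tf.compat.v1, and a session created this way only affects training if Keras is told to use it. A minimal sketch, assuming the Keras in use is the one bundled with TensorFlow:

```python
import tensorflow as tf

# Same option as above, written against the TF1 compatibility API.
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True  # grow GPU memory on demand

session = tf.compat.v1.Session(config=config)

# Register the session so Keras uses it instead of creating its own.
# Assumption: the model is built with the Keras that ships with TensorFlow.
tf.compat.v1.keras.backend.set_session(session)
```

This should run before the model is built, since the GPU options are fixed once the first session touches the device.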
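(2) TensorFlow 2.x
① Limit GPU usage to roughly 90% of GPU memory
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
for gpu in gpus:
    # tf.config has no memory-fraction setting; cap in MB instead (22118 = 0.9 * 24576, assuming a 24 GB card)
    tf.config.experimental.set_virtual_device_configuration(
        gpu, [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=22118)])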
② Start with a minimal allocation and grow GPU memory on demand
import tensorflow as tf
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
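In TensorFlow 2.x these settings have to be applied before the GPUs are initialized, otherwise TensorFlow raises a RuntimeError. The sketch below wraps both options in one helper. The function name configure_gpus and its memory_limit_mb parameter are hypothetical, and it assumes a TensorFlow version recent enough to provide tf.config.set_logical_device_configuration.

```python
import tensorflow as tf


def configure_gpus(memory_limit_mb=None):
    """Apply one memory policy to every visible GPU (hypothetical helper).

    memory_limit_mb=None enables on-demand growth; an integer sets a hard cap in MB.
    Returns the number of GPUs found.
    """
    gpus = tf.config.list_physical_devices('GPU')
    for gpu in gpus:
        try:
            if memory_limit_mb is None:
                tf.config.experimental.set_memory_growth(gpu, True)
            else:
                tf.config.set_logical_device_configuration(
                    gpu,
                    [tf.config.LogicalDeviceConfiguration(memory_limit=memory_limit_mb)])
        except RuntimeError as err:
            # Raised when the GPUs were already initialized before this call.
            print(f"Could not configure {gpu.name}: {err}")
    return len(gpus)


# Call once at the top of the training script, before building any model.
num_gpus = configure_gpus()
print(f"configured {num_gpus} GPU(s)")
```

Neither option reduces what the model itself needs. They only change how TensorFlow reserves memory, so lowering IMAGES_PER_GPU or the image size remains the main fix when a single training step does not fit.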