High-Quality Audio Mixing Algorithms and Their Application

I. Audio data format:

The input is 16-bit PCM audio data.

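II. How audio mixing works:

The principle of mixing is simple: add the samples of the individual voice streams together linearly. The sum, however, easily overflows, and the more streams are mixed, the more likely overflow becomes. Overflow therefore has to be handled so that every sample stays within -32768 to 32767. How well the overflow after summation is handled is really the heart of a mixing algorithm.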
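To make the overflow concrete, here is a minimal sketch that sums one sample from each stream in a 32-bit accumulator and checks whether the result still fits in 16 bits. The names `sum_samples` and `fits_in_int16` are illustrative and do not come from the original project.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sum one sample from each of `channels` streams. The accumulator is
   32 bits wide, so the intermediate sum itself cannot wrap around. */
int32_t sum_samples(const int16_t *samples, int channels)
{
    assert(samples != NULL && channels > 0);
    int32_t sum = 0;
    for (int i = 0; i < channels; i++)
        sum += samples[i];
    return sum;
}

/* Returns 1 when the value can be stored as a 16-bit PCM sample. */
int fits_in_int16(int32_t value)
{
    return value >= INT16_MIN && value <= INT16_MAX;
}
```

Three loud samples such as 20000, 15000 and 10000 sum to 45000, which no longer fits in 16 bits, while quiet samples pass through unchanged.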

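1. Attenuation factor:

To handle overflow better, an attenuation factor can be applied to the audio data and adjusted gradually as the data changes. When overflow occurs, the factor is reduced so that the overflowing samples fall back within the limits. When there is no overflow, the factor slowly increases again, keeping the level changes as smooth as possible. This is different from applying a single attenuation factor to a whole frame. After adjustment with an attenuation factor, the mixed audio has almost no audible background noise, sounds comfortable, and rarely produces popping artifacts, so this is the approach I recommend.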
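The following is a minimal sketch of one common way to implement an adaptive attenuation factor; it is not necessarily how the project's `uniqueMixVoice` works. The type `mix_state`, the function `mix_attenuated`, and the constant `RECOVERY_STEPS` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed tuning value: a larger number makes the factor recover more slowly. */
#define RECOVERY_STEPS 32.0

/* Gain state carried across frames so the factor changes smoothly. */
typedef struct {
    double factor; /* current attenuation factor, in (0, 1]; start at 1.0 */
} mix_state;

void mix_attenuated(mix_state *st, const int16_t *const *in, int channels,
                    int16_t *out, int frame_len)
{
    assert(st && in && out && channels > 0 && frame_len >= 0);
    for (int i = 0; i < frame_len; i++) {
        int32_t sum = 0;
        for (int c = 0; c < channels; c++)
            sum += in[c][i];

        double scaled = sum * st->factor;
        if (scaled > INT16_MAX) {
            /* Overflow: shrink the factor so this sum lands on the limit. */
            st->factor = (double)INT16_MAX / sum;
            scaled = INT16_MAX;
        } else if (scaled < INT16_MIN) {
            st->factor = (double)INT16_MIN / sum;
            scaled = INT16_MIN;
        }
        out[i] = (int16_t)scaled;

        /* No sudden jumps: let the factor creep back towards 1. */
        if (st->factor < 1.0)
            st->factor += (1.0 - st->factor) / RECOVERY_STEPS;
    }
}
```

Because `mix_state` outlives a single call, the same state should be passed for every frame of a session; resetting it per frame would bring back the abrupt level changes the method is meant to avoid.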

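2. Averaging:

Sum the streams and then take the average. The average of 16-bit samples always fits in 16 bits, so overflow is not a problem, but the attenuation is too strong: every stream gets quieter as more streams are added, which hurts call quality, so plain averaging is generally not used. The idea can be extended, though, by weighting each signal in the mix according to its own characteristics. Background music can be given a lower weight and a key speaker a higher one, so that the mix suits the application scenario better.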
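A weighted average can be sketched as follows. The function name `mix_weighted` and the idea of passing one non-negative weight per stream are assumptions for illustration; with equal weights the function reduces to plain averaging.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted average of `channels` streams. The weights are normalised by
   their total, so the result always stays within the 16-bit range. */
void mix_weighted(const int16_t *const *in, const double *weight,
                  int channels, int16_t *out, int frame_len)
{
    assert(in && weight && out && channels > 0 && frame_len >= 0);

    double total = 0.0;
    for (int c = 0; c < channels; c++) {
        assert(weight[c] >= 0.0);
        total += weight[c];
    }
    assert(total > 0.0);

    for (int i = 0; i < frame_len; i++) {
        double acc = 0.0;
        for (int c = 0; c < channels; c++)
            acc += weight[c] * in[c][i];
        out[i] = (int16_t)(acc / total); /* truncates towards zero */
    }
}
```

For example, weights of 3 for a speaker and 1 for background music keep the speech at three quarters of its level while the music contributes one quarter.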

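3. Clipping to the limits:

Add the streams linearly and check the sum for overflow; if a sample overflows, replace it with the maximum or minimum value. This artificially clips the peaks of the waveform and damages the characteristics of the speech signal.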
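Hard clipping is the simplest of these methods to write down. A minimal sketch, with the hypothetical names `clip_to_int16` and `mix_clipped`:

```c
#include <assert.h>
#include <stdint.h>

/* Replace an out-of-range sum with the nearest 16-bit limit. */
int16_t clip_to_int16(int32_t value)
{
    if (value > INT16_MAX)
        return INT16_MAX;
    if (value < INT16_MIN)
        return INT16_MIN;
    return (int16_t)value;
}

/* Linear sum of all streams followed by hard clipping. */
void mix_clipped(const int16_t *const *in, int channels,
                 int16_t *out, int frame_len)
{
    assert(in && out && channels > 0 && frame_len >= 0);
    for (int i = 0; i < frame_len; i++) {
        int32_t sum = 0;
        for (int c = 0; c < channels; c++)
            sum += in[c][i];
        out[i] = clip_to_int16(sum);
    }
}
```

Every sample that hits a limit becomes a flat top in the waveform, which is the clipping distortion described above.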

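4. The newlc algorithm:

Algorithm prototype, where A and B are the two input samples, Y is the mixed sample, and n is the bit depth (16 here). As this formula is usually quoted, the first form applies when A and B are both negative and the second in all other cases:
Y = A + B - (A * B / (-(2^(n-1) - 1)))
Y = A + B - (A * B / (2^(n-1)))

This algorithm is also quite popular at the moment and is worth trying.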
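The two newlc expressions can be sketched for 16-bit samples as below. The text gives only the two expressions; choosing the first when both samples are negative and the second otherwise is an assumption based on how the formula is commonly quoted. The function name `mix_pair_newlc` and the final range check are also additions for illustration, since the second expression can exceed 32767 by one when both inputs are at full scale.

```c
#include <assert.h>
#include <stdint.h>

#define SAMPLE_BITS 16 /* n in the formula */

/* Mix two 16-bit samples with the newlc expressions. */
int16_t mix_pair_newlc(int16_t a, int16_t b)
{
    const int32_t half = (int32_t)1 << (SAMPLE_BITS - 1); /* 2^(n-1) = 32768 */
    int32_t A = a, B = b; /* widen first: A * B needs up to 31 bits */
    int32_t y;

    if (A < 0 && B < 0)   /* assumed case split, see the note above */
        y = A + B - (A * B / (-(half - 1)));
    else
        y = A + B - (A * B / half);

    /* Safety net added for this sketch, not part of the quoted formula. */
    if (y > INT16_MAX)
        y = INT16_MAX;
    if (y < INT16_MIN)
        y = INT16_MIN;
    assert(y >= INT16_MIN && y <= INT16_MAX);
    return (int16_t)y;
}
```

To mix more than two streams with this function, one option is to fold them in pairwise, feeding the result of one call in as an input to the next.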

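III. Removing the local stream

When mixing, each participant's own audio usually has to be removed, so that participants do not hear their own voice and only hear the other n − 1 streams. Mixing usually introduces some noise, so a noise-reduction pass afterwards generally gives a better result.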
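One way to produce a separate mix for every participant is to compute the total once and subtract each participant's own sample from it. The sketch below assumes simple hard clipping for the overflow step and uses the hypothetical name `mix_without_self`; any of the overflow strategies from section II could be substituted.

```c
#include <assert.h>
#include <stdint.h>

/* For each participant c, out[c] receives the sum of all streams except
   in[c], limited to the 16-bit range. */
void mix_without_self(const int16_t *const *in, int channels,
                      int16_t *const *out, int frame_len)
{
    assert(in && out && channels > 0 && frame_len >= 0);
    for (int i = 0; i < frame_len; i++) {
        int32_t total = 0;
        for (int c = 0; c < channels; c++)
            total += in[c][i];

        for (int c = 0; c < channels; c++) {
            int32_t others = total - in[c][i]; /* the other n - 1 streams */
            if (others > INT16_MAX)
                others = INT16_MAX;
            else if (others < INT16_MIN)
                others = INT16_MIN;
            out[c][i] = (int16_t)others;
        }
    }
}
```

Computing the total once keeps the cost per sample proportional to the number of streams, rather than summing n − 1 streams again for every participant.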

Below is a test program I wrote, showing part of the test code:

#include <stdio.h>

/* DEFAULT_SAMPLE_FRAME_LEN and uniqueMixVoice() come from the mixing library
   in the project download; include its header here (not shown in this post). */

int main(int argc, char* argv[])
{
	FILE *input1_fd = NULL, *input2_fd = NULL, *input3_fd = NULL, *output_fd = NULL;
	short shSpeechOut[DEFAULT_SAMPLE_FRAME_LEN];
	short shSpeechIn1[DEFAULT_SAMPLE_FRAME_LEN];
	short shSpeechIn2[DEFAULT_SAMPLE_FRAME_LEN];
	short shSpeechIn3[DEFAULT_SAMPLE_FRAME_LEN];

	short* pshVoiceOut = shSpeechOut;
	/* The input buffers never move, so the pointer table is set up once. */
	short* pshVoiceIn[3] = { shSpeechIn1, shSpeechIn2, shSpeechIn3 };

	int iRet = -1;

	/* Arguments: input file 1, input file 2, input file 3, mixed output file */
	if (argc < 5)
	{
		printf("Error:argc=%d\n", argc);
		printf("usage: inputfile1 inputfile2 inputfile3 mixfile\n");
		return (-1);
	}

	input1_fd = fopen(argv[1], "rb");
	if (input1_fd == NULL)
	{
		printf("Error:can not open audio input1 file %s.\n", argv[1]);
		goto cleanup;
	}

	input2_fd = fopen(argv[2], "rb");
	if (input2_fd == NULL)
	{
		printf("Error:can not open audio input2 file %s.\n", argv[2]);
		goto cleanup;
	}

	input3_fd = fopen(argv[3], "rb");
	if (input3_fd == NULL)
	{
		printf("Error:can not open audio input3 file %s.\n", argv[3]);
		goto cleanup;
	}

	output_fd = fopen(argv[4], "wb");
	if (output_fd == NULL)
	{
		printf("Error:can not open audio output file %s.\n", argv[4]);
		goto cleanup;
	}

	/* Mix frame by frame; stop as soon as any input cannot supply a full frame. */
	for (;;)
	{
		if (fread(shSpeechIn1, sizeof(short), DEFAULT_SAMPLE_FRAME_LEN, input1_fd) != DEFAULT_SAMPLE_FRAME_LEN
			|| fread(shSpeechIn2, sizeof(short), DEFAULT_SAMPLE_FRAME_LEN, input2_fd) != DEFAULT_SAMPLE_FRAME_LEN
			|| fread(shSpeechIn3, sizeof(short), DEFAULT_SAMPLE_FRAME_LEN, input3_fd) != DEFAULT_SAMPLE_FRAME_LEN)
		{
			printf("file over.\n");
			break;
		}

		uniqueMixVoice(pshVoiceIn, pshVoiceOut, DEFAULT_SAMPLE_FRAME_LEN, 3);

		if (fwrite(shSpeechOut, sizeof(short), DEFAULT_SAMPLE_FRAME_LEN, output_fd) != DEFAULT_SAMPLE_FRAME_LEN)
		{
			printf("Error:can not write audio output file %s.\n", argv[4]);
			goto cleanup;
		}
	}

	iRet = 0;

cleanup:
	/* Close whatever was opened, on both the success and the error paths. */
	if (input1_fd != NULL) fclose(input1_fd);
	if (input2_fd != NULL) fclose(input2_fd);
	if (input3_fd != NULL) fclose(input3_fd);
	if (output_fd != NULL) fclose(output_fd);

	return iRet;
}

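Project download: https://download.csdn.net/download/unique_no1/84992369

It contains the test program and executables. I have already built them for both Windows and Linux, so you can try them and hear the result; the interface is also simple to call.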

Contact:

WeChat: unique_no_1

-----------------------------------------------------------------------------------------------------------------------------------------

Original article: https://blog.csdn.net/unique_no1/article/details/123520817