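Hacking Magenta, Part 1: Converting Raw Data

Introduction

This article walks through how the Magenta project consolidates raw data and introduces the functions that read MIDI and MusicXML files. As we will see, Magenta converts the different source formats into a single unified format, close to MusicXML, and stores everything in that format.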

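Magenta contains many automatic composition models, and each expects its input in a different format. Raw data (MIDI, MusicXML, and so on) is first converted into NoteSequence, a format based on protocol buffers; the NoteSequence is then converted into whatever input the particular model requires.

Magenta accepts MIDI (.mid/.midi), MusicXML (.xml/.mxl), and ABC (http://abcnotation.com; not tested here) files as raw training data.

The script convert_dir_to_note_sequences.py converts these raw files into NoteSequences and stores them in TFRecord format.

Next we look at how convert_dir_to_note_sequences.py converts MIDI/MusicXML files into NoteSequences.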

Magenta version: 1.1.1

Hack 1.0: Passing parameters on the command line:

The Magenta GitHub repository describes how to convert raw data into NoteSequence protocol buffers from the command line:
https://github.com/tensorflow/magenta/tree/master/magenta/scripts#building-your-dataset

The Linux commands given at that link are:

INPUT_DIRECTORY=<folder containing MIDI and/or MusicXML files. can have child folders.>

# TFRecord file that will contain NoteSequence protocol buffers.
SEQUENCES_TFRECORD=/tmp/notesequences.tfrecord

convert_dir_to_note_sequences \
  --input_dir=$INPUT_DIRECTORY \
  --output_file=$SEQUENCES_TFRECORD \
  --recursive

The equivalent Python command line is as follows (taken from the comments in the convert_dir_to_note_sequences.py source):

Example usage:
  $ python magenta/scripts/convert_dir_to_note_sequences.py \
    --input_dir=/path/to/input/dir \
    --output_file=/path/to/tfrecord/file \
    --log=INFO

Next, we look at how to change the parameters of this preprocessing step directly in code.

The file that runs this step is:

convert_dir_to_note_sequences.py

Opening the source, we can see that the program starts by defining a series of tf.flags:

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string('input_dir', None,
                           'Directory containing files to convert.')
# Input directory
tf.app.flags.DEFINE_string('output_file', None,
                           'Path to output TFRecord file. Will be overwritten '
                           'if it already exists.')
# Output path
tf.app.flags.DEFINE_bool('recursive', False,
                         'Whether or not to recurse into subdirectories.')
# Whether to search subdirectories recursively

tf.app.flags.DEFINE_string('log', 'INFO',
                           'The threshold for what messages will be logged '
                           'DEBUG, INFO, WARN, ERROR, or FATAL.')
# Logging threshold

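tf.app.flags is TensorFlow's module for passing parameters from the command line. In TensorFlow 1.x it is a thin wrapper around absl.flags (early versions wrapped argparse), which is why absl.flags shows up in the flag listing later in this article. If a flag is not given at run time, the program uses the default value from its definition.

Running python convert_dir_to_note_sequences.py -h prints the usage notes and the flags with their details.
So when customizing parameters, we can pass them on the command line: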

python convert_dir_to_note_sequences.py --input_dir=XXX --output_file=YYY --recursive=True

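Alternatively, we can treat the flag definitions above as hyperparameter declarations, edit their defaults directly in convert_dir_to_note_sequences.py, and then run the file.

Beyond the command line, the next part shows how to modify the parameters from Python, and how to modify them and debug in a Jupyter environment.

Hack 2.0: Debugging in a Jupyter notebook:

Next we debug the script in a Jupyter notebook, looking at how the program works in detail and at the data types it stores.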

Source code:
https://github.com/tensorflow/magenta/blob/master/magenta/scripts/convert_dir_to_note_sequences.py

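The program runs in roughly three steps:

1. Scan the input directory (and, optionally, its subdirectories) for all supported files and build a list of file paths.
2. Process the files in that list with multiple threads, converting each into a NoteSequence.
3. Save the results to a .tfrecord file.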

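Step 1 corresponds to queue_conversions(root_dir, sub_dir, pool, recursive=False), which we will not go into here.

Step 2 corresponds to two functions, convert_midi(root_dir, sub_dir, full_file_path) and convert_musicxml(root_dir, sub_dir, full_file_path). As the names suggest, they handle MIDI and MusicXML files respectively (the handler for the ABC format mentioned earlier was not examined). Their parameters and return values are documented in detail in the function docstrings. In short, each takes the root directory, the directory that contains the file, and the full path of the file, and returns a NoteSequence proto, the data type Magenta uses to represent a sequence of notes.

Step 3 corresponds to convert_directory(root_dir, output_file, num_threads, recursive=False), the top-level function that ties the steps together.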
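
The three steps above can be sketched with the standard library alone. This is only an illustration of the flow, not Magenta's implementation: find_convertible_files, fake_convert, and convert_directory_sketch are hypothetical names, the converter returns a plain dict instead of a NoteSequence, and nothing is written to a TFRecord file.

```python
import os
from multiprocessing.pool import ThreadPool

# Extension -> format label, following the formats listed in this article.
CONVERTERS = {
    '.mid': 'midi', '.midi': 'midi',
    '.xml': 'musicxml', '.mxl': 'musicxml',
}


def find_convertible_files(root_dir, recursive=False):
    """Step 1: collect (sub_dir, full_path, kind) for every supported file."""
    found = []
    for sub_dir, _dirs, files in os.walk(root_dir):
        for name in sorted(files):
            kind = CONVERTERS.get(os.path.splitext(name)[1].lower())
            if kind is not None:
                found.append((sub_dir, os.path.join(sub_dir, name), kind))
        if not recursive:
            break  # only look at the top-level directory
    return found


def fake_convert(root_dir, sub_dir, full_file_path, kind):
    """Step 2 stand-in: the real functions return a NoteSequence here."""
    return {'collection_name': os.path.basename(root_dir),
            'filename': full_file_path,
            'kind': kind}


def convert_directory_sketch(root_dir, num_threads=2, recursive=False):
    """Steps 1-3: find files, convert them in a thread pool, collect results."""
    jobs = find_convertible_files(root_dir, recursive)
    with ThreadPool(num_threads) as pool:
        results = pool.starmap(
            fake_convert,
            [(root_dir, sub, path, kind) for sub, path, kind in jobs])
    return results  # step 3 would serialize these to a .tfrecord file
```

Calling convert_directory_sketch('some/dir', recursive=True) returns one dict per supported file; unsupported extensions are skipped silently in this sketch.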

First, we import the script as a module:

import tensorflow as tf
import magenta as mgt
import magenta.scripts.convert_dir_to_note_sequences as cvrt

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0.
For more information, please see:
  * https://github.com/tensorflow/community/blob/master/rfcs/20180907-contrib-sunset.md
  * https://github.com/tensorflow/addons
If you depend on functionality not listed there, please file an issue.

Once it is imported, we can inspect its FLAGS just as we would any other attribute of the module:

print(cvrt.FLAGS)

magenta.scripts.convert_dir_to_note_sequences:
  --input_dir: Directory containing files to convert.
  --log: The threshold for what messages will be logged DEBUG, INFO, WARN,
    ERROR, or FATAL.
    (default: 'INFO')
  --output_file: Path to output TFRecord file. Will be overwritten if it already
    exists.
  --[no]recursive: Whether or not to recurse into subdirectories.
    (default: 'false')

absl.flags:
  --flagfile: Insert flag definitions from the given file into the command line.
    (default: '')
  --undefok: comma-separated list of flag names that it is okay to specify on
    the command line even if the program does not define a flag with that name.
    IMPORTANT: flags in this list that have arguments MUST use the --flag=value
    format.
    (default: '')

# This line works around a problem with tf.app.flags.FLAGS in Jupyter notebooks
# See https://github.com/tensorflow/tensorflow/issues/17702
tf.app.flags.DEFINE_string('f', '', 'kernel')

We can therefore also run the program by setting attributes on FLAGS.

First, assign the parameters:

cvrt.FLAGS.input_dir = r'Dataset\raw\example-musicxml'
cvrt.FLAGS.output_file = r'Dataset\pre\example-musicxml.tfrecord'
cvrt.FLAGS.recursive = True
cvrt.FLAGS.log = 'INFO'

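Then run the main function:

To mirror how convert_dir_to_note_sequences.py launches itself, we could call tf.app.run(cvrt.main) instead, but tf.app.run finishes by exiting the process, which shows up in a notebook as an exception (SystemExit). So here we call main directly. In convert_dir_to_note_sequences.py, main takes a single placeholder argument, unused_argv, which it never uses.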

unused_argv = ''
cvrt.main(unused_argv)

INFO:tensorflow:Converting files in 'Dataset\raw\example-musicxml\'.
INFO:tensorflow:0 files converted.
INFO:tensorflow:Converted MusicXML file Dataset\raw\example-musicxml\bwv1.6.mxl.
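
Why the direct call is more comfortable in a notebook can be shown with a small stand-in. The sketch below does not use TensorFlow: run_like_a_script is a hypothetical helper that mimics a runner ending in sys.exit, which is how tf.app.run behaves as far as I know.

```python
import sys


def run_like_a_script(main, argv=None):
    """Stand-in for a script runner that exits the process when main returns."""
    sys.exit(main(argv or sys.argv))


def main(unused_argv):
    print('converting...')
    # Returning None corresponds to exit status 0.


# In a script this would end the process. In a notebook the kernel keeps
# running and the exit surfaces as a SystemExit exception instead.
try:
    run_like_a_script(main, argv=['prog'])
except SystemExit as exc:
    exit_code = exc.code
    print('caught SystemExit, code =', exit_code)

# Calling main directly never reaches sys.exit, so nothing is raised.
direct_result = main(['prog'])
```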

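With that, the first step, creating the NoteSequences, is done.

As mentioned above, MIDI and MusicXML conversion is handled by two functions, convert_midi(root_dir, sub_dir, full_file_path) and convert_musicxml(root_dir, sub_dir, full_file_path).

Below we pick one MusicXML file and one MIDI file, run the conversion functions on each, and look at what they return.

MusicXML conversion: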

full_file_path_xml = r'Dataset\raw\example-musicxml\bwv1.6.mxl'
root_dir_xml = r'Dataset\raw\example-musicxml'
sub_dir_xml = r'Dataset\raw\example-musicxml'
sequence_xml = cvrt.convert_musicxml(root_dir_xml, sub_dir_xml,
                                     full_file_path_xml)

INFO:tensorflow:Converted MusicXML file Dataset\raw\example-musicxml\bwv1.6.mxl.

Check the type of the result, sequence_xml:

print(type(sequence_xml))

<class 'music_pb2.NoteSequence'>

We can see that sequence_xml is a data type built on Google protobuf.

Next, look at the contents of sequence_xml:

print(str(sequence_xml)[:1000])

id: "/id/musicxml/example-musicxml/b916b4d6787e8de96484206b4c617879add937ce"
filename: "Dataset\\raw\\example-musicxml\\bwv1.6.mxl"
collection_name: "example-musicxml"
ticks_per_quarter: 220
time_signatures {
  numerator: 4
  denominator: 4
}
time_signatures {
  time: 38.5
  numerator: 3
  denominator: 4
}
time_signatures {
  numerator: 1
  denominator: 4
}
time_signatures {
  time: 0.5
  numerator: 4
  denominator: 4
}
key_signatures {
  key: F
}
tempos {
  qpm: 120.0
}
notes {
  pitch: 65
  velocity: 64
  end_time: 0.5
  numerator: 1
  denominator: 4
  instrument: 7
  program: 1
  voice: 1
}
notes {
  pitch: 67
  velocity: 64
  start_time: 0.5
  end_time: 0.75
  numerator: 1
  denominator: 8
  instrument: 7
  program: 1
  voice: 1
}
notes {
  pitch: 60
  velocity: 64
  start_time: 0.75
  end_time: 1.0
  numerator: 1
  denominator: 8
  instrument: 7
  program: 1
  voice: 1
}
notes {
  pitch: 65
  velocity: 64
  start_time: 1.0
  end_time: 1.25
  numerator: 1
  denominator: 8
  instrum

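The output contains the path, the id, and the content of the XML file. The format reads like a combination of MusicXML and MIDI, stored as a structured message.

This also means we can access its fields directly: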
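
The id in the output has the shape /id/<source type>/<collection name>/<40 hex characters>, and the collection name matches the last component of root_dir. The sketch below rebuilds an id of that shape. It rests on an assumption not confirmed in this article: that the last component is a SHA-1 digest of the filename. sketch_note_sequence_id is a hypothetical name, and the result is not expected to reproduce the exact id shown above.

```python
import hashlib
import os


def sketch_note_sequence_id(filename, collection_name, source_type):
    """Build an id shaped like '/id/<source>/<collection>/<hex digest>'.

    Assumption: the digest is SHA-1 over the UTF-8 encoded filename.
    """
    fingerprint = hashlib.sha1(filename.encode('utf-8')).hexdigest()
    return '/id/%s/%s/%s' % (source_type.lower(), collection_name, fingerprint)


root_dir = 'Dataset/raw/example-musicxml'
filename = 'Dataset/raw/example-musicxml/bwv1.6.mxl'
seq_id = sketch_note_sequence_id(
    filename, os.path.basename(root_dir), 'musicxml')
print(seq_id)
```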

print(sequence_xml.id)
print(sequence_xml.filename)
print(sequence_xml.source_info)
print(sequence_xml.total_time)

/id/musicxml/example-musicxml/b916b4d6787e8de96484206b4c617879add937ce
Dataset\raw\example-musicxml\bwv1.6.mxl
source_type: SCORE_BASED
encoding_type: MUSIC_XML
parser: MAGENTA_MUSIC_XML

40.0

print(type(sequence_xml.notes))

<class 'google.protobuf.pyext._message.RepeatedCompositeContainer'>

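The notes field holds the core content of sequence_xml: it records every note.
It supports indexing, and each note consists of fields such as pitch, velocity, start time, and end time, plus the instrument and program that determine its timbre.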

print(sequence_xml.notes[0])
print(sequence_xml.notes[20])
print(sequence_xml.notes[30])

pitch: 65
velocity: 64
end_time: 0.5
numerator: 1
denominator: 4
instrument: 7
program: 1
voice: 1

pitch: 70
velocity: 64
start_time: 5.0
end_time: 5.25
numerator: 1
denominator: 8
instrument: 7
program: 1
voice: 1

pitch: 71
velocity: 64
start_time: 7.0
end_time: 7.5
numerator: 1
denominator: 4
instrument: 7
program: 1
voice: 1
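
Because each note is just a small record of numbers, it is easy to post-process. The sketch below uses a hypothetical Note dataclass as a stand-in for the protobuf message (so it runs without Magenta) and fills it with the first three notes printed earlier; pitch_name is a helper introduced here, using the common convention that MIDI pitch 60 is C4. The default of 0.0 for start_time mirrors the fact that the first note's output shows no start_time line.

```python
from dataclasses import dataclass


@dataclass
class Note:
    """Hypothetical stand-in mirroring a few NoteSequence note fields."""
    pitch: int
    velocity: int
    start_time: float = 0.0
    end_time: float = 0.0


PITCH_CLASSES = ['C', 'C#', 'D', 'D#', 'E', 'F',
                 'F#', 'G', 'G#', 'A', 'A#', 'B']


def pitch_name(midi_pitch):
    """Convert a MIDI pitch number to a note name, with 60 as C4."""
    return '%s%d' % (PITCH_CLASSES[midi_pitch % 12], midi_pitch // 12 - 1)


# The first three notes from the MusicXML output above.
notes = [Note(65, 64, 0.0, 0.5),
         Note(67, 64, 0.5, 0.75),
         Note(60, 64, 0.75, 1.0)]

for n in notes:
    # Name, onset in seconds, duration in seconds.
    print(pitch_name(n.pitch), n.start_time, n.end_time - n.start_time)
```

With a real NoteSequence, the same loop should work over sequence_xml.notes, since the fields used here have the same names.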

Next, let us look at the result of converting a MIDI file:

full_file_path_midi = r'Dataset\raw\example-midi\Bwv0525 Sonate en trio n1.mid'
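# Note: 'example-midii' is a typo for 'example-midi'. It is why the id and
# collection_name in the output below read 'example-midii'.
root_dir_midi = r'Dataset\raw\example-midii'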
sub_dir_midi = r'Dataset\raw\example-midi'
sequence_midi = cvrt.convert_midi(root_dir_midi, sub_dir_midi,
                                  full_file_path_midi)

INFO:tensorflow:Converted MIDI file Dataset\raw\example-midi\Bwv0525 Sonate en trio n1.mid.

print(str(sequence_midi)[:1000])

id: "/id/midi/example-midii/eaec1aa71ccd1892886c79883c24a044c480a2ef"
filename: "Dataset\\raw\\example-midi\\Bwv0525 Sonate en trio n1.mid"
collection_name: "example-midii"
ticks_per_quarter: 480
time_signatures {
  numerator: 4
  denominator: 4
}
time_signatures {
  time: 187.07416394583333
  numerator: 12
  denominator: 8
}
time_signatures {
  time: 562.6120114458334
  numerator: 3
  denominator: 4
}
tempos {
  qpm: 75.0
}
tempos {
  time: 179.20000000000002
  qpm: 70.00007000007
}
tempos {
  time: 182.62856800000003
  qpm: 65.000065000065
}
tempos {
  time: 184.47472000000002
  qpm: 50.0
}
tempos {
  time: 185.07972
  qpm: 45.000011250002814
}
tempos {
  time: 187.07416394583333
  qpm: 120.0
}
tempos {
  time: 189.32416394583333
  qpm: 55.000004583333705
}
tempos {
  time: 554.7786789458335
  qpm: 45.000011250002814
}
tempos {
  time: 558.1120114458334
  qpm: 80.0
}
tempos {
  time: 848.3620114458334
  qpm: 50.0
}
notes {
  pitch: 70
  velocity: 92
  start_time: 6.4
  end_time: 6.80

print(sequence_midi.id)
print(sequence_midi.filename)
print(sequence_midi.source_info)
print(sequence_midi.total_time)

/id/midi/example-midii/eaec1aa71ccd1892886c79883c24a044c480a2ef
Dataset\raw\example-midi\Bwv0525 Sonate en trio n1.mid
encoding_type: MIDI
parser: PRETTY_MIDI

851.9745114458334

print(sequence_midi.notes[0])
print(sequence_midi.notes[20])
print(sequence_midi.notes[30])

pitch: 70
velocity: 92
start_time: 6.4
end_time: 6.800000000000001

pitch: 72
velocity: 92
start_time: 12.8
end_time: 13.200000000000001

pitch: 74
velocity: 92
start_time: 16.0
end_time: 16.400000000000002

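We can see that a NoteSequence converted from MIDI is stored in much the same way as one converted from MusicXML. The difference lies in the timing. MusicXML describes durations in score units relative to a smallest division, so at the 120 qpm of this example the converted times land on round values such as 0.5 and 0.75. The MIDI data is timed in absolute seconds after parsing, under a tempo of 75 qpm with several tempo changes, so its start and end times are not round numbers (6.800000000000001, for example). The notes converted from MIDI also show fewer fields, because MIDI carries less information: there is no notated note value (numerator/denominator) and no voice.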
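
The times printed above can be checked against the tempos with a little arithmetic, since a quarter note lasts 60 / qpm seconds. The sketch below is a minimal check under two assumptions: the tempo is constant over the span considered, and the first MIDI note starts 8 quarter notes into the piece (inferred from 6.4 s at 75 qpm, not stated in the file dump). beats_to_seconds is a helper introduced here.

```python
def beats_to_seconds(quarter_notes, qpm):
    """Duration in seconds of a number of quarter notes at a fixed tempo."""
    return quarter_notes * 60.0 / qpm


# MusicXML example: tempo 120 qpm.
xml_quarter = beats_to_seconds(1.0, 120.0)  # first note: numerator 1, denominator 4
xml_eighth = beats_to_seconds(0.5, 120.0)   # second note: numerator 1, denominator 8

# MIDI example: opening tempo 75 qpm.
midi_eighth = beats_to_seconds(0.5, 75.0)   # length of the first MIDI note
midi_start = beats_to_seconds(8.0, 75.0)    # assumed onset: 8 quarter notes in

print(xml_quarter, xml_eighth)
print(midi_start, midi_start + midi_eighth)
```

The MusicXML values come out as 0.5 and 0.25 seconds, matching the first two notes, and the MIDI values come out at about 6.4 and 6.8 seconds, matching the first MIDI note up to floating-point rounding.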

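Summary

In this article we examined how the Magenta project consolidates raw data and introduced the functions that read MIDI and MusicXML files. During consolidation, Magenta converts the different source formats into a single unified format, close to MusicXML, and stores everything in that format.

If you want to build your own project, using Magenta's data processing functions directly is a good option. Magenta also has a module for converting between data representations, pipelines (see https://github.com/tensorflow/magenta/blob/master/magenta/pipelines), although its documentation is somewhat hard to follow.

Other notes

For reading and processing MIDI and MusicXML files with other Python libraries, see:

For an introduction to the MIDI and XML file formats, see:

For reading the generated tfrecord, see the next part:

Original link: https://blog.csdn.net/weixin_38090501/article/details/90524647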