. . . and storing it in a PY file to use the data to graph after storing all the data in different files . . .
. . . I would want to store only "2345678@abcdef" and "365" in the new python file . . .
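Are you sure you want to store the data in a Python file? Python files are meant to hold Python code, and they should be executable by the Python interpreter. It would be better to store your data in a data-type file (say, preprocessed_data.csv).
To get a list of files that match a pattern, you can use Python's built-in glob library.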
Here is an example of how to read multiple csv files in a directory and extract the desired columns from each one:

import glob

# indices of columns you want to preserve
desired_columns = [1, 4]
# change this to the directory that holds your data files
csv_directory = '/path/to/csv/files/*.csv'
# iterate over files holding data
extracted_data = []
for file_name in glob.glob(csv_directory):
    with open(file_name, 'r') as data_file:
        while True:
            line = data_file.readline()
            # stop at the end of the file
            if len(line) == 0:
                break
            # splits the line by whitespace
            tokens = line.split()
            # only grab the columns we care about
            desired_data = [tokens[i] for i in desired_columns]
            extracted_data.append(desired_data)

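Writing the extracted data to a new file is then easy: join each row with commas and write the rows to a csv file, which is what the process_file function further down does.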
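As a minimal sketch of that saving step, assuming each row is a list of strings like the entries of extracted_data above, the standard csv module can handle the writing. The function name write_rows is hypothetical and chosen only for illustration.

```python
import csv


def write_rows(rows, output_path):
    """Write a list of rows (each a list of strings) to a csv file."""
    # newline='' lets the csv module control the line endings itself
    with open(output_path, 'w', newline='') as out_file:
        writer = csv.writer(out_file)
        writer.writerows(rows)
```

Calling write_rows(extracted_data, 'preprocessed_data.csv') would then produce one comma-separated line per extracted row, for example 2345678@abcdef,365. Unlike a plain ','.join(row), csv.writer also quotes any value that itself contains a comma.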
Edit:
If you don't want to merge all the csv files, here is a version that processes one file at a time:

def process_file(input_path, output_path, selected_columns):
    extracted_data = []
    with open(input_path, 'r') as in_file:
        while True:
            line = in_file.readline()
            # stop at the end of the file
            if len(line) == 0:
                break
            tokens = line.split()
            extracted_data.append([tokens[i] for i in selected_columns])
    # build the csv text, one comma-separated line per row
    output_string = ''
    for row in extracted_data:
        output_string += ','.join(row) + '\n'
    with open(output_path, 'w') as out_file:
        out_file.write(output_string)

# whenever you need to process a file:
process_file(
    '/path/to/input.csv',
    '/path/to/processed/output.csv',
    [1, 4])

# if you want to process every file in a directory:
target_directory = '/path/to/my/files/*.csv'
for file in glob.glob(target_directory):
    process_file(file, file + '.out', [1, 4])

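Edit 2:
The following example processes every file in a directory and writes each result to a correspondingly named output file (the input file name with '.out' appended) in another directory:

import os
import glob

input_directory = '/path/to/my/files/*.csv'
output_directory = '/path/to/output'
for file in glob.glob(input_directory):
    # keep the input file name, but place the output in output_directory
    file_name = os.path.basename(file) + '.out'
    out_file = os.path.join(output_directory, file_name)
    process_file(file, out_file, [1, 4])
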
If you want to add a header to the output, you can modify process_file as follows:

def process_file(input_path, output_path, selected_columns, column_headers=None):
    extracted_data = []
    with open(input_path, 'r') as in_file:
        while True:
            line = in_file.readline()
            # stop at the end of the file
            if len(line) == 0:
                break
            tokens = line.split()
            extracted_data.append([tokens[i] for i in selected_columns])
    # write the header line only when headers were supplied,
    # so the output does not start with an empty line
    output_string = ''
    if column_headers:
        output_string += ','.join(column_headers) + '\n'
    for row in extracted_data:
        output_string += ','.join(row) + '\n'
    with open(output_path, 'w') as out_file:
        out_file.write(output_string)
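
Since the goal is to use the stored data for graphing later, here is a rough sketch of reading such an output file back in, assuming it was written with a header line. The function name load_columns is hypothetical, and all values come back as strings, so numeric columns would still need converting (for example with int or float) before plotting.

```python
import csv


def load_columns(path):
    """Read a csv file with a header row into a dict: column name -> list of values."""
    columns = {}
    with open(path, 'r', newline='') as in_file:
        # DictReader uses the first line of the file as the column names
        reader = csv.DictReader(in_file)
        for row in reader:
            for name, value in row.items():
                columns.setdefault(name, []).append(value)
    return columns
```

With an output file whose header is, say, user,days, load_columns would return a dictionary with one list per column, which can be handed to whatever plotting code you use.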