Extracting data from specific columns of CSV files in Python

. . . and storing it in a PY file to use the data to graph after storing all the data in different files . . .

. . . I would want to store only "2345678@abcdef" and "365" in the new python file . . .

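Are you sure you want to store the data in a Python file? Python files are meant to hold Python code, and they should be executable by the Python interpreter. It would be better to store the data in a data file (say, preprocessed_data.csv).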

To get a list of files that match a pattern, you can use Python's built-in glob library.

Here is an example of how to read multiple CSV files in a directory and extract the desired columns from each one:

    import glob

    # indices of columns you want to preserve
    desired_columns = [1, 4]

    # change this to the directory that holds your data files
    csv_directory = '/path/to/csv/files/*.csv'

    # iterate over files holding data
    extracted_data = []
    for file_name in glob.glob(csv_directory):
        with open(file_name, 'r') as data_file:
            # iterating over the file stops at the end of the file
            for line in data_file:
                # split the line on whitespace (assumes whitespace-delimited fields)
                tokens = line.split()
                # skip blank lines, which would otherwise raise an IndexError
                if not tokens:
                    continue
                # only grab the columns we care about
                desired_data = [tokens[i] for i in desired_columns]
                extracted_data.append(desired_data)
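The loop above splits each line on whitespace. If the input files are actually comma-delimited (an assumption, since the question's data layout is not shown here), the standard csv module is a safer way to split rows because it copes with quoted fields that contain commas. A minimal sketch, where extract_columns is a hypothetical helper name:

```python
import csv

def extract_columns(path, columns):
    """Return the selected column indices from every non-empty row of a CSV file."""
    rows = []
    # newline='' is what the csv module documentation recommends for file objects
    with open(path, 'r', newline='') as data_file:
        for tokens in csv.reader(data_file):
            if not tokens:
                continue  # csv.reader yields [] for blank lines
            rows.append([tokens[i] for i in columns])
    return rows
```

It could stand in for the body of the loop, for example extracted_data.extend(extract_columns(file_name, desired_columns)).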

Writing the extracted data to a new file is then straightforward. The following example shows how to save the data to a CSV file.

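    output_string = ''
    for row in extracted_data:
        output_string += ','.join(row) + '\n'

    with open('preprocessed_data.csv', 'w') as csv_file:
        csv_file.write(output_string)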
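If an extracted field might itself contain a comma or a quote character, joining fields with ',' by hand would produce a malformed file. A minimal sketch of the same save step using csv.writer from the standard library, which quotes such fields automatically (write_rows is a hypothetical helper name):

```python
import csv

def write_rows(rows, output_path):
    """Write a list of rows (each a list of strings) to output_path as CSV."""
    # newline='' lets the csv module control line endings itself
    with open(output_path, 'w', newline='') as out_file:
        csv.writer(out_file).writerows(rows)
```

For example, write_rows([['2345678@abcdef', '365']], 'preprocessed_data.csv') would store the two values from the question as one row.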

Edit:

If you don't want to combine all of the CSV files, here is a version that processes one file at a time:

    import glob

    def process_file(input_path, output_path, selected_columns):
        extracted_data = []
        with open(input_path, 'r') as in_file:
            for line in in_file:
                tokens = line.split()
                # skip blank lines
                if not tokens:
                    continue
                extracted_data.append([tokens[i] for i in selected_columns])

        output_string = ''
        for row in extracted_data:
            output_string += ','.join(row) + '\n'

        with open(output_path, 'w') as out_file:
            out_file.write(output_string)

    # whenever you need to process a file:
    process_file(
        '/path/to/input.csv',
        '/path/to/processed/output.csv',
        [1, 4])

    # if you want to process every file in a directory:
    target_directory = '/path/to/my/files/*.csv'
    for file in glob.glob(target_directory):
        process_file(file, file + '.out', [1, 4])

Edit 2:

The following example processes every file in one directory and writes the results to an output file of the same name (with '.out' appended) in another directory:

    import os
    import glob

    input_directory = '/path/to/my/files/*.csv'
    output_directory = '/path/to/output'

    for file in glob.glob(input_directory):
        file_name = os.path.basename(file) + '.out'
        out_file = os.path.join(output_directory, file_name)
        process_file(file, out_file, [1, 4])
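One detail worth guarding against: open(path, 'w') raises FileNotFoundError when the output directory does not exist yet. A small sketch that creates the directory before building the output path (output_path_for and its suffix parameter are hypothetical names, not part of the answer's code):

```python
import os

def output_path_for(input_path, output_directory, suffix='.out'):
    """Return the output path for input_path inside output_directory,
    creating the directory first if it is missing."""
    os.makedirs(output_directory, exist_ok=True)
    file_name = os.path.basename(input_path) + suffix
    return os.path.join(output_directory, file_name)
```

Inside the loop it would be used as process_file(file, output_path_for(file, output_directory), [1, 4]).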

If you want to add a header row to the output, you can modify process_file as follows:

    def process_file(input_path, output_path, selected_columns, column_headers=None):
        extracted_data = []
        with open(input_path, 'r') as in_file:
            for line in in_file:
                tokens = line.split()
                # skip blank lines
                if not tokens:
                    continue
                extracted_data.append([tokens[i] for i in selected_columns])

        # write the header row only when headers were supplied
        output_string = ''
        if column_headers:
            output_string += ','.join(column_headers) + '\n'
        for row in extracted_data:
            output_string += ','.join(row) + '\n'

        with open(output_path, 'w') as out_file:
            out_file.write(output_string)
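process_file collects every row in memory and builds one large string before writing. For large inputs, a streaming variant that writes each row as soon as it is read may be preferable. A minimal sketch under the same assumption of whitespace-separated input (process_file_streaming is a hypothetical name):

```python
def process_file_streaming(input_path, output_path, selected_columns, column_headers=None):
    """Copy the selected whitespace-separated columns to output_path, one line at a time."""
    with open(input_path, 'r') as in_file, open(output_path, 'w') as out_file:
        # write the header row only when headers were supplied
        if column_headers:
            out_file.write(','.join(column_headers) + '\n')
        for line in in_file:
            tokens = line.split()
            if not tokens:
                continue  # skip blank lines
            out_file.write(','.join(tokens[i] for i in selected_columns) + '\n')
```

It takes the same arguments as process_file, so it can be swapped into the directory loops shown earlier.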