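I wrote the following code to help someone else. It is not difficult, and you can reuse it directly if you run into the same situation.
Purpose: group a data table by ID, then fill the missing values in each group with that group's mean (a few columns are filled with 0 instead).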
import pandas as pd
# Read the data
df = pd.read_excel('table2.xlsx')
# Convert data types
#df['size']=pd.to_numeric(df['size'])
#df['GSP']=pd.to_numeric(df['GSP'])
#df['number']=pd.to_numeric(df['number'])
#df['NFA']=pd.to_numeric(df['NFA'])
# Show the first 5 rows of the table (the default)
df.head()
# Check the data type of a value
#type(df['size'][2000])
# Group the table by ID
df_1 = df.groupby('ID')
# Fill NA within each group: some columns with 0, the rest with the group mean
zero_cols = ['AI', 'Year-PAI', 'Year-GAI', 'Year-SAI', 'quality']
mean_cols = ['rd', 'sales', 'NFA', 'FA', 'AD', 'number', 'scale', 'GSP', 'size']
frames = []
for name, group in df_1:
    # Work on a copy and assign the result back; calling fillna(inplace=True)
    # on a selected column may not update the group in recent pandas versions
    group = group.copy()
    group[zero_cols] = group[zero_cols].fillna(0)
    # Passing a Series of means fills each column with its own group mean
    group[mean_cols] = group[mean_cols].fillna(group[mean_cols].mean())
    frames.append(group)
result = pd.concat(frames)
# Anything still NA (e.g. a column that is entirely NA within a group) becomes 0
result.fillna(0, inplace=True)
print("process done !")
# Write the new table to a file
result.to_excel('2.xlsx')
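The same group-wise mean fill can probably also be written without an explicit loop, using `groupby(...).transform('mean')`. The sketch below is only an illustration under assumptions: the small in-memory table and the reduced column lists are made up so the example runs without `table2.xlsx`, and only the column names `ID`, `rd`, `sales` and `AI` are taken from the script above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the table: 'ID' is the grouping key,
# 'rd' and 'sales' are mean-filled, 'AI' is zero-filled.
df = pd.DataFrame({
    'ID':    [1, 1, 1, 2, 2],
    'rd':    [10.0, np.nan, 30.0, np.nan, np.nan],
    'sales': [1.0, 3.0, np.nan, 8.0, np.nan],
    'AI':    [np.nan, 2.0, np.nan, 5.0, np.nan],
})

zero_cols = ['AI']
mean_cols = ['rd', 'sales']

result = df.copy()
result[zero_cols] = result[zero_cols].fillna(0)
# transform('mean') returns a frame aligned row by row with the input,
# holding the mean of each row's group in every selected column
group_means = result.groupby('ID')[mean_cols].transform('mean')
result[mean_cols] = result[mean_cols].fillna(group_means)
# A group with no observed value at all (here 'rd' for ID 2) has a NaN mean,
# so it is still NA at this point and falls back to 0
result = result.fillna(0)
print(result)
```

Unlike the loop, which concatenates the groups one after another, this version keeps the rows in their original order.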
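Copyright notice: This is an original article by weixin_44536804, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.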