Python: element-wise mean of multiple arrays in a DataFrame

I have a 16x10 pandas DataFrame in which every cell holds either a 1x35000 array or NaN. For each column, I want the element-wise mean of those arrays taken over all rows.

         1        2        3        ...  10
    1    1x35000  1x35000  1x35000  ...  1x35000
    2    1x35000  NaN      1x35000  ...  1x35000
    3    1x35000  NaN      1x35000  ...  NaN
    ...
    16   1x35000  1x35000  NaN      ...  1x35000

To avoid misunderstanding: take the first element of every array in the first column and average them. Then take the second element of every array in that column and average those, and so on. In the end I want a 1x10 DataFrame with one 1x35000 array per column, where each array is the element-wise mean of the original arrays in that column:

         1        2        3        ...  10
    1    1x35000  1x35000  1x35000  ...  1x35000

Is there an elegant way to do this, preferably without for-loops?
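For a single column, the element-wise mean described above amounts to skipping the NaN cells, stacking the remaining arrays into a 2-D array and averaging down axis 0. The following is a minimal NumPy sketch, assuming every non-NaN cell in the column is a 1-D array of the same length; elementwise_mean is a hypothetical helper name, not part of NumPy or pandas.

```python
import numpy as np


def elementwise_mean(cells):
    """Element-wise mean of one column's cells (hypothetical helper).

    Scalar NaN cells (missing entries) are skipped; the remaining cells are
    assumed to be 1-D arrays of equal length.
    """
    arrays = [np.asarray(c, dtype=float) for c in cells
              if not (np.isscalar(c) and np.isnan(c))]
    # np.stack gives shape (n_arrays, length); mean(axis=0) averages
    # position 0 across all arrays, then position 1, and so on.
    return np.stack(arrays).mean(axis=0)


column = [np.array([4, 8, 1, 1, 9]), np.array([4, 3, 4, 1, 5]), np.nan]
print(elementwise_mean(column).tolist())  # [4.0, 5.5, 2.5, 1.0, 7.0]
```

Replacing .mean(axis=0) with np.nanmean(np.stack(arrays), axis=0) would additionally skip NaNs inside the arrays themselves.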

Answer

Setup

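    import numpy as np
    import pandas as pd

    np.random.seed([3, 14159])
    # 3x3 frame (rows X-Z, columns A-C) whose cells are length-5 integer arrays.
    # DataFrame.map is the pandas >= 2.1 name of the older DataFrame.applymap.
    df = pd.DataFrame(
        np.random.randint(10, size=(3, 3, 5)).tolist(),
        list('XYZ'), list('ABC')
    ).map(np.array)
    # Blank out two cells to mimic the missing (NaN) entries.
    df.loc['X', 'B'] = np.nan
    df.loc['Z', 'A'] = np.nan
    df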

                         A                B                C
    X  [4, 8, 1, 1, 9]              NaN  [8, 2, 8, 4, 9]
    Y  [4, 3, 4, 1, 5]  [1, 2, 6, 2, 7]  [7, 1, 1, 7, 8]
    Z              NaN  [9, 3, 8, 7, 7]  [2, 6, 3, 1, 9]

Solution

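    # Series indexed by (row, column); dropna() discards the NaN cells.
    g = df.stack().dropna().groupby(level=1)
    # Element-wise sum of each column's arrays divided by how many there are.
    g.apply(np.sum, axis=0) / g.size()

    A                        [4.0, 5.5, 2.5, 1.0, 7.0]
    B                        [5.0, 2.5, 7.0, 4.5, 7.0]
    C    [5.66666666667, 3.0, 4.0, 4.0, 8.66666666667]
    dtype: object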
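The one-liner above relies on np.sum adding up an object-dtype Series of arrays. A more explicit sketch of the same idea, assuming every non-NaN cell in a column is a 1-D array of one common length and that each column has at least one such array, stacks each column's arrays and averages them. It loops over the columns (10 in the question) but not over the 35000 elements; column_elementwise_means is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def column_elementwise_means(df: pd.DataFrame) -> pd.Series:
    """Element-wise mean of each column's array cells (hypothetical helper).

    Assumes every non-NaN cell is a 1-D array, that all arrays in a column
    share one length, and that each column has at least one such array.
    """
    # Object-dtype buffer so each slot can hold a whole ndarray.
    means = np.empty(len(df.columns), dtype=object)
    for i, col in enumerate(df.columns):
        # dropna() discards the NaN cells; np.stack builds an
        # (n_arrays, length) array that is averaged down axis 0.
        means[i] = np.stack(df[col].dropna().to_list()).mean(axis=0)
    return pd.Series(means, index=df.columns)


# The frame printed in the Setup section, rebuilt with fixed values.
cells = np.full((3, 3), np.nan, dtype=object)
cells[0, 0] = np.array([4, 8, 1, 1, 9])
cells[0, 2] = np.array([8, 2, 8, 4, 9])
cells[1, 0] = np.array([4, 3, 4, 1, 5])
cells[1, 1] = np.array([1, 2, 6, 2, 7])
cells[1, 2] = np.array([7, 1, 1, 7, 8])
cells[2, 1] = np.array([9, 3, 8, 7, 7])
cells[2, 2] = np.array([2, 6, 3, 1, 9])
df = pd.DataFrame(cells, index=list("XYZ"), columns=list("ABC"))

result = column_elementwise_means(df)
print(result["B"].tolist())  # [5.0, 2.5, 7.0, 4.5, 7.0]
```

A column that is entirely NaN would make np.stack raise a ValueError, so such columns would need separate handling.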

To get the exact one-row shape shown in the question, convert the stack/groupby result to a frame and transpose it:

    g = df.stack().dropna().groupby(level=1)
    (g.apply(np.sum, axis=0) / g.size()).to_frame().T

                               A                          B                                              C
    0  [4.0, 5.5, 2.5, 1.0, 7.0]  [5.0, 2.5, 7.0, 4.5, 7.0]  [5.66666666667, 3.0, 4.0, 4.0, 8.66666666667]
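A different technique, closer to the "no for-loops" wish, is to replace each NaN cell with an all-NaN array, turn the whole frame into one 3-D float array and reduce it with np.nanmean. This sketch assumes all arrays have a known common length; building the 3-D array still visits each cell once in Python, but the averaging over the array positions is a single NumPy call. For the question's 16x10x35000 case the float64 array would take roughly 45 MB. nanmean_one_row is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def nanmean_one_row(df: pd.DataFrame, length: int) -> pd.DataFrame:
    """One-row DataFrame of per-column element-wise means (hypothetical helper).

    Assumes non-NaN cells are 1-D arrays of `length` numbers. np.nanmean also
    skips NaNs *inside* the arrays, which a plain mean would not.
    """
    nan_fill = np.full(length, np.nan)
    # Replace scalar-NaN cells by all-NaN arrays so the frame converts to a
    # float array of shape (n_rows, n_cols, length).
    cube = np.array(
        [[nan_fill if np.isscalar(c) and np.isnan(c) else c for c in row]
         for row in df.to_numpy()],
        dtype=float,
    )
    # One vectorised reduction over the row axis -> shape (n_cols, length).
    means = np.nanmean(cube, axis=0)
    # Put each column's mean array into one object-dtype cell of a 1-row frame.
    cells = np.empty((1, len(df.columns)), dtype=object)
    for j in range(len(df.columns)):
        cells[0, j] = means[j]
    return pd.DataFrame(cells, columns=df.columns)


grid = np.full((2, 2), np.nan, dtype=object)
grid[0, 0] = np.array([1.0, 2.0])
grid[0, 1] = np.array([3.0, 5.0])
grid[1, 1] = np.array([5.0, 7.0])
demo = pd.DataFrame(grid, columns=["A", "B"])

out = nanmean_one_row(demo, length=2)
print(out.shape)                 # (1, 2)
print(out.loc[0, "B"].tolist())  # [4.0, 6.0]
```

np.nanmean emits a RuntimeWarning and returns NaN for any position where every row is NaN, for example a column with no arrays at all.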


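Copyright notice: this is an original article by weixin_32559133, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.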