Python: Sorting pandas groupby results and taking the top K per group

I want to group a DataFrame by two columns and then sort the aggregated results within each group. How can I do that?

In [167]: df
Out[167]:
   count     job source
0      2   sales      A
1      4   sales      B
2      6   sales      C
3      3   sales      D
4      7   sales      E
5      5  market      A
6      3  market      B
7      2  market      C
8      4  market      D
9      1  market      E
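The question never shows how df was built. The following is a minimal sketch that rebuilds it from the printed output above; the construction itself is an assumption, and only the values come from the original post.

```python
import pandas as pd

# Rebuild the example frame from the printed output (values copied from Out[167]).
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

print(df)
```

Recent pandas versions keep the columns in insertion order, whereas the output in the question lists them alphabetically; here both orders coincide.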

In [168]: df.groupby(['job','source']).agg({'count':'sum'})
Out[168]:
               count
job    source
market A           5
       B           3
       C           2
       D           4
       E           1
sales  A           2
       B           4
       C           6
       D           3
       E           7

Now I want to sort the count column in descending order within each group and then keep only the first three rows, to get the following:

               count
job    source
market A           5
       D           4
       B           3
sales  E           7
       C           6
       B           4

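Best answer

Apply groupby a second time, on the result of the first groupby: sort each group and take its first three elements.

Start from the result of the first groupby: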

In [60]: df_agg = df.groupby(['job','source']).agg({'count':'sum'})

Group by the first level of the index:

In [63]: g = df_agg['count'].groupby(level=0, group_keys=False)

Then sort each group in descending order (with sort_values) and keep the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

Better still, use the shortcut method nlargest:

In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64
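The three steps above can be put together as one script. This is a minimal sketch rather than a definitive implementation: the DataFrame construction is reconstructed from the question's output, and the names top3_apply and top3 are chosen here for illustration.

```python
import pandas as pd

# Example data, reconstructed from the question's printed DataFrame.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Step 1: sum 'count' for every (job, source) pair.
df_agg = df.groupby(["job", "source"]).agg({"count": "sum"})

# Step 2: regroup the aggregated Series by the first index level (job).
# group_keys=False avoids prepending the job key a second time to the result.
g = df_agg["count"].groupby(level=0, group_keys=False)

# Step 3a: explicit version, sort each group and keep the first three values.
top3_apply = g.apply(lambda x: x.sort_values(ascending=False).head(3))

# Step 3b: shortcut version using nlargest.
top3 = g.nlargest(3)

print(top3)
```

For this data both variants return the same values in the same order, since no group contains tied counts.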

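Second-best answer

You can also do it in one line: sort first, then use head to take the first 3 rows of each group. This works here without a prior sum because each (job, source) pair occurs only once in the data.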

In [34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)
Out[34]:
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B
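If a (job, source) pair could occur more than once, the sums would have to be computed before sorting. The sketch below combines the aggregation from the question with the sort-then-head idea; the data is reconstructed from the question, and the names totals and top3 are illustrative.

```python
import pandas as pd

# Example data, reconstructed from the question's printed DataFrame.
df = pd.DataFrame({
    "count": [2, 4, 6, 3, 7, 5, 3, 2, 4, 1],
    "job": ["sales"] * 5 + ["market"] * 5,
    "source": list("ABCDE") * 2,
})

# Sum per (job, source) pair; as_index=False keeps the keys as ordinary columns.
totals = df.groupby(["job", "source"], as_index=False)["count"].sum()

# Sort jobs ascending and counts descending, then keep the first 3 rows per job.
top3 = (
    totals.sort_values(["job", "count"], ascending=[True, False])
          .groupby("job")
          .head(3)
)

print(top3)
```

groupby(...).head keeps the rows in the order of the sorted frame, so the result is already ordered by job and then by descending count.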

A more complete example:

In [43]: import pandas as pd

In [44]: df = pd.DataFrame({"name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    ...:                    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    ...:                    "count_2": [100, 150, 100, 25, 250, 300, 400, 500]})

In [45]: df
Out[45]:
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar

### Top 3 values of "count_1" within each group:

In [46]: df.groupby(["name"])["count_1"].nlargest(3)
Out[46]:
name
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64
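The nlargest result contains only the count_1 values. Its second index level holds the original row labels, so the complete rows can be looked up from it. The following is a sketch of one way to do that; the names top_idx and top_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
})

# Index of the result is (name, original row label).
top = df.groupby("name")["count_1"].nlargest(3)

# Level 1 of that index holds the original row labels of the selected values.
top_idx = top.index.get_level_values(1)

# Look up the complete rows, in the same order as the nlargest result.
top_rows = df.loc[top_idx]

print(top_rows)
```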

### Sorting within groups based on column "count_1":

In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending=False)).reset_index(drop=True)
Out[48]:
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo
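The same ordering can be obtained without groupby at all, by sorting on both columns with a separate direction for each. This is a swapped-in alternative rather than the original answer's method, and it may be the safer choice because, as far as I know, newer pandas releases warn about (or change) how groupby(...).apply treats the grouping column. The name sorted_df is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"],
    "count_1": [5, 10, 12, 15, 20, 25, 30, 35],
    "count_2": [100, 150, 100, 25, 250, 300, 400, 500],
})

# Sort by name ascending, then by count_1 descending within each name.
sorted_df = (
    df.sort_values(["name", "count_1"], ascending=[True, False])
      .reset_index(drop=True)
)

print(sorted_df)
```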

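Copyright notice: This is an original article by weixin_39691748, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.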