Python Data Analysis Case Study: Movie Rating Analysis

import pandas as pd

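# Show up to 100 rows when a DataFrame is displayed
pd.options.display.max_rows = 100
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
# '::' is a multi-character separator, so the Python parsing engine is required
users = pd.read_table('D://01//users.dat', sep='::', header=None,
                      names=unames, engine='python')
users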
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
# engine='python' avoids the ParserWarning raised for the '::' separator
ratings = pd.read_table('D://01//ratings.dat', sep='::',
                        header=None, names=rnames, engine='python')
ratings
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('D://01//movies.dat', sep='::',
                       header=None, names=mnames, engine='python')
# Preview the first five rows of each table
users[:5]
ratings[:5]
movies[:5]
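
The `::` separator is longer than one character, so pandas interprets it as a regular expression, which only the Python parsing engine handles. The sketch below shows the same read on an in-memory sample. The three rows and the name `sample_users` are made up for illustration and are not taken from users.dat.

```python
import io
import pandas as pd

# Hypothetical rows in the same 'field::field' layout as users.dat
sample = "1::F::25::3::10001\n2::M::40::7::20002\n3::M::31::12::30003\n"

unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
# engine='python' is needed because '::' is a multi-character separator
sample_users = pd.read_csv(io.StringIO(sample), sep='::', header=None,
                           names=unames, engine='python')
print(sample_users.shape)
print(sample_users['gender'].tolist())
```
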


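# Join ratings to users (on user_id), then to movies (on movie_id)
data = pd.merge(pd.merge(ratings, users), movies)
data
data.iloc[0]
# Mean rating of each film, split by viewer gender
mean_ratings = data.pivot_table('rating', index='title',
                                columns='gender', aggfunc='mean')
mean_ratings[:5]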
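
To make the merge and pivot steps easier to follow, here is a minimal sketch on three tiny tables. All values and the `toy_` names are hypothetical. With no `on=` argument, `pd.merge` joins on the columns the two frames have in common.

```python
import pandas as pd

# Hypothetical miniature versions of the three tables
toy_ratings = pd.DataFrame({'user_id': [1, 1, 2, 3],
                            'movie_id': [10, 20, 10, 10],
                            'rating': [5, 3, 4, 2]})
toy_users = pd.DataFrame({'user_id': [1, 2, 3],
                          'gender': ['F', 'M', 'M']})
toy_movies = pd.DataFrame({'movie_id': [10, 20],
                           'title': ['Film A', 'Film B']})

# First merge joins on user_id, second on movie_id (the shared columns)
toy_data = pd.merge(pd.merge(toy_ratings, toy_users), toy_movies)

# One row per title, one column per gender, cells hold the mean rating
toy_means = toy_data.pivot_table(values='rating', index='title',
                                 columns='gender', aggfunc='mean')
print(toy_means)
```

A title that one gender never rated gets NaN in that column, as happens for 'Film B' under 'M' here.
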

movies[:5]

# Number of ratings each title received
ratings_by_title = data.groupby('title').size()
ratings_by_title[:10]
# Keep only titles with at least 250 ratings
active_titles = ratings_by_title.index[ratings_by_title >= 250]
active_titles

# Restrict the mean ratings to the frequently rated titles
mean_ratings = mean_ratings.loc[active_titles]
mean_ratings
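
The filtering pattern used here (count per group, build an index of groups that pass a threshold, then select with `.loc`) can be sketched on toy data. The ratings below are hypothetical, and the threshold is 2 rather than 250 so the small sample has something to filter.

```python
import pandas as pd

# Hypothetical ratings: Film A has 3, Film B has 1, Film C has 2
toy = pd.DataFrame({'title': ['Film A'] * 3 + ['Film B'] + ['Film C'] * 2,
                    'rating': [5, 4, 3, 1, 2, 4]})

counts = toy.groupby('title').size()
# Index of titles with at least 2 ratings
popular = counts.index[counts >= 2]

means = toy.groupby('title')['rating'].mean()
# Keep only the rows whose title passed the threshold
print(means.loc[popular])
```
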

# Films rated highest by female viewers
top_female_movies = mean_ratings.sort_values('F', ascending=False)
top_female_movies[:10]



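# Positive diff: men rated the film higher; negative diff: women did
mean_ratings['diff'] = mean_ratings['M'] - mean_ratings['F']
sorted_by_diff = mean_ratings.sort_values('diff', ascending=False)
# Last 10 rows: films women rated higher than men by the widest margin
sorted_by_diff[-10:]
# The same 10 films, reversed so the strongest female preference comes first
sorted_by_diff[::-1][:10]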
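
Because the sort is descending on `M - F`, the top of the frame holds films men preferred and the bottom holds films women preferred. A small sketch with hypothetical mean ratings shows how to read both ends:

```python
import pandas as pd

# Hypothetical per-gender mean ratings
toy_means = pd.DataFrame({'F': [4.5, 3.0, 3.8], 'M': [3.5, 4.2, 3.9]},
                         index=['Film A', 'Film B', 'Film C'])

toy_means['diff'] = toy_means['M'] - toy_means['F']
by_diff = toy_means.sort_values('diff', ascending=False)

most_male_leaning = by_diff.index[0]     # largest positive diff
most_female_leaning = by_diff.index[-1]  # most negative diff
print(most_male_leaning, most_female_leaning)
```
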

data['rating']

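# Standard deviation of ratings per title measures viewer disagreement
rating_std_by_title = data.groupby('title')['rating'].std()
rating_std_by_title = rating_std_by_title.loc[active_titles]
# The 10 most divisive films among the frequently rated titles
rating_std_by_title.sort_values(ascending=False)[:10]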
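
A high standard deviation means viewers disagreed about a film, regardless of its average. The sketch below uses hypothetical ratings for two films with the same mean of 3. Note that pandas `std()` computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Hypothetical ratings: both films average 3, but Film A splits opinion
toy = pd.DataFrame({'title': ['Film A'] * 4 + ['Film B'] * 4,
                    'rating': [1, 5, 1, 5, 3, 3, 3, 3]})

spread = toy.groupby('title')['rating'].std()
# Film with the most disagreement among raters
most_divisive = spread.idxmax()
print(spread)
print(most_divisive)
```
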

Data source: follow the account and reply "电影1" in its message box to receive the files.


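Copyright notice: This is an original article by dudu3332, released under the CC 4.0 BY-SA license. Reprints must include a link to the original source and this notice.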