python mysql就业情况_影响员工离职因素分析—Mysql/python

本文采用python和mysql两种方式对员工离职数据进行分析,试图找出员工离职的影响因素,降低离职率,减少企业的人员招聘成本及员工培养成本。

提取码:qxvl

数据大小:15000条数据

指标说明:

第一部分 Mysql

一、观察数据

1.1 查看数据完整性

-- 查看缺失值

SELECT COUNT(*),

COUNT(satisfaction_level),

COUNT(last_evaluation),

COUNT(number_project),

COUNT(average_montly_hours),

COUNT(time_spend_company),

COUNT(Work_accident),

COUNT(`left`),

COUNT(promotion_last_5years),

COUNT(promotion_last_5years),

COUNT(sales),

COUNT(salary)

FROM hr_comma_sep;

或者

select sum((case when satisfaction_level is null then 1 else 0 end ))

from hr_comma_sep;

由结果可知,无缺失值,共14999员工。

1.2 各部门离职人数

首先通过mysql汇总各部门人员的离职情况,其次,结合数据透视表,给出各部门的离职率。

#汇总各部门是否离职的人数select sales as 部门,`left` as 是否离职, count(`left`) as 部门人数

from hr_comma_sep

group by sales ,`left`

order by sales;

从上述结果可以得知,14999人中,离职率为23.81%。其中,hr部门、accounting和technical三个部门的离职率最高,分别为29.09%,26.6%和25.63%,而management部门的离职率最低,仅有14.4%。

1.3 员工满意度与离职率之间的关系

SELECT `left`,

sum(case WHEN satisfaction_level >0 AND satisfaction_level <= 0.1 THEN 1 ELSE 0 END) as '0~0.1',

sum(case WHEN satisfaction_level >0.1 AND satisfaction_level <= 0.2 THEN 1 ELSE 0 END) as '0.1~0.2',

sum(case WHEN satisfaction_level >0.2 AND satisfaction_level <= 0.3 THEN 1 ELSE 0 END) as '0.2~0.3' ,

sum(case WHEN satisfaction_level >0.3 AND satisfaction_level <= 0.4 THEN 1 ELSE 0 END) as '0.3~0.4',

sum(case WHEN satisfaction_level >0.4 AND satisfaction_level <= 0.5 THEN 1 ELSE 0 END) as '0.4~0.5',

sum(case WHEN satisfaction_level >0.5 AND satisfaction_level <= 0.6 THEN 1 ELSE 0 END) as '0.5~0.6',

sum(case WHEN satisfaction_level >0.6 AND satisfaction_level <= 0.7 THEN 1 ELSE 0 END) as '0.6~0.7',

sum(case WHEN satisfaction_level >0.7 AND satisfaction_level <= 0.8 THEN 1 ELSE 0 END) as '0.7~0.8',

sum(case WHEN satisfaction_level >0.8 THEN 1 ELSE 0 END) as '0.8以上'

from hr_comma_sep

group by `left`;员工满意度与离职情况

由此可见,随着满意度的增高,离职率呈现先减小后增加的趋势,在满意度为0.6-0.7时,员工的离职率最低,约为1.9%。

1.4 员工参与项目数与离职率之间的关系

select `left`,

sum(case when number_project='1' then 1 else 0 end) as 1个项目,

sum(case when number_project='2' then 1 else 0 end) as 2个项目,

sum(case when number_project='3' then 1 else 0 end) as 3个项目,

sum(case when number_project='4' then 1 else 0 end) as 4个项目,

sum(case when number_project='5' then 1 else 0 end) as 5个项目,

sum(case when number_project='6' then 1 else 0 end) as 6个项目,

sum(case when number_project='7' then 1 else 0 end) as 7个项目

from hr_comma_sep

group by `left`;

由此可知,14999人中,所有人都参与超过一个项目;

参与3-4个项目的人员最多,分别任4055人和4365人,合计超过总人数的1/2,但离职率低,参与3个项目的离职率为1.8%,参与4个项目的离职率为9.4%;

参与2各项目、6个项目和7个项目的人员离职率均超50%。

1.5 员工薪资水平与离职率关系

select `left`,

sum(case when salary='low' then 1 else 0 end) as `low`,

sum(case when salary='medium' then 1 else 0 end) as `medium`,

sum(case when salary='high' then 1 else 0 end) as `high`

from hr_comma_sep

group by `left`;

随着薪资水平的增加,离职率逐渐降低。

1.6 员工工作年限与离职率的关系

SELECT `left`,

sum(case WHEN time_spend_company =1 THEN 1 ELSE 0 END) as '1年',

sum(case WHEN time_spend_company =2 THEN 1 ELSE 0 END) as '2年',

sum(case WHEN time_spend_company =3 THEN 1 ELSE 0 END) as '3年' ,

sum(case WHEN time_spend_company =4 THEN 1 ELSE 0 END) as '4年',

sum(case WHEN time_spend_company =5 THEN 1 ELSE 0 END) as '5年',

sum(case WHEN time_spend_company =6 THEN 1 ELSE 0 END) as '6年',

sum(case WHEN time_spend_company =7 THEN 1 ELSE 0 END) as '7年',

sum(case WHEN time_spend_company =8 THEN 1 ELSE 0 END) as '8年',

sum(case WHEN time_spend_company =9 THEN 1 ELSE 0 END) as '9年',

sum(case WHEN time_spend_company =10 THEN 1 ELSE 0 END) as '10年'

from hr_comma_sep

group by `left`;随着工作年限的增加,离职率先增后减,工作5年的离职率最高;

超过7年的员工,几乎不离职,员工的忠诚性很高;

工作一年之内,员工几乎不离职。

1.7 绩效评估与离职率的关系

sum(case WHEN last_evaluation >0 AND last_evaluation <= 0.1 THEN 1 ELSE 0 END) as '0~0.1',

sum(case WHEN last_evaluation >0.1 AND last_evaluation <= 0.2 THEN 1 ELSE 0 END) as '0.1~0.2',

sum(case WHEN last_evaluation >0.2 AND last_evaluation <= 0.3 THEN 1 ELSE 0 END) as '0.2~0.3' ,

sum(case WHEN last_evaluation >0.3 AND last_evaluation <= 0.4 THEN 1 ELSE 0 END) as '0.3~0.4',

sum(case WHEN last_evaluation >0.4 AND last_evaluation <= 0.5 THEN 1 ELSE 0 END) as '0.4~0.5',

sum(case WHEN last_evaluation >0.5 AND last_evaluation <= 0.6 THEN 1 ELSE 0 END) as '0.5~0.6',

sum(case WHEN last_evaluation >0.6 AND last_evaluation <= 0.7 THEN 1 ELSE 0 END) as '0.6~0.7',

sum(case WHEN last_evaluation >0.7 AND last_evaluation <= 0.8 THEN 1 ELSE 0 END) as '0.7~0.8',

sum(case WHEN last_evaluation >0.8 THEN 1 ELSE 0 END) as '0.8以上'

from hr_comma_sep

group by `left`;所有人的绩效评估均在0.3以上;

当绩效评估大于0.4时,离职率先降低后增加,在0.4-0.5时员工的离职率最高 ,在0.6--0.7时离职率最低;

1.8 工作事故与离职率的关系

SELECT `left`,

sum(case WHEN Work_accident = '1' THEN 1 ELSE 0 END) as '有',

sum(case WHEN Work_accident = '0' THEN 1 ELSE 0 END) as '无'

from hr_comma_sep

group by `left`;

有发生过事故的员工的离职率低于未发生过事故的员工的离职率,说明事故不是诱导员工离职的原因。

1.9 员工升职与离职率的关系

SELECT `left`,

sum(case WHEN promotion_last_5years = '1' THEN 1 ELSE 0 END) as '有',

sum(case WHEN promotion_last_5years = '0' THEN 1 ELSE 0 END) as '无'

from hr_comma_sep

group by `left`;

过去5年,有升职的员工的离职率为6%,无升职人员的离职率为24.2%。因此,升职有助于加强员工归属感,减少离职率。

第二部分 python

1.导入数据

# 导入数据分析包

import numpy as np

import pandas as pd

# 导入数据

employees = pd.read_csv('F:\\数据分析(每日练习)\\员工离职数据分析(10.25)\\HR_comma_sep.csv')

#查看数据

employees.info()employees数据集概况

该数据集共包含14999条数据,一共10列,无缺失数据。

# 查看数据

employees.head()

2.数据分析—可视化呈现2.1 离职率整体占比情况

import matplotlib.pyplot as plt

import seaborn as sns

#解决中文乱码

import matplotlib as mpl

mpl.rcParams['font.sans-serif']=['SimHei']

mpl.rcParams['axes.unicode_minus']=False

%matplotlib inline

接下来,准备数据,画图离职率。

# 观察离职数据

employees['left'].value_counts()

14999条数据中,离职人数3571人,未离职人员11428人。

# 离职数据

employees_left = employees['left'].value_counts()

employees_left_lables = employees['left'].value_counts().index

# 饼图

plt.pie(employees_left,

labels = employees_left_lables,

autopct = '%3.1f%%',

colors = ['g','y']

)

离职人员占23.8%,不离职人员占76.2%。2.2 满意度与离职情况

# 查看满意度数据

employees['satisfaction_level'].head()

满意度是0-1之间的小数。

'''

将满意度数据进行转换,

0.0-0.4:low

0.4-0.7:medium

0.7-1.0:high

'''

# 第一步:定义函数

def satisfaction_new_levels(b):

a=[]

for values in b:

if values <0.4:

x='low'

elif values<0.7:

x= 'medium'

else:

x= 'high'

a.append(x)

satisfaction = pd.Series(a)

return satisfaction

# 第二步:应用函数

employees['satisfaction_new_level'] = satisfaction_new_levels(employees['satisfaction_level'])

# 查看函数

employees['satisfaction_new_level'].value_counts()高、中、低三种满意度对应的人数

# 画图观察满意度与离职情况之间的关系

fig = plt.figure(figsize = (15,5))

sns.countplot('satisfaction_new_level',

data = employees,

order = ['low','medium','high'],

hue = 'left'

)随着满意度的提升,员工的离职率在降低。2.3 绩效评估与离职情况

# 查看绩效评估数据

employees['last_evaluation'].head()

employees['last_evaluation'] = satisfaction_new_levels(employees['last_evaluation'])

employees['last_evaluation'].head()

# 画图观察绩效评估与离职情况之间的关系

fig = plt.figure(figsize = (6,3))

sns.countplot('last_evaluation',

data = employees,

order = ['low','medium','high'],

hue = 'left'

)

plt.title('绩效评估与员工离职关系')


版权声明:本文为weixin_34644635原创文章,遵循CC 4.0 BY-SA版权协议,转载请附上原文出处链接和本声明。