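Summary: This article explores a dataset on college graduates' income, mainly through bar charts. The dataset includes fields such as the number of graduates employed full-time year-round (integer) and income percentiles. Among the charts is the employment rate for each major category; one conclusion is that Computers & Mathematics has a relatively high employment rate, which the author attributes to the prospects of computing.
Dataset: college graduates' income (download link). This article focuses mainly on drawing bar charts.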
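Field name | Type | Description |
---|---|---|
Major_code | Integer | Major code. |
Major | String | Major name. |
Major_category | String | Category the major belongs to. |
Total | Integer | Total number of people. |
Employed | Integer | Number employed. |
Employed_full_time_year_round | Integer | Number employed full-time, year-round. |
Unemployed | Integer | Number unemployed. |
Unemployment_rate | Float | Unemployment rate. |
Median | Integer | Median income. |
P25th | Integer | 25th percentile of income. |
P75th | Float | 75th percentile of income. |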
    # Import packages
    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    import os
    import warnings

    warnings.filterwarnings("ignore")

    # Read the data
    df = pd.read_csv("大学毕业生收入数据集.csv")

    # Preview the data
    print(df.head())
Output:

       Major_code                                  Major  ...  P25th    P75th
    0        1100                    GENERAL AGRICULTURE  ...  34000  80000.0
    1        1101  AGRICULTURE PRODUCTION AND MANAGEMENT  ...  36000  80000.0
    2        1102                 AGRICULTURAL ECONOMICS  ...  40000  98000.0
    3        1103                        ANIMAL SCIENCES  ...  30000  72000.0
    4        1104                           FOOD SCIENCE  ...  38500  90000.0
    # Inspect basic information
    df.info()

Output:

    RangeIndex: 173 entries, 0 to 172
    Data columns (total 11 columns):
     #   Column                         Non-Null Count  Dtype
    ---  ------                         --------------  -----
     0   Major_code                     173 non-null    int64
     1   Major                          173 non-null    object
     2   Major_category                 173 non-null    object
     3   Total                          173 non-null    int64
     4   Employed                       173 non-null    int64
     5   Employed_full_time_year_round  173 non-null    int64
     6   Unemployed                     173 non-null    int64
     7   Unemployment_rate              173 non-null    float64
     8   Median                         173 non-null    int64
     9   P25th                          173 non-null    int64
     10  P75th                          173 non-null    float64
    dtypes: float64(2), int64(7), object(2)
    # Check for duplicate rows
    print(df.duplicated().sum())

Output:

    0
    # Check for missing values
    print(df.isnull().sum())

Output:

    Major_code                       0
    Major                            0
    Major_category                   0
    Total                            0
    Employed                         0
    Employed_full_time_year_round    0
    Unemployed                       0
    Unemployment_rate                0
    Median                           0
    P25th                            0
    P75th                            0
    dtype: int64
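Beyond duplicates and missing values, it can be worth checking that the count columns agree with one another. The following is only a minimal sketch on made-up rows, not the real dataset. It assumes (the article does not state this) that Unemployment_rate is defined as Unemployed / (Employed + Unemployed); the helper name find_inconsistent_rows is hypothetical.

```python
import pandas as pd


def find_inconsistent_rows(frame: pd.DataFrame, tol: float = 1e-6) -> pd.DataFrame:
    """Return rows whose counts or stated rate look internally inconsistent."""
    labor_force = frame["Employed"] + frame["Unemployed"]
    # Assumed definition of the rate; adjust if the dataset documents another one.
    implied_rate = frame["Unemployed"] / labor_force
    too_many = labor_force > frame["Total"]
    rate_off = (implied_rate - frame["Unemployment_rate"]).abs() > tol
    return frame[too_many | rate_off]


# Hypothetical rows; only the column names match the article's dataset.
toy = pd.DataFrame({
    "Major": ["A", "B", "C"],
    "Total": [1000, 500, 200],
    "Employed": [760, 450, 150],
    "Unemployed": [40, 50, 10],
    "Unemployment_rate": [0.05, 0.10, 0.20],  # the last value is deliberately wrong
})

bad = find_inconsistent_rows(toy)
print(bad["Major"].tolist())
```

An empty result on the real data would suggest the columns are mutually consistent under that assumed definition.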
    # Descriptive statistics of the dataset
    describe = df.describe()
    print(describe)

Output:

            Major_code         Total  ...         P25th          P75th
    count   173.000000  1.730000e+02  ...    173.000000     173.000000
    mean   3879.815029  2.302566e+05  ...  38697.109827   82506.358382
    std    1687.753140  4.220685e+05  ...   9414.524761   20805.330126
    min    1100.000000  2.396000e+03  ...  24900.000000   45800.000000
    25%    2403.000000  2.428000e+04  ...  32000.000000   70000.000000
    50%    3608.000000  7.579100e+04  ...  36000.000000   80000.000000
    75%    5503.000000  2.057630e+05  ...  42000.000000   95000.000000
    max    6403.000000  3.123510e+06  ...  78000.000000  210000.000000

    [8 rows x 9 columns]

The full table can be viewed in the IDE's variable explorer (variable name: describe).
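Because the printed summary hides the middle columns behind "...", another option is to widen pandas' display settings temporarily instead of opening a variable viewer. This is a minimal sketch, assuming a small made-up frame in place of the real CSV (all values are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for the real dataset; only the column names match.
toy = pd.DataFrame({
    "Major_code": [1100, 1101, 1102],
    "Total": [1000, 2000, 3000],
    "Employed": [800, 1500, 2400],
    "Unemployed": [40, 60, 120],
    "Median": [50000, 54000, 63000],
    "P25th": [34000, 36000, 40000],
    "P75th": [80000.0, 80000.0, 98000.0],
})

summary = toy.describe()

# option_context restores the previous display settings when the block exits.
with pd.option_context("display.max_columns", None, "display.width", 200):
    shown = repr(summary)

print(shown)
```

Printing summary.T (the transposed table) is another way to keep every statistic visible when there are many columns.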
    # Count the number of majors in each category (Major_category)
    Major_category_counts = df["Major_category"].value_counts()
    print(Major_category_counts)

    # Take the labels from the counts themselves so bars and tick labels stay aligned
    interval = list(Major_category_counts.index)
    positions = range(1, len(interval) + 1)

    rects = plt.bar(positions, Major_category_counts)
    for rect in rects:  # rects holds one bar per category
        height = rect.get_height()
        plt.text(rect.get_x() + rect.get_width() / 2, height, str(height),
                 size=12, ha="center", va="bottom")
    plt.xticks(positions, interval, rotation=90)
    plt.title("Number of Branches by Major Category")
    plt.ylabel("Counts")
    plt.show()
Output:

    Engineering                            29
    Education                              16
    Humanities & Liberal Arts              15
    Biology & Life Science                 14
    Business                               13
    Health                                 12
    Computers & Mathematics                11
    Agriculture & Natural Resources        10
    Physical Sciences                      10
    Social Science                          9
    Psychology & Social Work                9
    Arts                                    8
    Industrial Arts & Consumer Services     7
    Law & Public Policy                     5
    Communications & Journalism             4
    Interdisciplinary                       1
    Name: Major_category, dtype: int64
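Figure: bar chart titled "Number of Branches by Major Category".

Conclusion: Engineering has more majors (29) than any other category. The author attributes this to the field's long history.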
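    # Average the median income over the majors in each category and plot it
    averageMoney = []
    for i in range(len(interval)):
        total = 0  # renamed from sum to avoid shadowing the built-in
        for j in range(len(df)):
            if df["Major_category"][j] == interval[i]:
                total = total + df["Median"][j]
        # Look the count up by label so it cannot drift out of order
        averageMoney.append(total / Major_category_counts[interval[i]])
    plt.bar(range(1, 17), averageMoney)
    plt.xticks(range(1, 17), interval, rotation=90)
    plt.title("Average Annual salary by Major Category")
    plt.ylabel("Income")
    plt.show()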
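Figure: bar chart titled "Average Annual salary by Major Category".

Conclusion: Engineering has a relatively high average income, which the author attributes to its ties with fields such as artificial intelligence and automation. Computers & Mathematics has good prospects, but in the author's view pay at small companies is generally modest, while large companies pay comparatively more.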
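The nested loop above can also be written as a groupby, which avoids hard-coding the row count and keeps each value attached to its category label. This is a sketch on hypothetical numbers, not the article's results:

```python
import pandas as pd

# Made-up rows; the column names follow the article's dataset.
toy = pd.DataFrame({
    "Major_category": ["Engineering", "Engineering", "Arts", "Arts", "Health"],
    "Median": [70000, 80000, 40000, 44000, 55000],
})

# One mean per category, indexed by the category name.
average_median = toy.groupby("Major_category")["Median"].mean()
print(average_median)
```

The resulting Series can be passed to plt.bar as average_median.index and average_median.values, so the tick labels come from the data rather than from a separately maintained list.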
    # Average the unemployment rate over the majors in each category and plot it
    averageUnemployRate = []
    for i in range(len(interval)):
        total = 0
        for j in range(len(df)):
            if df["Major_category"][j] == interval[i]:
                total = total + df["Unemployment_rate"][j]
        averageUnemployRate.append(total / Major_category_counts[interval[i]])
    plt.bar(range(1, 17), averageUnemployRate)
    plt.xticks(range(1, 17), interval, rotation=90)
    plt.title("Average Unemployment Rate by Major Category")
    plt.ylabel("Rate")
    plt.show()
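Figure: bar chart titled "Average Unemployment Rate by Major Category".

Conclusion: Arts has a relatively high unemployment rate. The author attributes this to the field's volatility and its comparatively demanding requirements for talent.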
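To back a statement like this with numbers rather than by reading the chart, the per-category averages can be sorted before plotting. A minimal sketch on made-up rates (the values are hypothetical and do not reproduce the article's figure):

```python
import pandas as pd

toy = pd.DataFrame({
    "Major_category": ["Arts", "Arts", "Education", "Health", "Health"],
    "Unemployment_rate": [0.09, 0.07, 0.04, 0.03, 0.04],
})

# Mean rate per category, highest first.
ranked = (
    toy.groupby("Major_category")["Unemployment_rate"]
    .mean()
    .sort_values(ascending=False)
)
highest = ranked.index[0]
print(ranked)
print("Highest average unemployment rate:", highest)
```

Plotting the sorted Series also makes the ordering visible in the chart itself.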
    # Average the employment rate (Employed / Total) over the majors in each category and plot it
    averageEmployRate = []
    for i in range(len(interval)):
        total = 0
        for j in range(len(df)):
            if df["Major_category"][j] == interval[i]:
                total = total + df["Employed"][j] / df["Total"][j]
        averageEmployRate.append(total / Major_category_counts[interval[i]])
    plt.bar(range(1, 17), averageEmployRate)
    plt.xticks(range(1, 17), interval, rotation=90)
    plt.title("Average Employment Rate by Major Category")
    plt.ylabel("Rate")
    plt.show()
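Figure: bar chart titled "Average Employment Rate by Major Category".

Conclusion: Computers & Mathematics has a relatively high employment rate, which the author attributes to the prospects of computing.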
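Note that the loop averages each major's own Employed / Total ratio, so a small major counts as much as a large one. Dividing the category totals instead gives a size-weighted rate, and the two can differ noticeably. A sketch with hypothetical counts:

```python
import pandas as pd

toy = pd.DataFrame({
    "Major_category": ["X", "X", "Y"],
    "Total": [1000, 100, 500],
    "Employed": [900, 50, 400],
})

# Unweighted: mean of the per-major rates (what the loop above computes).
toy["rate"] = toy["Employed"] / toy["Total"]
mean_of_rates = toy.groupby("Major_category")["rate"].mean()

# Weighted: total employed divided by total people in the category.
sums = toy.groupby("Major_category")[["Employed", "Total"]].sum()
pooled_rate = sums["Employed"] / sums["Total"]

print(pd.DataFrame({"mean_of_rates": mean_of_rates, "pooled_rate": pooled_rate}))
```

Which version is appropriate depends on the question: the unweighted mean describes a typical major, while the pooled rate describes a typical graduate in the category.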
    # Average the full-time, year-round rate (share of employed people) per category and plot it
    averageFullTimeRate = []
    for i in range(len(interval)):
        total = 0
        for j in range(len(df)):
            if df["Major_category"][j] == interval[i]:
                total = total + df["Employed_full_time_year_round"][j] / df["Employed"][j]
        averageFullTimeRate.append(total / Major_category_counts[interval[i]])
    plt.bar(range(1, 17), averageFullTimeRate)
    plt.xticks(range(1, 17), interval, rotation=90)
    plt.title("Average Full-Time Rate by Major Category")
    plt.ylabel("Rate")
    plt.show()
Figure: bar chart titled "Average Full-Time Rate by Major Category".
    # Average number of people per major in each category
    averageNum = []
    for i in range(len(interval)):
        total = 0
        for j in range(len(df)):
            if df["Major_category"][j] == interval[i]:
                total = total + df["Total"][j]
        averageNum.append(total / Major_category_counts[interval[i]])
    plt.bar(range(1, 17), averageNum)
    plt.xticks(range(1, 17), interval, rotation=90)
    plt.title("Average Total Numbers by Major Category")
    plt.ylabel("Counts")
    plt.show()
Figure: bar chart titled "Average Total Numbers by Major Category".
    # Ratio of the average employment rate to the average unemployment rate
    EUratio = []
    for i in range(len(interval)):
        EUratio.append(averageEmployRate[i] / averageUnemployRate[i])
    plt.bar(range(1, 17), EUratio)
    plt.xticks(range(1, 17), interval, rotation=90)
    plt.title("Employment-Unemployment Ratio by Major Category")
    plt.ylabel("Ratio")
    plt.show()
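Figure: bar chart titled "Employment-Unemployment Ratio by Major Category".

Conclusion: Agriculture & Natural Resources combines a high employment rate with a low unemployment rate, which the author attributes to the relatively low barrier to entry for agricultural work.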
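The ratio above divides an average of Employed / Total by an average Unemployment_rate, and the two may not share the same base (Unemployment_rate is presumably relative to the labor force rather than to Total; that is an assumption, not something the article states). One alternative, shown only as a sketch on made-up counts, is to divide employed by unemployed head counts per category and guard against a zero denominator:

```python
import numpy as np
import pandas as pd

# Hypothetical counts; "Tiny" has no unemployed people at all.
toy = pd.DataFrame({
    "Major_category": ["Agriculture", "Agriculture", "Arts", "Tiny"],
    "Employed": [900, 600, 700, 10],
    "Unemployed": [30, 20, 70, 0],
})

sums = toy.groupby("Major_category")[["Employed", "Unemployed"]].sum()

# Replace 0 with NaN so the division yields NaN instead of inf.
eu_ratio = sums["Employed"] / sums["Unemployed"].replace(0, np.nan)
print(eu_ratio)
```

Categories with very few majors, such as Interdisciplinary with a single entry, are the ones where this kind of ratio is least stable.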
When reposting, please cite this article's address: https://www.ucloud.cn/yun/121287.html