Study note 648 @ Python: using the pandas resample function to convert bar periods

Requirement

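With financial data we often need to change the bar period, for example turning minute data into daily data.
The example below takes 5-minute stock data and converts it into 15-minute data.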
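
The minute-to-daily case mentioned above is not shown in the code that follows, so here is a minimal sketch of it. The numbers are made up for illustration, the names `bars`, `rules` and `daily` are hypothetical, and passing a dict of rules to `agg` is just one way to write it, not necessarily how the author would.

```python
import pandas as pd

# Hypothetical 5-minute bars spanning two days (made-up numbers, not real quotes).
bars = pd.DataFrame(
    {
        "open":   [10.0, 10.2, 10.1, 11.0, 11.3],
        "high":   [10.3, 10.4, 10.2, 11.4, 11.5],
        "low":    [9.9, 10.0, 10.0, 10.9, 11.1],
        "close":  [10.2, 10.1, 10.15, 11.3, 11.2],
        "volume": [100, 150, 120, 200, 180],
    },
    index=pd.to_datetime(
        [
            "2022-09-13 14:50",
            "2022-09-13 14:55",
            "2022-09-13 15:00",
            "2022-09-14 09:35",
            "2022-09-14 09:40",
        ]
    ),
)

# One aggregation rule per column: first open, highest high,
# lowest low, last close, total volume.
rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# 'D' groups all bars that fall on the same calendar day.
daily = bars.resample("D").agg(rules)
print(daily)
```

Each calendar day collapses into one row, so the two days in the sample produce two daily bars.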

Code

import numpy as np
import pandas as pd
import tushare as ts
# Replace with your own tushare token
ts.set_token('your token')
# Fetch 5-minute bars (ktype='5') for code 399300
df = ts.get_k_data('399300', ktype='5')
df.head(20)
This interface will soon stop being updated; please switch to the Pro interface as soon as possible: https://tushare.pro/document/2
                date     open    close     high      low      volume  amount  turnoverratio    code
0   2022-09-13 13:45  4117.55  4114.56  4117.94  4113.18   1475506.0      {}         0.5166  399300
1   2022-09-13 13:50  4114.76  4115.08  4115.26  4113.65   1327791.0      {}         0.4649  399300
2   2022-09-13 13:55  4114.90  4112.24  4115.17  4112.03   1426425.0      {}         0.4994  399300
3   2022-09-13 14:00  4112.21  4114.06  4114.51  4111.48   1484672.0      {}         0.5198  399300
4   2022-09-13 14:05  4113.89  4115.38  4117.78  4112.52   1606878.0      {}         0.5626  399300
5   2022-09-13 14:10  4115.21  4112.84  4115.21  4112.38   1248984.0      {}         0.4373  399300
6   2022-09-13 14:15  4112.98  4115.00  4115.02  4112.44   1366021.0      {}         0.4783  399300
7   2022-09-13 14:20  4115.05  4112.92  4115.05  4112.67   1247732.0      {}         0.4369  399300
8   2022-09-13 14:25  4113.08  4113.66  4113.89  4112.82   1240570.0      {}         0.4343  399300
9   2022-09-13 14:30  4113.43  4111.72  4113.68  4111.43   1648447.0      {}         0.5771  399300
10  2022-09-13 14:35  4111.72  4110.99  4112.88  4110.07   1965279.0      {}         0.6881  399300
11  2022-09-13 14:40  4110.74  4111.07  4111.12  4109.19   1757416.0      {}         0.6153  399300
12  2022-09-13 14:45  4111.27  4110.55  4112.61  4110.47   1792504.0      {}         0.6276  399300
13  2022-09-13 14:50  4110.43  4110.67  4111.57  4110.30   2166563.0      {}         0.7585  399300
14  2022-09-13 14:55  4110.65  4109.18  4110.65  4108.46   2648063.0      {}         0.9271  399300
15  2022-09-13 15:00  4109.26  4111.11  4111.11  4108.99   2289574.0      {}         0.8016  399300
16  2022-09-14 09:35  4058.04  4074.28  4076.78  4055.13  11682915.0      {}         4.0904  399300
17  2022-09-14 09:40  4073.94  4078.50  4079.08  4069.18   5309691.0      {}         1.8590  399300
18  2022-09-14 09:45  4078.18  4071.31  4079.59  4070.98   3616729.0      {}         1.2663  399300
19  2022-09-14 09:50  4070.47  4071.95  4074.95  4066.75   3170348.0      {}         1.1100  399300
# Convert the date strings to datetime
df['date'] = pd.to_datetime(df['date'])

# Use date as the index
df = df.set_index('date')

df_new = pd.DataFrame()

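# Convert 5-minute data to 15-minute data, i.e. merge every 3 data points into 1
# ('15min' is used instead of the '15T' alias, which newer pandas versions deprecate)
# Of the 3 points, the first open becomes the open
df_new['open'] = df['open'].resample('15min').first()

# Of the 3 points, the largest high becomes the high
df_new['high'] = df['high'].resample('15min').max()

# Of the 3 points, the smallest low becomes the low
df_new['low'] = df['low'].resample('15min').min()

# Of the 3 points, the last close becomes the close
df_new['close'] = df['close'].resample('15min').last()

# Volume is the sum over the 3 points
df_new['volume'] = df['volume'].resample('15min').sum()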

df_new.head(20)
                        open     high      low    close     volume
date
2022-09-13 13:45:00  4117.55  4117.94  4112.03  4112.24  4229722.0
2022-09-13 14:00:00  4112.21  4117.78  4111.48  4112.84  4340534.0
2022-09-13 14:15:00  4112.98  4115.05  4112.44  4113.66  3854323.0
2022-09-13 14:30:00  4113.43  4113.68  4109.19  4111.07  5371142.0
2022-09-13 14:45:00  4111.27  4112.61  4108.46  4109.18  6607130.0
2022-09-13 15:00:00  4109.26  4111.11  4108.99  4111.11  2289574.0
2022-09-13 15:15:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 15:30:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 15:45:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 16:00:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 16:15:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 16:30:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 16:45:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 17:00:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 17:15:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 17:30:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 17:45:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 18:00:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 18:15:00      NaN      NaN      NaN      NaN        0.0
2022-09-13 18:30:00      NaN      NaN      NaN      NaN        0.0
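
The output above has NaN in the price columns but 0.0 in volume for the 15-minute bins that contain no 5-minute bars. The sketch below reproduces that on a tiny made-up sample (the `bars`, `r` and `out` names and the numbers are hypothetical): `resample` builds a regular grid across the overnight gap, `last()` returns NaN for an empty bin, and `sum()` returns 0 unless `min_count=1` is passed.

```python
import pandas as pd

# Hypothetical 5-minute bars on either side of an overnight gap (made-up numbers).
bars = pd.DataFrame(
    {"close": [10.0, 10.1, 11.0], "volume": [100.0, 150.0, 200.0]},
    index=pd.to_datetime(
        ["2022-09-13 14:55", "2022-09-13 15:00", "2022-09-14 09:35"]
    ),
)

r = bars.resample("15min")
out = pd.DataFrame(
    {
        "close": r["close"].last(),                   # NaN for empty bins
        "volume": r["volume"].sum(),                  # 0.0 for empty bins
        "volume_nan": r["volume"].sum(min_count=1),   # NaN for empty bins
    }
)

# Number of bins with data versus bins created only to fill the gap.
print(out["close"].notna().sum(), out["close"].isna().sum())

# dropna() removes the gap rows because their price column is NaN.
print(out.dropna())
```

So dropping NaN rows works here only because the price columns are NaN in empty bins; a frame holding nothing but summed volume would keep those rows as zeros.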
# Bins that contain no 5-minute bars hold NaN values, so drop those rows
df_new = df_new.dropna()
df_new.head()
                        open     high      low    close     volume
date
2022-09-13 13:45:00  4117.55  4117.94  4112.03  4112.24  4229722.0
2022-09-13 14:00:00  4112.21  4117.78  4111.48  4112.84  4340534.0
2022-09-13 14:15:00  4112.98  4115.05  4112.44  4113.66  3854323.0
2022-09-13 14:30:00  4113.43  4113.68  4109.19  4111.07  5371142.0
2022-09-13 14:45:00  4111.27  4112.61  4108.46  4109.18  6607130.0
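
One caveat, stated as an assumption rather than something the source confirms: if each 5-minute timestamp marks the end of its bar (the sample's last bar of the day is stamped 15:00, which hints at this), then the default left-closed bins group 13:45, 13:50 and 13:55 together and leave the 15:00 bar in a bin of its own. A sketch of the alternative with `closed='right'` and `label='right'`, using rows 1 to 6 of the 5-minute sample printed earlier (the `df5`, `rules` and `df15` names are hypothetical):

```python
import pandas as pd

# Rows 1-6 of the 5-minute sample shown earlier (13:50 through 14:15).
df5 = pd.DataFrame(
    {
        "open":   [4114.76, 4114.90, 4112.21, 4113.89, 4115.21, 4112.98],
        "close":  [4115.08, 4112.24, 4114.06, 4115.38, 4112.84, 4115.00],
        "high":   [4115.26, 4115.17, 4114.51, 4117.78, 4115.21, 4115.02],
        "low":    [4113.65, 4112.03, 4111.48, 4112.52, 4112.38, 4112.44],
        "volume": [1327791.0, 1426425.0, 1484672.0, 1606878.0, 1248984.0, 1366021.0],
    },
    index=pd.date_range("2022-09-13 13:50", periods=6, freq="5min"),
)

rules = {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}

# closed='right': each bin is (start, end], so the 14:00 bar holds 13:50, 13:55, 14:00.
# label='right': each 15-minute bar is stamped with its end time, like the 5-minute bars.
df15 = df5.resample("15min", closed="right", label="right").agg(rules)
print(df15)
```

Under this assumption the six rows form two complete bars stamped 14:00 and 14:15. Whether this or the default grouping is right depends on how the data provider stamps its bars, which is worth checking before relying on either.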

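Copyright notice: this is an original article by weixin_44663675, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.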