How do I read nrows from a pandas HDF store?

What am I trying to do?

pd.read_csv(..., nrows=###) reads the first nrows rows of a file. I would like to do the same when using pd.read_hdf(...).
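For reference, the read_csv behaviour described above can be sketched as follows. The in-memory CSV text and the variable names are made up for illustration; they stand in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "a,b\n1,2\n3,4\n5,6\n7,8\n"

# nrows limits how many data rows are parsed; the header line is not counted.
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(first_two.shape)
```

Only the first two data rows are parsed here; the rest of the input is never turned into a DataFrame.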

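What is the problem?

I am confused by the documentation. start and stop look like what I need, but when I try them a ValueError is raised. The second thing I tried was nrows=10, thinking it might be accepted through **kwargs. When I do that, no error is thrown, but the full dataset is returned instead of just 10 rows.

Question: how do I correctly read a smaller subset of rows from an HDF file? (Edit: without having to read the whole thing into memory first!)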

Here is my interactive session:

>>> import pandas as pd
>>> df = pd.read_hdf('storage.h5')
Traceback (most recent call last):
  File "", line 1, in
    df = pd.read_hdf('storage.h5')
  File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 367, in read_hdf
    raise ValueError('key must be provided when HDF5 file '
ValueError: key must be provided when HDF5 file contains multiple datasets.
>>> import h5py
>>> f = h5py.File('storage.h5', mode='r')
>>> list(f.keys())[0]
'table'
>>> f.close()
>>> df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
Traceback (most recent call last):
  File "", line 1, in
    df = pd.read_hdf('storage.h5', key='table', start=0, stop=10)
  File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 370, in read_hdf
    return store.select(key, auto_close=auto_close, **kwargs)
  File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 740, in select
    return it.get_result()
  File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 1447, in get_result
    results = self.func(self.start, self.stop, where)
  File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 733, in func
    columns=columns, **kwargs)
  File "C:\Python35\lib\site-packages\pandas\io\pytables.py", line 2890, in read
    return self.obj_type(BlockManager(blocks, axes))
  File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 2795, in __init__
    self._verify_integrity()
  File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 3006, in _verify_integrity
    construction_error(tot_items, block.shape[1:], self.axes)
  File "C:\Python35\lib\site-packages\pandas\core\internals.py", line 4280, in construction_error
    passed, implied))
ValueError: Shape of passed values is (614, 593430), indices imply (614, 10)
>>> df = pd.read_hdf('storage.h5', key='table', nrows=10)
>>> df.shape
(593430, 614)

Edit:

I just tried using where:

^{pr2}$

I received a TypeError indicating that this is a fixed-format store (the default format value of df.to_hdf(...)):

TypeError: cannot pass a where specification when reading from a Fixed format store. this store must be selected in its entirety

Does this mean I cannot select a subset of rows if the format is fixed?
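As a point of comparison, here is a minimal sketch of partial reads from a store written with format='table' instead of the default fixed format. It assumes PyTables (the tables package) is installed and that the data can be rewritten in table format; the small DataFrame and the demo.h5 path are hypothetical and are not the storage.h5 file from the session above.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical demo data; the real 'storage.h5' from the question is not used.
df = pd.DataFrame(np.arange(50).reshape(10, 5), columns=list("abcde"))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.h5")

    # format='table' writes a PyTables Table, which supports partial reads.
    # The default, format='fixed', rejects `where` as shown in the error above.
    df.to_hdf(path, key="table", format="table")

    # Read only rows 0, 1 and 2 (stop is exclusive).
    head = pd.read_hdf(path, key="table", start=0, stop=3)

    # A table-format store also accepts a `where` condition on the index.
    subset = pd.read_hdf(path, key="table", where="index < 3")

print(head.shape)
print(subset.shape)
```

The trade-off is that table format is generally slower to write and read than fixed format, so this is only a sketch of one option, not necessarily the right choice for every store.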


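Copyright notice: this is an original article by weixin_31143391, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.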