Error when reading a CSV file in Python for word segmentation

I wrote this line of code:

df = pd.read_csv('景点评论/' + i, encoding='gb18030')

It fails with the following error:

Traceback (most recent call last):
  File "C:/Users/和静/Desktop/csvdata/analysis_wordcloud.py", line 54, in <module>
    df = pd.read_csv('景点评论/' + i, encoding='gb18030')
  File "D:\anaconda\lib\site-packages\pandas\io\parsers.py", line 610, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "D:\anaconda\lib\site-packages\pandas\io\parsers.py", line 462, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "D:\anaconda\lib\site-packages\pandas\io\parsers.py", line 819, in __init__
    self._engine = self._make_engine(self.engine)
  File "D:\anaconda\lib\site-packages\pandas\io\parsers.py", line 1050, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "D:\anaconda\lib\site-packages\pandas\io\parsers.py", line 1898, in __init__
    self._reader = parsers.TextReader(self.handles.handle, **kwds)
  File "pandas\_libs\parsers.pyx", line 518, in pandas._libs.parsers.TextReader.__cinit__
  File "pandas\_libs\parsers.pyx", line 620, in pandas._libs.parsers.TextReader._get_header
  File "pandas\_libs\parsers.pyx", line 814, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 1943, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'gb18030' codec can't decode byte 0xb5 in position 8: illegal multibyte sequence

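I searched for the error. The explanation I found was that the data being decoded does not match the expected encoding, and that adding the 'ignore' argument to decode would fix it. So I changed the line to this: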

df = pd.read_csv('景点评论/' + i, encoding='gb18030', errors='ignore')

It still fails with the same error:

UnicodeDecodeError: 'gb18030' codec can't decode byte 0xb5 in position 8: illegal multibyte sequence

I searched again, and this time the advice was to remove the errors='ignore' that I had just added.

Emmm, it feels like I'm going around in circles. Can anyone explain what is going on?
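One possible way out of the loop, offered as a sketch rather than a confirmed fix: decode the file yourself with errors='ignore' and hand pandas the already-decoded text, since read_csv accepts file-like objects. The helper name open_lenient is hypothetical (it is not part of pandas), and the gb18030 default is only carried over from the code above. If the file is actually saved in a different encoding (UTF-8, for example), ignoring errors will only hide the problem and produce garbled text.

```python
import io


def open_lenient(path, encoding="gb18030"):
    """Hypothetical helper: return the file as an in-memory text buffer,
    silently dropping any bytes that cannot be decoded with `encoding`."""
    with open(path, "rb") as f:
        raw = f.read()
    # bytes.decode() accepts errors='ignore'; undecodable bytes are skipped.
    text = raw.decode(encoding, errors="ignore")
    return io.StringIO(text)


# Usage with pandas (assumes `import pandas as pd` and the loop variable `i`
# from the original script):
# df = pd.read_csv(open_lenient('景点评论/' + i))
```

Because the decoding happens before pandas sees the data, this does not depend on which keyword arguments a given pandas version supports. Newer pandas releases (1.3 and later, as far as I know) also accept encoding_errors='ignore' directly in read_csv, but the traceback above points at an older layout of the library, so that parameter may not be available here.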


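Copyright notice: This is an original article by m0_59135228, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.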