Xin chào mọi người,
Em có thể hỏi chút về đọc file csv encoding TCVN3 trong python với thư viện pandas như thế nào được không ạ.
Em có tìm google mà không thể tìm thấy hướng dẫn.
Em chỉ đơn thuần dùng như sau:
import os
import pandas as pd
pd.read_csv('filename.csv')
thrown ra cái này:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 749, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 1: invalid start byte
Em thử:
pd.read_csv('filename.csv', encoding='utf8')
thì thrown:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 749, in pandas._libs.parsers.TextReader._get_header
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 1: invalid start byte
>>> pd.read_csv("2500_dp01.csv", encoding='utf8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 676, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 448, in _read
parser = TextFileReader(fp_or_buf, **kwds)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 880, in __init__
self._make_engine(self.engine)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1114, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "C:\Program Files (x86)\Python38-32\lib\site-packages\pandas\io\parsers.py", line 1891, in __init__
self._reader = parsers.TextReader(src, **kwds)
File "pandas\_libs\parsers.pyx", line 529, in pandas._libs.parsers.TextReader.__cinit__
File "pandas\_libs\parsers.pyx", line 720, in pandas._libs.parsers.TextReader._get_header
File "pandas\_libs\parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
File "pandas\_libs\parsers.pyx", line 2063, in pandas._libs.parsers.raise_parser_error
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb7 in position 1: invalid start byte
Mong có thể nhận được hướng dẫn từ anh chị ạ, em tìm thì đều không có hướng dẫn cho python, chỉ có cho các ngôn ngữ khác.
Em cảm ơn anh chị.