artagger fails to read the Thai.RDR file on Windows due to a Unicode error (can't decode byte 0x8d in position 417).
https://ci.appveyor.com/project/wannaphongcom/pythainlp-9y1ch/builds/20163741
======================================================================
ERROR: test_pos_tag (tests.TestUM)
----------------------------------------------------------------------
Traceback (most recent call last):
File "c:\projects\pythainlp-9y1ch\tests\__init__.py", line 277, in test_pos_tag
str(type(pos_tag(word_tokenize("ผมรักคุณ"), engine="artagger"))),
File "c:\projects\pythainlp-9y1ch\pythainlp\tag\__init__.py", line 36, in pos_tag
return _tag(words, corpus=corpus)
File "c:\projects\pythainlp-9y1ch\pythainlp\tag\__init__.py", line 29, in _tag
words = Tagger().tag(" ".join(text))
File "C:\Python36\lib\site-packages\artagger\__init__.py", line 43, in __init__
self.load_model()
File "C:\Python36\lib\site-packages\artagger\__init__.py", line 47, in load_model
"rdr": open(os.path.join(os.path.dirname(__file__), "Models", "POS", "Thai.RDR"), "r").readlines(),
File "C:\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 417: character maps to <undefined>
----------------------------------------------------------------------
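The failure can be reproduced without artagger. This is a minimal sketch that assumes Thai.RDR is stored as UTF-8, which the report does not confirm. Many Thai characters encode to UTF-8 byte sequences that contain 0x8d, for example U+0E0D (ญ), and the cp1252 codec has no mapping for that byte. The helper decode_as_cp1252 is hypothetical and only mimics what open(..., "r") does under a cp1252 locale.

```python
def decode_as_cp1252(data: bytes) -> str:
    """Hypothetical helper: decode bytes the way a cp1252 Windows locale would."""
    return data.decode("cp1252")

# U+0E0D THAI CHARACTER YO YING encodes to three UTF-8 bytes ending in 0x8d.
utf8_bytes = "ญ".encode("utf-8")
print(utf8_bytes)  # b'\xe0\xb8\x8d'

try:
    decode_as_cp1252(utf8_bytes)
except UnicodeDecodeError as exc:
    # 0x8d is undefined in cp1252, so the 'charmap' codec raises here,
    # just as it did at position 417 of Thai.RDR.
    print(exc.reason, "at position", exc.start)
```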
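The same code works fine in the Travis CI tests (Linux). When open() is called without an encoding argument, Python falls back to the locale's preferred encoding: cp1252 on this Windows build (as the traceback shows), but usually UTF-8 on Linux.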
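One possible fix, again assuming Thai.RDR is UTF-8, is to pass encoding="utf-8" explicitly in the open(...) call in artagger's load_model so that reading no longer depends on the platform locale. The read_rdr_lines function below is a hypothetical stand-in for that call. It is not artagger's actual code.

```python
import os
import tempfile

def read_rdr_lines(path):
    """Hypothetical helper: read an RDR model file as UTF-8 on every platform.

    An explicit encoding avoids the locale default (cp1252 on many
    Windows machines, usually UTF-8 on Linux).
    """
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()

# Usage: write a tiny UTF-8 file containing Thai text, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "Thai.RDR")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("ผมรักคุณ\n")
    lines = read_rdr_lines(sample)
    print(lines)
```

Because both the write and the read name the encoding, the result is the same on Windows and Linux.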