[artagger] UnicodeDecodeError when reading Thai.RDR file on Windows #155

Closed
@bact

Description

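artagger fails to read the Thai.RDR file on Windows with a UnicodeDecodeError (can't decode byte 0x8d in position 417). The traceback shows that open() is called without an encoding argument, so Python falls back to the Windows default codec (cp1252).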
https://ci.appveyor.com/project/wannaphongcom/pythainlp-9y1ch/builds/20163741

======================================================================
ERROR: test_pos_tag (tests.TestUM)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "c:\projects\pythainlp-9y1ch\tests\__init__.py", line 277, in test_pos_tag
    str(type(pos_tag(word_tokenize("ผมรักคุณ"), engine="artagger"))),
  File "c:\projects\pythainlp-9y1ch\pythainlp\tag\__init__.py", line 36, in pos_tag
    return _tag(words, corpus=corpus)
  File "c:\projects\pythainlp-9y1ch\pythainlp\tag\__init__.py", line 29, in _tag
    words = Tagger().tag(" ".join(text))
  File "C:\Python36\lib\site-packages\artagger\__init__.py", line 43, in __init__
    self.load_model()
  File "C:\Python36\lib\site-packages\artagger\__init__.py", line 47, in load_model
    "rdr": open(os.path.join(os.path.dirname(__file__), "Models", "POS", "Thai.RDR"), "r").readlines(),
  File "C:\Python36\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8d in position 417: character maps to <undefined>
----------------------------------------------------------------------

The same code passes in the Travis tests (Linux).
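
A possible fix is to pass an explicit encoding when opening the model file. The sketch below is not artagger's code: it assumes Thai.RDR is stored as UTF-8 (which would explain why the Linux run passes), and `read_lines_utf8` and the `sample.RDR` file are hypothetical names used only for illustration. It reproduces the failure by forcing cp1252 and then shows that the same file reads cleanly as UTF-8.

```python
import os
import tempfile


def read_lines_utf8(path):
    """Read a text file as UTF-8, independent of the platform default encoding."""
    with open(path, "r", encoding="utf-8") as f:
        return f.readlines()


# Hypothetical stand-in for Thai.RDR. The Thai letter U+0E0D encodes to the
# UTF-8 bytes E0 B8 8D, and 0x8D has no mapping in cp1252.
sample = "\u0e2b\u0e0d\u0e34\u0e07 : NCMN\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.RDR")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)

    # Simulate the Windows default by forcing cp1252 explicitly.
    try:
        with open(path, "r", encoding="cp1252") as f:
            f.readlines()
        cp1252_failed = False
    except UnicodeDecodeError:
        cp1252_failed = True

    # Reading with an explicit UTF-8 encoding works on any platform.
    lines = read_lines_utf8(path)

print("cp1252 failed:", cp1252_failed)
print("utf-8 lines read:", len(lines))
```

If the assumption about the file's encoding holds, the equivalent change in artagger would be adding `encoding="utf-8"` to the `open(...)` call shown at line 47 of the traceback.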
