Closed
Description
When the bytes of a UTF-8 character happen to be split across chunks while requesting a resource that is treated as text and not parsed, `StreamDecoder._transform` corrupts the character: it applies `iconv.decode` to each chunk separately, which replaces the partial Unicode character with a replacement character.
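The effect can be illustrated without needle or a network request. The sketch below is only an approximation: it uses Node's built-in `Buffer.toString` as a stand-in for a stateless per-chunk decode, and the sample string and chunk boundary are made up for the demonstration.

```javascript
// "ü" is encoded in UTF-8 as the two bytes 0xC3 0xBC.
const body = Buffer.from("Münster", "utf8");

// Hypothetical chunk boundary that falls between the two bytes of "ü".
const chunks = [body.subarray(0, 2), body.subarray(2)];

// Decoding each chunk on its own: neither half of the split character
// is valid by itself, so each half turns into U+FFFD.
const perChunk = chunks.map((chunk) => chunk.toString("utf8")).join("");

// Decoding the reassembled body keeps the character intact.
const whole = Buffer.concat(chunks).toString("utf8");

console.log(JSON.stringify(perChunk));
console.log(JSON.stringify(whole));
```

The per-chunk result contains replacement characters where the original text had "ü", while the whole-body result matches the input.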
To reproduce with needle v3.0.0:
require("needle").get("https://data.interaktiv.cloud.funkedigital.de/wahl/example/nw/erg_05994.xml", { parse: false }, function(err, resp, data){
  console.log(data.split('\n')[4379]); // line 4380
});
Suggested solutions:
- Use `iconv.decodeStream` instead of `iconv.decode`.
- Complete chunk collection first, then use `iconv.decode` on the complete body.
- Check whether a chunk ends with an incomplete UTF-8 sequence, and if so remove those trailing bytes from the current chunk and prepend them to the next chunk.
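The first and third suggestions both amount to a decoder that keeps state between chunks. A minimal sketch of that idea follows, assuming UTF-8 input and using Node's built-in `StringDecoder` in place of iconv-lite; `decodeChunks` is a hypothetical helper written for this illustration, not part of needle.

```javascript
const { StringDecoder } = require("string_decoder");

// Hypothetical helper: decodes a sequence of Buffer chunks with one
// stateful decoder instead of decoding each chunk in isolation.
function decodeChunks(chunks) {
  const decoder = new StringDecoder("utf8");
  let text = "";
  for (const chunk of chunks) {
    // write() holds back an incomplete trailing sequence and emits it
    // once the remaining bytes arrive in a later chunk.
    text += decoder.write(chunk);
  }
  // end() flushes whatever is still buffered.
  return text + decoder.end();
}

const body = Buffer.from("Münster", "utf8");
// Same made-up boundary as before: between the two bytes of "ü".
const decoded = decodeChunks([body.subarray(0, 2), body.subarray(2)]);

console.log(decoded === "Münster"); // true
```

A stream-based decoder such as `iconv.decodeStream` would presumably behave the same way for other encodings, since it carries incomplete sequences across writes rather than discarding them.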