Bug: UTF-8 characters across chunk boundaries get corrupted

When the bytes of a multi-byte UTF-8 character are split across two chunks while requesting a resource that is treated as text and not parsed, `StreamDecoder._transform` corrupts the character. It calls `iconv.decode` on each chunk separately, so the partial byte sequence at the chunk boundary is replaced with a replacement character (U+FFFD).
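The failure mode can be illustrated without needle at all. The following is a minimal sketch that uses Node's built-in `Buffer` decoding as a stand-in for `iconv.decode` (an assumption: both substitute U+FFFD for an incomplete sequence); the sample string and the split position are made up for illustration.

```javascript
// Sketch only: Buffer#toString stands in for iconv.decode here.
const bytes = Buffer.from("für", "utf8"); // 66 c3 bc 72
// Split in the middle of "ü" (c3 | bc), as a chunk boundary might.
const chunks = [bytes.subarray(0, 2), bytes.subarray(2)];

// Decoding each chunk on its own loses the split character.
const perChunk = chunks.map((c) => c.toString("utf8")).join("");

// Decoding the concatenated bytes keeps it intact.
const whole = Buffer.concat(chunks).toString("utf8");

console.log(JSON.stringify(perChunk));
console.log(JSON.stringify(whole));
```

The per-chunk result contains replacement characters where "ü" should be, while the result decoded from the complete buffer matches the input.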

To reproduce with needle v3.0.0:

``` javascript
require("needle").get("https://data.interaktiv.cloud.funkedigital.de/wahl/example/nw/erg_05994.xml", { parse: false }, function(err, resp, data){
	if (err) throw err; // surface network errors instead of failing on undefined data
	console.log(data.split('\n')[4379]); // line 4380
});
```

Suggested solutions: 

- Use `iconv.decodeStream` instead of `iconv.decode`
- Complete chunk collection first, then use `iconv.decode` on complete body
- Check whether a chunk ends with an incomplete UTF-8 sequence (a lead byte, possibly followed by some of its continuation bytes), and if so remove those trailing bytes from the current chunk and prepend them to the next chunk.
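
The three approaches can be sketched as follows. This is not needle's code: the function names are hypothetical, and Node's built-in `StringDecoder` is used as a stand-in for a stateful iconv decoder so the example runs without dependencies. Treat it as an illustration of the idea under those assumptions.

```javascript
const { StringDecoder } = require("string_decoder");

// Approach 1 (analogue): a stateful decoder holds back an incomplete
// trailing sequence until the next chunk arrives.
function decodeStreaming(chunks) {
  const decoder = new StringDecoder("utf8");
  let out = "";
  for (const chunk of chunks) out += decoder.write(chunk);
  return out + decoder.end();
}

// Approach 2: collect all chunks first, then decode the complete body once.
function decodeCollected(chunks) {
  return Buffer.concat(chunks).toString("utf8");
}

// Approach 3 helper: number of bytes at the end of `chunk` that belong to
// an incomplete UTF-8 sequence (0 if the chunk ends on a character boundary).
function incompleteTailLength(chunk) {
  for (let i = chunk.length - 1; i >= 0 && chunk.length - i <= 4; i--) {
    const b = chunk[i];
    if ((b & 0xc0) === 0x80) continue; // continuation byte, keep walking back
    let expected = 1; // ASCII or invalid lead byte: nothing to carry over
    if ((b & 0xe0) === 0xc0) expected = 2;
    else if ((b & 0xf0) === 0xe0) expected = 3;
    else if ((b & 0xf8) === 0xf0) expected = 4;
    const available = chunk.length - i;
    return available < expected ? available : 0;
  }
  return 0;
}

// Approach 3: cut the incomplete tail off each chunk and prepend it to the next.
function decodeCarrying(chunks) {
  let carry = Buffer.alloc(0);
  let out = "";
  for (const chunk of chunks) {
    const buf = Buffer.concat([carry, chunk]);
    const tail = incompleteTailLength(buf);
    out += buf.subarray(0, buf.length - tail).toString("utf8");
    carry = buf.subarray(buf.length - tail);
  }
  return out + carry.toString("utf8"); // leftover bytes, if any, decode as U+FFFD
}

// Worst case for illustration: every byte arrives in its own chunk.
const text = "Müller €5 😀";
const oneByteChunks = Array.from(Buffer.from(text, "utf8"), (b) => Buffer.from([b]));

const streamed = decodeStreaming(oneByteChunks);
const collected = decodeCollected(oneByteChunks);
const carried = decodeCarrying(oneByteChunks);
console.log(streamed === text, collected === text, carried === text);
```

Collecting first is the simplest option but buffers the whole body in memory. A stateful decoder keeps the streaming behaviour and leaves boundary handling to the decoder, which also matters for encodings other than UTF-8, where the manual tail check above does not apply.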
