UTF-8, UTF-16 and UTF-32 support

String support in stdlib is currently limited to ASCII. @wclodius2 brought up the issue of supporting UTF-8, UTF-16 and UTF-32 as well:

> FWIW for a "string type" to supplant the intrinsic `character` I would make the internal representation an integer array so that it is straightforward to extend it to represent UCS/Unicode. The integer type could be either `INT8` if a UTF-8 representation is desired, `INT16` for a UTF-16 representation, or `INT32` for UTF-32. I would expect the UTF-32 representation would be the most straightforward to implement and best for East Asian ideographs, UTF-8 would be the most efficient for most European and Semitic languages, and UTF-16 the most efficient for most of the rest of the world.

_Originally posted by @wclodius2 in https://github.com/fortran-lang/stdlib/issues/334#issuecomment-798813426_
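
To make the storage trade-off concrete, here is a minimal sketch that counts the bytes each encoding needs for a few sample strings. It is written in Python purely for illustration, not as a proposal for the stdlib API; the helper name `encoded_sizes` and the sample strings are assumptions chosen for this example.

```python
def encoded_sizes(text: str) -> dict[str, int]:
    """Return the number of bytes needed to store `text` in each encoding."""
    # The "-le" codec variants are used so that no byte-order mark is prepended.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),
        "utf-32": len(text.encode("utf-32-le")),
    }


# Hypothetical sample strings, one per script family.
samples = {
    "ASCII": "string",
    "Japanese": "\u6587\u5b57\u5217",   # three CJK ideographs
    "Emoji": "\U0001F600",              # one code point outside the BMP
}

for name, text in samples.items():
    print(name, len(text), "code points:", encoded_sizes(text))
```

Dividing each byte count by the code-unit width (1, 2 or 4 bytes) gives the length of the corresponding `INT8`, `INT16` or `INT32` array. Only UTF-32 has one array element per code point; the emoji sample occupies two UTF-16 code units (a surrogate pair) and four UTF-8 code units.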

> Implementing `to_title` will require more than ASCII. Allowing more than just ASCII will require access to the Unicode character database, https://unicode.org/ucd/. This database will also be required for `to_upper`, `to_lower`, and `reverse` if more than ASCII is involved. This database consists of several tens of megabytes of files, http://www.unicode.org/Public/UCD/latest/, and including it in the Standard Library will be controversial, but requiring users to download and install it on their own will also be controversial. FWIW I have a couple of modules to process the more important files in the database.

_Originally posted by @wclodius2 in https://github.com/fortran-lang/stdlib/issues/335#issuecomment-798815164_
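
The dependence on character data can be illustrated with a short Python sketch. Python ships its own copy of the Unicode tables, so its string methods stand in here for what a database-backed `to_upper`, `to_title` or `reverse` would have to do. The helper `ascii_to_upper` is hypothetical and only mimics an ASCII-only implementation.

```python
import unicodedata


def ascii_to_upper(text: str) -> str:
    """Uppercase only a-z and leave every other character untouched
    (hypothetical stand-in for an ASCII-only to_upper)."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in text
    )


word = "stra\u00dfe"                  # German "strasse" spelled with a sharp s
ascii_result = ascii_to_upper(word)   # the sharp s is left unchanged
unicode_result = word.upper()         # uses Python's bundled Unicode data

# Some letters have a titlecase form that differs from their uppercase form.
dz = "\u01c6"                         # LATIN SMALL LETTER DZ WITH CARON
title_differs_from_upper = dz.title() != dz.upper()

# Reversing code points detaches a combining mark from its base letter.
decomposed = "e\u0301x"               # "e" + COMBINING ACUTE ACCENT + "x"
naive_reverse = decomposed[::-1]

print(ascii_result, unicode_result, title_differs_from_upper)
print([unicodedata.name(ch) for ch in naive_reverse])
```

After a naive reversal the accent follows the `x` instead of the `e`, which is why a correct `reverse` needs to know which code points are combining marks.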

UTF-8, UTF-16 and UTF-32 support #344
