[Nix-dev] How to get correct length of a string containing non-ascii characters

Jookia 166291 at gmail.com
Tue Jan 12 18:26:20 CET 2016


On Mon, Jan 11, 2016 at 11:29:37PM +0000, Erik Rybakken wrote:
> Hi,
>
> In nix, when finding the length of a string containing non-ascii characters,
> the number of bytes in the representation is returned, instead of the actual
> number of characters:
>
> > nix-repl> builtins.stringLength "å"
> > 2
>
> Is there any way to get the number of characters instead, or does this
> require changes in the core language?

It's probably best to leave it as it is now. The string's length is two because
that's the number of bytes it occupies. If you want different behaviour, you'd
have to start asking some hard questions, such as:

Why do you want the string's length? Do you want to truncate it? What if that
creates an invalid sequence of characters somehow? Do you want to compare
lengths or equality? Should text be normalized somehow? Which way?
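
To make the truncation and normalization questions concrete, here is a minimal
sketch in Python (not Nix, and not a suggestion for how Nix should behave). It
assumes the string is UTF-8 encoded, which matches the two-byte result for "å"
quoted above. The variable names are purely illustrative.

```python
import unicodedata

text = "\u00e5"                  # "å" as one precomposed code point
encoded = text.encode("utf-8")   # two bytes: 0xC3 0xA5

# Truncating by byte count can cut a character in half, leaving
# a byte sequence that is no longer valid UTF-8.
truncated = encoded[:1]
try:
    truncated.decode("utf-8")
    truncation_ok = True
except UnicodeDecodeError:
    truncation_ok = False

# The same visible text can be spelled two ways: precomposed (NFC)
# or as a base letter plus a combining mark (NFD).
composed = unicodedata.normalize("NFC", "a\u030a")
decomposed = unicodedata.normalize("NFD", "\u00e5")

print(len(encoded), truncation_ok)       # byte length, and whether the cut decoded
print(composed == decomposed)            # naive equality sees two different strings
print(len(composed), len(decomposed))    # code point counts differ too
```

Both strings render as "å", yet they compare unequal and have different
lengths until one normalization form is chosen for both.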

What should the base 'unit' be for a string? A code point? A character? A
glyph? A grapheme? How would this be implemented?
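
As a rough illustration of how much the answer depends on the unit chosen, the
Python sketch below (again not Nix) reports several "lengths" for one string.
The helper `length_report` is hypothetical, and its grapheme count is only a
crude approximation that skips combining marks. Proper grapheme segmentation
follows Unicode Standard Annex #29, and this shortcut gets things like emoji
sequences wrong.

```python
import unicodedata

def length_report(s):
    """Return the 'length' of s measured in several different units."""
    return {
        # Bytes in the UTF-8 encoding; this is the unit the thread is about.
        "utf8_bytes": len(s.encode("utf-8")),
        # 16-bit code units in UTF-16 (two bytes per unit, no BOM).
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Unicode code points; Python's len() counts these.
        "code_points": len(s),
        # Crude approximation: code points that are not combining marks.
        "approx_graphemes": sum(
            1 for ch in s if unicodedata.combining(ch) == 0
        ),
    }

precomposed = "\u00e5"    # "å" as a single code point
combining = "a\u030a"     # "a" followed by COMBINING RING ABOVE

print(length_report(precomposed))
print(length_report(combining))
```

The two inputs look identical on screen, but only the approximate grapheme
count agrees between them. The byte and code point counts both differ.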

> Best Regards,
> Erik Rybakken

Cheers,
Jookia.

