[Nix-dev] How to get correct length of a string containing non-ascii characters

zimbatm zimbatm at zimbatm.com
Wed Jan 13 14:34:05 CET 2016


Related to that, the suckless conference talk on UTF-8 [1] was pretty
interesting. The complexity of Unicode and everything that comes with it is
pretty crazy.
That being said, libutf8 from the same people seems pretty decent and makes
sane default choices for a lot of these questions.

[1] http://suckless.org/conference/ (the last talk on the page)

On Tue, 12 Jan 2016 at 19:10 Christian Theune <ct at flyingcircus.io> wrote:

> Hi,
>
> there are sane approaches to dealing with Strings (encoded) vs. Text
> (decoded) properly. We might not be able to do this at the moment, but I
> find Python (3)’s byte/text model quite sane.
>
> It might be too much for us to support this with a quick fix, but we
> should keep that on the radar, I guess.
>
> Christian
>
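For comparison, here is a minimal Python 3 sketch of the model Christian mentions. A str holds decoded text as code points, and a bytes object holds the encoded data. Converting between the two always means naming an encoding. The "å" is written as the escape \u00e5 (the precomposed form), so the result does not depend on how the source file happens to be normalized. This is illustrative only and is not Nix API.

```python
# Python 3 keeps decoded text (str) and encoded data (bytes) as separate types.
text = "\u00e5"               # "å" as one precomposed code point
data = text.encode("utf-8")   # its UTF-8 encoding: b"\xc3\xa5"

print(len(text))  # 1: a str counts code points
print(len(data))  # 2: bytes counts bytes, like builtins.stringLength in the thread

# Going back from bytes to text means naming the encoding explicitly.
assert data.decode("utf-8") == text

# The two types never mix implicitly.
try:
    text + data
except TypeError:
    print("str + bytes raises TypeError")
```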
> On 12 Jan 2016, at 18:26, Jookia <166291 at gmail.com> wrote:
>
> On Mon, Jan 11, 2016 at 11:29:37PM +0000, Erik Rybakken wrote:
>
>> Hi,
>>
>> In nix, when finding the length of a string containing non-ascii characters,
>> the number of bytes in the representation is returned, instead of the actual
>> number of characters:
>>
>> nix-repl> builtins.stringLength "å"
>> 2
>>
>>
>> Is there any way to get the number of characters instead, or does this
>> require changes in the core language?
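If "characters" is taken to mean Unicode code points, you can count them directly on the raw UTF-8 bytes without decoding. Every code point starts with exactly one byte that is not a continuation byte, and continuation bytes have the bit pattern 10xxxxxx. Below is a minimal Python sketch that assumes the input is already valid UTF-8. count_code_points is a hypothetical helper, not something Nix provides.

```python
def count_code_points(data: bytes) -> int:
    """Count code points in valid UTF-8 by skipping continuation bytes (10xxxxxx).

    Hypothetical helper for illustration; it does not validate its input.
    """
    return sum(1 for byte in data if (byte & 0xC0) != 0x80)


data = "\u00e5".encode("utf-8")   # the "å" from the nix-repl example
print(len(data))                  # 2: the byte count, as builtins.stringLength reports
print(count_code_points(data))    # 1: one code point
```

This only answers the code-point version of the question. A single visible character can still span several code points.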
>
>
> It's probably best to leave it like it is now. A string's length is two if
> that's the number of bytes it uses. You'd have to start asking some hard
> questions if you want other behaviour like:
>
> Why do you want the string's length? Do you want to truncate it? What if that
> creates an invalid sequence of characters somehow? Do you want to compare
> lengths or equality? Should text be normalized somehow? Which way?
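The truncation worry is concrete. Cutting UTF-8 at an arbitrary byte offset can split a multi-byte sequence and leave invalid data. Below is a minimal Python sketch of byte-limited truncation that backs off to the previous code-point boundary. truncate_utf8 is a hypothetical name, and the sketch assumes valid UTF-8 input. It can still separate a base letter from a combining mark that follows it.

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Return at most `limit` bytes of valid UTF-8 without splitting a code point."""
    limit = max(limit, 0)
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte being dropped; while it is a continuation
    # byte (10xxxxxx), the cut falls inside a code point, so step back.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


data = "a\u00e5".encode("utf-8")      # b"a\xc3\xa5"

try:
    data[:2].decode("utf-8")          # a naive cut keeps only the lead byte of "å"
except UnicodeDecodeError:
    print("naive truncation produced invalid UTF-8")

print(truncate_utf8(data, 2))         # b'a'
```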
>
> What should the base 'unit' be for a string? A code point? A character? A
> glyph? A grapheme? How would this be implemented?
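To make the "which unit?" question concrete, the same visible "å" can be written two ways. It can be one precomposed code point (NFC), or "a" plus a combining ring (NFD). The byte and code-point lengths differ between the two forms, and plain equality fails until both sides are normalized to the same form. Below is a minimal Python sketch using the standard unicodedata module. It deliberately does not attempt grapheme counting, which needs the Unicode text segmentation rules (UAX #29).

```python
import unicodedata

nfc = "\u00e5"     # "å" as one precomposed code point
nfd = "a\u030a"    # "a" followed by U+030A COMBINING RING ABOVE; renders the same

for label, s in (("NFC", nfc), ("NFD", nfd)):
    # Same visible character, different code-point and byte counts.
    print(label, len(s), len(s.encode("utf-8")))    # NFC 1 2, then NFD 2 3

print(nfc == nfd)                                   # False: compared code point by code point
print(unicodedata.normalize("NFC", nfd) == nfc)     # True once both use the same form
```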
>
>> Best Regards,
>> Erik Rybakken
>
>
> Cheers,
> Jookia.
> _______________________________________________
> nix-dev mailing list
> nix-dev at lists.science.uu.nl
> http://lists.science.uu.nl/mailman/listinfo/nix-dev
>
>
> --
> Christian Theune · ct at flyingcircus.io · +49 345 219401 0
> Flying Circus Internet Operations GmbH · http://flyingcircus.io
> Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
> HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick
>