[Nix-dev] How to get correct length of a string containing non-ascii characters
Christian Theune
ct at flyingcircus.io
Tue Jan 12 20:10:28 CET 2016
Hi,
There are sane approaches to handling strings (encoded bytes) vs. text (decoded characters) properly. We might not be able to do this at the moment, but I find Python 3’s bytes/str model quite sane.
It might be too much for us to support this with a quick fix, but we should keep that on the radar, I guess.
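
To make the distinction concrete, here is a minimal sketch of that model in Python 3. It is only an illustration: the variable names are made up, and nothing here is a proposal for a Nix API. It shows that the byte count matches what builtins.stringLength reports for "å", while the decoded text has one code point, and that even the code point count depends on normalization.

```python
import unicodedata

text = "å"                      # decoded text: a single code point, U+00E5
encoded = text.encode("utf-8")  # encoded form: the bytes 0xC3 0xA5

byte_length = len(encoded)      # 2, the number Nix reports for this string
code_point_length = len(text)   # 1, the number the original question asks for

# The same visible character can also be spelled as "a" followed by a
# combining ring (U+030A), so the code point count is not unique either.
decomposed = unicodedata.normalize("NFD", text)
decomposed_length = len(decomposed)                                # 2
recomposed_length = len(unicodedata.normalize("NFC", decomposed))  # 1

print(byte_length, code_point_length, decomposed_length, recomposed_length)
```

Neither count is a grapheme or glyph count; Python's standard library does not provide one, which is part of why the questions quoted below are hard.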
Christian
> On 12 Jan 2016, at 18:26, Jookia <166291 at gmail.com> wrote:
>
> On Mon, Jan 11, 2016 at 11:29:37PM +0000, Erik Rybakken wrote:
>> Hi,
>>
>> In nix, when finding the length of a string containing non-ascii characters,
>> the number of bytes in the representation is returned, instead of the actual
>> number of characters:
>>
>>> nix-repl> builtins.stringLength "å"
>>> 2
>>
>> Is there any way to get the number of characters instead, or does this
>> require changes in the core language?
>
> It's probably best to leave it like it is now. A string's length is two if
> that's the number of bytes it uses. You'd have to start asking some hard
> questions if you want other behaviour like:
>
> Why do you want the string's length? Do you want to truncate it? What if that
> creates an invalid sequence of characters somehow? Do you want to compare
> lengths or equality? Should text be normalized somehow? Which way?
>
> What should the base 'unit' be for a string? A code point? A character? A
> glyph? A grapheme? How would this be implemented?
>
>> Best Regards,
>> Erik Rybakken
>
> Cheers,
> Jookia.
--
Christian Theune · ct at flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick