[Nix-dev] How to get correct length of a string containing non-ascii characters

Christian Theune ct at flyingcircus.io
Tue Jan 12 20:10:28 CET 2016


Hi,

There are sane approaches to handling strings (encoded bytes) versus text (decoded characters) properly. We might not be able to adopt one at the moment, but I find Python 3's bytes/text model quite sane.
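
To make the distinction concrete, here is a minimal sketch of how Python 3 separates the two. The helper names byte_length and code_point_length are made up for illustration; only str.encode and len are real Python. It is only meant to show the behaviour, not to suggest how Nix should implement it.

```python
# Python 3 keeps decoded text (str) and encoded data (bytes) as separate types.

def byte_length(s: str) -> int:
    """Number of bytes in the UTF-8 encoding of s (hypothetical helper)."""
    return len(s.encode("utf-8"))

def code_point_length(s: str) -> int:
    """Number of Unicode code points in s (hypothetical helper)."""
    return len(s)

a_ring = "\u00e5"  # the character from Erik's example, as one precomposed code point

print(byte_length(a_ring))        # what builtins.stringLength reports in the example
print(code_point_length(a_ring))  # what Erik was asking for
```

The point is that the length depends on which type you ask, so the encoding step is always explicit.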

It might be too much for us to support this with a quick fix, but we should keep that on the radar, I guess.
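
Jookia's questions quoted below (which unit, what happens on truncation, whether to normalize) are easy to reproduce in Python 3 as well. The following is a rough sketch under my own assumptions: lengths, nfc_equal and truncate_bytes are hypothetical names, and NFC is just one of several normalization forms one could pick.

```python
import unicodedata

composed = "\u00e5"     # one precomposed code point
decomposed = "a\u030a"  # 'a' followed by COMBINING RING ABOVE; renders the same

def lengths(s: str) -> dict:
    """Report two candidate 'lengths' of the same text (hypothetical helper)."""
    return {"bytes": len(s.encode("utf-8")), "code_points": len(s)}

def nfc_equal(a: str, b: str) -> bool:
    """Compare after NFC normalization; NFC is an assumption, not the only choice."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

def truncate_bytes(s: str, n: int) -> bytes:
    """Cut the UTF-8 encoding at n bytes; this can split a character in half."""
    return s.encode("utf-8")[:n]

print(lengths(composed), lengths(decomposed))
print(composed == decomposed, nfc_equal(composed, decomposed))
print(truncate_bytes(composed, 1))  # a lone lead byte, not valid UTF-8 on its own
```

Neither count above is a grapheme count; that would need segmentation rules that the Python standard library does not provide.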

Christian

> On 12 Jan 2016, at 18:26, Jookia <166291 at gmail.com> wrote:
> 
> On Mon, Jan 11, 2016 at 11:29:37PM +0000, Erik Rybakken wrote:
>> Hi,
>> 
>> In nix, when finding the length of a string containing non-ascii characters,
>> the number of bytes in the representation is returned, instead of the actual
>> number of characters:
>> 
>>> nix-repl> builtins.stringLength "å"
>>> 2
>> 
>> Is there any way to get the number of characters instead, or does this
>> require changes in the core language?
> 
> It's probably best to leave it like it is now. A string's length is two if
> that's the number of bytes it uses. You'd have to start asking some hard
> questions if you want other behaviour like:
> 
> Why do you want the string's length? Do you want to truncate it? What if that
> creates an invalid sequence of characters somehow? Do you want to compare
> lengths or equality? Should text be normalized somehow? Which way?
> 
> What should the base 'unit' be for a string? A code point? A character? A
> glyph? A grapheme? How would this be implemented?
> 
>> Best Regards,
>> Erik Rybakken
> 
> Cheers,
> Jookia.

--
Christian Theune · ct at flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · http://flyingcircus.io
Forsterstraße 29 · 06112 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian. Theune, Christian. Zagrodnick
