| From: Stewart C. Russell via talk <talk@gtalug.org>
| On 2019-04-16 10:36 p.m., D. Hugh Redelmeier via talk wrote:
| >
| > Java, Python before 3, Javascript, Microsoft's C and C++, the Joliet
| > filesystem, the NT File system, and many other things use UTF-8.
|
| You meant to say UTF-16?

Yes.  Thanks!

[the rest is for orthographic nerds only]

| Collation is difficult

Yes.  Even plain string equality is difficult, and there are security
implications here.

| Once you get outside English*,
| things get much more delightful.
| *: difficult, because we assimilate everything, accents and all.

Including Scots, accent and all :-)

| In Welsh, for instance, 'ff' and 'll'
| sort as different codepoints to f and l, but an initial 'ng' sorts as a
| 'g' as it's merely an inflected form. Capitalization is a whole
| different horror and left as an exercise for the reader. Suffice to say,
| an initial 'ff' (as in the rare Welsh/English surnames ffrench and
| ffinch) is never capitalized. And that's not all ffolkes!

Except for Jasper Fforde, apparently.

As far as I know, the idea of upper case doesn't apply to most writing
systems.  Of course, other scripts have distinctions that we're not
used to.  Think of all the forms each letter can take in Arabic
(isolated, initial, medial, and final).

One Unicode surprise: it has a capital scharfes S (ẞ, U+1E9E).
Wikipedia says:

    In 2017, the Council for German Orthography ultimately adopted
    capital ß (ẞ) into German orthography, ending a long orthographic
    debate.[4]

<https://en.wikipedia.org/wiki/%C3%9F>

In older English, certain "s" letters were written in a form that looks
like an f to us (but the crossbar is missing or different).  I remember
thinking "King Charles the Fecond" was a witty pun (Spring Thaw, 1967).
This seems to be related to the scharfes S.

<https://en.wikipedia.org/wiki/Long_s>

Have a look at the contrasting Britannica pages.

A Google search for "charles the fecond" gets me lots of books.google.*
hits for books that were OCRed incorrectly: the long s was read as an f.
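Stewart's UTF-16 correction matters because those systems count string length in UTF-16 code units, not characters. A character outside the Basic Multilingual Plane takes two units (a surrogate pair). Here is a minimal Python sketch, assuming Python 3, where `len()` counts code points. The helper `utf16_code_units` is a hypothetical name for illustration, not a library function.

```python
def utf16_code_units(s: str) -> int:
    """Count UTF-16 code units, the unit that Java's String.length()
    and JavaScript's .length report."""
    # "utf-16-le" emits no byte-order mark, so every 2 bytes is one unit.
    return len(s.encode("utf-16-le")) // 2


# The last sample is U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP.
for text in ["ffinch", "\u1e9e", "\U0001D11E"]:
    print(repr(text),
          "code points:", len(text),
          "UTF-16 units:", utf16_code_units(text))

# The G clef is stored as the surrogate pair D834 DD1E.
print("\U0001D11E".encode("utf-16-be").hex())
```

The capital ẞ (U+1E9E) is in the BMP, so it takes a single UTF-16 unit. The G clef is one code point but two units, which is where naive length or substring code in UTF-16 languages tends to go wrong.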
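To see why even plain string equality is hard, consider strings that look identical but differ in code points. Here is a rough Python sketch using the standard `unicodedata` module and `str.casefold()`. The helper `loosely_equal` is hypothetical and only approximates Unicode's compatibility caseless matching. It does not attempt collation: sorting Welsh 'll' or 'ng' correctly needs locale-tailored rules, for example from ICU.

```python
import unicodedata


def loosely_equal(a: str, b: str) -> bool:
    """Compare after NFKC normalization and full case folding.

    Illustrative helper only: a simplified approximation of Unicode
    compatibility caseless matching, not a library function.
    """
    def key(s: str) -> str:
        return unicodedata.normalize("NFKC", s).casefold()
    return key(a) == key(b)


composed = "Caf\u00e9"       # é as a single code point
decomposed = "Cafe\u0301"    # e followed by COMBINING ACUTE ACCENT
print(composed == decomposed)               # False: different code points
print(loosely_equal(composed, decomposed))  # True after normalization

# German sharp s: default upper-casing gives "SS", not the capital ẞ.
print("ß".upper())           # SS
print("\u1e9e".lower())      # ß
print(loosely_equal("STRASSE", "Straße"))   # True

# Long s (U+017F) and the ff ligature (U+FB00) fold to plain letters.
print(loosely_equal("\u017fun", "sun"))     # True
print(loosely_equal("\ufb00inch", "ffinch"))  # True
```

The last two cases hint at the security angle: strings a user sees as "the same" may or may not compare equal, depending on which normalization and folding a program applies.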