
On 2019-04-16 10:36 p.m., D. Hugh Redelmeier via talk wrote:
> Java, Python before 3, JavaScript, Microsoft's C and C++, the Joliet filesystem, the NT filesystem, and many other things use UTF-8.
Did you mean UTF-16?

Collation is difficult in anything but the simplistic "ASCIIbetical" case. People now expect natural sort orders, with '10' coming after '9' and case being of lesser importance.

Once you get outside English*, things get much more delightful. In Welsh, for instance, 'ff' and 'll' are letters in their own right that sort after 'f' and 'l', but an initial 'ng' sorts as 'g', as it's merely an inflected form.

Capitalization is a whole different horror and is left as an exercise for the reader. Suffice it to say that an initial 'ff' (as in the rare Welsh/English surnames ffrench and ffinch) is never capitalized.

Stewart

*: Difficult, because we assimilate everything, accents and all.
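For anyone who wants to see the difference, here is a minimal Python sketch of the two orderings described above. It is not a real collation implementation. The function names `natural_key` and `welsh_key` are hypothetical. "Natural" order is assumed to mean that digit runs compare by numeric value and letters compare case-insensitively. The Welsh key uses the modern Welsh alphabet with its digraphs as single letters, folds accented vowels to their base letters, and files an initial 'ng' under 'g' as described in the message. Its greedy digraph matching will mis-split words where the letters only happen to look like a digraph.

```python
import re
import unicodedata


def natural_key(s: str):
    """Natural-order sort key: digit runs compare by numeric value, text
    compares case-insensitively, and the original string breaks ties."""
    parts = re.split(r"(\d+)", s)
    key = []
    for i, part in enumerate(parts):
        if i % 2:  # odd indices of re.split are the captured digit runs
            key.append((0, int(part)))
        elif part:  # skip the empty strings produced at the boundaries
            key.append((1, part.casefold()))
    return (tuple(key), s)


# Modern Welsh alphabet, in order; digraphs count as single letters.
WELSH_ALPHABET = ["a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng",
                  "h", "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r",
                  "rh", "s", "t", "th", "u", "w", "y"]
RANK = {letter: i for i, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}


def strip_accents(s: str) -> str:
    """Drop combining marks so that, e.g., 'ŵ' collates as 'w'."""
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))


def welsh_key(word: str):
    """Simplified Welsh sort key (a sketch, not a full collation)."""
    w = strip_accents(word).casefold()
    if w.startswith("ng"):
        # Assumption: an initial 'ng' is a mutated 'g', so file it under 'g'.
        w = w[1:]
    ranks = []
    i = 0
    while i < len(w):
        pair = w[i:i + 2]
        if pair in DIGRAPHS:  # greedy: prefer the digraph reading
            ranks.append(RANK[pair])
            i += 2
        else:
            # Characters outside the alphabet sort after it, by code point.
            ranks.append(RANK.get(w[i], len(WELSH_ALPHABET) + ord(w[i])))
            i += 1
    return (tuple(ranks), word)


if __name__ == "__main__":
    files = ["file10", "File9", "file1"]
    print(sorted(files))                   # ['File9', 'file1', 'file10']
    print(sorted(files, key=natural_key))  # ['file1', 'File9', 'file10']

    words = ["lun", "llan", "fy", "fforc"]
    print(sorted(words))                   # ['fforc', 'fy', 'llan', 'lun']
    print(sorted(words, key=welsh_key))    # ['fy', 'fforc', 'lun', 'llan']
```

Plain `sorted` is the "ASCIIbetical" case: uppercase before lowercase, '10' before '9', and 'll' before 'lu'. Both keys return the original string as a final tie-breaker so that the ordering stays deterministic.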