
On 2018-09-27 12:55 AM, D. Hugh Redelmeier via talk wrote:
> On the other hand, UTF-8 is UTF-8, whether you are in US or CA locale. So the different behaviours between the two UTF-8 locales would seem to be a bug.

The Mac I tested this on used the same CA locale as my Linux box. It still failed on the Mac. The issue is more likely to be that the Mac OS 'tr' is a BSD version, and the Linux one is GNU.

Mac OS's command line suite is a mish-mash of sources and versions. Their tr is marked BSD, from 2005. Their sed (which also requires valid UTF-8 byte streams) is from FreeBSD circa 2004. Mac OS awk is bwk's "One True awk" (which doesn't seem to care if a byte stream is valid or not), but a couple of versions behind current.

Linux distros tend to be more homogeneous. The only common difference I've found is that Debian tends to prefer mawk (it's much faster) while others ship with gawk (it has better - but still limited - UTF-8 support). There's still enough difference between the two awks that it can trip you up on edge-case input data. Or more likely, it's tripped *me* up a couple of times: the rest of you will know what you're doing.

cheers,
 Stewart
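
Since some of these tools insist on valid UTF-8 byte streams and others don't care, it can help to check the input before blaming the tool. Here is a minimal Python sketch of such a check. The function name first_invalid_utf8_offset and the sample byte strings are made up for illustration, and it only reports what Python's own decoder rejects, which may not match exactly what any particular tr or sed objects to.

```python
def first_invalid_utf8_offset(data: bytes) -> int:
    """Return the byte offset of the first invalid UTF-8 sequence, or -1 if none.

    Hypothetical helper: it relies on Python's strict UTF-8 decoder.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte the decoder rejected
        return err.start
    return -1


# Illustrative samples (assumptions, not data from the thread):
samples = {
    "utf-8 e-acute": b"caf\xc3\xa9",       # valid two-byte sequence
    "latin-1 e-acute": b"caf\xe9 au lait",  # lone 0xE9 is not valid UTF-8
}

for label, raw in samples.items():
    offset = first_invalid_utf8_offset(raw)
    status = "valid" if offset < 0 else f"invalid at byte {offset}"
    print(f"{label}: {status}")
```

Running the same check over a real input file (opened in binary mode) on both machines would at least show whether the data itself is the problem, rather than the locale.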