tr: Illegal byte sequence

I wrote a random password generator shell script, the core of which is this one-liner:

    dd if=/dev/urandom bs=1 count=256 2>/dev/null | tr -dc 'A-Za-z0-9!@$%^&*(){}[]=+-_/?\|~`' | head -c 32

The very ugly string 'A-Za-z0-9!@$%^&*(){}[]=+-_/?\|~`' is the set of ALLOWED values. In the script the two counts are replaced by variables; the first, 'count=', needs to be a lot bigger than the final '-c <number>', which is the length of the generated password. The size difference is necessary because 'tr' throws away a lot of values.

I've never had a problem with this on Linux, but on a Mac under some circumstances we get:

    tr: Illegal byte sequence

My coworker, who's also using the script, always got that error. It seems to come down to locale settings. Mine by default are:

    $ locale
    LANG="en_CA.UTF-8"
    LC_COLLATE="en_CA.UTF-8"
    LC_CTYPE="en_CA.UTF-8"
    LC_MESSAGES="en_CA.UTF-8"
    LC_MONETARY="en_CA.UTF-8"
    LC_NUMERIC="en_CA.UTF-8"
    LC_TIME="en_CA.UTF-8"
    LC_ALL=

My coworker's settings are:

    LANG="en_US.UTF-8"
    LC_COLLATE="en_US.UTF-8"
    LC_CTYPE="en_US.UTF-8"
    LC_MESSAGES="en_US.UTF-8"
    LC_MONETARY="en_US.UTF-8"
    LC_NUMERIC="en_US.UTF-8"
    LC_TIME="en_US.UTF-8"
    LC_ALL="en_US.UTF-8"

A reliable fix (so far ...):

    $ export LC_CTYPE=C
    $ export LC_ALL=C
    $ dd if=/dev/urandom bs=1 count=256 2>/dev/null | tr -dc 'A-Za-z0-9!@$%^&*(){}[]=+-_/?\|~`' | head -c 32
    z%V;d9uZfWLTgsT*J]Bz`mAmA

I'd really like to understand what the problem is, why 'tr' barfs, and what the 'locale' settings have to do with this. Thanks.

(Should anyone have arguments against this as a method of password generation, I'll entertain those too. And yes, I'm aware of 'apg' but it's not readily available for Mac and this is much lighter weight.)

-- Giles https://www.gilesorr.com/ gilesorr@gmail.com

I put a base64 after dd, and cut in place of head. Never had any issue...

On 2018-09-26 10:43 AM, Giles Orr via talk wrote:
> I'd really like to understand what the problem is, why 'tr' barfs, and what the 'locale' settings have to do with this. Thanks.

tr on Mac OS seems to assume input is valid UTF-8 text (if locale is suitably UTF-8). You can set your tr string to something trivial and it still barfs:

    dd if=/dev/urandom bs=1 count=256 2>/dev/null | tr -dc 'A-Za-z0-9' | head -c 32

A portable hack might be to use iconv to say that the input is an 8-bit charset:

    dd if=/dev/urandom bs=1 count=256 2>/dev/null | iconv -f ISO-8859-1 | tr -dc 'A-Za-z0-9!@$%^&*(){}[]=+-_/?\|~`' | head -c 32

cheers, Stewart
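
A small sketch of why the iconv step helps, assuming a POSIX shell and an iconv that exits non-zero on invalid input (glibc's does). The helper name is_valid_utf8 is made up for illustration, and the explicit -t UTF-8 is an addition here; the command above leaves the target to the locale default.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if stdin is valid UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# A lone 0x80 byte (octal \200) is a continuation byte with no lead byte,
# so this input is not valid UTF-8.
if printf 'a\200b' | is_valid_utf8; then
    echo "valid"
else
    echo "invalid"
fi

# Declaring the input to be ISO-8859-1 maps every possible byte to a
# character, so the stray 0x80 byte is re-encoded as the bytes c2 80.
printf '\200' | iconv -f ISO-8859-1 -t UTF-8 | od -An -tx1
```

The cost is that every byte above 0x7F doubles in size on the way through, which does no harm here because tr throws those characters away anyway.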

| From: Stewart C. Russell via talk <talk@gtalug.org>

| tr on Mac OS seems to assume input is valid UTF-8 text (if locale is
| suitably UTF-8).

To amplify this, not all byte sequences are valid UTF-8. Random byte sequences will sometimes be invalid.

Off the top of my head, I think that the following are invalid:

- A continuation byte (0x80 to 0xBF) not preceded by a lead byte
- A string ending partway through a multi-byte sequence
- A lead byte followed by fewer continuation bytes than it announces
- Any of the bytes 0xC0, 0xC1, or 0xF5 to 0xFF

Each ASCII character is a single byte without the high bit on. Every other character is a lead byte (0xC2 to 0xF4) followed by one to three continuation bytes, all with the high bit on. The payload bits are concatenated to form the UTF-32 value. Overlong encodings and values above U+10FFFF are forbidden.

On the other hand, UTF-8 is UTF-8, whether you are in US or CA locale. So the different behaviours between the two UTF-8 locales would seem to be a bug. (In theory, collating sequences could be different so ranges in tr could be different, but I would not see that affecting the ASCII subset you are using in your ranges.)

Using C locale should give you 8-bit characters, not UTF-8. So it should work.

This (untested) small change to Giles' script should work.

    dd if=/dev/urandom bs=1 count=256 2>/dev/null | LC_ALL=C tr -dc 'A-Za-z0-9!@$%^&*(){}[]=+-_/?\|~`' | head -c 32

LC_ALL might be overkill. I don't know.

I'd probably add an echo to put a newline at the end.
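
A minimal sketch of that suggestion, assuming a POSIX shell plus a head that accepts -c (as the original one-liner already does). The function name gen_pw and the 8x over-read factor are made up for illustration; the factor simply mirrors the 256-to-32 ratio in the original. LC_ALL is used rather than LC_CTYPE because an LC_ALL already set in the environment, as in the coworker's settings, takes precedence over LC_CTYPE. One deliberate change from the original set: there, '+-_' is read by tr as a range from '+' to '_', which also admits characters such as , . : ; < and >, so the '-' is moved to the end where it is literal.

```shell
#!/bin/sh
# In the C locale tr sees plain bytes, so a stray 0x80 byte (octal \200)
# is simply deleted instead of triggering "Illegal byte sequence".
printf 'a\200b' | LC_ALL=C tr -dc 'a-z'; echo    # prints "ab"

# Hypothetical wrapper around the original one-liner. LC_ALL=C applies to
# the tr command only, so the rest of the session keeps its own locale.
gen_pw() {
    length=${1:-32}
    bytes=$((length * 8))   # over-read, since tr discards most random bytes
    dd if=/dev/urandom bs=1 count="$bytes" 2>/dev/null |
        LC_ALL=C tr -dc 'A-Za-z0-9!@$%^&*(){}[]=+_/?\|~`-' |
        head -c "$length"
    echo                    # trailing newline, as suggested above
}

gen_pw 32
```

Setting the variable on the command line of tr alone avoids the side effects of exporting LC_ALL=C for the whole shell.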

On 2018-09-27 12:55 AM, D. Hugh Redelmeier via talk wrote:
> On the other hand, UTF-8 is UTF-8, whether you are in US or CA locale. So the different behaviours between the two UTF-8 locales would seem to be a bug.

The Mac I tested this on used the same CA locale as my Linux box. It still failed on the Mac. The issue is more likely to be that Mac OS 'tr' is a BSD version, and the Linux one is GNU.

Mac OS's command line suite is a mish-mash of sources and versions. Their tr is marked BSD, from 2005. Their sed (which also requires valid UTF-8 byte streams) is from FreeBSD circa 2004. Mac OS awk is bwk's "One True awk" (which doesn't seem to care if a byte stream is valid or not), but a couple of versions behind current.

Linux distros tend to be more homogeneous. The only difference I've found that's common is that Debian tends to prefer mawk (it's much faster) while others ship with gawk (it has better - but still limited - UTF-8 support). There's still enough difference between the two that it can trip you up on edge-case input data. Or more likely, it's tripped *me* up a couple of times: the rest of you will know what you're doing.

cheers, Stewart

On 26/09/18 10:43, Giles Orr via talk wrote:
> I wrote a random password generator shell script, the core of which is this one-liner:
> dd if=/dev/urandom bs=1 count=256 2>/dev/null | tr -dc 'A-Za-z0-9!@$%^&*(){}[]=+-_/?\|~`' | head -c 32

If semi-random 32 (or n) character passwords are what you're after, pwgen should work on Linux and macOS:

    pwgen -s -y 32 1
    f.,,H%+IMpQ-yDG+W'5'+AmjU$CcF*ZK

That said, if that's a password for a human, I pity the person who has to type it.

What are you using passwords like that for, as opposed to some kind of key based auth?

Cheers, Jamon

On 09/27/2018 09:29 AM, Jamon Camisso via talk wrote:
> f.,,H%+IMpQ-yDG+W'5'+AmjU$CcF*ZK
> That said, if that's a password for a human, I pity the person who has to type it.

What??? You mean you haven't memorized it? ;-)

> What are you using passwords like that for, as opposed to some kind of key based auth?

I use that sort of password for WiFi. However, I use the Perfect Passwords from www.grc.com. They have 63 random character strings just for that purpose. Here's an example:

    "57,%Y9N<Ure}tgrJO[7DS;NElk~/\"mxPyE1BB#,n!so%sl/j6[0JS*R_Db(Yx

On 27/09/18 09:35, James Knott via talk wrote:
> On 09/27/2018 09:29 AM, Jamon Camisso via talk wrote:
> > f.,,H%+IMpQ-yDG+W'5'+AmjU$CcF*ZK
> > That said, if that's a password for a human, I pity the person who has to type it.
> What??? You mean you haven't memorized it? ;-)
> > What are you using passwords like that for, as opposed to some kind of key based auth?
> I use that sort of password for WiFi. However, I use the Perfect Passwords from www.grc.com. They have 63 random character strings just for that purpose.
> Here's an example: "57,%Y9N<Ure}tgrJO[7DS;NElk~/\"mxPyE1BB#,n!so%sl/j6[0JS*R_Db(Yx

Doesn't seem worth the hassle for short sequences.

GRC sequence:

    echo '"57,%Y9N<Ure}tgrJO[7DS;NElk~/\"mxPyE1BB#,n!so%sl/j6[0JS*R_Db(Yx' | ent | grep Entropy
    Entropy = 5.468750 bits per byte.

pwgen sequence:

    pwgen -s -y 63 1 | ent | grep Entropy
    Entropy = 5.538910 bits per byte.

Negligible difference, and FWIW Ted Ts'o wrote pwgen.

Cheers, Jamon
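
For context on those figures, a rough sketch of the ceilings involved, assuming ent reports the Shannon entropy of the byte frequencies in its input. The helper name log2 is made up for illustration. A sample of n bytes can score at most log2(n) bits per byte, so 63 characters plus a trailing newline cannot exceed 6 bits, and both results sit close to that limit. A generator drawing uniformly from 94 printable characters tops out near 6.55 bits per character however long the sample is.

```shell
#!/bin/sh
# Hypothetical helper: log2 of a count, via awk since sh has no floating
# point. LC_ALL=C keeps the decimal separator a '.' in every locale.
log2() {
    LC_ALL=C awk -v n="$1" 'BEGIN { printf "%.2f\n", log(n) / log(2) }'
}

log2 64   # ceiling for a 64-byte sample (63 characters plus a newline)
log2 94   # ceiling per character for a uniform 94-symbol alphabet
```

A measurement on a single short string therefore says little about the generator behind it; a much longer sample would be needed to tell two sources apart this way.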

On 2018-09-27 09:46 AM, Jamon Camisso via talk wrote:
> Negligible difference, and FWIW Ted Ts'o wrote pwgen.

The big difference, though, is that pwgen isn't installed by default under Mac OS¹, and Giles's original approach was intended to be portable. Installing packages is a huge hurdle for many users.

Stewart

----
¹: and, to be fair, neither is it installed by default under Ubuntu.

The locale indirectly controls the character encoding in the shell. That is the reason why the locale settings have to do with this. I may be wrong, but I believe the shell on the Mac is hardcoded with a specific character encoding, probably 7-bit ASCII. Try changing your count to 128.

Bill

If Mac has recent Bash, then you could probably use the $RANDOM variable, which picks a number from 0-32767 every time you read it. Off the top of my head,

    for i in $(seq 32); do
        printf '%x' $((RANDOM % 94 + 33))
    done | xxd -r -ps

That will give you the full 94 character range you want.

-- William Park <opengeometry@yahoo.ca>

Thanks to everyone who responded - this has been very helpful.

So it seems that this _is_ locale-related: Hugh's explanation points out that not all byte sequences are valid under UTF-8, and that would break 'tr'. Thus the change to 'C' fixing the problem. That really helped me understand the problem, thanks.

I want a command line solution, so GRC's website doesn't work for me. And I don't think it's a good idea to take a password from another source: it's unlikely GRC stores generated passwords and then tries to hack the associated IP or web browser with it, but isn't it better to do this yourself so only you know the outputted password?

As for 'pwgen', it has precisely the same problem as 'apg' - it's not installed by default, as Stewart mentioned.

Someone asked what these passwords are used for. We have to create accounts on many services (most of which don't support any authentication method except passwords) and give those accounts to other people to use. It's my intent that the recipient should change the password to something more to their liking. But many people don't: they just let their web browser memorize the password and then let us reset the password when they "forget" it by changing browsers. At least this way I know they have a relatively random and secure password to start with, usually much better than what they would have changed it to.

-- Giles https://www.gilesorr.com/ gilesorr@gmail.com

> for i in $(seq 32); do printf '%x' $((RANDOM % 94 + 33)); done | xxd -r -ps

Even more portable would be

    echo -e $(for i in $(seq 32); do printf '\\x%x' $((RANDOM % 94 + 33)); done)

-- William Park <opengeometry@yahoo.ca>

On 2018-10-02 12:44 AM, William Park via talk wrote:
> Even more portable would be
> echo -e $(for i in $(seq 32); do printf '\\x%x' $((RANDOM % 94 + 33)); done)

It might be more portable, but bash's $RANDOM comes from a very simple pseudorandom number generator, where Giles's solution uses /dev/urandom. There's also a bit of modulo bias in the selection method.

cheers, Stewart
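
To put a number on the modulo bias: $RANDOM has 32768 possible values and 32768 = 348 x 94 + 56, so 56 of the 94 characters get 349 chances each while the other 38 get 348. One way around both objections is sketched below, assuming a POSIX shell with od and /dev/urandom available. The function name gen_password is made up for illustration, and this is rejection sampling swapped in for the modulo step, not anyone's posted method.

```shell
#!/bin/sh
# Hypothetical helper: print N characters drawn uniformly from the 94
# printable ASCII characters (codes 33..126), reading /dev/urandom.
gen_password() {
    n=$1
    out=""
    while [ "${#out}" -lt "$n" ]; do
        # Read one random byte as an unsigned decimal number (0..255).
        byte=$(od -An -N1 -tu1 /dev/urandom | tr -d ' \n')
        # 188 = 2 * 94 is the largest multiple of 94 that fits in a byte.
        # Discarding 188..255 leaves every residue mod 94 equally likely.
        if [ "$byte" -lt 188 ]; then
            code=$((byte % 94 + 33))
            # Turn the code into its character via a printf octal escape.
            out="$out$(printf "\\$(printf '%03o' "$code")")"
        fi
    done
    printf '%s\n' "$out"
}

gen_password 32
```

It forks several processes per character, so it is slow compared with the dd and tr pipeline, but for a 32-character password that hardly matters.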

participants (8)
- Bill Thanis
- D. Hugh Redelmeier
- Giles Orr
- James Knott
- Jamon Camisso
- Mauro Souza
- Stewart C. Russell
- William Park