Debian has suddenly become unstable
I have a Debian 12 system that's my daily driver. In the last two days, it crashed twice when I was away from the keyboard and nothing was happening (around the same time of day now that I think about it). The system has previously been very stable, usually up for a month at a time with reboots only to pick up new kernels. I should note that when I turned it on and ran upgrades on Monday after a week away, it upgraded a lot of packages for Debian release 12.2.

I'm not great at debugging Linux crashes. The `dmesg` command is useless, as it only shows the log since the last boot. So I turned to /var/log/syslog. What I noticed was this, the only line of consequence about a millisecond before the reboot:

2023-10-10T11:36:23.839046-04:00 sli7d systemd-modules-load[399]: Inserted module 'lp'

I don't have a printer, and I hadn't just done a "print-to-PDF" or anything like that - the machine had been idle for a couple hours. This morning it crashed again, and milliseconds before the crash I found these (again, the machine was idle when this happened):

2023-10-11T12:20:54.647048-04:00 sli7d systemd-modules-load[382]: Inserted module 'lp'
2023-10-11T12:20:54.647254-04:00 sli7d systemd-modules-load[382]: Inserted module 'ppdev'
2023-10-11T12:20:54.647280-04:00 sli7d systemd-modules-load[382]: Inserted module 'parport_pc'
2023-10-11T12:20:54.647290-04:00 sli7d lvm[372]: 3 logical volume(s) in volume group "primary" monitored
2023-10-11T12:20:54.647302-04:00 sli7d systemd[1]: Starting systemd-journal-flush.service - Flush Journal to Persistent Storage...
2023-10-11T12:20:54.647312-04:00 sli7d systemd-udevd[398]: Using default interface naming scheme 'v252'.

I just ran `apt full-upgrade` (right now) and watched it upgrade Samba. Is it possible that Samba was triggering "lp"-related stuff which was causing the crash? Although why it would cause a crash I don't know. No new kernel (and thus no new modules). I suppose I could reboot and select an older kernel and see if that was stable ...

Suggestions on how to better debug this would be most welcome. Does blacklisting the "lp" module sound like a good idea? Any other ideas?

Re-installing would be ... unpleasant. This is my primary machine and heavily tweaked-up. But I guess I'll do that if I have to. Keeping it as a last resort though (daily crashes would get me there!).

--
Giles
https://www.gilesorr.com/
gilesorr@gmail.com
Define crashing :) Is it simply rebooting, freezing, or shutting down?

I would try to use journalctl to check logs, as detailed in the keyboard thread. Normally the kernel should trace the oops/fault before crashing.

An alternative would be to boot an older kernel (it should still be installed) and see if that is stable.
Giles Orr via talk wrote on 2023-10-11 15:30:
debugging Linux crashes. The `dmesg` command is useless, as it only shows the log since the last boot.
The tool for inspecting previous boot logs would be:
## Logs from *previous* boot for `lp` and `cups`:
`journalctl --boot -1 --unit lp --unit cups`
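A few related `journalctl` invocations can be wrapped up as below. This is only a sketch: the function names are made up for illustration, and it assumes the journal is kept on disk so that earlier boots are still available (the "Flush Journal to Persistent Storage" line in the syslog excerpt suggests it is). The options used (`--list-boots`, `--boot`, `--dmesg`, `--priority`, `--no-pager`) are standard `journalctl` options.

```shell
# Sketch: helpers around journalctl for looking at earlier boots.
# Function names are illustrative, not part of any package.

# List the boots the journal knows about (offset 0 is the current boot).
list_boots() {
    journalctl --list-boots
}

# Show kernel messages (what dmesg would have shown) from an earlier boot.
# $1 is the boot offset: -1 is the previous boot, -2 the one before, etc.
prev_boot_kernel_log() {
    journalctl "--boot=${1:--1}" --dmesg --no-pager
}

# Show only messages of priority "err" or worse from an earlier boot.
prev_boot_errors() {
    journalctl "--boot=${1:--1}" --priority err --no-pager
}
```

For example, `prev_boot_kernel_log -2 | tail -n 50` would show the last kernel messages recorded two boots ago, which is where an oops or fault would normally appear if the kernel managed to log one.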
I noticed was this, the only line of consequence about a millisecond before the reboot: 2023-10-10T11:36:23.839046-04:00 sli7d systemd-modules-load[399]: Inserted module 'lp'
I don't have a printer, and I hadn't just done a "print-to-PDF" or anything like that
One thought might be to disable cups, the Common UNIX Printing System (`journalctl disable --now cups`), and see if that helps...
Is it possible that Samba was triggering "lp"-related stuff which was causing the crash?
That's possible - I can't recall much about samba, but maybe look into whether any printers are being shared and disable that feature.
I suppose I could reboot and select an older kernel and see if that was stable ... Suggestions on how to better debug this would be most welcome. Does blacklisting the "lp" module sound like a good idea?
Those also sound like good ideas. Good luck.
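Before blacklisting, it may help to know why `lp` is being loaded at all. `systemd-modules-load` loads the modules named in modules-load.d configuration files early in boot, so one approach is to look for the file that names the module. The sketch below does that; the function name is hypothetical, and the default directories are the ones documented in modules-load.d(5). Which package owns the file it finds is something to check on the actual system.

```shell
# Sketch: find which modules-load.d file asks for a module at boot.
# who_loads_module is an illustrative name, not an existing tool.
who_loads_module() {
    module="$1"
    shift
    # Use the directories given as extra arguments, else the standard ones.
    if [ "$#" -eq 0 ]; then
        set -- /etc/modules-load.d /run/modules-load.d /usr/lib/modules-load.d
    fi
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -x matches whole lines and -F matches literally, so "lp" does not
        # match "parport_pc". Finding nothing is not treated as an error.
        grep -l -x -F -- "$module" "$dir"/*.conf 2>/dev/null || true
    done
}
```

Running `who_loads_module lp` prints the path of each configuration file that lists `lp`; `dpkg -S` on that path would then show which package installed it.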
Why not switch to booting debian from a USB stick ??

* * * * * *

Almost one (1) year ago, I ruined my debian 9 HDD installation by naively running a "fix broken packages" command. This deceitfully named command deleted a huge number of packages and left me with a bootable but basically useless debian system on the HDD. Command shell but no GUI.

* * * * * *

After struggling with trying to build a list of all the deleted packages, I abandoned that strategy. I was afraid just to re-install debian 9 to the hard drive, out of fear of damaging all the precious user data on the same hard drive. Besides, debian 9 had become increasingly annoying because some packages refused to install, reporting that the libc6 version was obsolete.

So, instead of re-installing debian 9 on the hard drive, I made a bootable debian 11 live USB memory stick. Since then, I have been running debian 11 from this USB stick. It's a little slow to boot, but this debian 11 system has only crashed maybe three (3) times since then, and I use this debian 11 every day for many hours.

* * * * * *

Here are details of how I implemented this bootable debian 11 live USB memory stick:

(0) KEY POINT: The broken debian 9 system was STILL BOOTABLE and it still gave me a command shell interface, but no GUI.

(1) downloaded a debian live iso image using a different PC (ancient Windows XP):
debian-live-11.5.0-amd64-gnome+nonfree.iso

(2) copied this .iso onto a USB stick on the WinXP PC, inserted the USB stick into a USB port on the wounded debian 9 PC, and copied the .iso to a folder on the debian 9 HDD.

(3) unmounted the USB stick from the debian 9 file system:
sudo umount /media/sdb1
sudo umount /dev/sdb

(4) created the bootable debian 11 live USB stick:
sudo dd if=<path to .iso on hdd> of=/dev/sdb bs=4M conv=fdatasync status=progress
Takes a while to run, and may seem like it isn't doing anything, but it does work.

(5) motherboard BIOS settings:
-- turn off secure boot;
-- do NOT use csm;
-- make USB first boot device (some motherboards do NOT support USB boot, luckily my Asus board does);

(6) probably need to insert the USB boot stick into the first USB port (I always leave mine in the same front panel USB port)

(7) power up / reset the PC, and the PC should boot from the USB stick; mine takes a couple of minutes to boot up to a debian 11 gnome desktop.

* * * * * *

TIP: My usb stick has a red LED that flashes when it is being accessed, and otherwise is steady on. This LED is very helpful in indicating that the USB is booting. It also flashes frequently during debian 11 operation e.g. while starting the firefox browser.

DOWNSIDE: apt-get package installations seem to disappear after I shut down the debian 11. So I have to re-install whenever I want to run a package. I could probably figure out how to install packages to the HDD but haven't bothered with this yet.

* * * * * *

This easy-sleazy fix for my ruined debian 9 may seem like an obscene hack to linux purists, but hey, it works for me. I have way too many more interesting things to do with my remaining time alive on this sorry planet, than to invest time in being a perfect debian user. And besides, when time comes to upgrade to debian 12, I can easily make a USB stick for that too.

Steve Petrie
apetrie@aspetrie.net
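In the USB-stick steps, step (3) matters because `dd` in step (4) overwrites the whole device, and doing that while a partition is still mounted can corrupt the result. A minimal sketch of a guard follows. `device_is_mounted` is a hypothetical helper, `/dev/sdb` is taken from the steps above and may differ on another machine, and the image name in the commented example is a placeholder.

```shell
# Sketch: refuse to overwrite a device that still has mounted partitions.
# device_is_mounted is a hypothetical helper; /proc/mounts is the kernel's
# list of current mounts (one per line, mount source in the first field).
device_is_mounted() {
    dev="$1"
    mounts="${2:-/proc/mounts}"
    # Succeeds if any mount source starts with the device name, so that
    # /dev/sdb matches both /dev/sdb itself and partitions like /dev/sdb1.
    awk -v dev="$dev" '
        index($1, dev) == 1 { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$mounts"
}

# Example guard before the dd step (device and image path are placeholders):
# if device_is_mounted /dev/sdb; then
#     echo "/dev/sdb is still mounted; unmount it first" >&2
# else
#     sudo dd if=debian-live.iso of=/dev/sdb bs=4M conv=fdatasync status=progress
# fi
```

The prefix match is deliberately conservative: it may report a device as mounted when only a similarly named device is, which errs on the side of not writing.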
BCLUG via talk wrote on 2023-10-11 20:06:
One thought might be to disable cups (`journalctl disable --now cups`)
Did you mean 'systemctl disable --now cups' ???
--Bob.
| From: Bob Jonkman via talk <talk@gtalug.org>

| > One thought might be to disable cups (`journalctl disable --now cups`)
|
| Did you mean 'systemctl disable --now cups' ???

I often make that mistake. Although the names are logical, there is some kind of cognitive trap here.

From a human factors standpoint, it would be better if they didn't look alike. Of course that is less logical and elegant.

I'd also prefer "journalctl" to be shorter to type. Also, ctl seems to be redundant: what other command can you give the journal? On my system, "jou" TAB completes to "journalctl".

================

The --now flag is a feature that I've wanted but never noticed. Thanks! I wonder if I'll remember when I need it.

The naming is not too good since the requested action was always immediate. --now asks for a different but related action.

I've previously daydreamed about a syntax for systemctl which would allow a sequence of COMMANDS in a single invocation just as a list of UNITs is allowed.

systemctl [OPTIONS...] COMMAND [UNIT...]

For example:

systemctl enable,start,status dovecot
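The comma-separated syntax is not something `systemctl` supports, but it can be approximated with a small shell function. The sketch below is one way to do it; `sctl` is a made-up name, and the choice to stop at the first failing command is an assumption about the desired behaviour.

```shell
# Sketch of the daydreamed syntax as a shell function. "sctl" is a
# hypothetical name; systemctl itself has no comma-separated command form.
sctl() {
    rest="$1"
    shift
    while [ -n "$rest" ]; do
        cmd="${rest%%,*}"              # text before the first comma
        case "$rest" in
            *,*) rest="${rest#*,}" ;;  # drop that command and its comma
            *)   rest="" ;;            # that was the last command
        esac
        # Run the command against every unit given; stop on the first failure.
        systemctl "$cmd" "$@" || return
    done
}
```

With this defined, `sctl enable,start,status dovecot` runs `systemctl enable dovecot`, then `systemctl start dovecot`, then `systemctl status dovecot`.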
D. Hugh Redelmeier via talk wrote on 2023-10-12 08:02:
I'd also prefer "journalctl" to be shorter to type. Also, ctl seems to be redundant: what other command can you give the journal? On my system, "jou" TAB completes to "journalctl".
Yeah, the length of the names, especially `systemctl` having competing tab completions up to `systemc` has always been a bit of a PITA. I really ought to set up an alias, but I'm too lazy, so suffer through typing the full command. Maybe this'll prompt me to make an alias, *finally*.
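For what it's worth, the alias itself is short. A sketch for `~/.bashrc` follows; the names `sc` and `jc` are arbitrary choices, not standard ones, and should be checked against anything already installed under those names.

```shell
# Sketch for ~/.bashrc: short names for the two commands.
# "sc" and "jc" are arbitrary choices, not standard names.
alias sc='systemctl'
alias jc='journalctl'
```

One caveat: tab completion of subcommands and unit names may not carry over to the alias without extra completion setup.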
| From: Giles Orr via talk <talk@gtalug.org>

| I have a Debian 12 system that's my daily driver. In the last two
| days, it crashed twice when I was away from the keyboard and nothing
| was happening (around the same time of day now that I think about it).

| I'm not great at debugging Linux crashes. The `dmesg` command is
| useless, as it only shows the log since the last boot.

I think that everything in dmesg goes to the Journal.

Have you compared journal entries from just before each crash to see if there is a common theme?

Good luck!
On Thu, 12 Oct 2023 at 11:09, D. Hugh Redelmeier via talk <talk@gtalug.org> wrote:
| From: Giles Orr via talk <talk@gtalug.org>
| I have a Debian 12 system that's my daily driver. In the last two
| days, it crashed twice when I was away from the keyboard and nothing
| was happening (around the same time of day now that I think about it).

| I'm not great at debugging Linux crashes. The `dmesg` command is
| useless, as it only shows the log since the last boot.
I think that everything in dmesg goes to the Journal.
Have you compared journal entries from just before each crash to see if there is a common theme?
Good luck!
Thanks to BCLUG for `journalctl --boot -1` (and I assume `-2` etc.). That's a blessing. I ended up running `systemctl disable cups.service cups.socket cups.path` and `systemctl stop cups.service cups.socket cups.path` - so more or less what you were trying to suggest. :-) The system didn't crash yesterday, so that's good.

Hugh seems to be correct: I think everything in `dmesg` ends up in the journal. But what I find interesting is that not everything in /var/log/syslog is in the journal. I was comparing the syslog entries, and that's how I concluded CUPS/printer drivers were the problem. The loading of those modules wasn't mentioned in the journal. In this case, it seems that the information I most needed (i.e. "this is a printer driver problem") came from syslog.

This is a negative test case - i.e. I don't know it's solved, and won't ever be certain. Unless it crashes again, then I know it's not solved. Ugh.

Thanks everyone.

--
Giles
https://www.gilesorr.com/
gilesorr@gmail.com
| From: Giles Orr via talk <talk@gtalug.org>

| Hugh seems to be correct: I think everything in `dmesg` ends up in the
| journal. But what I find interesting is that not everything in
| /var/log/syslog is in the journal.

That doesn't match my (possibly unreliable) model of The Way Things are Supposed to Be.

I think:

- the syslog facility has been replaced by the journald facility

- all the APIs for syslog now go to journald.

- That may require rebuilding old packages. (Not changing, just rebuilding, so that the newer libraries get linked.)

- I would expect that the new stable debian had rebuilt all packages

Any of these beliefs could be wrong.

| I was comparing the syslog
| entries, and that's how I concluded CUPS/printer drivers were the
| problem. The loading of those modules wasn't mentioned in the
| journal. In this case, it seems that the information I most needed
| (i.e. "this is a printer driver problem") came from syslog.

So modifying my earlier question, can you see a commonality in the syslog entries just before each crash?

| This is a negative test case - i.e. I don't know it's solved, and won't
| ever be certain. Unless it crashes again, then I know it's not
| solved. Ugh.

Yeah. And you won't know when it becomes safe to run CUPS again.

The best way of gaining confidence would be if a bug fix were released. But you don't even know the bug. It doesn't even sound clear enough to report.
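One way to look for a commonality in the syslog entries before each crash is to print the few lines that precede every boot. The sketch below does this with `awk`. It assumes the first kernel message of each boot contains "Linux version" (as kernel boot logs normally do), and the function name and default of 5 lines are illustrative. Lines appear in the order the syslog daemon wrote them, which around boot time may not be strictly chronological, so the output is a starting point rather than proof.

```shell
# Sketch: print the last N syslog lines written before each boot.
# Assumption: the first kernel message of a boot contains "Linux version",
# so the lines just before it are the tail end of the previous boot.
# lines_before_boot is an illustrative name; N defaults to 5 and must be >= 1.
lines_before_boot() {
    file="$1"
    count="${2:-5}"
    awk -v n="$count" '
        / kernel: .*Linux version / {
            if (NR > 1) {
                print "---- before boot logged at line " NR " ----"
                start = NR - n
                if (start < 1) start = 1
                # Replay the ring buffer, oldest line first.
                for (i = start; i < NR; i++) print buf[i % n]
            }
        }
        # Remember the last n lines in a ring buffer indexed by line number.
        { buf[NR % n] = $0 }
    ' "$file"
}
```

For example, `lines_before_boot /var/log/syslog 10` would show the ten lines logged before each boot recorded in that file, making it easier to compare what preceded each crash.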
participants (6)
- Aurelian Melinte
- BCLUG
- Bob Jonkman
- D. Hugh Redelmeier
- Giles Orr
- Steve Petrie