
| From: o1bigtenor via talk <talk@gtalug.org>

| My server has been operational for about a year and I am working on a
| number of different projects on it. Twice now (this last friday and 5
| weeks early I came into the office to find that the server has somehow
| been taken down and has rebooted itself (process setup in the bios)
| but as it doesn't quite complete the boot process, I have to hit a key
| to tell it to continue and then finally to log in to read Debian
| (stable).
|
| So I am trying to determine what may have caused the system to do a
| reboot,

Often a crash prevents logging. Clearly, logging the crash would have to
happen after the crash, something that isn't easy when the system has
crashed. But there is some hope.

Do you have a working UPS? I don't, and I lose power a few times a year.
That knocks out my computers (and clocks everywhere).

Aside: all device classes evolve to have enough intelligence to have
clocks that need setting, and then evolve to be networked to set their
own clocks. The timing of these steps is not fixed.

Can you believe that I grew up with phones that had no clock? The first
small computers I used had no clocks. The big ones did, so that IBM could
charge for the time that they were used (e.g. one used to rent machines
and have to pay overtime if they worked more than one shift). CP/M's
file system didn't have timestamps (they were added long after I moved
on). MS-DOS stupidly used local time for timestamps, even though UNIX
got it right (used UTC) before MS-DOS.

| AIUI servers should be
| able to run happily for years without issues (barring hardware
| problems) so I want that kind of reliability. Where in /var/log will I
| be finding the most clues as to the events that lead up to this
| 'reboot'?

Not being a Debian user, I don't know which files are most useful. If
you are using systemd, you might find that journalctl is the command you
need. You could look at all the files in /var/log (you can skip the ones
which haven't changed recently).
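
To narrow down which files under /var/log are worth reading, something like the following might help. It is only a rough sketch in Python: the function name `recently_changed` and the 7-day window are my own choices, not anything Debian provides, and it only lists files that the invoking user is allowed to stat.

```python
import os
import time


def recently_changed(log_dir="/var/log", max_age_days=7, now=None):
    """Return (mtime, path) pairs for files modified within max_age_days,
    newest first. The name and defaults are illustrative assumptions."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    hits = []
    for root, _dirs, files in os.walk(log_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                continue  # unreadable, vanished, or a dangling symlink
            if mtime >= cutoff:
                hits.append((mtime, path))
    hits.sort(reverse=True)  # newest first
    return hits


if __name__ == "__main__":
    for mtime, path in recently_changed():
        stamp = time.strftime("%Y-%m-%d %H:%M", time.localtime(mtime))
        print(stamp, path)
```

Run it as root to see everything. On a systemd machine, `journalctl --list-boots` and `journalctl -b -1` show the boots the journal knows about and the log of the previous boot, but only if the journal is kept on disk; I don't know how your system is configured in that respect.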

I don't know why your system stops at the POST page. Could it be that
your HDD doesn't spin up quickly enough for the normal boot logic?

I have one server that hangs because the EFI System Partition's
filesystem gets corrupted during a crash (oops). I think that the
problem is that the OS leaves /boot/efi mounted most of the time (that's
dumb) so the filesystem gets marked as "dirty" and the firmware doesn't
like that.
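
If you want to see whether your system also leaves /boot/efi mounted, here is a small sketch that reads the Linux mount table from /proc/mounts. The helper name `mount_options` is hypothetical. It only reports whether the path is mounted and with which options; it cannot tell you whether the firmware objects to a dirty filesystem, which is just my guess about my own machine.

```python
def mount_options(mount_point, mounts_text):
    """Return the option list for mount_point from /proc/mounts-style text,
    or None if it is not mounted. The last matching line wins, since later
    mounts cover earlier ones."""
    found = None
    for line in mounts_text.splitlines():
        # Fields: device, mount point, fs type, options, dump, pass
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")
    return found


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        opts = mount_options("/boot/efi", f.read())
    if opts is None:
        print("/boot/efi is not mounted")
    else:
        mode = "read-write" if "rw" in opts else "read-only"
        print("/boot/efi is mounted", mode)
```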