Interesting essay on filesystem consistency

http://danluu.com/file-consistency/
I found it interesting that ReiserFS was pointed at as having amongst the best handling of errors, passing them back sensibly to applications.
Also interesting was that they pointed at using SQLite as a way of mostly hiding applications from worrying about such troubles.
Also entertaining was the notion of "MBox considered harmful"; makes me like MH and Maildir all the better, as they have the merit of storing a message per file, so that there's fewer concurrency issues surrounding re-opening files and rewriting them. (Possibly I'm wrong, and the problem just shifts to directory metadata access...)
--
When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"

On 13/12/15 10:48 PM, Christopher Browne wrote:
> http://danluu.com/file-consistency/
> I found it interesting that ReiserFS was pointed at as having amongst the best handling of errors, passing them back sensibly to applications.
> Also interesting was that they pointed at using SQLite as a way of mostly hiding applications from worrying about such troubles.
> Also entertaining was the notion of "MBox considered harmful"; makes me like MH and Maildir all the better, as they have the merit of storing a message per file, so that there's fewer concurrency issues surrounding re-opening files and rewriting them. (Possibly I'm wrong, and the problem just shifts to directory metadata access...)
> --
> When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
> --- Talk Mailing List talk@gtalug.org http://gtalug.org/mailman/listinfo/talk
Updating a mailbox is an example of using the smallest possible set of primitives from V6. It was done in such a way that exactly and only one binary did the task, and depended only on the atomicity of creat(O_EXCL), seek and write. It's surprisingly hard, much as is creating a directory entry, and is one of the examples we were told to understand when studying Unix, as they were wildly different from OS/360 and GCOS.
--dave
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net | -- Mark Twain
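The exclusive-create primitive Dave mentions can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how any particular mail program does it: the function name `append_with_dotlock`, the `.lock` suffix, and the retry parameters are all hypothetical. In current POSIX terms the exclusive step is `open()` with `O_CREAT | O_EXCL`, which fails if the file already exists, so at most one cooperating process holds the lock at a time.

```python
import os
import time


def append_with_dotlock(mbox_path, message_bytes, retries=50, delay=0.1):
    """Append one message to mbox_path while holding mbox_path + '.lock'.

    Hypothetical helper: the lock-file name and retry policy are assumptions.
    """
    lock_path = mbox_path + ".lock"
    for _ in range(retries):
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the lock file is
            # already there, so only one process can create it.
            lock_fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            break
        except FileExistsError:
            time.sleep(delay)
    else:
        raise TimeoutError("could not acquire " + lock_path)

    try:
        os.close(lock_fd)
        # O_APPEND positions every write at the current end of file.
        fd = os.open(mbox_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            data = message_bytes
            while data:
                written = os.write(fd, data)  # may write fewer bytes than asked
                data = data[written:]
            os.fsync(fd)
        finally:
            os.close(fd)
    finally:
        # Release the lock even if the append failed.
        os.unlink(lock_path)
```

The lock only protects against other writers that follow the same convention; a program that ignores the lock file can still interleave its writes.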

On Sun, Dec 13, 2015 at 10:48:09PM -0500, Christopher Browne wrote:
> http://danluu.com/file-consistency/
> I found it interesting that ReiserFS was pointed at as having amongst the best handling of errors, passing them back sensibly to applications.
It better have good error handling given how many insane errors it often generates (in my experience). It is not a place I want to store my data anymore.
> Also interesting was that they pointed at using SQLite as a way of mostly hiding applications from worrying about such troubles.
> Also entertaining was the notion of "MBox considered harmful"; makes me like MH and Maildir all the better, as they have the merit of storing a message per file, so that there's fewer concurrency issues surrounding re-opening files and rewriting them. (Possibly I'm wrong, and the problem just shifts to directory metadata access...)
Well deleting a message requires deleting a file, not rewriting the entire mbox after that message, so Maildir is a good idea.
--
Len Sorensen
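Len's point about deletion can be seen with Python's standard `mailbox` module. The sketch below is a small demonstration, not a mail server: the helper names `make_message` and `maildir_delete_demo`, the addresses, and the message contents are made up for illustration. It stores three messages in a Maildir, removes one, and returns the file listings before and after, so you can see that the other message files are left alone.

```python
import mailbox
import os


def make_message(n):
    """Build a tiny RFC 822 style message as a string (made-up content)."""
    return "From: sender@example.org\nSubject: message %d\n\nbody %d\n" % (n, n)


def maildir_delete_demo(root):
    """Store three messages in a Maildir under root, delete the middle one,
    and return the sets of file names in new/ before and after."""
    box = mailbox.Maildir(os.path.join(root, "Maildir"), create=True)
    keys = [box.add(make_message(n)) for n in range(3)]

    new_dir = os.path.join(root, "Maildir", "new")
    before = set(os.listdir(new_dir))

    # Removing a message from a Maildir deletes that message's own file;
    # the files holding the other messages are not touched.
    box.remove(keys[1])

    after = set(os.listdir(new_dir))
    return before, after
```

With mbox, by contrast, removing a message from the middle of the file means rewriting at least everything that follows it.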

Close to 20 years ago I moved from mbox to Cyrus and have never looked back. I did learn that the Ext file systems had some issues with big directories, and for a lot of years I used ReiserFS. It seems that recent Ext filesystems have the big directory problem fixed. In those early years I had Ext give me more than a few bad days.
On 12/14/2015 01:36 PM, Lennart Sorensen wrote:
> On Sun, Dec 13, 2015 at 10:48:09PM -0500, Christopher Browne wrote:
> > http://danluu.com/file-consistency/
> > I found it interesting that ReiserFS was pointed at as having amongst the best handling of errors, passing them back sensibly to applications.
> It better have good error handling given how many insane errors it often generates (in my experience). It is not a place I want to store my data anymore.
> > Also interesting was that they pointed at using SQLite as a way of mostly hiding applications from worrying about such troubles.
> > Also entertaining was the notion of "MBox considered harmful"; makes me like MH and Maildir all the better, as they have the merit of storing a message per file, so that there's fewer concurrency issues surrounding re-opening files and rewriting them. (Possibly I'm wrong, and the problem just shifts to directory metadata access...)
> Well deleting a message requires deleting a file, not rewriting the entire mbox after that message, so Maildir is a good idea.
--
Alvin Starr || voice: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
alvin@netvel.net ||

On 12/14/2015 01:36 PM, Lennart Sorensen wrote:
> Well deleting a message requires deleting a file, not rewriting the entire mbox after that message, so Maildir is a good idea.
My first exposure to Maildir was via Qmail. I thought it was a better way to handle lots of email than sticking everything in a single file that needs to keep updated as you read and delete messages. Makes you wonder why someone thought it was a good idea to just drop all email in to a single file in the first place.
--
Cheers!
Kevin.
http://www.ve3syb.ca/ | "Nerds make the shiny things that distract
Owner of Elecraft K2 #2172 | the mouth-breathers, and that's why we're
 | powerful!"
#include <disclaimer/favourite> | --Chris Hardwick

On Fri, Dec 18, 2015 at 12:17:41PM -0500, Kevin Cozens wrote:
> On 12/14/2015 01:36 PM, Lennart Sorensen wrote:
> > Well deleting a message requires deleting a file, not rewriting the entire mbox after that message, so Maildir is a good idea.
> My first exposure to Maildir was via Qmail. I thought it was a better way to handle lots of email than sticking everything in a single file that needs to keep updated as you read and delete messages. Makes you wonder why someone thought it was a good idea to just drop all email in to a single file in the first place.
A lot less inodes, and older filesystems didn't like large directories.
--
Len Sorensen

On Fri, 2015/12/18 03:34:09PM -0500, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
| On Fri, Dec 18, 2015 at 12:17:41PM -0500, Kevin Cozens wrote:
| > Makes you wonder why someone
| > thought it was a good idea to just drop all email in to a single file in the
| > first place.
|
| A lot less inodes, and older filesystems didn't like large directories.
And a lot less mail, and messages were much smaller, pre-MIME.
John

In the days of !path email a single file made a lot of sense. It was just much easier. That is definitely not true today. You have to remember a very famous man once said "who needs more than 640K" and someone once predicted that the world would only need 5 computers. All our good ideas today will, in 20 or so years, be thought of as just plain stupid.
On 12/18/2015 06:21 PM, John Sellens wrote:
> On Fri, 2015/12/18 03:34:09PM -0500, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
> | On Fri, Dec 18, 2015 at 12:17:41PM -0500, Kevin Cozens wrote:
> | > Makes you wonder why someone
> | > thought it was a good idea to just drop all email in to a single file in the
> | > first place.
> |
> | A lot less inodes, and older filesystems didn't like large directories.
> And a lot less mail, and messages were much smaller, pre-MIME.
> John
--
Alvin Starr || voice: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
alvin@netvel.net ||

On Fri, Dec 18, 2015 at 09:24:27PM -0500, Alvin Starr wrote:
> In the days of !path email a single file made a lot of sense.
> It was just much easier.
> That is definitely not true today.
> You have to remember a very famous man once said "who needs more than 640K"
Except no one seems able to find any evidence he ever said that.
> and someone once predicted that the world would only need 5 computers.
Given what a computer was when that was stated, it was probably true. For that type of computer. Once they became smaller, cheaper, faster, the market got a lot bigger.
> All our good ideas today in 20 or so years will be thought of as just plain stupid.
Well the ideas of unix seem to be holding up pretty well after 45 years. :)
--
Len Sorensen

| From: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
|
| On Fri, Dec 18, 2015 at 12:17:41PM -0500, Kevin Cozens wrote:
| > On 12/14/2015 01:36 PM, Lennart Sorensen wrote:
| > >Well deleting a message requires deleting a file, not rewriting the
| > >entire mbox after that message, so Maildir is a good idea.
| >
| > My first exposure to Maildir was via Qmail. I thought it was a better way to
| > handle lots of email than sticking everything in a single file that needs to
| > keep updated as you read and delete messages. Makes you wonder why someone
| > thought it was a good idea to just drop all email in to a single file in the
| > first place.
|
| A lot less inodes, and older filesystems didn't like large directories.
I've been using mbox format for almost 40 years. It seems to work fairly well for me.
- performance is OK, even for horribly large mbox files
- (touch wood) I don't remember anything lost due to "too many eggs in one basket"
- "external fragmentation" would seem to be a problem with Maildir: file overhead (including rounding up to a full last block) is probably a significant part of the cost of a mail file.
I'm setting up a new mailserver right now and I'm wondering if I should switch. I'm building a CentOS 7 system to replace a CentOS 5 one. Learning about Postfix. (I once more or less understood sendmail.)
I don't use IMAP (yet?). Maybe IMAP vs mbox would be a problem. I may set up Dovecot -- does that demand Maildir?
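The "rounding up to a full last block" cost can be estimated with a little arithmetic. The sketch below is a rough model only: the 4096-byte block size is an assumption (it varies by filesystem and configuration), the sample message sizes are made up, and the function names are hypothetical. It also ignores inode and directory-entry overhead, and filesystems that pack small files or file tails more tightly.

```python
def allocated_bytes(message_size, block_size=4096):
    """Bytes of data blocks used by one file when its size is rounded up
    to a whole number of blocks (block_size is an assumed value)."""
    blocks = -(-message_size // block_size)  # ceiling division on integers
    return blocks * block_size


def maildir_overhead(message_sizes, block_size=4096):
    """Return (payload, allocated, wasted) byte totals for storing each
    message in its own file."""
    payload = sum(message_sizes)
    allocated = sum(allocated_bytes(size, block_size) for size in message_sizes)
    return payload, allocated, allocated - payload


if __name__ == "__main__":
    # Three made-up message sizes in bytes.
    payload, allocated, wasted = maildir_overhead([1500, 5000, 12000])
    print(payload, allocated, wasted)  # prints: 18500 24576 6076
```

In this made-up example about a quarter of the allocated space is padding; the ratio gets worse as messages get smaller relative to the block size, which fits the point that per-file overhead matters most for small mails.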

On 23/12/15 08:26 PM, D. Hugh Redelmeier wrote:
> | From: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
> |
> | On Fri, Dec 18, 2015 at 12:17:41PM -0500, Kevin Cozens wrote:
> | > On 12/14/2015 01:36 PM, Lennart Sorensen wrote:
> | > >Well deleting a message requires deleting a file, not rewriting the
> | > >entire mbox after that message, so Maildir is a good idea.
> | >
> | > My first exposure to Maildir was via Qmail. I thought it was a better way to
> | > handle lots of email than sticking everything in a single file that needs to
> | > keep updated as you read and delete messages. Makes you wonder why someone
> | > thought it was a good idea to just drop all email in to a single file in the
> | > first place.
> |
> | A lot less inodes, and older filesystems didn't like large directories.
> I've been using mbox format for almost 40 years. It seems to work fairly well for me.
> - performance is OK, even for horribly large mbox files
> - (touch wood) I don't remember anything lost due to "too many eggs in one basket"
> - "external fragmentation" would seem to be a problem with Maildir: file overhead (including rounding up to a full last block) is probably a significant part of the cost of a mail file.
> I'm setting up a new mailserver right now and I'm wondering if I should switch. I'm building a CentOS 7 system to replace a CentOS 5 one. Learning about Postfix. (I once more or less understood sendmail.)
> I don't use IMAP (yet?). Maybe IMAP vs mbox would be a problem. I may set up Dovecot -- does that demand Maildir?
mbox works well if the person writing the code knows Unix V6 primitives. If not, they can fail (;-))
For messages larger than the atomic-write size of the filesystem, mbox can have a race condition, as it depends on atomicity of writes to end-of-file to append a whole message at a time.
--dave
--
David Collier-Brown, | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net | -- Mark Twain
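One way to avoid depending on the atomicity of a single large write is to hold an advisory lock for the whole append. The following is a minimal Python sketch for Unix-like systems, assuming every writer cooperates by taking the same kind of lock; `append_locked` is a hypothetical name, and this is not a description of what any particular delivery agent or mail reader does.

```python
import fcntl
import os


def append_locked(mbox_path, message_bytes):
    """Append one message to mbox_path under an exclusive advisory lock.

    Hypothetical helper: it only protects against other writers that
    take the same kind of lock on the same file.
    """
    # O_APPEND makes each write land at the current end of file.
    fd = os.open(mbox_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        # Blocks until no other process holds a conflicting lock.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            data = message_bytes
            while data:
                written = os.write(fd, data)  # may be a partial write
                data = data[written:]
            os.fsync(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Because the lock is held across all the partial writes, a message larger than one write no longer risks being interleaved with another locked append. Programs that use a different locking scheme, or none, are not excluded.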

| From: David Collier-Brown <davec-b@rogers.com>
| mbox works well if the person writing the code knows Unix V6 primitives. If
| not, they can fail (;-))
| For messages larger than the atomic-write size of the filesystem, mbox can
| have a race condition, as it depends on atomicity of writes to end-of-file to
| append a whole message at a time.
I'm pretty sure that the programs I currently use to access mbox files use locking. procmail uses lockfile(1); I don't know if alpine uses the same locking.

On 12/23/2015 08:26 PM, D. Hugh Redelmeier wrote:
> | From: Lennart Sorensen <lsorense@csclub.uwaterloo.ca>
> |
> | On Fri, Dec 18, 2015 at 12:17:41PM -0500, Kevin Cozens wrote:
> | > On 12/14/2015 01:36 PM, Lennart Sorensen wrote:
> | > >Well deleting a message requires deleting a file, not rewriting the
> | > >entire mbox after that message, so Maildir is a good idea.
> | >
> | > My first exposure to Maildir was via Qmail. I thought it was a better way to
> | > handle lots of email than sticking everything in a single file that needs to
> | > keep updated as you read and delete messages. Makes you wonder why someone
> | > thought it was a good idea to just drop all email in to a single file in the
> | > first place.
> |
> | A lot less inodes, and older filesystems didn't like large directories.
> I've been using mbox format for almost 40 years. It seems to work fairly well for me.
> - performance is OK, even for horribly large mbox files
> - (touch wood) I don't remember anything lost due to "too many eggs in one basket"
> - "external fragmentation" would seem to be a problem with Maildir: file overhead (including rounding up to a full last block) is probably a significant part of the cost of a mail file.
> I'm setting up a new mailserver right now and I'm wondering if I should switch. I'm building a CentOS 7 system to replace a CentOS 5 one. Learning about Postfix. (I once more or less understood sendmail.)
> I don't use IMAP (yet?). Maybe IMAP vs mbox would be a problem. I may set up Dovecot -- does that demand Maildir?
I don't think the issue is between mbox and Maildir as much as between POP and IMAP.
POP typically works with a single mail file, has concurrency issues, and was not designed with the idea of mail folders. IMAP, on the other hand, is a much newer protocol and has support for concurrency, directory hierarchies, and a plethora of other features. POP and IMAP do not require single or multiple files as their backing store.
To your point about the reliability of mbox: I never had a failure of mbox files but had lots of problems over the years with the Windows mbox equivalent, and I think that is more telling of the mail reader/client than the mailstore format. I did have problems with mailservers that were tight on space and had a few big mbox files, because mbox requires 2x the file space for mail manipulation, and that ran me into problems.
I moved to IMAP for the features. It is a more complex protocol but is typically faster and able to handle more clients than an mbox-based implementation.
I found that some mail users with Windows POP clients would download the whole mailbox each time they connected to check mail, and that led to some bandwidth and performance issues. Pushing them to IMAP solved that problem.
--
Alvin Starr || voice: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
alvin@netvel.net ||
participants (7)
- Alvin Starr
- Christopher Browne
- D. Hugh Redelmeier
- David Collier-Brown
- John Sellens
- Kevin Cozens
- Lennart Sorensen