
On 6/30/20 4:53 AM, D. Hugh Redelmeier via talk wrote:
Warning: it is the middle of the night and I'm going to ramble.
[snip]
The following are some random thoughts about filesystems. I'm interested in any reactions to these.
The UNIX model of a file being a randomly-accessed array of fixed-size blocks doesn't fit very well with compression, even though a large portion of files is accessed purely as a byte stream. That's perhaps a flaw in UNIX, but it is tough to change.
Fixed-size disk blocks are not just a UNIX thing. Admittedly I have not seen all the hardware out there, but outside of some very old or very new designs I do not believe there has been a disk drive that was not formatted with fixed-size blocks. Think of it from a hardware perspective: if you have variable-sized blocks you need to manage fragmentation, and then likely some form of free-space cleanup. That level of compute power was not available in a disk controller until fairly recently, by which time the standard design was so entrenched that other layouts could not gain enough traction to be worth designing and trying to sell. For example, there are disk drives with Ethernet interfaces and a native object store; the Seagate Kinetic drives never seemed to get beyond the sampling phase.
In modern systems, with all kinds of crazy containerization, I guess de-duplication might be very useful. As well as COW, I think. Is this something for the File System, or a layer below, like LVM?

I have a problem with de-duplication: I am not sure how well it actually works in practice. At the file-system level it is just linking two identical files together until one of them is changed, so you need COW.
At the block level you have to look at the overhead of the hash function and the storage of the hash data. The size of the hash key is driven by the acceptable likelihood of a false duplicate match, not by the size of the block, and candidate duplicate blocks still need to be compared byte-for-byte, causing extra reads. Let's say you use SHA-256 for your hash: that is a 32-byte key, and if you use 512-byte blocks your hash table is about a 6% overhead. If you go for larger blocks then you will get fewer hits, because filesystems want to allocate smaller blocks for small-file efficiency, and if you de-duplicate at the LVM extent level the hit rate drops even more. It may work well where you have a large number of VMs, where the disk images tend to start out all the same and where the OS will tend to stay static, leaving large parts of the disk untouched for a long time. It may also be possible to develop file systems that are amenable to de-duplication.
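To make the arithmetic concrete, here is a rough Python sketch of the block-level bookkeeping. The names and structure are my own invention, not any real dedup engine's:

import hashlib

BLOCK_SIZE = 512   # smaller blocks find more duplicates but cost more index space
HASH_BYTES = 32    # SHA-256 digest length

def scan_for_duplicates(image_path):
    """Index every block by its SHA-256 digest and count candidate duplicates."""
    index = {}          # digest -> byte offset of the first block with that digest
    candidates = 0
    total = 0
    with open(image_path, "rb") as f:
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            digest = hashlib.sha256(block).digest()
            if digest in index:
                # A real implementation still re-reads and compares the two
                # blocks byte-for-byte here (the extra read mentioned above)
                # before remapping anything.
                candidates += 1
            else:
                index[digest] = total * BLOCK_SIZE
            total += 1
    index_overhead = HASH_BYTES / BLOCK_SIZE   # 32 / 512 = 6.25%
    return total, candidates, index_overhead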
There's something appealing about modularizing the FS code into composable layers. But not if the overhead is observable, or the composability leaves rough edges.
Here's a natural order for layers:
- FS (UNIX semantics + ACLs etc., more than just POSIX)
- de-duplication
- compression
- encryption
- aggregation for efficient use of device?
This appears to be what Red Hat is pushing with their VDO (Virtual Data Optimizer).
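For what it is worth, here is a toy Python sketch of the layering listed above (roughly the sort of thing VDO composes under LVM). The class names and interfaces are made up for illustration, not VDO's actual API, and encryption is left out to keep it short:

import hashlib
import zlib

class StoreLayer:
    """Stand-in for the device/aggregation layer: append blocks, return a reference."""
    def __init__(self):
        self.blocks = []
    def write(self, data):
        self.blocks.append(data)
        return len(self.blocks) - 1

class CompressLayer:
    """Compress each block before handing it down the stack."""
    def __init__(self, lower):
        self.lower = lower
    def write(self, data):
        return self.lower.write(zlib.compress(data))

class DedupLayer:
    """Store a block only once; hand back the same reference for identical blocks."""
    def __init__(self, lower):
        self.lower = lower
        self.seen = {}
    def write(self, data):
        digest = hashlib.sha256(data).digest()
        if digest not in self.seen:
            self.seen[digest] = self.lower.write(data)
        return self.seen[digest]

# FS -> de-duplication -> compression -> device, in the order listed above.
stack = DedupLayer(CompressLayer(StoreLayer()))
ref_a = stack.write(b"\0" * 4096)
ref_b = stack.write(b"\0" * 4096)   # duplicate: same reference, nothing new stored
assert ref_a == ref_b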
I don't know where to fit in checksums. Perhaps it's a natural part of encryption (encryption without integrity checking has interesting weaknesses).
nothing beats dd if=/dev/zero of=your_secret_file for security ;)
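More seriously, Hugh's parenthetical about integrity is why authenticated encryption (AEAD) is attractive at whatever layer does the encrypting: the checksum comes with the cipher. A small sketch, assuming the third-party Python "cryptography" package:

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.exceptions import InvalidTag

key = AESGCM.generate_key(bit_length=256)
aesgcm = AESGCM(key)
nonce = os.urandom(12)              # must never repeat for the same key

# The associated data ties the ciphertext to its logical location.
ciphertext = aesgcm.encrypt(nonce, b"block contents", b"volume0/block1234")

# Flip one bit, as silent media corruption (or an attacker) might.
tampered = bytes([ciphertext[0] ^ 0x01]) + ciphertext[1:]
try:
    aesgcm.decrypt(nonce, tampered, b"volume0/block1234")
except InvalidTag:
    print("corruption detected by the authentication tag")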
I don't know how to deal with the variable-sized blocks that come out of compression. Hardware has co-evolved with file-systems to expect blocks of 512 or 4096 bytes. (I remember IBM/360 disk drives which supported a range of block sizes as if each track was a short piece of magnetic tape.)
Move from disks to object stores (key/value).
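Something like the toy sketch below: once the block address is just a key, the compressed value can be any size, and the fixed 512/4096-byte sector problem largely disappears. The dict here stands in for a real key/value backend such as an object-store drive:

import zlib

class ObjectStore:
    """A dict standing in for a real key/value backend."""
    def __init__(self):
        self._objects = {}
    def put(self, key, value):
        self._objects[key] = value
    def get(self, key):
        return self._objects[key]

BLOCK_SIZE = 4096
store = ObjectStore()

def write_block(file_id, block_no, data):
    # The compressed value may be any length; the store does not care.
    store.put((file_id, block_no), zlib.compress(data))

def read_block(file_id, block_no):
    return zlib.decompress(store.get((file_id, block_no)))

write_block("inode42", 0, b"A" * BLOCK_SIZE)
assert read_block("inode42", 0) == b"A" * BLOCK_SIZE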
I don't know how to have file systems more respectfully reflect the underlying nature of SSDs and shingled HDDs.
I also am still waiting for translucent mounts like Plan 9.
How would translucent mounts compare to overlay mounts?
I think that many or most drives do whole-volume encryption invisible to the OS. This really isn't useful to the OS since the whole volume has a single key.
The most secure encryption is end-to-end. It tends to be less convenient. Maybe my placement of encryption near the bottom of the stack isn't good enough.

I would argue that encryption should be as high in the stack as possible. Encrypting the disk provides "at rest" security, so when the drives are sold to someone at the bankruptcy sale they cannot use the data. It does nothing to stop a hacker who has gained access to the system from dumping the database of credit card info.
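As a simplified illustration of doing it high in the stack, again assuming the Python "cryptography" package: the application encrypts the sensitive field before it ever reaches the database, so a dumped table only yields ciphertext. Key management is hand-waved here.

from cryptography.fernet import Fernet

# In practice the key lives in a KMS or HSM, not next to the data.
key = Fernet.generate_key()
f = Fernet(key)

# Encrypt in the application, before the value reaches the database.
card_number = b"4111 1111 1111 1111"
stored_value = f.encrypt(card_number)     # this is all a dumped table would contain

# Only code holding the key can recover the plaintext.
assert f.decrypt(stored_value) == card_number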
[snip]

--
Alvin Starr              ||   land: (647)478-6285
Netvel Inc.              ||   Cell: (416)806-0133
alvin@netvel.net         ||