File larger than partition? (Docker-related)

I ran a backup today and noticed one file because it took so long to back up. I'm using FC25 on this machine, and used the OS packages of Docker. I seem to have a 100G file on a 12G partition:

root@toshi7:/var/lib/docker/devicemapper/devicemapper# ls -lh
total 34M
-rw------- 1 root root 100G Oct 25 22:02 data
-rw------- 1 root root 2.0G Oct 25 22:02 metadata

root@toshi7:/var/lib/docker/devicemapper/devicemapper# df -h
Filesystem              Size  Used Avail Use% Mounted on
...
/dev/sda8                16G   12G  3.8G  75% /
tmpfs                   7.8G  776M  7.1G  10% /tmp
/dev/mapper/home_crypt  237G  211G   24G  91% /home
...

/var/ is part of /, not a separate partition. The file appears to be a plain file, not a link or device. How exactly did my backup process (rsync to an external hard drive) manage to back up 100G of data from a 12G partition?

"docker info" contains some very interesting output: I've included that below. The most obvious/naive solution would simply be to exclude /var/lib/docker/devicemapper/devicemapper/ from my backups, because I'm guessing I don't need a backup of the "Data loop file"? Is that correct?

To my surprise, the file and its backup share an md5sum value - but then, I haven't touched Docker in the intervening time. But I'd still really like to know either where it's storing 100G of data, or what's going on if it's not actually storing that much. It's very weird, and makes it awfully hard to calculate how big your backup will be when your partitions are evidently behaving like Tardises.
Containers: 0
 Running: 0
 Paused: 0
 Stopped: 0
Images: 8
Server Version: 1.12.6
Storage Driver: devicemapper
 Pool Name: docker-8:8-802519-pool
 Pool Blocksize: 65.54 kB
 Base Device Size: 10.74 GB
 Backing Filesystem: xfs
 Data file: /dev/loop0
 Metadata file: /dev/loop1
 Data Space Used: 27.79 MB
 Data Space Total: 107.4 GB
 Data Space Available: 4.943 GB
 Metadata Space Used: 618.5 kB
 Metadata Space Total: 2.147 GB
 Metadata Space Available: 2.147 GB
 Thin Pool Minimum Free Space: 10.74 GB
 Udev Sync Supported: true
 Deferred Removal Enabled: false
 Deferred Deletion Enabled: false
 Deferred Deleted Device Count: 0
 Data loop file: /var/lib/docker/devicemapper/devicemapper/data
 WARNING: Usage of loopback devices is strongly discouraged for production use. Use `--storage-opt dm.thinpooldev` to specify a custom block storage device.
 Metadata loop file: /var/lib/docker/devicemapper/devicemapper/metadata
 Library Version: 1.02.136 (2016-11-05)
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: null host bridge overlay
Swarm: inactive
Runtimes: oci runc
Default Runtime: oci
Security Options: seccomp
Kernel Version: 4.12.11-200.fc25.x86_64
Operating System: Fedora 25 (Twenty Five)
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 2
CPUs: 8
Total Memory: 15.56 GiB
Name: toshi7
ID: TJAV:PKWI:UYK2:UGAQ:GHUP:2HDO:4RBD:U54O:X7DM:SUPY:WQAM:IFTX
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Insecure Registries:
 127.0.0.0/8
Registries: docker.io (secure)

-- 
Giles
https://www.gilesorr.com/
gilesorr@gmail.com

On Mon, Nov 20, 2017 at 05:25:16PM +0000, Giles Orr via talk wrote:
I ran a backup today and noticed one file because it took so long to back up. I'm using FC25 on this machine, and used the OS packages of Docker. I seem to have a 100G file on a 12G partition:
root@toshi7:/var/lib/docker/devicemapper/devicemapper# ls -lh
total 34M
-rw------- 1 root root 100G Oct 25 22:02 data
-rw------- 1 root root 2.0G Oct 25 22:02 metadata

root@toshi7:/var/lib/docker/devicemapper/devicemapper# df -h
Filesystem              Size  Used Avail Use% Mounted on
...
/dev/sda8                16G   12G  3.8G  75% /
tmpfs                   7.8G  776M  7.1G  10% /tmp
/dev/mapper/home_crypt  237G  211G   24G  91% /home
...
/var/ is part of / , not a separate partition. The file appears to be a plain file, not a link or device. How exactly did my backup process (rsync to an external hard drive) manage to back up 100G of data from a 12G partition?
Probably a sparse file. Telling rsync to handle sparse files efficiently (the -S/--sparse option) will help a lot. If you don't tell it, it will expand the sparse file, with the unwritten parts being all zeros. Often disk images for VMs are sparse files (so unwritten parts are not actually allocated yet).

-- 
Len Sorensen
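GNU cp shows the same effect as the rsync behaviour described above: one copy mode preserves the holes, the other expands them into real zeros on disk. A minimal sketch, assuming GNU coreutils on Linux; the file names are made up for the example:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Make a 100 MiB file with only the last 1 MiB actually written;
# everything before the seek offset is a hole.
dd if=/dev/urandom of=demo.img bs=1M count=1 seek=99 2>/dev/null

# %s = apparent size in bytes, %b = allocated 512-byte blocks.
stat -c 'size=%s bytes, allocated=%b blocks' demo.img

# Copy it two ways: one keeps the hole, the other fills it with zeros.
cp --sparse=always demo.img still-sparse.img
cp --sparse=never  demo.img expanded.img

# The expanded copy now really occupies ~100 MiB on disk.
du -h still-sparse.img expanded.img
```

rsync without -S behaves like the --sparse=never copy: the receiver ends up with all those zero blocks actually allocated.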

On Mon, Nov 20, 2017 at 05:25:16PM +0000, Giles Orr via talk wrote:
I ran a backup today and noticed one file because it took so long to back up. I'm using FC25 on this machine, and used the OS packages of Docker. I seem to have a 100G file on a 12G partition:
root@toshi7:/var/lib/docker/devicemapper/devicemapper# ls -lh
total 34M
-rw------- 1 root root 100G Oct 25 22:02 data
-rw------- 1 root root 2.0G Oct 25 22:02 metadata
Try ls -lhs

For example:

~> dd if=/dev/zero bs=1M count=1 seek=1000 of=testfile
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00203937 s, 514 MB/s
~> ls -lhs testfile
1.0M -rw-r--r-- 1 lsorensen users 1001M Nov 20 13:56 testfile

So 1MB allocated out of 1001MB file size. The first 1000MB are a 'hole' in the file that isn't allocated yet.

The -s option to ls makes it show the allocated space in the first column.

-- 
Len Sorensen
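stat shows the same two numbers without ls, and truncate(1) can create the hole without dd. A small sketch, assuming GNU coreutils; hole.img is just a scratch name:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# truncate just sets the file size; nothing is allocated yet.
truncate -s 1G hole.img

# %s = apparent size in bytes, %b = allocated 512-byte blocks.
stat -c '%s bytes, %b blocks allocated' hole.img

# Writing 1 MiB at the start allocates only that part;
# conv=notrunc keeps the 1G apparent size.
dd if=/dev/urandom of=hole.img bs=1M count=1 conv=notrunc 2>/dev/null
stat -c '%s bytes, %b blocks allocated' hole.img
```

Note that st_blocks is always counted in 512-byte units, regardless of the filesystem's actual block size.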

On 20 November 2017 at 18:58, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
On Mon, Nov 20, 2017 at 05:25:16PM +0000, Giles Orr via talk wrote:
I ran a backup today and noticed one file because it took so long to back up. I'm using FC25 on this machine, and used the OS packages of Docker. I seem to have a 100G file on a 12G partition:
root@toshi7:/var/lib/docker/devicemapper/devicemapper# ls -lh
total 34M
-rw------- 1 root root 100G Oct 25 22:02 data
-rw------- 1 root root 2.0G Oct 25 22:02 metadata
Try ls -lhs
For example:
~> dd if=/dev/zero bs=1M count=1 seek=1000 of=testfile
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00203937 s, 514 MB/s
~> ls -lhs testfile
1.0M -rw-r--r-- 1 lsorensen users 1001M Nov 20 13:56 testfile
So 1MB allocated out of 1001MB file size. The first 1000MB are a 'hole' in the file that isn't allocated yet.
The -s option to ls makes it show the allocated space in the first column.
Wow. I did not know that, thank you. And I see there's a specific switch to rsync for better handling of sparse files.

But ... then what exactly does straight-up 'ls' (without the '-s') report? The man page says '-s' "print[s] the allocated size of each file, in blocks." I was under the mistaken impression (for 23 years now) that that was more or less what 'ls' was already doing.

Here are some answers: https://en.wikipedia.org/wiki/Sparse_file ... I get the utility of the idea, but it seems to come with some fairly significant hazards.

'ls' can be made to indicate directories (with '/') and links (with '@') and a couple other things with '-F'/'--classify': sparse files would seem to be staggeringly misleading and thus a good target for this kind of marking as well ... Is that possible?

Where else am I likely to run into sparse files? Sounds like mostly things that create file systems, like VirtualBox and friends, Docker (obviously) ... anywhere else?

Sorry to ask so many questions, but 'ls' seems like one of the most basic commands of Linux and I thought I knew what it did: I'm suddenly feeling like a newbie again and would like to get a handle on this ...

-- 
Giles
https://www.gilesorr.com/
gilesorr@gmail.com
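As far as I know GNU ls has no '--classify'-style marker for sparse files, but a crude detector is easy to sketch by comparing allocated blocks against the apparent size. A sketch, assuming GNU stat; the is_sparse name is made up for the example:

```shell
#!/bin/sh
# Flag files whose allocated space is smaller than their apparent
# size, i.e. files that contain holes.
is_sparse() {
    # $1: path. st_blocks (%b) is counted in 512-byte units
    # regardless of the filesystem block size.
    size=$(stat -c %s "$1")
    blocks=$(stat -c %b "$1")
    [ $((blocks * 512)) -lt "$size" ]
}

for f in "$@"; do
    if is_sparse "$f"; then
        printf '%s: sparse\n' "$f"
    fi
done
```

This is a heuristic: heavily compressed or tail-packed filesystems can also make blocks*512 smaller than the size, so it really answers "occupies less than it claims", which is usually the interesting question for backups.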

You can also try 'du -h'.

-- 
William Park <opengeometry@yahoo.ca>

On Mon, Nov 20, 2017 at 11:31:01PM +0000, Giles Orr via talk wrote:
On 20 November 2017 at 18:58, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:

~> dd if=/dev/zero bs=1M count=1 seek=1000 of=testfile
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00203937 s, 514 MB/s
~> ls -lhs testfile
1.0M -rw-r--r-- 1 lsorensen users 1001M Nov 20 13:56 testfile
So 1MB allocated out of 1001MB file size. The first 1000MB are a 'hole' in the file that isn't allocated yet.
The -s option to ls makes it show the allocated space in the first column.
Wow. I did not know that, thank you. And I see there's a specific switch to rsync for better handling of sparse files.
But ... then what exactly does straight-up 'ls' (without the '-s') report? The man page says '-s' "print[s] the allocated size of each file, in blocks." I was under the mistaken impression (for 23 years now) that that was more or less what 'ls' was already doing.
Here are some answers: https://en.wikipedia.org/wiki/Sparse_file ... I get the utility of the idea, but it seems to come with some fairly significant hazards.
'ls' can be made to indicate directories (with '/') and links (with '@') and a couple other things with '-F'/'--classify': sparse files would seem to be staggeringly misleading and thus a good target for this kind of marking as well ... Is that possible?
Where else am I likely to run into sparse files? Sounds like mostly things that create file systems, like VirtualBox and friends, Docker (obviously) ... anywhere else?
Sorry to ask so many questions, but 'ls' seems like one of the most basic commands of Linux and I thought I knew what it did: I'm suddenly feeling like a newbie again and would like to get a handle on this ...
-- 
Giles
https://www.gilesorr.com/
gilesorr@gmail.com
--- Talk Mailing List talk@gtalug.org https://gtalug.org/mailman/listinfo/talk
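The 'du -h' suggestion above can be taken one step further: by default du reports allocated space, and GNU du's --apparent-size flag switches it to the ls-style file size, so the two outputs side by side show exactly how sparse a file is. A minimal sketch, assuming GNU coreutils:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
truncate -s 100M sparse.img

# Allocated space: what the file really costs on disk.
du -h sparse.img

# Apparent size: what ls -l (and a naive backup) would see.
du -h --apparent-size sparse.img
```

The gap between the two numbers is the total size of the holes.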

On 20 November 2017 at 23:31, Giles Orr <gilesorr@gmail.com> wrote:
On 20 November 2017 at 18:58, Lennart Sorensen <lsorense@csclub.uwaterloo.ca> wrote:
On Mon, Nov 20, 2017 at 05:25:16PM +0000, Giles Orr via talk wrote:
I ran a backup today and noticed one file because it took so long to back up. I'm using FC25 on this machine, and used the OS packages of Docker. I seem to have a 100G file on a 12G partition:
root@toshi7:/var/lib/docker/devicemapper/devicemapper# ls -lh
total 34M
-rw------- 1 root root 100G Oct 25 22:02 data
-rw------- 1 root root 2.0G Oct 25 22:02 metadata
Try ls -lhs
For example:
~> dd if=/dev/zero bs=1M count=1 seek=1000 of=testfile
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00203937 s, 514 MB/s
~> ls -lhs testfile
1.0M -rw-r--r-- 1 lsorensen users 1001M Nov 20 13:56 testfile
So 1MB allocated out of 1001MB file size. The first 1000MB are a 'hole' in the file that isn't allocated yet.
The -s option to ls makes it show the allocated space in the first column.
Wow. I did not know that, thank you. And I see there's a specific switch to rsync for better handling of sparse files.
But ... then what exactly does straight-up 'ls' (without the '-s') report? The man page says '-s' "print[s] the allocated size of each file, in blocks." I was under the mistaken impression (for 23 years now) that that was more or less what 'ls' was already doing.
Here are some answers: https://en.wikipedia.org/wiki/Sparse_file ... I get the utility of the idea, but it seems to come with some fairly significant hazards.
'ls' can be made to indicate directories (with '/') and links (with '@') and a couple other things with '-F'/'--classify': sparse files would seem to be staggeringly misleading and thus a good target for this kind of marking as well ... Is that possible?
Where else am I likely to run into sparse files? Sounds like mostly things that create file systems, like VirtualBox and friends, Docker (obviously) ... anywhere else?
Sorry to ask so many questions, but 'ls' seems like one of the most basic commands of Linux and I thought I knew what it did: I'm suddenly feeling like a newbie again and would like to get a handle on this ...
As a short follow-up: current VirtualBox does use some form of expandable format for its disk images (.vdi files), but if they're sparse files, they're hidden inside the image file somehow (I could have researched the VDI file format ... but I didn't. Not the point right now). I have quite a number of VDI images that are 8 GiB in use, but on disk they're 1-3 GiB and use the same amount of disk space as their file size.

-- 
Giles
https://www.gilesorr.com/
gilesorr@gmail.com

On Tue, Nov 21, 2017 at 01:52:54PM +0000, Giles Orr via talk wrote:
As a short follow-up: current VirtualBox does use some form of expandable format for its disk images (.vdi files), but if they're sparse files, they're hidden inside the image file somehow (I could have researched the VDI file format ... but I didn't. Not the point right now). I have quite a number of VDI images that are 8 GiB in use, but on disk they're 1-3 GiB and use the same amount of disk space as their file size.
Some virtual machine disk formats are dynamically allocated as part of their own format. Some (like raw disk images that you can use with qemu and some others) are usually just sparse files that get allocated as needed.

-- 
Len Sorensen
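The raw-image case mentioned above is easy to reproduce with plain coreutils: a big raw "disk" can be created as one giant hole, and nothing gets allocated until the guest writes to it (qemu-img create -f raw produces a similar result). A sketch; disk.img is just an example name:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# A 10 GiB raw disk image that occupies (almost) nothing on disk
# until something writes into it.
truncate -s 10G disk.img

# First column: allocated space; size column: the full 10G.
ls -lhs disk.img
```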

On Mon, Nov 20, 2017 at 11:31:01PM +0000, Giles Orr wrote:
Wow. I did not know that, thank you. And I see there's a specific switch to rsync for better handling of sparse files.
But ... then what exactly does straight-up 'ls' (without the '-s') report? The man page says '-s' "print[s] the allocated size of each file, in blocks." I was under the mistaken impression (for 23 years now) that that was more or less what 'ls' was already doing.
ls shows the file size. That is the total size of the file currently. This means if you were to open the file and read it from one end to the other, that is how many bytes you get.

If the file was created sparse (i.e. you open it for writing, seek somewhere, then write some data but don't write the stuff before the seek), then the parts that have never been written to are not allocated yet and are simply implied to be all zeros. Reading the file will simply return a stream of zeros for the unallocated parts (and will conveniently also read those parts VERY fast).

So the size of a file and the allocated space on disk for a file are not the same thing in some cases (file size is always greater than or equal to allocated size, not counting the wasted space in the last block if the file is not a multiple of the block size).
Here are some answers: https://en.wikipedia.org/wiki/Sparse_file ... I get the utility of the idea, but it seems to come with some fairly significant hazards.
'ls' can be made to indicate directories (with '/') and links (with '@') and a couple other things with '-F'/'--classify': sparse files would seem to be staggeringly misleading and thus a good target for this kind of marking as well ... Is that possible?
Where else am I likely to run into sparse files? Sounds like mostly things that create file systems, like VirtualBox and friends, Docker (obviously) ... anywhere else?
Sorry to ask so many questions, but 'ls' seems like one of the most basic commands of Linux and I thought I knew what it did: I'm suddenly feeling like a newbie again and would like to get a handle on this ...
Unix file systems simply allow files that are not fully allocated yet. Useful feature, not supported on FAT, but NTFS in fact supports it too. Might have been required for POSIX compliance in the past.

-- 
Len Sorensen
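The point above, that unallocated parts read back as zeros (and read very fast), can be checked directly by comparing the hole against /dev/zero. A sketch, assuming GNU coreutils and diffutils; holey.img is just a scratch name:

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# 1 MiB of real data written at offset 1 GiB; everything before it
# is a hole that was never allocated.
dd if=/dev/urandom of=holey.img bs=1M count=1 seek=1024 2>/dev/null

# The first GiB was never written, yet it reads as a stream of
# zeros; -n limits cmp to that many bytes.
cmp -n 1073741824 holey.img /dev/zero && echo "hole reads as all zeros"
```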
participants (3)
- Giles Orr
- lsorense@csclub.uwaterloo.ca
- William Park