Compressing an image of a microSD card

I have an Anbernic RG350, a very nice hand-held gaming console:
https://retrogame300.com/products/rg350

It uses a Linux distro that was developed for an older hand-held game console. It turns out that the OS and "internal storage" are on an actual microSD card inside the case: four screws and a gentle tug to get the card out, and you can image the "main drive."

It's a 16G SD card with two partitions that occupy the whole card. Between them, they have 1.4G of used space. I imaged the whole card:

# time dd if=/dev/sdi bs=4M of=./RG350.SD.2020-05-14.img conv=fsync status=progress

This produced a roughly 16G image as expected. Then I compressed it:

# time xz --threads=5 RG350.SD.2020-05-14.img

This produced a 10G image.

My assumption was that compression would see the empty partition space, presumably as a bunch of zeroes, and compress the crap out of it so that the final image would be the same size as or smaller than the 1.4G of used space. I'm aware that the free space may not be zeroed out ... It's not encrypted, so that's not the issue. Is there a sane way to back this up that would produce a smaller image? I prefer to image the whole card ... and I like 'dd' because I'm familiar with it, although maybe I should move on ... Is somehow zeroing the empty space on the card a possibility? Suggestions welcomed ...

I can live with the 10G backup if I have to, but would prefer a "better" solution if it's available.

--
Giles
https://www.gilesorr.com/
gilesorr@gmail.com

On Thu, May 14, 2020 at 12:10:53PM -0400, Giles Orr via talk wrote:
I have an Anbernic RG350, a very nice hand-held gaming console:
https://retrogame300.com/products/rg350
It uses a Linux distro that was developed for an older hand-held game console. It turns out that the OS and "internal storage" are on an actual microSD card inside the case: four screws and a gentle tug to get the card out, and you can image the "main drive."
It's a 16G SD card with two partitions that occupy the whole card. Between them, they have 1.4G of used space. I imaged the whole card:
# time dd if=/dev/sdi bs=4M of=./RG350.SD.2020-05-14.img conv=fsync status=progress
This produced a roughly 16G image as expected. Then I compressed it:
# time xz --threads=5 RG350.SD.2020-05-14.img
This produced a 10G image.
My assumption was that compression would see the empty partition space, presumably as a bunch of zeroes, and compress the crap out of it so that the final image would be the same size or smaller than the 1.4G of used space. I'm aware that the free space may not be zeroed out ... It's not encrypted, so that's not the issue. Is there a sane way to back this up that would produce a smaller image? I prefer to image the whole card ... and I like 'dd' because I'm familiar with it, although maybe I should move on ... Is somehow zeroing the empty space on the card a possibility? Suggestions welcomed ...
I can live with the 10G backup if I have to, but would prefer a "better" solution if it's available.
Any idea what filesystem it uses? Certainly something like Clonezilla knows how to deal with filesystems and partitions and ignore the parts of the filesystem that are not in use (even if not zeroed).

If you can mount the saved image (using a loop device) you might be able to zero the unused space, which would reduce the compressed size a lot.

--
Len Sorensen
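A minimal sketch of the filesystem-aware approach, using partclone (the engine Clonezilla builds on). The partition device and output filename here are assumptions based on the original post, and the partclone variant has to match the actual filesystem type:

    # image only the blocks the filesystem actually uses
    # (use partclone.vfat, partclone.ext4, etc. to match the filesystem)
    partclone.ext4 -c -s /dev/sdi2 -o RG350.part2.pcl

    # unused blocks were never read, so this compresses to roughly the used size
    xz --threads=5 RG350.part2.pcl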

| From: Lennart Sorensen via talk <talk@gtalug.org>

| Certainly something like clonezilla knows how to deal with filesystems
| and partitions and ignore the parts of the filesystem that are not in use
| (even if not zeroed).

If you use fstrim on a filesystem, what happens when a dd accesses an unallocated block? Is that well-defined? An I/O error? Are all unallocated blocks identical? Is this up to the manufacturer or some standard?

(I'm too lazy or too busy to experiment.)

On Thu, May 14, 2020 at 02:10:26PM -0400, D. Hugh Redelmeier via talk wrote:
If you use fstrim on a filesystem, what happens when a dd accesses an unallocated block? Is that well-defined? An I/O error? Are all unallocated blocks identical? Is this up to the manufacturer or some standard?
(I'm too lazy or too busy to experiment.)
According to Wikipedia, the ATA standard allows choices. There are different types of TRIM, defined by SATA Words 69 and 169 returned from an ATA IDENTIFY DEVICE command:

1) Non-deterministic TRIM: each read command to the Logical Block Address (LBA) after a TRIM may return different data.
2) Deterministic TRIM (DRAT): all read commands to the LBA after a TRIM shall return the same data, or become determinate.
3) Deterministic Read Zero after TRIM (RZAT): all read commands to the LBA after a TRIM shall return zero.

Apparently 1 is pretty much only seen on budget garbage drives, 2 is most common on consumer SSDs, and 3 is most common on enterprise SSDs and required by many NAS systems if you want to run RAID on the drives.

--
Len Sorensen
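Which category a particular drive claims can be read out of its IDENTIFY data with hdparm. A sketch, assuming an ATA device at /dev/sda; SD cards behind most USB readers won't answer ATA IDENTIFY, so this mainly applies to SSDs:

    # look for "Deterministic read ZEROs after TRIM" (RZAT) or
    # "Deterministic read data after TRIM" (DRAT) in the output
    hdparm -I /dev/sda | grep -i trim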

On 2020-05-14 12:10 p.m., Giles Orr via talk wrote:
It uses a Linux distro that was developed for an older hand-held game console. It turns out that the OS and "internal storage" are on an actual microSD card inside the case: four screws and a gentle tug to get the card out, and you can image the "main drive."
It's a 16G SD card with two partitions that occupy the whole card. Between them, they have 1.4G of used space. I imaged the whole card:
# time dd if=/dev/sdi bs=4M of=./RG350.SD.2020-05-14.img conv=fsync status=progress
This produced a roughly 16G image as expected. Then I compressed it:
# time xz --threads=5 RG350.SD.2020-05-14.img
This produced a 10G image.
My assumption was that compression would see the empty partition space, presumably as a bunch of zeroes, and compress the crap out of it so that the final image would be the same size or smaller than the 1.4G of used space. I'm aware that the free space may not be zeroed out ... It's not encrypted, so that's not the issue. Is there a sane way to back this up that would produce a smaller image? I prefer to image the whole card ... and I like 'dd' because I'm familiar with it, although maybe I should move on ... Is somehow zeroing the empty space on the card a possibility? Suggestions welcomed ...
I can live with the 10G backup if I have to, but would prefer a "better" solution if it's available.
Hi Giles,

Firstly, there's a good chance that if the space is unused, it can be made compressible. If it were zeroed, it would have compressed very well. Since the unit is Linux-based, you have a very good chance of being able to loop-mount it. If, furthermore, you're lucky enough to be dealing with an old-style partition table on the card:

* Start by seeing if you get a partition table: 'fdisk -l RG350.SD.2020-05-14.img'
* Loop-mount, e.g. 'mount -o loop,offset=$((512 * xxxx)) RG350.SD.2020-05-14.img tmp/', where xxxx is the partition's starting offset in 512-byte sectors
* Explicitly zero the empty space: 'dd if=/dev/zero of=tmp/zero.dat bs=1M'
* rm tmp/zero.dat; umount tmp/
* Recompress

If the partitioning isn't obvious, I would still look for a filesystem superblock in there somewhere, and then loop-mount etc. as above.

Good luck!

Cheers, Mike
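Put together, those steps might look like the following. The starting sector (139264) is a made-up value; substitute whatever 'fdisk -l' actually reports for the partition:

    fdisk -l RG350.SD.2020-05-14.img
    mkdir tmp
    # offset = starting sector * sector size
    mount -o loop,offset=$((512 * 139264)) RG350.SD.2020-05-14.img tmp/
    # fill the free space with zeroes; dd stops with "No space left on device"
    dd if=/dev/zero of=tmp/zero.dat bs=1M
    rm tmp/zero.dat
    umount tmp/
    # recompress; the zeroed free space now compresses away
    xz --threads=5 RG350.SD.2020-05-14.img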

On Thu, May 14, 2020 at 02:26:53PM -0400, El Fontanero via talk wrote:
Firstly, there's a good chance that if the space is unused, it can be made compressible. If it were zeroed, it would have compressed very well. Since the unit is linux-based, you have a very good chance of being able to loop-mount it. If, furthermore, you're lucky enough to be dealing with an old-style partition table on the card:
* Start by seeing if you get a partition table: 'fdisk -l RG350.SD.2020-05-14.img'
* Loop-mount, e.g. 'mount -o loop,offset=$((512 * xxxx)) RG350.SD.2020-05-14.img tmp/', where xxxx is the partition's starting offset in 512-byte sectors
* Explicitly zero the empty space: 'dd if=/dev/zero of=tmp/zero.dat bs=1M'
* rm tmp/zero.dat; umount tmp/
* Recompress
If the partitioning isn't obvious, I would still look for a filesystem superblock in there somewhere, and then loopmount etc. as above.
If you enable partition support on your loop driver, you don't need to deal with the offset calculations and such. It is unfortunately not enabled by default, for legacy reasons I believe.

modprobe -r loop
modprobe loop max_part=16

Then:

losetup /dev/loop0 imagefile

and you should have /dev/loop0p1, /dev/loop0p2, etc.

If the filesystem is ext-based, zerofree is a nice tool to zero unused space; it seems to run faster than using dd and rm with a zero-filled file.

--
Len Sorensen
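For an ext-based image, the zerofree route might look like this. A sketch, assuming the second loop partition holds the ext filesystem; note zerofree refuses to run on a filesystem mounted read-write, so work on the unmounted image:

    modprobe -r loop
    modprobe loop max_part=16
    losetup /dev/loop0 RG350.SD.2020-05-14.img
    # zero every block the filesystem marks as unused
    zerofree -v /dev/loop0p2
    losetup -d /dev/loop0
    xz --threads=5 RG350.SD.2020-05-14.img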

On Thu, May 14, 2020 at 03:38:37PM -0400, Lennart Sorensen via talk wrote:
* Loop-mount, e.g. 'mount -o loop,offset=$((512 * xxxx)) RG350.SD.2020-05-14.img tmp/', where xxxx is the partition's starting offset in 512-byte sectors
If you enable partition support on your loop driver you don't need to deal with the offset calculations and such. It is unfortunately not enabled by default for legacy reasons I believe.
modprobe -r loop
modprobe loop max_part=16
Then losetup /dev/loop0 imagefile
and you should have /dev/loop0p1 /dev/loop0p2 etc.
If the filesystem is ext based, zerofree is a nice tool to zero unused space that seems to run faster than using dd and rm with a zero filled file.
I had to do this kind of thing before, but I don't remember loading the "loop" module with options. I think "losetup" does it, perhaps with some options.

--
William Park <opengeometry@yahoo.ca>

On Thu, May 14, 2020 at 04:50:12PM -0400, William Park via talk wrote:
I had to do this kind of thing before, but I don't remember loading "loop" module with options. I think "losetup" does it or with some options.
Some distributions may set up default arguments for loop to enable partitions automatically. Certainly looking on Debian, with loop loaded, max_part and max_loop are 0 when I look at /sys/module/loop/parameters. However, I do see that if I run 'losetup -P /dev/loop0 test.img', it does in fact enable partitions, so it seems things have been improved in the kernel. If max_part is non-zero, it auto-scans for partitions without needing the -P option to losetup, but -P saves reloading the module.

--
Len Sorensen
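So on a current kernel the whole dance reduces to losetup's -P flag. A short sketch, reusing the test.img and loop0 names from the example above:

    # attach the image and scan its partition table in one step
    losetup -P /dev/loop0 test.img
    ls /dev/loop0p*
    # or let losetup pick the first free loop device and print its name
    losetup -P -f --show test.img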

| From: Lennart Sorensen via talk <talk@gtalug.org>

| If the filesystem is ext based, zerofree is a nice tool to zero unused
| space that seems to run faster than using dd and rm with a zero filled
| file.

If your filesystem lives on some form of flash (SSD, SD card, USB stick, ...) this can reduce the lifetime and performance of your hardware. The wear-levelling firmware of the drive will think that every block of the drive is "live" (contains valuable information). This will increase "write amplification".

In any case, if you do do this, be sure to use fstrim afterwards. (I'm not sure that SD cards and USB sticks support trim.)
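For reference, that trim step is a one-liner against the mounted filesystem; the mount point here is hypothetical, and -v just reports how much was trimmed:

    # discard all blocks the filesystem is not using
    fstrim -v /mnt/sdcard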

On Thu, May 14, 2020 at 05:20:18PM -0400, D. Hugh Redelmeier via talk wrote:
If your filesystem lives on some form of flash (SSD, SD card, USB stick, ...) this can reduce the lifetime and performance of your hardware.
The wear-levelling firmware of the drive will think that every block of the drive is "live" (contains valuable information). This will increase "write amplification".
In any case, if you do do this, be sure to use fstrim afterwards. (I'm not sure that SD cards and USB sticks support trim.)
I think the idea was to do it on the image loop back mounted, not on the original device. -- Len Sorensen

On Thu, 14 May 2020 at 18:04, Lennart Sorensen via talk <talk@gtalug.org> wrote:
On Thu, May 14, 2020 at 05:20:18PM -0400, D. Hugh Redelmeier via talk wrote:
If your filesystem lives on some form of flash (SSD, SD card, USB stick, ...) this can reduce the lifetime and performance of your hardware.
The wear-levelling firmware of the drive will think that every block of the drive is "live" (contains valuable information). This will increase "write amplification".
In any case, if you do do this, be sure to use fstrim afterwards. (I'm not sure that SD cards and USB sticks support trim.)
I think the idea was to do it on the image loop back mounted, not on the original device.
What all this discussion has made me realize is that I have to either A) modify the original SD card or B) loop-mount and modify the backup.

Option A was what I was initially proposing, but forcing a write (zeroing out most of the contents) across the whole card isn't a great idea. Not terrible, but not great.

Option B involves modifying the backup, and I think this is a worse idea because 1) a backup should be an accurate recreation of the source, and 2) modifying the backup effectively means you'd be restoring something different, and do I really feel comfortable counting on that?

The more I think about it, the less I want to modify either of them. Thanks for clarifying the process: it solidified my opinion on what should be done (although not in the direction I expected). I.e., 10G really ain't that big, I'll just keep the backup as is.

Thanks everyone.

--
Giles
https://www.gilesorr.com/
gilesorr@gmail.com

On 2020-05-14 6:20 p.m., Giles Orr via talk wrote:
The more I think about it, the less I want to modify either of them. Thanks for clarifying the process: it solidified my opinion on what should be done (although not in the direction I expected). ie. 10G really ain't that big, I'll just keep the backup as is. Thanks everyone.
Operate / experiment on a copy of your backup! I do this sort of thing a fair bit (as originally described) when backing up the state of yet another Raspberry Pi experiment :-)
participants (5):
- D. Hugh Redelmeier
- El Fontanero
- Giles Orr
- lsorense@csclub.uwaterloo.ca
- William Park