GNU tar incremental backup bug?
I've had quite a struggle with tar incremental backups recently. It feels like a bug, even if only in the documentation. Curious to hear the wisdom of this crowd.

Synopsis: I want to make an archive, then archive incrementally some time later. Extracting the archives should leave the destination matching the source (as of the last archive), including the *absence* of files that were captured in earlier archives and then *deleted* from the source file system before a subsequent archive.

This did not work as I anticipated, nor as the docs indicated (by my reading), nor as ChatGPT 4o-mini and Claude Haiku 3.5 both agreed it should.

The key is using --listed-incremental=snapshot.file:

    # touch file1 file2
    # tar --listed-incremental=snapshot.file -cvf 1.tar file?
    file1
    file2
    # rm file2
    # touch file3
    # tar --listed-incremental=snapshot.file -cvf 2.tar file?
    file3
    # mkdir untar
    # cd untar
    # tar --listed-incremental=../snapshot.file -xvvf ../1.tar
    file1
    file2
    # tar --listed-incremental=../snapshot.file -xvvf ../2.tar
    file3
    # ls -l
    file1
    file2
    file3

^-- file2 should *not* exist at this point, per docs: https://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html
The option ‘--listed-incremental’ instructs tar to operate on an incremental archive with additional metadata stored in a standalone file, called a snapshot file. The purpose of this file is to help determine which files have been changed, added or *deleted* since the last backup
Turns out, it will *only* work if the source to tar does not contain a file glob, like `file?`. The snapshot file will contain no info about what was archived, and tar therefore won't know to remove file2. Does this make sense, or sound like a bug?

Turns out, it can be made to work:

    Extracting ../2.tar with --listed-incremental:
    drwxr-xr-x root/root 15 2025-10-26 21:48 ./source/
    tar: Deleting ‘./source/file2’
    -rw-r--r-- root/root 0 2025-10-26 21:48 ./source/file3

If I run `tar ... file*` then no incremental archive. If I run `tar ... $folder_name` then everything works as desired.

Very weird, counter-intuitive, and not documented (IMHO). What do you think?
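The directory-based variant that works can be sketched as a short script. This is a minimal sketch, assuming GNU tar; the scratch directory from mktemp, the one-second pause, and the use of /dev/null as the snapshot argument on extraction (the convention shown in the GNU manual) are choices made for this illustration, not part of the original experiment.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar. The scratch directory is illustrative.
set -eu
work=$(mktemp -d)
cd "$work"

mkdir source
touch source/file1 source/file2

# Level 0 dump: snapshot.file does not exist yet, so everything is archived.
tar --listed-incremental=snapshot.file -cf 1.tar ./source

# Pause so later changes get timestamps clearly newer than the snapshot.
sleep 1
rm source/file2
touch source/file3

# Incremental dump: tar compares ./source against snapshot.file and
# stores the current directory listing alongside the new file.
tar --listed-incremental=snapshot.file -cf 2.tar ./source

mkdir untar
cd untar
# The snapshot contents are not needed for extraction; the GNU manual
# uses /dev/null here.
tar --listed-incremental=/dev/null -xf ../1.tar
tar --listed-incremental=/dev/null -xf ../2.tar

ls source
```

Because tar is handed the directory itself, the second archive carries the directory's listing, and extraction can remove file2 from the target.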
On 10/27/25 12:54 AM, Ron via Talk wrote: [snip]
https://www.gnu.org/software/tar/manual/html_node/Incremental-Dumps.html
The option ‘--listed-incremental’ instructs tar to operate on an incremental archive with additional metadata stored in a standalone file, called a snapshot file. The purpose of this file is to help determine which files have been changed, added or *deleted* since the last backup
Interesting feature. I did not know about this.
Turns out, it will *only* work if the source to tar does not contain a file glob, like `file?`.
The snapshot file will contain no info about what was archived and tar won't therefore know to remove file2.
Does this make sense, or sound like a bug?
Turns out, it can be made to work:
    Extracting ../2.tar with --listed-incremental:
    drwxr-xr-x root/root 15 2025-10-26 21:48 ./source/
    tar: Deleting ‘./source/file2’
    -rw-r--r-- root/root 0 2025-10-26 21:48 ./source/file3
If I run `tar ... file*` then no incremental archive.
When tar is actually invoked at the system call level, the result is not

    "tar","file*"

It is more like

    "tar","file1","file2","file3","filebob"...

If you delete a file, let's say file3, your next invocation of tar will be

    "tar","file1","file2","filebob"...

If you back up a directory, your invocation of tar looks like

    "tar","/etc/dirtobackup"

Your next invocation of tar will look exactly the same, even though you may have removed /etc/dirtobackup/file3. At that point, if tar keeps track of the directory information in /etc/dirtobackup, it can use that to determine which files have been deleted.
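The argument lists described above can be observed directly. The following is a small sketch, assuming a POSIX shell; `show_args` is a hypothetical helper written for this illustration that prints whatever arguments it receives, which is all any program (tar included) ever sees.

```shell
#!/bin/sh
# Hypothetical helper: prints each argument it receives, one per line,
# the way a program sees its argument vector after shell expansion.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

demo=$(mktemp -d)
cd "$demo"
touch file1 file2 file3

# The shell expands file? before show_args starts.
before=$(show_args file?)

rm file3
# Same command line, but the program now receives a shorter list and
# has no way to tell that file3 ever existed.
after=$(show_args file?)

printf 'before:\n%s\nafter:\n%s\n' "$before" "$after"
```

The second call receives only file1 and file2; nothing in its arguments says that a pattern was used or that a file went missing.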
If I run `tar ... $folder_name` then everything works as desired.
Very weird, counter-intuitive, and not documented (IMHO).
What do you think?
Well, it is arguably working exactly as documented, so long as you are also reading the shell and system call documentation. I get your frustration, and the tar documentation might be a bit more user friendly if it said you should only back up directories when using the incremental feature. But I would say it works correctly.

As a general rule, a program that provides synchronization (adds, changes and deletions) works at the level of directories or filesystems and not on discrete file lists.

The trouble with adding features to a program is that each new feature adds more corners for the cases to get lost in.

--
Alvin Starr || land: (905)513-7688
Netvel Inc. || Cell: (416)806-0133
alvin@netvel.net ||
Alvin Starr via Talk wrote on 2025-10-27 07:23:
When tar is actually invoked at the system call level, the result is not "tar","file*". It is more like "tar","file1","file2","file3","filebob"... If you delete a file, let's say file3, your next invocation of tar will be "tar","file1","file2","filebob"...
Right, glob expansion happens in shell before invoking program.
If you back up a directory, your invocation of tar looks like "tar","/etc/dirtobackup". Your next invocation of tar will look exactly the same, even though you may have removed /etc/dirtobackup/file3. At that point, if tar keeps track of the directory information in /etc/dirtobackup, it can use that to determine which files have been deleted.
This part took me far too long to "get it".
You led me right to the answer, and I still didn't see the implications.

Restating your point: when passed files instead of directories, tar does not know whether any glob was entered, nor what the glob patterns were. So it sees file1, file2. Then it sees file1, file3.

`tar` has no idea whether file2 was removed, just did not match a glob pattern, or was intentionally omitted. Thus it is impossible for it to know whether to delete file2 from the target, and the behaviour is entirely expected.

Damn, you led me right up to the water and I stared at it thinking, "I'm thirsty, got anything to drink?" for quite some time. Duh.

The docs for incremental backups should probably state the implications of passing directories vs. files, though. I'm calling it a mild bug in the docs, and may file an issue.

Possibly `tar` itself could issue a warning when --incremental is used and the parameters are files, not folders.

Thanks Alvin! I got there eventually.
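Until tar offers such a warning, a wrapper can approximate it. The sketch below is purely illustrative: `warn_non_dirs` is a hypothetical function, not a tar feature, and it assumes a POSIX shell.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of tar): warn about any path
# argument that is not a directory before an incremental dump.
warn_non_dirs() {
    rc=0
    for path in "$@"; do
        if [ ! -d "$path" ]; then
            printf 'warning: %s is not a directory; its deletion cannot be tracked\n' "$path" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

One possible use is `warn_non_dirs ./source && tar --listed-incremental=snapshot.file -cf 2.tar ./source`, so that the dump only runs when every argument is a directory.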
From: Ron via Talk <talk@lists.gtalug.org>
Right, glob expansion happens in shell before invoking program.
Furthermore, I would hope that tar doesn't do globbing too, so quoting the argument (to prevent glob expansion by the shell) should not work.

If you listed a pathname of a deleted file, would tar notice and treat it as deleted? I would hope so.

If something does globbing, it must have a quoting notation to prevent globbing. Double globbing would require double quoting, something that is almost impossible to get right. I infer scp has this problem on the far side, but I try to avoid provoking it.

Double globbing happens in a few places, for example in find(1)'s -name. The simple rule is to always quote the argument to -name. If you don't, it may work or may produce confusing results.
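The find(1) case can be illustrated with a small experiment. This is a sketch assuming a POSIX shell and a standard find; the file names are made up for the example.

```shell
#!/bin/sh
dir=$(mktemp -d)
cd "$dir"
mkdir sub
touch a.txt sub/b.txt

# Quoted: find itself receives the pattern *.txt and applies it at
# every depth of the tree.
quoted=$(find . -name '*.txt' | sort)

# Unquoted: the shell expands *.txt to a.txt first, so find only
# searches for the literal name a.txt and misses sub/b.txt.
unquoted=$(find . -name *.txt | sort)

printf 'quoted:\n%s\nunquoted:\n%s\n' "$quoted" "$unquoted"
```

With two or more matching files in the current directory, the unquoted form typically fails outright, because find then receives extra arguments it cannot parse as an expression.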
D. Hugh Redelmeier via Talk wrote on 2025-10-28 19:02:
If you listed a pathname of a deleted file, would tar notice and treat it as deleted?
Good question.
------------------------------------------------------
Creating 2.tar WITH "--listed-incremental=file.list"
tar: ./source/file2: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
------------------------------------------------------
So tar doesn't mark file2 as deleted in the archive; it exits.
Furthermore, I would hope that tar doesn't do globbing too, so quoting the argument (to prevent glob expansion by the shell) should not work.
Another good question.
First, passing the source folder as "./source/*" (minus quotes) makes the incremental restore *fail*, as file2 exists when done.

Only file3 gets added to archive 2.tar, yet the extraction is broken, and the --listed-incremental=file.list has invalid contents. This is ... interesting. Another case of tar receiving a list of files instead of a folder.

And, no, `tar` does not do its own globbing:

------------------------------------------------------
Creating 2.tar WITH "--listed-incremental=file.list"
tar ${incr_type} -cvvf ${tarball_2} "${tar_source}*"
tar: Cowardly refusing to create an empty archive
Try 'tar --help' or 'tar --usage' for more information.
------------------------------------------------------

I'm still pondering if `tar` should be breaking the file.list contents when it receives parameters that are files, not folders. Seems it should throw up a warning, *at least*.

Incremental archives when parameters are files, not directories, can lead to unexpected results!
From: Ron via Talk <talk@lists.gtalug.org>
To: talk@lists.gtalug.org
Cc: Ron <ron@bclug.ca>
Date: Tue, 28 Oct 2025 19:21:02 -0700
Subject: [GTALUG] Re: GNU tar incremental backup bug?
D. Hugh Redelmeier via Talk wrote on 2025-10-28 19:02:
If you listed a pathname of a deleted file, would tar notice and treat it as deleted?
Good question.
------------------------------------------------------
Creating 2.tar WITH "--listed-incremental=file.list"
tar: ./source/file2: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
------------------------------------------------------
So, tar doesn't mark file2 as deleted in the archive, it exits.
OK, I can create a matching mental model.
Furthermore, I would hope that tar doesn't do globbing too, so quoting the argument (to prevent glob expansion by the shell) should not work.
Another good question.
First, passing the source folder as "./source/*" (minus quotes) makes the incremental restore *fail*, as file2 exists when done.
Right. Isn't that how we started? ./source/* (without quoting) isn't a folder, it is a list of files. Except if ./source has no files -- then you will be passing './source/*' which won't work because it is a filename for which there is no file.
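The empty-directory case is easy to check. This sketch assumes a POSIX sh or bash with default options (no nullglob or failglob); other shells, such as zsh, report an error for an unmatched pattern instead.

```shell
#!/bin/sh
empty=$(mktemp -d)
cd "$empty"
mkdir source

# Nothing matches, so the pattern is handed over unexpanded, as a
# single literal argument.
set -- ./source/*
count=$#
first=$1

printf '%s argument: %s\n' "$count" "$first"
```

A program given that argument would look for a file literally named `*` inside ./source and fail to find it.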
And, no, `tar` does not do its own globbing:
That makes sense. Except GNU utilities sometimes do have weird extensions.
------------------------------------------------------
Creating 2.tar WITH "--listed-incremental=file.list"
tar ${incr_type} -cvvf ${tarball_2} "${tar_source}*"
tar: Cowardly refusing to create an empty archive
Try 'tar --help' or 'tar --usage' for more information.
------------------------------------------------------
Interesting message. I don't know/remember what strings are in those variables.
I'm still pondering if `tar` should be breaking the file.list contents when it receives parameters that are files, not folders.
What do you mean by "breaking"?
Incremental archives when parameters are files, not directories, can lead to unexpected results!
I would hope that they work according to a reasonable model.

================

I glanced at tar(1). It seems that the idea of "levels" is only partially supported.

    To create incremental archives of non-zero level N, you need a copy of the snapshot file created for level N-1, and use it as FILE

    To create a level 0 backup, use FILE to reference a non-existent file. Or use --level=0, in which case tar will truncate FILE.

Note: the word "truncate" doesn't nail down what will happen. Perhaps the file is truncated to 0 length.

The level parameter is kind of silly, since any value other than 0 seems to make it a no-op.

How is --level=0 better than >FILE or rm -f FILE before the tar command?
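The two ways of forcing a level 0 dump can be compared side by side. This is a minimal sketch, assuming a GNU tar recent enough to support --level; the directory and archive names are made up for the illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU tar with --level support.
set -eu
work=$(mktemp -d)
cd "$work"
mkdir source
touch source/file1

# First dump creates the snapshot file "snap".
tar --listed-incremental=snap -cf full-a.tar ./source
touch source/file2

# Option A: remove the snapshot, so tar starts again from level 0.
rm -f snap
tar --listed-incremental=snap -cf full-b.tar ./source

# Option B: keep the file but ask tar to disregard its contents.
tar --listed-incremental=snap --level=0 -cf full-c.tar ./source

list_b=$(tar -tf full-b.tar | sort)
list_c=$(tar -tf full-c.tar | sort)
printf 'rm -f first:\n%s\n--level=0:\n%s\n' "$list_b" "$list_c"
```

Both archives should list the same members, which suggests the practical difference is mostly convenience: --level=0 keeps the reset inside the single tar command line.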
participants (3)
- Alvin Starr
- D. Hugh Redelmeier
- Ron