On the subject of backups.

I am hoping someone has seen this kind of problem before and knows of a solution. I have a client whose file systems are filled with lots of small files, on the order of hundreds of millions of them. Running something like a find on the filesystem takes the better part of a week, so any kind of directory-walking backup tool will take even longer to run. The actual data size for 100M files is on the order of 15TB, so there is a lot of data to back up, but the data only grows by tens to hundreds of MB a day.

Even things like xfsdump take a long time. For example, I tried xfsdump on a 50M file set and it took over 2 days to complete.

The only thing that seems workable is Veeam. It will run an incremental volume snapshot in a few hours a night, but I dislike adding proprietary kernel modules to the systems.

-- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||
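To make the walk cost concrete: a typical directory-walking incremental (rsync shown here, with hypothetical paths) still has to stat every one of the ~100M files on each run, even when only a few MB have changed, so run time is dominated by metadata traversal rather than data transfer.

    # hypothetical paths; unchanged files become hard links into yesterday's tree,
    # but rsync must still walk and stat the entire source every night
    rsync -a --link-dest=/backup/2020-05-03 /data/ /backup/2020-05-04/

That per-file walk is exactly what snapshot and block-level approaches (like the Veeam run mentioned above) avoid.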

Hi Alvin,

On a 2TB dataset with ~600k files, I have piped tree to less with limited joy; it took a few hours and at least I could search for what I was looking for. 15TB and 100M files is another animal though, and as disk I/O will be your bottleneck, anything will take long, no? Now, for my own info/interest, can you tell me which fs is used for this? ext3?

On Mon, 4 May 2020 09:55:51 -0400 Alvin Starr via talk <talk@gtalug.org> wrote:
I am hoping someone has seen this kind of problem before and knows of a solution. I have a client whose file systems are filled with lots of small files, on the order of hundreds of millions of them. Running something like a find on the filesystem takes the better part of a week, so any kind of directory-walking backup tool will take even longer to run. The actual data size for 100M files is on the order of 15TB, so there is a lot of data to back up, but the data only grows by tens to hundreds of MB a day.
Even things like xfsdump take a long time. For example, I tried xfsdump on a 50M file set and it took over 2 days to complete.
The only thing that seems workable is Veeam. It will run an incremental volume snapshot in a few hours a night, but I dislike adding proprietary kernel modules to the systems.

I bet no one would want this advice, but it seems to me that the implementation needs to change, i.e. that one big (possibly shallow) filesystem on XFS is unworkable. The best answer of course depends on the value of the data.

One obvious approach is to use a filesystem/NAS with off-site replication, typically a commercial product. At relatively modest cost, I like the TrueNAS systems from ixsystems.com: ZFS based, HA versions available, replication can be done. The HA versions are two servers in one chassis, with dual-ported SAS disks.

For do-it-yourselfers, I like using ZFS and ZFS replication of snapshots. For example, I do much (much) smaller offsites from my home to work using ZFS and zfs-replicate. You can also do FreeNAS (non-commercial TrueNAS) but without the HA hardware and code.

Hope that helps - cheers

John

On Mon, 2020/05/04 09:55:51AM -0400, Alvin Starr via talk <talk@gtalug.org> wrote:
| The actual data size for 100M files is on the order of 15TB so there is a
| lot of data to back up but the data only increases on the order of tens to
| hundreds of MB a day.
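As a concrete illustration of that do-it-yourself ZFS approach, a minimal sketch; pool, dataset, snapshot, and host names are all hypothetical:

    # snapshot the dataset, then ship only the delta since last night's snapshot
    zfs snapshot tank/data@2020-05-04
    zfs send -i tank/data@2020-05-03 tank/data@2020-05-04 | ssh offsite-host zfs receive -F backup/data

Because the incremental stream is computed at the block level from the two snapshots, no per-file walk of the 100M files is needed.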

Sadly this one is a bit of a non-starter. The client really only wants to use CentOS/RHEL, and ZFS is not part of that mix at the moment. The data is actually sitting on a replicated Gluster cluster, so trying to replace that with an HA NAS would start to get expensive if it were a commercial product.

On 5/4/20 11:27 AM, John Sellens wrote:
I bet no one would want this advice, but it seems to me that the implementation needs to change, i.e. that one big (possibly shallow) filesystem on XFS is unworkable.
The best answer of course depends on the value of the data.
One obvious approach is to use a filesystem/NAS with off-site replication, typically a commercial product.
At relatively modest cost, I like the TrueNAS systems from ixsystems.com: ZFS based, HA versions available, replication can be done. The HA versions are two servers in one chassis, with dual-ported SAS disks.
For do-it-yourselfers, I like using ZFS and ZFS replication of snapshots. For example, I do much (much) smaller offsites from my home to work using ZFS and zfs-replicate.
You can also do FreeNAS (non-commercial TrueNAS) but without the HA hardware and code.
Hope that helps - cheers
John
On Mon, 2020/05/04 09:55:51AM -0400, Alvin Starr via talk <talk@gtalug.org> wrote:
| The actual data size for 100M files is on the order of 15TB so there is a
| lot of data to back up but the data only increases on the order of tens to
| hundreds of MB a day.
-- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||

On Mon, 2020/05/04 12:03:19PM -0400, Alvin Starr <alvin@netvel.net> wrote:
| The client really only wants to use CentOS/RHEL and ZFS is not part of that
| mix at the moment.

Well, one could argue that ZFS on CentOS is fairly well supported ...

| The data is actually sitting on a replicated Gluster cluster so trying to
| replace that with an HA NAS would start to get expensive if it were a
| commercial product.

Of course "expensive" depends on the client. An HA TrueNAS that size, all flash, is (I believe likely well) under $15K USD.

Ah - you didn't mention Gluster. In theory, Gluster has geographic replication. And if your bricks are on LVM storage, you can use Gluster snapshots as well: https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Snapshot... to guard against accidental removals, etc. (I've not used either, and my glusters are quite old versions at the present time.)

Depending on how it's all configured, you may get better performance backing up the bricks, rather than backing up Gluster itself. I have a two-node gluster, mirrored, so I can back up the bricks on one of the servers and get everything. Obviously that's a very simple "cluster".

Traditionally, Gluster filesystem performance with large numbers of small files in a directory is horrible/pathetic. If you're backing up the Gluster filesystem, you would almost certainly get better performance if your file structure is deeper and narrower, if that's possible.

Cheers

John
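For reference, a rough sketch of what those two options look like on the command line; volume and host names are hypothetical and, like the suggestions above, untested in this environment, so treat it as a starting point rather than a recipe:

    # volume snapshot (bricks must sit on thinly provisioned LVM)
    gluster snapshot create nightly-2020-05-04 myvol

    # geographic replication to a volume at a remote site
    gluster volume geo-replication myvol drsite::myvol-dr create push-pem
    gluster volume geo-replication myvol drsite::myvol-dr start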

On 5/4/20 12:52 PM, John Sellens wrote:
On Mon, 2020/05/04 12:03:19PM -0400, Alvin Starr <alvin@netvel.net> wrote:
| The client really only wants to use CentOS/RHEL and ZFS is not part of that
| mix at the moment.
Well, one could argue that ZFS on CentOS is fairly well supported ...

If it were purely my choice I would agree, but the general rule is software that can be obtained from CentOS/RHEL, and EPEL where necessary.
| The data is actually sitting on a replicated Gluster cluster so trying to
| replace that with an HA NAS would start to get expensive if it were a
| commercial product.
Of course "expensive" depends on the client. An HA truenas that size, all flash is (I believe likely well) under $15K USD.
If things moved to a commercial NAS it would likely be something like EqualLogic or NetApp, and in that world, if you have to ask the price you cannot afford it.
Ah - you didn't mention Gluster.
In theory, Gluster has geographic replication.
It does have replication, but replication is not backup. A little oops like "rm -rf * somefile" will make for a very bad day if you don't catch it before it gets replicated.
And if your bricks are on LVM storage, you can use gluster snapshots as well: https://docs.gluster.org/en/latest/Administrator%20Guide/Managing%20Snapshot... to guard against accidental removals, etc.
Gluster snapshots require thinly provisioned LVM, which is doable but would require rebuilding the systems with a different LVM config.
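A minimal sketch of what a thin-provisioned brick would look like, with hypothetical VG, pool, and size values; the real cost is migrating the existing bricks onto it, not the commands themselves:

    # carve a thin pool out of the brick VG, then a thin LV for the brick
    lvcreate -L 10T --thinpool brickpool vg_bricks
    lvcreate -V 16T --thin -n brick1 vg_bricks/brickpool
    mkfs.xfs -i size=512 /dev/vg_bricks/brick1   # 512-byte inodes, as the Gluster docs suggest for bricks
    mount /dev/vg_bricks/brick1 /bricks/brick1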
(I've not used either, and my glusters are quite old versions at the present time.)
Depending on how it's all configured, you may get better performance backing up the bricks, rather than backing up Gluster itself. I have a two-node gluster, mirrored, so I can back up the bricks on one of the servers and get everything. Obviously that's a very simple "cluster".
Traditionally, gluster filesystem performance with large numbers of small files in a directory is horrible/pathetic. If you're backing up the gluster filesystem, you would almost certainly get better performance if your file structure is deeper and narrower, if that's possible.
We are actually backing up the bricks, but they are BIG bricks. Yes, Gluster has a high per-file open cost, but in this application that is not an issue during operation. A backup using a Gluster mount would move from the world of days to weeks because of the synchronization overhead. Even with snapshots, the length of time for a backup can be outrageously long.
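If the bricks stay on plain XFS, xfsdump's dump levels would at least cap how much gets written each night, though if I understand its incrementals correctly it still scans the inode tables on every run, so the multi-day traversal problem doesn't really go away. A sketch with hypothetical paths and labels:

    # one-time full dump of the brick
    xfsdump -l 0 -L brick1-full -M brick1-media -f /backup/brick1.level0 /bricks/brick1
    # nightly incremental: dumps only files changed since the last lower-level dump
    xfsdump -l 1 -L brick1-incr -M brick1-media -f /backup/brick1.level1 /bricks/brick1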
Cheers
John
-- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||

On 2020-05-04 09:55, Alvin Starr via talk wrote:
I am hoping someone has seen this kind of problem before and knows of a solution. I have a client whose file systems are filled with lots of small files, on the order of hundreds of millions of them. Running something like a find on the filesystem takes the better part of a week, so any kind of directory-walking backup tool will take even longer to run. The actual data size for 100M files is on the order of 15TB, so there is a lot of data to back up, but the data only grows by tens to hundreds of MB a day.
Even things like xfsdump take a long time. For example, I tried xfsdump on a 50M file set and it took over 2 days to complete.
The only thing that seems workable is Veeam. It will run an incremental volume snapshot in a few hours a night, but I dislike adding proprietary kernel modules to the systems.
If you have a list of inodes on the filesystem you can use xfs_db directly:

xfs_db> inode 128
xfs_db> blockget -n
xfs_db> ncheck
    131 dir/.
    132 dir/test2/foo/.
    133 dir/test2/foo/bar
    65664 dir/test1/.
    65665 dir/test1/foo
    65666 dir/test3/foo/.
    142144 dir/test2/.
    142145 dir/test3/foo/bar/.
    142146 dir/test3/foo/bar/baz
    196736 dir/test3/.

I don't know how that will perform relative to something like find though.

Cheers, Jamon
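The same session can be scripted non-interactively against the block device, which makes it easier to time; a minimal sketch with a hypothetical device path, ideally run against an unmounted or snapshotted brick so the metadata is consistent:

    # -r opens the filesystem read-only; ncheck needs blockget -n run first
    xfs_db -r -c 'blockget -n' -c 'ncheck' /dev/vg_bricks/brick1 > /tmp/brick1-inode-paths.txt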

I haven't used Gluster personally, but have you tried turning performance.parallel-readdir on? https://docs.gluster.org/en/latest/release-notes/3.10.0/#implemented-paralle... It seems there's a reason why it's off by default (https://www.spinics.net/lists/gluster-devel/msg25518.html), but maybe it'd still be worth it for you? (See the one-line sketch after the quoted message below.)

On Mon, May 4, 2020 at 9:55 AM Alvin Starr via talk <talk@gtalug.org> wrote:
I am hoping someone has seen this kind of problem before and knows of a solution. I have a client whose file systems are filled with lots of small files, on the order of hundreds of millions of them. Running something like a find on the filesystem takes the better part of a week, so any kind of directory-walking backup tool will take even longer to run. The actual data size for 100M files is on the order of 15TB, so there is a lot of data to back up, but the data only grows by tens to hundreds of MB a day.
Even things like xfsdump take a long time. For example, I tried xfsdump on a 50M file set and it took over 2 days to complete.
The only thing that seems workable is Veeam. It will run an incremental volume snapshot in a few hours a night, but I dislike adding proprietary kernel modules to the systems.
-- Alvin Starr || land: (647)478-6285 Netvel Inc. || Cell: (416)806-0133 alvin@netvel.net ||
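Enabling the parallel-readdir option mentioned above is a single volume-level setting; a minimal sketch, with a hypothetical volume name:

    gluster volume set myvol performance.parallel-readdir on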
participants (5)
- ac
- Alvin Starr
- Greg Martyn
- Jamon Camisso
- John Sellens