Watching a network folder: is there a smart way of doing this?

I need to watch a folder on a network share (a scanner) and see when new files are created. There are a couple of special things about this location:
* It's over CIFS. There's nothing I can do about that. I think that means I can't use Inotify.
* It doesn't have an accurate clock for timestamping. There's nothing I can do about that, either.
* If the share is being written to by the device, it effectively disappears: stat() complains loudly.
* It reuses the lowest available file name of the form “EPSON%03d.(PDF|JPG)”, so synching the share to a sensible filesystem might overwrite files accidentally.
* I don't need to watch it very often; a few times per hour would be fine.
I tried using Perl's File::ChangeNotify::Watcher, which is supposed to pick the smartest method available. Unfortunately, this seems to be calling stat() every two seconds, resulting in:
1. Messages (one per existing file on the share) saying that the scanner has deleted them. This happens when someone starts scanning.
2. An error message from stat() every two seconds while the device is scanning.
3. Messages saying that all of the old files (plus the new one) have been created.
I was naïvely hoping for a mechanism that would report “Hey, you have a new file called $filename” but that seems to have been a pipedream. What do People Who Actually Know What They're Doing use, please?
cheers,
Stewart

On 17 February 2017 at 15:41, Stewart C. Russell via talk <talk@gtalug.org> wrote:
<snip>
How's your coding? I'm visualizing this in Bash on a cron job. This is purely theory, and I apologize if it's not useful. It seems like it would work if I've understood the circumstances correctly ...
- every fifteen minutes get a checksum on all the remote files
- get checksums on the files in the local copy of the folder
- if a checksum exists on the remote but not the local ('uniq -c ...' may help), copy the file and datestamp it (i.e. 'cp -vi //CIFS/a.tif /localcopy/a.2017-02-17.1610.tif').
- hmm - might work to embed the checksum in the local filename for ease of access ...
There's a bit of juggling to be done on comparing checksums, but it shouldn't be too hard. Would this work?
--
Giles
http://www.gilesorr.com/
gilesorr@gmail.com
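
A minimal sketch of the checksum idea above, not a tested solution for this particular scanner. It assumes GNU coreutils (sha256sum), and the function name, the hidden .seen-checksums file and the target file name layout are all invented for illustration. It records every checksum it has copied, so a reused name such as EPSON001.PDF with new content is treated as a new file, and the datestamp comes from the local clock rather than the scanner's.

```shell
#!/usr/bin/env bash
# copy_new_by_checksum <source_dir> <dest_dir>
# Copies files whose content has not been seen before and prints each new
# destination path. All names below are illustrative assumptions.
copy_new_by_checksum() {
    local src=$1 dest=$2 seen f sum base ext stamp target
    seen="$dest/.seen-checksums"     # hypothetical log, one checksum per line
    mkdir -p "$dest" && touch "$seen" || return 1

    # If the share has vanished (e.g. mid-scan), do nothing until the next run.
    [[ -d $src ]] || return 0

    for f in "$src"/*; do
        [[ -f $f ]] || continue
        sum=$(sha256sum "$f" 2>/dev/null | cut -d' ' -f1)
        [[ -n $sum ]] || continue                 # unreadable or vanished file
        grep -qxF "$sum" "$seen" && continue      # content already copied

        base=$(basename "$f")
        ext=${base##*.}
        stamp=$(date +%Y-%m-%d.%H%M%S)            # local clock, not the scanner's
        target="$dest/${base%.*}.$stamp.${sum:0:8}.$ext"
        if cp "$f" "$target"; then
            echo "$sum" >> "$seen"
            echo "$target"
        fi
    done
}
```

Run from cron every fifteen minutes or so, the printed paths could be piped to whatever does the notifying. Embedding the first eight characters of the checksum in the name follows the "embed the checksum in the local filename" thought; it also keeps two copies made in the same second from colliding.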

Are you copying files over, or are you moving them?
1. Moving them is easiest. Just move whatever you find, and use a sensible target filename.
2. Copying them is harder. Filename, timestamp, size, md5sum can be used to decide if you already have them.
--
William
On Fri, Feb 17, 2017 at 03:41:39PM -0500, Stewart C. Russell via talk wrote:
<snip>
--- Talk Mailing List talk@gtalug.org https://gtalug.org/mailman/listinfo/talk
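
For the "just move whatever you find" option, a sketch along these lines might do. The function name and the timestamp-prefix naming scheme are assumptions, and it has not been run against a share that disappears mid-scan; it only shows how a move with a collision-free target name could look.

```shell
#!/usr/bin/env bash
# move_scans <source_dir> <dest_dir>
# Moves every EPSON* file out of the source and prints the new path.
# The target name is prefixed with the local time so that a reused
# scanner name never overwrites an earlier scan.
move_scans() {
    local src=$1 dest=$2 f base target n
    mkdir -p "$dest" || return 1
    [[ -d $src ]] || return 0          # share not visible right now
    for f in "$src"/EPSON*; do
        [[ -f $f ]] || continue
        base=$(basename "$f")
        n=0
        target="$dest/$(date +%Y%m%d-%H%M%S)-$base"
        while [[ -e $target ]]; do     # same name moved twice in one second
            n=$((n + 1))
            target="$dest/$(date +%Y%m%d-%H%M%S)-$n-$base"
        done
        mv "$f" "$target" && echo "$target"
    done
}
```

Because the source is emptied on every run, anything found on the next run is new by definition, which sidesteps the name reuse and the unreliable clock. Moving a file the scanner is still writing would be a problem; the thread suggests the share is simply invisible during a scan, but that is worth checking.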

On Feb 17, 2017 3:41 PM, "Stewart C. Russell via talk" <talk@gtalug.org> wrote:
I need to watch a folder on a network share (a scanner) and see when new files are created. There are a couple of special things about this location:
<snip>
I was naïvely hoping for a mechanism that would report “Hey, you have a new file called $filename” but that seems to have been a pipedream. What do People Who Actually Know What They're Doing use, please?
If all you need is a count, maybe you could tee the scanner output and drop one copy in the bit bucket. At the same time you grep ps's output for the process writing to /dev/null. Snarf the PID to a count file, one PID to a line. You don't get a timestamp but you do get a unique ID and you can keep a running total.
Russell
Sent from semi-resurrected Moto XT1060 Almost ready to mod.

On 2017-02-19 05:10 PM, Russell Reiter wrote:
If all you need is a count, maybe you could tee the scanner output and drop one copy in the bit bucket.
Maybe I should have explained better: the scanner only scans to this CIFS share. I can't do ps or tee as my computer has no control over the scanner: all I see is a filesystem on another machine, or - if I'm in process of scanning - no filesystem. cheers, Stewart

Hi Stewart, you could just watch the file listing (adjusting n seconds to whatever is suitable)
watch --differences -n 10 ls -l </path/to/shared/dir>
I have no idea what may or may not happen when you're actually scanning, since you say the file system is no longer visible, but give this a shot and see. Good luck.
Aruna

On 2017-02-20 12:47 AM, Aruna Hewapathirane wrote:
Hi Stewart, you could just |watch| the file listing (adjusting n seconds to whatever is suitable)
|watch --differences -n 10 ls -l </path/to/shared/dir>|
I hadn't heard of watch before, so thanks! watch *started* to work really well, but then went into a terminal sulk after the FS disappeared during a scan, and refused to show any updates. It's also an interactive program, so doesn't pipe or notify changes in any useful way. I suspect I'll just have to go with William Park's suggestion of using rsync to a local folder that I have more control over. I still have to correct for the scanner FS's wandering clock, but that's less important. cheers, Stewart
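
Since watch is interactive and gave up once the share vanished, a non-interactive poll might look like the sketch below. It assumes that ls fails while the share is unavailable, in which case the last good snapshot is kept, so nothing is reported as deleted or re-created. The function name and state file are hypothetical. It compares names only, so a file deleted and replaced under the same name between two polls would go unnoticed.

```shell
#!/usr/bin/env bash
# report_new_files <watch_dir> <state_file>
# Prints one line per file name that is present now but was absent from
# the last successful snapshot. Output is plain text, so it can be piped.
report_new_files() {
    local dir=$1 state=$2 current
    current=$(mktemp) || return 1

    # While the scanner is writing, the share may disappear and ls fails.
    # Keep the previous snapshot untouched and report nothing this round.
    if ! ls -1 "$dir" > "$current" 2>/dev/null; then
        rm -f "$current"
        return 0
    fi
    LC_ALL=C sort -o "$current" "$current"

    # First run: record what is there without announcing it as new.
    if [[ ! -f $state ]]; then
        mv "$current" "$state"
        return 0
    fi

    LC_ALL=C comm -13 "$state" "$current"   # lines only in the new listing
    mv "$current" "$state"
}
```

A cron entry calling this a few times per hour, with its output mailed or piped elsewhere, would cover the "few times per hour" requirement without a long-running process.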

On Fri, Feb 24, 2017 at 9:24 PM, Stewart C. Russell via talk <talk@gtalug.org> wrote:
<snip>
Here's a quick and dirty bash script similar to the watch command; adjust to taste. :-)
[- watchdir.sh snippet starts -]
#!/usr/bin/env bash
#
# USAGE
#   watchdir.sh <rate> <path>
#
# Determine our sampling rate
WATCHRATE=${1}
if [[ "" = "${WATCHRATE}" ]]; then
    WATCHRATE=10
fi
# Determine our watch directory
WATCHDIR=${2}
if [[ "" = "${WATCHDIR}" ]]; then
    WATCHDIR="."
fi
# Describe what we're doing
echo "Watching ${WATCHDIR} every ${WATCHRATE}s:"
# Render a side-by-side comparison with last snapshot of our directory
# note: assumes at least one snapshot already exists
function sample_watchdir {
    cp .current .previous
    date > .current
    # Quoted so that paths containing spaces survive
    ls -l "${WATCHDIR}" >> .current
    echo ""
    diff -y .current .previous
}
# Begin watching
date > .current
ls -l "${WATCHDIR}" >> .current
sample_watchdir
# Every WATCHRATE seconds, make the current file listing our previous
# listing, take a fresh look at WATCHDIR and compare the changes
while sleep "${WATCHRATE}"; do
    sample_watchdir
done
[- watchdir.sh snippet ends -]
--
Scott Elcomb
@psema4
http://www.pirateparty.ca/

On 24/02/17 11:21 PM, Scott Elcomb via talk wrote:
<snip>
I have a similar function named "waitfor", as it's typically used to wait for a file to stop changing, like
$ do_something >foo &
$ waitfor foo && mailx -s "do_something completed" davecb <foo
--
David Collier-Brown,         | Always do right. This will gratify
System Programmer and Author | some people and astonish the rest
davecb@spamcop.net           |                      -- Mark Twain
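
David's waitfor was not posted, so the following is only a guess at the general shape of such a function and not his implementation. It assumes that "stopped changing" can be approximated by the byte count staying the same across two consecutive polls; the interval argument and its default of 5 seconds are invented.

```shell
#!/usr/bin/env bash
# waitfor <file> [interval_seconds]
# Returns once the file exists and its size is identical on two
# consecutive polls. Loops forever if the file never appears or settles.
waitfor() {
    local file=$1 interval=${2:-5} prev="" cur
    while :; do
        # An unreadable or missing file yields an empty size, which never matches.
        cur=$( { wc -c < "$file"; } 2>/dev/null ) || cur=""
        if [[ -n $cur && $cur == "$prev" ]]; then
            return 0
        fi
        prev=$cur
        sleep "$interval"
    done
}
```

Size alone can be fooled by a writer that pauses for longer than the interval, so a longer interval or a checksum comparison would be safer for slow scans.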

Hi Scott: Why do this:
#!/usr/bin/env bash
and not the usual
#! /bin/bash
???
--Bob.
On 2017-02-24 11:21 PM, Scott Elcomb via talk wrote:
<snip>
--
Bob Jonkman <bjonkman@sobac.com> Phone: +1-519-635-9413 SOBAC Microcomputer Services http://sobac.com/sobac/ Software --- Office & Business Automation --- Consulting GnuPG Fngrprnt:04F7 742B 8F54 C40A E115 26C2 B912 89B0 D2CC E5EA

If you know /bin/bash is the right location, then use /bin/bash. If not, let 'env' find it.
--
William
On Wed, Mar 01, 2017 at 04:09:11AM -0500, Bob Jonkman via talk wrote:
<snip>

On Wed, Mar 01, 2017 at 04:09:11AM -0500, Bob Jonkman via talk wrote:
Hi Scott: Why do this:
#!/usr/bin/env bash
and not the usual
#! /bin/bash
???
On Wed, Mar 1, 2017 at 5:41 AM, William Park via talk <talk@gtalug.org> wrote:
If you know /bin/bash is the right location, then use /bin/bash. If not, let 'env' find it.
Basically this; I've been bitten a couple of times by a missing /bin/bash (though never /bin/sh).
Picked the trick up a few years ago (not sure where) and never looked back. :-)
--
Scott Elcomb
@psema4
http://www.pirateparty.ca/

It's also handy for your Perls & Pythons if you're running perlbrew or that weird python overlay thing equivalent. Stewart

On 01/03/17 20:53, Scott Elcomb via talk wrote:
On Wed, Mar 1, 2017 at 5:41 AM, William Park via talk <talk@gtalug.org> wrote:
If you know /bin/bash is the right location, then use /bin/bash. If not, let 'env' find it.
Basically this; I've been bitten a couple times with a missing /bin/bash (though never /bin/sh)
Picked the trick up a few years ago (not sure where) and never looked back. :-)
Question: if /bin/bash doesn't exist, but it is defined via an env variable, what kind of system sets things up such that /bin/bash doesn't exist?
I use /usr/bin/env for most things, but not for bash. Just curious/looking for a compelling reason to adopt it in future scripts.
Cheers, Jamon

On Thu, Mar 2, 2017 at 8:39 AM, Jamon Camisso via talk <talk@gtalug.org> wrote:
<snip>
Question: if /bin/bash doesn't exist, but it is defined via an env variable, what kind of system sets things up such that /bin/bash doesn't exist?
I use /usr/bin/env for most things, but not for bash. Just curious/looking for a compelling reason to adopt it in future scripts.
It may have been a Cygwin instance, but tbh I just don't recall in which environment I first encountered the issue.
In my day-to-day tasks, I spend lots of time in mixed environments, managing systems with components that run across Linux, OSX and Windows. Additionally, there's a fair amount of time playing & learning in other operating systems (BSDs, Plan 9, hobbyist OSes, etc.) as well.
--
Scott Elcomb
@psema4
http://www.pirateparty.ca/
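
To see what the env-style shebang resolves to on a given box, something like this small sketch can be dropped into a file and run. The function name is made up. FreeBSD is one concrete case where it matters: bash installed from packages lives in /usr/local/bin there, so a hard-coded /bin/bash fails while env finds it on PATH.

```shell
#!/usr/bin/env bash
# describe_bash: report which bash is executing this script and whether
# the conventional /bin/bash path exists on this machine.
describe_bash() {
    echo "running under: $BASH"
    echo "first bash on PATH: $(command -v bash)"
    if [[ -x /bin/bash ]]; then
        echo "/bin/bash: present"
    else
        echo "/bin/bash: missing"
    fi
}

describe_bash
```

The trade-off is that env picks whatever bash comes first on PATH, which may not be the one a sysadmin expects; that is presumably why some people still prefer the fixed path on systems they control.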

Another important question to ask: what are you doing with what you learned from the test? Nagios? Snmptrap? Zabbix? Grapha? Statd? Email notification? Smoke signals?
I wrote a Nagios check once called check_rofs which would write a file, read it, then delete it and report how that went. Nagios handled telling the right person via the correct means and graphing performance data (like time to write, time to read, time to unlink).
A system like this has the downside of not checking often enough for some, i.e. every 5 min by default. I suspect that would be fine for your needs.
David
David Thornton @northdot9 https://www.quadratic.net
On Feb 24, 2017 9:24 PM, "Stewart C. Russell via talk" <talk@gtalug.org> wrote:
<snip>
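
The check_rofs script itself was not posted, so this is only a sketch of the write/read/unlink idea and not David's code. The function name and probe file name are hypothetical, and the timing and performance data he mentions are left out. The return values follow the standard Nagios plugin convention of 0 for OK, 2 for CRITICAL and 3 for UNKNOWN.

```shell
#!/usr/bin/env bash
# check_rw <dir>
# Writes a probe file, reads it back, unlinks it, and prints a one-line
# status in the style of a Nagios plugin.
check_rw() {
    local dir=$1 probe token
    if [[ -z $dir ]]; then
        echo "UNKNOWN - usage: check_rw <dir>"
        return 3
    fi
    probe="$dir/.check_rw.$$"          # hypothetical probe file name
    token="probe-$$-$RANDOM"

    # The group lets 2>/dev/null also hide a failed redirection.
    if ! { printf '%s\n' "$token" > "$probe"; } 2>/dev/null; then
        echo "CRITICAL - cannot write to $dir"
        return 2
    fi
    if [[ $(cat "$probe" 2>/dev/null) != "$token" ]]; then
        rm -f "$probe" 2>/dev/null
        echo "CRITICAL - read-back mismatch in $dir"
        return 2
    fi
    if ! rm "$probe" 2>/dev/null; then
        echo "CRITICAL - cannot unlink probe in $dir"
        return 2
    fi
    echo "OK - write/read/unlink succeeded in $dir"
    return 0
}
```

Pointed at the scanner share, a CRITICAL result would also be a cheap way to tell "share currently invisible" apart from "no new files".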

In my search for the perfect (or at least, “working”) changed file protocol, Chris Tyler pointed me towards something that other list users might find useful: incron
The cron part of the name is a bit misleading, as it's not triggered periodically, but by inotify events. Chris noted that watching for IN_CLOSE_WRITE events (file open for writing was closed) is a fairly efficient way of tracking new files. Each incron command target gets passed the name of the particular file via an environment variable.
I'm having to use rsync to move the files from the scanner CIFS share, as its behaviour is just too random for scripting directly.
Stewart
(props to Chris for giving a compelling Raspberry Pi meetup talk at the end of a 40+ hour software release run, and sticking around to answer questions coherently when he was clearly craving sleep.)
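
A sketch of how the incron side might be wired up on the local folder that rsync fills (inotify would not help on the CIFS mount itself). The paths and the script name are invented. As far as I can tell from incrontab(5), the file name reaches the command through the $# placeholder and the watched directory through $@ in the table entry, rather than through a true environment variable; treat that as something to verify.

```shell
#!/usr/bin/env bash
# Hypothetical handler for an incrontab entry such as (paths are assumptions):
#   /srv/scans IN_CLOSE_WRITE,IN_MOVED_TO /usr/local/bin/on-new-scan.sh $@ $#
# incron replaces $@ with the watched directory and $# with the file name.
# rsync normally writes a temporary dot-file and then renames it, so the
# final name may arrive with IN_MOVED_TO rather than IN_CLOSE_WRITE.
on_new_scan() {
    local dir=$1 name=$2
    case $name in
        EPSON*.PDF|EPSON*.JPG)
            echo "Hey, you have a new file called $dir/$name"
            ;;
        *)
            return 0     # ignore temporary files and anything else
            ;;
    esac
}

# Run the handler only when arguments are supplied, as incron would do.
if (( $# >= 2 )); then
    on_new_scan "$1" "$2"
fi
```

The echo stands in for whatever notification is wanted; mail or a desktop notifier would slot in at the same place.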

On Feb 24, 2017 9:24 PM, "Stewart C. Russell via talk" <talk@gtalug.org> wrote: On 2017-02-20 12:47 AM, Aruna Hewapathirane wrote:
Hi Stewart, you could just |watch| the file listing (adjusting n seconds to whatever is suitable)
|watch --differences -n 10 ls -l </path/to/shared/dir>|
<snip>
I suspect I'll just have to go with William Park's suggestion of using rsync to a local folder that I have more control over. I still have to correct for the scanner FS's wandering clock, but that's less important.
Just a suggestion for the future, in case rsync doesn't work out. It is generally a good idea to abstract a problem against a diagnostic tool. I tend to use the OSI model as it was the one I was first exposed to. It describes a system in seven functional layers. So your problem looks like this, in my way of thinking. (In reverse of the typically cited order, as this is troubleshooting and not design.)
Physical - All the hardware bits and their attachments, location to location hops. - scanner is CCD.
Data - using block image transfers and also mounted as a CIFS - user read write? or r/o in terminal character mode
Network - physical, broadcast, distributed, i.e. where are the possible rx / tx collisions
Transport - TCP/IP - IPV* etc. - paying attention to points which may transliterate values, i.e. converting ASCII to EBCDIC or other cartograph.
Session - Managed by policy - SELinux? other?
Presentation - maybe Server Message Block issues? any Windows bugs attached?
Application - well, you don't have access to job control functions in order to receive reports. That's a pretty poor implementation of an application. Typically any member of the scanner group would be forwarded the messages root gets, filtered on a need to know basis and by session constraints.
Hope this helps
Russell
participants (11)
- Aruna Hewapathirane
- Bob Jonkman
- David Collier-Brown
- David Thornton
- Giles Orr
- Jamon Camisso
- Russell Reiter
- Scott Elcomb
- Stewart C. Russell
- Stewart Russell
- William Park