
To forestall the inevitable suggestion: no, the solution is not to move to git. At least not yet: for various reasons, it can't happen right now. This is the last holdout; all our other repos are already git.

I apologize for the length of this dissertation: I've been doing my homework as fast as I can, and I want to provide complete information.

I've just moved a previously semi-local (different building but "on campus") SVN server to a cloud instance (running Debian, SVN 1.9.5, and Apache 2.4.25, access via https://). For the most part the move went smoothly, but now our "on campus" Jenkins server intermittently loses the network connection on 'svn up', so the deploy fails. That's bad. What's far worse is that while Jenkins initially resumed these broken updates politely when the deployment was re-run, it has now decided that these resumes have left a locked workspace and it has to do a fresh checkout. One of the problems with moving this repository to git is that the SVN trunk is ~5G in size: a checkout to a local client takes 5-10 minutes, but for some reason a Jenkins checkout takes about 30 minutes (this is all pretty new to me, and I haven't had time to investigate that time difference yet). An update takes about 60 seconds, but a new checkout takes 30 minutes - you can see where that causes delays in deployment.

One possible mitigation is to trap the SVN failure, do a clean-up on the directory, and re-run. I may have to try this, but ... that's just mitigation, not a solution.

The Jenkins server is on Windows (I wasn't given a choice) and mostly works well. It uses Cygwin for all the SVN stuff (SVN version 1.11.x). It's also at a different physical location from me, with different network rules.

The critical lines of the failure error:

    org.tmatesoft.svn.core.SVNException: svn: E175002: Connection reset
    svn: E175002: REPORT request failed on '/svn/repo/!svn/vcc/default'

(It being Java, the errors run to 40 or 50 lines: I think this is the only part that's important.) Unfortunately, this is one of those errors for which Google searches produce lots of questions, lots of speculation ... and no solid answers. At least not that I've found. Likewise, a lot of people want to know, as I did, about the relatively unusual filepath ("!svn/vcc/default"), but I've never seen a solid answer as to what that's about either. The logs show Jenkins requests against that filepath with both REPORT and PROPFIND, but Jenkins is only failing on REPORT. Both of these request types are WebDAV extensions.

Our staff don't seem to be having any trouble checking out or updating the repository across a mix of Windows and Mac clients.

I've so far failed at getting more logging out of SVN and Apache: what I do have doesn't tell me much useful, at least not related to these failures.

This problem is intermittent and infrequent. I'm thinking the next step is network sniffing - although I'm hoping someone can suggest something better. I'm relatively inexperienced with Wireshark and tcpdump (and SVN ...), but what experience I do have suggests that all I'm going to learn is that SVN stopped providing data, without finding out why or how to fix it.

Any suggestions welcomed, thanks.

-- Giles https://www.gilesorr.com/ gilesorr@gmail.com
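Roughly what I have in mind for that mitigation - a minimal sketch, assuming a bash (Cygwin) build step that does the update rather than the built-in Jenkins SVN plugin; the workspace path and retry count are placeholders:

    #!/usr/bin/env bash
    # Hypothetical retry wrapper: on a failed 'svn up', run 'svn cleanup' to
    # clear stale working-copy locks, then try again a couple of times.
    WORKSPACE="/cygdrive/c/jenkins/workspace/deploy"   # placeholder path
    for attempt in 1 2 3; do
        svn update "$WORKSPACE" && exit 0
        echo "svn update failed (attempt $attempt); cleaning up and retrying" >&2
        svn cleanup "$WORKSPACE"
        sleep 30
    done
    exit 1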

Silly-assed guess? Maybe the connection is being reset at the firewall level, or by some network nanny/intrusion detection? Maybe it's based on the source IP, which has been flagged as bad for some reason? "Intermittent and infrequent" was the phrase that made me wonder.

Good luck,
William Porquet

On Fri, 7 Jun 2019 at 13:14, Giles Orr via talk <talk@gtalug.org> wrote:
<snip>
-- William Porquet, M.A. ⁂ mailto:william@2038.org ⁂ http://www.2038.org/ "I do not fear computers. I fear the lack of them." (Isaac Asimov)

On 6/7/19 1:16 PM, Giles Orr via talk wrote:
To forestall the inevitable suggestion: no, the solution is not to move to git. At least not yet: for various reasons, it can't happen right now. This is the last holdout, all our other repos are already git.
<snip>
I've so far failed at getting more logging out of SVN and Apache: what I do have doesn't tell me much useful, at least not related to these failures.
You could turn on trace logging for mod_dav, and if you are worried about spamming the logs, put some conditionals around the Jenkins host's IP. For example, 'LogLevel info dav:trace3' would turn on trace3-level logging for dav and leave everything else at info.
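As a sketch, assuming the repository is served from a '/svn' Location block (adjust to the actual vhost layout):

    <Location /svn>
        # Apache 2.4 per-module log level: verbose tracing for mod_dav only,
        # everything else stays at 'info'.
        LogLevel info dav:trace3
    </Location>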
This problem is intermittent and infrequent. I'm thinking the next step is network sniffing - although I'm hoping someone can suggest something better. I'm relatively inexperienced with Wireshark and tcpdump (and SVN ...), but what experience I do have suggests all I'm going to get is to learn that SVN stopped providing data without finding out why or how to fix it.
First thing I'd look at is the MTU between Jenkins and the remote server. If there's some route churn you could conceivably end up with different MTUs, which can lead to inconsistent fragmentation or timeouts. With a large SVN repo and lots of PROPFIND requests, the overhead of a bad MTU somewhere along the line would be quite noticeable. Try tracepath & tracepath6 to see what things look like between the hosts.

Also check to see if there's some mixed IPv4/IPv6 business going on. I doubt it, but I've seen inconsistent behaviour with dual-stack applications that aren't explicitly configured to support one or both.

Otherwise, to rule out SVN on Windows as the issue, try rsyncing the underlying repository and bypassing SVN entirely; see the sketch below. Cygwin has SSH & rsync support, so you can do fast differential rsyncs. Then in the Jenkins job, specify whatever svn operations you need to unlock and check out the correct branch & revision.

Let us know what you find!
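A rough sketch of both suggestions - hostnames, paths, and the exact file:// form for the Cygwin svn build are all placeholders:

    # Check the path MTU between the hosts, both address families:
    tracepath svn.example.org
    tracepath6 svn.example.org

    # Mirror the server-side repository over SSH from Cygwin, then point the
    # Jenkins job at the local mirror.  (svnsync or 'svnadmin hotcopy' is the
    # safer way to copy a live repository than a raw rsync.)
    rsync -az --delete -e ssh user@svn.example.org:/srv/svn/repo/ /cygdrive/c/svn-mirror/repo/
    svn checkout "file:///cygdrive/c/svn-mirror/repo/trunk" workspace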

On 2019-06-08 04:26 PM, Jamon Camisso via talk wrote:
First thing I'd look at is MTU between Jenkins and the remote server. If there's some route churn you could conceivably end up with different MTUs which can lead to inconsistent fragmentation or timeouts. With a large SVN repo and lots of propfind requests, the overhead of a bad MTU somewhere along the line would be quite noticeable. Try tracepath & tracepath6 to see what things look like between the hosts.
IP has been designed to work with different MTU sizes from the beginning. Many years ago, 576 bytes was common on dial-up connections. These days 1492 is common for ADSL, alongside the usual 1500 on cable modems, etc. It's even possible to have a 9000-byte MTU on a network, and back when I was at IBM in the late 90s we used token ring with a 4K MTU, IIRC. Routers would fragment packets to accommodate the changes in MTU along the path, and TCP negotiates the maximum segment size based on the MTU at each end.

These days fragmentation has been largely replaced with path MTU discovery, where a hop with a smaller MTU causes an ICMP message back to the source advising of the maximum usable MTU. PMTUD is mandatory on IPv6. Bottom line: fragments are unlikely to be an issue, as all modern OSs use PMTUD on TCP and Linux uses it on everything.
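One quick way to sanity-check the usable MTU is a don't-fragment ping of a known size (the hostname is a placeholder; 1472 bytes of payload plus 28 bytes of IP and ICMP headers corresponds to a 1500-byte MTU):

    ping -M do -s 1472 svn.example.org    # Linux: set DF, 1472-byte payload
    ping -f -l 1472 svn.example.org       # Windows: same check from the Jenkins host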

On 6/8/19 4:50 PM, James Knott via talk wrote:
Bottom line, fragments are unlikely to be an issue as all modern OSs use PMTUD on TCP and Linux uses it on everything.
True enough, but it is also easy to check and determine whether it is an issue. I get a ticket or two a month with remote employees who are connecting from strange places, or have issues with VPNs, and quite a few are MTU-related.

I'm curious about PMTUD now: my understanding is that ICMP needs to be unrestricted between server & client. If something is blocking that traffic, how does it work? Also, how does PMTUD handle asymmetric paths?

Cheers, Jamon

On 2019-06-08 05:08 PM, Jamon Camisso via talk wrote:
On 6/8/19 4:50 PM, James Knott via talk wrote:
Bottom line, fragments are unlikely to be an issue as all modern OSs use PMTUD on TCP and Linux uses it on everything.

True enough, but it is also easy to check and determine whether it is an issue. I get a ticket or two a month with remote employees who are connecting from strange places, or have issues with VPNs, and quite a few are MTU related.
I'm curious about PMTUD now: my understanding is that ICMP needs to be unrestricted between server & client. If something is blocking that traffic, how does it work? Also, how does PMTUD handle asymmetric paths?
The ICMP message would be sent to the source, so asymmetric paths would not be an issue. There are also provisions for when ICMP is blocked. Take a look at IPv4 traffic with Wireshark: you'll see the don't-fragment (DF) flag is set on TCP in Windows and on everything in Linux, which means routers are not supposed to fragment. See https://en.wikipedia.org/wiki/Path_MTU_Discovery.
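If it does come to sniffing, one narrow thing worth watching for is whether those ICMP messages actually make it back - something like this on either end (the interface name is a placeholder):

    # ICMPv4 'fragmentation needed' and ICMPv6 'packet too big' - the messages
    # PMTUD depends on.  (The ip6[40] match assumes no IPv6 extension headers.)
    tcpdump -ni eth0 '(icmp[icmptype] == 3 and icmp[icmpcode] == 4) or (icmp6 and ip6[40] == 2)'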

On Sat, Jun 08, 2019 at 05:08:54PM -0400, Jamon Camisso via talk wrote:
True enough, but it is also easy to check and determine whether it is an issue. I get a ticket or two a month with remote employees who are connecting from strange places, or have issues with VPNs, and quite a few are MTU related.
I'm curious about PMTUD now: my understanding is that ICMP needs to be unrestricted between server & client. If something is blocking that traffic, how does it work? Also, how does PMTUD handle asymmetric paths?
RFC 4890 explicitly says some types of ICMPv6 must not be filtered. They are:

- Destination Unreachable (Type 1) - all codes
- Packet Too Big (Type 2)
- Time Exceeded (Type 3) - Code 0 only
- Parameter Problem (Type 4) - Codes 1 and 2 only

They also suggest echo request/response should be allowed. Anyone that filters the first ones will break IPv6. Some ISPs are unfortunately that crappy.

-- Len Sorensen
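A sketch of what that allowance looks like with ip6tables, assuming a plain INPUT chain (rule ordering relative to any existing drop rules matters):

    # Allow the ICMPv6 types RFC 4890 says must not be filtered.  These type
    # matches accept all codes, which is slightly coarser than the per-code
    # recommendations above.
    ip6tables -A INPUT -p icmpv6 --icmpv6-type destination-unreachable -j ACCEPT
    ip6tables -A INPUT -p icmpv6 --icmpv6-type packet-too-big -j ACCEPT
    ip6tables -A INPUT -p icmpv6 --icmpv6-type time-exceeded -j ACCEPT
    ip6tables -A INPUT -p icmpv6 --icmpv6-type parameter-problem -j ACCEPT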

On Fri, 7 Jun 2019 at 13:16, Giles Orr <gilesorr@gmail.com> wrote:
<snip>
On Monday morning we had a catastrophic failure of Jenkins (caused, inevitably, by a software upgrade performed by yours truly). I've been firefighting ever since. I hope to follow up on the several excellent suggestions given here once that particular fire is under control. Thanks all. -- Giles https://www.gilesorr.com/ gilesorr@gmail.com
Participants (5):
- Giles Orr
- James Knott
- Jamon Camisso
- lsorense@csclub.uwaterloo.ca
- William Porquet