I have a weird one.
My company has been in Azure for about six months. We have a site-to-site VPN from Azure to one of our data centers.
For no apparent reason, our VMs are supposedly closing connections to remote clients.
It's especially painful for Linux VMs. The SSH connection will simply die in the middle of doing work. No warning, just all of a sudden you're not seeing keystrokes.
I've used multiple SSH clients for this and get exactly the same results.
RDP also disconnects, though it's not as intrusive, as RDP tends to recover.
In examining Wireshark captures, it's clear that the client is receiving a disconnect initiated by the server -- or at least that's how the capture looks. There's a [FIN, ACK] sent to the SSH client, apparently from the server, and the session dies.
I cannot confirm that the VM is actually sending this [FIN, ACK]. I know that's what the Wireshark captures on the client side say, but it's hard to credit the idea that all of our Linux VMs randomly disconnect.
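One way to test whether the VM really sends that [FIN, ACK] is to compare the TTL of the teardown packet with the TTL of the ordinary packets from the same address. The sketch below is only that, a sketch: it assumes the capture was exported with something like `tshark -r session.pcap -Y "tcp.port == 22" -T fields -e frame.time_relative -e ip.src -e ip.dst -e ip.ttl -e tcp.flags.fin -e tcp.flags.reset` (the file name is a placeholder), which produces tab-separated lines. The function names are made up for illustration. A differing TTL hints that a device in the path generated the packet, but it is not proof either way.

```python
import csv
import io
from statistics import median


def parse_rows(text):
    """Parse tab-separated tshark output: time, src, dst, ttl, fin, rst."""
    rows = []
    for rec in csv.reader(io.StringIO(text), delimiter="\t"):
        if len(rec) != 6 or not rec[3].isdigit():
            continue  # skip malformed lines or packets without a single TTL
        t, src, dst, ttl, fin, rst = rec
        rows.append({
            "t": float(t), "src": src, "dst": dst, "ttl": int(ttl),
            # tshark prints booleans as 1/0 or True/False depending on version
            "fin": fin in ("1", "True"),
            "rst": rst in ("1", "True"),
        })
    return rows


def first_teardown(rows):
    """Return (packet, usual_ttl) for the first FIN or RST in the capture.

    usual_ttl is the median TTL of earlier non-teardown packets from the
    same source address, or None if there are none to compare against.
    Returns None when the capture contains no FIN or RST at all.
    """
    for i, row in enumerate(rows):
        if row["fin"] or row["rst"]:
            earlier = [
                r["ttl"] for r in rows[:i]
                if r["src"] == row["src"] and not (r["fin"] or r["rst"])
            ]
            return row, (median(earlier) if earlier else None)
    return None
```

If the first teardown packet carries the server's address but a TTL that differs from the server's usual value, the next step would be a simultaneous capture on the VM itself to see whether that packet ever left it.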
I assume something similar is going on with the RDP side, but I've not had time to troubleshoot the Windows side.
SSH timeouts are random. I've had sessions last as briefly as 25 minutes; to date, I've had no session last longer than 56 minutes.
SSH timeouts are not the result of inactivity, and in any case the client is sending keepalives. Disconnection can occur at any time, including while we're working. In fact, we've taken to running screen at login so that we can easily continue our work when the session inevitably dies.
To make matters worse, statistics I collected over the course of a week show that we are taking errors on the VPN. Averaged over time, about 1% of packets are lost. However, the first-hop router out of our VPN can peak at about 50% packet loss. In fact, all of the packet loss that I can pinpoint appears to be within Microsoft's infrastructure.
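For anyone wondering how a 1% average squares with a 50% peak: the loss is bursty, so most intervals are clean and a few are terrible. Here is a minimal sketch of how the two figures relate. The function name and the sample numbers are invented for illustration and are not my real measurements.

```python
def loss_summary(samples):
    """Summarize packet loss from (sent, lost) counts per interval.

    Returns (overall_percent, peak_percent). The overall figure is
    weighted by packets sent; the peak is the worst single interval.
    """
    samples = list(samples)
    total_sent = sum(sent for sent, _ in samples)
    total_lost = sum(lost for _, lost in samples)
    overall = 100.0 * total_lost / total_sent if total_sent else 0.0
    peak = max(
        (100.0 * lost / sent for sent, lost in samples if sent),
        default=0.0,
    )
    return overall, peak


# Made-up data: 49 clean intervals and one very bad one.
example = [(1000, 0)] * 49 + [(1000, 500)]
overall, peak = loss_summary(example)
print(f"overall {overall:.1f}%, peak {peak:.1f}%")
```

A single interval at 50% loss is plenty to stall an interactive session even though the weekly average looks harmless.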
I'm flummoxed.
Comments, questions, or nasty remarks are invited. :)