VMware Tanzu Application Service for VMs


PAS for Windows: api.sys tcp: i/o timeout error

Canvas BT posted Feb 27, 2020 01:25 PM

Hi,

I am trying to install PAS for Windows (version: 2.8.4, stemcell: windows2019 2019.17) following these instructions; however, when we run the `install-hwc-buildpack` or `smoke tests` errand, I get this error:

    Instance   windows_diego_cell/a5127526-1fa2-48a3-8658-65f5dd62180d
    Exit Code  1
    Stdout     Setting api endpoint to https://api.sys.<url>... FAILED
    Stderr     Request error: Get https://api.sys.<url>/v2/info: dial tcp: i/o timeout
               TIP: If you are behind a firewall and require an HTTP proxy, verify the https_proxy environment variable is correctly set. Else, check your network connection.

We have not had issues like this on any other tiles.

We are running PAS version 2.8.3 on AWS and were wondering whether anyone else has had this issue.

Thanks,

Jim

Daniel Mikusa

Did you install this tile into a different AZ/network than your core PAS tile? This is a networking issue: the connection to your Cloud Controller cannot be made. I ask about a different AZ/network because that would explain why it works for some tiles but not others.

I would suggest you check that. You can do a basic test by logging in to one of your deployed Windows Cell VMs and sending an HTTP request to Cloud Controller. You should be able to `bosh ssh` in and start PowerShell. From there, run something like `Invoke-WebRequest -UseBasicParsing https://api.sys.<url>/v2/info` and confirm that you get a valid response. Once that works, try applying changes again. I suspect it will succeed.
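The `Invoke-WebRequest` test checks three things at once: name resolution, a TCP connection, and the HTTP request itself. The small Python function below separates those stages so you can see which one fails. It is a rough sketch and assumes Python 3 is available on a machine in the same network/AZ as the failing Cell, such as a Linux jumpbox. A Windows Cell probably won't have Python installed, so the PowerShell commands remain the tool on the Cell itself. The name `diagnose` is purely illustrative.

```python
import http.client
import socket
import urllib.request
from typing import Tuple


def diagnose(host: str, port: int = 443, scheme: str = "https",
             timeout: float = 5.0) -> Tuple[str, str]:
    """Return (stage, detail), where stage is the first check that failed
    ("dns", "tcp" or "http"), or "ok" if GET /v2/info succeeded."""
    # 1. Name resolution. Invoke-WebRequest's "The remote name could not be
    #    resolved" error is a failure at this stage.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return "dns", f"cannot resolve {host}: {exc}"
    addresses = sorted({info[4][0] for info in infos})

    # 2. Plain TCP connect to the resolved address(es), with no HTTP involved.
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError as exc:
        return "tcp", (f"{host} resolved to {addresses}, but TCP connect "
                       f"to port {port} failed: {exc}")

    # 3. The same /v2/info request the cf CLI makes when setting the API endpoint.
    url = f"{scheme}://{host}:{port}/v2/info"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return "ok", f"{url} returned HTTP {response.status}"
    except (OSError, http.client.HTTPException) as exc:
        # URLError/HTTPError, TLS errors and socket timeouts are all OSError subclasses.
        return "http", f"TCP works, but GET {url} failed: {exc}"
```

For example, `print(diagnose("api.sys.<url>"))` run from a VM in each AZ shows whether the problem is DNS, routing/security groups, or something at the HTTP/TLS layer.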

Hope that helps!

Daniel Mikusa

A couple more suggestions:

  • Check the DNS configuration on the Cells that have issues and compare it with the working Cell. I believe `Get-DnsClientServerAddress` in PowerShell shows the DNS info.
  • Try `nslookup api.sys.<url>` in addition to `Invoke-WebRequest`; it may give you some additional details. You can also run `nslookup api.sys.<url> <other-dns-server>` to force `nslookup` to query a specific DNS server. It would be useful to see whether the name resolves through a DNS server other than the Cell's default one.
  • Check the Windows firewall. I believe it is enabled. You can also use PowerShell to see whether the firewall is blocking traffic, for example:
    # List the firewall profiles and note the LogFileName of the active one
    Get-NetFirewallProfile
    # Expand that path (the default contains %systemroot%) into $logfile; adjust -Name to your active profile
    $logfile = [Environment]::ExpandEnvironmentVariables((Get-NetFirewallProfile -Name Public).LogFileName)
    Set-NetFirewallProfile -All -LogAllowed True -LogBlocked True
    Get-Content $logfile -Wait -Tail 100 | Where-Object { $_ -match "<SOME-IP>" }
  • Check any anti-virus software you may have installed on your Windows Cells to see if it is blocking traffic. A/V software is the number one cause of unexpected issues with Windows Cells.
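For a scripted version of the `nslookup api.sys.<url> <other-dns-server>` comparison, the sketch below sends one DNS A-record query directly to a DNS server you choose, over UDP port 53. It then parses the answer using the RFC 1035 message format. This is a deliberately minimal illustration and not a replacement for `nslookup`. It handles IPv4 A records only and has no EDNS or TCP fallback for truncated replies. The function names are hypothetical.

```python
import random
import socket
import struct
from typing import List


def build_query(name: str, query_id: int) -> bytes:
    """Build a standard recursive query for the A record of `name`."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(msg: bytes, offset: int) -> int:
    """Return the offset just past a (possibly compressed) domain name."""
    while True:
        length = msg[offset]
        if length == 0:
            return offset + 1
        if (length & 0xC0) == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        offset += 1 + length


def parse_a_records(msg: bytes, query_id: int) -> List[str]:
    """Extract IPv4 addresses from the answer section of a DNS response."""
    rid, flags, qdcount, ancount, _, _ = struct.unpack("!HHHHHH", msg[:12])
    if rid != query_id:
        raise ValueError("response ID does not match query ID")
    rcode = flags & 0x000F
    if rcode != 0:
        # RCODE 3 is NXDOMAIN: this server does not know the name
        raise LookupError(f"DNS server returned RCODE {rcode}")
    offset = 12
    for _ in range(qdcount):
        offset = _skip_name(msg, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = _skip_name(msg, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack("!HHIH", msg[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:  # IN A record
            addresses.append(socket.inet_ntoa(msg[offset:offset + 4]))
        offset += rdlength  # CNAMEs and other record types are skipped
    return addresses


def lookup(name: str, server: str, timeout: float = 3.0) -> List[str]:
    """Ask one specific DNS server for `name`, like `nslookup name server`."""
    query_id = random.randint(0, 0xFFFF)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, query_id), (server, 53))
        response, _ = sock.recvfrom(512)
    return parse_a_records(response, query_id)
```

For example, call `lookup("api.sys.<url>", "<dns-server-ip>")` once for each server listed by `Get-DnsClientServerAddress`. A list of addresses means that server can resolve the private name. A `LookupError` (for example RCODE 3) or a socket timeout means it cannot.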

Hope that helps!

Canvas BT

We have deployed it in the eu-west-2 region, and everything runs on a single network called PAS, which is split across the 3 AZs in London. We've checked the attached security groups, and they allow TCP access to the entire VPC CIDR block.

We use Route 53 private hosted zones to resolve our DNS, but when we run the command you provided we get this result:

    Invoke-WebRequest : The remote name could not be resolved: 'api.sys.<URL>'
    At line:1 char:1
    + Invoke-WebRequest -UseBasicParsing https://api.sys.<URL> ...
    + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
        + CategoryInfo          : InvalidOperation: (System.Net.HttpWebRequest:HttpWebRequest) [Invoke-WebRequest], WebException
        + FullyQualifiedErrorId : WebCmdletWebResponseException,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

I have also tried pinging the Cloud Controller directly by its private IP address, and I get TCP connect errors.

I've run the errands manually with the `bosh` CLI, and it looks like one of the instances can communicate 😕 but the other 2 can't. The one that can connect is running in eu-west-2a; we have Cloud Controllers running in eu-west-2a and 2b. Oddly, it seems the ones running in 2b and 2c can't resolve the private hosted zone.

Canvas BT

Thanks for the help @Daniel Mikusa - Tanzu Support! We managed to track it down to an issue with our BOSH config. For DNS, we were using the base + 2 address of each AZ's subnet rather than the base + 2 address of the full VPC CIDR. We got away with this on the other tiles because the Linux VMs also list the DNS server for AZ 1 as a fallback to try if a lookup fails, whereas Windows seems to use only 1 DNS server (I think?). Anyway, thanks very much for the help :)
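The root cause comes down to address arithmetic. AWS documents that the Amazon-provided DNS server sits at the base of the VPC's IPv4 range plus two. For a VPC with several CIDR blocks, that address is in the primary CIDR. AWS also reserves base + 2 in every subnet, but the documented resolver address is the VPC one. The sketch below uses Python's `ipaddress` module to compare each AZ subnet's base + 2 with the VPC resolver. The CIDRs are hypothetical, chosen so that only the first AZ's subnet starts at the VPC base. That pattern would match this thread, where only the eu-west-2a instance could resolve the private hosted zone.

```python
import ipaddress
from typing import Dict


def amazon_dns_address(vpc_cidr: str) -> ipaddress.IPv4Address:
    """Amazon-provided DNS resolver: base of the (primary) VPC CIDR plus two."""
    return ipaddress.ip_network(vpc_cidr).network_address + 2


def audit_dns_settings(vpc_cidr: str, subnets: Dict[str, str]) -> Dict[str, bool]:
    """For each AZ subnet, report whether 'subnet base + 2' is the VPC resolver."""
    resolver = amazon_dns_address(vpc_cidr)
    results = {}
    for az, cidr in subnets.items():
        per_subnet = ipaddress.ip_network(cidr).network_address + 2
        results[az] = per_subnet == resolver
        verdict = "is" if results[az] else "is NOT"
        print(f"{az}: subnet {cidr} -> {per_subnet} ({verdict} the VPC resolver {resolver})")
    return results
```

With a hypothetical VPC `10.0.0.0/16` and subnets `10.0.0.0/20`, `10.0.16.0/20` and `10.0.32.0/20` for eu-west-2a/b/c, `audit_dns_settings` returns `{'eu-west-2a': True, 'eu-west-2b': False, 'eu-west-2c': False}`. Only the first subnet's base + 2 happens to be the real resolver.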