VMware Tanzu Kubernetes Grid Integrated Edition

 Superfluous float IP allocations (possible SNAT issue?)

David Holder posted Jul 26, 2019 06:50 PM

Hi all,

 

I have PKS 1.4.1, BOSH for vSphere 2.6.3, and NSX-T 2.4.

 

When I create a cluster, it completes successfully. However, I notice that it sends circa 46 allocation requests to the floating IP pool in the process.

 

PKS is configured in no-nat mode.

 

After digging through the audit logs, I see the PKS user is effectively requesting that a SNAT rule be implemented for each namespace's T1 router (kube-public, default, kube-system, pks-system) 11 times, all unsuccessfully (example below):

 

<182>1 2019-07-26T14:49:15.582Z SRV-NSX-01 NSX 4826 ROUTING [nsx@6876 audit="true" comp="nsx-manager" reqId="e100e75b-4a00-4d13-9320-8d089dd759e8" subcomp="manager"] UserName="pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef", ModuleName="Nat", Operation="AddNatRule", Operation status="failure", New value=["1a80880c-14fb-4c29-812c-7f36532c98cf" {"rule_priority":102"scope":"ncp/version","tag":"1.2.0"},{"scope":"ncp/cluster","tag":"pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef"},{"scope":"ncp/project","tag":"pks-system"},{"scope":"ncp/snat","tag":"true"},{"scope":"ncp/extpoolid","tag":"5677c899-dd97-4711-aa48-abc1bbd55d13"}],"_protection":"UNKNOWN"}]

<182>1 2019-07-26T14:49:48.588Z SRV-NSX-01 NSX 4826 ROUTING [nsx@6876 audit="true" comp="nsx-manager" reqId="60912889-64cd-400d-834e-8d7471b37323" subcomp="manager"] UserName="pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef", ModuleName="Nat", Operation="AddNatRule", Operation status="failure", New value=["1a80880c-14fb-4c29-812c-7f36532c98cf" {"rule_priority":102"scope":"ncp/version","tag":"1.2.0"},{"scope":"ncp/cluster","tag":"pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef"},{"scope":"ncp/project","tag":"pks-system"},{"scope":"ncp/snat","tag":"true"},{"scope":"ncp/extpoolid","tag":"5677c899-dd97-4711-aa48-abc1bbd55d13"}],"_protection":"UNKNOWN"}]

<182>1 2019-07-26T14:50:49.583Z SRV-NSX-01 NSX 4826 ROUTING [nsx@6876 audit="true" comp="nsx-manager" reqId="f9dc620e-9cf9-4659-89af-0674d7da2287" subcomp="manager"] UserName="pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef", ModuleName="Nat", Operation="AddNatRule", Operation status="failure", New value=["1a80880c-14fb-4c29-812c-7f36532c98cf" {"rule_priority":102"scope":"ncp/version","tag":"1.2.0"},{"scope":"ncp/cluster","tag":"pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef"},{"scope":"ncp/project","tag":"pks-system"},{"scope":"ncp/snat","tag":"true"},{"scope":"ncp/extpoolid","tag":"5677c899-dd97-4711-aa48-abc1bbd55d13"}],"_protection":"UNKNOWN"}]

<182>1 2019-07-26T14:51:50.633Z SRV-NSX-01 NSX 4826 ROUTING [nsx@6876 audit="true" comp="nsx-manager" reqId="49b4fa2f-9049-4e98-a4ce-aee9db3e244e" subcomp="manager"] UserName="pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef", ModuleName="Nat", Operation="AddNatRule", Operation status="failure", New value=["1a80880c-14fb-4c29-812c-7f36532c98cf" {"rule_priority":102"scope":"ncp/version","tag":"1.2.0"},{"scope":"ncp/cluster","tag":"pks-a8c5928e-94a8-4f9c-b2f3-dd9fece5dbef"},{"scope":"ncp/project","tag":"pks-system"},{"scope":"ncp/snat","tag":"true"},{"scope":"ncp/extpoolid","tag":"5677c899-dd97-4711-aa48-abc1bbd55d13"}],"_protection":"UNKNOWN"}]
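
For anyone wanting to tally failures like these from an exported audit log, the following is a minimal sketch, not an official tool. It assumes the entries keep the `Operation="AddNatRule", Operation status="..."` and `"scope":"ncp/project","tag":"..."` fields exactly as they appear in the excerpt above; the function name `count_failed_nat_rules` is hypothetical.

```python
import re
from collections import Counter

# One AddNatRule audit entry: capture its status, then the first
# ncp/project tag that follows (the namespace the rule was meant for).
# Assumption: every AddNatRule entry carries an ncp/project tag, as in
# the excerpt; an entry without one would be paired with a later tag.
ENTRY = re.compile(
    r'Operation="AddNatRule", Operation status="(?P<status>[^"]+)"'
    r'.*?"scope":"ncp/project","tag":"(?P<project>[^"]+)"',
    re.DOTALL,
)


def count_failed_nat_rules(log_text):
    """Return a Counter of failed AddNatRule attempts per ncp/project tag."""
    counts = Counter()
    for match in ENTRY.finditer(log_text):
        if match.group("status") == "failure":
            counts[match.group("project")] += 1
    return counts
```

The pattern scans the whole text rather than splitting on newlines, so it also copes with several entries run together on one line. Calling `count_failed_nat_rules(open(path).read())` on a saved log (where `path` is wherever you exported it) gives a per-namespace count to compare against the number of floating IP allocations.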

I have noticed, however, that the PKS T1 routers are not configured to use an edge cluster. As an edge cluster is required for NAT services, I can see why this is failing.

 

The edge cluster is healthy and is accommodating other T1s, including the load balancer T1 routers.

 

Does anyone have any idea why this is happening?

Broadcom Employee Kyle Roberts

Hi David,

 

Can you confirm you do 'not' have NAT mode selected, from Opsmanager -> PKS Tile -> Networking?

 

Also, wrt these allocation requests: are you seeing any of them actually being allocated?

 

Also, check the NSX-T container plugin (ncp) logs on your cluster master.

David Holder

Hi Kyle,

 

Can you confirm you do 'not' have NAT mode selected, from Opsmanager -> PKS Tile -> Networking?

 

Confirmed:

 

nonat

 

Also, wrt these allocation requests: are you seeing any of them actually being allocated?

 

Yes, NSX-T is allocating them:

 

IPPOOL

 

 

 

Also, check the NSX-T container plugin (ncp) logs on your cluster master.

 

1 2019-07-26T18:24:46.642Z c36c4556-1a9e-41ee-b73a-756fcfe887cf NSX 10262 - [nsx@6876 comp="nsx-container-ncp" subcomp="ncp" level="WARNING"] nsx_ujo.common.controller NamespaceController worker 0 failed to sync pks-system because of nsx manager exception: Unexpected error from backend manager (['SRV-NSX-01.virtualthoughts.co.uk']) for POST api/v1/logical-routers/1a80880c-14fb-4c29-812c-7f36532c98cf/nat/rules: Found errors in the request. Please refer to the related errors for details. relatedErrors: [NAT] NAT action SNAT is NOT supported on the HaMode ACTIVE_ACTIVE of Logical Router 1a80880c-14fb-4c29-812c-7f36532c98cf.

Looks like it doesn't like my T0 being in active-active. Will reconfigure and report back.
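
If it helps anyone hitting the same thing, here is a rough sketch for pulling the useful fields out of that NCP warning so they can be grepped across a whole log. It assumes the message wording matches the line above exactly; the names `NCP_NAT_ERROR` and `parse_ncp_nat_error` are made up for illustration.

```python
import re

# Extract namespace, NAT action, HA mode and router ID from an NCP
# "failed to sync" warning. Assumption: the wording matches the message
# quoted above; other NCP versions may phrase it differently.
NCP_NAT_ERROR = re.compile(
    r"failed to sync (?P<namespace>\S+) because of nsx manager exception"
    r".*?NAT action (?P<action>\w+) is NOT supported on the HaMode "
    r"(?P<ha_mode>\w+) of Logical Router (?P<router_id>[0-9a-f-]+)",
    re.DOTALL,
)


def parse_ncp_nat_error(line):
    """Return the captured fields as a dict, or None if the line does not match."""
    match = NCP_NAT_ERROR.search(line)
    return match.groupdict() if match else None
```

Run against the warning above, this should report namespace `pks-system`, action `SNAT`, HA mode `ACTIVE_ACTIVE`, and router `1a80880c-14fb-4c29-812c-7f36532c98cf`, which is the same router ID that appears in the failed AddNatRule audit entries.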

David Holder

Had to reconfigure the T0 router on my edge cluster as active/passive (active-standby), not active-active.

Broadcom Employee Kyle Roberts

Thanks David for the update. Question: Once you reconfigured your edge cluster to the supported Active/Standby configuration, what happened to the IP allocation numbers for your floating IP pool?

(e.g., did it remain at 45, or decrease?)

I should write a KB article for this, and that information could be helpful.