Cleaning Up Stale PKS Kubernetes LoadBalancer IP Allocations in NSX-T

I was working in a proof-of-concept environment using NSX-T where we didn't have many IPs in the Floating IP Pool for the Kubernetes clusters provisioned by PKS. We had two clusters deployed and were trying to start a few pods exposed through a LoadBalancer service. The problem was that the pods wouldn't start up; they were failing in the "Init" status. kubectl wasn't giving us enough detail, so we found the node that was trying to start the pods' containers and checked its kubelet.log. Interestingly, we noticed some messages about NSX-T right before the pods failed. That got us thinking: although these particular pods didn't have any special initialization, NSX-T was doing some work to allocate resources so the pods could be exposed via a LoadBalancer service.
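The troubleshooting steps described above can be sketched as a couple of shell helpers. This is a minimal sketch, assuming `kubectl` is pointed at the affected cluster and that you can read the kubelet log on the worker node. The function names are hypothetical, and the log path is passed in rather than hard-coded because its location depends on your PKS version.

```shell
#!/usr/bin/env bash
# Sketch: find the node a stuck pod was scheduled on, then pull the
# NSX-related lines out of that node's kubelet log.
# pod_node and nsx_log_lines are hypothetical helper names.

# Print the name of the node the pod was scheduled on.
# Usage: pod_node <pod-name> [namespace]
pod_node() {
  local pod="$1" ns="${2:-default}"
  kubectl get pod "$pod" --namespace "$ns" -o jsonpath='{.spec.nodeName}'
}

# Print the lines of a kubelet log that mention NSX (case-insensitive),
# keeping only the last N matches (default 50).
# Usage: nsx_log_lines <path-to-kubelet-log> [max-lines]
nsx_log_lines() {
  local log="$1" max="${2:-50}"
  grep -i 'nsx' "$log" | tail -n "$max"
}
```

Run `pod_node <pod-name> <namespace>` from your workstation, then `nsx_log_lines` against the kubelet log on the node it prints. `kubectl describe pod` is also worth checking first, since its Events section sometimes already explains the failure.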
On a hunch, I checked the NSX-T Manager, went to Inventory -> Groups -> IP Pools, and noticed that every IP in the Floating IP Pool was allocated! As it turned out, someone had deployed another cluster without our knowledge, and we had run out of Floating IPs. We deleted that extra cluster, which got us going again for the time being. However, I noticed that a lot of IPs were still allocated from the pool for only two clusters. I could account for 1 IP per cluster for the masters (NSX-T configures a load balancer for each cluster's masters) and 1 more per cluster for the single LoadBalancer service each had provisioned, but that didn't explain all the extra allocations. We ran traceroutes to all of the allocated IPs and found that many of them were unresponsive. It also turned out that someone had deleted some clusters during testing directly with BOSH instead of using the PKS CLI. They had cleaned up all the NSX-T objects they could find, but hadn't released the IPs from the Floating IP Pool.
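If you'd rather check the pool from the command line than from the NSX-T Manager UI, a sketch like the following lists the addresses currently allocated. It assumes the NSX-T Manager API exposes a pool's allocations at `GET /api/v1/pools/ip-pools/<pool-id>/allocations` and that each result carries the IP in an `allocation_id` field; verify both against the API guide for your NSX-T version. The function names and the `NSX_MANAGER`, `NSX_USER`, `NSX_PASSWORD` and `POOL_ID` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the addresses currently allocated from an NSX-T IP pool.
# ASSUMPTION: the endpoint GET /api/v1/pools/ip-pools/<pool-id>/allocations
# returns results whose "allocation_id" field holds the allocated IP.
# NSX_MANAGER, NSX_USER, NSX_PASSWORD and POOL_ID are hypothetical
# variables you set yourself.

# Read the allocations JSON on stdin and print one IP per line.
# Uses grep/sed so the sketch does not depend on jq.
extract_allocation_ids() {
  grep -o '"allocation_id"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | sed 's/.*"\([^"]*\)"$/\1/'
}

# Fetch the allocations for POOL_ID and print the allocated IPs.
# -k skips certificate verification, as in the release call in this post.
list_pool_allocations() {
  curl -sk -u "${NSX_USER}:${NSX_PASSWORD}" \
    "https://${NSX_MANAGER}/api/v1/pools/ip-pools/${POOL_ID}/allocations" \
    | extract_allocation_ids
}
```

Piping `list_pool_allocations` into `wc -l` gives the number of allocations, which you can compare against the count you expect from your clusters' masters and LoadBalancer services.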
To clean those up, I ran traceroute against each of the allocated addresses in the Floating IP Pool and then called the NSX-T API to release the ones that weren't responding. The API call releases an IP address that was allocated from an IP Pool; this is the call I was able to make against NSX-T 2.3:
curl -k -u <user>:<password> -X POST 'https://<nsx-manager>/api/v1/pools/ip-pools/<pool-id>?action=RELEASE' -H "X-Allow-Overwrite: true" -H "Content-Type: application/json" -d '{"allocation_id":"<ip-address>"}'
Replace the NSX-T Manager credentials and address, the ID of the Floating IP Pool, and the IP address to release with your own values.
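The probe-then-release loop can be scripted along these lines. This is a minimal sketch, not the exact procedure I used: it swaps in `ping` for traceroute because its exit status is easy to act on, and the release call mirrors the curl command above, with the `allocation_id` body field taken as an assumption based on the NSX-T 2.x API. `DRY_RUN` defaults to printing only. Anything that blocks ICMP will look stale, so review the dry-run list before releasing. The function names and the `NSX_MANAGER`, `NSX_USER`, `NSX_PASSWORD`, `POOL_ID` and `DRY_RUN` variables are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: probe each allocated address and release the ones that don't answer.
# Uses ping rather than traceroute as the probe.
# ASSUMPTION: the RELEASE body field is "allocation_id".
# NSX_MANAGER, NSX_USER, NSX_PASSWORD, POOL_ID and DRY_RUN are hypothetical
# variables you set yourself. DRY_RUN defaults to 1 (print, don't release).

# Succeeds (exit 0) if the address answers none of 3 pings.
# stdin is redirected so ping can't consume the IP list being read below.
is_unresponsive() {
  ! ping -c 3 "$1" >/dev/null 2>&1 </dev/null
}

# Release one address back to the pool (same call as the curl command above).
release_ip() {
  curl -sk -u "${NSX_USER}:${NSX_PASSWORD}" -X POST \
    "https://${NSX_MANAGER}/api/v1/pools/ip-pools/${POOL_ID}?action=RELEASE" \
    -H "X-Allow-Overwrite: true" \
    -H "Content-Type: application/json" \
    -d "{\"allocation_id\":\"$1\"}"
}

# Read IPs (one per line) on stdin; release those that don't respond.
cleanup_stale_ips() {
  local ip
  while IFS= read -r ip; do
    [ -z "$ip" ] && continue
    if is_unresponsive "$ip"; then
      if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would release $ip"
      else
        echo "releasing $ip"
        release_ip "$ip" </dev/null
      fi
    fi
  done
}
```

Feed it the allocated addresses, one per line (for example, copied from the IP Pools page in NSX-T Manager), first with the default dry run and then again with `DRY_RUN=0` once the list looks right.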
