Issue:
In the TCA Manager UI, a node pool showed as Processing for a prolonged period. This post shares the troubleshooting steps that fixed the issue in this scenario, based on the errors found in the logs.
Several factors can cause a node pool to be stuck in the Processing state. One of them is intermittent time drift, which causes inconsistencies between a node pool's actual state and the UI. This is a known issue with Telco Cloud Automation (TCA) 2.1.1 and older.
Issue Description:
1. In the CaaS Infrastructure view in TCA, look for any cluster instance that shows Node Pools in the Processing state.
2. Click on the cluster and navigate to Node Pools to identify which Node Pool has the issue.
3. The TCA logs and the events in the specific cluster pointed to “ipvlan not found vmconfig status failed”, which identified the issue as network-related.
4. Go to Node Customizations and check that all settings are correct. In this case, the network was not properly assigned to the Node Pool.
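If the logs have been exported to a text file, a small script can help locate the error quickly. The following is a minimal sketch in Python, assuming plain-text log lines; the sample lines and the node pool name are invented for illustration and are not real TCA output. Only the error text itself comes from this scenario.

```python
import re

# Error text quoted in this article. The surrounding log format is an assumption.
ERROR_PATTERN = re.compile(r"ipvlan not found.*vmconfig status failed", re.IGNORECASE)


def find_network_errors(lines):
    """Return (line_number, text) pairs for lines that contain the error text."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if ERROR_PATTERN.search(line)
    ]


# Hypothetical sample lines, for illustration only.
sample_log = [
    "INFO nodepool np-example reconcile started",
    "ERROR ipvlan not found vmconfig status failed",
    "INFO nodepool np-example still processing",
]

matches = find_network_errors(sample_log)
for number, text in matches:
    print(f"line {number}: {text}")
```

To scan a real export, pass an open file object instead of the sample list, since the function accepts any iterable of lines.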
Resolution:
The port group associated with the node pool showed an incorrect configuration: "ipvlan" instead of the corresponding dvportgroup-{id}. To fix this, terminate the CNF and manually instantiate it again with the correct port group association. This resulted in the successful instantiation of the NF and the node pool transitioning to the Provisioned state.
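The symptom described above (a bare network name where a dvportgroup-{id} reference is expected) can be checked mechanically. The following is a minimal Python sketch, assuming you have copied the network value of each node pool into a dictionary by hand; the node pool names and the ID 1234 are hypothetical, and the check only tests the shape of the value, not whether the port group actually exists.

```python
import re

# vSphere identifies a distributed port group by an ID of the form dvportgroup-<number>.
DVPORTGROUP_ID = re.compile(r"^dvportgroup-\d+$")


def is_resolved_portgroup(value: str) -> bool:
    """True if the value looks like a port group ID rather than a bare network name."""
    return bool(DVPORTGROUP_ID.match(value.strip()))


# Hypothetical values copied manually from Node Customizations.
node_pool_networks = {
    "np-example-a": "dvportgroup-1234",
    "np-example-b": "ipvlan",
}

unresolved = [
    name
    for name, portgroup in node_pool_networks.items()
    if not is_resolved_portgroup(portgroup)
]
print("Node pools with an unresolved network:", unresolved)
```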
Fix:
1. Make a note of the values required to instantiate the CNF again: the CNF instance name, cloud name, and catalog name from General Properties; the YAML file from Instantiation Properties; and the repository details and ipvlan details from the Init Params page of the CNF instance. Also download the values.yaml file from the Global Values section.
2. Go to NF inventory and terminate the CNF.
3. Once the terminate action is completed, delete the CNF.
4. Go to Network Function Catalog. Search for the catalog that was used to instantiate this NF, click on the 3 dots and click Instantiate.
5. Instantiate using the values that you noted earlier.
6. Select the correct cloud/cluster name and the node pool that shows the Processing status.
7. Enter the namespace and Harbor repository, then click Next.
8. Click Next on Network Function Properties.
9. On the Inputs page:
- In Global Values, upload the values.yaml file that you downloaded earlier as a backup.
- Browse to the correct ipvlan value that this node pool must be configured with.
10. On the Review page, validate the details and click on Instantiate.
11. You can check the progress of the instantiation by navigating to the Network Function Inventory section. Wait for the NF state to change to Instantiated and State Complete.
12. To validate the status, go to CaaS Infrastructure and check the node pool status. It should now be in the Provisioned state.
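The steps above depend on the values noted in step 1, so it may help to confirm that the record is complete before terminating the CNF. The following is a minimal Python sketch of such a checklist; the class name, field names, and sample values are all hypothetical and do not correspond to any TCA API or schema.

```python
from dataclasses import dataclass, fields


@dataclass
class CnfReinstantiationRecord:
    """Values to capture before terminating the CNF (field names are illustrative)."""

    cnf_instance_name: str = ""
    cloud_name: str = ""
    catalog_name: str = ""
    node_pool_name: str = ""
    namespace: str = ""
    repository: str = ""
    ipvlan_network: str = ""
    values_yaml_path: str = ""

    def missing(self):
        """Return the names of fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Hypothetical, partially filled record.
record = CnfReinstantiationRecord(
    cnf_instance_name="example-cnf",
    cloud_name="example-cluster",
    catalog_name="example-catalog",
    values_yaml_path="values.yaml",
)

print("Still to record:", record.missing())
```

An empty list from `missing()` means every value in the checklist has been filled in.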
Conclusion:
Because the errors pointed to a networking issue, instantiating a new NF with the correct port group association fixed it. The NF instantiated successfully and the node pool transitioned to the Provisioned state in TCA.