VMware NSX


 NSX LB returns 502 when forms are submitted

Brandon Stricker posted Dec 04, 2024 04:05 PM

Hi all,

I am using the built-in NSX load balancer on v4.2.1.0 with a layer-7 virtual service configured. The configuration worked fine until we upgraded NSX to the v4.1 releases and vSphere to v8. Now, even on 4.2.1, a site behind the load balancer works except when a user types anything into a form and clicks submit: the browser spins for a while and then a 502 Bad Gateway is returned. If I switch the server pool to a layer-4 load balancer, the published site works, including form submissions. I know Avi would be preferred, but this is for a seldom-used DR site and the built-in load balancer has worked fine until now. Any guidance would be much appreciated!

Martin Kiefer

Is there any specific reason for using an L7 HTTP LB and not just L4 TCP?

Brandon Stricker

Hi Martin. Yes, we have limited public IPs and multiple backend sites to publish. It's more of a reverse proxy than true load balancing, but has worked out well for us in the past. We use request forwarding phase rules to send the traffic to the proper server pool, depending on the incoming _host variable. 
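
For readers unfamiliar with this setup, the host-based forwarding described above can be sketched conceptually as follows. This is not NSX rule syntax, only an illustration of the decision a forwarding-phase rule makes; the host names, pool names, and the `select_pool` helper are hypothetical and do not come from the thread.

```python
from typing import Dict, Optional

# Hypothetical host-to-pool table; replace with the sites actually published.
HOST_TO_POOL: Dict[str, str] = {
    "app1.example.com": "pool-app1",
    "app2.example.com": "pool-app2",
}


def select_pool(host_header: Optional[str],
                default_pool: Optional[str] = None) -> Optional[str]:
    """Return the pool mapped to the request's Host header, else default_pool."""
    if not host_header:
        return default_pool
    # Host names are case-insensitive and may carry a port ("app1.example.com:443").
    host = host_header.strip().lower().split(":", 1)[0]
    return HOST_TO_POOL.get(host, default_pool)
```

Because the decision depends on the HTTP Host header, it needs an L7 virtual service; an L4 virtual service never parses the request and can only forward to a single pool per IP and port.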

Sulaiman Lodewyk

A 502 is usually an indication that the backend server doesn't know how to respond or uses a different route back to the load balancer.
Does anything change when you switch from L7 to L4, in relation to SNAT, gateway, etc.?

Brandon Stricker

Hi Sulaiman. No, nothing changes when switching from L7 to L4 in the backend server's configuration. Browsing the website hosted by the backend server works fine through L7, except when a user enters data into any form field and tries to submit the form. It's very strange behavior. 
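
Since page loads succeed and only form submissions fail, one way to narrow this down is to replay POST requests of increasing size and see where the 502 starts. The sketch below is only one hypothesis to test (the thread does not establish that body size is the trigger), and the `comment` field name and helper names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def make_form_body(size: int) -> dict:
    """Build a form payload whose single field is `size` characters long."""
    return {"comment": "x" * size}


def sweep_post_sizes(post: Callable[[dict], int],
                     sizes: Iterable[int]) -> List[Tuple[int, int]]:
    """Send one POST per size and record (size, status_code) pairs."""
    return [(size, post(make_form_body(size))) for size in sizes]


def first_failing_size(results: List[Tuple[int, int]],
                       bad_status: int = 502) -> Optional[int]:
    """Return the smallest tested size that produced bad_status, if any."""
    for size, status in results:
        if status == bad_status:
            return size
    return None
```

With the `requests` library, `post` could be something like `lambda data: requests.post(url, data=data, timeout=10).status_code`, run once against the L7 virtual service and once directly against a pool member. If every size fails, including a near-empty body, the cause is more likely the request method or its headers than the payload size.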

Sulaiman Lodewyk

I would suggest doing a packet capture.
L7 and L4 may handle the routing of the traffic differently, which could explain why they don't give the same results.

Sadath Khan

Hi Brandon, have you validated communication between the pool members and the backend database servers?

Running packet captures should also help in this scenario, and you could try an edge failover to rule out an issue with the active edge node.
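
A quick way to check that path is a plain TCP connect test run from a pool member toward the database server. This is a minimal sketch; the `tcp_reachable` helper is hypothetical, and the host and port must be filled in for your environment. It only confirms that a TCP handshake completes, not that the database accepts queries.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_reachable("<db_server_ip>", 1433)` would test a SQL Server listener on its default port; substitute whatever port your database actually uses.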

Bogdan28

Here’s a Python script to help troubleshoot the issue you’re experiencing with the NSX load balancer in Layer 7 (L7) mode after the upgrade. It automates checks of server health, HTTP profiles, and SSL/TLS configuration, and captures traffic for analysis.

import requests
import subprocess
import logging

# Enable logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(message)s')
logger = logging.getLogger()

# Constants (replace with your details)
NSX_API_URL = "https://<NSX_MANAGER>/policy/api/v1/"
USERNAME = "admin"
PASSWORD = "password"
BACKEND_SERVERS = ["<server1_ip>", "<server2_ip>"]  # Add your backend server IPs
FORM_TEST_URL = "https://<load_balancer_virtual_service>/test-form"
HTTP_PROFILE_SETTINGS = {"max_header_size": 8192, "max_body_size": 1048576}


def authenticate_nsx():
    """
    Authenticate with NSX Manager and verify API connectivity.
    """
    logger.info("Authenticating with NSX Manager...")
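    try:
        # verify=False skips certificate validation (common with self-signed NSX
        # Manager certificates) and makes requests emit an InsecureRequestWarning.
        response = requests.get(f"{NSX_API_URL}infra", auth=(USERNAME, PASSWORD), verify=False, timeout=10)
        response.raise_for_status()
        logger.info("Authentication successful.")
    except requests.exceptions.RequestException as e:
        logger.error(f"Authentication failed: {e}")
        raise SystemExit(1)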


def check_backend_health():
    """
    Check the health of backend servers by sending HTTP requests.
    """
    logger.info("Checking backend server health...")
    for server in BACKEND_SERVERS:
        try:
            response = requests.get(f"http://{server}", timeout=5)
            if response.status_code == 200:
                logger.info(f"Server {server} is healthy.")
            else:
                logger.warning(f"Server {server} returned status code {response.status_code}.")
        except requests.exceptions.RequestException as e:
            logger.error(f"Server {server} is unreachable: {e}")


def validate_http_profile():
    """
    Validate HTTP profile settings in NSX Manager.
    """
    logger.info("Validating HTTP profile settings...")
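    try:
        # Assumption: the NSX Policy API lists HTTP application profiles under
        # infra/lb-app-profiles (not under a Tier-1), with fields such as
        # request_header_size and request_body_size. Verify the path and field
        # names against the API reference for your NSX version.
        response = requests.get(f"{NSX_API_URL}infra/lb-app-profiles", auth=(USERNAME, PASSWORD), verify=False, timeout=10)
        response.raise_for_status()
        profiles = response.json()
        for profile in profiles.get("results", []):
            logger.info(f"Profile: {profile['id']}, Request Header Size: {profile.get('request_header_size', 'N/A')}, Request Body Size: {profile.get('request_body_size', 'N/A')}")
    except requests.exceptions.RequestException as e:
        logger.error(f"Failed to retrieve HTTP profiles: {e}")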


def test_form_submission():
    """
    Test form submission to verify if the issue is reproducible.
    """
    logger.info("Testing form submission...")
    data = {"name": "test", "email": "test@example.com"}
    try:
        response = requests.post(FORM_TEST_URL, data=data, timeout=10)
        if response.status_code == 200:
            logger.info("Form submission successful.")
        else:
            logger.warning(f"Form submission returned status code {response.status_code}.")
    except requests.exceptions.RequestException as e:
        logger.error(f"Form submission failed: {e}")


def analyze_http_traffic():
    """
    Capture HTTP traffic for analysis using tcpdump.
    """
    logger.info("Capturing HTTP traffic (requires tcpdump)...")
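    try:
        # "eth0" is a placeholder; use the interface that carries the traffic of interest.
        # The filter expression goes last so option parsing works on all platforms.
        subprocess.run(["sudo", "tcpdump", "-i", "eth0", "-c", "100", "-w", "http_traffic.pcap", "port 80 or port 443"], check=True)
        logger.info("Traffic captured in http_traffic.pcap. Analyze with Wireshark.")
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        logger.error(f"Failed to capture traffic: {e}")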


def check_ssl_tls():
    """
    Verify SSL/TLS configuration using OpenSSL.
    """
    logger.info("Validating SSL/TLS configuration...")
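    try:
        # stdin=DEVNULL lets s_client exit after the handshake instead of waiting for input.
        result = subprocess.run(["openssl", "s_client", "-connect", "<load_balancer_virtual_service>:443", "-showcerts"], stdin=subprocess.DEVNULL, capture_output=True, text=True, timeout=15, check=True)
        logger.info(f"SSL/TLS Configuration:\n{result.stdout}")
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, FileNotFoundError) as e:
        logger.error(f"Failed to check SSL/TLS configuration: {e}")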


if __name__ == "__main__":
    # Run the diagnostic checks sequentially.
    logger.info("Starting NSX L7 Load Balancer Diagnostics...")
    authenticate_nsx()
    check_backend_health()
    validate_http_profile()
    test_form_submission()
    analyze_http_traffic()
    check_ssl_tls()
    logger.info("Diagnostics completed.")

What the script does:

- Authenticates with NSX Manager: Ensures connectivity to NSX API for configuration checks.
- Checks Backend Server Health: Confirms that the backend servers are responding correctly.
- Validates HTTP Profiles: Retrieves and logs HTTP profile settings to ensure proper configuration.
- Tests Form Submissions: Reproduces the form submission issue and logs the response.
- Captures HTTP Traffic: Captures network traffic for detailed analysis of requests and responses.
- Checks SSL/TLS Configurations: Verifies SSL termination settings to rule out certificate or protocol issues.

How to use it:

- Replace placeholders (<NSX_MANAGER>, <server1_ip>, etc.) with your environment details.
- Run the script on a machine with access to the NSX API and the load balancer.
- Use the captured http_traffic.pcap file for further analysis in tools like Wireshark.
- Review the log output to identify potential misconfigurations or issues.