Automic Workload Automation

  • 1.  AAKE deployment failed to load initial data

    Posted Nov 15, 2022 09:10 AM
    Hello,

    I am trying to deploy AAKE V21.0.4, but the deployment fails when the initial-data pod tries to load the initial data.

    The initial-data job log shows that the DB host defaults to "ae-db"; it is not picking up the host configured in the "ae-db" secret.



    Below is the host configured in "ae-db" secret:



    The "ae" secret that the automic-automation deployment builds automatically sets up the ODBC and JDBC connections with "ae-db" as the host, and loading the initial data then fails.
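
When comparing what is stored in the "ae-db" and "ae" secrets, it helps to decode them, since Kubernetes stores Secret `data` values base64-encoded. The following is a minimal Python sketch, assuming the secret has been exported with `kubectl get secret ae-db -o json`; the function name and the sample key and value are hypothetical and are not taken from the AAKE documentation.

```python
import base64
import json


def decode_secret(secret_json):
    """Return the decoded key/value pairs of a Kubernetes Secret.

    secret_json is the JSON text of a Secret object, for example the
    output of `kubectl get secret <name> -o json`.
    """
    secret = json.loads(secret_json)
    # Values under "data" are base64-encoded; a Secret may have no data.
    data = secret.get("data") or {}
    return {key: base64.b64decode(value).decode("utf-8")
            for key, value in data.items()}


if __name__ == "__main__":
    # Hypothetical sample; a real secret will hold different keys and values.
    sample = json.dumps({
        "kind": "Secret",
        "data": {"host": base64.b64encode(b"mydb.example.com").decode("ascii")},
    })
    print(decode_secret(sample))
```

Running the same function on both secrets makes it easy to see which host each one actually carries.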



    I am using V21.0.4 (the latest build). Has anyone encountered this problem? Any clue how I can override those parameters during deployment?

    Thanks in advance.

    ------------------------------
    Thanks,
    - Reddy
    ------------------------------


  • 2.  RE: AAKE deployment failed to load initial data

    Posted May 12, 2023 10:12 AM

    Hi,
    I am facing a very similar issue, using an AAKE 21.0.5 deployment in AWS.

    Image: ae-initialdata:21.0.5.4. The initialdata pod always ends up in Error state, but I could not find any error in the logs. I am attaching the environment variables section of the values.yaml file and the log file of the initialdata pod (all confidential data in the files is masked).

    It would be helpful to have documentation on which env vars are mandatory for AE and AWI in the values.yaml file, as well as troubleshooting docs for AAKE deployment issues.

    Thanks


    Attachment(s)

    txt
    initialdata_log.txt   11 KB 1 version
    txt
    values.yaml.txt   1 KB 1 version


  • 3.  RE: AAKE deployment failed to load initial data

    Broadcom Employee
    Posted May 15, 2023 03:08 AM

    Hi,

    In the Automic documentation you can find the steps to prepare for the Container-based installation, including all the values.yaml parameters that can/should be set before the deployment. These are for the most part the same as for an on-prem installation and include DB connection (configured as a secret in values.yaml), predefined client 0 user (manually created as a secret before), system name (as environment variables in values.yaml), JCP/REST endpoints (as environment variables in values.yaml), certificates, enable/disable Ingress (in values.yaml).
    Please also be aware that the DB service requires kube-dns/CoreDNS to resolve the database address.

    Preparing for the Container-based installation (see also the DB section for examples of how to create the DB secret):
    https://docs.automic.com/documentation/webhelp/english/ALL/components/DOCU/21.0.5/Automic%20Automation%20Guides/Content/Installation_Containers/containers_InstallingPreparing.htm?tocpath=Installing%7CContainer-Based%20Installation%7CPreparing%20for%20the%20Container-Based%20Installation%7C_____0

    On the Academy you can also find an example for a deployment to AWS using Fargate:

    https://www.software.broadcom.com/hubfs/ESD/ESD_Academy/ESD_FY21_Academy/ESD_FY21_Academy_Files/ESD_FY21_Academy_Files_AIOps/Installing%20Automic%20Automation%20Kubernetes%20Edition%20on%20Amazon%20AWS%20(v1.1).pdf?hsCtaTracking=76fee8c8-14f4-434c-8db1-9013318956d9%7Ca49b31f5-2323-42f9-a5bc-19911018456d

    Additionally, we have a FAQ page for AAKE that is being regularly updated:

    https://academy.broadcom.com/blog/aiops/faqs-on-configuration-when-moving-to-aake

    Hope this helps,

    Oana




  • 4.  RE: AAKE deployment failed to load initial data

    Posted May 16, 2023 12:43 PM

    Hi Oana,

    Yes, we have already referred to all the mentioned documents and configured all the values and the secret correctly.
    DNS resolution of the RDS DB through CoreDNS also works.
    Yet the job still fails without writing any error to the log.
    Is there anything else mandatory that needs to be verified or checked?

    Thanks.




  • 5.  RE: AAKE deployment failed to load initial data

    Broadcom Employee
    Posted May 17, 2023 02:08 AM

    Hi,

    As far as I can see in the attached values.yaml, you set the DB connection strings as environment variables:

      AUTOMIC_JDBC_SQLDRIVERCONNECT: "jdbc:postgresql://aake-engine-dev.xxxx.us-xx-1.rds.amazonaws.com:5432/aedb"
      AUTOMIC_ODBC_SQLDRIVERCONNECT: "ODBCVAR=NNJNIORP,host=aake-engine-dev.xxxx.us-xx-1.rds.amazonaws.com port=5432 dbname=aedb user=automic password=--10433Exxxxxx17E connect_timeout=10 client_encoding=LATIN9"

    This is not necessary since they are being generated from the ae-db secret during the deployment. So if you created and configured all the DB information in the secret, all the required information is included there.

    BR,
    Oana




  • 6.  RE: AAKE deployment failed to load initial data

    Posted May 17, 2023 12:02 PM

    Hi Oana,
    If I do not specify SQLDRIVERCONNECT for ODBC and JDBC, as you suggest, the initial-data pod fails because it falls back to the default sqlDriverConnect (Microsoft SQL) details, as shown below:

    ...<skipped>
    [ODBC]
    ..<skipped>
    sqlDriverConnect=ODBCVAR=NNNNNNRN,DSN=UC4;UID=uc4;PWD=--1037B2E22BF022EBE2;Mars_Connection=Yes
    ..<skipped>
    


    But if I specify those values, the initial-data job picks them up and is able to connect to the DB.

    ...<skipped>
    [ODBC]
    ...<skipped>
    ..sqlDriverConnect=ODBCVAR=NNJNIORP,host=app-xxxx-aake-engine-dev.xxxxx.us-xxx-1.rds.amazonaws.com port=5432 dbname=aedb user=automic password=??? connect_timeout=10 client_encoding=LATIN9
    ..<skipped>



    So I had to add the details as environment variables, after which the initial-data job completes successfully. But we still see an issue in the jcp-ws, jcp-rest and jwp pods, where the sqlDriverConnect values are set incorrectly (the host is taken as the secret name ae-db and not the host specified in the secret or env var), as shown below:

    [ODBC]
    ......<skipped>
    sqlDriverConnect=ODBCVAR=NNJNIORP,host=ae-db port=5432 dbname=aedb user=automic password=??? connect_timeout=10 connect_timeout=10 client_encoding=LATIN9
    ....
    ....<skipped>
    [JDBC]
    ...<skipped>
    sqlDriverConnect=jdbc:postgresql://ae-db:5432/aedb

    On further analysis of the deployment files for jcp, jwi and jwp, we found that the secretRef points to the 'ae' secret, whose SQLDRIVERCONNECT values are incorrect: the host is the secret name 'ae-db' and not the value stored inside the secret. We would like to know how this secret gets its data, since it is managed by the install operator.

    It would be really helpful to have a working session to resolve this issue. Please let us know how we can get technical guidance on a call or in a working session.
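
To compare the host that each pod actually ends up with, the sqlDriverConnect lines from the logs can be parsed mechanically. This is a minimal Python sketch, assuming the two value formats shown in the log excerpts above (an `ODBCVAR=...,key=value ...` string for ODBC and a `jdbc:postgresql://...` URL for JDBC); the function names are hypothetical.

```python
from urllib.parse import urlsplit


def odbc_host(value):
    """Extract host from 'ODBCVAR=...,host=... port=... dbname=...'.

    Returns None when no host= field is present (for example in the
    DSN-based default string).
    """
    # Everything after the first comma is a space-separated key=value list.
    _, _, params = value.partition(",")
    fields = dict(p.split("=", 1) for p in params.split() if "=" in p)
    return fields.get("host")


def jdbc_host(value):
    """Extract the host from a 'jdbc:postgresql://host:port/db' URL."""
    prefix = "jdbc:"
    if not value.startswith(prefix):
        raise ValueError("not a JDBC URL: " + value)
    # Drop the 'jdbc:' prefix so the rest parses as an ordinary URL.
    return urlsplit(value[len(prefix):]).hostname


if __name__ == "__main__":
    print(odbc_host("ODBCVAR=NNJNIORP,host=ae-db port=5432 dbname=aedb"))
    print(jdbc_host("jdbc:postgresql://ae-db:5432/aedb"))
```

Applied to the jcp-ws, jcp-rest and jwp log lines, both functions return `ae-db`, which matches the observation that these pods use the name ae-db as the host.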

    Thanks,
    Sebasti. S




  • 7.  RE: AAKE deployment failed to load initial data

    Broadcom Employee
    Posted May 17, 2023 12:15 PM

    Hi Sebasti. S,

    The ae secret is generated during the deployment from the ae-db secret that you created with the DB information. These should not be changed manually. We then use an ExternalName service (also called ae-db) to resolve the address of the database, which is why you need to have a DNS service running in the cluster. If you use Fargate on AWS, you need to include the CoreDNS pods and remove the EC2 annotations. You can find an example of how to deploy to AWS here: https://www.software.broadcom.com/hubfs/ESD/ESD_Academy/ESD_FY21_Academy/ESD_FY21_Academy_Files/ESD_FY21_Academy_Files_AIOps/Installing%20Automic%20Automation%20Kubernetes%20Edition%20on%20Amazon%20AWS%20(v1.1).pdf?hsCtaTracking=76fee8c8-14f4-434c-8db1-9013318956d9%7Ca49b31f5-2323-42f9-a5bc-19911018456d
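
To illustrate the ExternalName mechanism, here is a minimal Python sketch of what a Kubernetes Service of that type generally looks like. The function name and the example host are hypothetical, and the manifest that the deployment actually generates may differ. With such a service in place, the in-cluster name ae-db is answered by cluster DNS with a CNAME to the external database host, so a connection string containing `host=ae-db` depends on that DNS lookup working.

```python
import json


def external_name_service(name, external_host):
    """Build a Kubernetes Service manifest of type ExternalName as a dict."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        # Cluster DNS answers lookups of `name` with a CNAME to external_host.
        "spec": {"type": "ExternalName", "externalName": external_host},
    }


if __name__ == "__main__":
    # Hypothetical database address, for illustration only.
    manifest = external_name_service("ae-db", "mydb.example.com")
    print(json.dumps(manifest, indent=2))
```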

    If you need more help with the configuration or deployment please contact support.

    BR,
    Oana




  • 8.  RE: AAKE deployment failed to load initial data

    Posted May 17, 2023 12:33 PM

    Hi Oana,
    Thanks for your immediate reply.
    As I already mentioned, the 'ae-db' secret has the correct values: the initial-data, awi and ae-wp pods are created from it as expected and connect to the DB (which shows there is no issue with CoreDNS; we also verified this separately).
    The jcp-rest, jwp and jcp-ws pods reference the 'ae' secret. We did not change anything manually. When I checked the 'ae' secret, its host value was incorrect, for which I have attached the logs. That is why I wanted to understand how that happens. I hope this makes the issue clear.
    The 'ae-db' secret values are correct and work as expected for some pods, but they are referenced incorrectly in the 'ae' secret, which is picked up by the other pods, and those fail.

    Thanks