Automic Workload Automation

PSA: $LD_LIBRARY_PATH (again), now with bonus ZDU troubles

    Posted Jun 21, 2018 06:04 AM

    This post only concerns AE users on UNIX. Windows AE operators may safely skip it and live a happy, fulfilling life nonetheless. For the UNIX folks, however, this may be vital:


    Oh, $LD_LIBRARY_PATH, old friend. We meet again, and once again the old saying holds true: $LD_LIBRARY_PATH problems are the B.E.S.T. problems (B.E.S.T. = "Bugs Extraordinarily ***tty to Trace").


    Bonus Complication: ZDU


    We recently had a fun ZDU that generated funky errors ("no worker processes of the new version active") and even corrupt database entries, leading to zombified AE processes. As a side effect, it also prompted various opinions about the correct way of ZDU'ing, and highlighted, in my opinion, that the documentation needs a serious overhaul. It's partly filled with ambiguity and made-up terms that have no unambiguous definition ("Distributed Installation", anyone?), and it leads even Automic personnel to communicate things like "you need to start with two distinct installations of the same binaries", which is NOT required.


    I'd like to page ainda02 on this.


    But above all, it doesn't contain the one clear bit of information that I will tell you today, which came out of a nearly three-hour WebEx debugging session with an Automic developer:


    Do not set an $LD_LIBRARY_PATH (other than ".") that includes any Automic directories for the Automation Engine, ever!


    Here's why:

    I learned during the debugging session that the Automation Engine implicitly looks for shared objects in the directory it was started in (no, that's not necessarily the directory the binaries are in; it's the CWD, the directory you "cd" into before you start the binary, but that should be the same directory the binaries are in). But $LD_LIBRARY_PATH takes priority over that.


    Automic shared objects, like, may differ in version between the old and new version when you do a ZDU, and one set of processes may not work with the shared objects of the other version. Worse, if your $LD_LIBRARY_PATH includes the utilities' shared objects, these may be yet another version than even the matching AE. If you get a mismatch here, very exciting and mysterious problems will happen!
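
    If you want a quick check of whether your own environment is affected, something along these lines might help. This is only a minimal sketch: /opt/automic is a hypothetical install root (adjust it to your layout), and the function name is made up for illustration. It simply lists the $LD_LIBRARY_PATH entries that sit inside the Automic tree.

```python
import os
from pathlib import Path


def automic_entries(ld_library_path, automic_root):
    """Return the LD_LIBRARY_PATH entries that lie inside automic_root."""
    root = Path(automic_root).resolve()
    hits = []
    for entry in ld_library_path.split(":"):
        # "." and the empty entry both mean "current directory" to the
        # dynamic loader, which is the one setting the post allows.
        if entry in ("", "."):
            continue
        path = Path(entry).resolve()
        if path == root or root in path.parents:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Hypothetical install root, not taken from any Automic documentation.
    current = os.environ.get("LD_LIBRARY_PATH", "")
    for entry in automic_entries(current, "/opt/automic"):
        print(f"remove from LD_LIBRARY_PATH: {entry}")
```

    Run it as the user (and with the environment) that starts Service Manager, since that is the environment the server processes inherit.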


    If you would like to see proper error messages instead of mysterious problems, please sign my petition.


    Bonus Complication: systemd


    Why did we have an $LD_LIBRARY_PATH in the first place? Because it seemed right (when you want to start things properly without silly cd'ing), and at some point we did what we believed was the right thing and made a centralized baseline config for all Automic things, which included that $LD_LIBRARY_PATH. This was used for Service Manager, which passed it on to the server processes, and it was wired into the environment file that we fed to systemd - because as much as I dislike systemd myself, we're using the current RedHat init system, not the legacy one that will go away - the latter incidentally being the only one Automic still bases all its advice and procedures upon.


    So here's the update to any previous systemd information on how the setup should be:


    • do NOT use an $LD_LIBRARY_PATH with Automic stuff (you may put the Oracle directories in there though)
    • launch smgr via your systemd unit file
    • use the WorkingDirectory= directive in the unit file and point it at your smgr's bin directory. This tells systemd to cd to that directory, so smgr finds its own shared objects

      You need systemd > version 227 for this, or a backported feature. If in doubt, consult the man page. If your systemd is < 227, you probably need to launch a shell script which cd's to the directory, then starts smgr. This approach will probably break many things related to systemd, and I highly recommend you try to avoid this, and instead get a systemd > 227.

    • smgr will then be used to start your server processes. Those have a "working directory" in the smgr dialog. That is the directory smgr will cd into before starting the processes. Make sure it's correct (i.e. points to the processes' bin directory), so the server processes will also find their proper shared objects.
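
    To make the combined effect of the points above concrete, here is a rough Python sketch of what they amount to: strip the Automic directories from $LD_LIBRARY_PATH, then start the binary with its own bin directory as the CWD. This is for illustration or ad hoc testing only, not a replacement for the unit file. The function names and any paths you pass in are hypothetical, and the smgr command-line arguments are deliberately left to the caller.

```python
import os
import subprocess


def scrubbed_env(env, automic_root):
    """Copy env, dropping LD_LIBRARY_PATH entries under automic_root."""
    clean = dict(env)
    root = automic_root.rstrip("/")
    kept = [
        entry
        for entry in clean.get("LD_LIBRARY_PATH", "").split(":")
        # Empty entries are dropped too; "." and non-Automic entries
        # (e.g. the Oracle directories) are kept.
        if entry and entry != root and not entry.startswith(root + "/")
    ]
    if kept:
        clean["LD_LIBRARY_PATH"] = ":".join(kept)
    else:
        clean.pop("LD_LIBRARY_PATH", None)
    return clean


def start_in_bin_dir(binary, bin_dir, automic_root, args=()):
    """Start binary from bin_dir, with bin_dir as its working directory."""
    env = scrubbed_env(os.environ, automic_root)
    # cwd= plays the role that WorkingDirectory= plays in the unit file.
    return subprocess.Popen(
        [os.path.join(bin_dir, binary), *args], cwd=bin_dir, env=env
    )
```

    The same two ingredients apply one level down: smgr's "working directory" setting does the cd for each server process, and the scrubbed environment is what they inherit.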


    Now someone please remind me again, why am I writing Automic documentation !?