The following post describes an ongoing potential issue with ZDU, and a general issue I have with the process as it stands today. I will ask several questions. This post is half "for your information", half looking for answers, and ... erm ... half rant. Nevertheless, please provide any input you may have.
We have done several ZDUs over the last year (12.1 to various service packs of 12.1). The whole ZDU process feels a bit tacked-on and improvised, and I wonder how they'll ever ZDU the service manager. But apart from some hiccups (due to now known issues, such as missing Oracle sequences, and a botched shared library path), these ZDUs actually worked quite okay, especially the later ones.
On Monday then, we upgraded 12.1.1+hf3 to 12.1.2, and that worked. Then we attempted to upgrade a 12.1.1+hf2 to 12.1.2, and that failed. The installation is now in a limbo state, half old, half new version. After some digging around, we found this message at the root of the issue (which nobody else here seems to have ever had before):
U00005134 Component 'UCSJ' is newer than the registered one. Start your system again using mode 'cold start'.
ucsj.so is a shared object (or a DLL on the Windows AE), 50 MB in size, and apparently contains vital AE code that also depends on the DB schema. It is, for all intents and purposes, one of the main components of the engine. And it's ironic, because some shared objects are not versioned, which is a problem, and some (like ucsj.so) apparently are versioned. Apparently the upgrade works if you're already on the newer ucsj.so from 12.1.1+hf3, but fails if you're on the older one from 12.1.1+hf2.
Question 1: Does anyone have any insights into where or how this versioning of ucsj.so happens? Is the "registered" version stored somewhere in the DB?
I opened an incident for this, and it's still open. I have been waiting desperately to hear back from CA since Monday, because we're running out of time with the upgrade schedule. And the installation, despite being a test system, is a very vital one. People shout at me if it doesn't work.
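For anyone who wants to check which shared objects actually differ between two of their own bin folders, here is a minimal sketch. It only compares file checksums and says nothing about how the AE itself registers or checks the version of ucsj.so. The function names and the "*.so" pattern are my own assumptions, not anything taken from Automic.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def differing_files(old_bin: Path, new_bin: Path, pattern: str = "*.so") -> dict[str, str]:
    """Map file name -> status for matching files that are not identical in both folders.

    Hypothetical helper: it only looks at the top level of each folder and
    compares contents by checksum. Identical files are left out of the result.
    """
    old = {p.name: p for p in old_bin.glob(pattern) if p.is_file()}
    new = {p.name: p for p in new_bin.glob(pattern) if p.is_file()}
    result: dict[str, str] = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in new:
            result[name] = "only in old"
        elif name not in old:
            result[name] = "only in new"
        elif file_digest(old[name]) != file_digest(new[name]):
            result[name] = "differs"
    return result
```

Calling differing_files with the two bin folders as Path objects returns a dict of the files that are missing on one side or have different contents. On Windows the pattern would have to be "*.dll" instead.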
But the one thing Automic did quickly say confused us. Automic claimed that we were (probably) doing ZDU wrong. Thus far, we installed the new server and utility version in separate folders, launched the ZDU wizard, updated the DB as prompted by the wizard in step 1, then halved (or duplicated) the processes (CP/WP etc.), launched half of the processes anew from the new version (also all in accordance with the ZDU wizard) and took it from there. As I said, until Monday, that worked fine for us.
Automic now claims one needs to duplicate the old version's bin folder first, and ideally always keep two copies of the same Engine version around. The ZDU then starts not with a bin folder of the new version, but with a cloned bin folder of the previous version, which gets its binaries updated at some point during the process. We are certain that this was NOT the method demonstrated when ZDU was showcased on several occasions by Automic.
I scoured their documentation in German and English. It was seemingly heavily updated for 12.x, so 11.x is different again. The documentation (I'll only be looking at 12.x) appears (it's really not clear to me) to reference the above procedure as a distributed installation, which it calls "best practice" (which to me means recommended, but also optional). Then, later on, they say under the same heading ("Distributed Installation") that it's a rarely used thing in huge environments. It also says that:
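To make clear what I understand by "cloning the bin folder", here is a minimal sketch of that single step. It reflects only my reading of what Automic described and is not taken from the documentation. The function name and the paths you pass in are hypothetical, and it does nothing beyond copying a directory tree.

```python
import shutil
from pathlib import Path


def clone_bin(source_bin: Path, clone_dir: Path) -> Path:
    """Copy an existing bin folder to a new location and return the new path.

    Hypothetical helper: refuses to overwrite an existing target so that a
    second, already running installation is never clobbered by accident.
    """
    if not source_bin.is_dir():
        raise FileNotFoundError(f"source bin folder not found: {source_bin}")
    if clone_dir.exists():
        raise FileExistsError(f"target already exists: {clone_dir}")
    # symlinks=True keeps symbolic links as links instead of copying their targets.
    shutil.copytree(source_bin, clone_dir, symlinks=True)
    return clone_dir
```

Whether the clone then has to be updated with the new binaries by hand or by the wizard is exactly the part I could not work out from the documentation.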
you have to set up two separate installations in separate bin directories, one instance of the version you want to upgrade from (base), another of the version you want to upgrade to (target).
So, is this target version meant to be the target version right away, or a cloned target folder with the current version that becomes the target version at some point during the ZDU process?
If you're still with me at this point (kudos - I know I would probably not be), let's break this down to:
Question 2: Did you (unlike me) know you are (probably) supposed to start off with a cloned bin folder for the original version for the ZDU?
Question 3: Are you also (before or after this thread) feeling that the whole ZDU documentation is severely lacking, confusing, in part made up of unhelpful terms and should be scrapped and re-done properly from scratch?
Question 4 (probably rhetorical): Do you feel just as confused about ZDU as I do now?
Then, we get to the limbo state. My vital installation is currently all back on 12.1.1 processes and it runs, but because I have started the ZDU, the database schema has already been updated (a step unhelpfully called "loading initial data"). In fact, that's the first step of the ZDU (which also seemed to come as somewhat of a surprise to the Automic person I spoke to).
Automic claims that there is no way back from this. When I asked at AutomicWorld, they also made it clear that the "Rollback" in a ZDU is not in fact a rollback: Once you have started a ZDU, you have to finish it. There is no rollback. There is only delaying the inevitable.
Question 5: Were you aware that there is no such thing as a ZDU rollback (short of downtime and loading an old database backup)?
And in light of all this:
Question 6: Would you agree or disagree that ZDU feels a bit tacked-on, and that clients are to some extent left alone to figure out how to work it?
Question 7: Will you be using ZDU for production systems at the present time, or will you be using the "classical" way of upgrading?
p.s. no, the YouTube video was no help either. It also seems to hint at cloning your original binaries, but in such an ambiguous way that my colleague doesn't believe me when I say it does.
p.p.s. I don't even think duplicating the original "bin" folder makes a difference to the ucsj.so issue. It just adds extra confusion to the case.