Hi,
I think I can give some explanation on this topic. In general, there are 2 steps a WP (work process) needs to accomplish to become primary (PWP):
1) Bind the primary port on OS level (default is 2270).
2) Register as primary in the database (table MQSRV).
This registration needs to be renewed continuously.
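The two steps can be sketched in Python as below. This is a hypothetical model, not the Engine's code: the table name `mqsrv_primary`, its columns, the lease length `LEASE_SECONDS` and the helper functions are assumptions for illustration. Only the port number 2270 and the order "bind first, then register and keep renewing" come from the explanation above. An in-memory SQLite table stands in for MQSRV, and "renewing" is modelled as pushing an expiry timestamp forward.

```python
import socket
import sqlite3
from typing import Optional

PRIMARY_PORT = 2270   # default primary port (from the post)
LEASE_SECONDS = 30.0  # ASSUMPTION: hypothetical renewal window, not the real AE value


def try_bind_primary_port(port: int = PRIMARY_PORT,
                          host: str = "127.0.0.1") -> Optional[socket.socket]:
    """Step 1: bind the primary port on OS level.

    Returns the listening socket, or None if another process on this
    node already holds the port. SO_REUSEADDR is deliberately not set.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen()
        return sock
    except OSError:
        sock.close()
        return None


def create_registry(conn: sqlite3.Connection) -> None:
    # HYPOTHETICAL stand-in for MQSRV: a single row naming the primary WP.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mqsrv_primary ("
        " id INTEGER PRIMARY KEY CHECK (id = 1),"
        " wp TEXT NOT NULL,"
        " expires REAL NOT NULL)"
    )


def try_register(conn: sqlite3.Connection, wp: str, now: float,
                 lease: float = LEASE_SECONDS) -> bool:
    """Step 2: register or renew as primary.

    Succeeds if no primary is registered yet, if `wp` already is the
    primary (renewal), or if the current registration has expired because
    its owner stopped renewing. A real multi-process implementation would
    need this check-and-set to be atomic.
    """
    with conn:  # commit on success, roll back on error
        row = conn.execute(
            "SELECT wp, expires FROM mqsrv_primary WHERE id = 1").fetchone()
        if row is None:
            conn.execute("INSERT INTO mqsrv_primary (id, wp, expires) "
                         "VALUES (1, ?, ?)", (wp, now + lease))
            return True
        holder, expires = row
        if holder == wp or expires < now:
            conn.execute("UPDATE mqsrv_primary SET wp = ?, expires = ? "
                         "WHERE id = 1", (wp, now + lease))
            return True
        return False
```

In this sketch a WP calls `try_bind_primary_port()` first, attempts `try_register()` only while it holds the port, and then keeps calling `try_register()` well within the lease to stay primary. A WP that stops calling it loses the registration once the lease runs out.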
During a regular startup, the very first WP started can easily accomplish both steps and will become PWP.
The second WP started on the same AE node cannot bind the primary port, so it will continue as a regular WP.
Another WP started on a different AE node can bind the primary port, but cannot register as primary in the database (the first WP already did), so it will continue as a regular WP.
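The startup sequence above can be replayed with a toy model. The `Cluster` class and its fields are hypothetical: the model tracks only which WP holds the primary port on each node and which WP is registered as primary, and it assumes the WPs start one after another.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Cluster:
    # node name -> WP currently bound to the primary port on that node
    port_holder: Dict[str, str] = field(default_factory=dict)
    # WP registered as primary (stand-in for the MQSRV entry)
    primary: Optional[str] = None

    def start_wp(self, wp: str, node: str) -> str:
        """Start a WP and return the role it ends up with."""
        # Step 1: only one process per node can bind the primary port.
        if node in self.port_holder:
            return "WP"
        self.port_holder[node] = wp
        # Step 2: only one WP in the whole system can be registered.
        if self.primary is None:
            self.primary = wp
            return "PWP"
        return "WP (port bound)"


ae = Cluster()
print(ae.start_wp("WP1", "node1"))  # PWP
print(ae.start_wp("WP2", "node1"))  # WP
print(ae.start_wp("WP3", "node2"))  # WP (port bound)
```

The third WP is the interesting one: it is a regular WP, but it has already completed step 1 on its own node, which matters for the takeover cases.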
If the current PWP somehow fails, it's difficult to forecast which WP will become PWP. It depends on the system setup (e.g. 1 active AE node, 2 active-active AE nodes, 3 nodes, etc.) and on the way the PWP disappeared.
The ways a PWP can become unresponsive are:
1. Process stopped: the ucsrvwp process was stopped on OS level (regular stop).
2. Process crashed: the ucsrvwp process aborted on OS level or was killed.
3. Process hang-up: the ucsrvwp process is still running on OS level, but has somehow stopped acting within the AE system, e.g. waiting endlessly for the database, a “dead end” in the code, etc.
4. Process loop: the ucsrvwp process is still running on OS level, consuming a lot of CPU, typically one core.
In all 4 situations the primary can no longer renew its PWP registration in MQSRV (step 2 above).
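For the PWP port it's different. Simplified, we can merge 1. and 2. (stopped and crashed): in both situations the process has terminated, so the OS releases the PWP port and it becomes available again. We can also merge 3. and 4. (hang-up and loop): in both situations the process is still alive and still holds the port, so the PWP port does not become available again.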
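The two effects of each failure mode can be written down as a small lookup table. The names `FailureMode`, `Effect`, `EFFECTS` and `port_released` are illustrative only; the table simply restates the four cases above.

```python
from enum import Enum
from typing import Dict, NamedTuple


class FailureMode(Enum):
    STOPPED = 1   # regular stop on OS level
    CRASHED = 2   # process aborted or was killed
    HUNG = 3      # still running, no longer acting in the AE system
    LOOPING = 4   # still running, burning CPU


class Effect(NamedTuple):
    renews_registration: bool  # does the PWP keep renewing MQSRV?
    releases_port: bool        # does the OS free the primary port?


# The OS frees a process's sockets only when the process terminates,
# so only STOPPED and CRASHED release the port.
EFFECTS: Dict[FailureMode, Effect] = {
    FailureMode.STOPPED: Effect(renews_registration=False, releases_port=True),
    FailureMode.CRASHED: Effect(renews_registration=False, releases_port=True),
    FailureMode.HUNG:    Effect(renews_registration=False, releases_port=False),
    FailureMode.LOOPING: Effect(renews_registration=False, releases_port=False),
}


def port_released(mode: FailureMode) -> bool:
    """True if another WP on the same node can bind the primary port."""
    return EFFECTS[mode].releases_port
```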
Knowing this, consider some examples:
Example A)
System setup: 1 AE node & PWP hang-up / loop situation:
- No other WP will become PWP, because the PWP port is not released.
Example B)
System setup: 2 AE nodes & PWP hang-up / loop situation:
- In this case we can forecast which WP will become PWP: it will be the WP on the other node that has already bound the PWP port.
Example C)
System setup: 3 AE nodes & PWP hang-up / loop situation:
- In this case it's between 2 WPs: on nodes 2 and 3 there is a WP that has already bound the PWP port. Which one makes it is random: it will be the one that is faster to register in MQSRV.
Example D)
System setup: n AE nodes & PWP crash situation:
- In this case we cannot forecast which WP will become PWP. Most likely it will be a WP that has already bound the primary port on another node, because it has already accomplished 1 of the 2 steps to become primary. However, due to timing constraints, it can also happen that another WP on the first node becomes PWP, even though it needs to do both steps.
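Examples A to D follow from combining the port behaviour of the failure mode with the number of nodes. The hypothetical `pwp_candidates` helper below lists who can compete once the old registration has lapsed. It assumes every other node runs at least one WP that already holds the primary port there, as in the examples, and it deliberately models no timing, which is what decides the race in Examples C and D.

```python
from typing import List


def pwp_candidates(nodes: int, pwp_node: int, port_released: bool) -> List[str]:
    """Who can become PWP once the old PWP stops renewing its registration.

    nodes         -- number of AE nodes (numbered 1..nodes)
    pwp_node      -- node on which the old PWP was running
    port_released -- True for stop/crash, False for hang-up/loop
    """
    # WPs on other nodes already hold the port there: one step left (register).
    candidates = [f"port holder on node {n}"
                  for n in range(1, nodes + 1) if n != pwp_node]
    if port_released:
        # A WP on the failed node must first bind the freed port, then register.
        candidates.append(f"any WP on node {pwp_node} (two steps)")
    return candidates


def describe(candidates: List[str]) -> str:
    if not candidates:
        return "no takeover"
    if len(candidates) == 1:
        return "single candidate: " + candidates[0]
    return "race between: " + ", ".join(candidates)


print(describe(pwp_candidates(1, 1, port_released=False)))  # Example A
print(describe(pwp_candidates(2, 1, port_released=False)))  # Example B
print(describe(pwp_candidates(3, 1, port_released=False)))  # Example C
print(describe(pwp_candidates(3, 1, port_released=True)))   # Example D, n = 3
```

The list only says who *can* win. In the crash case the port holders on the other nodes are still the likely winners, since they have one step left while the WP on the failed node needs two.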
I hope these internals help you understand how the Engine works.
Enjoy working with the AE.
KR, Josef