Automic Workload Automation


AE v9: "Distributed load" host group for tasks starting simultaneously

  • 1.

    Posted Apr 14, 2015 07:51 PM
    I've got an agent group with 12 hosts in it. I've got 10 jobs that need to start at (nearly) the same time. I want them to run on 10 different hosts. If any one host goes down, I want the job to choose one of the two idle hosts and seamlessly run on it.

    I tried "distributed load" and "next" setups for the host group, but in either case, the jobs all choose the same host.

    I tried adding :WAIT commands to each task's pre-process to stagger their start times (all tasks generate at runtime), with the same result: only one host gets assigned all 10 jobs.

    I found that staggering the tasks with a pre-condition (re-evaluate in X minutes) will do the trick, but I don't want it to take 10 minutes to start my 10 jobs.

    Is there any elegant way to actually load balance in a host group?


  • 2.

    Posted Apr 15, 2015 11:56 AM
    Jessica:

    We don’t use Host Groups for a couple of reasons; they weren't available when we started using UC4 at V3.1 and they currently have too many restrictions on our V8 release.

    We have written our own “grouping” routine that serves us well.  This was initially developed for the Oracle CC&B product that has “multi-threaded” jobs.  Multiple “threads” can be executed concurrently and spread across a number of Agents.  For us, this is routinely 6 hosts and up to 400 jobs (i.e. threads).  Each job’s Pre Process tab uses a common Include that determines which Agent to use and then uses a PUT_ATT to set the desired host.  The Include’s logic has different methods that it can select for distributing the jobs.  There are provisions to ensure that the host is active before performing the PUT_ATT and other considerations to ensure the desired distribution occurs.

    These methods are (a clip from our documentation):

    NEXT  
    The jobs are distributed across multiple hosts sequentially. If, for example, there are three hosts available, thread 1 is on the first host, thread 2 is on the second host and thread 3 is on the third host. The distribution then starts again on the first host for thread 4, continues on the second host for thread 5, and so on until the requested number of threads on the batchcode_THREADS keyword has been reached.

    GROUP 
    The jobs are distributed in groups of contiguous threads across multiple hosts. If, for example, the batchcode_THREADS keyword requests 150 threads and 3 hosts are available, threads 1-50 are executed on the first host, threads 51-100 on the second host and threads 101-150 on the third host.

    SINGLE 
    The jobs are all executed on a single host as specified in the @HOST_4_SINGLE Keyword. This contains the Host/Agent name to be used for all batch-codes that use the SINGLE method on their batchcode_THREADS keyword. The other batchcode_THREADS parameters, such as minimum and maximum hosts, are not to be specified.
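    The three methods above come down to simple index arithmetic. Below is a minimal Python illustration, not Mark's Include: the function and host names are hypothetical, host liveness checks are left out, and the way GROUP handles a thread count that does not divide evenly (ceiling-sized blocks) is an assumption.

```python
from typing import List


def distribute_next(threads: int, hosts: List[str]) -> List[str]:
    """NEXT: round-robin, thread i (1-based) goes to hosts[(i - 1) % len(hosts)]."""
    return [hosts[(i - 1) % len(hosts)] for i in range(1, threads + 1)]


def distribute_group(threads: int, hosts: List[str]) -> List[str]:
    """GROUP: contiguous blocks of threads, one block per host."""
    per_host = -(-threads // len(hosts))  # ceiling division (assumed handling of remainders)
    return [hosts[(i - 1) // per_host] for i in range(1, threads + 1)]


def distribute_single(threads: int, single_host: str) -> List[str]:
    """SINGLE: every thread runs on the one host named in the settings."""
    return [single_host] * threads


if __name__ == "__main__":
    hosts = ["HOST_A", "HOST_B", "HOST_C"]
    print(distribute_next(5, hosts))  # ['HOST_A', 'HOST_B', 'HOST_C', 'HOST_A', 'HOST_B']
    print(distribute_group(150, hosts)[49:51])  # ['HOST_A', 'HOST_B']
```

    NEXT spreads neighbouring threads apart, while GROUP keeps them together on one host. Which is better presumably depends on whether adjacent data segments benefit from sharing a machine.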


    We do not “hard-code” the Host and Login attributes of Job objects; instead, we set them at activation in a manner similar to the above. This and a few other techniques allow us to develop and test objects in one Client and then Transport them to other Clients, all the way to production, without modification.

    We have found this to be quite successful for us and perhaps some variation of something similar could be employed by you if you can’t find a satisfactory solution for your issue.



  • 3.

    Posted Apr 15, 2015 12:13 PM
    Thank you, Mark! I always appreciate your clear and relevant answers. It's an interesting approach to keep the configuration info in a different (presumably more centralized) place from the job objects; if I had the whole implementation to do over again, I would consider it seriously. 

    I will leave the question open for other input, but it seems like your concept could work for us. Do you determine the members of the host list with SQL, or some other way? I'm thinking of keeping the host group intact and assigning it to the objects, but then using SQL to determine the active members of the assigned group and choose one for each job. My biggest concern is transparency/simplicity; not everybody using the tool is going to be able to follow what's going on. (And someday I may get hit by a bus or otherwise not be here.)


  • 4.

    Posted Apr 15, 2015 12:33 PM
    We keep all of the attribute information in Variable objects. There is one per Client, and their contents are Client-specific. For example, there is an ATTRIBUTE_VALUES_CLIENT1 (production) and an ATTRIBUTE_VALUES_CLIENT12 (development), etc. The Include's logic just does something like:
    :SET &client = SYS_ACT_CLIENT()
    :SET &client = FORMAT(&client)
    :SET &settings = "ATTRIBUTE_VALUES_CLIENT&client" 
    and then:
    :SET &host = GET_VAR(&settings,validity Keyword)
    We tend not to use anything but the available Script Functions, and we avoid querying the UC4 tables. Hope this helps.
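    The per-client lookup can be pictured as a dictionary keyed by the constructed Variable name. In this Python sketch the dictionary stands in for the Variable objects, the key HOST and the agent names are hypothetical, and it assumes FORMAT() is used only to strip leading zeros from the client number.

```python
# Hypothetical stand-ins for the per-client Variable objects; keys and agent names are made up.
SETTINGS = {
    "ATTRIBUTE_VALUES_CLIENT1": {"HOST": "PROD_AGENT"},
    "ATTRIBUTE_VALUES_CLIENT12": {"HOST": "DEV_AGENT"},
}


def settings_name(client: str) -> str:
    """Build the Variable name, dropping leading zeros as FORMAT(&client) is assumed to."""
    return "ATTRIBUTE_VALUES_CLIENT" + str(int(client))


def get_setting(client: str, key: str) -> str:
    """Rough equivalent of GET_VAR(&settings, key) against the dictionary above."""
    return SETTINGS[settings_name(client)][key]


if __name__ == "__main__":
    print(get_setting("0012", "HOST"))  # DEV_AGENT
```

    The job objects stay identical across Clients; only the per-client settings differ, which is what makes the Transport-without-modification approach work.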

    P.S. Couldn't you use the PREP_PROCESS_AGENTGROUP function to determine the members of the group (I've not tried it)? Assuming that works, it might be a bit more straightforward and obvious than SQL.


  • 5.

    Posted Apr 17, 2015 04:21 AM
    Hello Jessica,

    I've checked the HostGroup in v9 and v10; there's a "Load dependent" option that might fit your requirements. Please see the documentation for more details: http://docs.automic.com/documentation/AE/9_SP11/english/AE_WEBHELP/uc4.htm#ucaclg.htm?Highlight=%22AgentGroup%20Tab%22.



  • 6.

    Posted Apr 17, 2015 10:44 AM
    Hi Bin,
    Unfortunately, if the jobs are requested all at the same time, they don't take each other's resource usage into account.
    ~Jessica


  • 7.

    Posted Apr 17, 2015 11:31 AM
    Interesting... so the "Load dependent" feature does not work for events that are in a generating state. 

    I believe that :WAIT requests in the pre-process would also be part of the generate phase...

    I sometimes use SYNC objects to force different processes to run single-threaded.  What if you added a dummy step to the beginning of each workflow, and this dummy step has a SYNC object attached to it so no two dummy steps can run at the same time?  This might give the "Load dependent" feature a chance to work as desired?



  • 8.

    Posted Apr 17, 2015 02:28 PM
    Inspired by Mark's ideas, I wrote a pre-process include which runs through the appropriate agent group and assigns the Nth active host in the group to the job ("N" being declared as a variable on the job object or task properties). If there are fewer than N hosts in the group, the assignment will roll over to an active host, essentially using the "next" model. 
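    The selection rule described here is: take the Nth active host; if there are fewer than N, wrap around with a modulo, treating a remainder of 0 as the last host. It can be sketched in Python as follows. The data shape (a list of name/active pairs) is an assumption standing in for the PREP_PROCESS_AGENTGROUP output, and the empty-group guard is an addition, since the modulo has no defined result when no host is active.

```python
from typing import List, Optional, Tuple


def select_host(group: List[Tuple[str, bool]], requested: int) -> Tuple[Optional[str], bool]:
    """Return (host, short_of_hosts) for a 1-based requested host number."""
    active = [name for name, is_active in group if is_active]
    if not active:
        return None, True  # no active members: nothing to assign, definitely short of hosts
    n = len(active)
    short = requested > n
    if short:
        # Roll over like the include: MOD(requested, n), with 0 meaning the last active host
        mod = requested % n
        selected = n if mod == 0 else mod
    else:
        selected = requested
    return active[selected - 1], short


if __name__ == "__main__":
    pool = [("VM01", True), ("VM02", False), ("VM03", True)]
    print(select_host(pool, 4))  # ('VM03', True)
```

    The second element of the result corresponds to the point where the include fires its "insufficient hosts" notification.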

    Because I want to track which agent groups are used for which jobs (with simple "search for use" and without using SQL), I want the agent group specified on the job objects (as their assigned agent) rather than in variables. Unfortunately, I don't think there's any way to know the relevant agent group using only scripting -- because once the job gets to pre-process, the individual agent is already selected, and you can't determine the original agent group.

    So I made a SQL variable to get the original agent group name from the job's definition, then used PREP_PROCESS_AGENTGROUP to loop through it, assigning host #N as requested, or another host if there aren't enough hosts. As a bonus, when there aren't as many hosts as expected, I can toss out a notification that more hosts should be added.

    This is a rather tight use case that might not work for other shops, but I think it will work for our environment: we tend to have jobs so large that they are split up to run the same job across multiple hosts, the only difference being the data segment. And we have pools of VMs which are dedicated to each job, but the VMs are rather unstable and come and go frequently. So we have JOB_1 through JOB_10, and a pool with about 10 hosts.

    I have a strong feeling this might be overkill and a simpler way will appear. But just in case anyone else has the same challenge, here's what I've got so far...

    &HOST_GROUP_LNR# is assigned on the job object, as the requested host number


    =====PRE-PROCESS INCLUDE=====


    !Identify the original host group assigned to the job
    :SET &MY_HOST_GROUP# = GET_VAR(VARA.SQL.MY_HOST_GROUP)
    !Loop through the host group to find out how many active hosts it has
    :SET &hnd1# = PREP_PROCESS_AGENTGROUP(&MY_HOST_GROUP#,,"ALL")
    :SET &ct#=0
    :PROCESS &hnd1#
    :  SET &IS_ACTIVE# = GET_PROCESS_LINE(&hnd1#,2)
    :  IF &IS_ACTIVE# = "Y"
    :    SET &ct#=&ct#+1
    :  ENDIF
    :ENDPROCESS
    :SET &ct#=FORMAT(&ct#,"9")
    !Now check if we have enough hosts to assign the host number we want
    :PRINT "Requested line number &HOST_GROUP_LNR# in &MY_HOST_GROUP#"
    :IF &HOST_GROUP_LNR# > &ct#
    !  If not, assign a mod of the host number and send a notification that we're short of hosts
    :  SET &MOD_LNR# = MOD(&HOST_GROUP_LNR#,&ct#)
    :  IF &MOD_LNR# = 0
    :    SET &SELECTED_LNR# = &ct#
    :  ELSE
    :    SET &SELECTED_LNR# = &MOD_LNR#
    :  ENDIF
    :  SET &SELECTED_LNR# = FORMAT(&SELECTED_LNR#,"9")
    :  PRINT "Host group &MY_HOST_GROUP# has &ct# active members"
    :  PRINT "Not enough hosts to assign line number &HOST_GROUP_LNR#"
    :  PRINT "Selected new line number &SELECTED_LNR#"
    !  Some variables for the notification
    :  PUT_READ_BUFFER MY_HOST_GROUP# = &MY_HOST_GROUP#
    :  PUT_READ_BUFFER CT# = &ct#
    !  Send the notification
    :  SET &ACT# = ACTIVATE_UC_OBJECT(INSUFFICIENT_HOSTS_NOTIFICATION,,,,,PASS_VALUES)
    :ELSE
    :  SET &SELECTED_LNR# = &HOST_GROUP_LNR#
    :ENDIF
    !Go through the host group and find the Nth active host, and assign the job to it
    :SET &hnd2# = PREP_PROCESS_AGENTGROUP(&MY_HOST_GROUP#,,"ALL")
    :SET &ct#=0
    :PROCESS &hnd2#
    :  SET &IS_ACTIVE# = GET_PROCESS_LINE(&hnd2#,2)
    :  IF &IS_ACTIVE# = "Y"
    :    SET &ct#=&ct#+1
    :    IF &SELECTED_LNR# = &ct#
    :      SET &AGENT_NAME#=GET_PROCESS_LINE(&hnd2#,1)
    :      PRINT "&AGENT_NAME# is active host number &SELECTED_LNR# in host group &MY_HOST_GROUP#"
    :      PRINT "Assigning this job to &AGENT_NAME#"
    :      PUT_ATT HOST = &AGENT_NAME#
    :    ENDIF
    :  ENDIF
    :ENDPROCESS


    =====VARA.SQL.MY_HOST_GROUP=====


    SELECT hgoh.oh_name
      FROM oh joh
     INNER JOIN jba ON jba_oh_idnr = joh.oh_idnr
     INNER JOIN oh hgoh ON jba_hostdst = hgoh.oh_name
                       AND hgoh.oh_client = &$CLIENT#
     WHERE joh.oh_name = '&$NAME#'
       AND joh.oh_deleteflag = 0
       AND joh.oh_client = &$CLIENT#




  • 9.
    Best Answer

    Posted Apr 17, 2015 04:40 PM
    Jessica:

    I like parts of your solution and may adapt it for our use.  Since, as I said, we avoid SQL on the UC4 tables, I used the following instead to get the Agent Group name.  This is just a proof of concept and appears to function as needed.

    :SET &rid = PREP_PROCESS_REPORT(,,ACT,"* U0005014 *")
    :PROCESS &rid 
    : SET &agrp = GET_PROCESS_LINE(&rid)
    : SET &here = STR_FIND(&agrp," '")
    : SET &here = ADD(&here,2)
    : SET &agrp = SUBSTR(&agrp,&here)
    : SET &here = STR_FIND(&agrp,"',")
    : SET &here = SUB(&here,1)
    : SET &agrp = SUBSTR(&agrp,1,&here)
    : ENDPROCESS 
    :PRINT "Agrp=&agrp"
    :SET &hid = PREP_PROCESS_AGENTGROUP(&agrp,,ALL)
    :PROCESS &hid 
    : SET &host = GET_PROCESS_LINE(&hid,1)
    : PRINT "Host=&host"
    : ENDPROCESS
    :STOP MSG,55,"Testing"
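    The string handling in the loop above just pulls out the text between the first " '" (space, quote) and the next "'," (quote, comma). A Python equivalent is below; it uses 0-based indexes where the Automic script is 1-based. The sample report line is invented for illustration and is not the real U0005014 message text.

```python
def extract_quoted_name(line: str) -> str:
    """Return the text between the first " '" and the next "'," (as in the script above)."""
    start = line.find(" '")
    if start < 0:
        raise ValueError("no opening quote found")
    rest = line[start + 2:]  # skip the space and the opening quote
    end = rest.find("',")
    if end < 0:
        raise ValueError("no closing quote found")
    return rest[:end]


if __name__ == "__main__":
    # Invented sample line; the real U0005014 report text may differ.
    sample = "U0005014 Sample text for agent group 'AG_POOL_01', more text"
    print(extract_quoted_name(sample))  # AG_POOL_01
```

    Like the script, this relies on the agent group name being the first quoted value in the matching report line.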

    Thanks for the "inspiration"!   B)


  • 10.

    Posted Apr 20, 2015 12:17 PM
    @Pete Wirfs, the first time I looked at the documentation for SYNC, it scared me off, and I haven't looked since! But I think the equivalent (simpler than getting the host group right in the first place, though not as elegant) is something we've resorted to in some cases: a simple Unix "sleep N" job just before each instance of the main job, with 10 seconds for one, 20 for the next, and so on.

    But I'd really like it if the group just worked as it intuitively should, distributing the load or at least rotating the assignments -- even when the jobs are requested simultaneously.


  • 11.

    Posted Apr 20, 2015 01:19 PM
    The fact that "distributed load" doesn't just work is what was bothering me.  This suggests it checks for load balance during generate when it should be checking for load balance at the time the job starts to run.  If this hypothesis is correct, then I would consider it to be a product design flaw.  And this would mean my SYNC idea wouldn't help either.  (This is intriguing to me, but I don't have time to run any tests.)
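    Pete's hypothesis can be illustrated with a toy model. If every simultaneously requested job reads the same load snapshot, they all pick the same least-loaded host. Updating the load after each assignment spreads them out instead. This Python sketch models the hypothesis only, not how the Automation Engine actually evaluates a host group; the host names and load values are made up.

```python
from typing import Dict, List


def assign_with_snapshot(jobs: int, load: Dict[str, int]) -> List[str]:
    """Every job reads the same load snapshot (the hypothesised generate-time check)."""
    snapshot = dict(load)
    return [min(snapshot, key=snapshot.get) for _ in range(jobs)]


def assign_with_live_load(jobs: int, load: Dict[str, int]) -> List[str]:
    """Each assignment bumps the chosen host's load before the next job chooses."""
    live = dict(load)
    chosen = []
    for _ in range(jobs):
        host = min(live, key=live.get)  # ties go to the first host in dict order
        live[host] += 1
        chosen.append(host)
    return chosen


if __name__ == "__main__":
    hosts = {f"HOST_{n:02d}": 0 for n in range(1, 13)}  # 12 idle hosts, as in the question
    print(sorted(set(assign_with_snapshot(10, hosts))))  # ['HOST_01']
    print(len(set(assign_with_live_load(10, hosts))))  # 10
```

    The snapshot version reproduces the symptom in the original question (all 10 jobs on one host), which is consistent with, but does not prove, the generate-time explanation.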

    Regarding SYNC, I had to spend hours playing with it and testing it before I felt I could trust it.  But now that I have that time under my belt, we are using it in multiple production solutions.  (I wanted to attach my model SYNC object, but I'm in IE10 and the "attach image/file" button isn't functioning.)


  • 12.

    Posted Apr 20, 2015 01:28 PM
    OK, I was able to attach my model SYNC object by using Chrome. This model has four actions:
       START_EXCLUSIVE
       END_EXCLUSIVE
       START_SHARE  -- configured to allow 100 concurrent shares
       END_SHARE

    An example use case:
    If an application updates an important datastore, I'll connect it to a SYNC object with the START_EXCLUSIVE/END_EXCLUSIVE actions. The applications that want to use that datastore are connected to the same SYNC object with the START_SHARE/END_SHARE actions. This prevents any attempt to read the datastore while an update is in flight.
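    Assuming the model's actions behave like a readers-writer lock, which is how the use case above reads, the four actions can be modelled as below. This is a hypothetical Python toy for reasoning about the states, not the SYNC object definition in ModelSync.xml. In particular, the rule that START_EXCLUSIVE waits for active shares to end is an assumption.

```python
class ModelSync:
    """Toy model of the four SYNC actions (not the Automic engine's implementation)."""

    def __init__(self, max_shares: int = 100) -> None:
        self.max_shares = max_shares  # START_SHARE allows up to 100 concurrent shares
        self.shares = 0
        self.exclusive = False

    def start_exclusive(self) -> bool:
        """Grant exclusive use only when nobody holds a share or the exclusive slot."""
        if self.exclusive or self.shares > 0:
            return False  # caller would have to wait
        self.exclusive = True
        return True

    def end_exclusive(self) -> None:
        self.exclusive = False

    def start_share(self) -> bool:
        """Grant a share unless an update is in flight or the share limit is reached."""
        if self.exclusive or self.shares >= self.max_shares:
            return False
        self.shares += 1
        return True

    def end_share(self) -> None:
        if self.shares > 0:
            self.shares -= 1


if __name__ == "__main__":
    sync = ModelSync()
    print(sync.start_share(), sync.start_exclusive())  # True False
```

    Used with only the exclusive actions, the same model also covers the earlier idea of forcing dummy steps to run one at a time.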

    Attachment: ModelSync.xml (XML, 1 KB)


  • 13.

    Posted Apr 20, 2015 01:47 PM
    Jessica:

    I think you would be well served to gain experience with Sync objects. We started using them before External Dependencies, the "Tasks running parallel" option and Group objects (that actually worked) existed, and we still use them even though the product now has those features. They have a multitude of other uses, and they're a good tool to have in the toolbox. As Pete said, spend a bit of time exploring their capabilities; you won't be sorry! ;)

    Pete:

    You might see if "compatibility" mode for this site fixes your attachment issue. You can just search for "IE 10 compatibility mode" in the search engine of your choice.