AutoSys Workload Automation

CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

  • 1.  CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Mar 23, 2015 03:53 PM


    We are running primarily Linux and Windows agents on several hundred servers. A few servers in particular get this error once in a while. AutoSys automatically retries the jobs in the following minute, and they take off and run without any problem. Here is what we have noticed so far:

    • Sometimes, this happens when another job tries to launch at the exact same second on the same server (i.e. we see these errors in pairs)
    • The agents are Linux Red Hat servers
    • We may go 3, 4 or 5 weeks between errors

    Does anyone have ideas on how to troubleshoot this problem? Have you seen it before, and do you know how to address it?



  • 2.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Mar 23, 2015 08:37 PM

    Hi Mike,

    The getpwnam() library call could fail if the maximum number of open files/file descriptors is exhausted.

    These commands show the "open files" limits.

    # ulimit -Hn (hard limit)

    # ulimit -Sn (soft limit)
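    The same numbers can be read from a script. This is a minimal Python sketch, not anything AutoSys ships: the function names are made up for illustration, and counting entries in /proc/self/fd is a Linux-specific assumption. It only reports on the process that runs it, not on the agent itself.

```python
import os
import resource


def open_file_limits():
    """Return the (soft, hard) "open files" limits for this process.

    These are the values that `ulimit -Sn` and `ulimit -Hn` report.
    resource.RLIM_INFINITY stands for "unlimited".
    """
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def open_fd_count(fd_dir="/proc/self/fd"):
    """Count descriptors open in this process, or None if fd_dir is unavailable.

    The count includes the descriptor used to list the directory.
    """
    try:
        return len(os.listdir(fd_dir))
    except OSError:
        return None


if __name__ == "__main__":
    soft, hard = open_file_limits()
    print(f"soft limit: {soft}, hard limit: {hard}")
    print(f"descriptors open in this process: {open_fd_count()}")
```

    To see how close a long-running process is to its limit, the script would have to run as (or inspect) that process; comparing the count with the soft limit is the interesting part.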

    Please see if any message is logged in the /var/log/messages file on Linux when this happens.

    Are the users (job owners) local to the Linux server, or do they come from LDAP/NIS? getpwnam() could also fail because of a temporary LDAP/NIS lookup failure.
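    One way to answer the local-versus-LDAP/NIS question for a given job owner is sketched below. It assumes that an account with no line in /etc/passwd which still resolves through getpwnam() comes from some other configured source (LDAP, NIS, or similar); the function name and the three labels are hypothetical.

```python
import pwd


def classify_user(name, passwd_path="/etc/passwd"):
    """Return 'local', 'directory', or 'unresolved' for a job owner name."""
    # A local account has a line in the passwd file whose first field is the name.
    try:
        with open(passwd_path, encoding="utf-8", errors="replace") as fh:
            if any(line.split(":", 1)[0] == name for line in fh):
                return "local"
    except OSError:
        pass  # unreadable file: fall through to the resolver check

    # Not in the local file: ask the system resolver, which consults whatever
    # sources are configured on the host (assumed here to be LDAP/NIS or similar).
    try:
        pwd.getpwnam(name)
    except KeyError:
        return "unresolved"
    return "directory"


if __name__ == "__main__":
    # "root" is only an example; substitute the job owner being investigated.
    print(classify_user("root"))
```

    An owner that comes back as 'directory' depends on a network lookup at job start, which makes the temporary-failure explanation more plausible for that account.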

    I believe cron jobs use getpwnam() as well. If you have a test system, it might be a useful troubleshooting step to set up multiple cron jobs that kick off at the same time as the same user and see if the issue occurs there too.
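    The "several lookups at the same moment" idea can also be approximated without cron. This is a rough sketch for a test machine, assuming threads are a reasonable stand-in for jobs starting in the same second (they may not be, since jobs start as separate processes). Python's pwd.getpwnam() raises KeyError when the entry cannot be found, so the sketch counts any KeyError as a failed lookup without saying why it failed. The function names, attempt count, and worker count are illustrative.

```python
import pwd
from concurrent.futures import ThreadPoolExecutor


def lookup_ok(name):
    """Return True if getpwnam() resolves name, False if the lookup fails."""
    try:
        pwd.getpwnam(name)
        return True
    except KeyError:
        return False


def probe_concurrent_lookups(name, attempts=50, workers=10):
    """Run `attempts` lookups of `name` across `workers` threads and tally results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lookup_ok, [name] * attempts))
    ok = sum(results)
    return {"ok": ok, "failed": attempts - ok}


if __name__ == "__main__":
    # "root" is only an example; substitute the job owner being investigated.
    print(probe_concurrent_lookups("root"))
```

    A nonzero "failed" count for an account that normally resolves would point at the lookup path on that host rather than at the scheduler. A clean run proves little, given that the original error shows up only every few weeks.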

     

    Thank you,

    Chandru



  • 3.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 10:42 AM

    Never ran a job in production, Chandru, have you?

    This is a sure way of getting fired in many firms.

    Not all servers are created equal. On the mainframe you can never blame the computer, mostly.

    Windows/UNIX/Macs: every machine can and may behave differently. If it is only happening on one server, it is very difficult to replicate it the same way.

    Assuming cron does the same is not the same as knowing and proving.

    Just my 3 cents on this comment. Answers need to be a wee bit more thoughtful.

     

    Thank you, and yes, I am in a special mood today...



  • 4.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 05:57 PM

    Rather, I never had the luxury of waiting on the UNIX team to do the research and fix. Not everyone gets the privilege of working for a company that has dedicated AutoSys SMEs, and not everyone gets to spend their whole career on one technology and claim authority over it (AutoSys, scheduling, whatever). I am glad it was that way for me, because I could learn what I have.

     

    Coming to this topic: if one gets fired for simply checking ulimit values (not changing them) and logs (if permitted), or for running tests on a "test machine", then I am lucky not to be working for one of those companies.



  • 5.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 21, 2017 07:24 AM

    Chandru,

     

    No one said luxury. As for SME etc., with a CA tag and answering AutoSys questions, the perception is that you are an SME. I am a programmer first and a junior SA/DBA second, and I know enough about networking to get answers. I support a myriad of products in the workload suite, and I learn new tech as well. 52 years on this planet, and more than half of that in technology, have earned me the right to CAUTION against certain advice or processes.

    To this particular issue at hand, it is either a fault with the product (HIGHLY UNLIKELY, BUT possible) or a fault with the single machine and may not replicate on a dev server.

    Replicating it in the manner you request is proving a deficiency on the machine, which is fine but it may not occur on the dev server, then what? It still could be the machine as we indicated.

     

    Scientific method dictates:

     

    1.) Rule out the software

    2.) Rule out the process being run 

    3.) and with decentralized machines (NON MAINFRAME) rule out the machine. 

     

    You may not remember this one, BUT Scott and the rest will tell you that AutoSys 4.5 had a PAM issue caused by a version of Red Hat in which the remote agent couldn't su (setuid) to another ID, due to security changes that were undone in its very next release. BUT it broke the product.

    In that case it was BOTH the OS and the product, but how is one to find that? It wasn't easy.

    The point of these comments was to caution against giving specific instructions to someone who may or may not have the expertise to follow them, or the privilege to do so.

    Antony was explaining, as was I, that many firms do not allow someone to run as root, access a machine, etc. There is one firm in particular that is completely hands-off, but I am NOT at liberty to divulge it due to old NDAs that I still honor though I am no longer a contractor. (My ethics won't allow it, unless of course that firm is no longer in existence.)

    Many of CA's clients are set up that way, and you work for a company that needs to support that ideology. You are not alone, present company included, in wondering why certain firms behave the way they do. Unfortunately, as cogs, many of us do not have a choice.

    We do not assume that you are not technical, do NOT assume we ONLY know Autosys, we explained that in the REAL world NOT EVERYONE is able to do as suggested. If you look at answers non ca people have given we give caveats and alternatives based on knowing how MANY different companies behave and there endeth the lesson. 

    As we have just learned a little about you, please accept this as information to help you grow as well and better understand those in this forum.

    This by no means was an argument, in my mind, but a well discussed technical and wisdom based discussion. Something that has been lost, but as my time wanes in the field, I feel it necessary to bring back the Programmer analyst's right to a technical argu.. discussion.. especially when it leads to better understanding and a better solution.

     

    Steve C.



  • 6.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 23, 2017 12:21 AM

    "We do not assume that you are not technical, do NOT assume we ONLY know Autosys, we explained that in the REAL world NOT EVERYONE is able to do as suggested. If you look at answers non ca people have given we give caveats and alternatives based on knowing how MANY different companies behave and there endeth the lesson. "

     

    Thank you, Steve! I do respect the expertise all community members bring in, regardless of their area of expertise; could be an expert DBA, Sys Admin, Network and even good-at-all-of'em like a few in this forum. CA Communities is a great source of learning for me. And I agree it is a different world out there from what we at CA have, but that does not give us the excuse to say "Sorry, you live in a different world, please fix it yourself". We try our best to understand the differences and bridge the gaps. In this sense, we at CA probably see more variety of environments (worldwide) than one SME sitting in one particular shop or having implemented/managed at a few shops. There is no substitute for hands-on, yes?

     

    Let's look at it from CA's perspective,  if you will, please....

     

    Customers first raise a question in the communities and in most cases get answers that address that specific problem. Sometimes they get guidance (like the one in this thread), and that guidance isn't authoritative (by the nature of Communities). These customers then raise a CA Support Case looking for a specific, authoritative answer. They are not going to settle for "This is not AutoSys, this is the machine, talk to your SA or network or DBA".

    So, the lesson does not end there for us at CA. We go through a process of elimination, which may take days, weeks, or months, to deduce the root cause, involving SysAdmins, DBAs, network admins and whoever else. We own it until there is a resolution. Along this journey we learn a few things specific to the environment and many things that are common, and we are only trying to share here what we learn from other customers.

    Of course, there is no one-size-fits-all  in this game.

     

    Thanks again for all your contributions to this forum and the CA-WAAE world in general!



  • 7.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 10:39 AM

    Instead of someone throwing you answers let's ask some questions: 

     

    Your machine may be underpowered and overburdened.

    How many processes are running at one time?

    What job types are running at the time this happens?

     

    ulimit may play a part, but then again, depending on the job type, it may be a parameter needed in agentparm.



  • 8.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 11:06 AM

    Ask your UNIX team to look into the problem.  And, don't get palmed off with "well it looks OK now".



  • 9.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 11:09 AM

    Now that is good advice, Antony. It's obviously the machine.

    They can also run a sar report and see what's up. If it was a resource problem, that may capture it.

    But this is a system issue more than it is an AutoSys issue.



  • 10.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 11:30 AM

    Thanks to misguided cost cutting, managed services, offshoring and general dilution of skills, the days when your average AE admin was also a part-time 'NIX/Windows admin, networking expert, security guru and DBA are gone. There are teams looking after everything now.

    In this brave new world of IT, you have to draw the line around where your responsibilities end, dig your heels in, and fight to get the help you need.

     

    • If getpwnam() is failing, get the Sys admins to look.
    • Database problem? Call the DBAs.
    • Flaky connectivity, get the network folks to investigate.

     

    Don't try and fix it yourself - those days are over, you'll only get fired if you do.



  • 11.  Re: CAUAJM_E_20174 Could not find the user from UNIX function "getpwnam()". Exiting!

    Posted Apr 20, 2017 11:35 AM

    Preaching to the choir, Antony. I fought that every day when I was a contractor, and I still fight it now that I am not. :-)

     

    no truer words: "Don't try and fix it yourself - those days are over, you'll only get fired if you do."

    (unless, of course, you were told you could)

    hence my discussion on book versus reality etc. :-)