Mainframe Cybersecurity & Compliance


 Is there a restriction on issuing more than one "TSS LIST" command at a time?

Necmettin ILGIN posted Mar 25, 2021 12:48 PM
Hello,

We have deployed the IRONSPHERE product on one of our production LPARs. Once we start the checks, the HZSPROC Health Checker (HC) user issues lots of "TSS LIST", "WHOHAS" and "WHOOWNS" commands, and we are then unable to issue TSS commands on that LPAR: the system is busy and we cannot see the output of our TSS commands for a while. I also submitted the commands "tss list(hzsproc) segment(certdata)" and "tss whoh trace" as a batch job while the checks were running on one LPAR; this time the execution took 1.30 minutes.

Can only one "TSS LIST" command be issued at a time? If not, what is causing this problem?

Thanks..
#TopSecret
Joe Denison

Root cause of performance issues with TSS commands can sometimes be difficult to pinpoint.

I’d start by issuing the TSS MODIFY ST command and looking for some key items related to performance. Please run it, then reply with the following components, and I’ll comment further:

  1. SHRFILE (it will be either YES or NO)
  2. CACHE (it will be either ON or OFF).
  3. CACHE STATISTICS. These include numbers about the size of cache, and number of times cleared.
  4. COMMAND PROCESSOR WORKLOAD BALANCE.   This shows how many command processors you have active (default is 5).
  5. STATISTICS BY COMMAND. This reports how many times each TSS command has been processed since TSS was last started.
  6. CPF (this will show as INACTIVE unless you are using CPF).

In case you are not comfortable with sharing your values, here is an overview of what I’d be looking for:

  1. SHRFILE. In my experience, this is one of the biggest factors in TSS SECFILE performance. If you are sharing the TSS SECFILE across multiple LPARs, then pretty much every TSS LIST and WHOHAS command will result in SECFILE I/O, nearly 3x as much as without it. This is true even if you are not really sharing the SECFILE across multiple systems but just have the setting set to YES.
  2. CACHE. Make sure you see CACHE(ON) (it appears early in the ST output). This ensures that TSS LIST and WHOHAS commands don’t cause physical I/O (except the first time a given ACID is accessed). Generally, running TSSCFILE in an off-hours batch cycle LISTs all ACIDs and therefore loads them into cache, so subsequent TSS LIST/WHOHAS commands should require no I/O and perform extremely well.
  3. CACHE STATISTICS. This appears a couple of pages down in the MODIFY ST output.   Look to see the value in the “Cleared” field.  If your MAXSIZE is adequate, then the “Cleared” value should be 000000000.  If it is not zero, increase your MAXSIZE to accommodate your entire TSS SECFILE contents.  With today’s z/OS configurations, there’s almost never a reason *not* to do this.
  4. COMMAND PROCESSOR WORKLOAD BALANCE. This appears just below the Cache Statistics. You’ll see the total commands issued and how many command processors are available, as well as how much of the workload each command processor has handled. Most local commands are handled by the “Cmd 03” processor, but they can spill over to other command processors when more than one user is issuing a command at the same time. Also, CPF’d commands (even local ones if you specify a TARGET command) are always handled by the “Cmd 01” processor so that they can be single-threaded. Note that CPF’d commands are handled sequentially, and any number of them may be queued up to run via the CPF Recovery File, so when you enter a command, it may have to wait for other commands that are queued for processing. In that case, it’s possibly not your command that is taking a long time, but rather the other commands that your command is queued behind. (Note: these are just observations on my part … there may be other factors in play.)
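To make the queuing point concrete, here is a toy Python model of a single-threaded command processor like “Cmd 01”: every command waits for all commands queued ahead of it. This is only an illustration, not how Top Secret is implemented; the command texts come from this thread, but the service times are made-up numbers.

```python
from dataclasses import dataclass


@dataclass
class Command:
    text: str
    service_seconds: float  # hypothetical run time once the command starts


def completion_times(queue: list[Command]) -> list[tuple[str, float]]:
    """Process commands one at a time, in arrival order.

    All commands are assumed to be queued at time 0, so each command's
    completion time is its own service time plus the service times of
    everything ahead of it in the queue.
    """
    clock = 0.0
    results = []
    for cmd in queue:
        clock += cmd.service_seconds
        results.append((cmd.text, clock))
    return results


queue = [
    Command("TSS LIST(ACIDS) DATA(BASIC)", 60.0),  # long-running (made-up time)
    Command("TSS LIST(ACIDS) DATA(XAUTH)", 25.0),  # long-running (made-up time)
    Command("TSS LIST(HZSPROC)", 0.2),             # quick on its own
]
for text, done in completion_times(queue):
    print(f"{text:<30} finished after {done:5.1f} s")
```

In this model the single-ACID LIST needs only 0.2 s of work but finishes after 85.2 s, because it is queued behind the two LIST(ACIDS) commands.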

Note that the TSS WHOHAS TRACE (or WHOHAS on any attribute, facility, etc.) is a pretty resource-intensive function, as there are no indexes for these items.  (that’s just my experience…possibly they have improved this??).  So, the TSS address space needs to crunch through a lot of data to collect this information.  And because of that, you’ll want to be sure that your TSS address space is in a service class that gives it adequate CPU resources to perform this processing.  If you are running short on CPU capacity, that can be problematic and may cause other tasks to step out of the way for this to be completed.  Another control option to check in your MODIFY ST output regarding this is SWAP … make sure this is set to SWAP(NO).   I’ve never seen it set to SWAP(YES).   Generally TSS is so critical to responding timely for system throughput that you’d want it to be running as non-swappable. 
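The checklist above can be sketched as a small Python function that takes values transcribed by hand from the TSS MODIFY ST output and reports which items look suspect. The dictionary keys are hypothetical names chosen for this sketch, not Top Secret field names, and the rules only restate the guidance in this reply.

```python
def review_status(opts: dict) -> list[str]:
    """Flag the MODIFY ST items discussed above.

    `opts` is a hand-transcribed summary of the TSS MODIFY ST output;
    the key names are hypothetical and used only in this sketch.
    """
    findings = []
    if opts["secfile_shared"]:
        findings.append("SECFILE is shared (SHRFILE): expect extra I/O per LIST/WHOHAS")
    if opts["cache"] != "ON":
        findings.append("CACHE is not ON: LIST/WHOHAS may cause physical I/O")
    if opts["cache_cleared"] != 0:
        findings.append("Cache 'Cleared' is non-zero: consider a larger MAXSIZE")
    if opts["swap"] != "NO":
        findings.append("SWAP is not NO: TSS should run non-swappable")
    return findings


# Example: shared SECFILE, CACHE(ON), Cleared 0, SWAP(NO).
example = {"secfile_shared": True, "cache": "ON", "cache_cleared": 0, "swap": "NO"}
for finding in review_status(example):
    print(finding)
```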

If you don’t find one of these items looking suspect on your system, please post the information that you’ve seen and I can give additional pointers.   


Necmettin ILGIN
Hello Joe,

Thanks a lot for your answer. Here are the control option displays you asked for, to clarify the problem:
1) First, we share our security file across the other LPARs: SHRFILE(SECURITY,AINDXPER).

2) CACHE(ON)

3) ------- CACHE STATISTICS -----
MAXSIZE ( 000040000K ) Size ( 000026758K )
Calls ( 006539436 ) Satisfied ( 006429390 )
Cleared ( 000000000 ) 

4)------ COMMAND PROCESSOR WORKLOAD BALANCE -----
Total Commands Issued = 0000000935
Cmd 01 = 000.00% Cmd 02 = 000.00%
Cmd 03 = 099.89% Cmd 04 = 000.10%
Cmd 05 = 000.00% 

5)------ STATISTICS BY COMMAND -------
CREATE (000000000) DELETE (000000000)
ADD (000000056) REPLACE(000000001)
RENAME (000000000) REMOVE (000000006)
PERMIT (000000125) REVOKE (000000004)
WHOOWNS (000000000) WHOHAS (000000064)
LIST (000000653) HELP (000000000)
LOCK (000000000) UNLOCK (000000000)
WHOAMI (000000000) MODIFY (000000009)
ADMIN (000000007) DEADMIN(000000000)
MOVE (000000001) REFRESH(000000010)
GENCERT (000000000) GENREQ (000000000)
EXPORT (000000000) CHKCERT(000000000)
MLWRITE (000000000) REKEY (000000000)
ROLLOVER(000000000) 

Lastly CPF is inactive and SWAP(NO).

Kind Regards,
Necmettin.
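As a quick check on the CACHE STATISTICS block in this post, the following Python sketch parses the "Name ( value )" pairs and computes two ratios. Reading Satisfied/Calls as a cache hit ratio is our interpretation of the display, not something stated in the Top Secret documentation quoted here.

```python
import re

# CACHE STATISTICS block exactly as posted in this reply.
CACHE_STATS = """\
MAXSIZE ( 000040000K ) Size ( 000026758K )
Calls ( 006539436 ) Satisfied ( 006429390 )
Cleared ( 000000000 )
"""


def parse_fields(text: str) -> dict[str, int]:
    """Map every 'Name ( 000123K )' pair to an int; the optional K is dropped."""
    pattern = r"(\w+)\s*\(\s*(\d+)K?\s*\)"
    return {name: int(value) for name, value in re.findall(pattern, text)}


stats = parse_fields(CACHE_STATS)
hit_ratio = stats["Satisfied"] / stats["Calls"]   # assumed meaning of the fields
fill = stats["Size"] / stats["MAXSIZE"]
print(f"hit ratio {hit_ratio:.2%}, cache {fill:.1%} full, cleared {stats['Cleared']} times")
```

With Cleared at 0 and the cache about two-thirds full, MAXSIZE looks adequate by the criterion given earlier in the thread.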
Joe Denison
Hi Necmettin,

Thank you for posting these values.  Next I would turn attention to the secfile I/O since you are using the SHRFILE option to share between multiple systems.  Can you please run the TSS MODIFY ST again and locate the items that show the numbers associated with performance activity (identified by message ID TSS9590I-TSS9595I in the output) and post those values in a reply.   

Also, if you are using SECCACHE, please post the info related to that (it will appear in the CACHE STATISTICS output of the MODIFY ST command).
Necmettin ILGIN
Hi Joe,

The TSS9590I-TSS9595I messages are:

TSS9590I Init(010470480) Xreq(018384305) Mvs(906270763)
TSS9591I Viol(000245959) Exec(081146873) Smf(619018205)
TSS9592I Chng(000000000) Recv(000000209) Aud(000000000)
TSS9593I Read(000000000) Writ(000007290) Hwm(000000026)
TSS9594I ÖReq(000000000) Lock(000000001) Lwt(000000000)
TSS9595I Mwrt(000000000)

We are using seccache, below is the info related to it:

------------ SECCACHE STATISTICS ------------
  Data Size 00094371840        In Use 00017186640 % Used 018
 Index Size 00000030000       In Use 00000003859  % Used 012
 SHR Wait  00000000000 EXCL Wait 00000000008
          Gets 00007476413     Satisfied 00007442679 % Found 099
         Adds 00000013476       Deletes 00000000000 Exp Unt 000
NOSPACE 00000000000  NOINDEX 00000000000 Warn % 090
   Low Rcd 00000000624    High Rcd 00000265552 Avg Rcd 00000004298

I look forward to your comments.

Regards,
Necmettin.
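The percentages in the SECCACHE display can be recomputed from the raw counters. This Python sketch compares the exact values with a truncated integer percentage; that the display truncates rather than rounds is an inference from these numbers, not documented behavior.

```python
def pct_truncated(part: int, whole: int) -> int:
    """Whole-number percentage with the fraction discarded (not rounded)."""
    return part * 100 // whole


# (numerator, denominator, percentage shown in the SECCACHE display above)
posted = {
    "Data % Used": (17186640, 94371840, 18),
    "Index % Used": (3859, 30000, 12),
    "% Found": (7442679, 7476413, 99),
}

for label, (part, whole, shown) in posted.items():
    exact = 100 * part / whole
    print(f"{label:<13} exact {exact:6.2f}  truncated {pct_truncated(part, whole):3d}  shown {shown:3d}")
```

Rounding would have shown 13 and 100 for the last two lines, so the display appears to truncate; a "% Found 099" can therefore mean anything from 99.0% up to just under 100%.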
Joe Denison
SECCACHE looks good, but the other values are not what I would have expected.   When using SHRFILE, I would expect to see high numbers (other than 0) in the READ and the LOCK items of those stats, and substantially higher WRIT numbers.  The LOCK(1) is what I would expect to see on a system that has SHRFILE(NO)?  The READ(0) is not a value I've encountered before on a system that's been up and running for a while ... which it seems to have been given the 10 million+ inits and 619 million+ (wow!) SMF records written.  And I suppose Chng(0) could indicate simply that on this specific system, no administrator changes have been performed?  Do any other systems that share your SECFILE have other stats to report on these items?

Are you using the coupling facility for the SECFILE?  (that may explain the numbers above, I don't have experience with utilizing the coupling facility and the resulting effects on these stats). 

I turned my focus to SHRFILE because using it requires substantial overhead to check for changes made on the other sharing systems with each TSS command and each INIT. There are various options to reduce this overhead with caching and the coupling facility, but *eliminating* the overhead altogether (i.e., turning off SHRFILE and using CPF to keep the SECFILEs in sync) will always win in performance benchmarks of TSS, in my experience. But implementing that change is not trivial.
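The TSS9590I-TSS9595I counters are easy to pull into a dictionary, which makes the comparison above repeatable across LPARs. In this Python sketch the flagging rule only restates the expectation in this reply (Read and Lock well above 0 with a DASD-shared SECFILE); it is a heuristic from this thread, not Broadcom guidance, and a coupling facility lock structure may legitimately explain low values.

```python
import re

# TSS9590I-TSS9595I lines as posted earlier in the thread (first LPAR).
PERF = """\
TSS9590I Init(010470480) Xreq(018384305) Mvs(906270763)
TSS9591I Viol(000245959) Exec(081146873) Smf(619018205)
TSS9592I Chng(000000000) Recv(000000209) Aud(000000000)
TSS9593I Read(000000000) Writ(000007290) Hwm(000000026)
TSS9594I ÖReq(000000000) Lock(000000001) Lwt(000000000)
TSS9595I Mwrt(000000000)
"""


def parse_perf(text: str) -> dict[str, int]:
    """Turn every 'Name(000123)' field into a name -> int entry."""
    return {name: int(value) for name, value in re.findall(r"(\S+?)\((\d+)\)", text)}


def shrfile_surprises(perf: dict[str, int]) -> list[str]:
    """Flag counters that look unusual for a DASD-shared SECFILE (heuristic)."""
    notes = []
    if perf["Read"] == 0:
        notes.append("Read is 0: no SECFILE reads recorded")
    if perf["Lock"] <= 1:
        notes.append("Lock <= 1: looks like SHRFILE(NO) or CF-based locking")
    return notes


perf = parse_perf(PERF)
for note in shrfile_surprises(perf):
    print(note)
```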
Necmettin ILGIN
Hi Joe,

I have checked the LPAR on which we run the HZSPROC checks, and its statistics are totally different. Above, I had shared statistics from a different LPAR, assuming they would not differ much. Here are the statistics of the LPAR on which the checks run:

TSS9590I Init(021954518) Xreq(037341716) Mvs(586843680)
TSS9591I Viol(000745240) Exec(191643087) Smf(146808770)
TSS9592I Chng(000000000) Recv(000000050) Aud(000000000)
TSS9593I Read(000000012) Writ(000007875) Hwm(000000022)
TSS9594I ÖReq(000000000) Lock(000000001) Lwt(000000000)
TSS9595I Mwrt(000000000)

TSS9573I ----- COUPLING FACILITY STATISTICS -----

Maximum Size ( 80000K) Size ( 80896K)
Max Entries (000006314) Entries (000005760)
Requests (017744437) Hits (017744425)
Writes (000005579) Clears (000000000)
Locks (021996204) Waits (000102711)
Unlocks (021893493) EXCPs (001217317)

------ CACHE STATISTICS -----
MAXSIZE ( 000040000K ) Size ( 000026971K )
Calls ( 013918684 ) Satisfied ( 013584180 )
Cleared ( 000000000 )
------------ SECCACHE STATISTICS ------------
 Data Size 00125829120        In Use 00033873832 % Used 026
Index Size 00000040000       In Use 00000009492 % Used 023
SHR Wait 00000000001  EXCL Wait 00000000007
        Gets 00015379908      Satisfied 00014879960 % Found 096
       Adds 00000458011        Deletes 00000000000 Exp Unt 000
NOSPACE 00000000000 NOINDEX 00000000000 Warn % 090
 Low Rcd 00000000624     High Rcd 00000265552 Avg Rcd 00000003484

------ COMMAND PROCESSOR WORKLOAD BALANCE -----
Total Commands Issued = 0000006978
Cmd 01 = 004.51% Cmd 02 = 001.28%
Cmd 03 = 065.67% Cmd 04 = 021.53%
Cmd 05 = 006.97%

------ STATISTICS BY COMMAND -------

CREATE (000000001) DELETE (000000000)
ADD (000000004) REPLACE(000000000)
RENAME (000000000) REMOVE (000000005)
PERMIT (000000002) REVOKE (000000000)
WHOOWNS (000000045) WHOHAS (000002123)
LIST (000004695) HELP (000000000)
LOCK (000000000) UNLOCK (000000000)
WHOAMI (000000000) MODIFY (000000073)
ADMIN (000000000) DEADMIN(000000000)
MOVE (000000000) REFRESH(000000024)
GENCERT (000000000) GENREQ (000000000)
EXPORT (000000000) CHKCERT(000000000)
MLWRITE (000000000) REKEY (000000000)
ROLLOVER(000000000)
MODIFY FUNCTION SUCCESSFUL


You can see that the LIST and WHOHAS counts here are much higher than on the other LPARs. This time I have also shared the coupling facility statistics, and yes, we use a coupling facility structure for the security database. While some values stay the same, others are very different. The command processor workload balance is quite different from the other LPARs in the production system: commands are spread across several processors. SECCACHE also shows a higher "% Used" value than on the other LPAR. And this time Read(12) and Writ(7875). What do you think these statistics say?
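Because these counters accumulate from the time TSS was last started, and the two LPARs may have different uptimes, raw counts are hard to compare. This Python sketch normalizes them as the share of all commands issued that were LIST or WHOHAS, using the "Total Commands Issued" and STATISTICS BY COMMAND values posted in this thread for both LPARs.

```python
# Values from the two WORKLOAD BALANCE / STATISTICS BY COMMAND displays in this thread.
lpars = {
    "first LPAR posted": {"total": 935, "LIST": 653, "WHOHAS": 64},
    "Health Checker LPAR": {"total": 6978, "LIST": 4695, "WHOHAS": 2123},
}


def query_share(stats: dict[str, int]) -> float:
    """Fraction of all commands issued that were LIST or WHOHAS."""
    return (stats["LIST"] + stats["WHOHAS"]) / stats["total"]


for name, stats in lpars.items():
    print(f"{name:<20} LIST+WHOHAS = {query_share(stats):.1%} of {stats['total']} commands")
```

A share, unlike a raw count, does not depend on how long TSS has been up, although it still says nothing about how expensive each individual LIST was.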
Necmettin ILGIN
I also checked another LPAR on which our security admin performs administration tasks such as creating and deleting ACIDs, and there I saw Chng(000000136). So we do make administrator changes to our system sometimes :)
Joe Denison
Thanks for these additional stats. It looks like the coupling facility is taking on the lock/unlock processing, as you can see from the 42 million+ lock and unlock requests. That *seems* high, but over what timeframe were these stats accumulated? If just a day or so, that may be more significant than if this system has been up for a couple of weeks or longer. Using the CF should in general provide a marked improvement over SECFILE I/O, but again, I lack hands-on experience with the coupling facility, so I'm not sure of the significance of the Waits stat. It doesn't seem high as a percentage, but I would guess the lower the better, and it depends on how long each wait lasts. It may be worthwhile to have someone assess the configuration of your coupling facility to make sure there isn't a bottleneck there, or at least to verify whether the lock/unlock processing is the bottleneck.
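The two questions raised here (waits as a percentage, and the accumulation timeframe) can be separated in a short Python sketch: the ratios below do not depend on uptime, while the rate does. It assumes Waits counts lock requests that had to wait, which we have not confirmed, and the 14-day uptime is only a placeholder because the real value was not posted.

```python
# COUPLING FACILITY STATISTICS as posted in this thread.
cf = {
    "Requests": 17744437, "Hits": 17744425,
    "Locks": 21996204, "Unlocks": 21893493, "Waits": 102711,
}


def cf_summary(cf: dict[str, int], uptime_days: float) -> dict[str, float]:
    """Ratios that do not depend on uptime, plus one rate that does."""
    seconds = uptime_days * 24 * 60 * 60
    return {
        "wait_pct": 100 * cf["Waits"] / cf["Locks"],      # assumed meaning of Waits
        "hit_pct": 100 * cf["Hits"] / cf["Requests"],
        "lock_ops_per_sec": (cf["Locks"] + cf["Unlocks"]) / seconds,
    }


# 14 days is a placeholder; substitute the real time since TSS was started.
for key, value in cf_summary(cf, uptime_days=14).items():
    print(f"{key:<17} {value:,.4f}")
```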

With so many command processors showing usage (given that the overall TSS command usage is not that high), I would be curious what types of commands are being run.   Some TSS LIST commands are very short and quick (listing a single ACID, for example) and result in minimal overhead.   Others, where TSS LIST(ACIDS) is used, can generate an extremely long-running command, causing other TSS commands to overflow to other command processors.  

And again, I would suggest someone check the service class that TSS is running in ... if it's taking a back seat on CPU resources to other processing, that may be a factor as well. A low CPU dispatching priority (DP) on your TSO user ID can be a factor as well, but if waiting for a TSS command response is the only issue you notice in your TSO session, you can probably rule that out.

Like I said at first ... there are soooo many factors in play, it can be difficult to pinpoint.
Necmettin ILGIN
I don't know the actual timeframe over which these statistics accumulated, but I know it is certainly more than a couple of weeks. And you are right: as you guessed, the checks include 13 TSS LIST(ACIDS) commands besides the single-ACID LIST commands. I will take your evaluations into consideration. Thank you very much for your support.
Broadcom Employee Joseph Porto
Necmettin,

To answer your question, you can issue more than one TSS LIST at a time.

I believe Joe Denison has found the answer to your TSS performance issue.

Delays in processing admin commands can be caused by a large amount of work being done by TSS, such as processing reports, processing a large number of TSS admin commands, and processing TSS LIST(ACIDS) commands.

TSS LIST(ACIDS) requires a lot of I/O, memory and CPU. You are essentially dumping the contents of the security file. The more ACIDs, the more I/O, memory and CPU are required to process the TSS LIST request. TSS LIST(ACIDS) DATA(ALL,PROFILE) is the most CPU- and I/O-intensive of the TSS LIST(ACIDS) commands, since it dumps every ACID, lists the contents of every ACID and TSS LISTs every PROFILE attached to the ACID.

A few TSS LIST(ACIDS) DATA(ALL,PROFILE)  back to back on a large security file could bring TSS to a crawl, so these types of commands should not be issued frequently.

I suspect the 13 TSS LIST(ACIDS) commands Joe Denison found played a part in your TSS performance issues. 

Would you happen to know how often IRONSPHERE issues TSS LIST(ACIDS) and what is the exact syntax of the command they use?

Regards,

Joseph Porto - Broadcom Level 1 Support


Necmettin ILGIN
Hello Joseph,

The HC user HZSPROC issues "TSS LIST(ACIDS) ...." commands 13 or 16 times; some of them are:

TSS LIST(ACIDS) DATA(ADMIN,BASIC)
TSS LIST(ACIDS) DATA(BASIC)
TSS LIST(ACIDS) DATA(CICS)
TSS LIST(ACIDS) DATA(PASSWORD,EXPIRE) TYPE(USER)
TSS LIST(ACIDS) DATA(XAUTH)
TSS LIST(ACIDS) DATA(XAUTH) RESCLASS(ALT-ACID)
TSS LIST(ACIDS) DATA(XAUTH) RESCLASS(PROGRAM)
TSS LIST(ACIDS) DATA(XAUTH) RESCLASS(VOLUME)
TSS LIST(ACIDS) TYPE(USER) DATA(BASIC)
TSS LIST(ACIDS) TYPE(USER) DATA(NAMES)

As far as I know, we run the checks once a week starting at 06:00 am, outside rush hours. At that time TSS is busy only with the checks, so we are not affected by it. And as you said, we try not to issue such commands unless they are necessary.

Regards,
Necmettin.
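Following Joseph Porto's explanation of relative cost, a captured list of commands (such as the ones above) could be triaged with a rough Python classifier. The category names and the string matching are our own simplification for this sketch; they only separate single-ACID LISTs, whole-file LIST(ACIDS) and the LIST(ACIDS) DATA(ALL,PROFILE) form described as the most expensive.

```python
import re
from collections import Counter

# Commands posted in this reply, as issued by the HZSPROC Health Checker user.
HC_COMMANDS = [
    "TSS LIST(ACIDS) DATA(ADMIN,BASIC)",
    "TSS LIST(ACIDS) DATA(BASIC)",
    "TSS LIST(ACIDS) DATA(CICS)",
    "TSS LIST(ACIDS) DATA(PASSWORD,EXPIRE) TYPE(USER)",
    "TSS LIST(ACIDS) DATA(XAUTH)",
    "TSS LIST(ACIDS) DATA(XAUTH) RESCLASS(ALT-ACID)",
    "TSS LIST(ACIDS) DATA(XAUTH) RESCLASS(PROGRAM)",
    "TSS LIST(ACIDS) DATA(XAUTH) RESCLASS(VOLUME)",
    "TSS LIST(ACIDS) TYPE(USER) DATA(BASIC)",
    "TSS LIST(ACIDS) TYPE(USER) DATA(NAMES)",
]


def classify(command: str) -> str:
    """Rough cost class for a TSS LIST command (simplified string matching)."""
    text = command.upper().replace(" ", "")
    if "LIST(" not in text:
        return "not a LIST"
    if "LIST(ACIDS)" not in text:
        return "single ACID"
    match = re.search(r"DATA\(([^)]*)\)", text)
    data = set(match.group(1).split(",")) if match else set()
    if {"ALL", "PROFILE"} <= data:
        return "whole file + profiles"  # described as the most expensive form
    return "whole file"


print(Counter(classify(c) for c in HC_COMMANDS))
```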
Broadcom Employee Joseph Porto
Necmettin,

Do you have CA LDAP running? TSS LIST(ACIDS) can also be issued to TSS through CA LDAP by third-party products and Broadcom products. CA LDAP allows you to talk to TSS to read and update the TSS security file.

Do you have LOG(CMDA) set in your TSS control options file TSSPARM? This logs TSS admin command usage. Run TSSUTIL with CLASS(o) (lowercase o) to see a report of the TSS admin commands being issued. LOG(CMDA) is documented here:

https://techdocs.broadcom.com/us/en/ca-mainframe-software/security/ca-top-secret-for-z-os/16-0/administrating/specifying-control-options-to-modify-your-security-environment/log-control-event-and-command-logging.html

TSSUTIL CLASS(o) is documented here:
https://techdocs.broadcom.com/us/en/ca-mainframe-software/security/ca-top-secret-for-z-os/16-0/reporting/tssutil-utility/tssutil-report-selection-criteria.html

If you continue to have problems, please issue 'F TSS,SVCDUMP' on the console to produce a dump, open a ticket at support.broadcom.com, and attach the dump to the ticket. Please also request that the ticket be assigned to me, because I am already aware of the problem, so you don't have to re-explain it.

The dump will show the cause of the slowdown. More than 90% of similar cases are caused by someone issuing a TSS LIST(ACIDS) command.

Regards,
Joseph Porto - Broadcom Level 1 Support

Terminator
Hello,
TSS LIST commands can only extract data from the live SECFILE and may interfere with other TSS processing. But you may use TSSCFBK, the TSSCFILE clone, to extract data. I don't know whether IRONSPHERE can use it instead of TSS LIST.

TSSCFBK produces customized reports with information extracted from the backup CA Top Secret security file.
Important!
When running TSSCFBK, the file you are using must have the same encryption key as the active security file.
Running TSSCFBK against the backup file (as an alternative to using TSSCFILE) can improve system performance by limiting contention for the primary security file. For example, running TSSCFILE with TSS LIST(ACIDS) DATA(ALL) can produce significant I/O activity against the security file and cause delays in other processes that need the file. Running TSSCFBK avoids imposing this stress on the primary security file.

T.


Necmettin ILGIN
Hello Joseph,

Sorry for the delay. We are not using CA LDAP, but thanks for your informative explanations. As far as I know, we check our admin commands by running PGM=TSSAUDIT against our RECFILE. CMDA is also enabled. I haven't tried SVCDUMP yet, but when I do, I will let you know.

Hello Terminator,

I will share your advice with my teammates as well. Thank you very much.

Terminator
What is your TIMELOCK setting? Is it the default?
Necmettin ILGIN
Hello,

Yes, it uses the default values. Should we change them?
Terminator
Hello
Our TIMELOCK setting in production is TIMELOCK(5,320,640,6000).
We set this 10 years ago because, after migrating to a new CICS release (CICS TS 4.2), we had locking problems: longer transaction response times, especially while a TSSCFILE was running.
We did various things to tackle this problem: isolating the SECFILE, putting the lock into the CF (option 61) and CF cache, activating SECCACHE and changing the TIMELOCK values.

This helped lower response times.

Putting the lock into the CF lowered the TSSCFILE report time for "list acids password" from 65 to 38 seconds, and for "acids basic,password" from about 170 seconds to 65 seconds.
The TIMELOCK change also improved transaction response times.


The default was last changed in 2001, so was it tuned for 2001 DASD speeds? Can Broadcom comment on what values should be set?

PRODUCT: CA-TOP SECRET MVS RELEASE: 5.2
 
 APAR #: QO01995 DATE: 25 SEP 2001
 
 PROBLEM DESCRIPTION: BEA4838: CHANGE DEFAULTS FOR TIMELOCK
  ---------------------------------------------------
 
  STARTRAK PROBLEM: 4838
  ABSTRACT: CHANGE DEFAULTS FOR TIMELOCK
 
  DESCRIPTION:
  Because of improved device speeds, the defaults for the
  TIMELOCK parameter can result in additional performance
  problems if there is contention for the security file lock
  when sharing a file across multiple systems.
 
  The new defaults will be:
 
  TIMELOCK(25,64,128,1200)
 
  SPECIAL INSTRUCTIONS:
  LLA Refresh, Restart TSS
Necmettin ILGIN
Hello,

Thanks for the answer. I think we should look at the TIMELOCK values again. And you are right: it has been 20 years since these TIMELOCK values were published. As I understand it, if we decrease the first value to reduce response time, we should increase the second, third and fourth values.

Kind Regards,
Necmettin.
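The production value posted in this thread, TIMELOCK(5,320,640,6000), seems to follow a simple pattern relative to the APAR QO01995 defaults TIMELOCK(25,64,128,1200): 5×320 = 25×64 = 1,600, 5×640 = 25×128 = 3,200 and 5×6,000 = 25×1,200 = 30,000. We have not confirmed in the documentation what each subparameter means; assuming that keeping these products constant is the intent, this Python sketch derives the other three values from a chosen first value, rounding down and capping at the 9999 limit mentioned in this thread (also unverified).

```python
DEFAULT_TIMELOCK = (25, 64, 128, 1200)  # defaults from APAR QO01995
LIMIT = 9999  # upper bound as reported in this thread; not verified


def rescale_timelock(first: int, base=DEFAULT_TIMELOCK, limit=LIMIT) -> tuple[int, ...]:
    """Scale subparameters 2-4 so that first * value matches the corresponding
    product in `base`, rounding down and capping at `limit` (assumed rule)."""
    return (first,) + tuple(min(base[0] * n // first, limit) for n in base[1:])


for first in (25, 5, 3):
    print(f"TIMELOCK({','.join(map(str, rescale_timelock(first)))})")
```

With a first value of 3, the sketch gives TIMELOCK(3,533,1066,9999), the same as the test-system setting reported in this thread. The arithmetic says nothing about CPU cost; that still needs measuring.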
Terminator
Yes. On a test system I use an even lower value for the first subparameter: TIMELOCK(3,533,1066,9999). I think 9999 is the limit for these settings.

Does anybody use TIMELOCK(1,....)?

Necmettin ILGIN
Hello again after a long time

I want to ask one more question: how would decreasing the first TIMELOCK value (to reduce response time) affect CPU usage on the mainframe? Have you had a chance to test it?

Thanks...