Chicago-Soft
Setting and Reporting TSO Service Levels
by Cheryl Watson

This article is reprinted with permission from Cheryl Watson's Tuning Letter, (c) Watson & Walker, Inc. The second part of the article, "Response Time Using RMF," will appear in our next issue.

Setting and reporting TSO service levels is quite easy, because the data is readily available from measurement sources. For example, RMF provides average response times for all TSO performance group periods in the type 72 (workload) SMF records. TSO/MON from LEGENT Corporation provides response distributions, which are a better basis for reporting on TSO service levels. More detail on TSO/MON is provided a little later.

BASIC MEASUREMENTS
Depending on the applications, each installation will have a different breakout of these transactions. Service level objectives are then set for each period. One site with heavy CADAM usage could get only 40% of its transactions completed in first period, since over half the transactions were very long CADAM work.

Periods are defined by a DUR (duration) parameter specified on the PGN statement in the IEAIPSxx PARMLIB member. The DUR parameter specifies the number of service units a transaction accumulates in a period before it moves into the next period.

Figure 1 - TSO Period Definition

    PGN=2,(DMN=1,DP=F41,DUR=800)
          (DMN=1,DP=F4,DUR=2000)
          (DMN=2,DP=M4)

Figure 1 shows a sample TSO definition, where a transaction is considered to be a short transaction (and gets a dispatch priority of F41) until it has accumulated 800 service units. It then becomes a period 2 transaction, receiving a dispatch priority of F4 for the next 2000 service units, after which it moves to third period. It remains in period 3 at a priority of M4 until it completes.

If the duration is increased, transactions stay in first period longer, and therefore more transactions complete in that period. The response times will also show an increase. Decreasing the DUR value results in transactions moving to the next period faster.

Figure 2 (below) shows an RMF Workload extract with the statistics for a set of TSO transactions. You see that 1,335 transactions completed in first period, or almost 94% (1335 / 1421) of all transactions. The rightmost column shows the average response time of 1.824 seconds, with a standard deviation of 3.669 seconds. If this is considered to be normal processing (based on other time periods) with everyone satisfied, you could set a TSO service objective of 90% (or more) of all transactions completing in first period with an average response time of 1.9 seconds. Actually, I would find it hard to believe that the users on this system were happy.
Notice that second period has an average of over 7 seconds. For this system, I would probably decrease the DUR parameter for the first period.

Figure 2. TSO Workload Data

    * PERFORMANCE *   ...   AVERAGE ABSORPTION,   AVERAGE   ENDED     AVG TRANS
    GROUP    GROUP          AVG TRX SERV RATE,    TRANS,    TRANS,    TIME/ STD DEV
    NUMBER   PERIOD         WORKLOAD LEVEL        MPL       #SWAPS    HHH.MM.SS.TTT

    002      1              ABSRPTN =2,171        .41       1335      000.00.01.824
                            TRX SERV=1,765        .33       1841      000.00.03.669
    002      2              ABSRPTN =1,223        .07       73        000.00.07.123
                            TRX SERV=1,222        .07       76        000.00.02.948
    002      3              ABSRPTN =926          .07       13        000.00.12.283
                            TRX SERV=926          .07       13        000.00.09.291
    002      ALL            ABSRPTN =1,833        .56       1421      000.00.02.192
                            TRX SERV=1,582        .48       1930      000.00.04.015

So how is this data really accumulated? The information for TSO is collected and reported each RMF interval and recorded in the type 72 record. At the end of each transaction, three things happen: the internal response time for the transaction is added to an accumulator, the square of the response time is added to a second accumulator (used later to estimate the standard deviation), and one is added to the transaction count. When the record is written, the total response time is divided by the total number of transactions to obtain the average response time. The "sum of the squares" is used to estimate the standard deviation. This is not a true standard deviation, but an indication of the variability of the response times. Large standard deviations indicate a wide variance from the average. Small standard deviations indicate fairly consistent response times.

VARIANCES
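To make the accumulator scheme described just before the VARIANCES heading concrete, here is a minimal Python sketch. It keeps the three running totals mentioned in the text (sum of response times, sum of squared response times, and transaction count) and derives an average and a variability estimate when the interval ends. The class name, the sample response times, and the choice of the population formula (mean of squares minus square of the mean) are assumptions for illustration only; the article does not give the exact formula RMF uses.

```python
import math
from dataclasses import dataclass


@dataclass
class ResponseAccumulator:
    """Hypothetical per-period accumulators, reset each reporting interval."""
    total_time: float = 0.0     # sum of response times, in seconds
    total_squares: float = 0.0  # sum of squared response times
    count: int = 0              # number of ended transactions

    def end_transaction(self, seconds: float) -> None:
        # Called once per ended transaction: update all three accumulators.
        self.total_time += seconds
        self.total_squares += seconds * seconds
        self.count += 1

    def average(self) -> float:
        # Average response time = total time / ended transactions.
        return self.total_time / self.count if self.count else 0.0

    def std_dev(self) -> float:
        # Assumed estimate: sqrt(mean of squares - square of mean).
        if self.count == 0:
            return 0.0
        mean = self.average()
        variance = self.total_squares / self.count - mean * mean
        return math.sqrt(max(variance, 0.0))  # guard against rounding below zero


# Made-up response times: three short transactions and one long one.
acc = ResponseAccumulator()
for t in (0.5, 0.5, 1.0, 6.0):
    acc.end_transaction(t)

print(f"ended={acc.count} avg={acc.average():.3f}s std_dev={acc.std_dev():.3f}s")
```

Note how a single long transaction pushes the variability estimate above the average itself, the same pattern seen in the first-period row of Figure 2 (average 1.824 seconds, standard deviation 3.669 seconds).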
CNTCLIST: SRM contains a parameter in SYS1.PARMLIB member IEAOPTxx called CNTCLIST. The default value of NO indicates that SRM should treat any CLIST as a single transaction. A value of YES indicates that each command in the CLIST should be treated as a single transaction. As an example, assume a CLIST has 10 commands and takes 10 seconds. If CNTCLIST=NO (the default) is specified, RMF would report 1 transaction and an average response time of 10 seconds. If CNTCLIST=YES is specified, RMF would report 10 transactions and an average response time of 1 second. Changing this parameter will change both the meaning of "a transaction" and the reported average time. I prefer the default of NO because it more closely represents what the user sees and incurs less overhead. Several installations, however, use YES because it improves CLIST response times.

DUR: As mentioned earlier, you can increase the number of transactions completing in a period by increasing the value of DUR. Decreasing it decreases the number of transactions completing in that period. You'll want to change this parameter in order to meet your service objective of a specific percent completing in first period. If DUR is changed for another reason, it will still affect the percent of transactions completing in first period.

MSO: If the MSO service definition coefficient (SDC) is set to anything other than a minimal number (e.g. 0.1), then MSO (main storage occupancy) contributes too heavily to the total service units. Installations using an MSO SDC of 3.0 will accumulate over 50% of their total service units due to memory. If MSO plays a large role in total service units, then any change in the amount of storage can affect how rapidly transactions accumulate service units, and therefore how many transactions complete within first period. To eliminate this variability, you should use a much smaller MSO SDC, such as 0.1, 0.01, or 0.0 (only if at SP 4.2 or later).
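The MSO effect described above can be illustrated with a small Python sketch. It assumes the usual weighted-sum view of service units, in which each raw component (CPU, SRB, I/O, and main storage occupancy) is multiplied by its service definition coefficient and the results are added. All raw values and the non-MSO coefficients below are made-up numbers chosen only to show the mechanism; they are not measurements from any system, and the function names are hypothetical.

```python
def total_service(raw, coefficients):
    """Weighted sum of raw service components (assumed form of the calculation)."""
    return sum(raw[name] * coefficients[name] for name in raw)


def mso_share(raw, coefficients):
    """Fraction of total service units contributed by the MSO component."""
    total = total_service(raw, coefficients)
    return (raw["MSO"] * coefficients["MSO"]) / total if total else 0.0


# Hypothetical raw service for one TSO transaction (illustrative only).
RAW = {"CPU": 40.0, "SRB": 5.0, "IOC": 11.0, "MSO": 200.0}

# Assumed non-MSO coefficients; only the MSO coefficient is varied below.
BASE = {"CPU": 10.0, "SRB": 10.0, "IOC": 5.0}

FIRST_PERIOD_DUR = 800  # first-period DUR value taken from Figure 1

for mso_sdc in (3.0, 0.1, 0.0):
    coeffs = dict(BASE, MSO=mso_sdc)
    total = total_service(RAW, coeffs)
    share = mso_share(RAW, coeffs)
    ends_in_first = total <= FIRST_PERIOD_DUR
    print(f"MSO={mso_sdc}: total={total:.0f} SU, "
          f"MSO share={share:.1%}, completes in first period={ends_in_first}")
```

With these invented inputs, the same transaction exceeds the 800 service unit duration when the MSO coefficient is 3.0 (MSO supplies roughly 54% of the total) but stays well inside first period at 0.1 or 0.0, which is the sensitivity to storage that a small MSO coefficient is meant to remove.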
RTO: Some sites use the TSO RTO (Response Time Option) to restrict the TSO load. Use of RTO carries some political implications and is very controversial, but can be extremely valuable after an upgrade. It serves to provide very consistent response times [...]

This delay can be due to the physical swap-in time and/or the RTO delay. A common calculation often used for an estimate of swap-in delay is: