Main

June 04, 2009

Tivoli Business Service Manager Performance Tuning Recommendations

[This article is sponsored by Peningo Systems, Inc., a provider of Tivoli Consulting Services on a nationwide basis. For more information on Peningo Systems, please go to the Peningo Tivoli Consultants page.]


The IBM developerWorks site is an excellent repository of information and resources regarding various IBM offerings and systems.

We recommend this article, recently released on developerWorks, to any Tivoli Consultant who is involved in the implementation and performance tuning of Tivoli Business Service Manager.
This paper includes performance and tuning recommendations for IBM Tivoli Business Service Manager (TBSM) version 4.2.

  • 1 Overview
  • 2 TBSM 4.2 and WebSphere Application Server tuning
      • 2.1 Identifying current JVM settings within TBSM 4.2
      • 2.2 Enabling Java Virtual Machine (JVM) Garbage Collection (GC) logging
      • 2.3 Running a representative workload
      • 2.4 Analyzing the GC Logs for TBSM
  • 3 Additional Dashboard tuning suggestions
  • 4 Client side Java Virtual Machine tuning
  • 5 PostgreSQL database and the Discovery Library/XML toolkit
      • 5.1 Specific PostgreSQL tuning parameters
      • 5.2 Vacuuming the TBSM database
  • 6 Final thoughts about TBSM 4.2 performance
  • 7 Hardware for production environments
  • 8 References
  • 9 Trademarks
  • 10 Copyright and Notices

Overview

IBM® Tivoli® Business Service Manager (TBSM) 4.2 delivers technology for IT and business users to visualize and assure the health and performance of critical business services. The product does this by integrating a logical representation of a business service model with status-affecting alerts that are raised against the underlying IT infrastructure. Using browser-based TBSM Web Consoles, operators can view how the enterprise is performing at a particular time, or how it performed over a given period of time. As a result of this, TBSM delivers the real-time information that you need to respond to alerts effectively and in line with business requirements, and optionally to meet Service Level Agreements (SLAs).

Given the size of today's large business enterprises, TBSM must be able to represent and manage the status and related attributes of very large business service models. To enhance scalability, TBSM 4.2 divides the previous TBSM 4.1.x server architecture into two separate servers, referred to in this paper as the "Data server" for back-end processing, and "Dashboard Server" for front-end operations.

For reference, the Data server maintains the canonical TBSM business service model representation, processing events from various sources, and updating service status based on those events. In this role, it interacts with various data stores.

The Dashboard Server, by contrast, is primarily responsible for supporting the user interface. It retrieves service information from the Data server as needed to support the user interactions.

TBSM 4.2 is primarily processor dependent (the number and speed of processors being two of the key factors) as long as sufficient memory is configured for the TBSM Java™ Virtual Machines (JVMs). It is important to be aware of the minimum and recommended hardware specifications (see the Hardware for production environments section) for an optimal user experience.

To that end, the purpose of this paper is to describe some of the performance tuning capabilities available for you to use with the product, how to interpret and analyze the results of performance tuning, and to suggest some recommendations for installing and tuning the product to achieve optimal scale and performance in your own unique TBSM environment.

TBSM 4.2 and WebSphere Application Server tuning

This release of TBSM uses an embedded version of the WebSphere Application Server 6.1 for the Data server and Dashboard Servers. Tuning WebSphere for TBSM 4.2 includes the following actions:

  • Identifying the current TBSM JVM settings
  • Enabling JVM Garbage Collection (GC) logging
  • Running a representative workload
  • Analyzing the GC log results
  • Tuning the JVM appropriately
  • Running the workload again (and again, if needed)
  • Reviewing the new results

The following statements are from the WebSphere 6.1 documentation on Java memory and heap tuning:

"The JVM memory management and garbage collection functions provide the biggest opportunities for improving JVM performance."

"Garbage collection normally consumes from 5% to 20% of total execution time of a properly functioning application. If not managed, garbage collection is one of the biggest bottlenecks for an application."

The TBSM 4.2 Data server and Dashboard Server each run in their own JVM; consequently, each can be independently tuned.

Of primary consideration is the memory allocation to each of the JVMs, bounded by two key values:

  • Initial memory (Xms)
  • Maximum memory (Xmx)

For TBSM 4.2, the Data server and Dashboard Server use the default Garbage Collector (optthruput), which can be used without modification (with the exception of the Solaris Operating Environment, which uses a generational garbage collector instead). The following statement is from the WebSphere 6.1 documentation:

"optthruput, which is the default, provides high throughput but with longer garbage collection pause times. During a garbage collection, all application threads are stopped for mark, sweep and compaction, when compaction is needed. optthruput is sufficient for most applications."

Based on performance analysis of TBSM 4.2, the default Garbage Collector has proven quite capable and is recommended in most cases, especially in environments where high event processing rates are needed. (For reference on the Sun garbage collection algorithms, review the Sun JVM link provided in the reference section of this document.)

Most of the remainder of this paper explains how to efficiently size the TBSM 4.2 JVMs to allow the default garbage collection algorithms to operate most efficiently.

To determine the Java version and level that is in use, run the following command:

$TIP_HOME/java/bin/java -version

In response to this command, the TBSM server writes information to the command line, including the JVM provider information and level of release. Knowing this up-front directs you to the correct parameters that follow in this document for Java™ Virtual Machine configuration.
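Because the GC-logging flags shown later in this document differ between the IBM and Sun JVMs, it can help to classify the captured version text up front. The following is a sketch; the `jvm_vendor` helper name is ours, not part of TBSM:

```shell
# Classify the JVM vendor from captured "java -version" output, so the
# correct GC-logging flag can be chosen later (IBM and Sun differ).
# The helper name is illustrative only.
jvm_vendor() {
  # "java -version" writes to stderr, so capture it with 2>&1 first
  if printf '%s\n' "$1" | grep -qi 'IBM'; then
    echo "IBM"   # use the -Xverbosegclog flag shown later
  else
    echo "Sun"   # use the Sun-style GC log flag instead
  fi
}

# Example usage:
# version_text="$($TIP_HOME/java/bin/java -version 2>&1)"
# jvm_vendor "$version_text"
```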

A few considerations about JVM sizing and GC activity

Proper JVM heap memory sizing is critical to TBSM 4.2.

Memory is allocated to objects within the JVM heap, so as the number of objects grows, the amount of free space within the heap decreases. When the JVM cannot allocate additional memory requested for new objects as it nears the upper memory threshold (Xmx value) of the heap, a Garbage Collection (GC) is called by the JVM to reclaim memory from objects no longer accessible to satisfy this request.

Depending on the JVM and type of GC activity, this garbage collection processing can temporarily suspend other threads in the TBSM JVM, granting the garbage collection threads priority to complete the GC work as quickly and efficiently as possible. This prioritization of GC threads and pausing of the JVM is commonly referred to as a "Stop the World" pause. With proper heap analysis and subsequent JVM tuning, this overhead can be minimized, thereby increasing TBSM application throughput. Essentially, the JVM spends less time paused for GC activities, and more time processing core TBSM activities.

Identifying current JVM settings within TBSM 4.2

There are several ways to gather the WebSphere JVM settings in a TBSM 4.2 environment. One of the easiest (and safest) ways to do this is by leveraging a WebSphere command to create custom startup scripts for both the TBSM Data and Dashboard Servers.

To do this, run the following command from the /profile/bin directory on both the Data server and the Dashboard Server (the servers can be up or down). For the TBSM Data server, run the following command:

./startServer.sh server1 -username [Dataserver_UserID] -password [Dataserver_UserID_password] -script start_dataserver.sh

The output of this command is a file named start_dataserver.sh in the same /profile/bin directory. Utilizing a custom start-up script allows your original WebSphere configuration files to remain intact, and provides a few unique capabilities you might want to leverage for performance tuning.

The following section is part of the start_dataserver.sh file that was created:

# Launch Command
 
exec "/opt/IBM/tivoli/tip/java/bin/java"  $DEBUG "-Declipse.security" "-Dosgi.install.area=/opt/IBM/tivoli/tip"
 "-Dosgi.configuration.area=/opt/IBM/tivoli/tip/profiles/TBSMProfile/configuration" "-Djava.awt.headless=true"
 "-Dosgi.framework.extensions=com.ibm.cds" "-Xshareclasses:name=webspherev61_%g,groupAccess,nonFatal" "-Xscmx50M"
 "-Xbootclasspath/p:/opt/IBM/tivoli/tip/java/jre/lib/ext/ibmorb.jar:/opt/IBM/tivoli/tip/java/jre/lib/ext/ibmext.jar"
 "-classpath" "/opt/IBM/tivoli/tip/profiles/TBSMProfile/properties:/opt/IBM/tivoli/tip/properties:
/opt/IBM/tivoli/tip/lib/startup.jar:/opt/IBM/tivoli/tip/lib/bootstrap.jar:/opt/IBM/tivoli/tip/lib/j2ee.jar:
/opt/IBM/tivoli/tip/lib/lmproxy.jar:/opt/IBM/tivoli/tip/lib/urlprotocols.jar:/opt/IBM/tivoli/tip/deploytool/itp/batchboot.jar:
/opt/IBM/tivoli/tip/deploytool/itp/batch2.jar:/opt/IBM/tivoli/tip/java/lib/tools.jar" "-Dibm.websphere.internalClassAccessMode=allow"
 "-Xms256m" "-Xmx512m"

Note that the last two arguments passed to the JVM are "-Xms256m" and "-Xmx512m". These two arguments set the initial JVM heap size (Xms) to 256 MB of memory, and the maximum JVM heap size (Xmx) to 512 MB of memory.
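As a quick sanity check, the heap flags in a generated startup script can be listed with grep. This is a sketch; the `heap_flags` helper name is ours:

```shell
# Print every -Xms/-Xmx flag found in a generated startup script, to
# confirm the current heap sizing before changing anything.
heap_flags() {
  # -o prints each match on its own line; -e protects the leading dash
  grep -oe '-Xm[sx][0-9]*[mMgGkK]' "$1"
}

# Example: heap_flags start_dataserver.sh
```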

Next, issue the startServer.sh command from above; however, this time, run it from the Dashboard Server /profile/bin directory. Also, change the name of the startup script argument to "start_dashboard.sh" as in the following example:

./startServer.sh server1 -username [Dashboard_UserID] -password [Dashboard_UserID_password] -script start_dashboard.sh

The output of this command is a file named start_dashboard.sh in the same /profile/bin directory.

Enabling Java Virtual Machine (JVM) Garbage Collection (GC) logging

To fully understand how the JVM is using memory in your unique TBSM environment, you need to add a few arguments to the start_dataserver.sh script as indicated to log garbage collection (GC) data to disk for later analysis:

# Launch Command: Dataserver
exec "/opt/IBM/tivoli/tip/java/bin/java" "-verbose:gc" "-Xverbosegclog:/holdit/dataserver_gc.log" "-XX:+PrintHeapAtGC" 
"-XX:+PrintGCTimeStamps"  $DEBUG "-Declipse.security" "-Dosgi.install.area=/opt/IBM/tivoli/tip" 
"-Dosgi.configuration.area=/opt/IBM/tivoli/tip/profiles/TBSMProfile/configuration" "-Djava.awt.headless=true" 
"-Dosgi.framework.extensions=com.ibm.cds" "-Xshareclasses:name=webspherev61_%g,groupAccess,nonFatal" "-Xscmx50M" 
"-Xbootclasspath/p:/opt/IBM/tivoli/tip/java/jre/lib/ext/ibmorb.jar:/opt/IBM/tivoli/tip/java/jre/lib/ext/ibmext.jar" 
"-classpath" "/opt/IBM/tivoli/tip/profiles/TBSMProfile/properties:/opt/IBM/tivoli/tip/properties:/opt/IBM/tivoli/tip/lib/startup.jar:
/opt/IBM/tivoli/tip/lib/bootstrap.jar:/opt/IBM/tivoli/tip/lib/j2ee.jar:/opt/IBM/tivoli/tip/lib/lmproxy.jar:
/opt/IBM/tivoli/tip/lib/urlprotocols.jar:/opt/IBM/tivoli/tip/deploytool/itp/batchboot.jar:
/opt/IBM/tivoli/tip/deploytool/itp/batch2.jar:/opt/IBM/tivoli/tip/java/lib/tools.jar" 
"-Dibm.websphere.internalClassAccessMode=allow" "-Xms256m" "-Xmx512m"

Note that the directory for GC log file data must exist prior to launching TBSM with the customized start_dataserver.sh script. For this scenario, a /holdit directory (with read/write access for the TBSM user ID) has already been created.
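A sketch of preparing that directory before launching the tuned script (the path matches the example; create it as, or grant write access to, the TBSM user ID). The `ensure_gc_dir` helper name is ours:

```shell
# Create the GC log directory if it is missing and verify that it is
# writable by the current user; returns non-zero otherwise.
ensure_gc_dir() {
  mkdir -p "$1" && [ -w "$1" ]
}

# Example, matching the scenario above:
# ensure_gc_dir /holdit || echo "cannot write GC logs to /holdit"
```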

Important: For the Sun JVM (TBSM 4.2 on Solaris), the syntax for the GC log file location ("-Xverbosegclog:/holdit/dataserver_gc.log") is different from the one used for the IBM version. Use the following argument instead:

"-Xloggc:/holdit/dataserver_gc.log"

Repeat this procedure to edit the Dashboard Server custom startup script; however, change the log name from "dataserver_gc.log" to "dashboard_gc.log".

The log file names should be different both to distinguish between the two, and to ensure that the GC log data does not combine into one log if both TBSM Servers are installed on the same system. Combining both logs together renders the GC log file useless for matters of performance analysis and subsequent tuning.

For reference, the TBSM Dashboard Server script should resemble this:

# Launch Command: Dashboard Server
exec "/opt/IBM/tivoli/tip/java/bin/java" "-verbose:gc" "-Xverbosegclog:/holdit/dashboard_gc.log" 
"-XX:+PrintHeapAtGC" "-XX:+PrintGCTimeStamps"  $DEBUG "-Declipse.security" "-Dosgi.install.area=/opt/IBM/tivoli/tip" 
"-Dosgi.configuration.area=/opt/IBM/tivoli/tip/profiles/TBSMProfile/configuration" "-Djava.awt.headless=true" 
"-Dosgi.framework.extensions=com.ibm.cds" "-Xshareclasses:name=webspherev61_%g,groupAccess,nonFatal" "-Xscmx50M" 
"-Xbootclasspath/p:/opt/IBM/tivoli/tip/java/jre/lib/ext/ibmorb.jar:/opt/IBM/tivoli/tip/java/jre/lib/ext/ibmext.jar" 
"-classpath" "/opt/IBM/tivoli/tip/profiles/TBSMProfile/properties:/opt/IBM/tivoli/tip/properties:
/opt/IBM/tivoli/tip/lib/startup.jar:/opt/IBM/tivoli/tip/lib/bootstrap.jar:/opt/IBM/tivoli/tip/lib/j2ee.jar:
/opt/IBM/tivoli/tip/lib/lmproxy.jar:/opt/IBM/tivoli/tip/lib/urlprotocols.jar:
/opt/IBM/tivoli/tip/deploytool/itp/batchboot.jar:/opt/IBM/tivoli/tip/deploytool/itp/batch2.jar:
/opt/IBM/tivoli/tip/java/lib/tools.jar" "-Dibm.websphere.internalClassAccessMode=allow" "-Xms256m" "-Xmx512m"

Running a representative workload

At this point, start the TBSM Servers (Data server first, then Dashboard Server as soon as the processor quiesces on the Data server). Next, proceed with a common scenario or representative workload in your environment to populate the GC logs for subsequent performance analysis. It can be a simple scenario that you would like to optimize, perhaps TBSM Data server startup.

Or, perhaps you want to tune a representative scenario as the following example illustrates for a steady-state workload captured over a 30 minute span of time.

First, record some notes on the TBSM environment configuration. The following scenario was measured for initial performance and subsequent tuning:

Service Model: 50 000 Service Instances, 4 level hierarchy, no leaf node with more than 50 children.

Initial heap size: (-Xms): 256 MB

Maximum heap size: (-Xmx): 512 MB

Dataserver Started: 9:57:00
Dashboard Server Started: 9:59:00

Workload Start Time: 10:09:00
Workload End Time: 10:39:00

For this reference scenario, the TBSM Data server was started with GC logging at 9:57:00. After the processor quiesced on the server (indicating that the Data server startup and initial Service Model processing had completed), the Dashboard Server was started and 50 unique TBSM Web Consoles were logged in.

After all consoles were started, each was set to a unique Service Tree and Service Viewer desktop session. Finally, a steady-state event workload using thousands of unique events (sent by way of remote EIF probes) was introduced at 10:09:00 and continued until 10:39:00, when the event flow was stopped and the GC log files were immediately collected.

Also, while this workload was being processed, a "vmstat -n 15 120 >> vmstat_out.txt" command was run (on each TBSM Server), which collected CPU statistics every 15 seconds for a 30 minute period to a local file (for later analysis and review). After the workload was complete, these vmstat_out.txt files were also collected for review.
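The captured vmstat file can be reduced to an average utilization figure with awk. This is a sketch that assumes Linux-style vmstat output; rather than hard-coding a column number, it reads the position of the idle ("id") column from the header line. The `avg_cpu` helper name is ours:

```shell
# Rough average CPU utilization from a captured vmstat output file.
# Busy time is taken as 100 minus the average of the "id" column.
avg_cpu() {
  awk '
    $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
    col && $1 ~ /^[0-9]+$/ { idle += $col; n++ }
    END { if (n) printf "avg busy: %.1f%%\n", 100 - idle / n }
  ' "$1"
}

# Example: avg_cpu vmstat_out.txt
```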

Analyzing the GC Logs for TBSM

To analyze the resultant GC log files, download the IBM Pattern Modeling and Analysis Tool (PMAT) from the IBM alphaWorks Web site.

Taken from the PMAT Web site:

"The Pattern Modeling and Analysis Tool for IBM® Java Garbage Collector (PMAT) parses verbose GC trace, analyzes Java heap usage, and recommends key configurations based on pattern modeling of Java heap usage... This information can be used to determine whether garbage collections are taking too long to run; whether too many garbage collections are occurring; and whether the JVM crashed during garbage collection."

Although there is an in-depth tutorial on the same Web site (See: Webcast replay - "How to analyze verbosegc trace with IBM Pattern Modeling and Analysis Tool for IBM Java Garbage Collector"), the following information is provided to expedite utilization of the PMAT tool within a Windows environment.

To analyze the GC log file that you collected, start the IBM PMAT Tool:

"C:\Program Files\Java\jdk1.6.0\bin\java" -Xmx128m -jar "C:\TBSM 4.2\Tools\IBMPMAT\ga31.jar"

Use this example and edit it as needed (substitute the location of your Java executable file and location of PMAT files). Note that the Xmx value of 128m limits the PMAT tool to use no more than 128 MB RAM on the system. If you have a number of very large GC log files, you might want to increase the Xmx value.

Review the PMAT Web site for other configuration details or a more in-depth walkthrough as needed. The following examples assume that the tool is correctly installed and ready for you to use.

Loading the GC log file

The following screen capture shows the initial screen of the PMAT tool.

Click the I folder to load an IBM generated GC log; the IBM version is used across all TBSM 4.2 platforms with the exception of the Solaris Operating environment which uses the Sun JVM. To open a Sun-generated log, click the N folder instead. This document assumes an IBM-generated log is used for the 30 minute steady-state scenario.

Navigate to the GC log that you want to analyze and select it. The PMAT tool processes the log, and displays the analysis and recommendations you can review.

Analyzing the initial Data server results

The following screen capture shows the result after a garbage collection log has been opened within the PMAT tool for the TBSM Data server.

Review the Analysis and Recommendations sections. For this scenario, the Analysis section indicates that no Java heap exhaustion was found, typically indicating that there is sufficient space within the JVM to satisfy required memory allocations. However, the Overall Garbage Collection Overhead metric notes that 20% of the application time was spent performing Garbage Collection activities, most likely indicating a need for tuning the JVM memory parameters.

To minimize the GC overhead, review the Recommendations section and assign additional memory to the JVM for more efficient processing of the workload. As the PMAT tool recommendation is to set the JVM Xmx value to approximately 678 MB or greater (and because the system has plenty of memory), a new value of 1024 MB was chosen as the new Xmx value (recall that the as-provided Xmx setting is 512 MB).

To make this change, do the following steps:

  1. Edit the start_dataserver.sh script.
  2. Change the Xmx value from "-Xmx512m" to "-Xmx1024m".
  3. Change the "-Xms256m" to "-Xms512m", which is one half of the new Xmx parameter.
  4. Save the changes to the script.
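The two heap edits above can also be scripted, for example with sed. This is a sketch; the `retune_heap` helper name is ours, and the copy-then-rewrite form is used so the original script is kept as a backup:

```shell
# Apply new -Xmx/-Xms values to a startup script, keeping a .bak copy.
# $1 = script, $2 = old Xmx, $3 = new Xmx, $4 = old Xms, $5 = new Xms
retune_heap() {
  cp "$1" "$1.bak"                                     # backup first
  sed -e "s/-Xmx$2/-Xmx$3/" -e "s/-Xms$4/-Xms$5/" "$1.bak" > "$1"
}

# Example, matching the scenario in the text:
# retune_heap start_dataserver.sh 512m 1024m 256m 512m
```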

Analyzing the initial Dashboard Server results

The following screen capture shows the result after a garbage collection log has been opened within the PMAT tool for the TBSM Dashboard Server.

Next, load the dashboard_gc.log file and review the Analysis and Recommendations sections. For the Dashboard, the Analysis section indicates that no heap exhaustion was found. It also reveals that 10% of the application time was spent performing Garbage Collection activities, certainly not excessive, but some slight tuning might be beneficial.

To reduce the GC overhead for the Dashboard Server, again review the Recommendations section. As the PMAT tool advises a maximum JVM size of approximately 375 MB or greater (and the TBSM 4.2 default is already at 512 MB), a change might not be warranted. However, because the system has plenty of memory, an interesting decision is to choose 768 MB as the new Xmx value, with a new initial size (Xms) of 384 MB.

To make these changes, do the following steps:

  1. Edit the start_dashboard.sh script.
  2. Change the Xmx value from "-Xmx512m" to "-Xmx768m".
  3. Change the "-Xms256m" to "-Xms384m", which is one half of the new Xmx parameter.
  4. Save the changes to the script.
  5. Restart both servers and rerun the same scenario as before. After it is complete, review the new GC logs in PMAT to determine changes in TBSM performance.

Reviewing the results after tuning: Data server

The following screen capture shows the result after the new garbage collection log has been opened within the PMAT tool for the TBSM Data server.

After the run is complete, load the new Dataserver_gc.log file into the PMAT tool and review the Analysis and Recommendations sections. For this "tuned" scenario, the analysis section again indicates that no Java heap exhaustion was found. However, Overall Garbage Collection Overhead is now calculated at 5% (down from 20% prior to tuning), a reduction of 15 percentage points. Less time spent in garbage collection essentially translates to more processor cycles available for application processing.

To illustrate the CPU savings for the Data server, the vmstat data for total processor utilization was collected and plotted in the following chart for the event processing workload of 30 minutes (Note that these CPU comparison results are not a guarantee of service or performance improvement; the performance of each unique TBSM environment will vary):

For the initial event processing workload, the average processor utilization was 43.4% of total CPU on the Data server system. After tuning, the same workload used an average of 18.3% of total processor utilization, a reduction of 57.9% of processor overhead. Also, an extended period of 100% processor utilization late in the run was almost entirely eliminated in the tuned environment.

Reviewing the results after tuning: Dashboard Server

The following screen capture shows the result after the new garbage collection log has been opened within the PMAT tool for the TBSM Dashboard Server.

Review the Analysis and Recommendations sections. For this "tuned" scenario, the analysis section again indicates no Java heap exhaustion. Garbage Collection overhead is now calculated at 9% (down from 10% prior to tuning), which seems to be a minimal gain.

However, an interesting metric to consider is the Total Garbage Collection pause, which is now 252 seconds, down from 288 seconds in the original (untuned) Dashboard Server scenario. As previously stated, application processing is essentially paused while some garbage collection activities occur. Although each of these pauses can range from several milliseconds to several hundred milliseconds (spread over time and unique to each environment), a reduction in total garbage collection time is another worthwhile metric to consider.

Finally, to illustrate the CPU comparison for the Dashboard Server, the vmstat data for total processor utilization was plotted in the following chart for the same event processing workload. (Again, note that these CPU results are not a guarantee of service or performance improvement; performance for each unique TBSM environment will vary.)

For the initial event processing workload, the average processor utilization was 19.1% of total CPU on the Dashboard Server. After tuning, the same workload now used an average of 20.2% of total processor utilization, a minimal increase in total system processor utilization.

This illustrates an important concept: the larger the Xmx value, the more objects can potentially be loaded into JVM memory because there is additional heap space. Therefore, the GC threads are likely to need additional CPU processing to purge the larger JVM heap of unreachable objects.

While a minor increase in overall processor utilization was observed with the larger Xmx setting, the saving of 36 fewer seconds (over the 30 minute period) spent in GC pause time might or might not be worth the trade-off. In a production environment, you might want to conduct a longer scenario (perhaps over the course of a day) to determine whether the larger or smaller setting is the better choice.

This is just a basic, but powerful example of how you can use the PMAT tool to become familiar with your own TBSM 4.2 environment. As each TBSM environment is unique for each customer, making educated tuning decisions by using the PMAT tool is a recommended strategy for performance tuning.

Additional Dashboard tuning suggestions

There are some additional considerations for tuning the performance of the TBSM 4.2 Dashboard Server to reduce processing overhead. Review the following areas and consider implementing them based on the needs of your unique environment.

Service tree refresh interval: Changing the automatic Service tree refresh interval might help reduce server side workload (on the TBSM Dashboard Server) related to TBSM Web Console activity. The service tree refresh interval is set to 60 seconds by default.

The service tree refresh interval controls how frequently the TBSM Web Console requests an automatic service tree update from the TBSM Dashboard Server. If every client connected to the TBSM Dashboard is updated every 60 seconds, this might affect the Dashboard Server when there are a large number of concurrent consoles. To help mitigate this, you can increase the interval between refreshes.

To do this, edit the RAD_sla.props file in the $TBSM_DATA_SERVER_HOME/etc/rad/ directory:

#Service Tree refresh interval; in seconds - default is 60
impact.sla.servicetree.refreshinterval=120

Canvas update interval multiplier: The update interval multiplier helps to control the frequency of automatic canvas refresh requests initiated by the service viewer for canvases containing business service models of different sizes. The default multiplier is 30.

For example, loading a larger, 100 item canvas takes longer than loading a smaller, 50 item canvas. Because of this, the refresh intervals of the larger canvas should be spaced apart so that the canvas is not constantly in a refresh state. The TBSM Web Console accomplishes this by computing a dynamic refresh interval by taking the amount of time spent performing the previous load request and multiplying it by the update interval multiplier constant. So, if the large service model in this example takes 5 seconds to load, a refresh of the same model is not attempted for another 2.5 minutes (5 x 30 or 150 seconds).

When considering a change to this parameter, keep in mind that there are lower and upper boundaries of 30 seconds and 180 seconds for the refresh interval. As a result, the update interval multiplier is useful only to a certain point.
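The dynamic interval computation described above can be sketched as follows. The `refresh_interval` helper name is ours, and the exact clamping behavior is our assumption based on the stated 30 and 180 second boundaries:

```shell
# Estimate the dynamic canvas refresh interval: previous load time
# (seconds) times the update interval multiplier, clamped to 30..180.
refresh_interval() {
  awk -v load="$1" -v mult="$2" 'BEGIN {
    iv = load * mult
    if (iv < 30)  iv = 30    # lower boundary
    if (iv > 180) iv = 180   # upper boundary
    print iv
  }'
}

# refresh_interval 5 30   -> 150 (the 2.5 minute example in the text)
```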

Nonetheless, you can easily update the interval multiplier parameter by editing the $TIP_HOME/systemApps/isclite.ear/sla.war/av/canvasviewer_simple.html file on the TBSM Data server and changing the value for all occurrences of the UpdateIntervalMultiplier property to the new value. Because this is a client side property, you do not have to reboot the server for this value to take effect. However, you might need to log out and log on to the TBSM console.

Client side Java Virtual Machine tuning

Within the client Web browser that hosts the TBSM Web Console is a JVM plug-in needed for running client side Java code. Just like a JVM running on either of the TBSM servers, the browser plug-in JVM can also be tuned to specify initial and maximum heap sizes. Typically, an untuned JVM plug-in has an initial heap size of no more than 4 MB and a maximum heap size of 64 MB, though these numbers can vary depending on the platform used.

The graphical Service Viewer is the function most affected by changes to JVM plug-in parameters. It might be possible to improve the performance of the Service Viewer by increasing the initial heap size (-Xms) to 64 MB and the maximum heap size (-Xmx) to 128 MB. Whether this configuration change is really needed or not depends on the size of the business service model that is loaded by the service viewer.

The procedure to change the JVM plug-in tuning parameters can be different depending on the provider of the plug-in (for example IBM or Sun) and also depending on the system (Windows or UNIX). As an example, the following procedure illustrates how to access and set the JVM plug-in parameters for the IBM-provided 1.5.0 plug-in on a Windows system:

  1. Open Control Panel -> Java Plug-in.
  2. Click the Advanced tab.
  3. In the text box under Java Runtime Parameters, type the following value:
    -Xms64m -Xmx128m
  4. Click the Apply button and then close the window.

After these changes are made, it might be necessary for you to log out and log back in to the TBSM console. For a complete list of the supported combinations of Java plug-ins, Web browsers, and platforms, see the TBSM Installation Guide.

Important: To change the JVM plug-in parameters on a supported UNIX system, navigate to the bin directory under the file system location to which the plug-in was installed and look for a shell script named either ControlPanel or JavaPluginControlPanel, depending on your Java version. Run this shell script to launch a GUI that looks similar to the equivalent interface on the Windows system.

PostgreSQL database and the Discovery Library/XML toolkit

A PostgreSQL database can be a very fast database, but the as-is configuration tends to be rather conservative. A few configuration changes to the postgresql.conf file can improve PostgreSQL performance dramatically. Note that these settings worked well in the performance test environment, and are provided as a starting point for your own unique environments.

Important: Back up your original postgresql.conf file before making any changes.

Specific PostgreSQL tuning parameters

Shared_buffers: Sets the number of shared memory buffers that are used by the database server. The default is typically 1000 buffers (each an 8 KB page). Settings significantly higher than the minimum are usually needed for good performance; values of a few thousand are recommended for production installations. This option can only be set at server startup.

Suggestion: shared_buffers = 16384

If editing this setting, also change the rad_dbconf file pg_buffer parameter in UNIX or Linux systems to the same value.
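Because each buffer is an 8 KB page, the shared memory implied by a shared_buffers value is easy to check (the `pages_to_mb` helper name is ours; the suggested 16384 pages corresponds to 128 MB):

```shell
# Convert a shared_buffers page count into megabytes of shared memory,
# assuming the standard 8 KB PostgreSQL page size.
pages_to_mb() {
  echo $(( $1 * 8 / 1024 ))   # 8 KB per page, 1024 KB per MB
}

# pages_to_mb 16384   -> 128
```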

Work_mem: Non-shared memory that is used for internal sort operations and hash tables. This setting is used to put a limit on any single operation memory-utilization before being forced to use disk.

Suggestion: work_mem = 32000

Effective_cache_size: Sets the planner's assumption about the effective size of the disk cache that is available to a single index scan. This is factored into estimates of the cost of using an index; a higher value makes it more likely that index scans are used, a lower value makes it more likely sequential scans are used.

Suggestion: effective_cache_size = 30000

Random_page_cost: Sets the planner's estimate of the cost of a nonsequentially fetched disk page. This is measured as a multiple of the cost of a sequential page fetch. A higher value makes it more likely that a sequential scan is used, a lower value makes it more likely an index scan is used.

Suggestion: random_page_cost = 2

Fsync: To speed up bulk loads by way of the XML Toolkit, disable the fsync parameter in the postgresql.conf file as follows:

fsync = false # turns forced synchronization on or off

The fsync parameter controls whether committed data is written to disk immediately through the Write Ahead Logging (WAL) facility. Disable it only if you want faster load times; the caveat is that the load scenario might need to run again if the server shuts down before processing completes due to a power failure, disk crash, and so on.
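Collected in one place, the suggestions above amount to a postgresql.conf excerpt such as the following. These are the starting-point values quoted in this section, not universal settings; adjust them for your own environment.

```conf
shared_buffers = 16384        # 8K pages; mirror this value in rad_dbconf (pg_buffer)
work_mem = 32000              # per-operation sort/hash memory before spilling to disk
effective_cache_size = 30000  # planner's assumption about available disk cache
random_page_cost = 2          # lower value favors index scans over sequential scans
fsync = false                 # bulk loads only; re-enable after loading completes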

Vacuuming the TBSM database

After completing a large bulk load, you should vacuum the TBSM Data server database to improve performance. A vacuumdb utility is provided with the PostgreSQL database that can be used to clean up database storage. Running this utility periodically, or after a significant number of database rows change, helps subsequent queries process more efficiently. The utility resides in the $TBSM_HOME/platform/&lt;arch&gt;/pgsql8/bin directory and can be run as follows:

$TBSM_HOME/platform/&lt;arch&gt;/pgsql8/bin/vacuumdb -f -z -p 5435 -U postgres rad

The parameters for the vacuumdb command:

-f: The utility does a full vacuum
-z: The utility analyzes and updates statistics that are used by the query planner
-p 5435: The port that the database process is listening on
-U postgres: The user ID used to connect to the database
rad: The database name

Important: The TBSM Discovery Library toolkit periodically vacuums the tables that are used by the toolkit. Control of this is handled with the DL_DBVacuum properties in the xmltoolkitsvc.properties file. For more information on these properties, see the Discovery Library toolkit properties. Depending on how often the toolkit imports data, the automatic vacuums might be sufficient.
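As a sketch, the manual vacuum shown above can be wrapped in a small script and scheduled (for example, from cron) between the toolkit's automatic vacuums. The installation path and architecture below are illustrative assumptions; the port, user, and database name are the ones quoted earlier in this section.

```python
import os
import subprocess

def build_vacuum_command(tbsm_home, arch="linux", port=5435,
                         user="postgres", database="rad"):
    """Assemble the vacuumdb call from the text (-f full vacuum, -z analyze)."""
    vacuumdb = os.path.join(tbsm_home, "platform", arch,
                            "pgsql8", "bin", "vacuumdb")
    return [vacuumdb, "-f", "-z", "-p", str(port), "-U", user, database]

def run_vacuum(tbsm_home, dry_run=False, **kwargs):
    """Run the vacuum, or just print the command when dry_run is set."""
    cmd = build_vacuum_command(tbsm_home, **kwargs)
    if dry_run:
        print(" ".join(cmd))
        return 0
    return subprocess.call(cmd)

if __name__ == "__main__":
    # Print the command for review before scheduling it for real.
    run_vacuum("/opt/IBM/tivoli/tbsm", dry_run=True)
```

A dry run lets you confirm the assembled path and flags before pointing the script at a production database.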

Final thoughts about TBSM 4.2 performance

To review, TBSM 4.2 is primarily processor dependent (the number and speed of processors are two of the key factors), provided that sufficient JVM memory is configured; use the IBM PMAT tool to assist you in tuning TBSM 4.2 for your own workloads and environments. Be aware of the minimum and recommended hardware specifications for an optimal user experience. The TBSM 4.2 minimum and recommended hardware tables are supplied in the Hardware for production environments section of this document for easy access and review.

Before beginning any in-depth performance tuning for TBSM, review the trace_*.log files that are created by both the Data and Dashboard Servers. These logs are in the profiles/logs/server1 directory. Resolve any exceptions or error conditions first, so that a functional issue does not hamper overall application performance.

Once correct functional processing is confirmed, two of the primary tuning "knobs" for TBSM 4.2 are the "Xms" and "Xmx" values that control the memory allocation for each of the TBSM JVMs.

For review:

-Xms256m // Sets the initial memory to 256 MB (default)
-Xmx512m // Sets the maximum memory size to 512 MB (default)

After the upper memory setting for Xmx is established (through PMAT analysis), a good rule of Java tuning is typically to set the initial memory allocation to half that of the maximum size. Again, one "size" does not fit all environments, so you might want to try setting the initial value smaller (or larger) and rerun the scenarios. Note that you should not set the Xms value larger than the Xmx value, or the JVM will most likely not start.
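The half-of-maximum rule of thumb above can be expressed as a small helper. This is an illustrative sketch, not TBSM tooling; it simply applies the guideline that Xms starts at half of Xmx and must never exceed it.

```python
def suggest_xms(xmx_mb):
    """Rule of thumb from the text: initial heap = half the maximum heap."""
    if xmx_mb <= 0:
        raise ValueError("Xmx must be a positive number of megabytes")
    return max(xmx_mb // 2, 1)

def jvm_heap_flags(xmx_mb):
    """Return the -Xms/-Xmx pair, keeping Xms <= Xmx so the JVM will start."""
    xms_mb = suggest_xms(xmx_mb)
    return ["-Xms%dm" % xms_mb, "-Xmx%dm" % xmx_mb]

# The documented defaults follow the same ratio:
print(jvm_heap_flags(512))  # ['-Xms256m', '-Xmx512m']
```

After PMAT analysis settles the Xmx value, the same starting ratio applies; tune the initial value up or down from there as your scenarios dictate.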

After the Data and Dashboard Servers are properly tuned, if Web Consoles using the Service Viewer feel "slow," review the Client side Java Virtual Machine tuning section on tuning the JRE plug-in JVM, and restart the Console.

Every TBSM environment is unique; with regard to tuning, one size does not fit all. Multiple factors come into play, such as the number and speed of processors, available RAM, service model size, number of concurrent Web Consoles, SLAs, and KPIs, to name a few. JVM analysis is the correct way to ensure that proper performance tuning is in place.

To that end, a regular schedule of performance data collection, analysis, and subsequent tuning (as needed) is strongly encouraged. Using the PMAT tool at some regular interval (perhaps monthly) can uncover trends in application throughput on the TBSM Data server as new Business Services are added to the in-memory model. Also, as additional Web Consoles are added to the Dashboard Server, looking at such metrics as overall garbage collection pause times might help uncover tuning areas that reduce application response times while serving a higher number of end users.

In summary, making performance analysis a proactive subject in your own unique TBSM environments can go a long way to minimizing or preventing future performance concerns.

Hardware for production environments

The following tables summarize the minimum and recommended hardware and configuration for production environments (see the readme file provided with the TBSM 4.2 installation image for the latest information and updates regarding supported hardware).


Table 1: Data server - Recommended hardware and configuration for production

Important: The amount of disk space needed is directly related to how many events are processed by the system and the related logging and log levels configured on the system.


Table 2: Dashboard Server - Recommended hardware and configuration for production

References

  1. TBSM 4.2 Beta Web Conference Series: Performance Tuning: Internal IBM presentation delivered in September 2008 to customers participating in the TBSM 4.2 beta program.

  2. PostgreSQL Online Documentation: http://www.postgresql.org/docs/8.0/static/index.html

  3. Tivoli Business Service Manager 4.2 Installation and Administrator's Guides: http://publib.boulder.ibm.com/infocenter/tivihelp/v3r1/index.jsp?topic=/com.ibm.tivoli.itbsm.doc/

  4. A reference book for everything related to IBM Developer Kit and Runtime Environment, Java 2 Technology Edition, Version 5.0. (In PDF format.): http://download.boulder.ibm.com/ibmdl/pub/software/dw/jdk/diagnosis/diag50.pdf

  5. Tuning Garbage Collection with the 5.0 Java Virtual Machine: http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html

  6. IBM Pattern Matching and Analysis (PMAT) tool from IBM Alphaworks: http://www.alphaworks.ibm.com/tech/pmat