This chapter describes how to optimize CPU resources and applications for high performance.
You must configure enough CPU power in your system to meet the performance needs of your users and applications. In addition, you may be able to improve performance by optimizing the CPU and your applications.
A system must be able to efficiently allocate the available CPU cycles among competing processes. In addition to single-CPU systems, DIGITAL supports multiprocessing systems and processors with different speeds.
Multiprocessing systems allow you to expand the computing power of a system by adding processors. Workloads that benefit most from multiprocessing have multiple processes or multiple threads of execution that can run concurrently, such as database management system (DBMS) servers, World Wide Web (WWW) servers, mail servers, and compute servers.
You may be able to improve the performance of a multiprocessing system that has only a small percentage of idle time by adding processors. However, increasing the number of processors may increase the demands on your I/O and memory subsystems and could cause bottlenecks. If your system is metadata-intensive (that is, it opens large numbers of small files and accesses them repeatedly), you may gain an additional performance benefit if you add Prestoserve or use a write-back cache when you add more processors. See Chapter 5 for information about Prestoserve and write-back caches.
Before you add processors, you must ensure that a performance problem is not caused by the virtual memory or I/O subsystems. For example, increasing the number of processors will not improve performance in a system that lacks sufficient memory resources.
let you monitor the memory, CPU, and I/O consumption on your system.
extension to the
debugger allows application developers to monitor the time spent in user mode and kernel mode on each of the processors. This information can help application developers determine how effectively they are achieving parallelism across the system.
about using tools to monitor performance.
command to determine CPU usage as follows:
A high percentage of idle time on one or more processors indicates either:
Threads are blocked because the CPU is waiting for some event or resource (for example, memory or I/O)
Threads are idle because the CPU is not busy
A low percentage of idle time is the primary indication of a CPU bottleneck.
A high percentage of system time may indicate a system bottleneck, which can be caused by excessive system calls, device interrupts, context switches, soft page faults, lock contention, or cache misses.
A high percentage of user time can be a characteristic of a well-performing system. However, if the system has poor performance, a high percentage of user time may indicate a user code bottleneck, which can be caused by inefficient user code, insufficient processing power, excessive memory latency, or cache misses.
Use profiling to determine which sections of code consume the most processing time. See the Programmer's Guide for more information on profiling.
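The distinction between user time and system time can also be observed for a single process. The following is a minimal sketch in Python, not a DIGITAL UNIX tool: it uses the standard `os.times()` call to measure how much user-mode and system-mode CPU time a piece of work consumes. The function names `cpu_time_split` and `busy_loop` are hypothetical and exist only for this illustration.

```python
import os


def cpu_time_split(work):
    """Run work() and return (user_seconds, system_seconds) it consumed."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def busy_loop():
    # Pure computation: CPU time should accrue mostly in user mode.
    total = 0
    for i in range(200_000):
        total += i * i
    return total


if __name__ == "__main__":
    user, system = cpu_time_split(busy_loop)
    print(f"user={user:.3f}s system={system:.3f}s")
```

A workload dominated by system calls or I/O would be expected to shift more of the measured time into the system column.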
extension to display statistics about CPU use, especially for
Statistics include the
percentages of time the CPU spends in the following states:
Running user level code
Running system level code
Running at a priority set with the
Waiting (idle with input or output pending)
See Chapter 2 for information about monitoring systems.
After you configure the appropriate number of CPUs in your system, you may be able to improve system performance by optimizing your CPU resources. Before optimizing the CPU, ensure that the virtual memory or I/O subsystems are not the cause of poor performance. If optimizing the CPU does not solve the performance problem, you must upgrade your CPU to a faster processor or use multiprocessing.
To optimize your CPU resources, use the following methods:
The Class Scheduler allows you to allocate a percentage of CPU time to a task or application. This allows you to reserve a majority of CPU time for important processes, while limiting CPU usage by less critical processes.
To use class scheduling, group processes into classes and assign each class a percentage of CPU time. You can display statistics on the actual CPU usage for a class. You can also manually assign a class to any process.
command to specify the priority for a command.
command to change the priority of a running
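A process can also lower its own scheduling priority programmatically. The following is a minimal sketch that assumes a POSIX system and Python's standard `os` module. It does not reproduce the commands referred to above, and the function name `lower_own_priority` is hypothetical.

```python
import os


def lower_own_priority(increment=1):
    """Raise this process's nice value by `increment` and return (old, new)."""
    # A `who` value of 0 selects the calling process.
    old = os.getpriority(os.PRIO_PROCESS, 0)
    # os.nice() adds `increment` to the nice value and returns the result.
    # A higher nice value means a lower scheduling priority.
    new = os.nice(increment)
    return old, new


if __name__ == "__main__":
    old, new = lower_own_priority(1)
    print(f"nice value changed from {old} to {new}")
```

Unprivileged processes can usually only raise their nice value, so the change cannot be undone without appropriate privileges.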
Extremely large programs may run more efficiently if you increase the values of the following system configuration file parameters that control program size limits:
dfldsiz--Default data segment size limit
maxdsiz--Maximum data segment size limit
dflssiz--Default stack size limit
maxssiz--Maximum stack size limit
large programs may not run unless these parameters are adjusted.
For example, an inadequate
size limit may produce
the following error:
Out of process memory...
affect program size limits.
information on changing these parameter values.
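Before changing any limits, it can help to see what limits a process currently runs under. The following sketch uses Python's standard `resource` module to report the soft and hard limits for the data and stack segments. It assumes that the per-process soft and hard limits reflect the default and maximum parameters listed above; that mapping is an assumption of this example, and the helper names are hypothetical.

```python
import resource


def segment_limits():
    """Return the soft and hard limits for the data and stack segments."""
    limits = {}
    for name, which in (("data", resource.RLIMIT_DATA),
                        ("stack", resource.RLIMIT_STACK)):
        soft, hard = resource.getrlimit(which)
        limits[name] = {"soft": soft, "hard": hard}
    return limits


def fmt(value):
    # RLIM_INFINITY means that no limit is enforced.
    return "unlimited" if value == resource.RLIM_INFINITY else f"{value} bytes"


if __name__ == "__main__":
    for name, lim in segment_limits().items():
        print(f"{name}: soft={fmt(lim['soft'])} hard={fmt(lim['hard'])}")
```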
You can reduce the static size of the kernel by deconfiguring any unnecessary subsystems.
To do this, use the
You can also minimize the number of kernel options for your system.
Ensure that lockmode is set to the appropriate value
Optimize your applications
You can use various compiler and linker optimization levels to generate more efficient user code. See the Programmer's Guide for more information on application optimization.
If an application is degrading system performance, use profiling to identify sections of code that consume large portions of execution time. In a typical program, most execution time is spent in relatively few sections of code. To improve performance, concentrate on improving the coding efficiency of those time-intensive sections. See the Programmer's Guide for more information on profiling.
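The idea that a few sections of code account for most execution time can be illustrated with any profiler. The following sketch uses Python's standard `cProfile` and `pstats` modules rather than the profiling tools described in the Programmer's Guide. The functions `hot_spot`, `cheap_step`, and `workload` are hypothetical stand-ins for application code.

```python
import cProfile
import io
import pstats


def hot_spot(n):
    # Deliberately expensive: most of the run time should be spent here.
    return sum(i * i for i in range(n))


def cheap_step(n):
    return n + 1


def workload():
    total = 0
    for _ in range(50):
        total += hot_spot(20_000)
        total += cheap_step(total)
    return total


def profile_report(func, top=5):
    """Profile func() and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```

In the report, `hot_spot` should appear near the top, which suggests where optimization effort is likely to pay off.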
Well-written applications use CPU, memory, and I/O resources efficiently. You may be able to improve system and application performance by following these recommendations:
Use the latest version of the operating system, compiler, firmware, and patches
Check the software on your system to ensure that you are using the latest versions of the compiler and the operating system to build your application program. In general, new versions of a compiler perform more advanced optimizations, and new versions of the operating system operate more efficiently.
To enhance parallelism, application developers working in Fortran or C should consider using the Kuck & Associates Preprocessor (KAP), which can have a significant impact on SMP performance. See the Programmer's Guide for details on KAP.
Ensure that the application runs without error
Test your application program to ensure that it runs without errors.
Whether you are porting an application from a 32-bit system to DIGITAL UNIX or developing a new application, never attempt to optimize an application until it has been thoroughly debugged and tested.
If you are porting an application written in C, use the
command with the
flag or compile your program using
the C compiler's
identify possible portability problems that you may need to resolve.
Optimizing an application program can involve modifying the build process or modifying the source code. Various compiler and linker optimization levels can be used to generate more efficient user code. See the Programmer's Guide for more information on optimization.
Use shared libraries
Using shared libraries reduces the need for memory and disk space. When multiple programs are linked to a single shared library, the amount of physical memory used by each process can be significantly reduced. However, programs that use shared libraries initially run more slowly than programs linked with static libraries.
Interprocess communication (IPC) is the exchange of information between two or more processes. Some examples of IPC include messages, shared memory, semaphores, pipes, signals, process tracing, and processes communicating with other processes over a network. IPC involves several operating system subsystems; elements of it are found in scheduling and networking.
In single-process programming, modules within a single process communicate with each other using global variables and function calls, with data passing between the functions and the callers. If you are programming by using separate processes with images in separate address spaces, use additional communication mechanisms.
The DIGITAL UNIX operating system provides the following facilities for interprocess communication:
Pipes--See the Guide to Realtime Programming for information about pipes.
Signals--See the Guide to Realtime Programming for information about signals.
Sockets--See the Network Programmer's Guide for information about sockets.
Streams--See the Programmer's Guide: STREAMS for information about streams.
X/Open Transport Interface (XTI)--See the Network Programmer's Guide for information about XTI.
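As an illustration of the simplest of these facilities, the following sketch passes a message from a child process to its parent through a pipe. It assumes a POSIX system and uses Python's standard `os` module. The function name `send_through_pipe` is hypothetical.

```python
import os


def send_through_pipe(message: bytes) -> bytes:
    """Fork a child that writes `message` to a pipe; the parent reads it back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: close the unused read end, write the message, and exit.
        try:
            os.close(read_fd)
            os.write(write_fd, message)
        finally:
            os._exit(0)
    # Parent: close the unused write end so that read() returns end-of-file
    # once the child has exited.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    print(send_through_pipe(b"hello from the child process"))
```

The two processes have separate address spaces, so the pipe is the only channel through which the data travels.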
You may be able to improve IPC performance by modifying the following attributes:
A process will be unable to send a message to a queue if the message will
make the total number of bytes in that queue greater than the limit
When the limit is reached, the process sleeps and waits for this condition to be resolved.
A process will be unable to send a message if the message will make
the total number of message headers currently in the system greater than the
limit specified by
If the limit is reached, the process sleeps and waits for a message header to be freed.
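The two limits described above can be sketched as a toy model. This is not a real IPC interface: the class `MessageSystemModel` and its fields `max_queue_bytes` and `max_headers` are hypothetical names that stand in for the per-queue byte limit and the system-wide message header limit. Where a real sender would sleep, the model simply reports that the send would block.

```python
from dataclasses import dataclass, field


@dataclass
class MessageSystemModel:
    """Toy model of the two send limits (not a real IPC API)."""
    max_queue_bytes: int   # hypothetical per-queue byte limit
    max_headers: int       # hypothetical system-wide header limit
    queues: dict = field(default_factory=dict)  # queue id -> list of message sizes

    def headers_in_use(self) -> int:
        # One header is counted per queued message, across all queues.
        return sum(len(msgs) for msgs in self.queues.values())

    def would_block(self, queue_id: str, size: int) -> bool:
        """True if a send would make the sender sleep under either limit."""
        queued = sum(self.queues.get(queue_id, []))
        over_bytes = queued + size > self.max_queue_bytes
        over_headers = self.headers_in_use() + 1 > self.max_headers
        return over_bytes or over_headers

    def send(self, queue_id: str, size: int) -> bool:
        """Enqueue the message if neither limit is exceeded; report success."""
        if self.would_block(queue_id, size):
            return False
        self.queues.setdefault(queue_id, []).append(size)
        return True
```

For example, with a byte limit of 100, a 60-byte message is accepted on an empty queue, but a further 50-byte message on the same queue would block because the total would reach 110 bytes.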
You can track the use of IPC facilities with the
By looking at the current number
of bytes and message headers in the queues, you can then determine whether
you need to increase the values of the
You may also want to consider modifying several of the following IPC attributes:
Shared memory attributes:
As a design consideration, consider whether you will get better performance by using threads instead of shared memory.
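The following sketch shows why threads can be a simpler alternative: threads in one process share an address space, so they can exchange data through ordinary variables without creating a shared memory segment. It uses Python's standard `threading` module, and the function name `parallel_sum` is hypothetical. In standard CPython builds, threads do not execute Python bytecode in parallel, so the example illustrates the sharing model rather than a speedup.

```python
import threading


def parallel_sum(values, workers=4):
    """Sum `values` with several threads that share one results list."""
    # The list is visible to every thread; no shared-memory segment is needed.
    results = [0] * workers

    def worker(index):
        # Each thread sums its own slice and writes only to its own slot,
        # so no lock is required.
        results[index] = sum(values[index::workers])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))
```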