Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help
Use the Intel® VTune™ Amplifier to analyze Java* applications executed with Oracle* JDK or OpenJDK* (Linux* only).
Even though Java code runs in a managed runtime environment, it can manage data just as inefficiently as programs written in native languages. For example, if you care about the performance of your data mining Java application, you need to take into account the memory architecture of your target platform, its cache hierarchy, and the latency of access to each memory level. From the platform microarchitecture point of view, profiling Java applications is similar to profiling native applications, with one major difference: to show performance metrics against the program source code, the profiling tool must be able to map metrics of the binary code, either compiled or interpreted by the JVM, back to the original source code in Java or C/C++.
VTune Amplifier provides a low-overhead analysis of the JIT compiled code that is available for both user-mode sampling and tracing and hardware event-based sampling analysis types. The analysis of the interpreted Java methods is limited.
To enable the Java code analysis with the Intel® VTune™ Amplifier and interpret data:
To configure your performance analysis for Java code, you may use either GUI or command line (amplxe-cl) configuration. You may run Java code analysis using one of the following modes:
Launch Application
You may embed your java command in a batch file or executable script. For example, create a run.sh file with the following command:
java -Xcomp -Djava.library.path=native_lib/ia32 -cp /home/Design/Java/mixed_stacks MixedStacksTest 3 2
Then specify the path to this run.sh file as the application to launch in the project configuration of the VTune Amplifier.
In addition, you may select the Auto managed code profiling mode and the Analyze child processes option.
Similarly, you can configure an analysis with the VTune Amplifier command line interface, amplxe-cl. For example, for the Basic Hotspots analysis run the following command line:
> amplxe-cl -collect hotspots -- run.sh
or directly:
> amplxe-cl -collect hotspots -- java -Xcomp -Djava.library.path=native_lib/ia32 -cp /home/Design/Java/mixed_stacks MixedStacksTest 3 2
Attach to Process
If your Java application needs to run for some time before the analysis starts, or cannot be launched at the start of the analysis, you may attach the VTune Amplifier to the standalone Java process. You can also attach the VTune Amplifier to a C/C++ application with an embedded JVM instance for hardware event-based sampling analysis types. To do this, select the Attach to Process target type and specify the Java process name or PID.
You may use the command line interface to attach the analysis to the Java process. For example, the following command attaches the Basic Hotspots analysis to the Java process:
amplxe-cl -collect hotspots -target-process java
The following command line example attaches the Advanced Hotspots analysis to the Java process by its PID:
amplxe-cl -collect advanced-hotspots -target-pid 1234
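To attach by PID, you first need the process ID of the target JVM. The following is a minimal sketch, assuming Java 9 or later (for ProcessHandle); the class name PidReporter is hypothetical and is not part of the VTune Amplifier. It prints an attach command of the same form as the example above.

```java
// Hypothetical helper: reports the PID of the running JVM so that it can be
// passed to amplxe-cl through -target-pid. Requires Java 9+ (ProcessHandle).
class PidReporter {
    // Returns the operating system process ID of the current JVM.
    static long currentPid() {
        return ProcessHandle.current().pid();
    }

    public static void main(String[] args) {
        long pid = currentPid();
        // Print a ready-to-copy attach command for this process.
        System.out.println("amplxe-cl -collect advanced-hotspots -target-pid " + pid);
    }
}
```

In a long-running application, the same call could be logged once at startup so that the PID is available when you decide to attach.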
Attach to Process Running under Low-privilege Account
For hardware event-based sampling analysis types, you can attach the VTune Amplifier running under the superuser account to a Java process or a C/C++ application with an embedded JVM instance running under a low-privileged user account. For example, you may attach the VTune Amplifier to Java-based daemons or services.
To do this, run the VTune Amplifier under the root account, select the Attach to Process target type and specify the Java process name or PID.
The dynamic attach mechanism is supported only with the Java Development Kit (JDK).
You may run the Basic Hotspots analysis to get a list of the hottest methods along with their timing metrics and call stacks. The workload distribution over threads is also displayed in the Timeline pane. Thread naming helps to identify where exactly the most resource-consuming code was executed.
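Thread naming is done in the application itself. The following is a minimal sketch, assuming that the names set through the standard Thread API are the ones you want to recognize in the per-thread view; the class NamedWorkers and the "data-mining-worker" prefix are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example: gives each worker thread a descriptive name so that
// it can be told apart from other threads in a per-thread view.
class NamedWorkers {
    // Creates (but does not start) `count` threads named "<prefix>-<index>".
    static List<Thread> createWorkers(String prefix, int count, Runnable task) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            workers.add(new Thread(task, prefix + "-" + i));
        }
        return workers;
    }

    public static void main(String[] args) throws InterruptedException {
        // A small CPU-bound task so that each thread does measurable work.
        Runnable task = () -> {
            double sum = 0;
            for (int i = 1; i <= 5_000_000; i++) {
                sum += Math.sqrt(i);
            }
            System.out.println(Thread.currentThread().getName() + " done: " + sum);
        };
        List<Thread> workers = createWorkers("data-mining-worker", 3, task);
        for (Thread t : workers) t.start();
        for (Thread t : workers) t.join();
    }
}
```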
If you are pursuing maximum performance on a platform, consider writing and compiling performance-critical modules of your Java project in native languages like C or even assembly. This way of programming helps to employ powerful CPU resources like vector computing (implemented via SIMD units and instruction sets). In this case, compute-intensive functions become hotspots in the profiling results, which is expected as they do most of the job. However, you might be interested not only in the hotspot functions, but also in the locations in the Java code from which these functions were called through JNI. Tracing such cross-runtime calls in mixed-language algorithm implementations can be a challenge.
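For illustration, the Java side of such a mixed-language module might look like the sketch below. The class NativeKernel, the library name "mixedstacks", and the method sumOfSquares are hypothetical and are not taken from the product or its samples. The pure-Java fallback is included only so that the sketch runs when the native library is absent.

```java
// Hypothetical sketch: a Java class that delegates a compute-intensive
// routine to native code through JNI.
class NativeKernel {
    private static final boolean NATIVE_AVAILABLE = loadNative();

    private static boolean loadNative() {
        try {
            // Resolved against java.library.path (libmixedstacks.so on Linux).
            System.loadLibrary("mixedstacks");
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library not found: use the Java fallback
        }
    }

    // Would be implemented in C/C++ as Java_NativeKernel_sumOfSquaresNative.
    private static native double sumOfSquaresNative(double[] data);

    // Pure-Java fallback with the same result as the intended native routine.
    static double sumOfSquaresJava(double[] data) {
        double sum = 0;
        for (double v : data) {
            sum += v * v;
        }
        return sum;
    }

    // Entry point used by the rest of the application.
    static double sumOfSquares(double[] data) {
        return NATIVE_AVAILABLE ? sumOfSquaresNative(data) : sumOfSquaresJava(data);
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(new double[] {1, 2, 3}));
    }
}
```

When the native library is present, a call to sumOfSquares crosses from the Java call stack into the native one, which is the kind of cross-runtime call discussed here.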
To analyze mixed code profiling results, the VTune Amplifier "stitches" the Java call stack with the subsequent native call stack of C/C++ functions. Stitching in the reverse direction, from a native call stack to the Java call stack, works as well.
[Screenshots not reproduced. Callouts: native function, mixed native/Java call stack, native module, compiled methods in the Java call stack.]
Due to inlining during the compilation stage, some functions may not appear in the stack by default. To see them, select the Show inline functions option for the Inline Mode on the filter bar.
VTune Amplifier also provides an advanced profiling option for optimizing Java applications for the CPU microarchitecture of your platform. Although Java and JVM technology is intended to free a developer from hardware architecture specific coding, once Java code is optimized for the current Intel microarchitecture, it will most probably keep this advantage on future generations of CPUs. You may use the hardware event-based sampling collection, which monitors hardware events in the CPU pipeline and can identify coding pitfalls that limit the efficient execution of instructions in the CPU. CPU metrics are available and can be displayed against the application modules, functions, and Java source code lines. You may also run the hardware event-based sampling collection with stacks when you need to find the call path for a function called in a driver or middleware layer of your system.
VTune Amplifier supports analysis of Java applications with some limitations:
System-wide profiling is not supported for managed code.
The JVM interprets some rarely called methods instead of compiling them for the sake of performance. VTune Amplifier does not recognize interpreted Java methods and marks such calls as !Interpreter in the restored call stack.
If you want such functions to be displayed in stacks with their names, force the JVM to compile them by using the -Xcomp option. They then show up as [Compiled Java code] methods in the results. However, the timing characteristics may change noticeably if many small or rarely used functions are called during execution.
When you open source code for a hotspot, the VTune Amplifier may attribute event or time statistics to the wrong piece of code. This happens because of JDK Java VM specifics. For a loop, the performance metric may slip upward. Often the information is attributed to the first line of the hot method's source code. In the example below, a real hotspot line consuming most CPU time is line 35.
Consider events and time mapping to the source code lines as approximate.
For the Basic Hotspots analysis type, the VTune Amplifier may display only a part of the call stack. To view the complete stack, use additional command line JDK Java VM options that change behavior of the Java VM:
Use the -Xcomp JDK Java VM command line option, which forces JIT compilation and improves the quality of stack walking.
On Linux* x86, use the client JDK Java VM instead of the server Java VM: either explicitly specify -client, or simply do not specify the -server JDK Java VM command line option.
On Linux x64, specify the -XX:-UseLoopCounter command line option, which switches off on-the-fly substitution of the interpreted method with the compiled version.
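When these options are set inside a launch script, it can be useful to confirm from within the application that the JVM actually received them. The following is a minimal sketch that uses the standard java.lang.management API; the class name JvmFlagCheck is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper: reports whether the JVM was started with the
// options discussed above.
class JvmFlagCheck {
    // Returns true if `flag` appears verbatim among the JVM input arguments.
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    public static void main(String[] args) {
        // JVM options only; arguments passed to main() are not included.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        for (String flag : new String[] {"-Xcomp", "-XX:-UseLoopCounter"}) {
            System.out.println(flag + (hasFlag(jvmArgs, flag) ? " is set" : " is not set"));
        }
    }
}
```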
Java application profiling is supported for the Basic Hotspots, Advanced Hotspots, and Microarchitecture analysis types. Support for the Concurrency and Locks and Waits analysis is limited as some embedded Java synchronization primitives (which do not call operating system synchronization objects) cannot be recognized by the VTune Amplifier. As a result, some of the timing metrics may be distorted.
There are no dedicated libraries supplying a user API for collection control in the Java source code. However, you may want to try applying the native API by wrapping the __itt calls with JNI calls.
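The Java side of such a wrapper might look like the sketch below. It assumes that you write and build a small JNI library yourself, here given the hypothetical name "ittjni", whose native functions call __itt_pause() and __itt_resume() from ittnotify.h. Neither that library nor the class CollectionControl ships with the product.

```java
// Hypothetical Java-side wrapper around collection control calls.
// The native library "ittjni" is an assumption: a user-written JNI library
// whose functions forward to __itt_pause() and __itt_resume().
class CollectionControl {
    private static final boolean AVAILABLE = load();

    private static boolean load() {
        try {
            System.loadLibrary("ittjni"); // resolved against java.library.path
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // wrapper library absent: calls become no-ops
        }
    }

    private static native void pauseNative();  // would call __itt_pause()
    private static native void resumeNative(); // would call __itt_resume()

    // True only if the wrapper library was found and loaded.
    static boolean isAvailable() {
        return AVAILABLE;
    }

    // Pauses data collection, or does nothing without the library.
    static void pause() {
        if (AVAILABLE) pauseNative();
    }

    // Resumes data collection, or does nothing without the library.
    static void resume() {
        if (AVAILABLE) resumeNative();
    }
}
```

Making the calls no-ops when the library is missing lets the same application run unchanged outside a profiling session.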