Intel® VTune™ Amplifier XE and Intel® VTune™ Amplifier for Systems Help

Java* Code Analysis from the Command Line

Intel® VTune™ Amplifier provides low-overhead user-mode sampling and tracing analysis and hardware event-based sampling analysis of JIT-compiled code executed with Oracle* JDK or OpenJDK*. Analysis of interpreted Java methods is limited.

Hardware event-based sampling data collection monitors hardware events in the CPU pipeline and can identify coding pitfalls that prevent efficient execution of instructions in the CPU. The resulting hardware performance metrics can be displayed against application modules, functions, and Java source code lines. You may also run the hardware event-based sampling collection with stacks when you need to find the call path for a function called in a driver or middleware layer in your system.

Configuring Java Collection

Use the following syntax to configure Java analysis from the command line:

$ amplxe-cl -collect <analysis_type> [-[no-]follow-child] [-mrte-mode=<mrte_mode_value>] [<-knob> <knob_name=knob_option>] [--] <target>

where

Note

To see all knobs available for a predefined analysis type, enter:

$ amplxe-cl -help collect <analysis_type>

To see knobs for a custom analysis type, enter:

$ amplxe-cl -help collect-with <analysis_type>

Examples

Example 1: Running Java Analysis

The following command line runs the Advanced Hotspots analysis on a java command:

$ amplxe-cl -collect advanced-hotspots -- java -Xcomp -Djava.library.path=native_lib/ia32 -cp /home/Design/Java/mixed_call MixedCall 3 2

Example 2: Running Analysis for Embedded Java Command

You may embed your java command in a batch file or executable script before running the analysis. For example, create a run.sh file with the following command:

java -Xcomp -Djava.library.path=native_lib/ia32 -cp /home/Design/Java/mixed_call MixedCall 3 1

The following command line runs the Basic Hotspots analysis on a specified batch file with embedded java command:

$ amplxe-cl -collect hotspots -- run.sh
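
The wrapper script from Example 2 could be created as sketched below. The shebang line and the chmod step are assumptions that do not appear in the original example; they are added so that the script can be launched directly as an analysis target. Depending on your PATH, you may need to refer to the script as ./run.sh.

```shell
# Sketch: write run.sh in the current directory and make it executable.
# The "#!/bin/sh" line and "chmod +x" are assumptions, not part of the original example.
cat > run.sh <<'EOF'
#!/bin/sh
java -Xcomp -Djava.library.path=native_lib/ia32 -cp /home/Design/Java/mixed_call MixedCall 3 1
EOF
chmod +x run.sh
```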

Example 3: Attaching Analysis to Java Process

If your Java application needs to run for some time before profiling, or cannot be launched at the start of the analysis, you can attach VTune Amplifier to the running Java process. To do this, specify the following analysis target: --target-process java.

Note

The dynamic attach mechanism is supported only with the Java Development Kit (JDK).

The following example attaches the Advanced Hotspots analysis to a running Java process:

$ amplxe-cl -collect advanced-hotspots --target-process java
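
When several Java processes are running, attaching by name may be ambiguous. The sketch below looks up a process ID and prints an attach command for it. It assumes that the pgrep utility is available and that your amplxe-cl version accepts the -target-pid option; attach_cmd is a hypothetical helper name. The command is only printed, so you can review it before running it.

```shell
# Hypothetical helper: print an attach command for a given process ID.
# Assumes amplxe-cl supports -target-pid; check "amplxe-cl -help collect" first.
attach_cmd() {
  printf 'amplxe-cl -collect advanced-hotspots -target-pid %s\n' "$1"
}

# Pick the first process named exactly "java" (assumes pgrep is installed).
pid=$(pgrep -x java | head -n 1 || true)

if [ -n "$pid" ]; then
  attach_cmd "$pid"
else
  echo "no running java process found" >&2
fi
```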

Viewing Summary Report

VTune Amplifier automatically generates the summary report when data collection completes. Similar to the Summary window in the GUI, the command-line report provides overall performance data for your Java target.

Note

For more information on analyzing the summary report data, refer to the Summary Report section.

Examples

The following example generates the summary report for the Basic Hotspots analysis result. For user-mode sampling and tracing analysis results, the summary report includes Collection and Platform information, CPU information and summary per the basic metrics.

Collection and Platform Info
----------------------------
Parameter                 r002hs
------------------------  ------------------------------------------------------------------------------------------------------------------------------------
Application Command Line  /tmp/java_mixed_call/src/run.sh
Operating System          3.16.0-30-generic NAME="Ubuntu"
VERSION="14.04.2 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.2 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
Computer Name             10.125.21.55
Result Size               11560723
Collection start time     13:55:00 05/02/2016 UTC
Collection stop time      13:55:10 05/02/2016 UTC

CPU
---
Parameter          r001hs
-----------------  -------------------------------------------------
Name               3rd generation Intel® Core™ Processor family
Frequency          3492067692
Logical CPU Count  8

Summary
-------
Elapsed Time:       10.183
CPU Time:           19.200
Average CPU Usage:  1.885

This example generates the summary report for the Advanced Hotspots analysis result. For hardware event-based sampling analysis results, the summary report includes Collection and Platform information, CPU information, summary per the basic metrics, and an event summary.

Collection and Platform Info
----------------------------
Parameter                 r002ah
------------------------  ---------------------------------------------------------------------------------
Operating System          3.16.0-30-generic NAME="Ubuntu"
VERSION="14.04.2 LTS, Trusty Tahr"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 14.04.2 LTS"
VERSION_ID="14.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
Result Size               171662827
Collection start time     10:44:34 15/04/2016 UTC
Collection stop time      10:44:50 15/04/2016 UTC

CPU
---
Parameter          r002ah
-----------------  -------------------------------------------------
Name               4th generation Intel® Core™ Processor family
Frequency          2494227445
Logical CPU Count  4

Summary
-------
Elapsed Time:       15.463
CPU Time:           6.392
Average CPU Usage:  0.379
CPI Rate:           1.318

Event summary
-------------
Hardware Event Type         Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
--------------------------  -------------------------  --------------------------------  -----------------
INST_RETIRED.ANY                          13014608235                              8276  1900000
CPU_CLK_UNHALTED.THREAD                   17158609921                              8207  1900000
CPU_CLK_UNHALTED.REF_TSC                  15942400300                              5163  1900000
BR_INST_RETIRED.NEAR_TAKEN                 1228364727                              4648  200003
CALL_COUNT                                  213650621                             75413  1
ITERATION_COUNT                             370567815                             84737  1
LOOP_ENTRY_COUNT                            162943310                             70069  1
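
The CPI Rate in the Summary section can be cross-checked against the Event summary. The sketch below assumes the usual definition of CPI (unhalted thread clockticks divided by instructions retired) and uses the two event counts from the sample output above. The variable names are illustrative only.

```shell
# Event counts copied from the Event summary above.
clk=17158609921    # CPU_CLK_UNHALTED.THREAD
inst=13014608235   # INST_RETIRED.ANY

# Assumed definition: CPI = clockticks / instructions retired.
cpi=$(awk -v c="$clk" -v i="$inst" 'BEGIN { printf "%.3f", c / i }')
echo "CPI Rate: $cpi"   # prints "CPI Rate: 1.318", matching the Summary section
```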

Identifying Hottest Methods

Use the hotspots command line report as a starting point for identifying program units (for example: functions, modules, or objects) that take the most processor time (Hotspots analysis), underutilize available CPUs (Concurrency analysis), have long waits (Locks and Waits analysis), and so on.

By default, the report displays the hottest program units in descending order, starting from the most performance-critical unit. The command-line reports provide the same data that is displayed in the default GUI analysis viewpoints.

Note

  • To display a list of available groupings for a hotspots report, enter: amplxe-cl -report hotspots -r <result_dir> group-by=?.
  • To set the number of top items to include in a report, use the limit action option: amplxe-cl -report <report_type> -limit <value> -r <result_dir>

Examples

This example generates the hotspots report for the Basic Hotspots analysis result. No grouping is specified, so the data is grouped by function, as the first column of the output shows. The result file is not specified either, so VTune Amplifier uses the latest analysis result.

$ amplxe-cl -report hotspots


Function            CPU Time  CPU Time:Effective Time  CPU Time:Effective Time:Idle  CPU Time:Effective Time:Poor  CPU Time:Effective Time:Ok  CPU Time:Effective Time:Ideal  CPU Time:Effective Time:Over  CPU Time:Spin Time  CPU Time:Overhead Time  Module            Function (Full)     Source File  Start Address
------------------  --------  -----------------------  ----------------------------  ----------------------------  --------------------------  -----------------------------  ----------------------------  ------------------  ----------------------  ----------------  ------------------  -----------  -------------
[libmixed_call.so]   17.180s                  17.180s                            0s                       17.180s                          0s                             0s                            0s                  0s                      0s  libmixed_call.so  [libmixed_call.so]  [Unknown]    0
[libjvm.so]           1.698s                   1.698s                        0.020s                        1.678s                          0s                             0s                            0s                  0s                      0s  libjvm.so         [libjvm.so]         [Unknown]    0
[libpthread.so.0]     0.136s                   0.136s                            0s                        0.136s                          0s                             0s                            0s                  0s                      0s  libpthread.so.0   [libpthread.so.0]   [Unknown]    0
[libtpsstool.so]      0.052s                   0.052s                            0s                        0.052s                          0s                             0s                            0s                  0s                      0s  libtpsstool.so    [libtpsstool.so]    [Unknown]    0
...

The following example generates the hotspots report for the specified Advanced Hotspots analysis result, sets the number of items to include in the report to 3, and groups the report data by application module.

$ amplxe-cl -report hotspots -limit 3 -r r002ah -group-by module


Module            CPU Time  CPU Time:Effective Time  CPU Time:Effective Time:Idle  CPU Time:Effective Time:Poor  CPU Time:Effective Time:Ok  CPU Time:Effective Time:Ideal  CPU Time:Effective Time:Over  CPU Time:Spin Time  CPU Time:Overhead Time  Instructions Retired  CPI Rate  Wait Rate  CPU Frequency Ratio  Context Switch Time  Context Switch Time:Wait Time  Context Switch Time:Inactive Time  Context Switch Count  Context Switch Count:Preemption  Context Switch Count:Synchronization  Module Path                                                                                 
----------------  --------  -----------------------  ----------------------------  ----------------------------  --------------------------  -----------------------------  ----------------------------  ------------------  ----------------------  --------------------  --------  ---------  -------------------  -------------------  -----------------------------  ---------------------------------  --------------------  -------------------------------  ------------------------------------  -----------
libmixed_call.so   15.294s                  15.294s                        0.419s                       14.871s                      0.004s                             0s                            0s                  0s                      0s        21,148,958,284     1.907      0.000                1.149               1.401s                             0s                             1.401s                26,769                           26,769                                     0  /tmp/java_mixed_call/src/libmixed_call.so
libjvm.so           0.582s                   0.582s                        0.033s                        0.547s                      0.002s                             0s                            0s                  0s                      0s           792,807,896     1.513      0.437                0.899               0.047s                         0.005s                             0.042s                   462                              451                                       11  /tmp/java_mixed_call/src/libmjvm.so                                         
...

Analyzing Stacks

To get the maximum performance out of your Java application, consider writing and compiling the performance-critical modules of your Java project in native languages, such as C or even assembly. This helps your application make full use of powerful CPU resources such as vector computing (implemented via SIMD units and instruction sets). In this case, compute-intensive functions become hotspots in the profiling results, which is expected because they do most of the work. However, you might be interested not only in the hotspot functions themselves, but also in the locations in Java code from which these functions were called via the JNI interface. Tracing such cross-runtime calls in mixed-language algorithm implementations can be a challenge.

Use the callstacks report to display full stack data for each hotspot function and identify the impact of each stack on the function CPU or Wait time.

Note

To display a list of available groupings for a callstacks report, enter amplxe-cl -report callstacks -r <result_dir> group-by=?.

Example

The callstacks report for the specified Basic Hotspots analysis result, generated with a command of the form amplxe-cl -report callstacks -r <result_dir>, looks as follows:


Function            Function Stack             CPU Time  Module                Function (Full)                 Source File     Start Address
------------------  -------------------------  --------  --------------------  ------------------------------  --------------  --------------
[libmixed_call.so]                              17.180s  libmixed_call.so      [libmixed_call.so]              [Unknown]       0
                    [libmixed_call.so]           8.600s  libmixed_call.so      [libmixed_call.so]              [Unknown]       0
                    MixedCall::CallNativeFunc        0s  [Compiled Java code]  MixedCall::CallNativeFunc(int)  MixedCall.java  0x7fb63937eec0
                    MixedCall::foo4                  0s  [Compiled Java code]  MixedCall::foo4(int)            MixedCall.java  0x7fb6393831e3
                    MixedCall::foo3                  0s  [Compiled Java code]  MixedCall::foo3(int)            MixedCall.java  0x7fb63938046c
                    MixedCall::foo2                  0s  [Compiled Java code]  MixedCall::foo2(int)            MixedCall.java  0x7fb63938046c
                    MixedCall::foo1                  0s  [Compiled Java code]  MixedCall::foo1(int)            MixedCall.java  0x7fb63938046c
                    MixedCall::run                   0s  [Compiled Java code]  MixedCall::run()                MixedCall.java  0x7fb63938009b                    
...

Analyzing Hardware Metrics

VTune Amplifier provides an advanced profiling option for optimizing Java applications for the CPU microarchitecture used in your platform. Although Java and JVM technology is intended to free a developer from hardware architecture-specific coding, once Java code is optimized for the current Intel microarchitecture, it will most probably keep this advantage on future generations of CPUs.

VTune Amplifier counts hardware events during the hardware event-based sampling collection to help you understand how your Java application utilizes available hardware resources. Use the hw-events report type to display hardware event counts per application function, in descending order by default.

Note

To display a list of available groupings for a hw-events report, enter amplxe-cl -report hw-events -r <result_dir> group-by=?.

Example

This example generates the hw-events report for the specified Advanced Hotspots analysis result.


Function            Hardware Event Count:INST_RETIRED.ANY  Hardware Event Count:CPU_CLK_UNHALTED.THREAD  Hardware Event Count:CPU_CLK_UNHALTED.REF_TSC Context Switch Time  Context Switch Time:Wait Time  Context Switch Time:Inactive Time  Context Switch Count  Context Switch Count:Preemption  Context Switch Count:Synchronization  Module              Function (Full)     Source File  Start Address
------------------  -------------------------------------  --------------------------------------------  --------------------------------------------- -------------------  -----------------------------  ---------------------------------  --------------------  -------------------------------  ------------------------------------  ------------------  ------------------  -----------  -------------
[libmixed_call.so]                         21,148,958,284                                40,338,264,445                                 35,096,009,324              1.401s                             0s                             1.401s                26,769                           26,769                                     0  [libmixed_call.so]  [libmixed_call.so]  [Unknown]    0
[libjvm.so]                                   792,807,896                                 1,199,773,286                                  1,335,034,092              0.047s                         0.005s                             0.042s                   462                              451                                    11  [libjvm.so]         [libjvm.so]         [Unknown]    0
...
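
The raw counts in the hw-events report can be combined into derived metrics. The sketch below assumes that the CPU Frequency Ratio is the ratio of CPU_CLK_UNHALTED.THREAD to CPU_CLK_UNHALTED.REF_TSC and applies it to the two rows shown above. ratio is a hypothetical helper name, and the counts are pasted in by hand rather than parsed from the report.

```shell
# Hypothetical helper: divide two counts written with thousands separators.
ratio() {
  a=$(printf '%s' "$1" | tr -d ,)   # strip commas from the numerator
  b=$(printf '%s' "$2" | tr -d ,)   # strip commas from the denominator
  awk -v a="$a" -v b="$b" 'BEGIN { printf "%.3f", a / b }'
}

# Arguments: CPU_CLK_UNHALTED.THREAD count, CPU_CLK_UNHALTED.REF_TSC count.
echo "[libmixed_call.so] CPU Frequency Ratio: $(ratio 40,338,264,445 35,096,009,324)"
echo "[libjvm.so]        CPU Frequency Ratio: $(ratio 1,199,773,286 1,335,034,092)"
```

Under that assumption the results are 1.149 and 0.899, which agree with the CPU Frequency Ratio column of the hotspots report grouped by module for the same result.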

Limitations

VTune Amplifier supports analysis of Java applications with some limitations:
