Intel® VTune™ Amplifier 2017

Build Application

Before you start identifying hotspots in your native Intel® Xeon Phi™ coprocessor application, do the following:

  1. Get software tools.

  2. Build the application with full optimizations on the host.

  3. Create a performance baseline.

Get Software Tools

You need the following tools to try these tutorial steps yourself using the matrix sample application:

Acquire Intel VTune Amplifier

If you do not already have access to the VTune Amplifier, you can download an evaluation copy from http://software.intel.com/en-us/articles/intel-software-evaluation-center/.

Install and Set Up VTune Amplifier Sample Applications

  1. Copy the matrix_vtune_amp_xe.tgz file from the <install_dir>/samples/<locale>/C++ directory to a writable directory or share on your system.

    Note

    The default installation path for the VTune Amplifier XE is /opt/intel/vtune_amplifier_xe_version. For the VTune Amplifier for Systems, the default <install_dir> is:

    • For super-users: /opt/intel/system_studio_version/vtune_amplifier_for_systems
    • For ordinary users: $HOME/intel/system_studio_version/vtune_amplifier_for_systems

  2. Extract the sample from the .tgz file.
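
The two steps above can be sketched as a small shell helper. This is only an illustration: the function name extract_sample is hypothetical, and the archive path and destination directory are supplied by you, since <install_dir> and <locale> depend on your installation.

```shell
# Hypothetical helper: copy a sample archive to a writable directory and unpack it.
#   $1 = path to matrix_vtune_amp_xe.tgz (found under <install_dir>/samples/<locale>/C++)
#   $2 = writable destination directory (created if it does not exist)
extract_sample() {
    archive=$1
    dest=$2
    # Refuse to continue if the archive is missing or unreadable.
    [ -r "$archive" ] || { echo "cannot read $archive" >&2; return 1; }
    mkdir -p "$dest" || return 1
    # Step 1: copy the archive to the writable location.
    cp "$archive" "$dest"/ || return 1
    # Step 2: extract it there (-x extract, -z gunzip, -f archive file, -C target directory).
    tar -xzf "$dest/$(basename "$archive")" -C "$dest"
}
```

Call it with the real archive path in place of the placeholders and a directory of your choice, for example a directory under your home directory.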

Note

  • Samples are non-deterministic. Your screens may vary from the screen captures shown throughout this tutorial.
  • Samples are designed only to illustrate the VTune Amplifier features; they do not represent best practices for creating code.

Build the Target

Build the target on the host with full optimizations, which is recommended for performance analysis.

  1. Browse to the linux directory in the location where you extracted the sample code (this example assumes that location is /home/sample/matrix/linux). Make sure this directory contains a Makefile.

  2. Set up the environment for Intel C++ Compiler:

    source <path_to_compiler_bin>/compilervars.sh intel64

  3. Build the code using the make command:

    $ make mic

    The matrix application is built as matrix.mic and stored in the matrix/linux directory.

    Note

    This application uses the OpenMP* library. To run the sample on the Intel Xeon Phi coprocessor, copy the OpenMP library to the card and set up the default path.
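
The build steps above can be wrapped in a short check that fails early when something is missing. This is a minimal sketch: the function name build_mic_target is hypothetical, and it assumes the compiler environment has already been set up in the current shell with compilervars.sh as shown in step 2.

```shell
# Hypothetical helper: build the coprocessor target from the sample's linux directory.
#   $1 = linux directory of the extracted sample (for example /home/sample/matrix/linux)
# Assumes compilervars.sh was already sourced in this shell (step 2).
build_mic_target() {
    src_dir=$1
    # Step 1: the directory must contain a Makefile.
    [ -f "$src_dir/Makefile" ] || { echo "no Makefile in $src_dir" >&2; return 1; }
    # Step 3: run "make mic" in a subshell so the caller's working directory is unchanged.
    ( cd "$src_dir" && make mic ) || return 1
    # The build is expected to leave matrix.mic in the same directory.
    [ -f "$src_dir/matrix.mic" ] || { echo "matrix.mic was not produced" >&2; return 1; }
    echo "built $src_dir/matrix.mic"
}
```

Checking for matrix.mic after the build catches the case where make exits successfully but the expected binary is not where the later copy step looks for it.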

Create a Performance Baseline

To create a performance baseline, copy the application to the Intel Xeon Phi coprocessor card and run it there:

  1. Ensure that the binary to analyze is copied to the Intel Xeon Phi coprocessor. You can do this by using scp, for example:

    scp matrix.mic mic0:/tmp
    

    Note

    You may add this command to your build scripts to copy the binary automatically after each recompilation. In this tutorial's scenario, the scp command is added to the Makefile, so the matrix application is built and automatically copied to the Intel Xeon Phi coprocessor.

  2. Run the application on the coprocessor using ssh and record the results to establish a performance baseline:

  3. Note the execution time displayed at the bottom. For the matrix.mic executable in the figure above, the execution time is 30.466 seconds. Use this metric as a baseline against which you will compare subsequent runs of the application.

    Note

    Run the application several times, noting the execution time for each run, and use the average time. This helps to minimize skewed results due to transient system activity.
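
Averaging the noted execution times can be done with a one-line helper. The sketch below is an illustration only: the function name average_time is hypothetical, and it expects you to pass in the times you recorded by hand rather than parsing the application's output.

```shell
# Hypothetical helper: print the arithmetic mean of the execution times
# (in seconds) given as arguments, rounded to three decimal places.
average_time() {
    [ "$#" -gt 0 ] || { echo "usage: average_time SECONDS..." >&2; return 1; }
    # One value per line; awk sums them and divides by the number of lines read.
    printf '%s\n' "$@" | LC_ALL=C awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}
```

For example, after three runs you would call average_time with the three times you noted and use the printed value as the baseline.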

Note

  • If you do not have sufficient permissions to run the commands, use sudo or root access.

  • Alternatively, you may create an ssh script to copy and launch your application on a card or use the micnativeloadex utility. For details, see the Preparing an Intel® Xeon Phi™ Coprocessor System for Analysis online help topic.
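
A copy-and-launch script of the kind mentioned above might look like the following sketch. The function name copy_and_launch is hypothetical; the defaults mic0 and /tmp are taken from the scp example earlier in this topic, and the script assumes that the binary can be started directly by its path on the card once any libraries it needs are already in place there.

```shell
# Hypothetical helper: copy a binary to a coprocessor card and run it there.
#   $1 = path to the binary on the host (for example matrix.mic)
#   $2 = card host name (default: mic0, as in the scp example)
#   $3 = directory on the card (default: /tmp, as in the scp example)
copy_and_launch() {
    [ "$#" -ge 1 ] || { echo "usage: copy_and_launch BINARY [CARD] [REMOTE_DIR]" >&2; return 1; }
    binary=$1
    card=${2:-mic0}
    remote_dir=${3:-/tmp}
    # Copy the binary to the card; stop if the copy fails.
    scp "$binary" "$card:$remote_dir" || return 1
    # Launch the copied binary on the card and show its output locally.
    ssh "$card" "$remote_dir/$(basename "$binary")"
}
```

Running copy_and_launch matrix.mic from the directory that contains the binary performs the copy and the run in one step, which is convenient when you repeat the run to collect several timings.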

Next Step

Create Project and Configure Target

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804