Build Application

Before you start identifying hotspots in your native Intel® Xeon Phi™ coprocessor application, do the following:

Get Software Tools

You need the following tools to try these tutorial steps yourself using the matrix sample application:

Intel® VTune™ Amplifier, including sample applications
Sampling driver, set up during the VTune Amplifier installation

Note

If the VTune Amplifier could not install the driver, a warning message appears and you cannot run the analysis. See the online help for instructions on how to install the driver manually.
Intel® Manycore Platform Software Stack (Intel® MPSS). See Release Notes for more information.
tar file extraction utility
Intel® C++ Compiler installed on the host. See Release Notes for more information.

Acquire Intel VTune Amplifier

If you do not already have access to the VTune Amplifier, you can download an evaluation copy from http://software.intel.com/en-us/articles/intel-software-evaluation-center/.

Install and Set Up VTune Amplifier Sample Applications

Samples are non-deterministic. Your screens may vary from the screen captures shown throughout this tutorial.
Samples are designed only to illustrate the VTune Amplifier features; they do not represent best practices for creating code.

Build the target on the host with full optimizations, which is recommended for performance analysis.

Browse to the linux directory in the location where you extracted the sample code (this example assumes /home/sample/matrix/linux). Make sure this directory contains the Makefile.
Set up the environment for Intel C++ Compiler:

source <path_to_compiler_bin>/compilervars.sh intel64
Build the code using the make command:

make mic

The matrix application is built as matrix.mic and stored in the matrix/linux directory.

Note

This application uses the OpenMP* library. To run the sample on the Intel Xeon Phi coprocessor, make sure to copy the OpenMP library to the card and set up the default path.
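One possible way to follow this note is sketched below. It is not taken from the tutorial: the helper names stage_openmp_runtime and card_run_command are hypothetical, and the sketch assumes that "the default path" means the LD_LIBRARY_PATH variable on the card. The library file name and location you pass in depend on your compiler installation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; library name, location,
# and card name are assumptions supplied by the caller.
SCP="${SCP:-scp}"   # copy tool; set SCP=echo to preview the command

# stage_openmp_runtime <library_file> <card> <card_dir>
# Copies the OpenMP runtime to the card after checking that it exists locally.
stage_openmp_runtime() {
    if [ ! -f "$1" ]; then
        echo "OpenMP library not found: $1" >&2
        return 1
    fi
    $SCP "$1" "$2:$3"
}

# card_run_command <card_dir> <binary>
# Prints a command line that runs the binary with the library path set
# to the directory the runtime was copied to.
card_run_command() {
    echo "LD_LIBRARY_PATH=$1 $1/$2"
}
```

For example, assuming the runtime is libiomp5.so in the compiler's lib/mic directory, you might call stage_openmp_runtime <path_to_compiler>/lib/mic/libiomp5.so mic0 /tmp and then ssh mic0 "$(card_run_command /tmp matrix.mic)".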

To communicate with the Intel Xeon Phi coprocessor cards, you may use any of the following mechanisms:

Mount an NFS share. See the NFS Mounting a Host Export topic in the Intel Manycore Platform Software Stack help for details.
Use existing SSH tools.

Ensure that the binary to analyze is copied to the Intel Xeon Phi coprocessor. You can do this by using scp, for example:
```
scp matrix.mic mic0:/tmp
```
Note

You may add this command to your build scripts to automate the copy after each recompilation of the binary. In this tutorial's scenario, the scp command is added to the Makefile, so the matrix application is built and automatically copied to the Intel Xeon Phi coprocessor.
Run the application on the coprocessor using ssh and record the results to establish a performance baseline:
Note the execution time displayed at the bottom. For the matrix.mic executable in the figure above, the execution time is 30.466 seconds. Use this metric as a baseline against which you will compare subsequent runs of the application.

Note

Run the application several times, noting the execution time for each run, and use the average time. This helps to minimize skewed results due to transient system activity.
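The repeated runs and the averaging can be sketched with two small helpers. Both function names (repeat_run and average_times) are hypothetical, and the sketch assumes you read the execution time from each run's output yourself and pass the values in by hand.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# repeat_run <count> <command...>
# Runs the given command <count> times in a row.
repeat_run() {
    count="$1"
    shift
    i=0
    while [ "$i" -lt "$count" ]; do
        "$@"
        i=$((i + 1))
    done
}

# average_times <time> [<time> ...]
# Prints the arithmetic mean of the given times with three decimals.
average_times() {
    [ "$#" -gt 0 ] || return 1
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.3f\n", sum / NR }'
}
```

For example, repeat_run 3 ssh mic0 /tmp/matrix.mic would launch the binary three times, assuming it was copied to /tmp on mic0 as shown earlier. With purely illustrative values, average_times 30 31 32 prints 31.000.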

If you experience a problem with permissions to run the commands, use sudo or root access.
Alternatively, you may create an ssh script to copy and launch your application on a card or use the micnativeloadex utility. For details, see the Preparing an Intel® Xeon Phi™ Coprocessor System for Analysis online help topic.
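A copy-and-launch script of the kind mentioned above might look like the following minimal sketch. The function name copy_and_launch, the default card name mic0, and the default target directory /tmp are assumptions based on the scp example in this tutorial. The sketch does not cover the micnativeloadex utility.

```shell
#!/bin/sh
# Sketch only: function name and defaults are assumptions.
SCP="${SCP:-scp}"   # set SCP=echo and SSH=echo to preview the commands
SSH="${SSH:-ssh}"

# copy_and_launch <binary> [card] [card_dir]
# Copies the binary to the card, then runs it there over ssh.
# Stops without launching if the copy fails.
copy_and_launch() {
    binary="$1"
    card="${2:-mic0}"
    card_dir="${3:-/tmp}"
    $SCP "$binary" "$card:$card_dir" || return 1
    $SSH "$card" "$card_dir/$(basename "$binary")"
}
```

Calling copy_and_launch matrix.mic from the matrix/linux directory would then copy the binary to mic0:/tmp and start it on the card.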