Analyzing application performance: A guide to manual instrumentation with Go tracing

Using manual instrumentation with Go tracing to analyze application performance

In this post, we look at Go tracing and its support to manually instrument applications to gain valuable insights into its execution behavior. We’ll be using a simple step-by-step example to show how this can be done in a real-life scenario, and you’ll also learn how to use the Go trace tool to analyze trace data. Finally, we’ll introduce our own visualization of the added trace events to complement the existing ones.

Go has built-in support for tracing in the runtime/trace package. This gives you an insight into how goroutines are being scheduled and how parallelization or garbage collection works in your application.

The specific events that runtime/trace records are:

Creation, start, and end of goroutines
Events that block/unblock goroutines (syscalls, channels, locks)
Network I/O related events
Syscalls
Garbage collection

This data can already be used to identify problems with your application. But what if you need more? Manual instrumentation provides a deeper understanding of how your application behaves at runtime.

Tracing code during execution with manual instrumentation

To get a better insight into the execution of an application we can manually instrument relevant program parts that we are interested in. Go supports the following user annotations to instrument the program with and to log interesting events during execution and link them to a trace:

Tasks are a top-level component to aid tracing. Tasks have a start and an end and are similar in concept to spans in OpenTelemetry. Tasks can involve multiple goroutines and are tracked via a context.Context object. All regions and logs within the same context will be associated with the task.
Regions refer to a region of code (associated with its calling goroutine) whose execution time interval is traced. Regions log time intervals within the same goroutine. By definition, a region starts and ends in the same goroutine.
Logs are for logging time-stamped messages during execution that allow further information to be added along the trace.

Creating tasks

To better illustrate the use of these annotations we will now go over some simple code examples. Consider this sample program to make a cappuccino. If you want to follow along, find the complete file over here: coffee.go.

func main() {
	traceFile, err := os.Create("coffee.trace")
	if err != nil {
		panic(err)
	}
	if err = trace.Start(traceFile); err != nil {
		panic(err)
	}
	defer func() {
		trace.Stop()
		if err := traceFile.Close(); err != nil {
			panic(err)
		}
	}()

	ctx := context.Background()
	ctx, task := trace.NewTask(ctx, "main")
	defer task.End()

	makeCappuccino()
}

func makeCappuccino() {
	ctx := context.Background()
	ctx, task := trace.NewTask(ctx, "makeCappuccino")
	defer task.End()

	steamMilk()
	foamMilk()
	extractCoffee()
	mixMilkCoffee()
}

func steamMilk() {
	ctx := context.Background()
	ctx, task := trace.NewTask(ctx, "steamMilk")
	defer task.End()

	// The steps are simulated with a sleep time for simplicity.
	time.Sleep(100 * time.Millisecond)
}

func foamMilk() {
	ctx := context.Background()
	ctx, task := trace.NewTask(ctx, "foamMilk")
	defer task.End()

	time.Sleep(200 * time.Millisecond)
}

func extractCoffee() {
	ctx := context.Background()
	ctx, task := trace.NewTask(ctx, "extractCoffee")
	defer task.End()

	time.Sleep(300 * time.Millisecond)
}

func mixMilkCoffee() {
	ctx := context.Background()
	ctx, task := trace.NewTask(ctx, "mixMilkCoffee")
	defer task.End()

    time.Sleep(500 * time.Millisecond)
}

At the beginning of the main function, we initialize the tracing with trace.Start and pass the file (coffee.trace) that stores the tracing data. For each task that we want to track, we create a new task by calling trace.NewTask and end the task after it is finished.

We can open the created trace file with go tool trace coffee.trace which will display the data in the browser. Initially, it shows an overview documentation page. From there, let’s just click on User-defined tasks to get an overview of all tasks. To get a bit more detail, we can click Submit to get the following view:

Each task has an ID and a link to a parent task. In this case, the parent is always 0, meaning there is no parent task. To get a better understanding of how the tasks are connected, we need to pass along the context.Context object for creating all tasks. This way, the tracing can link tasks and subtasks.

Here’s the complete file: coffee.go

func main() {
	...

	makeCappuccino(ctx)
}

func makeCappuccino(ctx context.Context) {
	ctx, task := trace.NewTask(ctx, "makeCappuccino")
	defer task.End()

	steamMilk(ctx)
	foamMilk(ctx)
	extractCoffee(ctx)
	mixMilkCoffee(ctx)
}

func steamMilk(ctx context.Context) {
	ctx, task := trace.NewTask(ctx, "steamMilk")
	defer task.End()

	time.Sleep(100 * time.Millisecond)
}

func foamMilk(ctx context.Context) {
	ctx, task := trace.NewTask(ctx, "foamMilk")
	defer task.End()

	time.Sleep(200 * time.Millisecond)
}

func extractCoffee(ctx context.Context) {
	ctx, task := trace.NewTask(ctx, "extractCoffee")
	defer task.End()

	time.Sleep(300 * time.Millisecond)
}

func mixMilkCoffee(ctx context.Context) {
	ctx, task := trace.NewTask(ctx, "mixMilkCoffee")
	defer task.End()

	time.Sleep(500 * time.Millisecond)
}

When executing this and analyzing the trace file with the Go trace tool again, we are shown the following task data:

This shows us that there are connections between the tasks since they have parent references.

From this, we get a hierarchy of tasks like this:

main (id: 1, parent: 0 (i.e., a root task without a parent))
- makeCappuccino (id: 2, parent: 1)
  - steamMilk (id: 3, parent: 2)
  - foamMilk (id: 4, parent: 2)
  - extractCoffee (id: 5, parent: 2)
  - mixMilkCoffee (id: 6, parent: 2)

By clicking on the goroutine view link of tasks main, we get the following view:

This shows a timeline view of the tasks with a visual representation of their execution times.

But what if we make more than one cappuccino? Let’s call makeCappuccino multiple times:

func main() {
	...

	makeCappuccino(ctx)
	makeCappuccino(ctx)
	makeCappuccino(ctx)
}

Check out the complete file here: coffee.go

The timeline view looks like this:

We see that now there are three consecutive calls to makeCappuccino with their respective subtasks. The program ran for over three seconds. If we had more than one worker, we could speed up our cappuccino-making and produce several cups at the same time.

func main() {
	...

	numOfWorkers := 3
	var wg sync.WaitGroup
	wg.Add(numOfWorkers)

	for i := 0; i < numOfWorkers; i++ {
		go func() {
			defer wg.Done()
			makeCappuccino(ctx)
		}()
	}

	wg.Wait()
}

Here’s the file: coffee.go

By executing makeCappuccino in parallel, we get a timeline view like this:

Tasks are shown in parallel in the timeline view. Concurrency makes it especially difficult to identify task hierarchies in this view, since it is not immediately clear which subtask belongs to which parent task.

Creating regions

Let’s now consider that we want to get insight into how long it takes to prepare the ingredients for a cappuccino. We can do this by creating a region inside makeCappuccino that starts before steaming the milk and ends after extracting the coffee:

func makeCappuccino(ctx context.Context) {
	ctx, task := trace.NewTask(ctx, "makeCappuccino")
	defer task.End()

	region := trace.StartRegion(ctx, "Ingredients")
	steamMilk(ctx)
	foamMilk(ctx)
	extractCoffee(ctx)
	region.End()

	mixMilkCoffee(ctx)
}

Again, here’s the file: coffee.go

After executing this, we get the following timeline view:

Timeline view of parallel calls and the region per goroutine.

This shows us the times in which ingredients are prepared. Regions start and end in the same goroutine and are therefore displayed with the goroutine they were executed in.

If we want to know which region belongs to which task, we can find that out by clicking on the region. Its properties will be displayed at the bottom of the screen. One of them is the taskid:

It’s clear that this region was executed as part of task 2. The output also shows us the start time, the duration of the region’s execution, and the stack traces at the start and at the end of the region.

Creating logs

Go supports storing execution data with logs associated with the current task. For instance, we can log the number of workers who are making cappuccinos to support developers in tracking and analyzing such execution details:

func main() {
	...

	numOfWorkers := 3
	trace.Log(ctx, "numOfWorkers", strconv.Itoa(numOfWorkers))
	var wg sync.WaitGroup
	wg.Add(numOfWorkers)

	for i := 0; i < numOfWorkers; i++ {
	...
}

Find the file here: coffee.go

These logs can be viewed when looking at the details of the associated task, which shows us there were three workers started.

What is missing

While the Go trace tool offers valuable insights into application’s execution, it has some limitations that you should be aware of:

Lack of clear parent-child relationship: The visualization provided by the Go trace tool may not always clearly depict the parent-child relationship between tasks, particularly in scenarios involving concurrency. This can make it challenging to understand the hierarchical structure of tasks and their dependencies.
Scalability issues with large task counts: Visualizing a large number of tasks in the Go trace tool can lead to scalability issues. As the number of tasks increases, the visualization may become cluttered, difficult to interpret and lead to performance issues in the browser, hindering effective analysis.
Difficulty in identifying performance outliers: While the Go trace tool provides an overview of task execution times, it may not always make it easy to identify performance outliers. In some cases, certain tasks may take significantly longer to execute than others, indicating potential performance bottlenecks. However, pinpointing these outliers in a large number of tasks can be challenging.

Despite these limitations, the Go trace tool remains a valuable resource for analyzing application performance and gaining insights into execution behavior. That said, we should be mindful of these shortcomings and consider using additional tools and techniques to complement our performance analysis efforts.

Symflower task visualization

To overcome the limitations of existing visualization tools, we’ve developed our own custom task visualization that provides enhanced insights into application execution behavior. Our visualization utilizes a tree-based representation of tasks, offering a clear and intuitive way to understand task hierarchies and dependencies with the following key features:

Tree Structure: Our visualization presents tasks in a hierarchical tree structure, making it easy to visualize parent-child relationships and understand the flow of execution within the application. The tree can be depicted vertically or horizontally.
Color-Coding for Performance Outliers: We’ve implemented color-coding to indicate the execution times of the types of tasks relative to one another on a linear scale. Tasks with faster execution times are depicted in green, gradually transitioning to yellow and finally red for slower executions. By visually distinguishing performance outliers, developers can quickly pinpoint potential bottlenecks and prioritize optimization efforts.
Scalability: Our visualization tool is designed to handle large numbers of tasks without sacrificing performance or usability. Whether analyzing a handful of tasks or a complex system with thousands of tasks, our tool remains responsive and intuitive.

Let’s revisit our example of making cappuccinos and analyze the task execution using our custom visualization tool. We can create the visualization by calling symflower trace –trace-file=coffee.trace –output-file=coffee.html, which generates the coffee.html file that contains the tree visualization.

Task vertical tree view to make one cappuccino.

In this visualization, each task is represented as a node in the tree structure. The root task, main, initiates the makeCappuccino task, which in turn spawns subtasks for steaming milk, foaming milk, extracting coffee, and mixing milk and coffee. For each task we show its type, associated log messages, and its execution time. The percentages on the links show the ratio of the execution time of the parent task to the execution time of the sub task.

Task horizontal tree view to make three cappuccinos in parallel.

When viewing the tasks for making three cappuccinos in parallel, we see that they all take over 99% of the main task, because they are executed concurrently.

Tasks that exhibit performance outliers are color-coded to draw attention to potential bottlenecks. For instance, if the extraction of coffee took longer than usual in a particular instance, it would be highlighted in red color, making it easy for developers to identify and investigate such anomalies. In the current version the color-code strictly represents the relation between different instances of the same task type. So even if makeCappuccino was only a few microseconds slower than in other executions, the slowest instance is highlighted in dark red, while the quickest one in green.

Outlook: further improving the task visualization

As we continue to refine our task visualization tool, we have several ideas for enhancements that will empower developers to gain even greater insights into application execution behavior and performance. Here’s a glimpse into our plans for future improvements:

Enhanced Statistics: We plan to expand the visualization capabilities to include more detailed statistics for tasks, such as distributions of execution time, and execution time that cannot be attributed to a tracked subtask to highlight potential blindspots in the trace data.
Analysis of Log and Region Data: Another idea is the analysis of data associated with tasks to identify data that causes longer execution times. By correlating task execution data developers can uncover underlying causes of performance bottlenecks and streamline the debugging process.
Enhanced User Experience: To improve usability and facilitate analysis, we plan to implement features such as collapsible tree structures, advanced filtering options, and an adjustable color-coding to better highlight outliers. These enhancements will enable developers to focus on the tasks they care about most and streamline the visualization process for complex applications.
Visualization of Parallel Execution: We aim to enhance the visualization to clearly indicate which tasks are running in parallel. By visualizing concurrent execution paths, developers can better understand the impact of concurrency on application performance and troubleshoot related issues more effectively.

Make sure you never miss any of our upcoming content by signing up for our newsletter and by following us on X, LinkedIn, or Facebook!

Try Symflower in your IDE to generate unit test templates & test suites

| 2024-06-11