Verifying Performance with Simple Benchmarks - Go Benchmarks

November 06, 2023

I have been reflecting on how certain programming language features make it easy to iterate on and test changes; in this case, benchmarking.

I recently started working on a project that involves transforming string values to a specific format that can be used to create a database table. This service was written in Go and used a simple replacer to convert any non-alphanumeric character into an underscore (_).

import (
	"fmt"
	"log/slog"
	"regexp"
	"strings"
)

var validStringRegex = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)
var replacer = strings.NewReplacer(
	" ", "_",
	",", "_",
	".", "_",
	"-", "_",
	"*", "_",
	"!", "_",
	"?", "_",
	"(", "_",
	")", "_",
)

func NormalizeWithReplacer(text string) (string, error) {
	formattedString := replacer.Replace(text)

	if !validStringRegex.MatchString(formattedString) {
		slog.Error("error normalising string", "value", text, "normalizedValue", formattedString)
		return "", fmt.Errorf("error normalising string %s", text)
	}

	return formattedString, nil
}

You’ll also notice that the code uses a regular expression (Regex) to check that the transformed string matches the required format. This is necessary because a character not covered by the Replacer’s list may slip through into the output, leading to errors down the line.

As you may have guessed, a side effect of this code is that whenever an unaccounted-for character shows up, we need to update the replacer list and then redeploy the service. Not the best way to do this, right? Since there is already a regex check in this code, why don’t we flip the regex and use it to match and replace invalid characters with an underscore? This way, the code will be more robust and save Engineers the time spent updating the replacer; Win-Win, right?
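To make that failure mode concrete, here is a minimal, self-contained sketch of the Replacer approach (the '$' input is my own example, not from the service; the replacer list is trimmed for brevity):

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var validStringRegex = regexp.MustCompile(`^[a-zA-Z0-9_]+$`)
var replacer = strings.NewReplacer(" ", "_", ",", "_", "!", "_")

// normalizeWithReplacer mirrors the article's function: replace known
// characters, then validate that nothing unexpected slipped through.
func normalizeWithReplacer(text string) (string, error) {
	formatted := replacer.Replace(text)
	if !validStringRegex.MatchString(formatted) {
		return "", fmt.Errorf("error normalising string %s", text)
	}
	return formatted, nil
}

func main() {
	// "Hello, World!" only contains characters the Replacer knows about.
	ok, err := normalizeWithReplacer("Hello, World!")
	fmt.Println(ok, err) // Hello__World_ <nil>

	// '$' is not in the Replacer's list, so validation fails
	// and the service has to be updated and redeployed.
	_, err = normalizeWithReplacer("price $5")
	fmt.Println(err)
}
```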

So, I quickly wrote up a Regex version of the function.

// Matches any character outside the valid set; the lazy +? keeps each
// match one character long, mirroring the Replacer's one-to-one output.
var replacerRegex = regexp.MustCompile(`[^a-zA-Z0-9_]+?`)

func NormalizeWithRegex(text string) (string, error) {
	formattedString := replacerRegex.ReplaceAllString(text, "_")

	return formattedString, nil
}

There are even fewer lines of code in this version, and notice that I no longer need to return an error (the function signature keeps the error so it remains a drop-in replacement for the existing function).

All positive so far, yay! All existing test cases also pass with this function. However, there is one last thing I need to confirm before this change can ship to production: that it doesn’t hurt the performance of the service.
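Before benchmarking, it is worth a quick sanity check that the two implementations actually agree; this is a rough sketch with inputs of my own choosing, not the service's real test suite. Note that the lazy +? in the regex makes each invalid character its own match, so a run like ", " becomes "__", exactly as the Replacer produces:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var replacer = strings.NewReplacer(
	" ", "_", ",", "_", ".", "_", "-", "_",
	"*", "_", "!", "_", "?", "_", "(", "_", ")", "_",
)

// Lazy +? keeps matches one character long, mirroring the Replacer.
var replacerRegex = regexp.MustCompile(`[^a-zA-Z0-9_]+?`)

func main() {
	inputs := []string{"Hello, World!", "a-b.c", "(x) * y?"}
	for _, in := range inputs {
		withReplacer := replacer.Replace(in)
		withRegex := replacerRegex.ReplaceAllString(in, "_")
		fmt.Printf("%-16q replacer=%q regex=%q match=%v\n",
			in, withReplacer, withRegex, withReplacer == withRegex)
	}
}
```

With a greedy `+` instead, the regex would collapse ", " into a single underscore and the outputs would diverge from the Replacer's.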

Benchmarking my change

This service processes over 25,000 requests/s, and this Normalize function is on the hot path, being called for every one of these events. This means that even a minor change like this can drastically impact the tail latencies of the service. It’s a good thing Go has benchmark tests built in.

func BenchmarkNormalise(b *testing.B) {
	b.Run("WithReplacer", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			formatter.NormalizeWithReplacer("Hello, World!")
		}
	})

	b.Run("WithRegexp", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			formatter.NormalizeWithRegex("Hello, World!")
		}
	})
}

Then run the benchmark with:

go test -bench . -count 5 -benchmem -benchtime 10s ./...

I include the -benchmem flag because the extra memory-allocation information comes in handy when evaluating memory tradeoffs alongside compute.

The Surprising Results

BenchmarkNormalise/WithReplacer-10   52982422    229.2 ns/op    32 B/op    2 allocs/op
BenchmarkNormalise/WithRegexp-10     26595258    453.2 ns/op    56 B/op    4 allocs/op

And there you have it: my new regex function is roughly twice as slow (453.2 ns/op vs 229.2 ns/op) and even allocates more memory to do so 😅.

Conclusion

Contrary to what I thought, it is not all win-win. However, one could argue that the extra robustness of the new implementation is worth more than the slowdown, especially when you consider the Engineering time spent updating the Replacer every time a new character is encountered. The final decision depends on your specific use case. In this case, though, the values being normalised are fairly standard and new characters show up only rarely, so taking a hit in processing throughput is not going to fly.

All the code in this article can be found in this github repository. If you liked this article, don’t forget to subscribe to get notified about new articles. I write about my experience building software, focused on Infrastructure, automation and performance.
