Sorting Strings In Go, Fast and Slow

Nov 2023

Updated Jun 2024

In Go 1.21, the slices package was added to the Go standard library. It provides functions for working with slices of any type. For example, the generic slices.Sort function is a generic sort implementation that accepts any slice as long as the slice elements can be ordered. Its type constraint cmp.Ordered demands that the slice element type supports the operators < <= >= >. Currently, such types include numbers (ints and floats), pointers, and strings. For other types, the slices package provides the SortFunc function which also sorts the elements but takes an explicit comparison function as a second argument. So far, this seems like a classic example of standard library functionality.

However, the Go standard library, since version 1.0, also contains the sort package. As its name indicates, it provides functions for sorting slices. Its interface, however, differs from the slices package since Go did not support generics back then. Before Go 1.21, most people used functions like sort.Slice or sort.Strings to sort items. Only recently, have people started using the newer slices package. Since the Go standard library now provides two packages for sorting, one might wonder which one to choose. According to the sort.Strings documentation:

// Strings sorts a slice of strings in increasing order.
//
// Note: consider using the newer slices.Sort function,
// which runs faster.
func Strings(x []string)

How Sorting Works in Go

Both the sort and slices package, use the same pattern-defeating quicksort (pdqsort) implementation. They differ only slightly in how values are compared and partitioned. On a high level:

pdqsort:
   1. If the data is small, use insertion sort. Return.
   2. If there are no good pivot elements, fallback to heap sort. Return.
   3. Pick a pivot element and partition the slice.
   4. Call pdqsort for the left and right partitions.

It is basically textbook quicksort with some extra functionality. Textbook quicksort may require O(n2) operations in certain cases while pattern-defeating quicksort falls back to an O(N * log(N)) sorting algorithm when it runs into such a case.

Even though both packages use essentially the same quicksort implementation, a benchmark sorting 10k integers shows a significant performance difference:

goos: darwin
goarch: arm64
pkg: github.com/aead/bench
BenchmarkSlicesSort_Int-8           6771            176058 ns/op
BenchmarkSort_Ints-8                2109            585427 ns/op
PASS
ok      github.com/aead/bench   3.829s

Profiling the benchmark reveals that the partition and insertion steps in the sort package are noticeably slower. The slices package uses direct assignments when swapping elements, and the compiler is able to optimize the cmp.Less calls for comparing two numbers into a single instruction while the sort package is tied to the Less and Swap methods of the sort.Interface type. When replacing the call to slices.Sort with slices.SortFunc and a custom comparison function func(a, b int) int { return a - b } the performance difference becomes much smaller.

The compiler is able to optimize the generic slices.Sort much better since it has more information about the types. In particular, it knows how to compare two numbers. For the sort.Ints, the compiler could, in theory, figure out that the given sort.Interface is an []int, inline the method calls and apply the same optimizations. It just doesn't do that. However, inlining is a trade-off, and the compiler will not be able to infer the concrete type behind a sort.Interface in many cases.

Strings != Numbers

However, when running the same benchmark for strings the results are a bit different:
goos: darwin
goarch: arm64
pkg: github.com/aead/bench
BenchmarkSlicesSort_String-8         858           1404786 ns/op
BenchmarkSort_Strings-8             1040           1155523 ns/op
PASS
ok      github.com/aead/bench   2.764s
Even though the compiler should be able to (and does) apply the same optimizations, the slices.Sort function is now slower than sort.Strings. Replacing slices.Sort with slices.SortFunc and cmp.Compare decreases performance even further:
goos: darwin
goarch: arm64
pkg: github.com/aead/bench
BenchmarkSlicesSortFunc_Compare-8            572           2112564 ns/op
BenchmarkSlicesSort_String-8                 865           1427147 ns/op
BenchmarkSort_Strings-8                     1016           1161378 ns/op
PASS
ok      github.com/aead/bench   4.227s

This is quite surprising. How can slices.Sort be faster for []int and slower for []string? It's not specificly optimized for one particular data type. Also, why is slices.SortFunc using cmp.Compare so much slower than the other two functions?

Strings != NaN

When comparing two instances of a type (x and y), Compare returns, as one might expect, -1 if x < y, 0 if x == y and +1 if x > y. However, there is one special case: What if x and/or y are NaN floating point numbers?

// For floating-point types, a NaN is considered less than any non-NaN,
// a NaN is considered equal to a NaN, and -0.0 is equal to 0.0.

Therefore, Compare first checks whether x is a NaN by evaluating x != x. The idea is that, according to the IEEE 754 specification, NaN is not equal to any value - including another NaN. Hence, Compare looks like this:

func Compare[T Ordered](x, y T) int {
	xNaN := x != x
	yNaN := y != y
	if xNaN && yNaN {
		return 0
	}
	if xNaN || x < y {
		return -1
	}
	if yNaN || x > y {
		return +1
	}
	return 0
}

But why does all of this matter when comparing strings?

Things Being Equal

To generate the machine code for a Compare call, the Go compiler has to determine the type of x and y. For example, when x and y are integers, the compiler emits a single CMPQ instruction (on amd64) for the check xNaN || x < y. The compiler is smart enough to notice that x != x is always false for any value of type int and eliminates the xNaN variable entirely.

When x and y are strings, the compiler inserts a call to runtime.cmpstring. Comparing two strings has linear (O(N)) runtime complexity. Therefore, comparing strings is usually slower than comparing integers. However, the main difference is that the compiler does not recognize that x != x is always false for strings. It does not apply the optimisation from above. Hence, it also inserts a call to runtime.memequal and does not eliminate the xNaN variable. This results in calling runtime.memequal twice and runtime.cmpstring either once or twice while an optimal implementation would compare both strings exactly once. These extra calls add up. Especially when comparing strings that share a common prefix, like long filesystem paths.

At the end of the day, this is a limitation of the Go compiler. While the compiler could recognize that x != x is always false and eliminate the comparison, regardless whether x is of type int or string, it applies this optimization just for integers. This is unfortunate because the documentation guides developers to use slices.Sort which may be faster or slower depending on the type of your slice.

Update

In Go 1.22, this issue has been resolved - at least to some extent. In particular, CL 503795 has been merged. It adds a dead code elimination optimization to the compiler. The Go 1.22 compiler is now able to detect that statements like x != x are always false if x is a string. Go 1.22 also changed the sort.Strings and sort.Ints implementation to thin wrappers around slices.Sort. So most Go programs that sort primitive types, like strings or numbers, will use the slices.Sort implementation automatically.

However, cmp.Compare still compares strings twice whenever x >= y. This will be addressed in Go 1.23 by CL 578835, which will improve the performance of functions like strings.Compare.

Thanks to these changes, sorting and comparing strings in Go will be even faster than before.