In Go 1.21, the slices package was added to the Go standard library.
It provides functions for working with slices of any type. For example, the generic
slices.Sort
function is a generic sort implementation that accepts any slice as long as the
slice elements can be ordered. Its type constraint cmp.Ordered
demands that the slice element type
supports the operators < <= >= >
. Currently, such types include numbers (ints and floats),
pointers, and strings. For other types, the slices package provides the SortFunc
function which also sorts the elements but takes an explicit comparison function as a second argument. So far, this
seems like a classic example of standard library functionality.
However, the Go standard library, since version 1.0, also contains the sort
package. As its name indicates, it provides functions for sorting slices. Its interface, however, differs from the
slices package since Go did not support generics back then. Before Go 1.21, most people used functions like sort.Slice
or sort.Strings
to sort items. Only recently, have people started using the newer slices package. Since the
Go standard library now provides two packages for sorting, one might wonder which one to choose. According to
the sort.Strings
documentation:
// Strings sorts a slice of strings in increasing order.
//
// Note: consider using the newer slices.Sort function,
// which runs faster.
func Strings(x []string)
How Sorting Works in Go
Both the sort and slices package, use the same pattern-defeating quicksort (pdqsort) implementation. They differ only slightly in how values are compared and partitioned. On a high level:
pdqsort:
1. If the data is small, use insertion sort. Return.
2. If there are no good pivot elements, fallback to heap sort. Return.
3. Pick a pivot element and partition the slice.
4. Call pdqsort for the left and right partitions.
It is basically textbook quicksort with some extra functionality. Textbook quicksort may require O(n
operations in certain cases while pattern-defeating quicksort falls back to an 2
)O(N * log(N))
sorting algorithm when it runs into
such a case.
Even though both packages use essentially the same quicksort implementation, a benchmark sorting 10k integers shows a significant performance difference:
goos: darwin
goarch: arm64
pkg: github.com/aead/bench
BenchmarkSlicesSort_Int-8 6771 176058 ns/op
BenchmarkSort_Ints-8 2109 585427 ns/op
PASS
ok github.com/aead/bench 3.829s
Profiling the benchmark reveals that the partition and insertion steps in the sort package are noticeably slower.
The slices package uses direct assignments when swapping elements, and the compiler is able to optimize the
cmp.Less
calls for comparing two numbers into a single instruction while the sort package is tied
to the Less
and Swap
methods of the sort.Interface
type.
When replacing the call to slices.Sort
with slices.SortFunc
and a custom comparison
function func(a, b int) int { return a - b }
the performance difference becomes much smaller.
The compiler is able to optimize the generic slices.Sort
much better since it has more information
about the types. In particular, it knows how to compare two numbers. For the sort.Ints
, the compiler
could, in theory, figure out that the given sort.Interface
is an []int
, inline the method
calls and apply the same optimizations. It just doesn't do that. However, inlining is a trade-off, and the compiler
will not be able to infer the concrete type behind a sort.Interface
in many cases.
Strings !=
Numbers
However, when running the same benchmark for strings the results are a bit different:
goos: darwin
goarch: arm64
pkg: github.com/aead/bench
BenchmarkSlicesSort_String-8 858 1404786 ns/op
BenchmarkSort_Strings-8 1040 1155523 ns/op
PASS
ok github.com/aead/bench 2.764s
Even though the compiler should be able to (and does) apply the same optimizations, the slices.Sort
function is now slower than sort.Strings
. Replacing slices.Sort
with slices.SortFunc
and cmp.Compare
decreases performance even further:
goos: darwin
goarch: arm64
pkg: github.com/aead/bench
BenchmarkSlicesSortFunc_Compare-8 572 2112564 ns/op
BenchmarkSlicesSort_String-8 865 1427147 ns/op
BenchmarkSort_Strings-8 1016 1161378 ns/op
PASS
ok github.com/aead/bench 4.227s
This is quite surprising. How can slices.Sort
be faster for []int
and slower
for []string
? It's not specificly optimized for one particular data type. Also, why is
slices.SortFunc
using cmp.Compare
so much slower than the other two functions?
Strings !=
NaN
When comparing two instances of a type (x and y), Compare
returns, as one might expect,
-1 if x < y
, 0 if x == y
and +1 if x > y
. However, there is
one special case: What if x and/or y are NaN floating
point numbers?
// For floating-point types, a NaN is considered less than any non-NaN,
// a NaN is considered equal to a NaN, and -0.0 is equal to 0.0.
Therefore, Compare
first checks whether x is a NaN by evaluating x != x
.
The idea is that, according to the IEEE 754 specification, NaN
is not equal to any value -
including another NaN
. Hence, Compare
looks like this:
func Compare[T Ordered](x, y T) int {
xNaN := x != x
yNaN := y != y
if xNaN && yNaN {
return 0
}
if xNaN || x < y {
return -1
}
if yNaN || x > y {
return +1
}
return 0
}
But why does all of this matter when comparing strings?
Things Being Equal
To generate the machine code for a Compare
call, the Go compiler has to determine the
type of x and y. For example, when x and y are integers, the compiler emits a single CMPQ
instruction (on amd64) for the check xNaN || x < y
. The compiler is smart enough to
notice that x != x
is always false for any value of type int
and eliminates
the xNaN
variable entirely.
When x and y are strings, the compiler inserts a call to runtime.cmpstring
. Comparing two
strings has linear (O(N)) runtime complexity. Therefore, comparing strings is usually slower than comparing
integers. However, the main difference is that the compiler does not recognize that x != x
is always false for strings. It does not apply the optimisation from above. Hence, it also inserts a call
to runtime.memequal
and does not eliminate the xNaN
variable. This results in
calling runtime.memequal
twice and runtime.cmpstring
either once or twice while
an optimal implementation would compare both strings exactly once. These extra calls add up. Especially
when comparing strings that share a common prefix, like long filesystem paths.
At the end of the day, this is a limitation of the Go compiler. While the compiler could recognize that
x != x
is always false and eliminate the comparison, regardless whether x
is
of type int or string, it applies this optimization just for integers. This is unfortunate because the
documentation guides developers to use slices.Sort
which may be faster or slower depending
on the type of your slice.
Update
In Go 1.22, this issue has been resolved - at least to some extent. In particular, CL 503795
has been merged. It adds a dead code elimination optimization to the compiler. The Go 1.22 compiler is now able
to detect that statements like x != x
are always false if x
is a string. Go 1.22 also
changed the sort.Strings
and sort.Ints
implementation to thin wrappers around slices.Sort
.
So most Go programs that sort primitive types, like strings or numbers, will use the slices.Sort
implementation automatically.
However, cmp.Compare
still compares strings twice whenever x >= y
. This will be addressed
in Go 1.23 by CL 578835, which will improve the performance
of functions like strings.Compare
.
Thanks to these changes, sorting and comparing strings in Go will be even faster than before.