Simple techniques to optimise Go programs

kevinconaway · 7 years ago

Most of these techniques sacrifice readability so before implementing them, you should really profile to make sure that the code in question is a hot spot in terms of allocations or CPU usage.

That said, the example using sync.Pool is not quite right. The New function should always return a pointer type to avoid incurring an additional allocation on Get [0]

The code should look like:

  var bufpool = sync.Pool{
    New: func() interface{} {
        buf := make([]byte, 512)
        return &buf
    }}

  b := *bufpool.Get().(*[]byte)

  defer bufpool.Put(&b)

[0] see example in https://golang.org/pkg/sync/#Pool

sjwhitworth · 7 years ago

There's a little at the bottom that encourages caution. In a previous draft, I wrote about the importance of proving hot spots before optimising, but took it out as I wanted this to be more a list of techniques.

On sync.Pool: good spot. I'll update it - thanks.

stebalien · 7 years ago

Unfortunately, that's still wrong (I think). Go will still allocate a temporary slice-header on the heap and then de-allocate it when it's dereferenced.

I believe the only way to do this without unsafe code is to pool these slice-headers as in https://github.com/libp2p/go-buffer-pool/blob/master/pool.go.

shereadsthenews · 7 years ago

The risk of write-profile-optimize is that you might write a big slow program with a totally flat profile. If it doesn’t meet performance requirements at that point, then what?

fwip · 7 years ago

In your hypothetical scenario, you then optimize whatever is easiest to optimize.

kevinconaway · 7 years ago

I think there is a difference between architectural changes that would inhibit the overall performance of an application vs language specific optimizations like the article is referring to.

My comment re write-profile-optimize was geared towards the latter, i.e. using strconv is usually uglier to read than fmt.Sprintf, you probably should leave it alone unless you've profiled and found it to be a problem.

firethief · 7 years ago

Does that ever happen?

nhooyr · 7 years ago

Your example doesn't seem right. Doesn't the Get perform an allocation anyway because you access the value of the pointer and store it in a new variable b? It's very likely b will escape to the heap.

peterwaller · 7 years ago

I agree with nhooyr's analysis. The interface{} will anyway transparently "contain" a pointer-to-the-[]byte, in other words, the []byte value itself will be heap allocated.

(Note for anyone new to this that the "[]byte-value" - we say "the byte slice" - is a distinct thing from the "values stored-in-the-byte-slice", which is a heap allocated backing array)

likeabbas · 7 years ago

So maybe we should sacrifice compiler speed for having a better compiler optimize more readable code?

indentit · 7 years ago

Most compilers skip optimizations when compiling in debug mode, so that it is still quick to iterate/test, then perform the optimizations which take longer to compile for release builds. Obviously it means one can only profile code when it has been compiled in release mode, which I guess has its drawbacks...

SallySwanSmith · 7 years ago

Isn’t that the entire point of higher order programming languages?

kstenerud · 7 years ago

The JSON marshaler actually caches struct serializations so that you don't incur the reflection cost on every run.

cafxx · 7 years ago

> a previous version of this blog post did not specify that the New() function should return a pointer type. This avoids an extra allocation when returning through the interface{} type.

While this is good advice, it's not entirely correct. Even with the current go compiler there are ways to use sync.Pool with non-pointer values without incurring in the extra allocation, e.g. using a second sync.Pool to reuse the interface{}. Although I would not recommend it as it's slower, and much less maintainable.

> The safe way to ensure you always zero memory is to do so explicitly:

  // reset resets all fields of the AuthenticationResponse before pooling it.
  func (a* AuthenticationResponse) reset() {
      a.Token = ""
      a.UserID = ""
  }

I think this is safer, in face of modifications to the AuthenticationResponse structure, and much clearer in its intent:

  // reset resets all fields of the AuthenticationResponse before pooling it.
  func (a* AuthenticationResponse) reset() {
      *a = AuthenticationResponse{}
  }

pcwalton · 7 years ago

> During a garbage collection, the runtime scans objects containing pointers, and chases them. If you have a very large map[string]int, the GC has to check every string within the map, every GC, as strings contain pointers.

This would, of course, be much less of an issue with a generational GC, which doesn't have to scan the entire heap on every collection.

kchr · 7 years ago

> In A/B tests, we tried delaying the page in increments of 100 milliseconds and found that even very small delays would result in substantial and costly drops in revenue. - Greg Linden, Amazon.com

Just curious, do vendors on Amazon get reimbursed for the drops in revenue during tests like this?

dymk · 7 years ago

They'll imperceptible to individual vendors, as it's distributed across thousands. "Substantial" is extrapolated from a small (but just large enough to draw strong statistical conclusions from) A/B rollout.

Should Amazon charge vendors more when they identify where to invest more engineering effort, as a result of these A/B tests, which eventually lead to far higher increased revenue?

Groxx · 7 years ago

I'm going with "yes, it's part of the cost of using Amazon as a service to sell your goods"

tedunangst · 7 years ago

Replacing strings with ints would be more believable with a real world example.

ryanworl · 7 years ago

Dictionary encoding strings to integers is a common compression technique. In GC'd languages it fakes out the garbage collector because your dictionary codes are essentially pointers but not actually pointers from the GC's perspective. You also have the benefit of potentially using a smaller integer than a pointer if you know roughly the number of values you're encoding beforehand.

weberc2 · 7 years ago

Does this hold for Go? The tracing is the only relevant cost and tracing is very, very fast, so I would expect this to be negligible? Or am I misunderstanding the hypothetical scenario?

sjwhitworth · 7 years ago

Good point. Here's one off the top of my head: a giant map that caches some count by date. You can represent this as a map[time.Time]int (native Go time type), a map[string]int (time represented in some string format), or a map[int]int (time represented as an integer like 20190718).

astockwell · 7 years ago

> Allocate capacity in make to avoid re-allocation

Am I the only one who has done this and used append() within a loop, and resulted in a slice that is 2x as long as the original desired length, and the first 1x of items are all empty?

I quickly caught it and fixed the approach (instead of append, I used the `i` as the insertion index into my new empty slice which already had capacity allocated), but the ease with which that subtlety could be overlooked turned me off to this approach unless I profile and really find it's a hot path.

Edit to add: I tried his code, and it resulted in the new slice being as expected.

capo64 · 7 years ago

The make function takes the type, length and capacity. If you don’t supply capacity, it defaults to length. So if you want to use append, do something like: make([]int, 0, length)

That will allocate a slice that can fit length ints but has a length of 0 (so append starts at index 0).

You can call len(slice) and cap(slice) to see the difference, but append will insert an element at index len(slice), growing the capacity if necessary

astockwell · 7 years ago

Ahh thanks for the great explanation! Love HN.

jerf · 7 years ago

make can take three parameters for slices, which is the type, the initial length, and the initial "capacity": https://play.golang.org/p/8d1mS1mGsW6

By default, the slice is filled in and fully initialized. On the one hand, IMHO and IME that's not really what you want if you're going to append things which is a pretty common case, but on the other hand, worrying about "capacity" for slices, while perfectly sensible once you understand the abstraction, is something you can get in trouble with if you're not paying attention, because it makes you really pay attention to the fact the slice has an underlying array.

(Many languages have this concept somewhere, Go just happens to put it in the syntax: A "slice" is an object with a reference to an underlying array, an index of the first element, a length, and a "capacity" which can be smaller than the underlying array has room for. If you want to grow past the underlying array, the runtime then has to create a new one and copy the elements out into that.)

Similarly, if you have a slice of ten elements and you want to take a subset of it and start appending your own bits and pieces, you have to do that carefully: https://play.golang.org/p/QcZOndSp-9x There's a syntax for reducing the capacity of the new slice deliberately, which forces any new "append" to copy to a new slice, producing the behavior you want.

For as tricky as this can get, I'm surprised it doesn't come up more often, but so far in the last five years, this was the source of my bug I believe twice, and one of them I was doing rather inadvisable things anyhow. It turns out that in most "just code" you're not cutting lists up and distributing them various places very often.

ben0x539 · 7 years ago

You're running into the difference between

    make([]T, 0, n)

and

    make([]T, n)

weberc2 · 7 years ago

I occassionally hit this bug, but it's pretty easy to spot after I introduced it a couple of times. Allocations in Go are slow enough that I usually do this by default unless I know that performance absolutely will never be critical (e.g., doing CLI boilerplate that just needs to be fast on human timescales).