Preface
Go 1.13 was released recently, with many new features and improvements. This article analyzes the optimizations to sync.Pool, answering the following questions:
- Where do the sync.Pool optimizations show up?
- How are they implemented?
- What benefits do they bring?
Optimizations
The concrete optimizations are:
- Lock-free access
- A new GC strategy
Lock-free access
sync.Pool now accesses its per-P caches without a lock. Compare the two versions:
The go1.12.1 implementation:

```go
// Local per-P Pool appendix.
type poolLocalInternal struct {
	private interface{}   // Can be used only by the respective P.
	shared  []interface{} // Can be used by any P.
	Mutex                 // Protects shared.
}
```
The go1.13 implementation:

```go
// Local per-P Pool appendix.
type poolLocalInternal struct {
	private interface{} // Can be used only by the respective P.
	shared  poolChain   // Local P can pushHead/popHead; any P can popTail.
}
```
Comparing the two, the Mutex from go1.12 is gone. So how does go1.13 achieve lock-free access?
The short answer: go1.13 replaces the mutex-protected slice with poolChain, a lock-free SPMC (single-producer, multi-consumer) queue.
So what exactly is poolChain? No need to guess: the code holds no secrets, so let's read it directly.
```go
// poolChain is a dynamically-sized version of poolDequeue.
//
// This is implemented as a doubly-linked list queue of poolDequeues
// where each dequeue is double the size of the previous one. Once a
// dequeue fills up, this allocates a new one and only ever pushes to
// the latest dequeue. Pops happen from the other end of the list and
// once a dequeue is exhausted, it gets removed from the list.
type poolChain struct {
	// head is the poolDequeue to push to. This is only accessed
	// by the producer, so doesn't need to be synchronized.
	head *poolChainElt

	// tail is the poolDequeue to popTail from. This is accessed
	// by consumers, so reads and writes must be atomic.
	tail *poolChainElt
}

type poolChainElt struct {
	poolDequeue

	// next and prev link to the adjacent poolChainElts in this
	// poolChain.
	//
	// next is written atomically by the producer and read
	// atomically by the consumer. It only transitions from nil to
	// non-nil.
	//
	// prev is written atomically by the consumer and read
	// atomically by the producer. It only transitions from
	// non-nil to nil.
	next, prev *poolChainElt
}
```
For the details of how poolChain implements an SPMC lock-free queue, read the code in poolqueue.go. This article does not walk through it line by line; the key points are:
- The queue is SPMC: only the owning P pushes, but any P may pop from the tail.
- The queue grows dynamically: it is a doubly-linked list of ring buffers (slices), each twice the size of the previous one. Reading the code with this picture in mind makes it easy to follow.
- The lock-free operations are built on CAS (compare-and-swap).
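To make the CAS point concrete, here is a minimal, hypothetical sketch of a fixed-size SPMC ring buffer in the spirit of poolDequeue. Like the real one, it packs the head and tail indices into a single uint64 so a consumer can claim the tail slot with one CAS; unlike the real one, it omits the slot-clearing handshake that poolDequeue uses to handle slow consumers, so treat it as an illustration only.

```go
package main

import "sync/atomic"

// spmcQueue is a simplified, illustrative SPMC ring buffer.
// headTail packs head (high 32 bits, owned by the single producer)
// and tail (low 32 bits, shared by all consumers).
type spmcQueue struct {
	headTail uint64
	vals     []interface{} // ring buffer; len must be a power of two
}

func newSPMC(n int) *spmcQueue { return &spmcQueue{vals: make([]interface{}, n)} }

func unpack(ht uint64) (head, tail uint32) { return uint32(ht >> 32), uint32(ht) }
func pack(head, tail uint32) uint64        { return uint64(head)<<32 | uint64(tail) }

// pushHead is called only by the owning P, so it needs no CAS loop:
// writing the slot and then publishing the new head atomically is enough.
func (q *spmcQueue) pushHead(x interface{}) bool {
	head, tail := unpack(atomic.LoadUint64(&q.headTail))
	if head-tail >= uint32(len(q.vals)) {
		return false // full
	}
	q.vals[head&uint32(len(q.vals)-1)] = x
	atomic.AddUint64(&q.headTail, 1<<32) // publish the new head
	return true
}

// popTail may race with other consumers: whoever wins the CAS on
// headTail owns the tail slot and may read it.
func (q *spmcQueue) popTail() (interface{}, bool) {
	for {
		ht := atomic.LoadUint64(&q.headTail)
		head, tail := unpack(ht)
		if head == tail {
			return nil, false // empty
		}
		if atomic.CompareAndSwapUint64(&q.headTail, ht, pack(head, tail+1)) {
			i := tail & uint32(len(q.vals)-1)
			x := q.vals[i]
			q.vals[i] = nil
			return x, true
		}
	}
}
```

Note that the producer side needs no CAS at all, which is exactly why sync.Pool keeps pushes local to the owning P and only lets other Ps steal from the tail.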
Benefits
- The lock overhead is gone: the mutex is replaced by atomic operations.
GC strategy
Compared with go1.12, go1.13 adds a victim cache. Concretely:
- During GC, the objects in oldPools (the victim caches) are released.
- The objects in allPools are not released immediately; instead they are moved to oldPools and survive until the next GC cycle.
The relevant code:
go1.12.1:

```go
var (
	allPoolsMu Mutex
	allPools   []*Pool
)
```
go1.13:

```go
var (
	allPoolsMu Mutex

	// allPools is the set of pools that have non-empty primary
	// caches. Protected by either 1) allPoolsMu and pinning or 2)
	// STW.
	allPools []*Pool

	// oldPools is the set of pools that may have non-empty victim
	// caches. Protected by STW.
	oldPools []*Pool
)
```
```go
func poolCleanup() {
	// This function is called with the world stopped, at the
	// beginning of a garbage collection. It must not allocate and
	// probably should not call any runtime functions.

	// Because the world is stopped, no pool user can be in a
	// pinned section (in effect, this has all Ps pinned).

	// Drop victim caches from all pools.
	for _, p := range oldPools {
		p.victim = nil
		p.victimSize = 0
	}

	// Move primary cache to victim cache.
	for _, p := range allPools {
		p.victim = p.local
		p.victimSize = p.localSize
		p.local = nil
		p.localSize = 0
	}

	// The pools with non-empty primary caches now have non-empty
	// victim caches and no pools have primary caches.
	oldPools, allPools = allPools, nil
}
```
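The two-GC-cycle lifetime this gives pooled objects can be observed directly. The following small program (a hypothetical demo, not from the sync package) pins GOMAXPROCS to 1 so the goroutine stays on a single P and the pooled object sits in that P's private slot:

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// demoVictim returns what Get finds after one GC (the object has moved
// to the victim cache but is still reachable) and after two GCs (the
// victim cache has been dropped, so the object is gone).
func demoVictim() (afterOneGC, afterTwoGCs interface{}) {
	runtime.GOMAXPROCS(1)
	var p sync.Pool

	p.Put("cached")
	runtime.GC() // poolCleanup moves the primary cache to the victim cache
	afterOneGC = p.Get() // still found, via the victim cache in getSlow

	p.Put("cached")
	runtime.GC()
	runtime.GC() // the second GC drops the victim cache
	afterTwoGCs = p.Get() // nil: the object has been reclaimed
	return
}

func main() {
	one, two := demoVictim()
	fmt.Println(one, two)
}
```

Under go1.12, a single GC was enough to empty the pool; under go1.13, an object survives one extra cycle.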
This changes how Get works. The old flow was:
- Try the poolLocal bound to the current P: first its private slot, then its shared pool.
- If that fails, try to steal from the shared pools of other Ps.
- If that also fails, allocate a fresh object from the heap.
With the victim cache, the flow becomes:
- Try the poolLocal bound to the current P: first its private slot, then its shared pool.
- If that fails, try to steal from the shared pools of other Ps.
- If that also fails, try the victim cache.
- If that also fails, allocate a fresh object from the heap.
The code:
```go
func (p *Pool) getSlow(pid int) interface{} {
	// See the comment in pin regarding ordering of the loads.
	size := atomic.LoadUintptr(&p.localSize) // load-acquire
	locals := p.local                        // load-consume
	// Try to steal one element from other procs.
	for i := 0; i < int(size); i++ {
		l := indexLocal(locals, (pid+i+1)%int(size))
		if x, _ := l.shared.popTail(); x != nil {
			return x
		}
	}

	// Try the victim cache. We do this after attempting to steal
	// from all primary caches because we want objects in the
	// victim cache to age out if at all possible.
	size = atomic.LoadUintptr(&p.victimSize)
	if uintptr(pid) >= size {
		return nil
	}
	locals = p.victim
	l := indexLocal(locals, pid)
	if x := l.private; x != nil {
		l.private = nil
		return x
	}
	for i := 0; i < int(size); i++ {
		l := indexLocal(locals, (pid+i)%int(size))
		if x, _ := l.shared.popTail(); x != nil {
			return x
		}
	}

	// Mark the victim cache as empty for future gets don't
	// bother with it.
	atomic.StoreUintptr(&p.victimSize, 0)

	return nil
}
```
Benefits
- In space: the victim cache gives Get one more place to find a reusable object, raising the reuse hit rate.
- In time: delaying reclamation by one GC cycle lengthens the window in which an object can be reused.
- Together, these reduce GC pressure and make pooled objects more effective.
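Both benefits accrue automatically to ordinary sync.Pool users. As a reminder of what that usage looks like, here is a typical pattern, pooling bytes.Buffer objects; render is a hypothetical example function, not part of any library:

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool reuses bytes.Buffer objects across calls. With go1.13's
// victim cache, an idle buffer now survives one extra GC cycle before
// being reclaimed, so bursts of traffic separated by a GC still hit
// the pool instead of the heap.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer func() {
		buf.Reset() // always reset before returning to the pool
		bufPool.Put(buf)
	}()
	fmt.Fprintf(buf, "hello, %s", name)
	return buf.String()
}

func main() {
	fmt.Println(render("gopher"))
}
```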