gRPC Keepalive Parameters
Keepalive
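When using gRPC in stream mode, a network glitch can leave Recv blocked without receiving any message. With gRPC's default settings it waits for a very long time, so the program appears to hang.
The fix is gRPC's keepalive mechanism: whichever side (client or server) is hit by the network problem, if a ping gets no response within a set time, the connection is closed and the program handles the reconnect itself.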
Server
type ServerParameters struct {
	// MaxConnectionIdle is a duration for the amount of time after which an
	// idle connection would be closed by sending a GoAway. Idleness duration is
	// defined since the most recent time the number of outstanding RPCs became
	// zero or the connection establishment.
	MaxConnectionIdle time.Duration // The current default value is infinity.
	// MaxConnectionAge is a duration for the maximum amount of time a
	// connection may exist before it will be closed by sending a GoAway. A
	// random jitter of +/-10% will be added to MaxConnectionAge to spread out
	// connection storms.
	MaxConnectionAge time.Duration // The current default value is infinity.
	// MaxConnectionAgeGrace is an additive period after MaxConnectionAge after
	// which the connection will be forcibly closed.
	MaxConnectionAgeGrace time.Duration // The current default value is infinity.
	// After a duration of this time if the server doesn't see any activity it
	// pings the client to see if the transport is still alive.
	// If set below 1s, a minimum value of 1s will be used instead.
	Time time.Duration // The current default value is 2 hours.
	// After having pinged for keepalive check, the server waits for a duration
	// of Timeout and if no activity is seen even after that the connection is
	// closed.
	Timeout time.Duration // The current default value is 20 seconds.
}
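On the server side, the parameters that matter most are Time and Timeout.
Time: if the server sees no activity from the client for this long (for example, no data arrives on a stream), it sends a ping to check whether the connection is still healthy. The default is 2 hours; it is best to shorten it so that failures are detected and retried promptly. In my company's project I set it to 2 * time.Minute.
Timeout: after the ping above is sent, if the client does not respond within Timeout, the server closes the connection.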
Client
type ClientParameters struct {
	// After a duration of this time if the client doesn't see any activity it
	// pings the server to see if the transport is still alive.
	// If set below 10s, a minimum value of 10s will be used instead.
	Time time.Duration // The current default value is infinity.
	// After having pinged for keepalive check, the client waits for a duration
	// of Timeout and if no activity is seen even after that the connection is
	// closed.
	Timeout time.Duration // The current default value is 20 seconds.
	// If true, client sends keepalive pings even with no active RPCs. If false,
	// when there are no active RPCs, Time and Timeout will be ignored and no
	// keepalive pings will be sent.
	PermitWithoutStream bool // false by default.
}
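Likewise, the client needs Time and Timeout set so that a broken connection is detected and retried promptly.
The client's default Time is infinity, which means the client never sends pings. In my project I changed it to 2 * time.Minute, so both sides have a basic guarantee that a dead connection is detected and retried.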
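A simple experiment
With the keepalive times set on both the server and the client, start both locally, let them connect to each other, and then cut the network. After a while, the log prints the following error: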
code = Unavailable desc = transport is closing
This shows that keepalive is working: the gRPC connection was closed because no response to the ping was received.