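In production, we suddenly found that consumer calls to a provider were failing with a forbidden error.
Checking the zk path, we found that the provider's entry under /dubbo/provider-interfacename was gone.
I suspect this is related to how zk works, so could the provider also periodically check whether it is still alive on zk?
I experimented in a test environment: after manually deleting the provider from the zk path with zkCli.sh, the provider did not react at all, because a provider only listens for events on the configurators directory.
Consumers, on the other hand, watch the path of the providers they consume. Once the provider's entry is lost, consumers can no longer call that provider.
zk is a third-party component, so in my view Dubbo should add fault tolerance around it rather than rely entirely on zk watch events.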
So the provider would periodically check whether it is alive on zk, and re-register if it finds it is gone?
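More precisely, it checks whether its own entry is still on the zk path. If it is not, no consumer can call this provider, and if zk never delivers a zk-reconnect event to the provider, the provider can never be called again.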
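Yes. Each provider only needs to check, by host and port, whether its own entry is still there. It does not need to care about other providers, even ones with the same version. I have already written the code, and in my own testing it fixes the lost-provider case. I will submit a merge request later to see whether the approach is viable.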
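To make the proposal concrete, here is a minimal sketch of such a self-check. It is not the code from the merge request and does not use Dubbo's or ZooKeeper's real API: `NodeStore`, `InMemoryNodeStore` and `ProviderSelfCheck` are hypothetical names, and the in-memory store only stands in for zk so the example can run on its own.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical minimal view of a registry client; NOT Dubbo's or ZooKeeper's API.
interface NodeStore {
    boolean exists(String path);
    void createEphemeral(String path);
}

// In-memory stand-in for zk, used only to make the sketch runnable.
class InMemoryNodeStore implements NodeStore {
    private final Set<String> nodes = ConcurrentHashMap.newKeySet();

    public boolean exists(String path) { return nodes.contains(path); }
    public void createEphemeral(String path) { nodes.add(path); }
    // Simulates an entry disappearing, e.g. a manual delete via zkCli.sh.
    public void delete(String path) { nodes.remove(path); }
}

class ProviderSelfCheck {
    private final NodeStore store;
    private final String providerPath; // this provider's own entry (host:port specific)

    ProviderSelfCheck(NodeStore store, String providerPath) {
        this.store = store;
        this.providerPath = providerPath;
    }

    // Returns true if the entry was missing and had to be re-created.
    boolean checkOnce() {
        if (store.exists(providerPath)) {
            return false;
        }
        store.createEphemeral(providerPath);
        return true;
    }

    // Runs checkOnce periodically on a daemon thread; the caller owns shutdown.
    ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "provider-self-check");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(() -> {
            try {
                checkOnce();
            } catch (RuntimeException e) {
                // Swallow so one failed check does not cancel later runs.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Each provider looks only at its own path, so the check costs one existence lookup per period and never touches other providers' entries. The check period is a trade-off between registry load and how long a lost provider stays invisible to consumers.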
Not sure whether this PR addresses the problem:
https://github.com/apache/incubator-dubbo/pull/2975
Then how is disabling a Dubbo service implemented? Is it done by deleting the service data on zk?
The provider's URL has a parameter called 'enabled', which defaults to true. After it is set to enabled=false, the registry notifies the client. When the client resolves the provider URLs, it filters out any URL carrying that setting, which is how the service gets disabled.
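The filtering step described above can be sketched roughly as follows. This is an illustration only, not Dubbo's actual resolution code: `EnabledFilter`, `param` and `usable` are hypothetical names, and URLs are treated as plain strings with a simple query parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; Dubbo's real URL handling is richer than this.
class EnabledFilter {
    // Reads one query parameter from a URL string; returns defaultValue if absent.
    static String param(String url, String key, String defaultValue) {
        int q = url.indexOf('?');
        if (q < 0) {
            return defaultValue;
        }
        for (String pair : url.substring(q + 1).split("&")) {
            int eq = pair.indexOf('=');
            if (eq > 0 && pair.substring(0, eq).equals(key)) {
                return pair.substring(eq + 1);
            }
        }
        return defaultValue;
    }

    // Keeps only URLs whose 'enabled' parameter is not "false" (absent means true).
    static List<String> usable(List<String> providerUrls) {
        List<String> result = new ArrayList<>();
        for (String url : providerUrls) {
            if (!"false".equals(param(url, "enabled", "true"))) {
                result.add(url);
            }
        }
        return result;
    }
}
```

Note that the provider's entry stays in the registry in this scheme. Disabling changes a parameter that consumers act on, rather than deleting the service data.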
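@notlate Doesn't zk already provide a heartbeat-like mechanism with the provider? When a provider fails, zk removes its ephemeral node and the consumers are notified. Why would the provider also need to check itself?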
This entry was probably lost after your ZK restarted, wasn't it?
@notlate
It is very rare for zk itself to lose a provider's entry if your provider is running in good condition. What happens much more often is that consumers cannot reach the provider's service during the window in which the provider is disconnected and has not yet reconnected to zk.
So it is more important to find out why your provider lost its connection to zk. (My guess is that the zk client in your provider is somehow not regularly reporting to zk that it is still alive.)
Let me know if that works for you.
Regards.
My sentiment exactly!
Should we close it, @notlate @chickenlj?
May I ask how the consumer actually receives the notification from zk? I have been struggling with exactly this question.