salt.states.zk_concurrency.lock(name, zk_hosts, identifier=None, max_concurrency=1, timeout=None, ephemeral_lease=False)
The identifier parameter to zk_concurrency.lock() and zk_concurrency.unlock() is optional and defaults to None. However, zk_concurrency.unlock() does not release any locks that were created with a None identifier.
Setup: a running ZooKeeper server, a Salt master, and two minions. The standard SaltStack Vagrant demo environment with ZooKeeper added to the master would probably suffice.
Note: the Python kazoo module must also be installed (via pip) on the minions, since the zk_concurrency module depends on it.
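One way to do this from the master, assuming Salt's pip execution module is available on the minions:

```
salt '*' pip.install kazoo
```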
```
salt 'minion1' zk_concurrency.lock /testlock [zkipaddr]:[zkport]
minion1:
    True
salt 'minion2' zk_concurrency.lock /testlock [zkipaddr]:[zkport]
salt 'minion1' zk_concurrency.unlock /testlock [zkipaddr]:[zkport]
minion1:
    True
```

Ctrl-C out of any waiting consoles (SSH session B) from above, then repeat the commands with the identifier parameter added. Also note that if you run the tests after the steps above, you need to change the lock path as well (e.g. /testlock2):
```
salt 'minion1' zk_concurrency.lock /testlock2 [zkipaddr]:[zkport] id1
minion1:
    True
salt 'minion2' zk_concurrency.lock /testlock2 [zkipaddr]:[zkport] id2
salt 'minion1' zk_concurrency.unlock /testlock2 [zkipaddr]:[zkport] id1
minion1:
    True
minion2:
    True
```
```
Salt Version:
           Salt: 2015.8.8.2

Dependency Versions:
         Jinja2: 2.2.1
       M2Crypto: 0.20.2
           Mako: Not Installed
         PyYAML: 3.11
          PyZMQ: 14.5.0
         Python: 2.6.6 (r266:84292, Jul 23 2015, 15:22:56)
           RAET: Not Installed
        Tornado: 4.2.1
            ZMQ: 4.0.5
           cffi: Not Installed
       cherrypy: Not Installed
       dateutil: Not Installed
          gitdb: Not Installed
      gitpython: Not Installed
          ioflo: Not Installed
        libgit2: Not Installed
        libnacl: Not Installed
   msgpack-pure: Not Installed
 msgpack-python: 0.4.6
   mysql-python: Not Installed
      pycparser: Not Installed
       pycrypto: 2.6.1
         pygit2: Not Installed
   python-gnupg: Not Installed
          smmap: Not Installed
        timelib: Not Installed

System Versions:
           dist: centos 6.7 Final
        machine: x86_64
        release: 2.6.32-573.el6.x86_64
         system: CentOS 6.7 Final
```
@jjc001 I will set up a test case tomorrow and post results.
I am able to replicate this behavior. Here is my test case:
```
[root@saltmaster zookeeper-3.4.8]# salt '*' zk_concurrency.lock /testlock 127.0.0.1:2181
saltmaster.local:
    True
[root@saltmaster zookeeper-3.4.8]# salt '*' zk_concurrency.lock /testlock 127.0.0.1:2181
```
As shown above, the second lock call is in a hanging state, which at this point is to be expected. From another console:
```
[root@saltmaster ~]# salt '*' zk_concurrency.unlock /testlock 127.0.0.1:2181
saltmaster.local:
    True
```

The unlock reports True, yet the waiting lock call above never acquires the lock.
@ksvasan This might be an issue you want to be aware of, since you're one of the main authors of this state.
@rallytime let me have a look at this. I believe an RAII-like behavior is needed here. Let me fix this and send you a PR.
@ksvasan That would be great! Thanks!
@rallytime I believe I've found the root cause of this issue. When we call zk_concurrency.lock(), the identifier defaults to None, but the kazoo client translates that identifier into an empty string ("") when creating the child node. Up to this point everything is fine.
However, to identify a host/process that has already acquired a lock, modules/zk_concurrency.py uses the comparison identifier == data.decode('utf-8'). Here the identifier is None while the decoded data is an empty string, so the comparison never returns True and the matching node is never found. As a result, unlock() cannot find the node for this host/process and leaves the is_acquired flag as False, which makes unlock() return True without deleting any ZooKeeper node.
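A minimal sketch of the failing comparison (the variable names here are illustrative, not the module's actual code):

```python
# kazoo stores the lock identifier as the child znode's data; a None
# identifier ends up persisted as an empty byte string.
identifier = None   # default for zk_concurrency.lock()
data = b''          # what kazoo wrote to the child znode

# The ownership check in modules/zk_concurrency.py boils down to:
if identifier == data.decode('utf-8'):   # None == '' -> False
    print('match found; unlock() can delete the node')
else:
    print('no match; unlock() leaves the znode in place')
```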
Now let's consider the existing, correct blocking behavior. The default max_concurrency is 1, so once one child node has been created by the lock() call (which, as explained above, cannot be freed by unlock()), any other caller will continue to block even after unlock() returns.
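A toy illustration of that blocking condition (again a sketch, not the module's code):

```python
# With max_concurrency=1, a leftover child znode keeps every later
# caller waiting: there is never a free lease to hand out.
max_concurrency = 1
children = ['lease-0000000001']   # the node unlock() failed to delete

def can_acquire(children, max_concurrency):
    return len(children) < max_concurrency

print(can_acquire(children, max_concurrency))  # False -> caller blocks
```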
Solution: default the identifier to the hostname of the host where the minion is executing (anyone who needs more exotic thread/process separation can supply an explicit identifier). With that in place, all of the problems above are solved and the functionality can be used correctly.
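A sketch of that default, using a hypothetical helper name (the actual PR may derive the host identity differently):

```python
import socket

def _default_identifier(identifier=None):
    # Fall back to the local hostname so the value written to the
    # znode round-trips through kazoo and matches again on unlock().
    return identifier if identifier is not None else socket.gethostname()

print(_default_identifier())        # e.g. 'minion1'
print(_default_identifier('id1'))   # explicit identifiers still win
```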
Let me know your thoughts/comments before I send you a pull request.
@ksvasan I think that sounds like a good solution. Thank you for the thorough explanation. The only thing I'd add: be sure to document the new default identifier, so users of this module can easily learn of the change.
@jjc001 This should be fixed by #33663. Are you in a position to give that fix a try?
@rallytime I will try to get to it today, but if not I should be able to give it a go on Monday the 6th.
@rallytime and @ksvasan I've re-run my tests using the develop branch and everything appears to work as it should now. I also ran the zk_concurrency.lock_holders execution module: it reports the hostname as the lock holder when no id is passed, and the passed id when one is. So I'm happy with the fix.
One other important side note: this _will not_ happen when executing through a state file (state.apply), because the Salt state module zk_concurrency.py already rewrites identifier=None to identifier=grains['id'].
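An illustrative sketch of that state-layer behavior (the real state module uses Salt's __grains__ dunder; a plain dict stands in here so the snippet runs on its own):

```python
grains = {'id': 'minion1'}   # stand-in for Salt's __grains__

def state_lock(name, zk_hosts, identifier=None, max_concurrency=1):
    if identifier is None:
        identifier = grains['id']   # why states never hit the None case
    return {'name': name, 'identifier': identifier}

print(state_lock('/testlock', '127.0.0.1:2181'))
# -> {'name': '/testlock', 'identifier': 'minion1'}
```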
Thanks @jjc001 for your confirmation of the fix.
Thanks for confirming @jjc001! I'll close this up.