Ejabberd: 18.12 offline_message_hook crashed when running (mod_offline, mod_mam)

Created on 9 Jan 2019  路  19Comments  路  Source: processone/ejabberd

Environment

  • ejabberd version: 18.12
  • Erlang version: Erlang/OTP 21 [erts-10.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
  • OS: Mac OSX 10.12.6 (HFS+ root filesystem)
  • Installed from: Binary distribution
  • Uses LDAP for mod_vcard, sql/postgres for everything else

Errors from error.log (nothing in crash.log unfortunately)

2019-01-09 07:24:10.366 [error] <0.6141.11>@ejabberd_hooks:safe_apply:384 Hook offline_message_hook crashed when running mod_mam:offline_message/1:
 ** Reason = {error,function_clause,[{mod_mam,offline_message,[{file,"src/mod_mam.erl"},{line,379}],[{atomic,ok}]},{ejabberd_hooks,safe_apply,[{file,"src/ejabberd_hooks.erl"},{line,380}],4},{ejabberd_hooks,run_fold1,[{file,"src/ejabberd_hooks.erl"},{line,364}],4},{ejabberd_sm,route,[{file,"src/    ejabberd_sm.erl"},{line,142}],1},{ejabberd_local,route,[{file,"src/ejabberd_local.erl"},{line,72}],1},{ejabberd_router,do_route,[{file,"src/ejabberd_router.erl"},{line,368}],1},{ejabberd_router,route,[{file,"src/ejabberd_router.erl"},{line,92}],1},{ejabberd_c2s,check_privacy_then_route,[{file,"src/ejabberd_c2s.erl"},{line,831}],2}]}
 ** Arguments = [{atomic,ok}]
 2019-01-09 07:24:10.366 [error] <0.6141.11>@ejabberd_hooks:safe_apply:384 Hook offline_message_hook crashed when running mod_offline:store_packet/1:
 ** Reason = {error,function_clause,[{mod_offline,store_packet,[{file,"src/mod_offline.erl"},{line,395}],[{atomic,ok}]},{ejabberd_hooks,safe_apply,[{file,"src/ejabberd_hooks.erl"},{line,380}],4},{ejabberd_hooks,run_fold1,[{file,"src/ejabberd_hooks.erl"},{line,364}],4},{ejabberd_sm,route,[{file,  "src/ejabberd_sm.erl"},{line,142}],1},{ejabberd_local,route,[{file,"src/ejabberd_local.erl"},{line,72}],1},{ejabberd_router,do_route,[{file,"src/ejabberd_router.erl"},{line,368}],1},{ejabberd_router,route,[{file,"src/ejabberd_router.erl"},{line,92}],1},{ejabberd_c2s,check_privacy_then_route,[{file,"src/ejabberd_c2s.erl"},{line,831}],2}]}
 ** Arguments = [{atomic,ok}]

ejabberd.yml

Gist is here: https://gist.github.com/scottcc/e94f6b5a6e87da1cfe227e470f8163f3

Custom mod_apns.erl

Gist is here: https://gist.github.com/scottcc/a68d71b9ad1bc2bf63c5e53c56bf91fe

Bug description

We've seen this in our test system on 17.09 as well, which was NOT using PostgreSQL (just default Mnesia) so we are currently thinking that is not the case. What we see is:

If a user sends a message to another user (from their roster) who is offline and has not ever logged in at least once, we see the above offline_message_hook crash. We believe (but are not certain) this is correlated to the fact that the archive table in Postgres only seems to have the "sender" side of the message stored. If a sender sends a message to a receiver who has connected/authenticated to XMPP (c2s I believe, we're using the XMPPFramework for iOS here), then a second message does correctly get created that we can see in the archive table.

We build the entire system roster nightly and populate ejabberd from an external process, and it's all visible using admin tools (command line & web).

The problem is that it appears that unless every user connects at least once, messages sent to them prior to connecting/authenticating are not saved at the server, so they're lost when they log in.

Example that appeared to work

I can find a user (called receiver-ok) that when logging in, correctly gets all messages. By turning the debug logs up to 5 I can see this is the query that ejabberd executes: SELECT timestamp, xml, peer, kind, nick FROM archive WHERE username='receiver-ok' and timestamp >= 1544388452000000 ORDER BY timestamp ASC limit 251;. If I find that first message, I can see the id column within the xml, and search by that ID to see both the sender and receiver-ok messages:

SELECT * FROM archive WHERE xml like '%CC68FF24-2F54-4471-9A69-D353545207A2%' ORDER BY timestamp ASC;

-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------
username   | sender
timestamp  | 1545338566671451
peer       | [email protected]
bare_peer  | [email protected]
xml        | <message xml:lang='en' to='[email protected]' from='[email protected].
           |..com/1575491376206722324522594' type='chat' id='CC68FF24-2F54-4471-9A69-D353545207A2' xmlns='jabber:client'><fileTyp.
           |.e>motd</fileType><body>Test Message! </body></message>
txt        | Test Message!
id         | 25
kind       | chat
nick       |
created_at | 2018-12-20 15:42:46.671684
-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------------------
username   | receiver-ok
timestamp  | 1545338566673422
peer       | [email protected]/1575491376206722324522594
bare_peer  | [email protected]
xml        | <message xml:lang='en' to='[email protected]' from='[email protected].
           |..com/1575491376206722324522594' type='chat' id='CC68FF24-2F54-4471-9A69-D353545207A2' xmlns='jabber:client'><fileTyp.
           |.e>motd</fileType><body>Test Message! </body></message>
txt        | Test Message!
id         | 26
kind       | chat
nick       |
created_at | 2018-12-20 15:42:46.673717

Above we can see that the second message has a username of receiver-ok and thus the ejabberd sql finds it when they get their first set of messages.

Example that appears to fail

Here, the receiver is receiver-BAD - I can find the sender's message and query by the embedded id again in the xml:

SELECT * FROM archive WHERE xml like '%F8C9D7F8-71F2-4E12-B18E-DAB27DE29AD7%' ORDER BY timestamp ASC;

-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------------------
username   | sender
timestamp  | 1546953994224971
peer       | [email protected]
bare_peer  | [email protected]
xml        | <message xml:lang='en' to='[email protected]' from='[email protected]/1404.
           |.660110658050889491362' type='chat' id='F8C9D7F8-71F2-4E12-B18E-DAB27DE29AD7' xmlns='jabber:client'><fileType>motd</f.
           |.ileType><body>Bonjour 脿 tous! Ne pas oublier l&apos;appel conference 脿 9am. </body></message>
txt        | Bonjour 脿 tous! Ne pas oublier l'appel conference 脿 9am.
id         | 106
kind       | chat
nick       |
created_at | 2019-01-08 08:26:34.225113

Above we only see the sender's message - there's no second message so when receiver-BAD logs in, they do NOT get the message, but the sender thinks they have it.

This is the highest priority thing I've got, any suggestions whatsoever are SO VERY WELCOME. I see two things:

  1. Why is offline_message_hook crashing? Is this what is causing the second message not to save?
  2. Should I expect the second message to be saved for users that have not connected at least once?

So much thanks in advance. So much.
./scc

All 19 comments

Dumb question: is there any chance that this fix in 18.12.1 would address this?

https://github.com/processone/ejabberd/commit/44dfc108ca86e1d85f570380d39bfa1410047d17

Second dumb question: would 18.12.1 have better stack traces?

Looks like some third-party module is using the hook incorrectly, i.e. it returns incorrect accumulator value.

@zinid thanks so much for that - how might I diagnose it in any way further? We only have one third party module for mod_apns, did the method signature change b/w 17.09 and 18.12?

@scottcc yes it did

Where is the code for mod_apns?

Oh my @#$#@. How might I hunt down the diff? We can disable to detect, and one moment I'll add the mod_apns as a gist...

Yeah, the fix is typically trivial, just show the code.

You can actually just show the code of the function hooked at offline_message_hook

Here's that gist: https://gist.github.com/scottcc/a68d71b9ad1bc2bf63c5e53c56bf91fe

which I'll also add to the above section too.

Is this from ejabberd_mod_apns?

Yes, I believe that's exactly it, but has (I believe) a slight modification to handle finding one of many APNS certs, perhaps clumsily.

Well, the code from that repo has been updated to support newest ejabberd API.

I see - I'm now looking into the diffs there, we should be able to redo that. Thanks so much for your help @zinid. Really - fantastic! 馃帀

To finish, this does not surprise you that it doesn't work > 16.09 right?

Basically, if you don't want to mess with diffs, just return the same argument from the function as you get it into the function, i.e.

message({_, Packet} = Acc) ->
   ...
   Acc.

@scottcc right

Sadly my erlang is nowhere near yours. Non-existent actually. However! We have someone here who knows it slightly better than I do. Super helpful diff there too. Sounds like this issue can be closed, PEBKAC.

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Was this page helpful?
0 / 5 - 0 ratings

Related issues

sujankumar4593 picture sujankumar4593  路  4Comments

rahul-l picture rahul-l  路  3Comments

irvingwa picture irvingwa  路  3Comments

licaon-kter picture licaon-kter  路  4Comments

Vshnv picture Vshnv  路  4Comments