Cht-core: Add RapidPro as an SMS Gateway

Created on 10 Jul 2020 · 31 Comments · Source: medic/cht-core

The I-Tech project is moving forward with scale-up in Zimbabwe. The care protocol relies heavily on two-way texting between patients and facility-level providers. Using medic-gateway running on Android to send and receive SMS via the medic-api APIs has several issues, including being dropped from the Play Store, the app going to sleep, low throughput, and high latency. In addition, the CHT's current SMS aggregator integration with Africa's Talking does not cover Zimbabwe.

For this project, we need to choose an SMS aggregator service provider. Infobip looks like the best option and has coverage in other countries that may be relevant for future projects. As an added benefit, it looks like RapidPro already supports Infobip (needs verifying) if we want to consider using their service in the future. The Infobip demo applications include resources for integrating with their APIs.

cc: @benkags @derickl

3 - Low SMS Feature

All 31 comments

@derickl can you confirm that Infobip is the SMS aggregator service provider that we want to use for Zimbabwe?

Yes, it is the aggregator of choice. cc @benkags

@derickl @benkags I have a few followup questions about the implementation of this...

  1. Our SMS integration is currently limited by only allowing a single outgoing SMS aggregator to be defined. Can you confirm for this project that Infobip will be able to send SMS to all users irrespective of carriers?
  2. Similarly, the SMS aggregator can currently only forward SMS to one CHT instance. If you need multiple instances (one per branch, a separate instance for training, etc.) you will need a separate phone number and Infobip config for each. Will that work for you?
  3. Will you be transitioning to a short code? If so, will you keep the gateway running for people still using the old phone number, or shut it down?
  4. Will you use RapidPro or similar? It may be better to have CHT -> RapidPro -> Infobip, rather than CHT -> Infobip...

@garethbowen I do not have an answer to the first question but I have reached out to Infobip and as soon as I have a reply, will post here.

  2. For the two-way texting (2WT) system in Zimbabwe, one instance would suffice.
  3. Yes. Clients respond when prompted by the system, so it is easy to make that transition even with an ongoing deployment.
  4. RapidPro has not been considered for the Zimbabwe context, but you are right that it may be worth bringing up in those scale-up discussions if and when they happen @SMurithi

Infobip got back to me and I had a call with a representative (Jeff). A follow-up call with the technical team was suggested to get more information on the integration, including whether a single integration supports multiple carriers out of the box, and to cover any other technical concerns we may have.

Here is some related useful information from that conversation:

  • Short codes in Zimbabwe are issued by Econet, the largest telco, and issuance is currently limited to banks. The representative indicated that approval may be considered in some cases, e.g. if in our case we involved the government.
  • The customer does the application via an Econet application portal and Infobip hooks it up to their systems. For example Medic Mobile would do the application, and Infobip would hook it to their systems after we have been issued the short code.
  • Infobip issues virtual long numbers as an alternative. This is a number that is similar to a normal mobile phone number in Zimbabwe. The rest of the process would typically be similar to a short code integration.

@garethbowen I ~was going to respond~ have responded to Jeff with question 1 above, requested details on how to go about the integration, and copied you. FYI, Jeff indicated that we may need to purchase a virtual long number for the technical teams on their end to facilitate an integration, and I indicated that we would be looking to have a generic integration. This may be something to clear up further with a technical person. Are there any specific queries other than 1 above that you would like to put forward to their technical team and hopefully get us a call or a more constructive direction?

Thanks @benkags !

The generic implementation would have a configurable API key issued by Infobip after the phone number (short or long) is set up. This is very similar to the AT integration so I expect it will work well.

> RapidPro has not been considered for the Zimbabwe context but you are right it may be worth bringing it up in those scale up discussions if and when they happen

If RapidPro can work with Infobip then we can integrate with no additional Product development, so in many ways this would be preferable from our point of view. It also gives much more flexibility for messaging so it may be more future-proof. I think this would be worth investigating before developing a custom integration directly from the CHT.

> Are there any specific queries other than 1 above that you would like to put forward to their technical team and hopefully get us a call or a more constructive direction?

No, that's all I can think of right now.

@benkags Can you confirm if Medic is going to be hosting this project?

It is not clear at this point. @SMurithi correct me if I am wrong but as I understand it, the partner is yet to give us actionable information.

Correct @benkags MM will continue to host on behalf of partner.
I will advise otherwise if anything changes down the road

From the project and eng team, it sounds like we can use the RapidPro integration with Infobip, but it requires some code modifications to deploy with the CHT. @garethbowen has details. Please consult with him before picking up.

The planned SMS workflows for scale-up have the following cadence:

  • A successfully enrolled patient will receive automated SMS messages through day 7, even if they do not respond to any of the previous messages
  • If the patient does not respond to any SMS messages through day 7, a follow-up Task for a CHW would be automatically generated
  • If a patient responds to an SMS health inquiry with symptoms anytime within a 14 day window, direct messaging between the patient and nurse is initiated

@garethbowen do you need any additional detail to inform the implementation pathway?

Pulling in from slack conversation...

For implementation our choices are to:

  1. ~Write a bespoke SMS aggregator integration to Infobip (just like the Africa's Talking one)~
  2. Write a bespoke SMS aggregator integration to RapidPro
  3. Use RapidPro and Outbound Push and integrate RapidPro to Infobip (probably no Core dev required)

Option 2 seems like the best option since it would allow the CHT to treat RapidPro like a simple relay service so we can immediately support every SMS aggregator that RapidPro supports.

Option 3 is doable as a prototype (and hack), but requires logic in RapidPro that we may be better off productizing in the CHT to avoid difficulties in production/deployment.

After discussing with @garethbowen it seemed as though option 3 would be preferred since we can quickly spin it up, and using flows gives more flexibility for handling multiple gateways.

I am putting notes here after exploring that further, but the summary version is that _webapp terminating_ messages can easily and reliably be handled with existing features, whereas _webapp originating_ messages have some limitations that can delay when messages get sent, and also cause duplicates to be sent.

Webapp Terminating

Messages that are sent to the CHT can be handled by RapidPro with a simple flow that starts with a trigger for "messages not handled anywhere else".

The flow only needs to contain a webhook and error handling

The headers and body must be set:

@(json(object(
  "messages", array(
    object(
      "id", run.uuid,
      "from", replace(urns.tel, "tel:+", "+"),
      "content", results.message.value,
      "sms_sent", epoch(run.created_on),
      "sms_received", epoch(now())
    )
  ),
  "updates", array()
)))

Error handling could include retries, logging, and messages to the sender to let them know that their message was not processed.
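
For reference, the JSON body that the webhook expression is meant to produce can be sketched in Python. This is an illustration only: the function name `build_cht_payload` and all sample values are hypothetical, and it assumes that `messages` and `updates` are meant to be sibling keys (as the indentation of the expression suggests) and that timestamps are numeric epoch values.

```python
import json
from datetime import datetime, timezone


def build_cht_payload(run_uuid, urn, text, created_on, now):
    """Build the JSON body the webhook sends for one incoming SMS."""
    return {
        "messages": [
            {
                "id": run_uuid,
                # Mirror replace(urns.tel, "tel:+", "+"): strip the URN scheme.
                "from": urn.replace("tel:+", "+"),
                "content": text,
                "sms_sent": created_on.timestamp(),
                "sms_received": now.timestamp(),
            }
        ],
        # No status updates are reported alongside an incoming message.
        "updates": [],
    }


# Hypothetical sample values, for illustration only.
payload = build_cht_payload(
    run_uuid="hypothetical-run-uuid",
    urn="tel:+263770000000",
    text="example message",
    created_on=datetime(2020, 7, 10, 12, 0, 0, tzinfo=timezone.utc),
    now=datetime(2020, 7, 10, 12, 0, 5, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```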

Webapp Originating

Reusing the choices from the comment above, here are the ways that messages could be sent from the CHT to people:

  1. ~Write a bespoke SMS aggregator integration to Infobip~
  2. CHT-RapidPro messaging integration: the CHT would call the broadcast API to send messages, and poll it to check the status.
    _Pros: no need to build or manage a RapidPro flow_
    _Cons: more involved feature in CHT yet less configurable and doesn't handle multiple RapidPro channels_
  3. CHT-outbound push to RapidPro: A flow in RapidPro can be triggered when the state of a CHT message goes to pending.
    _Pros: handles multiple gateways and processing in RapidPro flows._
    _Cons: difficult to reliably trigger flow, duplicates are possible without more work_

The outbound push was prototyped to trigger a flow when a message's state became pending. The flow would then call the SMS endpoint to get a list of _all_ messages that need to be sent, process them in small batches to send them, and report the status back to the CHT. This made it easier to catch and retry any messages that previously failed to send, but it makes duplicates theoretically possible if the flow is triggered multiple times before it completes and the overlapping runs get the same set of messages. This may have changed with recent improvements to the SMS API.
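
To make the duplicate risk concrete, here is a minimal Python sketch of one way overlapping runs could avoid sending the same message twice: each run claims message ids before sending. This is not how the prototype worked. The function `select_unsent`, the shared `claimed` set, and the sample messages are all hypothetical, and a real deployment would need durable, shared storage rather than an in-memory set.

```python
def select_unsent(pending, claimed_ids):
    """Return messages not yet claimed by another run, and mark them claimed."""
    batch = []
    for message in pending:
        if message["id"] in claimed_ids:
            continue  # an overlapping run already picked this message up
        claimed_ids.add(message["id"])
        batch.append(message)
    return batch


claimed = set()  # hypothetical shared state between flow runs
pending = [
    {"id": "a", "to": "+263770000001"},
    {"id": "b", "to": "+263770000002"},
]

first_run = select_unsent(pending, claimed)
# A second trigger arrives before the first run reports back to the CHT,
# so it is handed the same list of pending messages.
second_run = select_unsent(pending, claimed)
print(len(first_run), len(second_run))  # prints: 2 0
```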

Here is what the outbound push prototype config looked like:

  "outbound": {
    "textit-gateway": {
      "relevant_to": "doc.type === 'data_record' && doc.tasks && doc.tasks[0] && doc.tasks[0].state && doc.tasks[0].state === 'pending'",
      "destination": {
        "base_url": "https://textit.in",
        "auth": {
          "type": "header",
          "name": "Authorization",
          "value_key": "textit.in"
        },
        "path": "/api/v2/flow_starts.json"
      },
      "mapping": {
        "flow": {
          "expr": "'abcdef1234567890'"
        },
        "urns": {
          "expr": "[ 'tel:' + doc.tasks[0].messages[0].to ]",
          "optional": false
        }
      }
    },

Note that the above outbound push config has limitations in that it would only trigger for the first message in tasks. Also, SMS schedules, which are in scheduled_tasks, are not being considered.
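
The limitation can be illustrated with a small Python model of a broader check. This is only a sketch of the logic: the real `relevant_to` value is a JavaScript expression in the CHT config, the function name `has_pending_message` is hypothetical, and it assumes entries in `scheduled_tasks` carry a `state` field in the same way as entries in `tasks`.

```python
def has_pending_message(doc):
    """True if any task or scheduled task on a data_record is pending."""
    if doc.get("type") != "data_record":
        return False
    tasks = doc.get("tasks") or []
    scheduled = doc.get("scheduled_tasks") or []
    # Check every entry, not only tasks[0] as the prototype config does.
    return any(task.get("state") == "pending" for task in tasks + scheduled)


doc = {
    "type": "data_record",
    "tasks": [{"state": "sent"}, {"state": "pending"}],
    "scheduled_tasks": [{"state": "scheduled"}],
}
print(has_pending_message(doc))  # True: the second task is pending
```

With the prototype's expression, this document would not match because `tasks[0]` is already `sent`.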

This option of triggering a flow is still advantageous since it permits more flexible channel setups, but we need the following two improvements:

  • Make sure that messages are sent without delay perhaps by identifying (or creating) a better CHT doc trigger when there are new pending messages, and not being so dependent on the message configuration.
  • Find a reliable way to avoid duplicates by reviewing the new SMS API behaviour, or perhaps by making the outgoing message flow non re-entrant. There is no known easy way to do this in RapidPro. One option considered in the past is to call a webhook to set and unset a semaphore in a webservice/CHT, which doesn't seem particularly robust.

After discussion with @abbyad we've settled on option 2: writing an SMS aggregator in API to send and receive messages via RapidPro. This work should be started soon but should not block 3.11.0.

@binokaryg Can you clarify whether this issue is still a priority for I-Tech Zimbabwe? If I-Tech Zimbabwe is migrating to use outbound push RapidPro integration, then this feature does not add value for I-Tech Zimbabwe. Can you confirm that I-Tech Zimbabwe is indeed migrating to outbound push RapidPro integration?

@kitsao Can you clarify that MSF Goma does not have plans to migrate from outbound push integration with RapidPro to use CHT-SMS capabilities?

We have used outbound push to RapidPro in ITECH Aurum and are planning to reuse the same method in the ITECH Zimbabwe scale-up.
Proper integration with RapidPro with delivery status in the future would still be helpful.

@binokaryg, where is the logic for the messaging flows? Which scenario is it:

  1. Using SMS workflows in the CHT, with outbound push to get the actual SMS to the network
  2. Using CHT to trigger a RapidPro flow that contains additional messaging/content logic
  3. Something else

Also, it would be helpful to see the relevant configs -- could you post a link or snippet of the outbound push?

After digging into the config for the project mentioned, it appears that scenario 1 is used at times, where RapidPro is being used for outgoing messages from the CHT:
"relevant_to": "doc.type === 'data_record' && doc.tasks && doc.tasks[0] && doc.tasks[0].state && doc.tasks[0].state === 'pending'",
These outgoing messages could be sent late, not at all, or multiple times. The status of these messages will also be unknown to the person who sent them (e.g. the status is not known in the CHT app UI). Given that, I think we should still prioritize this issue highly and work with app developers of existing/upcoming deployments to make sure it meets their needs, so that they are able to use it when it is released.

This is ready for AT on 6532-rapid-pro-sms-gateway. Documentation PR here: https://github.com/medic/cht-docs/pull/462

I have tested this using their simulator. The example defined in the docs PR acts basically as a forwarding mechanism from SMS to CHT-Core and back.
I was able to send messages and forms with the standard config. I received responses as well to my test phone.

I set up a flow and was able to generate a filled out form as well and get the responses.

Requests without a token, with an invalid token, and with results that aren't configured are all handled, but I think there is a small change that could make it better.

When we save nothing because there is an issue, our response code is 200, which the flows in RapidPro will treat as successful, so the failure state of a flow will not be triggered. I think the response code needs to be 400 in this case.

The logs below show the missing value and that the request still returns a 200:

Apr 07 15:01:47 dev-gamma-b dev-gamma-b-medic-api-logs: (dev-gamma-b-58dd8f66f7-fmlbr) | [2021-04-07 19:01:47] 2021-04-07 19:01:47 WARN: Message missing required field "id": {} 
Apr 07 15:01:47 dev-gamma-b dev-gamma-b-medic-api-logs: (dev-gamma-b-58dd8f66f7-fmlbr) | [2021-04-07 19:01:47] RES 527276ef-1d8a-4e38-b9f5-a3d539f03a8a 34.236.102.117 - POST /api/v1/sms/radpidpro/incoming-messages HTTP/1.1 200 11 4.611 ms

Thanks for the feedback. What error message would you prefer we send back on error?

That's a good question. I don't know what would be useful in our use cases. A failing flow means we could prompt back with a message saying "X is invalid, fix X". So at least an indication of why we didn't save anything, so that the configurer of the flow can tell the sender that they need to provide the correct value.

I've changed the code to check if no messages were created ({ saved: 0 }) and return a 400 with a message when this happens.
Given that more endpoints (africas-talking and gateway /sms) use the exact same function to create messages, I'm reluctant to change how it works so late in the dev cycle so that it actually returns validation errors.
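
The behaviour described can be sketched in Python as a mapping from the save result to an HTTP status. This is an illustration rather than the actual API code: the function name `incoming_messages_response` and the wording of the error body are hypothetical, and only the `saved` count comes from the comment above.

```python
def incoming_messages_response(result):
    """Map the outcome of saving messages to an HTTP status and body."""
    if result.get("saved", 0) == 0:
        # Nothing was created: signal failure so the flow's error branch runs.
        return 400, {"error": "Message was not saved"}  # hypothetical wording
    return 200, result


print(incoming_messages_response({"saved": 0})[0])  # 400
print(incoming_messages_response({"saved": 1})[0])  # 200
```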

I think the 400 is good enough at this point. An additional feature request can be logged detailing the needs for a failure state if that is not enough.

I think this is ready to merge. Sending and receiving is working well. Failure states are hitting the flows correctly.

Thanks @newtewt .

Another part of the RapidPro workflow is outgoing messages getting correct states from RapidPro (it involves querying all messages that are in a non-final state that exist in the medic database), and also backing off from querying when RapidPro starts returning 429s (rate-limited).

Is that working as expected as well?

I'm getting the responses from cht-core through rapidpro to my phone. Saw the message go through the different states, received by gateway, delivered.

EX: Thank you Contact for registering Patient. Their ID is 12345. If they are pregnant, please enroll in ANC with the P form.

How can I tell that I'm getting 429 vs just an issue with something? Logs?

Yes, in the logs you should see a failed request to rapidpro. The error code should be 429 (and a message like "Request was throttled. Expected available in "something" seconds.") and you should see the error being logged once, and then 1 minute later the next "iteration" of polling should start.
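
As a rough Python sketch of that polling behaviour, assuming status lookups are done one message at a time: messages already in a final state are skipped, and the iteration stops at the first 429 so the remaining messages wait for the next iteration. The function `poll_statuses`, the injected `fetch_status` callable, and the set of final states are hypothetical; scheduling the next iteration (about one minute later, per the comment above) is left to the caller.

```python
def poll_statuses(messages, fetch_status, final_states):
    """Update non-final messages; stop this iteration at the first 429."""
    updated = []
    for message in messages:
        if message["state"] in final_states:
            continue  # nothing more to learn about this message
        status_code, state = fetch_status(message["id"])
        if status_code == 429:
            # Rate limited: log once and leave the rest for the next iteration.
            print("Request was throttled; backing off until next iteration")
            break
        if status_code == 200:
            message["state"] = state
            updated.append(message["id"])
    return updated


# Fake responses standing in for the remote status lookups (hypothetical).
responses = {"m1": (200, "delivered"), "m2": (429, None), "m3": (200, "sent")}
messages = [
    {"id": "m0", "state": "delivered"},
    {"id": "m1", "state": "sent"},
    {"id": "m2", "state": "sent"},
    {"id": "m3", "state": "pending"},
]
updated = poll_statuses(messages, lambda mid: responses[mid], {"delivered", "failed"})
print(updated)  # ['m1']
```

In this run `m0` is skipped as final, `m1` is updated, and the 429 on `m2` ends the iteration before `m3` is queried.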

I think the rate limit is working as well. I hit the limit, registered a new person, eventually when my limit was over I got a response about my patient being registered.

Thanks for the update @newtewt !

Merged to master.
