element-web: Button to record audio snippets and send them as audio events (voice messages)

I'm interested in exactly this for a little side project for my daughter. In this case, I'm looking to run a Matrix client on a headless device (a CHIP, RPi, etc.) to act as a kids' "walkie-talkie". We have two CHIPs arriving in the next week!

I may be able to help with this if I can carve out some time...

freelock on 29 Jun 2016

@aviraldg did anything ever land for this?

ara4n on 18 Aug 2016

I fear this got stuck in https://github.com/matrix-org/matrix-doc/issues/310

richvdh on 28 Sep 2016

Close this?

Edit: Oh, I think @aviraldg hasn't merged it into master?

spacekitteh on 10 Apr 2017

@spacekitteh There was no consensus on the formats to be supported, so it's still unmerged. I currently do not have the time to create a proposal for it, but I believe the core team would welcome one if it was created and then it could be merged.

aviraldg on 10 Apr 2017

Just adding a link here to the current PR... https://github.com/matrix-org/matrix-react-sdk/pull/690

spacekitteh on 10 Apr 2017

If this is akin to push-to-talk in VoIP apps like Mumble, TeamSpeak, etc., then I'd be very interested in this getting into Riot! :O I could use it right away!

BloodyIron on 28 Nov 2017

Can we get this bumped to a milestone or higher priority? Seriously, push to talk is actually a big deal for our implementation! I know a lot of other people will just not use Riot because it doesn't have push to talk. The value of this needs to be revisited! :(

BloodyIron on 3 Jan 2018

@BloodyIron Mumble and TeamSpeak are real-time voice chat systems. Matrix is message/event-oriented.

It would be interesting if there were an addon to associate Matrix rooms with Mumble servers and channels.

alphapapa on 16 Jan 2018

@BloodyIron this is more akin to the old Facebook and WhatsApp voice messages: record audio at the push of a button, then send on release.

t3chguy on 16 Jan 2018

@alphapapa so what? Riot (which is what this is categorised under) has voice and video conferencing. Furthermore, Matrix does interface with other voice communication tech such as FreeSWITCH. So push-to-talk for Riot makes a LOT of sense, especially in a busy channel.

If there is no push-to-talk, a lot of people simply won't use Riot. I'm not just talking about myself: I run a large gaming community, and almost all of them require push-to-talk for whatever voice tech they use. And voice/video tech is what Riot advertises itself as.

BloodyIron on 16 Jan 2018

@t3chguy that's going to cause a lot of latency problems, especially if you need to get info to someone in a sensitive time-frame, or if multiple people are talking. Nobody is going to want push to talk if it means they can only talk for a short period before they're heard at all. At scale it's going to lead to one big echo chamber, and actually make things worse.

Have you ever had to listen to your own voice when it was time-delayed? It's extremely disorienting, and I can seriously see not sending until release causing new problems.

BloodyIron on 16 Jan 2018

@BloodyIron it's the concept of sending audio messages, like voicemails; it's not push-to-talk for WebRTC calls.

Riot advertises itself as open source Team Collaboration, not voice/video tech.

t3chguy on 16 Jan 2018

@t3chguy : http://i.imgur.com/hnxQsxc.png "VOIP & VIDEO CALLING"

Also, having the audio not send till release the button will lead to abuse when trolls find it, as they will just send large audio bombs. Seriously, I see no good reason why the audio should not be sent as soon as the button is pressed, but I see a LOT more problems being caused if it isn't (audio not sending until release).

BloodyIron on 16 Jan 2018

That's a feature within

It'd be no different than sending an audio file from your computer, so

Also, having the audio not send till release the button will lead to abuse when trolls find it, as they will just send large audio bombs.

is moot

This is simply not the issue you should be arguing in. This issue is not for the feature you care about.

Read the OP:

Button, which when pushed records audio and sends it as m.audio that automatically plays.

m.audio is a message type, and events are sent after uploading the media they refer to, so they can't be "live".

t3chguy on 16 Jan 2018
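To make that ordering concrete, here is a minimal TypeScript sketch assuming the Matrix client-server API: the recording is first uploaded to the media repository (`POST /_matrix/media/v3/upload`, which returns an `mxc://` content URI), and only then is an `m.room.message` event with `msgtype: "m.audio"` sent that points at it. The helper names `buildAudioContent` and `sendEventPath` are hypothetical, and the `v3` path prefix is an assumption (servers of this era used `r0`).

```typescript
// Metadata carried in the `info` object of an m.audio message.
interface AudioInfo {
  mimetype: string;
  size: number; // bytes
  duration: number; // milliseconds
}

interface AudioMessageContent {
  msgtype: "m.audio";
  body: string; // fallback text, usually the file name
  url: string; // mxc:// URI returned by the media upload
  info: AudioInfo;
}

// Step 2 of the flow: build the event content once the upload has succeeded.
function buildAudioContent(mxcUri: string, fileName: string, info: AudioInfo): AudioMessageContent {
  if (!mxcUri.startsWith("mxc://")) {
    throw new Error("expected an mxc:// URI from a completed upload");
  }
  return { msgtype: "m.audio", body: fileName, url: mxcUri, info };
}

// Path for sending the event; the transaction ID lets the client retry safely.
function sendEventPath(roomId: string, txnId: string): string {
  return `/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}/send/m.room.message/${encodeURIComponent(txnId)}`;
}
```

Because the event only exists once the upload has finished, a recipient can never receive a half-recorded message, which is why this feature is not real-time push-to-talk.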

@BloodyIron I feel like you're lacking some context here. What you're asking for is real-time, multi-party, channel-oriented voice chat, i.e. Mumble, TeamSpeak, Discord.

Discord is interesting to compare Matrix with since it's also got chat rooms like Matrix. But Discord is a large system with lots of funding that runs on AWS. Matrix/Riot is a relatively small, barely funded organization and service. The matrix.org homeserver already gets overloaded and slow sometimes. Providing real-time, channel-based voice chat as you desire would require much more infrastructure.

Ideally, sure, Matrix would provide everything. And maybe someday it will. But in the meantime, something like Mumble already provides real-time, channel-based voice chat in a distributed way, with lots of servers available. If there were a standard way to interface Matrix rooms with Mumble servers and channels, it would require no additional infrastructure on the Matrix side, as well as avoiding reimplementing all the functionality that Mumble provides. And interfacing a Matrix client with the Mumble client could make it work seamlessly and transparently to the user.

Do you understand what I mean?

alphapapa on 16 Jan 2018

@t3chguy is right, I misread the scope of this particular issue. Sorry about that! I'll open another one more appropriate for what I am seeking. My bad, sorry for taking up your time with my silliness. :(

@alphapapa just to respond to what you said before I stop being a silly goose in this thread: I wasn't talking about when something like what I described would be implemented, but how. That's not actually relevant to this discussion, though, so I'm going to exit stage left.

Again, sorry for the misunderstanding!

BloodyIron on 16 Jan 2018

I do completely agree that PTT for both native calling and Jitsi conferencing would be useful, FWIW.

t3chguy on 16 Jan 2018

For simplicity, here are the two general types of PTT that people use today:

Nextel PTT (Always online, push notifications with audio, Example: Zello)
Remember this? I was too young to have a phone, but my parents each had one, and their PTT was far ahead of everything else at the time. There were no smartphones then. It was faster than calling, and the message usually got through. These days texting has won out, partially because if you put your phone down and came back to it, you could just read the message, whereas a PTT message you might miss. The other reason texting won out is that you don't necessarily want the person you are talking to IRL to hear what the person on the phone said. There are pros and cons to this method. Zello, which does this, has a decent following; in fact, I know someone who uses Zello with all his camping buddies.

Mumble PTT (Must be running, no push notifications for audio, login/join/accept call to use and hear PTT)
This one is more what people use now, especially gamers. Even my friend on Zello would probably use this instead: the way he uses Zello is to schedule an hour with everybody, and everybody goes online at the same time and talks. There is no reason to use the always-on Zello if you are just going to chat in a room by voice. Basically, you mute the microphone and hold down a keyboard button (or the headphone button on a phone) to unmute. If someone is trying to be quiet IRL and can't get around it, they can plug in headphones to listen and type in the room instead of talking. Optionally, a text-to-speech feature would be cool, but that is a separate subject.

Those are the two types. I believe some of the confusion in this thread is that people are talking about two different PTTs. Almost everyone is trying to go the Nextel PTT route, and I will admit it's pretty cool and I want it. But the Mumble-type PTT is what people actually seem to need: you set up a room for the game, and everyone logs on and plays the game (not necessarily games, but games are the common example). Text is king unless you are doing something with your hands. So, as the previous comment suggests, I recommend adding a temporary unmute button to native calling and Jitsi conferencing until someone implements the Nextel PTT version. A keyboard button on desktop and the headphone button on a phone would be excellent shortcuts for this button.

Edit: Apparently an issue was just added an hour ago for the Mumble-style PTT: #5993. My apologies for spamming a finished thread.

josephtocci on 16 Jan 2018
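The Mumble-style "mute, then hold a key to unmute" behaviour described above can be sketched in a few lines of TypeScript. This assumes the microphone is exposed as tracks with a writable `enabled` flag, as browser `MediaStreamTrack`s are; the `PushToTalk` class is hypothetical and not part of Riot.

```typescript
// Anything with a writable `enabled` flag; a browser MediaStreamTrack fits this shape.
interface ToggleableTrack {
  enabled: boolean;
}

// Hold-to-unmute: tracks stay muted unless the PTT key/button is held down.
class PushToTalk {
  private readonly tracks: ToggleableTrack[];
  private held = false;

  constructor(tracks: ToggleableTrack[]) {
    this.tracks = tracks;
    this.setEnabled(false); // start muted
  }

  // Call on keydown / button press. Repeated keydown events while held are ignored.
  press(): void {
    if (this.held) return;
    this.held = true;
    this.setEnabled(true);
  }

  // Call on keyup / button release.
  release(): void {
    if (!this.held) return;
    this.held = false;
    this.setEnabled(false);
  }

  get transmitting(): boolean {
    return this.held;
  }

  private setEnabled(on: boolean): void {
    for (const track of this.tracks) track.enabled = on;
  }
}
```

In a browser, the tracks would come from `stream.getAudioTracks()` on a `getUserMedia` stream, with `press()`/`release()` wired to `keydown`/`keyup`. A disabled audio track outputs silence, so the call itself never has to be renegotiated.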

@josephtocci I got muddled about the scope of this particular "issue". I didn't see that it was for a pre-recorded message that automatically plays. So, I think the PTT stuff that I was talking about really isn't on-point here.

BloodyIron on 16 Jan 2018

I got muddled too, that is why I wrote that huge thing. Now that I read the issue again, I don't think I helped. lol

He does say push-to-talk in the title, which is misleading. Oh well. What he actually wants is pretty cool too. When you send an audio file, everyone in the room should be able to just click on it to hear it. Makes perfect sense.

Regardless I look forward to #5993 being done so I can use it and recommend more people to Riot.

josephtocci on 16 Jan 2018

@josephtocci perhaps lend your thoughts in #5993 too then? ;)

Yeah the feature in this issue #1358 is neat!

BloodyIron on 16 Jan 2018

@aviraldg
Please change the title of this issue to Click-to-talk, as that will probably remove some confusion next time someone looks at it. lol

Edit: or click-to-play-audio or something.

josephtocci on 16 Jan 2018

@josephtocci - There's another type of "push to talk" that works like Nextel PTT with message storage. I thought that was the direction this was going. Here's an example: https://youtu.be/oyHv62md24c

That would definitely take more infrastructure, though.

dpflug on 18 Jan 2018

Seems very similar to Zello, except it has better text and photo messages.

Adding a feature such as this one and #5993 would bring more people over, the more users that join make Riot exponentially more useful. Infrastructure is a solvable problem. Getting new users is a much harder problem than infrastructure.

Option 1, improve infrastructure
As far as infrastructure goes, I saw the default matrix.org website basically as a demo of the features. Why other people don't see it that way, I don't understand. One option I think you guys haven't tried is to set up AWS and have people pay you for their own homeserver that they don't have to think about, similar to Minecraft Worlds or whatever it's called. You guys have to make money somehow. Just don't make a "Pro" client like some companies; that would alienate most developers. (Looking at you, Slack.)

Option 2, add feature audio/video PTT to reduce load
As awesome as the Nextel PTT thing is, I don't think it's really necessary anyway if we add push-to-talk to conference calls. If anything, that would reduce the need for infrastructure. Instead of all the microphones and cameras being on in an audio conference call, all the cameras would be turned off and microphones would only be on when someone is talking. You could even make the same button that unmutes the microphone turn on the camera. In this way, even if you added users, all existing and new users would reduce their load under the guise of reducing distraction during video and audio calls. I don't necessarily want my camera on all the time, but when I'm talking I want it on. So have the camera on, but don't send anything to Matrix unless I hold down the PTT button to unmute the microphone. You could make this the default for conference calls and not for 1-on-1 calls. Does that solve the infrastructure problem? Maybe sometime in the future add the Nextel PTT; it is awesome, but if you say it's too much for now, then OK.

My recommendation is Option 2, it reduces load, and you can have an audio PTT only/no camera conference option for gamers, and a full conference call option for people that have 20 eyes to look at 20 different camera feeds for their friends. lol. Probably the best option considering infrastructure, and nothing has to change server side because load should go down even if you add a healthy amount of users.

josephtocci on 18 Jan 2018

Please can we keep this ticket related to audio snippets only? There's already another issue for PTT.

turt2live on 18 Jan 2018

I'm not personally interested in this, as I prefer text/IRC, but I know some people who use WhatsApp and do not want to use Matrix because I cannot tell them it supports this. That's how I discovered this issue.

To put this into perspective, I don't see why this requires any specification changes whatsoever. It is essentially asking for a client wrapper that easily enables recording and uploading an audio file, without having to resort to third-party applications that take a person out of the riot.im webapp interface.

Currently I can experimentally do everything the initial request asked for, if I use ffmpeg to record microphone events into a file and then separately upload them. Then, riot.im already offers to play that audio file.
This is an inconvenient workflow, and riot.im should be able to handle that all inside the browser.

vector-im/riot-android#1762 seems to be asking for the same thing.

eli-schwartz on 18 Jan 2018

@eli-schwartz

custom typing indicator for recording audio?

would be one of the sub-parts which would require spec

t3chguy on 18 Jan 2018

No it doesn't. If you want a custom typing indicator, you can add that separately.

It would just mean that Matrix doesn't natively know about the convenience function offered client-side for uploading audio files.
Creating real solutions to real problems (like WhatsApp users not having feature parity when it comes to conveniently uploading voice notes, e.g. messages recorded while driving, which do not suffer from the chronic failure of voice-to-text to actually produce predictably intelligible messages) using existing functionality should be prioritized over adding new specifications to solve those same problems in ever-so-slightly-different ways.

I guess you could even abuse the typing indicator by not bothering to differentiate between typing and voice recording, but I confess my primary matrix client is weechat, which doesn't use them.

...

If waiting for a typing indicator spec delays this by another 1.5 years, that would probably be counterproductive.

eli-schwartz on 18 Jan 2018

Except typing indicators are shown to OTHER people, so there must be some standardised format/message sent over the wire to signify that you're recording a message so your peers can display such an indicator.

t3chguy on 18 Jan 2018
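For reference, the existing typing notification that eli-schwartz proposes reusing is a single request in the Matrix client-server API: `PUT /_matrix/client/v3/rooms/{roomId}/typing/{userId}` with a `typing` flag and a `timeout` in milliseconds. The minimal TypeScript sketch below only builds that request (no HTTP client); the `typingRequest` helper is hypothetical, and the `v3` prefix is an assumption. A distinct "recording audio" indicator would need an extra field that other clients understand, which is t3chguy's point.

```typescript
interface TypingRequest {
  method: "PUT";
  path: string;
  body: { typing: boolean; timeout?: number };
}

// Build the standard typing-notification request. A recording client could
// send typing=true while the microphone is open and typing=false on send/cancel.
function typingRequest(roomId: string, userId: string, typing: boolean, timeoutMs = 30000): TypingRequest {
  const path = `/_matrix/client/v3/rooms/${encodeURIComponent(roomId)}/typing/${encodeURIComponent(userId)}`;
  // `timeout` (ms) only applies while typing; the server clears the flag when it expires.
  const body = typing ? { typing: true, timeout: timeoutMs } : { typing: false };
  return { method: "PUT", path, body };
}
```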

Show them the same typing indicator you show them to signify that you are attaching a file. :stuck_out_tongue:

That is, after all, the only thing that is happening here.

...

AFAIK no one sees a typing indicator when I am typing a message into weechat using the matrix.lua plugin. As for the riot.im webapp, there is a preference "Don't send typing notifications".
Clearly there is no hard requirement that such a typing indicator actually be sent (or received -- weechat doesn't show them even when riot-web sends them). The IRC bridge certainly doesn't invade freenode, install itself onto the computers of IRC users, and "update" peoples' IRC clients to send typing indicators to Freenode in order for the Matrix IRC bridge to utilize them.
Therefore, the lack of such a typing indicator should be completely orthogonal to the purpose of adding audio file composition into riot-web.

If a spec is separately developed for sending a typing indicator message over the wire, then audio file composition can be wired up to that spec.

eli-schwartz on 18 Jan 2018

Most people will want it similar to how other clients show whether someone is typing on mobile or somewhere else. It's not very difficult to add (none of this really is) - it just requires someone to do it, much like the other 2200 issues on this repo.

turt2live on 19 Jan 2018

I don't see why this requires any specification changes whatsoever.

IIRC, the main spec hangup was in defining what audio formats would be allowed, because different browsers record using different formats and support playing different formats. The last time it was discussed, I think the consensus was that allowing both opus in ogg (which I think Firefox records to) and opus in webm (which I think Chrome records to) was fine, but nobody pushed forward with this after that discussion.

edit: see https://matrix.to/#/!htOanVjArJyYUFjsSC:matrix.org/$1504011455166229biWUr:matrix.org (in #webrtc:matrix.org)

uhoreg on 19 Jan 2018
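Given the format question above, a client can ask the browser which of the two discussed containers it can record and use the first one that works. This is a minimal TypeScript sketch, assuming the preference order (Opus in Ogg first, then Opus in WebM) and the hypothetical helper `pickRecordingMimeType`; in a browser the predicate would be `MediaRecorder.isTypeSupported`.

```typescript
// Candidate recording formats from the discussion, in order of preference (an assumption).
const PREFERRED_RECORDING_TYPES = ["audio/ogg;codecs=opus", "audio/webm;codecs=opus"] as const;

// Return the first candidate the recorder supports, or undefined if none is.
// The predicate is injected so the logic does not depend on a browser global.
function pickRecordingMimeType(
  isTypeSupported: (type: string) => boolean,
  candidates: readonly string[] = PREFERRED_RECORDING_TYPES,
): string | undefined {
  return candidates.find((type) => isTypeSupported(type));
}
```

Usage in a browser would be `pickRecordingMimeType((t) => MediaRecorder.isTypeSupported(t))`; the chosen type would then also be what the sender reports as the event's `info.mimetype`, so receivers know which container they are getting.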

Button, which when pushed records audio and sends it as m.audio that automatically plays.

Maybe add "autoplay" settings for incoming audio recordings (PTT messages), off by default? If you are on a bus reading a conference room in Matrix and someone sends a PTT recording to it, everyone on the bus will hear the message.

On the other hand, if you are driving or riding a bike, your friends are too, and you create a conference room to hear each other's PTT messages, then Riot must always play PTT messages, even if the app doesn't have focus and the display is off.

Result: for privacy, PTT autoplay must be off by default in a room, but it could be set to one of six options:

No autoplay of PTT messages (default)
Autoplay PTT messages when the Riot app is in the foreground (has focus)
Always autoplay PTT messages (even when the display is off) for 1 hour
Always autoplay PTT messages (even when the display is off) for 2 hours
Always autoplay PTT messages (even when the display is off) for 4 hours
Always autoplay PTT messages (even when the display is off) forever (use with caution)

P.S.: ptt (push-to-talk or Walkie-Talkie) functionality is simply "autoplay" of incoming audio messages.

progserega on 28 Aug 2018
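The six autoplay options proposed above collapse into three kinds of setting, with the 1/2/4-hour variants differing only in their expiry time. A minimal TypeScript sketch, where `AutoplaySetting`, `alwaysFor`, and `shouldAutoplay` are hypothetical names and the caller is assumed to know whether the app has focus:

```typescript
// Per-room autoplay preference for incoming voice (PTT) messages.
type AutoplaySetting =
  | { kind: "off" } // default: never autoplay
  | { kind: "foreground" } // only while the app has focus
  | { kind: "always"; untilMs: number | null }; // even with the display off; null = forever

interface PlaybackContext {
  appFocused: boolean;
  nowMs: number;
}

const DEFAULT_AUTOPLAY: AutoplaySetting = { kind: "off" };
const HOUR_MS = 60 * 60 * 1000;

// Build a time-limited "always" setting, e.g. alwaysFor(1, Date.now()) for the 1-hour option.
function alwaysFor(hours: number, nowMs: number): AutoplaySetting {
  return { kind: "always", untilMs: nowMs + hours * HOUR_MS };
}

function shouldAutoplay(setting: AutoplaySetting, ctx: PlaybackContext): boolean {
  if (setting.kind === "off") return false;
  if (setting.kind === "foreground") return ctx.appFocused;
  // "always": ignores focus, but expires once the chosen window has passed
  return setting.untilMs === null || ctx.nowMs < setting.untilMs;
}
```

Storing an absolute expiry time rather than a countdown means the setting lapses on its own, even if the app was suspended in between.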

WhatsApp distinguishes between audio files and voice messages.
Here is an argument, from a user's perspective, for using a new message type (or some optional flag) instead of just sending a normal audio file:
They are functionally different:

If I see a voice message, I know it was just now recorded by the person I'm talking to. I know it's current information and a primary way to move the conversation forward.
If I see an audio file, I know it was likely recorded by a third party and will not contain time-critical direct communication. I can treat it like any other attached file and open it when my friend tells me why I should care about it.
If I forward a voice message in WhatsApp, it is sent as a regular audio file.

I realize that this guarantee of "I just recorded this" is difficult to give in the Matrix open system. In my opinion, we shouldn't implement an actual guarantee, but just allow a way to communicate whether this was a voice message or a third-party audio file. If somebody would maliciously send the wrong flag, it wouldn't cause any security problems.

ludwigbald on 5 Feb 2019
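The optional-flag idea above can be sketched as an extra key on an ordinary `m.audio` content object, so clients that don't know the flag still show a normal audio file. In this minimal TypeScript sketch the key name `com.example.voice_message` is purely hypothetical (not from the Matrix spec), as are the helper names; forwarding strips the flag, mirroring the WhatsApp behaviour described above.

```typescript
// Hypothetical marker key; NOT defined by the Matrix spec.
const VOICE_FLAG = "com.example.voice_message";

interface AudioContent {
  msgtype: "m.audio";
  body: string;
  url: string;
  [extra: string]: unknown; // room for optional, unknown-to-some-clients fields
}

// Sender side: mark a freshly recorded clip as a voice message (returns a copy).
function markAsVoiceMessage(content: AudioContent): AudioContent {
  return { ...content, [VOICE_FLAG]: true };
}

// Receiver side: anything without the flag is treated as a regular audio file.
function isVoiceMessage(content: AudioContent): boolean {
  return content[VOICE_FLAG] === true;
}

// Forwarding turns a voice message back into a plain audio file.
function contentForForwarding(content: AudioContent): AudioContent {
  const copy: AudioContent = { ...content };
  delete copy[VOICE_FLAG];
  return copy;
}
```

As ludwigbald notes, nothing here guarantees the clip was just recorded; the flag only communicates intent, and a false flag does no harm beyond how the message is displayed.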

In the African context I referenced in that issue, the connections are so bad that a PTT-like use would often be impossible. So having the voice messages appear and requiring tapping to hear them would be fine, and often preferable.

Biep on 21 May 2019

I am surprised Telegram hasn't been mentioned here. It's another IM app that implements audio (and video) snippets making it difficult to advocate for Riot use when this is not implemented.

MagicFab on 17 Dec 2019

That would be a very useful feature!
I would also like to have some kind of walkie-talkie (real-time) functionality like the one announced for MS Teams: https://www.theverge.com/2020/1/9/21058313/microsoft-teams-walkie-talkie-push-to-talk-feature-preview (Jitsi push-to-talk?!)

P.S.: ptt (push-to-talk or Walkie-Talkie) functionality is simply "autoplay" of incoming audio messages.

The walkie-talkie feature is real-time, not just autoplay. There are pros and cons to both.
But +1 for autoplay. It would be the best option for this kind of communication on the Matrix protocol itself.

Real-time communication should be handled by Jitsi or a better framework (e.g. Mumble).
PTT for gamers should also include a voice activation function (client-side).

fti7 on 14 Jan 2020

A step that could help would be to add support for using the inline player (msgtype "m.audio") for .ogg, .opus, and other relevant formats. It works with .m4a for now (I don't know about .mp3).

tuxayo on 20 Mar 2020

Here is the issue for more formats support: https://github.com/vector-im/riot-web/issues/7370

tuxayo on 20 Mar 2020
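One way a client could decide when to show the inline player, rather than hard-coding a list of extensions, is to ask the browser about the event's mimetype via `HTMLMediaElement.canPlayType`, which answers `""`, `"maybe"`, or `"probably"`. A minimal TypeScript sketch, assuming the sender filled in `info.mimetype`; `shouldUseInlinePlayer` is a hypothetical helper, and falling back to a plain download link otherwise is a design choice.

```typescript
// Mirrors the return type of HTMLMediaElement.canPlayType().
type CanPlayAnswer = "" | "maybe" | "probably";

// Decide whether an m.audio event should get the inline player, based on
// whether this browser reports any chance of playing its mimetype.
function shouldUseInlinePlayer(
  mimetype: string | undefined,
  canPlayType: (type: string) => CanPlayAnswer,
): boolean {
  if (!mimetype || !mimetype.startsWith("audio/")) return false;
  return canPlayType(mimetype) !== "";
}
```

In a browser, the probe would be something like `const probe = document.createElement("audio")` with `(t) => probe.canPlayType(t)`, so .ogg/.opus files would get the player wherever the browser can actually decode them.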

Is this feature planned to be developed? If yes, do we have an ETA?
I've been trying to get friends to switch to Matrix (from Telegram and WhatsApp), but there are two features they ask for whose absence prevents them from making the switch:

voice messages: you press the button, you talk, you release (or press it again), and it gets sent. Using another app to record and switching back and forth is a big no-no for most users.
ephemeral/self-destructing messages: messages that get deleted after a configured amount of time, ideally configurable in the UI. This feature has already been partially implemented (beta) in Synapse.

These are almost mandatory features for IM apps nowadays, since all competing apps have them.

Hexalyse on 19 May 2020
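The press, talk, release interaction described above is essentially a two-state machine. This minimal TypeScript sketch keeps it free of browser APIs by returning an action for the caller to perform (start the recorder, send, or discard); the class name, action names, and the 500 ms minimum length used to ignore accidental taps are all hypothetical.

```typescript
type RecorderState =
  | { phase: "idle" }
  | { phase: "recording"; startedAtMs: number };

type RecorderAction =
  | { type: "none" }
  | { type: "startRecording" }
  | { type: "discard" }
  | { type: "send"; durationMs: number };

const MIN_DURATION_MS = 500; // hypothetical: shorter presses are treated as accidental taps

class VoiceMessageButton {
  private state: RecorderState = { phase: "idle" };

  // Button pressed: begin recording unless already recording.
  press(nowMs: number): RecorderAction {
    if (this.state.phase === "recording") return { type: "none" };
    this.state = { phase: "recording", startedAtMs: nowMs };
    return { type: "startRecording" };
  }

  // Button released: send the clip, or drop it if it was too short.
  release(nowMs: number): RecorderAction {
    if (this.state.phase !== "recording") return { type: "none" };
    const durationMs = nowMs - this.state.startedAtMs;
    this.state = { phase: "idle" };
    return durationMs < MIN_DURATION_MS ? { type: "discard" } : { type: "send", durationMs };
  }

  // E.g. the user slides away from the button: throw the recording away.
  cancel(): RecorderAction {
    if (this.state.phase !== "recording") return { type: "none" };
    this.state = { phase: "idle" };
    return { type: "discard" };
  }
}
```

A "send" action would then trigger the upload-then-send flow, with `durationMs` available for the event's `info.duration`.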

Hope this isn't considered as noise, but in case some weren't tracking the news on this, I just wanted to share how the popular WeChat service of China is now to be blocked to millions in the U.S. (as other platforms are already blocked in China) and offer the notion of the potential opportunity for the Matrix platform in appealing to such a large number of users if voice messages were to be implemented.

As a regular WeChat user (now based in China) who communicates with relatives back in the U.S., I can personally live without payments, which Matrix currently lacks compared to WeChat (the only other feature I would miss), but the client's inability to leave brief voice messages is a pretty fundamental gap.

If that could be implemented, I think it would be a much stronger "sell" to WeChat users, especially given that, in my experience, Element's video/audio quality has actually been much better and more stable than even WeChat's for international communications. (And if payments were integrated, Matrix might truly become a force to be reckoned with, especially given how accustomed Chinese users are to paying with WeChat everywhere from the subway to restaurants, offices, etc.)

The domain matrix.org is blocked in China as well as the Element app in app stores (but not element.io) but with this of course being a federated, HTTPS-based protocol, unless China were to block all foreign sites, I don't see that China would seek to cut themselves off from the world if the Matrix protocol expands, just as they have not sought to block email as a whole.

While I understand Matrix is being driven by the company behind Element, with a need to spend limited resources in adequate measure on its own interests and sustainable business model, and thus might be concerned about a China focus given that its matrix.org accounts are unavailable here (except to those on VPN), I would think that a greater attention to the ready distributability of the open source Synapse server implementation (e.g., through Ubuntu package managers) might also help gain adoption for the protocol, and once the Matrix protocol became more of a proven federation of servers, China would come to see Matrix.org as just another service in a truly international system, like Yahoo Mail among other internationally distributed email servers, rather than solely as a gateway to the West. They could, as with other countries, still block domains, but not be as likely to block average users, and give up on blocking some whole domains as well. (And such a genuine federation may promote trust in the decentralized, truly open nature of the system elsewhere as well.)

Another complementary approach might perhaps be applying to have a domain and hosting of matrix.org in China. While being subject to its laws (to the extent E2EE would even raise problems), the federation as a whole would not be restricted. This could open you to many Chinese users (and them in turn to much of the world as they can currently through email).

Anyways, I apologize for the seeming tangents, but it does strike me that they could be potentially relevant as far as prioritization of this issue especially. Thanks!

brettz9 on 19 Sep 2020

Any update to this?

EchedeyLR on 3 Oct 2020

Any update to this?

Pablini on 18 Oct 2020

Any update to this?
I don't use GitHub that much; is there a way I can offer a bounty for someone to implement this under a proper free software license and add it to the Android app and web?

Pablini on 18 Oct 2020

@Pablini there are some websites you can use to add bounties to GitHub issues like https://www.bountysource.com

aaronraimist on 18 Oct 2020

From a user's point of view, this is the only major missing feature; I hope it is considered.

Atalonica on 6 Nov 2020

Cross referencing:
this feature is also wished for in android: https://github.com/vector-im/element-android/issues/29
and it seems both are waiting for an update of the Matrix protocol https://github.com/matrix-org/matrix-doc/pull/2516

gabmert on 24 Nov 2020

The spec change would be nice, but it is in no way a blocker to the functionality!
All it takes is someone to go and implement it.

ludwigbald on 24 Nov 2020

Indeed. You can already send a voice memo if you record it using a program like Audacity and upload the audio file. My understanding was that this feature request is just for a built-in way to record those audio files.

JimmyCushnie on 24 Nov 2020

I think that for Element to really be viable and formidable for mass adoption, all of these features are important. This is a big one. I use voice memos all the time on WhatsApp and Signal. Just tap-and-hold, record, and release: it's so easy. We need to have this in Element on desktop and mobile.

morrisonbrett on 25 Nov 2020
