Support real-time access to audio stream
This feature provides real-time access to the audio stream for application code, allowing the application to monitor raw audio frames coming from the microphone and/or sent to the speaker device, and to implement custom processing of these frames (e.g. to add real-time transcription).
OSX
Real-time access to the audio stream is provided through a new method that enables/disables audio monitoring of a particular stream direction, and a notification delegate that receives an audio frame data object holding information about each frame.
@protocol GSEndpoint <NSObject>

/**
 Enable audio processing support for the requested stream type
 @param streamType values: 0 - disable support; 1 - mic stream; 2 - speaker stream; 3 - both streams
 @returns result of the operation
 */
- (GSStatus) enableAudioMonitor:(int) streamType;

@end

@protocol GSEndpointNotificationDelegate <NSObject>

/**
 Called when an audio frame is received.
 @see GSEndpointEvent
 */
- (void) audioFrameReceivedNotification:(GSAudioFrame*) audioFrame;

@end

/**
 Audio frame data structure supplied with the notification delegate.
 @field direction indicates the collected media stream type: mic or speaker
 @field samples holds an array of collected media samples in the received frame
 @field length holds the count of samples stored in the array
 @field samplingFrequency holds the sampling frequency
 @field isStereo indicates whether the received frame content is stereo or mono
 @see GSEndpointEvent
 */
@interface GSAudioFrame : NSObject
{
@private
    int direction;
    NSArray *samples;
    int length;
    int samplingFrequency;
    bool isStereo;
}

@end
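For illustration, here is a minimal usage sketch; the MyAudioMonitor class name is hypothetical, and the sketch assumes an endpoint object conforming to GSEndpoint with the delegate registered on it (the registration API is not part of the excerpt above):

// Hypothetical delegate implementation; the MyAudioMonitor name is illustrative only.
@interface MyAudioMonitor : NSObject <GSEndpointNotificationDelegate>
@end

@implementation MyAudioMonitor
- (void) audioFrameReceivedNotification:(GSAudioFrame*) audioFrame {
    // Invoked for every monitored frame once monitoring is enabled;
    // custom processing (e.g. real-time transcription) would go here.
}
@end

// Elsewhere, with `endpoint` conforming to GSEndpoint and the delegate registered:
GSStatus status = [endpoint enableAudioMonitor:1]; // 1 - monitor the mic stream only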
.NET
Real-time audio stream access support is added to the IExtendedService interface:
// Audio related
// 0 - processing disabled; 1 - mic stream; 2 - speaker stream; 3 - both streams;
GsStatus EnableAudioMonitor(int streamType);
event EventHandler<EndpointEventArgs^>^ AudioFrameDelivered;
Audio frame data is incorporated into the endpoint event property dictionary,
EndpointEventArgs^ event;
IDictionary<String^, Object^>^ property = event->Context->Properties;
where the property dictionary will hold the audio frame data as key-value pairs:
("direction", direction); /* int */ ("samples", samples[length]); /* int16_t samples[] */ ("length", length); /* int */ ("samplingFrequency", samplingFrequency); /* int */ ("isStereo", isStereo); /* bool */
Detailed Description
When audio monitoring is enabled for a particular direction, it applies to the current and all future sessions, until explicitly turned off. The streamType parameter is essentially a bit mask specifying the monitoring state of the capture (least significant bit) and playback (second bit) devices, with bit=1 enabling monitoring and bit=0 disabling it, as sketched below.
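As a small sketch of the bit mask semantics (the endpoint variable is again assumed to conform to GSEndpoint):

// Composing streamType as a bit mask; `endpoint` is assumed to conform to GSEndpoint.
static const int kMicBit     = 1 << 0; // capture (microphone) stream
static const int kSpeakerBit = 1 << 1; // playback (speaker) stream

[endpoint enableAudioMonitor:(kMicBit | kSpeakerBit)]; // 3 - monitor both streams
[endpoint enableAudioMonitor:kSpeakerBit];             // 2 - speaker stream only
[endpoint enableAudioMonitor:0];                       // 0 - disable all monitoring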
A single notification callback is used for both audio streams, with
- the direction property indicating the stream type: 1 for capture, 2 for playback
- the samples array holding the audio data, a total of length 16-bit signed integers representing PCM samples; when isStereo is true, the data is in interleaved stereo format: L0,R0,L1,R1,... (see the processing sketch at the end of this section)
- samplingFrequency indicating the number of samples per second, determined by the codec in use and the device capabilities
For narrow-band codecs such as G.711 or G.729, the sampling frequency is always 8000 and stereo = false. For wide-band codecs, the sampling rate depends on device capabilities and the codec used. Namely:
- the sampling rate of the captured stream is the maximum rate that both the codec and the device support; stereo is used only for the Opus codec, and only if the microphone device supports it
- the sampling rate and stereo status of the playback stream always follow the codec parameters (the rate can be as high as 48000 for the Opus codec, with stereo = true)
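To tie the frame fields together, a minimal processing sketch follows; it assumes GSAudioFrame exposes its fields as read-only properties (the declaration above shows only the private ivars) and that samples stores NSNumber-wrapped 16-bit values:

// Minimal frame-processing sketch. Assumes GSAudioFrame exposes read-only properties
// for its ivars and that `samples` holds NSNumber objects wrapping int16_t values.
- (void) audioFrameReceivedNotification:(GSAudioFrame*) frame {
    if (frame.isStereo) {
        // Interleaved stereo: even indices are left samples, odd indices are right.
        for (int i = 0; i + 1 < frame.length; i += 2) {
            int16_t left  = [frame.samples[i]     shortValue];
            int16_t right = [frame.samples[i + 1] shortValue];
            // ... custom stereo processing of (left, right) ...
        }
    } else {
        for (int i = 0; i < frame.length; i++) {
            int16_t sample = [frame.samples[i] shortValue];
            // ... custom mono processing, e.g. feed a transcription engine ...
        }
    }
}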