Support real-time access to audio stream
{{NoteFormat|This feature was introduced as part of the [[Documentation:RN:sep-sdk-osx90rn:sep-sdk-osx9000804|9.0.008.04]] release.}}
This feature provides real-time access to the audio stream for application code, allowing the application to monitor raw audio frames coming from the microphone and/or sent to the speaker device, and to implement custom processing of these frames (for example, to add real-time transcription).
== OSX ==
Real-time access to the audio stream is exposed through a new method that enables or disables audio monitoring for a particular stream direction, and a notification delegate that receives an audio frame data object describing each frame.
+ | |||
+ | <pre> | ||
+ | @protocol GSEndpoint <NSObject> | ||
+ | /** | ||
+ | Enable audio processing support for requested stream type | ||
+ | @param streamType values: 0 - disable support; 1 - mic stream; 2 - speaker stream; 3 - both streams; | ||
+ | @returns result of the operation | ||
+ | */ | ||
+ | - (GSStatus) enableAudioMonitor:(int) streamType; | ||
+ | @protocol GSEndpointNotificationDelegate <NSObject> | ||
+ | /** | ||
+ | Called when an audio frame received. | ||
+ | @see GSEndpointEvent | ||
+ | */ | ||
+ | - (void) audioFrameReceivedNotification:(GSAudioFrame*) audioFrame; | ||
+ | /** | ||
+ | Audio frame data structure supplied with the notification delegate. | ||
+ | @field direction to indicate the collected media steam type: mic or speaker | ||
+ | @field samples to hold an array of collected media samples in the received frame | ||
+ | @field length to hold the count of the stored in array samples | ||
+ | @field samplingFrequency to hold a frequency | ||
+ | @field isStereo to indicate whether the the received frame content has stereo or mono data | ||
+ | @see GSEndpointEvent | ||
+ | */ | ||
+ | @interface GSAudioFrame : NSObject { | ||
+ | @private | ||
+ | int direction; | ||
+ | NSArray *samples; | ||
+ | int length; | ||
+ | int samplingFrequency; | ||
+ | bool isStereo; | ||
+ | } | ||
+ | </pre> | ||
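The following is a minimal sketch of how an application might use this API; it is not taken from the SDK samples. The AudioTapController class, its endpoint property, and the property accessors used on GSAudioFrame are assumptions (only the instance variables of GSAudioFrame are declared above), and registering the controller as the endpoint's notification delegate is assumed to happen elsewhere through the SDK's usual mechanism.
<pre>
#import <Foundation/Foundation.h>
// Import the SDK headers declaring GSEndpoint, GSEndpointNotificationDelegate,
// GSAudioFrame and GSStatus before building this sketch.

@interface AudioTapController : NSObject <GSEndpointNotificationDelegate>
// Hypothetical property holding the SDK endpoint obtained elsewhere by the application.
@property (nonatomic, strong) id<GSEndpoint> endpoint;
@end

@implementation AudioTapController

- (void)startMonitoring {
    // 1 = mic stream, 2 = speaker stream, 3 = both; 0 disables monitoring again.
    GSStatus result = [self.endpoint enableAudioMonitor:3];
    // Check 'result' against the SDK's GSStatus values as appropriate.
    (void)result;
}

// GSEndpointNotificationDelegate callback, invoked once per collected audio frame.
// Property accessors on GSAudioFrame are assumed; only its ivars are shown above.
- (void)audioFrameReceivedNotification:(GSAudioFrame *)audioFrame {
    if (audioFrame.direction == 1) {   // 1 = capture (microphone) stream
        // Hand the raw PCM samples to application code, e.g. a transcription pipeline.
        NSLog(@"mic frame: %d samples at %d Hz, stereo=%d",
              audioFrame.length, audioFrame.samplingFrequency, (int)audioFrame.isStereo);
    }
}

@end
</pre>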
+ | |||
+ | == .NET == | ||
+ | The audio stream real time access support is added to the IExtendedService interface: | ||
<pre>
// Audio related
// 0 - processing disabled; 1 - mic stream; 2 - speaker stream; 3 - both streams;
GsStatus EnableAudioMonitor(int streamType);
event EventHandler<EndpointEventArgs^>^ AudioFrameDelivered;
</pre>
Audio frame data is incorporated into the endpoint event property dictionary,
<pre>
EndpointEventArgs^ event;
IDictionary<String^, Object^>^ property = event->Context->Properties;
</pre>
where the property dictionary holds the audio frame data as key-value pairs:
<pre>
("direction", direction); /* int */
("samples", samples[length]); /* int16_t samples[] */
("length", length); /* int */
("samplingFrequency", samplingFrequency); /* int */
("isStereo", isStereo); /* bool */
</pre>
+ | |||
+ | ==Detailed Description== | ||
+ | |||
+ | When Audio monitoring is enabled for particular direction, it is applicable for current and all future session, until explicitly turned off. Parameter streamType is basically a bit mask specifying monitoring state of capture (least significant bit) and playback devices (second bit), with bit=1 enabling monitoring and bit=0 disabling it. | ||
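For example, the mask could be assembled as in the sketch below (shown in Objective-C; the .NET EnableAudioMonitor call takes the same value). The names wantMic, wantSpeaker, and endpoint are illustrative, not SDK identifiers.
<pre>
// Sketch: assemble the streamType bit mask from two application-level flags.
BOOL wantMic = YES;        // bit 0 (value 1): capture / microphone stream
BOOL wantSpeaker = NO;     // bit 1 (value 2): playback / speaker stream
int streamType = (wantMic ? 1 : 0) | (wantSpeaker ? 2 : 0);   // yields 0..3
[endpoint enableAudioMonitor:streamType];
</pre>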
+ | |||
+ | One single notification callback is used for both audio streams, with | ||
+ | * direction property indicating stream type - 1 for capture, 2 for playback stream | ||
+ | * samples array holding the audio data, total of length 16-bit signed integers representing PCM samples; when isStereo is true, data is in interleaved stereo format: L0,R0,L1,R1,... | ||
+ | * samplingFrequency indicating number of samples per second, based on | ||
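A minimal sketch of consuming a frame inside the delegate callback follows. It assumes GSAudioFrame exposes its fields as properties and stores samples as NSNumber-boxed 16-bit values; neither is shown in the declarations above.
<pre>
// Sketch: split an interleaved stereo frame into separate left/right channel arrays.
- (void)splitChannelsOfFrame:(GSAudioFrame *)audioFrame {
    NSArray *samples = audioFrame.samples;
    NSMutableArray *left  = [NSMutableArray array];
    NSMutableArray *right = [NSMutableArray array];
    if (audioFrame.isStereo) {
        for (int i = 0; i + 1 < audioFrame.length; i += 2) {
            [left addObject:samples[i]];        // L0, L1, ...
            [right addObject:samples[i + 1]];   // R0, R1, ...
        }
    } else {
        [left addObjectsFromArray:samples];     // mono: a single channel
    }
    // left/right now hold per-channel PCM samples at audioFrame.samplingFrequency.
}
</pre>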
+ | |||
+ | For narrow-band codecs such as G.711 or G.729, sampling frequency is always 8000 and stereo = false. For wide-band codecs, sampling rate depends upon device capabilities and codec used. Namely: | ||
+ | |||
+ | * sampling rate of captured stream is the maximum rate both codec and device supports, stereo is used only for Opus codec, is microphone device supports it | ||
+ | * sampling rate and stereo status for playback stream always follows the codec parameters (the rate will be as high as 48000 for Opus codec, with stereo = true) | ||
+ | |||
+ | {{NoteFormat| To provide most flexibility and avoid potential loss of data, SDK never tries to convert audio data it gets from voice engine from one format to another. Application code should implement resampling itself, if needed.}} | ||
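The sketch below shows one naive way an application could resample a mono frame using linear interpolation. The ResampleLinear helper is hypothetical, and it assumes NSNumber-boxed 16-bit samples; a production application would more likely use a dedicated resampling or DSP component.
<pre>
#import <Foundation/Foundation.h>

// Naive linear-interpolation resampler for mono 16-bit PCM samples
// boxed as NSNumber values (the storage format is an assumption).
static NSArray<NSNumber *> *ResampleLinear(NSArray<NSNumber *> *samples,
                                           int srcRate, int dstRate) {
    NSUInteger srcCount = samples.count;
    if (srcRate == dstRate || srcCount == 0) return samples;
    NSUInteger dstCount = (NSUInteger)((double)srcCount * dstRate / srcRate);
    NSMutableArray<NSNumber *> *out = [NSMutableArray arrayWithCapacity:dstCount];
    for (NSUInteger i = 0; i < dstCount; i++) {
        double srcPos = (double)i * srcRate / dstRate;      // position in the source
        NSUInteger j = (NSUInteger)srcPos;
        double frac = srcPos - (double)j;
        double a = [samples[j] doubleValue];
        double b = [samples[MIN(j + 1, srcCount - 1)] doubleValue];
        [out addObject:@((short)lround(a + (b - a) * frac))];
    }
    return out;
}
</pre>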
[[Category:V:SESDK:9.0.0OSXDRAFT]]