Previously I blogged about creating a Mixed Reality 2D app that integrates with a LUIS-enabled Bot via the Direct Line channel available in the Bot Framework.
I decided to add more interactivity to the app by also enabling text-to-speech for the messages received from the Bot. This required adding a new MediaElement for the speech synthesizer to the main XAML page:
<Page
    x:Class="HoloLensBotDemo.MainPage"
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:d="http://schemas.microsoft.com/expression/blend/2008"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    mc:Ignorable="d">
    <Grid Background="{ThemeResource ApplicationPageBackgroundThemeBrush}">
        <Grid.ColumnDefinitions>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="Auto"/>
            <ColumnDefinition Width="10"/>
            <ColumnDefinition Width="*"/>
            <ColumnDefinition Width="10"/>
        </Grid.ColumnDefinitions>
        <Grid.RowDefinitions>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="50"/>
            <RowDefinition Height="Auto"/>
        </Grid.RowDefinitions>
        <TextBlock Text="Command received: " Grid.Column="1" VerticalAlignment="Center" />
        <TextBox x:Name="TextCommand" Grid.Column="3" VerticalAlignment="Center"/>
        <Button Content="Start Recognition" Click="StartRecognitionButton_Click" Grid.Row="1" Grid.Column="1" VerticalAlignment="Center" />
        <TextBlock Text="Status: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="2" />
        <TextBlock x:Name="TextStatus" Grid.Column="3" VerticalAlignment="Center" Grid.Row="2"/>
        <TextBlock Text="Bot response: " Grid.Column="1" VerticalAlignment="Center" Grid.Row="3" />
        <TextBlock x:Name="TextOutputBot" Foreground="Red" Grid.Column="3" VerticalAlignment="Center" Width="Auto" Height="Auto" Grid.Row="3" TextWrapping="Wrap" />
        <MediaElement x:Name="media" />
    </Grid>
</Page>
Then I initialized a new SpeechSynthesizer when the page is created:
public sealed partial class MainPage : Page
{
    private SpeechSynthesizer synthesizer;
    private SpeechRecognizer recognizer;

    public MainPage()
    {
        this.InitializeComponent();
        InitializeSpeech();
    }

    private async void InitializeSpeech()
    {
        synthesizer = new SpeechSynthesizer();
        recognizer = new SpeechRecognizer();

        media.MediaEnded += Media_MediaEnded;
        recognizer.StateChanged += Recognizer_StateChanged;

        // Compile the dictation grammar by default.
        await recognizer.CompileConstraintsAsync();
    }

    private void Recognizer_StateChanged(SpeechRecognizer sender, SpeechRecognizerStateChangedEventArgs args)
    {
        if (args.State == SpeechRecognizerState.Idle)
        {
            SetTextStatus(string.Empty);
        }

        if (args.State == SpeechRecognizerState.Capturing)
        {
            SetTextStatus("Listening....");
        }
    }

    ...
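The SetTextStatus() helper used in the state handler isn't included in the snippet above. Since StateChanged can be raised on a background thread, the update to the TextStatus TextBlock needs to be dispatched back to the UI thread; a minimal sketch could look like this:

private async void SetTextStatus(string text)
{
    // StateChanged may fire off the UI thread, so marshal the update
    // to the dispatcher before touching the TextBlock.
    await Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal,
        () => TextStatus.Text = text);
}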
And added a new Speech() method that uses the media element:
private async void Speech(string text)
{
    if (media.CurrentState == MediaElementState.Playing)
    {
        media.Stop();
    }
    else
    {
        try
        {
            // Create a stream from the text. This will be played using a media element.
            SpeechSynthesisStream synthesisStream = await synthesizer.SynthesizeTextToStreamAsync(text);

            // Set the source and start playing the synthesized audio stream.
            media.AutoPlay = true;
            media.SetSource(synthesisStream, synthesisStream.ContentType);
            media.Play();
        }
        catch (System.IO.FileNotFoundException)
        {
            var messageDialog = new Windows.UI.Popups.MessageDialog("Media player components unavailable");
            await messageDialog.ShowAsync();
        }
        catch (Exception)
        {
            media.AutoPlay = false;
            var messageDialog = new Windows.UI.Popups.MessageDialog("Unable to synthesize text");
            await messageDialog.ShowAsync();
        }
    }
}
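As a side note, SynthesizeTextToStreamAsync() uses the default system voice. If you prefer a different one, the synthesizer can be pointed at any voice installed on the device; a quick sketch (the gender filter is just an example, and it requires using System.Linq):

// Optional: pick one of the voices installed on the device instead of
// the default one. SpeechSynthesizer.AllVoices enumerates them.
var voice = SpeechSynthesizer.AllVoices
    .FirstOrDefault(v => v.Gender == VoiceGender.Female);
if (voice != null)
{
    synthesizer.Voice = voice;
}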
When a new response is received from the Bot, the Speech() method is called:
var result = await directLine.Conversations.GetActivitiesAsync(convId);
if (result.Activities.Count > 0)
{
    var botResponse = result
        .Activities
        .LastOrDefault(a => a.From != null
            && a.From.Name != null
            && a.From.Name.Equals("Davide Personal Bot"));

    if (botResponse != null && !string.IsNullOrEmpty(botResponse.Text))
    {
        var response = botResponse.Text;
        TextOutputBot.Text = "Bot response: " + response;
        TextStatus.Text = string.Empty;
        Speech(response);
    }
}
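A possible refinement here (not in my sample): the Direct Line client also accepts a watermark, so that only the activities received after the last read are returned instead of the whole conversation:

// Hypothetical variant: keep the watermark returned by the previous call
// (e.g. in a string field) and pass it to retrieve only new activities.
var result = await directLine.Conversations.GetActivitiesAsync(convId, watermark);
watermark = result.Watermark;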
Then, when playback of the synthesized speech finishes, recognition of a new phrase is started again via the MediaEnded event, simulating a conversation between the user and the Bot:
private void Media_MediaEnded(object sender, Windows.UI.Xaml.RoutedEventArgs e)
{
    StartRecognitionButton_Click(null, null);
}
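StartRecognitionButton_Click() was described in the previous post; for context, a minimal sketch of what it triggers (the Direct Line call is elided):

private async void StartRecognitionButton_Click(object sender, RoutedEventArgs e)
{
    // Listen for a single phrase, show it in the TextBox and forward it to the Bot.
    SpeechRecognitionResult speechRecognitionResult = await recognizer.RecognizeAsync();
    TextCommand.Text = speechRecognitionResult.Text;
    // ... post TextCommand.Text to the Bot via the Direct Line client ...
}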
As usual, the source code is available for download on GitHub.