I recently wrote an application that makes my phone report current Los Angeles traffic conditions. I used the Windows.Media.SpeechSynthesis namespace to take either a plain-text string or an SSML-formatted string and speak it aloud. It turns out that speech synthesis with these libraries is incredibly simple, but some of the information takes a little effort to find. For me, searching for "text-to-speech windows phone 8.1" shows a bunch of results for how to accomplish this with Silverlight! Definitely not what I want, since my app is a Windows Runtime app.
The setup
Because this was a Windows Phone application, I needed a MediaElement object to play, pause, and stop the speech when I press the corresponding button. I opted to keep it in a private field on my MainPage instance.
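using System;
using System.Threading.Tasks;
using Newtonsoft.Json;
using Windows.Media.SpeechSynthesis;
using Windows.UI.Xaml;
using Windows.UI.Xaml.Controls;
using Windows.UI.Xaml.Navigation;

namespace PhoneTraffic
{
    public sealed partial class MainPage : Page
    {
        // One shared player, so the play, pause, and stop handlers control the same audio.
        private MediaElement media;

        public MainPage()
        {
            this.InitializeComponent();
            this.NavigationCacheMode = NavigationCacheMode.Required;
            media = new MediaElement();
        }
    }
}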
Synthesizing some sentences
From here, we instantiate a SpeechSynthesizer object inside a using block, so it is disposed of when we're done with it.
using(var synth = new SpeechSynthesizer())
Next, we pass a plain-text string to the synthesizer and store the resulting stream so we can set it on the media object.
var stream = await synth.SynthesizeTextToStreamAsync("Hello, World!");
Then we set that stream as the source of our media object.
media.SetSource(stream, stream.ContentType);
Now we're ready to tell the phone to play, pause, or stop speaking our sentence. Here's how it looks in my application.
Using plain text
private async void GetTrafficButton_Click(object sender, RoutedEventArgs e)
{
    // trafficJson is a string field holding the incident JSON, loaded elsewhere (not shown).
    var incidents = await Task.Run(() => JsonConvert.DeserializeObject<TrafficIncident[]>(trafficJson));
    // DeserializeObject can return null (for example, for a "null" payload), so guard against it.
    if (incidents == null || incidents.Length == 0)
    {
        using (var synth = new SpeechSynthesizer())
        {
            var stream = await synth.SynthesizeTextToStreamAsync("There are no incidents right now.");
            media.SetSource(stream, stream.ContentType);
            media.Play();
        }
    }
    else
    {
        using (var synth = new SpeechSynthesizer())
        {
            // Chain every incident into a single sentence.
            var toSay = String.Empty;
            for (var i = 0; i < incidents.Length; i++)
            {
                var incident = incidents[i];
                toSay += " At " + incident.Time + " there was a " + incident.Incident + " incident at " + incident.Location;
                toSay += (i < incidents.Length - 1) ? " and another " : ".";
            }
            var stream = await synth.SynthesizeTextToStreamAsync(toSay);
            media.SetSource(stream, stream.ContentType);
            media.Play();
        }
    }
}
My pause and stop methods are simpler:
private void PausedSpeechButton_Click(object sender, RoutedEventArgs e)
{
    media.Pause();
}

private void StopSpeechButton_Click(object sender, RoutedEventArgs e)
{
    media.Stop();
}
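The two branches of GetTrafficButton_Click differ only in the text they speak, so one option is to build that text in a separate method. The sketch below is my own refactoring, not code from the app: TrafficIncident here is a hypothetical stand-in with string Time, Incident, and Location properties, and TrafficSpeech and BuildText are made-up names. It produces essentially the same sentence as the loop above (minus the stray leading and doubled spaces) by joining the clauses with string.Join instead of concatenating in a loop.

```csharp
using System.Linq;

// Hypothetical stand-in for the app's TrafficIncident model. The property
// names match the click handler; the string types are an assumption.
public sealed class TrafficIncident
{
    public string Time { get; set; } = "";
    public string Incident { get; set; } = "";
    public string Location { get; set; } = "";
}

public static class TrafficSpeech
{
    private const string NoIncidents = "There are no incidents right now.";

    // Returns the text to hand to SynthesizeTextToStreamAsync: one clause per
    // incident, chained with " and another " and ending in a period.
    public static string BuildText(TrafficIncident[] incidents)
    {
        // Treat a null array (e.g. from a "null" JSON payload) like an empty one.
        if (incidents == null || incidents.Length == 0)
        {
            return NoIncidents;
        }

        var clauses = incidents.Select(incident =>
            "At " + incident.Time + " there was a " + incident.Incident +
            " incident at " + incident.Location);

        return string.Join(" and another ", clauses) + ".";
    }
}
```

With a helper like this, the handler could make a single SynthesizeTextToStreamAsync(TrafficSpeech.BuildText(incidents)) call inside one using block instead of two.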
Using SSML
If you want to use SSML, call SynthesizeSsmlToStreamAsync instead of SynthesizeTextToStreamAsync and pass it an SSML-formatted string.
My application supports both modes. I create the SSML string on my API server, and the phone consumes it. Here's what the code looks like (it replaces the else block in the plain-text example):
// ssml holds the SSML document string returned by my API server.
using (var synth = new SpeechSynthesizer())
{
    var stream = await synth.SynthesizeSsmlToStreamAsync(ssml);
    media.SetSource(stream, stream.ContentType);
    media.Play();
}
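The post doesn't show how the API server builds its SSML. Here is a minimal sketch, assuming the server is also written in C#. SsmlBuilder and Build are hypothetical names, the tuple input stands in for whatever incident model the server really uses, and wrapping the time in say-as with interpret-as="time" is an assumption whose support depends on the speech engine (see the say-as examples listed below). The speak root carries the SSML 1.0 namespace, version, and xml:lang attributes.

```csharp
using System.Xml.Linq;

public static class SsmlBuilder
{
    // SSML 1.0 namespace; the speak root also carries version and xml:lang.
    private static readonly XNamespace Ssml = "http://www.w3.org/2001/10/synthesis";

    // Hypothetical helper: one <s> sentence per incident, with the time wrapped
    // in say-as. XElement escapes '&' and '<' in text content automatically.
    public static string Build(params (string Time, string Incident, string Location)[] incidents)
    {
        var speak = new XElement(Ssml + "speak",
            new XAttribute("version", "1.0"),
            new XAttribute(XNamespace.Xml + "lang", "en-US"));

        foreach (var incident in incidents)
        {
            speak.Add(new XElement(Ssml + "s",
                "At ",
                new XElement(Ssml + "say-as", new XAttribute("interpret-as", "time"), incident.Time),
                " there was a " + incident.Incident + " incident at " + incident.Location + "."));
        }

        // No indentation, so no extra whitespace ends up between words.
        return speak.ToString(SaveOptions.DisableFormatting);
    }
}
```

The returned string is what the phone would pass to SynthesizeSsmlToStreamAsync. Building it with XElement rather than string concatenation means a location such as "Main & 1st" is written as Main &amp; 1st; a raw ampersand would make the document malformed XML.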
Here are some resources I used.
- Wikipedia: Speech Synthesis Markup Language
- W3C: Speech Synthesis Markup Language (SSML) Version 1.0
- Microsoft MSDN: Windows.Media.SpeechSynthesis namespace
- Microsoft MSDN: SSML say-as Examples
- Jayway: Windows Phone 8.1 for Developers - Text to speech