The Speech To Text Transform accepts audio as input from a microphone or wave audio file, compares it with the specified grammar file and produces text output. It uses the Windows speech recognition engine to process the audio input.




Audio Source is an enumerated property with two options: “Default Audio Device” and “Wave Audio File”. When “Default Audio Device” is selected, the Transform takes audio input from the default audio input device connected to the PC (such as a microphone). When “Wave Audio File” is selected, the Transform takes audio input from the wave audio file specified in the “Wave Audio File Path” property.

Grammar File Path specifies the path to a CSV file that defines the grammar for the speech recognition engine, i.e. each word or phrase to recognize and the text to display when that word or phrase is recognized. To select a file, click on the property to display a browse button (three dots), then browse to select the file. The CSV file should be in the format shown below.

; Text To Recognize, Text To Display

Hello World, Hello All

One, you said "one"

Two, you said "two"

Three, you said "three"

Four, you said "four"

Five, you said "five"

If the Text To Display field is left blank, then the Text To Recognize is displayed after the phrase has been recognized.
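
The grammar format and the blank-field rule above can be sketched in a few lines of Python. This is only an illustration of how such a file might be read, not the Transform's actual parser. It assumes that a line starting with ";" is a header or comment line, that each record is split at its first comma, and that records are numbered from 1 in file order. The function name parse_grammar is hypothetical.

```python
def parse_grammar(text):
    """Parse grammar text into a list of (record_number, recognize, display)."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with ';' carry no record.
        if not line or line.startswith(";"):
            continue
        # Split at the first comma only, so quotes in the display text are kept.
        recognize, _, display = line.partition(",")
        recognize = recognize.strip()
        # A blank Text To Display falls back to the Text To Recognize.
        display = display.strip() or recognize
        # Assumption: record numbers start at 1 and follow file order.
        records.append((len(records) + 1, recognize, display))
    return records


sample = """; Text To Recognize, Text To Display
Hello World, Hello All
One, you said "one"
Two,
"""

for number, recognize, display in parse_grammar(sample):
    print(number, recognize, "->", display)
```

In this sketch the record "Two," has no display text, so the recognized phrase itself is used as the display text.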

Language Culture: Select a language culture from the drop-down list, which shows all installed language cultures. If left blank, the Transform loads the default culture.

Wave Audio File Path: If the “Wave Audio File” option is selected in the Audio Source property, select a valid wave audio file as the audio source. To select a file, click on the property to display a browse button (three dots), then browse to select the file.


Inports: There are no inports for this transform.

Outports: There are three outports available for this transform:

STT Text Out is a message port. After the audio is recognized, the corresponding text specified in the grammar file is transmitted from this port.

STT Audio Level is a discrete-numeric port. It reports the audio level as a percentage (0 to 100). It is intended only for visualization, to show whether the microphone is working and the Transform is receiving audio input, and should not be used in pass-fail analysis.

STT Enumerated Out is a discrete-numeric port. It transmits the record number from the grammar file that corresponds to the recognized phrase.
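
As a rough illustration of how STT Text Out and STT Enumerated Out relate to the grammar file, the sketch below looks up a recognized phrase in an in-memory grammar and returns the pair of values the two ports would carry. The function name outport_values is hypothetical, and the sketch assumes 1-based record numbers and a simple case-insensitive string match; the real speech recognition engine matches audio, not strings.

```python
def outport_values(recognized, grammar):
    """Return (text_out, enumerated_out) for a recognized phrase, or None."""
    # Assumption: records are numbered from 1 in grammar-file order.
    for number, (phrase, display) in enumerate(grammar, start=1):
        # Assumption: case-insensitive comparison stands in for recognition.
        if phrase.casefold() == recognized.casefold():
            # A blank display text falls back to the recognized phrase.
            return (display or phrase, number)
    return None


# Grammar records as (Text To Recognize, Text To Display) pairs.
grammar = [
    ("Hello World", "Hello All"),
    ("One", 'you said "one"'),
    ("Two", ""),
]

print(outport_values("one", grammar))
print(outport_values("Two", grammar))
```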