Convert audio file to text

The “Speech to Text” action produces a transcript of an audio file by recognising the spoken words in a language chosen by the user.

This action gives users a fast way to automate transcribing or subtitling the content they upload to sites such as YouTube.

Parameters

| Title | Name | Type | Description |
| --- | --- | --- | --- |
| Language | language | string | Language of speech in file |
| File | file | file | Source file |
| File (file name) | filename | string | Name of file |

Response

| Status | Title | Name | Type | Description |
| --- | --- | --- | --- | --- |
| Success | Result | result | file | Transcript of audio file |
| Failure | Result | result | string | Error description |
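As a rough illustration of the Response section above, the sketch below separates a transcript from an error description. It assumes the response arrives as an object with a `result` field and that the caller already knows whether the call succeeded (for an HTTP call, typically a 2xx status code). The helper name `readResult` is hypothetical and not part of PowerTools.

```javascript
// Hypothetical helper, not part of the PowerTools API.
// `ok` is whatever success signal your platform exposes (assumed: HTTP 2xx).
// `payload` is assumed to be the parsed response object with a `result` field.
function readResult(ok, payload) {
  const result = payload ? payload.result : undefined;
  if (result === undefined) {
    throw new Error('Response has no "result" field');
  }
  // Success: result is the transcript. Failure: result is the error description.
  return ok ? { transcript: result } : { error: String(result) };
}

console.log(readResult(true, { result: 'hello world' }));
console.log(readResult(false, { result: 'Unsupported language' }));
```

Returning a small tagged object keeps the success and failure paths explicit for whatever step consumes the result next.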

Integrate the "Speech to Text" function into a workflow or app with the platform of your choice:

How to convert audio files to text with Microsoft Power Automate

Instructions

  1. In the Flow designer, click the “+” icon to insert a new action.
  2. Select the “Text – Speech to Text” action under PowerTools in the “Choose an operation” dialog.
  3. Insert the necessary values or variables in each input field.
  4. Execute the flow.

NOTE: The Power Platform connector framework imposes a 2-minute time limit on all action responses, which severely limits how much audio the speech to text engine can process in a single call. In our testing, the largest input file that could be processed within this limit was around 1 MB compressed (MP3) or 10 MB uncompressed (WAV). Until Microsoft raises the limit, split larger files into chunks below these sizes and process them in separate actions.
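The size guidance in the note can be turned into a quick pre-flight check. The sketch below is only an estimate: it assumes the limits are decimal megabytes and that chunks are roughly equal in size, and the helper name `planChunks` is hypothetical.

```javascript
// Approximate limits taken from the note above (assumed to be decimal megabytes).
const LIMITS_BYTES = {
  mp3: 1 * 1000 * 1000,  // about 1 MB compressed
  wav: 10 * 1000 * 1000, // about 10 MB uncompressed
};

// Hypothetical helper: estimate how many separate actions a file would need.
function planChunks(sizeBytes, format) {
  const limit = LIMITS_BYTES[format.toLowerCase()];
  if (limit === undefined) {
    throw new Error(`No known size limit for format: ${format}`);
  }
  const chunks = Math.max(1, Math.ceil(sizeBytes / limit));
  return { fitsInOneAction: chunks === 1, chunks };
}

console.log(planChunks(25 * 1000 * 1000, 'wav')); // { fitsInOneAction: false, chunks: 3 }
```

This only counts chunks. The actual splitting should be done with an audio tool so that each piece remains a valid MP3 or WAV file.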

How to convert audio files to text with Microsoft Power Apps

Instructions

  1. Add the PowerTools connector from the Data menu.
  2. In the formula for the control, variable or element, type “ApptigentPowerTools.SpeechToText().result”. Within the parentheses, enter the field, control or variable that contains the source audio file, along with the other parameters listed above.
  3. Preview or run the app.

NOTE: The 2-minute Power Platform response limit described in the Power Automate note above also applies here. Keep input files to around 1 MB compressed (MP3) or 10 MB uncompressed (WAV), and split larger files across multiple calls.

How to convert audio files to text with Nintex

Instructions

  1. Locate the “Apptigent PowerTools” group in the actions navigator then drag and drop the “Text – Speech to Text” action onto the design surface.
  2. Insert the necessary values or variables in each input field.
  3. Assign the result to a variable.
  4. Test the workflow.

How to convert audio files to text with another Platform or Custom Code

Instructions

If your platform is not listed and it supports Open API (Swagger) extensions, import the API Definition document from the Developer Edition product on our Customer Portal at https://portal.apptigent.com/product (look for the Open API link at the top of the PowerTools Developer API definition page). Invoke the desired actions in your app or workflow design tool, supplying values for the listed parameters. Refer to the developer documentation on the Customer Portal for details on input and output formats.

If you are developing a custom app, execute a RESTful POST operation to the /SpeechToText endpoint in your application code or use the pre-generated client scaffolding from our GitHub repo at https://github.com/apptigent/powertools. Be sure to include your API Key (Client ID) in the header using the “X-IBM-Client-Id” key/value pair. The body should carry the parameter label(s) and value(s) in the specified format; the example below submits them as multipart form data. Refer to the API documentation at https://portal.apptigent.com for more information.

Example

const fs = require('fs');
// Note: the 'request' package is deprecated, but it still works for this example.
const request = require('request');

const options = {
  method: 'POST',
  url: 'https://connect.apptigent.com/api/utilities/SpeechToText',
  headers: {
    // API Key (Client ID) from the Customer Portal
    'X-IBM-Client-Id': 'REPLACE_THIS_KEY',
    // No manual content-type: request builds the multipart header and boundary from formData
    accept: 'application/json'
  },
  formData: {
    language: 'Finnish (Finland)',
    // Placeholder path: replace with the audio file to transcribe
    file: fs.createReadStream('REPLACE_THIS_FILE.wav'),
    // Name of file, as listed under Parameters
    filename: 'REPLACE_THIS_FILE.wav'
  }
};

request(options, function (error, response, body) {
  if (error) throw new Error(error);

  // body holds the transcript on success or an error description on failure
  console.log(body);
});
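
As an alternative that avoids extra packages, the same call can be sketched with the `fetch`, `FormData` and `Blob` globals, assuming Node.js 18 or later. The function names `buildForm` and `transcribe` are hypothetical. Sending `filename` as a separate field, and treating a non-2xx status as failure, are assumptions based on the parameter and response listings above rather than confirmed API behaviour.

```javascript
// Assumes Node.js 18+ (global fetch, FormData and Blob).
const ENDPOINT = 'https://connect.apptigent.com/api/utilities/SpeechToText';

// Build the multipart body from the three documented parameters.
function buildForm(language, audioBytes, filename) {
  const form = new FormData();
  form.append('language', language);
  form.append('file', new Blob([audioBytes]), filename);
  form.append('filename', filename);
  return form;
}

// Send the request and return the raw response body as text.
async function transcribe(apiKey, language, audioBytes, filename) {
  const response = await fetch(ENDPOINT, {
    method: 'POST',
    // fetch sets the multipart content-type and boundary itself
    headers: { 'X-IBM-Client-Id': apiKey, accept: 'application/json' },
    body: buildForm(language, audioBytes, filename),
  });
  const body = await response.text();
  if (!response.ok) {
    throw new Error(`Request failed (${response.status}): ${body}`);
  }
  return body;
}
```

To use it, read the audio into a buffer (for example with `fs.readFileSync`) and call `transcribe` with your API key, the language, the buffer and the file name.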