
Solving "The recordings URI contains invalid data." in Azure Speech - Batch Transcription

By Peter Örneholm | Blog | 8 May 2020

… or “How to remove a thumbnail from an mp3 using ffmpeg?”

While building the site RadioText.net (which I’ve blogged about here) I found that a few of the files I tried to transcribe returned an undocumented error: “The recordings URI contains invalid data.” Below I describe the solution to the problem, or at least to my version of it.

The problem

So, the problem appeared when using Azure Speech Services - Batch transcription. Speech Services is one of the AI services in Azure provided through Azure Cognitive Services. Batch Transcription allows for large-scale, asynchronous transcription of audio files.

Initiating the transcription went fine. It is done using an HTTP POST like this (the details can of course vary depending on your preferences):

HTTP POST: https://westeurope.cris.ai/api/speechtotext/v2.0/Transcriptions/

{
   "Name":"RadioText - Episode 1482554",
   "Description":"RadioText",
   "RecordingsUrl":"https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3",
   "Locale":"sv-SE",
   // ...
}

The error is returned to you when you are querying for the transcription status:

HTTP GET: https://westeurope.cris.ai/api/speechtotext/v2.0/Transcriptions/TRANSCRIPTION-GUID

{
  "recordingsUrl": "https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3",
  // ...
  "statusMessage": "The recordings URI contains invalid data.",
  "status": "Failed",
  "locale": "sv-SE",
  // ...
}
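
In a pipeline it can be useful to recognise this particular failure so that the file can be cleaned up and submitted again. The following is a minimal sketch, assuming the status response has already been read into a string and contains the status and statusMessage properties shown above. The names TranscriptionStatus, IsInvalidRecordingData and ReadString are made up for illustration and are not part of any Azure SDK.

```csharp
using System;
using System.Text.Json;

public static class TranscriptionStatus
{
    // Message text copied from the failed status response shown above.
    private const string InvalidDataMessage = "The recordings URI contains invalid data.";

    // Returns true when the status JSON reports the failure described in this post.
    public static bool IsInvalidRecordingData(string statusJson)
    {
        using var document = JsonDocument.Parse(statusJson);
        var root = document.RootElement;

        var status = ReadString(root, "status");
        var message = ReadString(root, "statusMessage");

        return string.Equals(status, "Failed", StringComparison.OrdinalIgnoreCase)
            && string.Equals(message, InvalidDataMessage, StringComparison.Ordinal);
    }

    // Hypothetical helper: the string value of a property, or null when it is missing.
    private static string ReadString(JsonElement element, string name)
    {
        return element.TryGetProperty(name, out var value) && value.ValueKind == JsonValueKind.String
            ? value.GetString()
            : null;
    }
}
```

Matching on the exact message text is brittle, since the wording may change on the service side, so treat it as a convenience check rather than a contract.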

Video stream / JPG

After some investigation, it turned out that the issue, although seemingly undocumented, comes from the batch service requiring that the audio file being transcribed contains audio streams exclusively. The mp3 files I was trying to transcribe had an embedded video stream containing a jpg file - basically, the thumbnail shown in media players. At the moment, batch transcription can’t handle audio files containing such an image/video stream (confirmed with the team at Microsoft).

Solution

Until support for these media files is built into the service, it’s easy for us to extract only the audio ourselves using ffmpeg. The following command copies the audio stream(s) from Input.mp3 into Output.mp3 and leaves the embedded image behind, so the result is ready to use with the batch service. The option is documented under Stream selection in the ffmpeg documentation.

ffmpeg -i Input.mp3 -map 0:a -codec:a copy Output.mp3

If we break it down:

  • ffmpeg: The utility
  • -i Input.mp3: Use Input.mp3 as input, could also be a URL (https://STORAGEACCOUNT.blob.core.windows.net/media/Audio.mp3).
  • -map 0:a: Select all audio streams (a) from the first input file (0)
  • -codec:a copy: Set the codec for the audio streams (a) to copy. This makes it very efficient, as nothing needs to be re-encoded.
  • Output.mp3: Output the new file that only contains the audio stream(s) into Output.mp3

There might be other options or tweaks that work for you, but I found this to be fast and work well to solve my problem.
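
To decide which files need this treatment, or to check the result, the streams in a file can be listed with ffprobe, which ships with ffmpeg. The sketch below is only an illustration: the class and method names are hypothetical, the path to ffprobe is supplied by the caller, and it assumes ffprobe prints one codec type per line (for example audio or video) when called with the arguments shown.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class StreamInspector
{
    // Arguments that ask ffprobe to print one codec type per stream, e.g. "audio" or "video".
    public static string BuildProbeArguments(string inputFile) =>
        $"-v error -show_entries stream=codec_type -of csv=p=0 \"{inputFile}\"";

    // Returns true when any line of the ffprobe output names a stream that is not audio.
    public static bool HasNonAudioStream(string probeOutput) =>
        probeOutput
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim().TrimEnd(','))
            .Any(codecType => codecType.Length > 0 && codecType != "audio");

    // Runs ffprobe on the file and reports whether it should be rewritten as audio only.
    public static bool NeedsAudioOnlyCopy(string ffprobeLocation, string inputFile)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ffprobeLocation,
            Arguments = BuildProbeArguments(inputFile),
            UseShellExecute = false,
            RedirectStandardOutput = true,
            CreateNoWindow = true
        };

        using var process = Process.Start(startInfo);
        // Read the output before waiting so a full pipe buffer cannot block ffprobe.
        var output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();

        if (process.ExitCode != 0)
        {
            throw new InvalidOperationException($"ffprobe failed with exit code {process.ExitCode}.");
        }

        return HasNonAudioStream(output);
    }
}
```

Calling StreamInspector.NeedsAudioOnlyCopy("PATH_TO_FFPROBE.EXE", "Input.mp3") before submitting a file would let the pipeline run the ffmpeg command only for the files that actually carry a thumbnail.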

Calling ffmpeg from C#

In my case, I wanted to run the above ffmpeg command as part of my transcription pipeline and therefore call it from C#. The code below is a minimal way to do so:

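using System;
using System.Diagnostics;

const string ffmpegLocation = "PATH_TO_FFMPEG.EXE";

var inputFile = "Input.mp3";
var outputFile = "Output.mp3";

var processStartInfo = new ProcessStartInfo
{
    FileName = ffmpegLocation,
    // FileName already points at ffmpeg, so Arguments holds only the options.
    Arguments = $"-i \"{inputFile}\" -map 0:a -codec:a copy \"{outputFile}\"",
    UseShellExecute = false,
    // ffmpeg writes its log to standard error, not standard output.
    RedirectStandardError = true,
    CreateNoWindow = true
};

using var process = Process.Start(processStartInfo);
// Read the log before waiting so a full pipe buffer cannot block ffmpeg.
var log = process.StandardError.ReadToEnd();
process.WaitForExit();

if (process.ExitCode != 0)
{
    throw new InvalidOperationException($"ffmpeg failed with exit code {process.ExitCode}: {log}");
}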

Contribute

I hope this helps, and if you have any tweaks you want to share, please let me know. I’m at @PeterOrneholm.
