About IDEAL VoiceBase

IDEAL VoiceBase

IDEAL VoiceBase Captioning Infrastructure (IVBCI) lets users upload audio recordings and uncaptioned or poorly captioned videos for automatic captioning. The system can transcribe speech in any of the following languages and dialects: Dutch, English (Australian), English (Indian), English (South East Asian), English (UK), English (US), French, German, Italian, Spanish, and Spanish (Latin American).

The following formats are accepted: 3gp, aac, aiff, amr, asf, au, avi, caf, cf, flac, flv, m4a, m4v, mov, mp3, mp4, mpeg, mpg, ogg, ra, wav, webm, wma, and wmv.

Audio recordings and videos are captioned automatically using neural networks and a sophisticated set of speech recognition algorithms. The captions are then returned to the user in a time-coded editor, where the user (or someone else) can easily edit the caption text without disturbing its timing.
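As an illustration only, the accepted-format list above could be checked on the client side before a file is uploaded. This is a minimal sketch and not part of IVBCI: the function name `is_accepted_format` is hypothetical, and it assumes the format is judged by file extension alone, which the service itself may not do.

```python
from pathlib import Path

# Extensions copied from the accepted-formats list above.
ACCEPTED_EXTENSIONS = frozenset({
    "3gp", "aac", "aiff", "amr", "asf", "au", "avi", "caf", "cf", "flac",
    "flv", "m4a", "m4v", "mov", "mp3", "mp4", "mpeg", "mpg", "ogg", "ra",
    "wav", "webm", "wma", "wmv",
})


def is_accepted_format(filename: str) -> bool:
    """Hypothetical helper: True if the extension is in the accepted list.

    Assumption: only the final extension matters, compared case-insensitively.
    """
    suffix = Path(filename).suffix  # e.g. ".MP3", or "" if there is no extension
    return suffix[1:].lower() in ACCEPTED_EXTENSIONS
```

For example, `is_accepted_format("lecture.MP3")` is true, while `is_accepted_format("notes.txt")` and a name with no extension are both rejected.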

After editing, people accessing the transcribed audio tracks or videos can:

  • View the video in sync with the captions;
  • Listen to the audio;
  • See the full transcript;
  • Search the captions for a word, then play the video forward from the first, or any subsequent, utterance of the keyword(s);
  • Click on any word in the transcript to begin playing the video from that point forward.

In addition:

  • Words in the transcript are highlighted as they are spoken in the video, alongside the displayed captions;
  • Keywords used in the video are automatically identified, extracted from the transcript, and listed on the video viewing page;
  • Topics discussed in the video are automatically identified and posted on the video page;
  • Clicking on any keyword places small triangle markers at the point(s) in the video where that keyword was spoken. Users can then click on any triangle to play the video forward from that point.
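The word search and keyword markers described above both come down to finding where a word occurs in a time-coded transcript. The sketch below shows one way this could work. It is an assumption, not IVBCI's actual data model: the `TimedWord` type and `keyword_positions` function are hypothetical, and only single-word keywords are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """Hypothetical transcript entry: one word and when it is spoken."""
    text: str
    start: float  # seconds from the start of the recording


def keyword_positions(transcript: list[TimedWord], keyword: str) -> list[float]:
    """Return the start time of every utterance of a single-word keyword.

    Matching is case-insensitive and ignores punctuation around each word.
    """
    target = keyword.strip().lower()
    return [
        word.start
        for word in transcript
        if word.text.strip(".,;:!?\"'()").lower() == target
    ]


# Made-up sample data for illustration.
sample = [
    TimedWord("Captions", 0.0), TimedWord("help", 0.6), TimedWord("everyone.", 1.1),
    TimedWord("Edit", 2.0), TimedWord("captions", 2.4), TimedWord("easily.", 3.0),
]
marker_times = keyword_positions(sample, "captions")
```

Each returned time corresponds to one marker: a player would seek to that time when the marker is clicked and play forward from there.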