20240115 How to recognize Russian subtitles online?

2024-01-15 23:52


2024/1/15 21:25


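Baidu search: 俄罗斯语 音频 在线识别 字幕 (Russian audio online recognition subtitles)
Bilibili: 俄语AI字幕识别 (Russian AI subtitle recognition)

音视频转文字 字幕小工具V1.2 (Audio/Video-to-Text Subtitle Tool V1.2)

BING: 音视频转文字 字幕小工具V1.2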


https://www.bilibili.com/video/BV1d34y1F7qA
https://www.bilibili.com/video/BV1d34y1F7qA/?p=4&vd_source=4a6b675fa22dfa306da59f67b1f22616
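Audio | Video to Text | Subtitle Tool V1.2: adds the whisper-large-V3 model, supports 100+ languages, automatic translation, ready to use after unzipping!

万能君的软件库 (Wanneng Jun's Software Library)
Mainly shares interesting original tools I have made. The tools aim to work right after unzipping; I hope they help.

For the download link of the unzip-and-run audio | video to text | subtitle tool, follow me and send the private message "字幕" (subtitles).
Making software is not easy. No need for the triple combo (like, coin, favorite); a free like is enough!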


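Audio/Video-to-Text Subtitle Tool V1.2 download
win10, win11
(1) Quark Netdisk link: https://pan.quark.cn/s/82b36b6adfa7 Extraction code: JsyQ
(2) Baidu Netdisk link: https://pan.baidu.com/s/1UOV0orx6GhgMfoyETcNe0g?pwd=9p2x

Development is not easy; if you can, click the tip button in the software to leave a tip O(∩_∩)O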


https://github.com/openai/whisper
Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Approach

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.

Setup
We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:

pip install -U openai-whisper
Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git 
To update the package to the latest version of this repository, please run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
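
Since Whisper shells out to ffmpeg to decode audio, it can help to confirm that the executable is reachable before transcribing. The following is a minimal sketch using only the Python standard library; the function names `find_ffmpeg` and `ffmpeg_status` are illustrative and not part of Whisper.

```python
import shutil


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_status(path):
    """Turn the lookup result into a short human-readable message."""
    if path is None:
        return "ffmpeg not found on PATH; install it with your package manager first"
    return f"ffmpeg found at {path}"


print(ffmpeg_status(find_ffmpeg()))
```
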
You may need rust installed as well, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

pip install setuptools-rust
Available models and languages
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

Size    Parameters  English-only model  Multilingual model  Required VRAM  Relative speed
tiny    39 M        tiny.en             tiny                ~1 GB          ~32x
base    74 M        base.en             base                ~1 GB          ~16x
small   244 M       small.en            small               ~2 GB          ~6x
medium  769 M       medium.en           medium              ~5 GB          ~2x
large   1550 M      N/A                 large               ~10 GB         1x
The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
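
For Russian audio only the multilingual models apply, so one rough way to choose a size is to take the largest model whose approximate VRAM figure from the table fits the GPU. The sketch below is only a rule of thumb; `pick_model` is a hypothetical helper, and the VRAM numbers are the approximate values listed above, not measured limits.

```python
# (model name, approximate VRAM in GB) from the table above, smallest first
MODEL_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_vram_gb):
    """Return the largest multilingual model whose approximate VRAM need fits, or None."""
    best = None
    for name, needed_gb in MODEL_VRAM_GB:
        if needed_gb <= available_vram_gb:
            best = name  # later entries are larger, so the last fit wins
    return best


print(pick_model(6))
```

With about 6 GB available this selects `medium`; with less than 1 GB it returns `None`, which signals that none of the listed sizes is expected to fit.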

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WERs (word error rates) or CER (character error rates, shown in Italic) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

[Figure: WER breakdown by language]
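
To make the two metrics concrete, here is a minimal sketch of WER and CER computed as edit distance divided by reference length. It assumes plain whitespace tokenization and no text normalization, so its numbers will not match published figures, which are typically computed after normalizing the text; the function names are illustrative.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference item missing from hypothesis)
                current[j - 1] + 1,      # insertion (extra item in hypothesis)
                previous[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


print(wer("я люблю русский язык", "я люблю русский"))
```

In the usage line, one of four reference words is missing from the hypothesis, so the WER is 0.25.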

Command-line usage
The following command will transcribe speech in audio files, using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese
Adding --task translate will translate the speech into English:

whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:

whisper --help
See tokenizer.py for the list of all available languages.
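
For the Russian-subtitle case, the same options can be combined into one command. The sketch below only assembles the argument list in Python and prints it; it assumes the `whisper` CLI is installed and uses its `--output_format` and `--output_dir` options. The file name `lecture_ru.mp4` and the helper `build_whisper_command` are hypothetical.

```python
import shlex


def build_whisper_command(media_path, model="medium", translate=False, output_dir="."):
    """Build a whisper CLI call that writes an .srt file for Russian speech."""
    command = [
        "whisper", media_path,
        "--language", "Russian",
        "--model", model,
        "--output_format", "srt",
        "--output_dir", output_dir,
    ]
    if translate:
        # translate the Russian speech into English instead of transcribing it
        command += ["--task", "translate"]
    return command


print(shlex.join(build_whisper_command("lecture_ru.mp4")))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to actually run the transcription.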

Python usage
Transcription can also be performed within Python:

import whisper

model = whisper.load_model("base")
result = model.transcribe("audio.mp3")
print(result["text"])
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
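
Besides `result["text"]`, the dictionary returned by `transcribe()` contains a `"segments"` list whose entries carry `start`, `end`, and `text`, which is enough to produce subtitles. The following is a minimal sketch that turns those segments into SRT text; the helper names and the choice of the `medium` model are assumptions for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render a list of {'start', 'end', 'text'} dicts as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        time_line = f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}"
        blocks.append(f"{index}\n{time_line}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def transcribe_russian_to_srt(media_path, srt_path, model_name="medium"):
    """Transcribe Russian speech with Whisper and save the result as an .srt file."""
    import whisper  # imported here so the helpers above work without Whisper installed

    model = whisper.load_model(model_name)
    result = model.transcribe(media_path, language="ru")
    with open(srt_path, "w", encoding="utf-8") as srt_file:
        srt_file.write(segments_to_srt(result["segments"]))
```

A call such as `transcribe_russian_to_srt("lecture_ru.mp4", "lecture_ru.srt")` would then write the subtitle file next to the video (both file names are hypothetical).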

Below is an example usage of whisper.detect_language() and whisper.decode() which provide lower-level access to the model.

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
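
The `probs` value returned by `detect_language()` is a dictionary mapping language codes to probabilities, so it can be inspected directly when Russian is confused with a related language. A small sketch follows; the probability values in the example are made up for illustration and are not real model output.

```python
def top_languages(probs, k=3):
    """Return the k most probable (language code, probability) pairs, best first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def is_probably(probs, language_code, threshold=0.5):
    """True if the given language code reaches the probability threshold."""
    return probs.get(language_code, 0.0) >= threshold


# Hypothetical probabilities, only to show the shape of the data
example_probs = {"ru": 0.90, "uk": 0.06, "bg": 0.02, "en": 0.02}
print(top_languages(example_probs, k=2))
print(is_probably(example_probs, "ru"))
```
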
More examples
Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.

License
Whisper's code and model weights are released under the MIT License. See LICENSE for further details.


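Baidu search: whisper ubuntu
https://blog.csdn.net/huiguo_/article/details/133382558
Using whisper and funASR on Ubuntu: speaker separation (diarization)


https://blog.csdn.net/yangyi139926/article/details/135110390
Installing the speech recognition tool whisper and whisper-ctranslate2 on ubuntu16.04 (pitfalls edition)


https://zhuanlan.zhihu.com/p/664661510
Setting up open-ai Whisper on the ARM-based 图为智盒 (T906G) with ubuntu20.04 and implementing speech-to-text


https://www.ncnynl.com/archives/202310/6051.html
ROS2 and voice interaction tutorial: using whisper to publish a speech-to-text topic under ros2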


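References:
https://www.bilibili.com/video/BV14C4y1F7YM
https://www.bilibili.com/video/BV14C4y1F7YM/?spm_id_from=333.337.search-card.all.click&vd_source=4a6b675fa22dfa306da59f67b1f22616
Audio/video to subtitles: recognition and translation for 100+ languages, works offline

This audio/video-to-subtitle tool supports recognition and translation for more than 100 languages, including English, Japanese, Korean, German, and Russian, and runs fully offline.
It is built on faster whisper, a derivative project of OpenAI's whisper. It is simple to operate; after conversion, the output directory contains the subtitle text in srt and TXT formats.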


https://www.bilibili.com/video/BV1WR4y1e7Fh/?spm_id_from=333.337.search-card.all.click&vd_source=4a6b675fa22dfa306da59f67b1f22616
沙拉俄语 (Salad Russian): how to use the subtitle plugin on a phone and a computer?
Russian audio recognition


https://www.bilibili.com/read/cv17827622/
Learning Russian: Russian audio/video to text (vlc player + 字幕专家)


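[Paid]
https://gglot.com/zh/russian-subtitles/
Russian subtitles
Accurate Russian subtitles, easily generated online


[The free tool now charges extra!]
https://www.98dw.com/102.html
https://www.bilibili.com/read/cv28458016/?jump_opus=1
Audio/Video-to-Subtitle Tool V1.2: supports hundreds of languages, a translation powerhouse

Built on faster whisper, a derivative project of OpenAI's whisper; supports recognition and translation for 100+ languages.
The software runs fully offline.

1. The interface is simple and the operating steps are clearly described:
2. After conversion, the output directory contains an srt subtitle file and a plain-text txt file.
3. Screenshots of subtitle results from testing speech translation on some videos
The languages recognized and translated include Japanese, English, Korean, Russian, and German.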


 




http://www.chinasem.cn/article/610629
