20240115 How to recognize Russian subtitles online?

2024-01-15 23:52


2024/1/15 21:25


Baidu search: 俄罗斯语 音频 在线识别 字幕 (Russian, audio, online recognition, subtitles)
Bilibili: 俄语AI字幕识别 (Russian AI subtitle recognition)

音视频转文字 字幕小工具V1.2 (Audio/Video-to-Text Subtitle Tool V1.2)

Bing: 音视频转文字 字幕小工具V1.2


https://www.bilibili.com/video/BV1d34y1F7qA
https://www.bilibili.com/video/BV1d34y1F7qA/?p=4&vd_source=4a6b675fa22dfa306da59f67b1f22616
Audio/video to text / subtitle tool V1.2: adds the whisper-large-V3 model, supports 100+ languages, automatic translation, ready to use right after unzipping!

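万能君的软件库 (万能君's software library)
Mainly shares interesting original tools I have made. The tools aim to work right after unzipping; I hope they help you.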

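To get the download link for this unzip-and-run audio/video-to-text subtitle tool, follow me and send me a private message saying 字幕 ("subtitles").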
Making software is hard work. No need for the full like/coin/favorite triple, a free like is enough!


Audio/Video-to-Text Subtitle Tool V1.2 download
Windows 10, Windows 11
(1) Quark Drive link: https://pan.quark.cn/s/82b36b6adfa7 extraction code: JsyQ
(2) Baidu Netdisk link: https://pan.baidu.com/s/1UOV0orx6GhgMfoyETcNe0g?pwd=9p2x

Development takes effort. If you are able to, you can leave a tip with the tip button in the software O(∩_∩)O


https://github.com/openai/whisper
Whisper

Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification.

Approach

A Transformer sequence-to-sequence model is trained on various speech processing tasks, including multilingual speech recognition, speech translation, spoken language identification, and voice activity detection. These tasks are jointly represented as a sequence of tokens to be predicted by the decoder, allowing a single model to replace many stages of a traditional speech-processing pipeline. The multitask training format uses a set of special tokens that serve as task specifiers or classification targets.
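The special-token task format can be made concrete with a small sketch. The helper below is hypothetical and works on token strings for readability, whereas the real decoder operates on integer token IDs. The token spellings (`<|startoftranscript|>`, language tokens such as `<|ru|>`, `<|transcribe|>`, `<|translate|>`, `<|notimestamps|>`) are the ones used by Whisper's tokenizer.

```python
from __future__ import annotations

# Task tokens understood by Whisper's decoder (spellings as in Whisper's tokenizer)
TASKS = ("transcribe", "translate")


def build_task_prefix(language: str, task: str = "transcribe",
                      timestamps: bool = True) -> list[str]:
    """Hypothetical helper: the special tokens that precede the predicted text."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    # start-of-transcript, then the language, then the task specifier
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Without this token the decoder also emits timestamp tokens per segment
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    # Russian speech, transcribed (not translated), plain text only
    print("".join(build_task_prefix("ru", timestamps=False)))
    # prints <|startoftranscript|><|ru|><|transcribe|><|notimestamps|>
```

Swapping `task="translate"` into the same prefix is how one model switches from Russian transcription to English translation.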

Setup
We used Python 3.9.9 and PyTorch 1.10.1 to train and test our models, but the codebase is expected to be compatible with Python 3.8-3.11 and recent PyTorch versions. The codebase also depends on a few Python packages, most notably OpenAI's tiktoken for their fast tokenizer implementation. You can download and install (or update to) the latest release of Whisper with the following command:

pip install -U openai-whisper
Alternatively, the following command will pull and install the latest commit from this repository, along with its Python dependencies:

pip install git+https://github.com/openai/whisper.git 
To update the package to the latest version of this repository, please run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
It also requires the command-line tool ffmpeg to be installed on your system, which is available from most package managers:

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
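ffmpeg is required because Whisper decodes input files by piping them through ffmpeg. The standard-library sketch below roughly mirrors what `whisper.load_audio` does: decode to 16 kHz mono signed 16-bit PCM on stdout, then scale the samples to floats in [-1, 1). The function names are illustrative, not Whisper's API, and the real function returns a NumPy array rather than a list.

```python
import array
import shutil
import subprocess
import sys

SAMPLE_RATE = 16000  # Whisper models work on 16 kHz mono audio


def ffmpeg_decode_cmd(path: str, sample_rate: int = SAMPLE_RATE) -> list:
    """ffmpeg arguments that decode any audio/video file to raw s16le mono PCM on stdout."""
    return [
        "ffmpeg", "-nostdin", "-threads", "0",
        "-i", path,
        "-f", "s16le", "-ac", "1", "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),
        "-",  # write to stdout
    ]


def pcm16_to_float(raw: bytes) -> list:
    """Convert little-endian signed 16-bit samples to floats in [-1.0, 1.0)."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # s16le data is little-endian
    return [s / 32768.0 for s in samples]


def load_audio(path: str) -> list:
    """Decode a media file with ffmpeg and return its samples as floats."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    out = subprocess.run(ffmpeg_decode_cmd(path), capture_output=True, check=True).stdout
    return pcm16_to_float(out)
```

Because ffmpeg handles the container and codec, the same path works for a Russian video file (.mp4, .mkv) as well as for plain audio.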
You may need Rust installed as well, in case tiktoken does not provide a pre-built wheel for your platform. If you see installation errors during the pip install command above, please follow the Getting started page to install the Rust development environment. Additionally, you may need to configure the PATH environment variable, e.g. export PATH="$HOME/.cargo/bin:$PATH". If the installation fails with No module named 'setuptools_rust', you need to install setuptools_rust, e.g. by running:

pip install setuptools-rust
Available models and languages
There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and inference speed relative to the large model; actual speed may vary depending on many factors including the available hardware.

Size     Parameters   English-only model   Multilingual model   Required VRAM   Relative speed
tiny     39 M         tiny.en              tiny                 ~1 GB           ~32x
base     74 M         base.en              base                 ~1 GB           ~16x
small    244 M        small.en             small                ~2 GB           ~6x
medium   769 M        medium.en            medium               ~5 GB           ~2x
large    1550 M       N/A                  large                ~10 GB          1x
The .en models for English-only applications tend to perform better, especially for the tiny.en and base.en models. We observed that the difference becomes less significant for the small.en and medium.en models.
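Russian audio needs one of the multilingual models. As a rough illustration of the tradeoff in the table above, the hypothetical helper below picks the largest model whose approximate VRAM requirement fits the available GPU memory. The table's figures are approximate and real memory use also depends on hardware and precision, so treat the result as a starting point.

```python
# Approximate figures from the table: (multilingual model, required VRAM in GB),
# ordered from smallest to largest
MODELS = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def pick_model(vram_gb: float, english_only: bool = False) -> str:
    """Hypothetical helper: the largest model whose approximate VRAM need fits."""
    fitting = [name for name, need in MODELS if need <= vram_gb]
    if not fitting:
        raise ValueError("less VRAM than even the tiny model needs")
    name = fitting[-1]  # largest fitting model, since MODELS is sorted by size
    # There is no large.en, so ".en" only applies to the four smaller sizes
    if english_only and name != "large":
        name += ".en"
    return name


if __name__ == "__main__":
    print(pick_model(6))  # Russian audio on a 6 GB GPU -> medium
```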

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of large-v3 and large-v2 models by language, using WER (word error rate) or CER (character error rate, shown in italics) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics corresponding to the other models and datasets can be found in Appendix D.1, D.2, and D.4 of the paper, as well as the BLEU (Bilingual Evaluation Understudy) scores for translation in Appendix D.3.

[Figure: WER breakdown by language]
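WER is the word-level edit distance (substitutions + deletions + insertions) between the reference and the recognized text, divided by the number of reference words; CER is the same quantity computed over characters, which is why it is typically reported for languages written without spaces between words. The sketch below is a plain Levenshtein implementation of these definitions. It applies no text normalization, whereas published evaluations, including Whisper's, normalize casing and punctuation first, so its numbers are not directly comparable to the figure.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference item r
                curr[j - 1] + 1,         # insertion of hypothesis item h
                prev[j - 1] + (r != h),  # substitution (free when equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate; the reference must be non-empty."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate; the reference must be non-empty."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # one substituted word out of four reference words
    print(wer("я люблю русский язык", "я люблю русские язык"))  # 0.25
```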

Command-line usage
The following command will transcribe speech in audio files, using the medium model:

whisper audio.flac audio.mp3 audio.wav --model medium
The default setting (which selects the small model) works well for transcribing English. To transcribe an audio file containing non-English speech, you can specify the language using the --language option:

whisper japanese.wav --language Japanese
Adding --task translate will translate the speech into English:

whisper japanese.wav --language Japanese --task translate
Run the following to view all available options:

whisper --help
See tokenizer.py for the list of all available languages.
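For the Russian case this article is about, the CLI options above combine into a single command. The sketch below only assembles and prints the argument list; the file name `lecture_ru.mp4` and the helper name are hypothetical. `--output_format` and `--output_dir` are existing whisper CLI options, and `--task translate` would produce English subtitles instead of Russian ones.

```python
import shlex
import shutil


def whisper_subtitle_cmd(media: str, model: str = "medium", language: str = "Russian",
                         task: str = "transcribe", output_dir: str = ".") -> list:
    """Build whisper CLI arguments that write an .srt subtitle file for one media file."""
    return [
        "whisper", media,
        "--model", model,
        "--language", language,    # skip auto-detection; "ru" is accepted too
        "--task", task,            # "translate" gives English subtitles instead
        "--output_format", "srt",  # other choices: txt, vtt, tsv, json, all
        "--output_dir", output_dir,
    ]


if __name__ == "__main__":
    cmd = whisper_subtitle_cmd("lecture_ru.mp4")  # hypothetical file name
    if shutil.which("whisper") is None:
        print("whisper not found on PATH; install it with: pip install -U openai-whisper")
    print(shlex.join(cmd))  # paste into a shell, or pass cmd to subprocess.run
```

Printing instead of executing keeps the sketch safe to run anywhere; `subprocess.run(cmd, check=True)` would start the actual transcription.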

Python usage
Transcription can also be performed within Python:

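import whisper

# load the multilingual "base" model (weights are downloaded on first use)
model = whisper.load_model("base")
# transcribe the whole file; the language is auto-detected unless specified
result = model.transcribe("audio.mp3")
print(result["text"])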
Internally, the transcribe() method reads the entire file and processes the audio with a sliding 30-second window, performing autoregressive sequence-to-sequence predictions on each window.
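A simplified way to picture the 30-second windows: at Whisper's 16 kHz sample rate each window holds 480,000 samples. The sketch below cuts a signal into consecutive, non-overlapping windows; `window_bounds` is a hypothetical helper. The real `transcribe()` is more adaptive: after decoding a window it moves the start position according to the last timestamp the model predicted, so its windows do not necessarily fall on fixed 30-second boundaries.

```python
SAMPLE_RATE = 16000                         # samples per second
WINDOW_SECONDS = 30                         # Whisper's input length
N_SAMPLES = SAMPLE_RATE * WINDOW_SECONDS    # 480,000 samples per window


def window_bounds(total_samples: int, window: int = N_SAMPLES) -> list:
    """Split a signal into consecutive, non-overlapping (start, end) sample ranges.

    The last window may be shorter; it would be zero-padded to `window`
    samples before being fed to the model.
    """
    return [(start, min(start + window, total_samples))
            for start in range(0, total_samples, window)]


if __name__ == "__main__":
    # 75 seconds of audio -> two full windows plus a 15-second remainder
    print(window_bounds(75 * SAMPLE_RATE))
```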

Below is an example usage of whisper.detect_language() and whisper.decode() which provide lower-level access to the model.

import whisper

model = whisper.load_model("base")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}")

# decode the audio
options = whisper.DecodingOptions()
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)
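`whisper.pad_or_trim` makes the input exactly 30 seconds long (480,000 samples at 16 kHz), the fixed input length the model expects. The list-based sketch below only illustrates that behaviour; Whisper's own version operates on NumPy arrays or PyTorch tensors. One consequence for the example above: `detect_language()` and `decode()` only see the first 30 seconds of audio.mp3, so `model.transcribe()` is the better choice for full-length Russian videos.

```python
N_SAMPLES = 16000 * 30  # 30 seconds at 16 kHz


def pad_or_trim(samples: list, length: int = N_SAMPLES) -> list:
    """Return exactly `length` samples: drop the excess or append zeros (silence)."""
    if len(samples) >= length:
        return samples[:length]
    return samples + [0.0] * (length - len(samples))


if __name__ == "__main__":
    print(len(pad_or_trim([0.1] * 10)))  # a 10-sample clip padded to 480000
```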
More examples
Please use the 🙌 Show and tell category in Discussions for sharing more example usages of Whisper and third-party extensions such as web demos, integrations with other tools, ports for different platforms, etc.

License
Whisper's code and model weights are released under the MIT License. See LICENSE for further details.


Baidu search: whisper ubuntu
https://blog.csdn.net/huiguo_/article/details/133382558
Using whisper and funASR on Ubuntu: speaker separation and binarization


https://blog.csdn.net/yangyi139926/article/details/135110390
Installing the whisper speech recognition tool and whisper-ctranslate2 on Ubuntu 16.04 (pitfalls and fixes)


https://zhuanlan.zhihu.com/p/664661510
Setting up OpenAI Whisper for speech-to-text on an ARM-based 图为智盒 (T906G) running Ubuntu 20.04


https://www.ncnynl.com/archives/202310/6051.html
ROS2 and voice interaction tutorial: using whisper to publish a speech-to-text topic in ROS2


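References:
https://www.bilibili.com/video/BV14C4y1F7YM
https://www.bilibili.com/video/BV14C4y1F7YM/?spm_id_from=333.337.search-card.all.click&vd_source=4a6b675fa22dfa306da59f67b1f22616
Convert audio and video to subtitles: recognition and translation for 100+ languages, works offline

This audio/video-to-subtitle tool supports recognition and translation for more than 100 languages, including English, Japanese, Korean, German, and Russian, and it can run completely offline.
It is built on faster-whisper, a derivative project of OpenAI's Whisper, and is simple to use: when conversion finishes, the output directory contains the subtitles as SRT and TXT files.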
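The tool described above outputs SRT subtitles. The SRT format itself is simple: numbered blocks, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the subtitle text, and a blank line. The sketch below builds SRT text from segment dicts with `start`, `end` (seconds) and `text` keys, which is the shape of the `segments` list in openai-whisper's `transcribe()` result; faster-whisper instead yields segment objects with `.start`, `.end` and `.text` attributes, which would need converting first. The function names are illustrative and not part of either library.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)          # total milliseconds as an int
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list) -> str:
    """Build SRT text from dicts with 'start', 'end' (seconds) and 'text' keys."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    return "\n".join(blocks)  # blank line between consecutive blocks


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 2.5, "text": " Привет!"},
            {"start": 2.5, "end": 4.0, "text": " Как дела?"}]
    print(segments_to_srt(demo))
```

With openai-whisper installed, usage would look like `segments_to_srt(model.transcribe("audio.mp3", language="ru")["segments"])`, with the result written to a file opened with `encoding="utf-8"` so that Cyrillic text is preserved.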


https://www.bilibili.com/video/BV1WR4y1e7Fh/?spm_id_from=333.337.search-card.all.click&vd_source=4a6b675fa22dfa306da59f67b1f22616
How to use the 沙拉俄语 (Salad Russian) subtitle plugin on a phone and a computer?
Russian audio recognition


https://www.bilibili.com/read/cv17827622/
Learning Russian: converting Russian audio and video to text (VLC player + 字幕专家)


[Paid]
https://gglot.com/zh/russian-subtitles/
Russian subtitles
Accurate Russian subtitles, generated online with ease


[The "free" tool now charges extra!]
https://www.98dw.com/102.html
https://www.bilibili.com/read/cv28458016/?jump_opus=1
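Audio/video-to-subtitle tool V1.2: supports hundreds of languages, a handy translation tool

Built on faster-whisper, a derivative project of OpenAI's Whisper; supports recognition and translation for 100+ languages.
The software runs fully offline.

1. The interface is very simple, and the operating steps are clearly explained.
2. After conversion, the output directory contains an SRT subtitle file and a TXT plain-text file.
3. Screenshots of subtitle results from testing speech translation on some videos
The languages tested for recognition and translation include Japanese, English, Korean, Russian, German, and others.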


 




http://www.chinasem.cn/article/610629
