Published: 2023-04-06

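Trying Out MockingBird

I stumbled across this project on Bilibili and thought it looked fun, so I pulled it down and deployed it to play with. The official documentation is fairly complete and following it in order mostly works, but there are a few small pitfalls along the way.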

I. Environment

OS: CentOS Linux release 7.9
Linux kernel: 3.10.0
Python: 3.9.16

II. Deploying the Project

1. Getting the code

Clone the repository:

$ git clone git@github.com:babysor/MockingBird.git

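On GitHub, main is the repository's default branch.

For reference, here is the latest commit on main at the time of writing, in case the branch has moved on since: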

$ git show 
commit b78d0d2a26a692be089f7acc8c4b7dda794bd573
Merge: 5c17fc8 d1ba355
Author: Vega <babysor00@gmail.com>
Date:   Tue Mar 7 16:41:48 2023 +0800

    Merge pull request #782 from Nier-Y/main
    
    Update README.md and README-CN.md

Everything below is done on the main (default) branch.

$ cd MockingBird
$ python -m venv venv
$ source venv/bin/activate

2. Installing dependencies

First install ffmpeg; yum will do:

$ yum -y install ffmpeg ffmpeg-devel

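The requirements.txt on the main branch is missing some entries and pins some wrong versions, for example:

# Version too old to work with newer Python releases
monotonic-align==0.0.3
# Missing dependencies
pydantic
pillow
webrtcvad-wheels
audio-recorder-streamlit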
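
To spot this kind of gap before running pip, a small check along these lines can help. It is only a rough sketch: the function names parse_requirement_names and missing_packages are made up for illustration, the sample text is a hypothetical excerpt rather than the real upstream file, and the parsing only handles simple "name==version" style lines.

```python
import re


def _normalize(name):
    # PEP 503 style normalization: lowercase, runs of -, _ and . become one -
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement_names(text):
    """Return the set of normalized package names found in requirements text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name is everything before the first specifier, extra or marker.
        name = re.split(r"[=<>!~\[; ]", line, maxsplit=1)[0]
        names.add(_normalize(name))
    return names


def missing_packages(requirements_text, needed):
    """Return the packages from `needed` that the requirements text lacks."""
    present = parse_requirement_names(requirements_text)
    return sorted(n for n in needed if _normalize(n) not in present)


# Hypothetical excerpt for demonstration, NOT the real upstream file.
sample = """
monotonic-align==0.0.3
numpy==1.19.4
Pillow
"""
needed = ["pydantic", "pillow", "webrtcvad-wheels", "audio-recorder-streamlit"]
print(missing_packages(sample, needed))
```

Pointing it at the real file is a matter of reading requirements.txt into a string first; the check says nothing about whether a pinned version is installable on a given Python.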

So here is the list I ended up with:

$ cat > requirements_fix.txt <<EOF
absl-py==1.4.0
altair==4.2.2
aniso8601==9.0.1
antlr4-python3-runtime==4.9.3
anyio==3.6.2
attrs==22.2.0
audio-recorder-streamlit==0.0.8
audioread==3.0.0
blinker==1.5
cachetools==5.3.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.1.0
ci-sdr==0.0.2
click==8.0.4
cmake==3.26.0
colorama==0.4.6
commonmark==0.9.1
ConfigArgParse==1.5.3
contourpy==1.0.7
ctc-segmentation==1.7.4
cycler==0.11.0
Cython==0.29.33
decorator==5.1.1
dill==0.3.6
Distance==0.1.3
editdistance==0.6.2
einops==0.6.0
entrypoints==0.4
espnet==202301
espnet-tts-frontend==0.0.3
fast-bss-eval==0.1.3
fastapi==0.95.0
ffmpeg==1.4
filelock==3.10.7
Flask==2.2.3
Flask-Cors==3.0.10
flask-restx==1.1.0
Flask-WTF==1.1.1
fonttools==4.39.2
g2p-en==2.1.0
gevent==21.8.0
gitdb==4.0.10
GitPython==3.1.31
greenlet==1.1.3.post0
grpcio==1.51.3
h5py==3.8.0
huggingface-hub==0.13.3
humanfriendly==10.0
hydra-core==1.3.2
idna==3.4
importlib-metadata==4.13.0
inflect==6.0.2
itsdangerous==2.1.2
jaconv==0.3.4
jamo==0.4.1
Jinja2==3.1.2
joblib==1.2.0
jsonpatch==1.32
jsonpointer==2.3
jsonschema==4.17.3
kaldiio==2.17.2
kiwisolver==1.4.4
librosa==0.8.1
lit==16.0.0
llvmlite==0.39.1
loguru==0.6.0
Markdown==3.4.3
MarkupSafe==2.1.2
matplotlib==3.6.3
monotonic-align==1.0.0
mpmath==1.3.0
multiprocess==0.70.14
networkx==3.0
nltk==3.8.1
numba==0.56.4
numpy==1.19.4
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
omegaconf==2.3.0
opt-einsum==3.3.0
packaging==23.0
pandas==1.4.4
Pillow==9.4.0
platformdirs==3.1.1
pooch==1.7.0
protobuf==3.20.1
pyarrow==11.0.0
pycparser==2.21
pydantic==1.10.7
pydeck==0.8.0
Pygments==2.14.0
Pympler==1.0.1
pynndescent==0.5.8
pyparsing==3.0.9
pypinyin==0.44.0
PyQt5==5.15.9
PyQt5-Qt5==5.15.2
PyQt5-sip==12.11.1
pyrsistent==0.19.3
python-dateutil==2.8.2
pytorch-wpe==0.0.1
pytz==2022.7.1
pytz-deprecation-shim==0.1.0.post0
PyWavelets==1.4.1
pyworld==0.3.2
PyYAML==5.4.1
regex==2023.3.23
requests==2.28.2
resampy==0.4.2
rich==12.6.0
scikit-learn==1.2.2
scipy==1.9.3
semver==2.13.0
sentencepiece==0.1.97
shellingham==1.5.0.post1
six==1.16.0
smmap==5.0.0
sniffio==1.3.0
sounddevice==0.4.6
soundfile==0.12.1
starlette==0.26.1
streamlit==1.20.0
sympy==1.11.1
tensorboard==1.15.0
threadpoolctl==3.1.0
tokenizers==0.13.2
toml==0.10.2
toolz==0.12.0
torch==2.0.0
torch-complex==0.4.3
tornado==6.2
tqdm==4.65.0
transformers==4.26.0
triton==2.0.0
typeguard==3.0.2
typer==0.7.0
typing_extensions==4.5.0
tzdata==2022.7
tzlocal==4.3
umap-learn==0.5.3
Unidecode==1.3.6
urllib3==1.26.15
validators==0.20.0
visdom==0.2.4
watchdog==3.0.0
webrtcvad==2.0.10
webrtcvad-wheels==2.0.11.post1
websocket-client==1.5.1
Werkzeug==2.2.3
WTForms==3.0.1
zipp==3.15.0
zope.event==4.6
zope.interface==6.0
EOF

Install the dependencies from the corrected requirements file:

$ pip install -r requirements_fix.txt

One more note: if the packages download very slowly, consider configuring a PyPI mirror inside China:

$ mkdir -p ~/.pip
$ cat > ~/.pip/pip.conf << EOF
[global]
index-url = https://mirrors.tencent.com/pypi/simple

[install]
trusted-host = mirrors.tencent.com
EOF

Once that is configured, simply re-run pip install -r requirements_fix.txt.

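3. Getting the models

Download models trained by the community. I tried two:

pretrained-11-7-21_75k.pt works with the latest code on the main (default) branch
ceshi.pt requires switching the project to tags/v0.0.1

Download the model files from the shared cloud drive and place them in the two directories below. They are kept apart because different versions of the code read models from different paths. I did not feel like patching the code, so I simply created the directories that each version expects: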

$ mkdir -p data/ckpt/synthesizer
$ mkdir -p synthesizer/saved_models/

The end result:

$ ls synthesizer/saved_models/
ceshi.pt
$ ls data/ckpt/synthesizer    
pretrained-11-7-21_75k.pt
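
The same placement can be scripted. The sketch below is just an illustration of the layout described above: the MODEL_DIRS mapping and the install_model helper are names I made up, and the demo copies an empty stand-in file inside a temporary directory rather than a real checkpoint.

```python
from pathlib import Path
import shutil
import tempfile

# Assumed mapping, taken from the two directories created above.
MODEL_DIRS = {
    "main": Path("data/ckpt/synthesizer"),
    "v0.0.1": Path("synthesizer/saved_models"),
}


def install_model(model_file, version, project_root):
    """Copy model_file into the directory that the given code version reads."""
    target_dir = Path(project_root) / MODEL_DIRS[version]
    target_dir.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    return Path(shutil.copy2(model_file, target_dir))


# Demonstration in a throwaway directory with an empty stand-in file.
with tempfile.TemporaryDirectory() as tmp:
    fake_model = Path(tmp) / "ceshi.pt"
    fake_model.touch()
    dest = install_model(fake_model, "v0.0.1", Path(tmp) / "MockingBird")
    print(dest.relative_to(tmp))
```

Keeping the mapping in one place makes it easy to see which checkpoint belongs to which checkout when switching between main and the tag.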

4. Starting the service

Start the service. By default it listens on 0.0.0.0:8080:

$ python web.py

  You can now view your Streamlit app in your browser.

  Network URL: http://10.0.24.14:8080
  External URL: http://192.144.227.61:8080

Open the page in a browser. Once it finishes loading, the console prints the models it has found:

  You can now view your Streamlit app in your browser.

  Network URL: http://10.0.24.14:8080
  External URL: http://192.144.227.61:8080

Loaded synthesizer models: 1
Loaded encoders models: 1
Loaded vocoders models: 1
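
If the page does not load, it helps to first confirm that something is actually accepting connections on the port. This is a minimal standard-library sketch; port_is_open is a helper name of my own, and 127.0.0.1:8080 assumes you run the check on the same machine as the service.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, unreachable hosts
        return False


# Assumes the service from this section runs locally on its default port.
print(port_is_open("127.0.0.1", 8080))
```

A True result only means the port accepts TCP connections. If it is True locally but the external URL still fails, a firewall or cloud security group rule is a likely suspect.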

III. Hands-On

pretrained-11-7-21_75k

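Let's try this model first. For the reference audio I picked a random female voice.

For the text I used the Core Socialist Values ("社会主义核心价值观"). Upload the recording, select the matching model, and run the generation. While it is loading, switch back to the console to watch the output: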

Loaded synthesizer models: 1
Loaded encoders models: 1
Loaded vocoders models: 1

Loaded encoder "pretrained.pt" trained to step 1594501
Synthesizer using device: cpu
Building hifigan
Loading 'data/ckpt/vocoder/pretrained/g_hifigan.pt'
Complete.
Removing weight norm...
Trainable Parameters: 0.000M
Loaded synthesizer "pretrained-11-7-21_75k.pt" trained to step 75000
+----------+---+
| Tacotron | r |
+----------+---+
|   75k    | 2 |
+----------+---+
 
Read ['富强', '民主', '文明', '和谐', '倡导自由', '平等', '公正', '法治', '倡导爱国', '敬业', '诚信', '友善']
Synthesizing ['fu4 qiang2', 'min2 zhu3', 'wen2 ming2', 'he2 xie2', 'chang4 dao3 zi4 you2', 'ping2 deng3', 'gong1 zheng4', 'fa3 zhi4', 'chang4 dao3 ai4 guo2', 'jing4 ye4', 'cheng2 xin4', 'you3 shan4']

| Generating 1/1


Done.

Seeing Done means generation has finished. Go back to the WebUI and listen to the result file.

Well, how should I put it... it falls a bit short.
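
Judging from the Read line in the console output above, the text is apparently cut into short segments at punctuation marks before each segment is converted to pinyin. The sketch below reproduces only that splitting step with the standard library. The delimiter set and the name split_into_segments are my own assumptions, and the input string is my reconstruction of the text from the log; none of this is MockingBird's actual code.

```python
import re

# Assumed delimiter set: common Chinese and ASCII punctuation plus whitespace.
_PUNCTUATION = r"[、，。；：！？,.;:!?\s]+"


def split_into_segments(text):
    """Split text at punctuation and drop empty pieces."""
    return [segment for segment in re.split(_PUNCTUATION, text) if segment]


# Reconstructed input; the log only shows the resulting segments.
text = "富强、民主、文明、和谐，倡导自由、平等、公正、法治，倡导爱国、敬业、诚信、友善"
print(split_into_segments(text))
```

With this input the result is the same twelve segments that appear in the Read line.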

ceshi.pt

Now for the other model. As mentioned earlier, using it requires switching to tags/v0.0.1.

Switch versions:

$ git checkout tags/v0.0.1 --force

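This version has a few small problems, so let's deal with them up front.

Browser opens automatically on startup

Running python web.py automatically opens a browser. I am working in a terminal-only session, so it launches something like elinks.

Edit web/__init__.py and comment out these two lines: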
```
# import webbrowser

# webbrowser.open(web_address)
```
If the service needs to be reachable from other machines, also change the default listen address in web/config/default.py to 0.0.0.0:
```
HOST = '0.0.0.0'
```
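
As an alternative to commenting out the webbrowser call, the launch could be made conditional on a graphical session being present. This is only a sketch of that idea, not something the project does: should_open_browser and maybe_open are hypothetical helpers, and checking DISPLAY or WAYLAND_DISPLAY is a Linux-specific heuristic.

```python
import os
import webbrowser


def should_open_browser(environ=None):
    """Heuristic: open a browser only when a graphical session seems available."""
    env = os.environ if environ is None else environ
    return bool(env.get("DISPLAY") or env.get("WAYLAND_DISPLAY"))


def maybe_open(url, environ=None):
    """Open url in a browser if a GUI is likely present; return whether it opened."""
    if should_open_browser(environ):
        return webbrowser.open(url)
    return False


# Simulate a headless SSH session by passing an empty environment.
print(maybe_open("http://0.0.0.0:8080", environ={}))
```

On a headless server this never touches webbrowser, so nothing like elinks gets started.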

Edit the _characters variable in synthesizer/utils/symbols.py (see the related issue). Note the trailing space before the closing quote:

_characters = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz12340!\'(),-.:;? '

With those pitfalls out of the way, start the service:

(venv) ☁  MockingBird [v0.0.1] ⚡  python web.py
Loaded synthesizer models: 1
Loaded encoder "pretrained.pt" trained to step 1594501
Building hifigan
Loading 'vocoder/saved_models/pretrained/g_hifigan.pt'
Complete.
Removing weight norm...
Web server:http://0.0.0.0:8080

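Open the WebUI and go through the steps in order: enter the sample text, upload the recording, select and confirm the model, then run the generation.

Terminal output: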

Synthesizer using device: cpu
using synthesizer model: synthesizer/saved_models/ceshi.pt
Trainable Parameters: 31.948M
Loaded synthesizer "ceshi.pt" trained to step 148200
+----------+---+
| Tacotron | r |
+----------+---+
|   148k   | 2 |
+----------+---+
 
Read ['富强', '民主', '文明', '和谐', '倡导自由', '平等', '公正', '法治', '倡导爱国', '敬业', '诚信', '友善']
Synthesizing ['fu4 qiang2', 'min2 zhu3', 'wen2 ming2', 'he2 xie2', 'chang4 dao3 zi4 you2', 'ping2 deng3', 'gong1 zheng4', 'fa3 zhi4', 'chang4 dao3 ai4 guo2', 'jing4 ye4', 'cheng2 xin4', 'you3 shan4']

| Generating 1/1


Done.

223.104.41.98 - - [2023-04-06 11:12:22] "POST /api/synthesize HTTP/1.1" 200 1730418 21.009126
223.104.41.98 - - [2023-04-06 11:12:29] "GET /static/img/bird-sm.png HTTP/1.1" 304 187 0.001725

I listened to the result file and, sigh, it is even worse than the first model…