[Deep Learning] Speeding up sdwebui (A1111): xformers vs. Flash Attention 2


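Contents

Supporting material
Takeaways from the material
sdwebui A1111 speed comparison
SDXL
- xformers, SDXL with ControlNet
- SDPA (--opt-sdp-no-mem-attention), SDXL with ControlNet
- SDPA (--opt-sdp-attention), SDXL with ControlNet
- No xformers or SDPA, SDXL with ControlNet
- No xformers or SDPA, SDXL plain text-to-image
- SDPA, SDXL plain text-to-image without ControlNet
SD 1.5
- No xformers or SDPA, SD 1.5 + 2x Hires. fix, plain text-to-image at 512
- SDPA, SD 1.5 + 2x Hires. fix, plain text-to-image at 512
- No xformers or SDPA, SD 1.5 plain text-to-image at 512
- SDPA, SD 1.5 plain text-to-image at 512
- Other timings
Conclusion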

Supporting material

xformers can use Flash Attention 2 (Flash v2):
https://github.com/facebookresearch/xformers/issues/795
https://github.com/vllm-project/vllm/issues/485
https://github.com/facebookresearch/xformers/issues/832

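PyTorch supports Flash Attention 2.
Flash Attention 2 is an improved version of Flash Attention with higher throughput and better parallelism. The paper appeared in July 2023, and the kernel was integrated into PyTorch 2.2.
PyTorch 2.2 was released in January 2024 and includes the following updates:

The Flash Attention kernel used by scaled_dot_product_attention was updated to v2
aarch64 optimizations (a separate item in the same release, not specific to Flash Attention 2)
Fixes for some known Flash Attention 2 issues
To use Flash Attention 2, install PyTorch 2.2 or later. The entry point is torch.nn.functional.scaled_dot_product_attention(), which dispatches to the Flash Attention 2 kernel automatically when the inputs qualify (for example fp16 or bf16 tensors on a supported CUDA GPU); torch.nn.functional has no separate flash_attn() function.
Some resources on using Flash Attention 2:
PyTorch forum thread: https://discuss.pytorch.org/t/flash-attention/174955
Flash Attention 2 paper: https://arxiv.org/abs/2307.08691
Flash Attention 2 GitHub repository: https://github.com/Dao-AILab/flash-attention
https://github.com/pytorch/pytorch/pull/105602
Release notes: https://pytorch.org/blog/pytorch2-2/
https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html
Triton kernels
https://pytorch.org/blog/pytorch2-3/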

SDPA vs. xformers
https://github.com/huggingface/diffusers/issues/3793
F.scaled_dot_product_attention() is PyTorch's SDPA operator.
xformers.ops.memory_efficient_attention is the corresponding operator in xformers.
https://github.com/lucidrains/memory-efficient-attention-pytorch/blob/main/memory_efficient_attention_pytorch/memory_efficient_attention.py
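Both operators compute the same function, softmax(QK^T / sqrt(d)) V; they differ in how the kernel is scheduled on the GPU. The snippet below is a naive pure-Python reference for a single attention head, written only to make that function concrete. It is not how PyTorch or xformers implement it: the fused kernels avoid materializing the full score matrix, which is where the speed and memory savings come from. The function and variable names are my own.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def naive_attention(q, k, v):
    """Reference scaled dot-product attention for one head.

    q: list of Lq query vectors, each of length d
    k: list of Lk key vectors, each of length d
    v: list of Lk value vectors, each of length dv
    Returns a list of Lq output vectors, each of length dv.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    dv = len(v[0])
    out = []
    for q_i in q:
        # One row of the Lq x Lk score matrix.
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append(
            [sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(dv)]
        )
    return out


if __name__ == "__main__":
    q = [[1.0, 0.0]]
    k = [[1.0, 0.0], [1.0, 0.0]]  # identical keys, so the weights are equal
    v = [[2.0], [4.0]]
    print(naive_attention(q, k, v))
```

With two identical keys the attention weights are 0.5 each, so the output is simply the mean of the two value vectors.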

https://github.com/facebookresearch/xformers/issues/950

sdwebui supports SDP:
https://github.com/AUTOMATIC1111/stable-diffusion-webui/pull/8367
https://qq742971636.blog.csdn.net/article/details/139772822
SDP attention is on par with xformers, and even slightly ahead of it.

The attention in PyTorch 2.0 is Flash Attention 1:
https://pytorch.org/docs/2.0/generated/torch.nn.functional.scaled_dot_product_attention.html
The attention in PyTorch 2.2 is Flash Attention 2:
https://pytorch.org/docs/2.2/generated/torch.nn.functional.scaled_dot_product_attention.html

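Takeaways from the material

In PyTorch 2.2 and later, F.scaled_dot_product_attention() uses the Flash Attention 2 kernel whenever the inputs qualify for it.

Recent versions of xformers ship a comparable implementation.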

sdwebui A1111 speed comparison

For the meaning of the command-line flags, see:
https://qq742971636.blog.csdn.net/article/details/139772822

Test setup:

ControlNet with IP-Adapter

PyTorch 2.3 + xformers 0.25 (presumably 0.0.25)

25 sampling steps

Prompt, followed by the negative prompt:

In a snowy mountain range, the young man is dressed in winter attire, facing the camera with a determined gaze. He sports a thick wool coat, knit hat, and gloves to keep warm in the frigid temperatures. His eyes, piercing and resolute, reflect the strength and resolve needed to conquer the elements and the challenging terrain.

paintings, sketches, worst quality, low quality, normal quality, lowres, blurry, text, logo, monochrome, grayscale, skin spots, acnes, skin blemishes, age spot, strabismus, wrong finger, bad anatomy, bad hands, error, missing fingers, cropped, jpeg artifacts, signature, watermark, username, dark skin, fused girls, fushion, bad feet, ugly, pregnant, vore, duplicate, morbid, mutilated, transexual, hermaphrodite, long neck, mutated hands, poorly drawn face, mutation, deformed, bad proportions, malformed limbs, extra limbs, cloned face, disfigured, gross proportions, missing arms, missing legs, extra arms, extra legs, plump, open mouth, tooth, teeth, nsfw,

SDXL

xformers, SDXL with ControlNet

xformers:

./webui.sh --enable-insecure-extension-access --skip-python-version-check --skip-torch-cuda-test  --listen --port 7860 --no-download-sd-model --api --no-half-vae --xformers

Speed:

Time taken: 11.5 sec.

A: 13.29 GB, R: 16.77 GB, Sys: 18.5/39.3945 GB (47.0%)
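Every result in this post comes with a memory line in the same format, copied from the webui footer. As far as I understand the footer, A is the peak active VRAM during generation, R is the peak VRAM reserved by Torch, and Sys is the VRAM in use by all programs out of the card's total; treat that reading as an assumption. A small helper like the one below (the function name and dictionary keys are made up for this post) turns such a line into numbers so the runs can be compared in a script.

```python
import re

# Matches lines such as:
#   A: 13.29 GB, R: 16.77 GB, Sys: 18.5/39.3945 GB (47.0%)
STATS_RE = re.compile(
    r"A:\s*(?P<active_gb>[\d.]+)\s*GB,\s*"
    r"R:\s*(?P<reserved_gb>[\d.]+)\s*GB,\s*"
    r"Sys:\s*(?P<sys_used_gb>[\d.]+)/(?P<sys_total_gb>[\d.]+)\s*GB\s*"
    r"\((?P<sys_percent>[\d.]+)%\)"
)


def parse_vram_line(line):
    """Parse one webui memory line into a dict of floats."""
    match = STATS_RE.search(line)
    if match is None:
        raise ValueError(f"not a memory stats line: {line!r}")
    return {name: float(value) for name, value in match.groupdict().items()}


if __name__ == "__main__":
    xformers_run = parse_vram_line(
        "A: 13.29 GB, R: 16.77 GB, Sys: 18.5/39.3945 GB (47.0%)"
    )
    print(xformers_run)
```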

SDPA (--opt-sdp-no-mem-attention), SDXL with ControlNet

sdpa:

./webui.sh --enable-insecure-extension-access --skip-python-version-check --skip-torch-cuda-test  --listen --port 7860 --no-download-sd-model --api --no-half-vae --opt-sdp-no-mem-attention

Time taken: 11.1 sec.

A: 13.29 GB, R: 14.81 GB, Sys: 16.6/39.3945 GB (42.1%)

SDPA (--opt-sdp-attention), SDXL with ControlNet

sdpa:

./webui.sh --enable-insecure-extension-access --skip-python-version-check --skip-torch-cuda-test  --listen --port 7860 --no-download-sd-model --api --no-half-vae --opt-sdp-attention

Time taken: 11.4 sec.

A: 13.29 GB, R: 14.81 GB, Sys: 16.6/39.3945 GB (42.1%)

No xformers or SDPA, SDXL with ControlNet

Time taken: 13.3 sec.

A: 13.28 GB, R: 15.39 GB, Sys: 17.1/39.3945 GB (43.5%)

No xformers or SDPA, SDXL plain text-to-image

Time taken: 10.1 sec.

A: 10.27 GB, R: 12.45 GB, Sys: 13.0/39.3945 GB (33.0%)

SDPA, SDXL plain text-to-image without ControlNet

Time taken: 6.7 sec.

A: 10.29 GB, R: 11.89 GB, Sys: 12.5/39.3945 GB (31.7%)

SD 1.5

No xformers or SDPA, SD 1.5 + 2x Hires. fix, plain text-to-image at 512

Time taken: 10.7 sec.

A: 10.37 GB, R: 10.49 GB, Sys: 11.1/39.3945 GB (28.1%)

SDPA, SD 1.5 + 2x Hires. fix, plain text-to-image at 512

Time taken: 6.2 sec.

A: 5.75 GB, R: 7.05 GB, Sys: 7.7/39.3945 GB (19.4%)

No xformers or SDPA, SD 1.5 plain text-to-image at 512

Time taken: 3.1 sec.

A: 3.11 GB, R: 3.46 GB, Sys: 3.4/39.3945 GB (8.6%)

SDPA, SD 1.5 plain text-to-image at 512

Time taken: 2.3 sec.

A: 3.13 GB, R: 4.07 GB, Sys: 3.7/39.3945 GB (9.3%)

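Other timings

Portrait set, 4 images, A100: about 50.0 seconds (measured 50.00366139411926)

Portrait set, A10, 1 image, full generate-and-face-swap pass: 25 seconds
Portrait set, A10, 2 images, full generate-and-face-swap pass: 46 seconds

aicy image generation, excluding LLM time: 3.3 seconds
aicy image generation, including LLM time: 5.2 seconds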

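Conclusion

Recent xformers and PyTorch's built-in SDPA (Flash Attention 2) run at about the same speed: for SDXL with ControlNet, 11.5 sec. with xformers versus 11.1 to 11.4 sec. with SDPA, against 13.3 sec. with neither. With PyTorch 2.2 or later installed, enabling SDPA (--opt-sdp-no-mem-attention) is enough, and xformers no longer needs to be installed.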
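To put the timings side by side, the short script below computes the speedup of each attention option over the run with neither xformers nor SDPA. The numbers are the "Time taken" values reported above; the scenario labels and the function name are my own, and each figure comes from a single run, so small differences should not be over-read.

```python
# Wall-clock times in seconds, copied from the measurements in this post.
TIMINGS = {
    "SDXL + ControlNet": {
        "baseline": 13.3,
        "--xformers": 11.5,
        "--opt-sdp-no-mem-attention": 11.1,
        "--opt-sdp-attention": 11.4,
    },
    "SDXL, no ControlNet": {"baseline": 10.1, "sdpa": 6.7},
    "SD 1.5 + 2x Hires. fix": {"baseline": 10.7, "sdpa": 6.2},
    "SD 1.5, 512": {"baseline": 3.1, "sdpa": 2.3},
}


def speedups(timings):
    """Return baseline_time / option_time for every non-baseline run."""
    result = {}
    for scenario, runs in timings.items():
        base = runs["baseline"]
        result[scenario] = {
            name: base / seconds
            for name, seconds in runs.items()
            if name != "baseline"
        }
    return result


if __name__ == "__main__":
    for scenario, ratios in speedups(TIMINGS).items():
        for name, ratio in ratios.items():
            print(f"{scenario:24s} {name:28s} {ratio:.2f}x")
```

On these numbers the gain is roughly 1.2x for SDXL with ControlNet and roughly 1.7x for SD 1.5 with 2x Hires. fix, where attention over the larger upscaled image takes a bigger share of the time.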
