Training - Using WandB to Configure and Manage the Model Training Process
Follow me on CSDN: https://spike.blog.csdn.net/
Article URL: https://blog.csdn.net/caroline_wendy/article/details/137529140
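WandB (Weights & Biases) is a lightweight online tool for visualizing model training, similar to TensorBoard. It helps users track experiments, record the hyperparameters and output metrics of each run, visualize the results, and share them. WandB supports all mainstream deep learning frameworks, such as TensorFlow, PyTorch, and Keras, and offers a rich feature set. With WandB it is easy to monitor training: model outputs, logs, and files are synced through the cloud platform, which makes remote monitoring and collaboration convenient.
WandB can be configured automatically as follows. In the sh file, set up the account: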
wandb online
wandb login [your api key]
The API Key is found under User settings - Danger Zone.
The API Key must be used together with its matching WANDB_ENTITY.
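As an alternative to the shell commands above, the same account setup can be sketched in Python through environment variables. This is a minimal sketch, assuming the wandb client picks up `WANDB_API_KEY`, `WANDB_ENTITY`, and `WANDB_MODE` from the environment; the helper name `configure_wandb_env` and its placeholder check are hypothetical and not part of WandB.

```python
import os


def configure_wandb_env(api_key: str, entity: str, mode: str = "online") -> dict:
    """Export the WandB account settings as environment variables.

    Hypothetical helper: it only writes os.environ and never contacts WandB.
    """
    for label, value in (("api_key", api_key), ("entity", entity)):
        # Reject empty values and unreplaced placeholders such as "[your api key]".
        if not value or value.startswith("["):
            raise ValueError(f"{label} is empty or still a placeholder: {value!r}")
    if mode not in ("online", "offline", "disabled"):
        raise ValueError(f"unsupported mode: {mode!r}")

    settings = {
        "WANDB_API_KEY": api_key,  # used instead of `wandb login [your api key]`
        "WANDB_ENTITY": entity,    # must be the account (or team) the key belongs to
        "WANDB_MODE": mode,        # "online" corresponds to `wandb online`
    }
    os.environ.update(settings)
    return settings
```

The helper would need to run before the training script starts WandB, so that the API Key and the entity are set together as a pair.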
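The arguments used to start WandB are as follows:
- entity: the WandB UserName; it must be paired with the API Key.
- project: the project name, used as the name under which runs are stored.
- name: the experiment name, used to tell experiments apart.
That is: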
os.environ['WANDB_ENTITY'] = "[your name]"
if args.wandb:
    logger.info(f"Initializing wandb! {os.environ['WANDB_ENTITY']}")
    wandb.init(
        entity=os.environ["WANDB_ENTITY"],
        settings=wandb.Settings(start_method="fork"),
        project="alphaflow",
        name=args.run_name,
        config=args,
    )
Note: results can only be compared between runs that belong to the same project, for example alphaflow.
The relevant WandB log output:
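The initialization above can also be sketched as a small pair of functions, which keeps the three arguments (entity, project, name) in one place. This is only a sketch under assumptions: the flag spellings `--wandb` and `--run_name` are inferred from `args.wandb` and `args.run_name`, `--lr` is a hypothetical hyperparameter, and the function names are made up for illustration. It passes `vars(args)` as the config and leaves out the `settings=wandb.Settings(start_method="fork")` argument used in the article.

```python
import argparse


def build_wandb_init_kwargs(args: argparse.Namespace, entity: str,
                            project: str = "alphaflow") -> dict:
    """Collect the keyword arguments for wandb.init() in one place."""
    return {
        "entity": entity,        # WandB user name, paired with the API Key
        "project": project,      # runs are only comparable inside one project
        "name": args.run_name,   # experiment name shown in the WandB UI
        "config": vars(args),    # hyperparameters as a plain dict
    }


def maybe_init_wandb(args: argparse.Namespace, entity: str):
    """Start a run only when --wandb was passed; return None otherwise."""
    if not args.wandb:
        return None
    import wandb  # imported lazily so runs without --wandb do not need the package
    return wandb.init(**build_wandb_init_kwargs(args, entity))


parser = argparse.ArgumentParser()
parser.add_argument("--wandb", action="store_true")
parser.add_argument("--run_name", default="experiment-20240408")
parser.add_argument("--lr", type=float, default=1e-4)  # hypothetical hyperparameter
```

Keeping `project` fixed while varying only `--run_name` places every experiment in the same project, so the runs stay comparable.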
wandb: Currently logged in as: morndragon. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.6 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.5
wandb: Run data is saved locally in wandb/run-20240408_161416-fl5dmx0d
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run experiment-20240408
wandb: ⭐️ View project at https://wandb.ai/[your name]/alphaflow
wandb: 🚀 View run at https://wandb.ai/[your name]/alphaflow/runs/fl5dmx0d/workspace
WandB page display: [screenshot]
Bug: wandb.errors.CommError: It appears that you do not have permission to access the requested resource.
The full output:
wandb: Currently logged in as: morndragon. Use `wandb login --relogin` to force relogin
wandb: ERROR Error while calling W&B API: permission denied (<Response [403]>)
Problem at: /nfs_beijing_ai/chenlong/workspace/alphaflow-by-chenlong/train.py 50 main
wandb: ERROR It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 403: Forbidden)
Traceback (most recent call last):
  File "train.py", line 177, in <module>
    main()
  File "train.py", line 50, in main
    wandb.init(
  File "miniconda3/envs/alphaflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1206, in init
    raise e
  File "miniconda3/envs/alphaflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 1187, in init
    run = wi.init()
  File "miniconda3/envs/alphaflow/lib/python3.9/site-packages/wandb/sdk/wandb_init.py", line 786, in init
    raise error
wandb.errors.CommError: It appears that you do not have permission to access the requested resource. Please reach out to the project owner to grant you access. If you have the correct permissions, verify that there are no issues with your networking setup.(Error 403: Forbidden)
The cause is that the WandB login name, WANDB_ENTITY, does not match the API Key. Setting it again to the matching value fixes the problem:
os.environ['WANDB_ENTITY'] = "[your name]"
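To make this failure easier to recognize, the call to `wandb.init()` could be wrapped so that a permission error prints a reminder about the entity. This is a rough sketch, not WandB functionality: `entity_mismatch_hint` and `init_with_hint` are hypothetical names, and the check simply looks for "403" or "permission" in the error text shown in the log above.

```python
import os


def entity_mismatch_hint(error_message: str, entity: str) -> str:
    """Return a hint for the 403 case described above, or "" for other errors."""
    text = error_message.lower()
    if "403" in text or "permission" in text:
        return (
            f"WANDB_ENTITY is {entity!r}; check that it is the account that owns "
            "the API Key, then run `wandb login --relogin` if needed."
        )
    return ""


def init_with_hint(**init_kwargs):
    """Call wandb.init() and print a hint when it fails with a permission error."""
    import wandb  # lazy import: only needed when a run is actually started

    try:
        return wandb.init(**init_kwargs)
    except wandb.errors.CommError as err:
        hint = entity_mismatch_hint(str(err), os.environ.get("WANDB_ENTITY", ""))
        if hint:
            print(hint)
        raise  # the original error is still propagated unchanged
```

The error is re-raised after the hint, so the training script still stops as before; only the diagnosis becomes quicker.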