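Embodied AI: Mobile Manipulation
Habitat-Challenge is one of the embodied-AI challenges launched by Meta in 2022, and its main track is the rearrangement task. The details are in the following two papers:
1. Habitat 2.0: Training Home Assistants to Rearrange their Habitat. This paper defines the task and proposes the baseline methods MonolithicRL and TP-SRL. MonolithicRL is an end-to-end RL method; TP-SRL is hierarchical, with a task planner on top and sub-skills underneath.
Corresponding GitHub repository
2. Multi-skill mobile manipulation for object rearrangement. At the time of writing this is the method with the highest success rate; it is referred to below as M3.
Corresponding GitHub repository
For the methods themselves, refer to the papers. The rest of this post only describes the pitfalls I ran into while reproducing the code, which may save later readers some time.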
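Environment installation:
1. Install habitat-sim:
If you use the command from the official site directly, conda install habitat-sim withbullet -c conda-forge -c aihabitat, the setup is quite likely to fail because of network problems.
There are two alternative ways to install:
Option 1: download the matching package directly from the Habitat-sim Conda page.
Option 2: download the Habitat-sim source and install it with the following commands: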
cd habitat-sim
pip install -r requirements.txt
python setup.py install --bullet --headless
cd ..
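When choosing a Habitat-sim build, first make sure it matches the Habitat (habitat-lab) version. You generally want the withbullet build. Whether to use headless depends on whether you need a display: on a machine without a monitor, install the headless build. It is best to follow the README on the GitHub page; for example, if you need both withbullet and headless, download the conda package built with both.
2. Install Habitat-lab
Note in particular that habitat-lab is not an ordinary library: it is installed in editable mode from a source checkout, so one conda environment is effectively tied to one habitat-lab checkout. Just clone the repository and install from it.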
git clone --branch stable https://github.com/facebookresearch/habitat-lab.git
cd habitat-lab
pip install -e habitat-lab # install habitat_lab
or
python -m pip install -e .
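3. Result of a successful installation:
The two environments actually have different versions:
here hab-mm is the conda environment for M3, where habitat and habitat-sim are both 0.2.1;
habitat is the official habitat-challenge environment, where habitat and habitat-sim are both 0.2.2.
The Habitat simulator is strict about its environment, so mismatched versions can cause unexpected errors.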
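A quick way to confirm which versions an environment actually has is to ask the packages themselves. The sketch below is only an illustration: it assumes both habitat and habitat_sim expose a `__version__` attribute, and the helper names are made up for this post.

```python
import importlib


def installed_version(module_name):
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # ImportError, or OSError from a broken shared library
        return None
    return getattr(module, "__version__", None)


def versions_match(lab_version, sim_version):
    """Compare only the leading major.minor.patch fields of two version strings."""
    def core(version):
        return tuple(version.split("+")[0].split(".")[:3])
    return core(lab_version) == core(sim_version)


if __name__ == "__main__":
    lab = installed_version("habitat")
    sim = installed_version("habitat_sim")
    print("habitat:", lab, "habitat_sim:", sim)
    if lab and sim and not versions_match(lab, sim):
        print("Version mismatch: expect hard-to-read simulator errors.")
```

Run it once inside each conda environment (hab-mm and habitat here) before starting a long training job.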
Pitfalls in the habitat-challenge simulation
A minor problem that may appear after installing the environment:
OSError: /home/lu/.conda/envs/habitat/lib/python3.7/site-packages/nvidia/cublas/lib/libcublas.so.11: undefined symbol: cublasLtHSHMatmulAlgoInit, version libcublasLt.so.11
To fix it, add the following line to ~/.bashrc:
export LD_LIBRARY_PATH=/home/lu/.conda/envs/habitat/lib/python3.7/site-packages/nvidia/cublas/lib/:$LD_LIBRARY_PATH
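The path in that line depends on the user name, environment name and Python version, so it should not be copied verbatim. As a rough sketch, the line can be generated from whichever interpreter is active. The function name is hypothetical, and the nvidia/cublas/lib layout is taken from the error message above.

```python
import os
import sysconfig


def cublas_export_line(site_packages=None):
    """Build the ~/.bashrc line for the cuBLAS folder inside site-packages."""
    if site_packages is None:
        # site-packages of the interpreter running this script
        site_packages = sysconfig.get_paths()["purelib"]
    lib_dir = os.path.join(site_packages, "nvidia", "cublas", "lib")
    return "export LD_LIBRARY_PATH={}/:$LD_LIBRARY_PATH".format(lib_dir)


if __name__ == "__main__":
    print(cublas_export_line())
```

Run it with the conda environment activated and paste the printed line into ~/.bashrc.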
Command 1: running MonolithicRL
Command:
#!/bin/bash
export MAGNUM_LOG=quiet
export HABITAT_SIM_LOG=quiet
set -x
python habitat-lab/habitat_baselines/run.py \
    --exp-config configs/methods/ddppo_monolithic.yaml \
    --run-type train \
    BASE_TASK_CONFIG_PATH configs/tasks/rearrange.local.rgbd.yaml \
    TASK_CONFIG.DATASET.SPLIT 'train' \
    TASK_CONFIG.TASK.TASK_SPEC_BASE_PATH configs/pddl/ \
    TENSORBOARD_DIR tb \
    CHECKPOINT_FOLDER checkpoints \
    LOG_FILE train.log
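Problem 1: the error Not a gzipped file:
Check whether the dataset path is correct,
because the error comes from this code in pointnav_dataset.py, which opens whatever file DATA_PATH resolves to as gzip:
datasetfile_path = config.DATA_PATH.format(split=config.SPLIT)
with gzip.open(datasetfile_path, "rt") as f:
    self.from_json(f.read(), scenes_dir=config.SCENES_DIR)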
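To tell a wrong path apart from a file that is present but not gzip (a failed download, for instance), it helps to look at the first two bytes, which are always 0x1f 0x8b in a gzip stream. This is a minimal sketch; the function name and the demo file names are invented for illustration.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_dataset_file(path):
    """Return 'missing', 'not gzip', or 'ok' for a dataset path."""
    if not os.path.isfile(path):
        return "missing"
    with open(path, "rb") as f:
        return "ok" if f.read(2) == GZIP_MAGIC else "not gzip"


if __name__ == "__main__":
    # Throwaway files standing in for a real dataset split.
    with tempfile.TemporaryDirectory() as tmp:
        good = os.path.join(tmp, "train.json.gz")
        with gzip.open(good, "wt") as f:
            f.write('{"episodes": []}')
        bad = os.path.join(tmp, "broken.json.gz")
        with open(bad, "w") as f:
            f.write("<html>download failed</html>")
        for path in (good, bad, os.path.join(tmp, "absent.json.gz")):
            print(os.path.basename(path), "->", check_dataset_file(path))
```

Pass it the path you get after substituting the split name into DATA_PATH by hand.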
Problem 2: training keeps failing with EOFError:
Traceback (most recent call last):
  File "habitat-lab/habitat_baselines/run.py", line 81, in <module>
    main()
  File "habitat-lab/habitat_baselines/run.py", line 40, in main
    run_exp(**vars(args))
  File "habitat-lab/habitat_baselines/run.py", line 77, in run_exp
    execute_exp(config, run_type)
  File "habitat-lab/habitat_baselines/run.py", line 60, in execute_exp
    trainer.train()
  File "/home/lu/.conda/envs/habitat/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 715, in train
    self._init_train()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 254, in _init_train
    self._init_envs()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/rl/ppo/ppo_trainer.py", line 204, in _init_envs
    workers_ignore_signals=is_slurm_batch_job(),
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat_baselines/common/construct_vector_env.py", line 97, in construct_envs
    workers_ignore_signals=workers_ignore_signals,
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 200, in __init__
    read_fn() for read_fn in self._connection_read_fns
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 200, in <listcomp>
    read_fn() for read_fn in self._connection_read_fns
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 103, in __call__
    res = self.read_fn()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
ConnectionResetError: [Errno 104] Connection reset by peer
Exception ignored in: <function VectorEnv.__del__ at 0x7fafedb180e0>
Traceback (most recent call last):
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 584, in __del__
    self.close()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 452, in close
    read_fn()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 103, in __call__
    res = self.read_fn()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/pickle5_multiprocessing.py", line 68, in recv
    buf = self.recv_bytes()
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/lu/.conda/envs/habitat/lib/python3.7/multiprocessing/connection.py", line 383, in _recv
    raise EOFError
EOFError:
According to a discussion on GitHub,
this may happen because the GPU cannot cope with the multi-process training environments. The workaround is to edit the file
habitat-challenge/habitat-lab/habitat_baselines/common/construct_vector_env.py.
Around line 74 there is a check:
if int(os.environ.get("HABITAT_ENV_DEBUG", 0)):
    logger.warn("Using the debug Vector environment interface. Expect slower performance.")
    vector_env_cls = ThreadedVectorEnv
else:
    vector_env_cls = VectorEnv
envs = vector_env_cls(
    make_env_fn=make_gym_from_config,
    env_fn_args=tuple((c,) for c in configs),
    workers_ignore_signals=workers_ignore_signals,
)
Since not every GPU can handle VectorEnv, simply force the class to ThreadedVectorEnv:
envs = ThreadedVectorEnv(
    make_env_fn=make_gym_from_config,
    env_fn_args=tuple((c,) for c in configs),
    workers_ignore_signals=workers_ignore_signals,
)
The official documentation explains the reason:
Debugging an environment issue
Our vectorized environments are very fast, but they are not very verbose. When using VectorEnv some errors may be silenced, resulting in process hanging or multiprocessing errors that are hard to interpret. We recommend setting the environment variable HABITAT_ENV_DEBUG to 1 when debugging (export HABITAT_ENV_DEBUG=1) as this will use the slower, but more verbose ThreadedVectorEnv class. Do not forget to reset HABITAT_ENV_DEBUG (unset HABITAT_ENV_DEBUG) when you are done debugging since VectorEnv is much faster than ThreadedVectorEnv.
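The same switch can be flipped without editing the source, through the HABITAT_ENV_DEBUG variable that the check in construct_vector_env.py reads. The sketch below only mirrors that check so its behaviour can be tried in isolation. The function name is hypothetical, and setting the variable from Python is assumed to work only if done before the environments are constructed.

```python
import os


def debug_vector_env_enabled(environ=None):
    """Mirror the HABITAT_ENV_DEBUG check quoted from construct_vector_env.py."""
    environ = os.environ if environ is None else environ
    return bool(int(environ.get("HABITAT_ENV_DEBUG", 0)))


if __name__ == "__main__":
    # Equivalent to `export HABITAT_ENV_DEBUG=1` for this process and its children.
    os.environ["HABITAT_ENV_DEBUG"] = "1"
    print("ThreadedVectorEnv selected:", debug_vector_env_enabled())
```

Compared with hard-coding ThreadedVectorEnv, the variable is easy to unset again once the real error has been found.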
See also habitat.core.vector_env.
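Command 2: the hierarchical RL code (TP-SRL)
Problem 1: paths cannot be found
This command has to be run from inside the habitat-lab folder; otherwise the corresponding .yaml files need to be modified:
python habitat_baselines/run.py \
    --exp-config habitat-lab/habitat_baselines/config/rearrange/ddppo_open_cab.yaml \
    --run-type train \
    TENSORBOARD_DIR ../pick_tb/ \
    CHECKPOINT_FOLDER ../pick_checkpoints/ \
    LOG_FILE ../pick_train.log
The reason is that the configs it ships all use relative paths.
For example, to run habitat-lab/habitat_baselines/config/rearrange/ddppo_open_cab.yaml from the habitat-challenge root, I have to change its BASE_TASK_CONFIG_PATH entry to a path that is valid from habitat-challenge. The same applies to the other yaml files.
Running directly inside habitat-lab also needs care: create a symlink to the data directory, because the code looks for data in the current directory: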
ln -s ../data data
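Because everything is resolved relative to the current directory, a small pre-flight check can catch a wrong working directory before the simulator starts. This is a rough sketch with a made-up helper name; the two paths in the demo are the ones mentioned in this section.

```python
import os


def missing_paths(run_dir, relative_paths):
    """Return the relative paths that do not exist when resolved from run_dir."""
    return [
        path
        for path in relative_paths
        if not os.path.exists(os.path.join(run_dir, path))
    ]


if __name__ == "__main__":
    needed = [
        "habitat_baselines/run.py",
        "data",  # the symlink created with `ln -s ../data data`
    ]
    print("missing from current directory:", missing_paths(os.getcwd(), needed))
```

An empty list means the command is being launched from a directory where both the script and the data link resolve.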
Problem 2: AssertionError: Object attributes not uniquely matched to shortened handle.
This problem is caused by the version of objects/ycb:
Traceback (most recent call last):
  File "habitat_baselines/run.py", line 81, in <module>
Process ForkServerProcess-26:
Traceback (most recent call last):
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 262, in _worker_env
    observations = env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/gym_env_episode_count_wrapper.py", line 50, in reset
    return self.env.reset(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/gym_env_obs_dict_wrapper.py", line 32, in reset
    return self.env.reset(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/utils/gym_adapter.py", line 287, in reset
    obs = self._env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/environments.py", line 47, in reset
    observations = super().reset()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 402, in reset
    return self._env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 250, in reset
    self.reconfigure(self._config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 336, in reconfigure
    self._sim.reconfigure(self._config.SIMULATOR)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 223, in reconfigure
    self._add_objs(ep_info, should_add_objects)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/tasks/rearrange/rearrange_sim.py", line 409, in _add_objs
    ), f"Object attributes not uniquely matched to shortened handle. '{obj_handle}' matched to {matching_templates}. TODO: relative paths as handles should fix some duplicates. For now, try renaming objects to avoid collision."
AssertionError: Object attributes not uniquely matched to shortened handle. '005_tomato_soup_can.object_config.json' matched to {}. TODO: relative paths as handles should fix some duplicates. For now, try renaming objects to avoid collision.
pick.yaml contains:
ADDITIONAL_OBJECT_PATHS:
- "data/objects/ycb/configs/"
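However, there are two ycb versions, ycb_1.1 and ycb_1.2, both under data/versioned_data. ycb_1.1 has no configs folder, while ycb_1.2 does, so with ycb_1.1 linked the object handle matches nothing (the empty {} in the error).
To fix this error, link the correct ycb into the objects directory: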
cd objects
ln -s ../versioned_data/ycb_1.2 ycb
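Before re-linking, it can be useful to see which versioned ycb folders actually contain configs/ and where objects/ycb currently points. This is a minimal sketch assuming the data layout described above (versioned_data/ and objects/ side by side); the function name and the dictionary keys are invented for illustration.

```python
import os
import tempfile


def ycb_status(data_dir):
    """Report which ycb versions have configs/ and which one objects/ycb resolves to."""
    versioned = os.path.join(data_dir, "versioned_data")
    has_configs = {}
    if os.path.isdir(versioned):
        for name in sorted(os.listdir(versioned)):
            if name.startswith("ycb"):
                configs_dir = os.path.join(versioned, name, "configs")
                has_configs[name] = os.path.isdir(configs_dir)
    link = os.path.join(data_dir, "objects", "ycb")
    linked = os.path.basename(os.path.realpath(link)) if os.path.exists(link) else None
    return {"has_configs": has_configs, "linked_version": linked}


if __name__ == "__main__":
    # Throwaway tree that imitates the layout; point ycb_status at the real data/ instead.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "versioned_data", "ycb_1.1"))
        os.makedirs(os.path.join(root, "versioned_data", "ycb_1.2", "configs"))
        os.makedirs(os.path.join(root, "objects"))
        os.symlink(
            os.path.join("..", "versioned_data", "ycb_1.1"),
            os.path.join(root, "objects", "ycb"),
        )
        print(ycb_status(root))
```

If linked_version names a folder whose has_configs entry is False, the path data/objects/ycb/configs/ from pick.yaml cannot resolve.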
Problem 3: CUDA out of memory
This one is simply the GPU not having enough memory:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 7.77 GiB total capacity; 5.21 GiB already allocated; 191.38 MiB free; 5.22 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
You can try lowering one parameter:
NUM_ENVIRONMENTS in habitat_baselines/config/rearrange/ddppo_pick.yaml. I changed it from the original 32 to 16, after which training may fit in memory.
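The training commands in this post already pass trailing KEY VALUE pairs such as TENSORBOARD_DIR to override config entries. Assuming NUM_ENVIRONMENTS can be overridden the same way (not verified here), the value could be lowered per run instead of editing the yaml. The helper below is hypothetical and only assembles the argument list.

```python
def build_train_command(exp_config, overrides):
    """Build the run.py argument list with trailing KEY VALUE config overrides."""
    command = [
        "python", "habitat_baselines/run.py",
        "--exp-config", exp_config,
        "--run-type", "train",
    ]
    for key, value in overrides.items():
        command += [key, str(value)]
    return command


if __name__ == "__main__":
    cmd = build_train_command(
        "habitat_baselines/config/rearrange/ddppo_pick.yaml",
        {"NUM_ENVIRONMENTS": 16},  # half of the original 32
    )
    print(" ".join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to subprocess.run.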
Pitfalls in the M3 simulation
M3 has comparatively few problems; it mostly works right after installation.
Problem 1: EOFError
The cause is the same as in habitat-challenge; only the place to change in the code differs.
The file to modify is mobile_manipulation/utils/env_utils.py:
comment out the original line and replace it with vec_env_cls = ThreadedVectorEnv, which forces the environment class to ThreadedVectorEnv.
# vec_env_cls = ThreadedVectorEnv if debug else VectorEnv
vec_env_cls = ThreadedVectorEnv
envs = vec_env_cls(
    make_env_fn=make_env_fn,
    env_fn_args=tuple(zip(configs, env_classes, [wrappers] * num_envs)),
    workers_ignore_signals=workers_ignore_signals,
    auto_reset_done=auto_reset_done,
)
Problem 2: the ycb version
Exception in thread Thread-26:
Traceback (most recent call last):
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/threading.py", line 926, in _bootstrap_inner
    self.run()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/vector_env.py", line 262, in _worker_env
    observations = env.reset()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/site-packages/gym/core.py", line 337, in reset
    return self.env.reset(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/env.py", line 34, in reset
    observations = super().reset()
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 405, in reset
    return self._env.reset()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 253, in reset
    self.reconfigure(self._config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 339, in reconfigure
    self._sim.reconfigure(self._config.SIMULATOR)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 165, in reconfigure
    self._add_rigid_objects()
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 190, in _add_rigid_objects
    obj.transformation = mn_utils.orthogonalize(T)
AttributeError: 'NoneType' object has no attribute 'transformation'
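Note that M3 uses ycb 1.1, not the 1.2 used by habitat-challenge, so be sure to link version 1.1 when running M3. Otherwise the object data cannot be found, which produces the error above.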
cd objects
rm ycb
ln -s ../versioned_data/ycb_1.1 ycb
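If you switch between M3 and habitat-challenge often, the rm and ln -s pair can be wrapped in a small helper that refuses to delete anything other than a symlink. The sketch below uses a hypothetical function name and a throwaway directory for the demo.

```python
import os
import tempfile


def relink(link_path, target):
    """Point link_path at target, replacing an existing symlink (rm + ln -s)."""
    if os.path.islink(link_path):
        os.remove(link_path)  # removes only the link, never the data it points to
    elif os.path.exists(link_path):
        raise FileExistsError(link_path + " exists and is not a symlink")
    os.symlink(target, link_path)


if __name__ == "__main__":
    # Throwaway tree that imitates data/objects and data/versioned_data.
    with tempfile.TemporaryDirectory() as root:
        for version in ("ycb_1.1", "ycb_1.2"):
            os.makedirs(os.path.join(root, "versioned_data", version))
        os.makedirs(os.path.join(root, "objects"))
        link = os.path.join(root, "objects", "ycb")
        relink(link, os.path.join("..", "versioned_data", "ycb_1.2"))
        relink(link, os.path.join("..", "versioned_data", "ycb_1.1"))
        print("objects/ycb ->", os.readlink(link))
```

The relative target is resolved from the directory that contains the link, the same as running ln -s from inside objects.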
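Problem 3: downloading the dataset
Download the benchmark data. The link and version of each dataset are listed in datasets_download.py:
python -m habitat_sim.utils.datasets_download --uids hab2_bench_assets --data-path <path to download folder>
After that, an error suddenly appeared when running play.py: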
(hab-mm) lu@lu:~/Desktop/embodied_ai/hab-mobile-manipulation$ python habitat_extensions/tasks/rearrange/play.py
pybullet build time: Sep 22 2020 00:55:20
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/play.yaml
Merging /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml into /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/play.yaml
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml
Merging /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/__base__.py into /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/base.yaml
Loaded /home/lu/Desktop/embodied_ai/hab-mobile-manipulation/configs/rearrange/tasks/__base__.py
2023-09-20 17:46:41,099 Initializing dataset RearrangeDataset-v0
2023-09-20 17:46:41,917 initializing sim RearrangeSim-v0
Traceback (most recent call last):
  File "habitat_extensions/tasks/rearrange/play.py", line 271, in <module>
    main()
  File "habitat_extensions/tasks/rearrange/play.py", line 221, in main
    env: RearrangeRLEnv = env_cls(config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/env.py", line 31, in __init__
    super().__init__(self._core_env_config, dataset=dataset)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 374, in __init__
    self._env = Env(config, dataset)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/core/env.py", line 105, in __init__
    id_sim=self._config.SIMULATOR.TYPE, config=self._config.SIMULATOR
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/sims/registration.py", line 19, in make_sim
    return _sim(**kwargs)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat_extensions/tasks/rearrange/sim.py", line 63, in __init__
    super().__init__(config)
  File "/home/lu/Desktop/embodied_ai/hab-mobile-manipulation/habitat-lab/habitat/sims/habitat_simulator/habitat_simulator.py", line 282, in __init__
    for path in self.habitat_config.ADDITIONAL_OBJECT_PATHS:
  File "/home/lu/.conda/envs/hab-mm/lib/python3.7/site-packages/yacs/config.py", line 141, in __getattr__
    raise AttributeError(name)
AttributeError: ADDITIONAL_OBJECT_PATHS
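This is a version problem: M3 only works with the version it ships with, not with the one from habitat-challenge.
If you run into other problems, feel free to get in touch and compare notes!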