Bioinformatics tips and tricks

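Contents

- 1. Software installation
    - 1.1 Installing Python 2 on Linux
    - 1.2 Installing and using Mercurial
    - 1.3 Installing and using tRNAscan
    - 1.4 Installing Miniconda on Linux
- 2. Data download
    - 2.1 Downloading all files in a directory via FTP on Linux
    - 2.2 Downloading data from the GEO database
- 3. Operating system
    - 3.1 Adding R to the PATH environment variable on Windows
    - 3.2 How to quickly check the total size of a very large directory on Linux
    - 3.3 Adjusting the VNC Viewer resolution
    - 3.4 What to do when the Jupyter Notebook on a server cannot be reached
    - 3.5 How to use the MATLAB GUI on a server
- 4. Perl
    - 4.1 Listing installed Perl modules
- 5. Python
    - 5.1 ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found
    - 5.2 The argparse module explained with examples
- 6. Metagenomic analysis pipeline tips
    - 6.1 Installing the MetaPhlAn2 database
    - 6.2 Using MetaGeneMark
    - 6.3 Microbiome and PCA
    - 6.4 LEfSe analysis of microbiome data
- 7. R tips
    - 7.1 Merging tables on two columns
    - 7.2 Plotting in R
    - 7.3 A first look at network construction and analysis
- 8. Sequence analysis tips
    - 8.1 Mean sequence length and length distribution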

1. Software installation

1.1 Installing Python 2 on Linux

Installing a custom version of Python 2:
https://help.dreamhost.com/hc/en-us/articles/115000218612-Installing-a-custom-version-of-Python-2

1.2 Installing and using Mercurial

Reference: https://blog.csdn.net/moonspiritacm/article/details/80863421

1.3 Installing and using tRNAscan

Reference: https://www.plob.org/article/7905.html

1.4 Installing Miniconda on Linux

Reference: https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html

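2. Data download

2.1 Downloading all files in a directory via FTP on Linux

Reference: How to recursively download a folder via FTP on Linux [closed]
https://stackoverflow.com/questions/113886/how-to-recursively-download-a-folder-via-ftp-on-linux
The command is shown below (-r downloads recursively, -nH skips the host-name directory, --cut-dirs=5 drops the first five remote directory levels, and -nc skips files that already exist locally):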

wget -r -nH --cut-dirs=5 -nc ftp://user:pass@server//absolute/path/to/directory

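Test case (the remote path has five directory levels, so --cut-dirs=5 saves the files directly into the current directory):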

wget -r -nH --cut-dirs=5 -nc ftp://ftp.ebi.ac.uk/pub/databases/chembl/KinaseSARfari/latest/
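
To make the effect of -nH and --cut-dirs easier to see, here is a small Python sketch that mimics how a remote URL is mapped to a local path. The function name local_path and the file name README are hypothetical, and the code only approximates wget's naming rules for simple URLs like the test case above.

```python
from urllib.parse import urlparse
import posixpath

def local_path(url, cut_dirs=0, no_host=False):
    """Approximate where `wget -r` would save `url` (illustrative only)."""
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    # The last component is the file name; the rest are directories.
    dirs, name = parts[:-1], parts[-1]
    dirs = dirs[cut_dirs:]                 # --cut-dirs=N drops N leading directories
    if not no_host:
        dirs = [parsed.hostname] + dirs    # without -nH the host name is the top directory
    return posixpath.join(*dirs, name)

# README is a made-up file name used only for illustration.
url = "ftp://ftp.ebi.ac.uk/pub/databases/chembl/KinaseSARfari/latest/README"
print(local_path(url))                              # plain -r
print(local_path(url, cut_dirs=5, no_host=True))    # -nH --cut-dirs=5
```

With a smaller --cut-dirs value, the remaining remote directories are recreated locally, which is useful when the download spans several subfolders that should stay separate.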

2.2 Downloading data from the GEO database

Introduction to GEO data and online download: https://www.jianshu.com/p/74d570cb8c29
Download Geo Tar File Automatically From Linux/Unix: https://www.biostars.org/p/61329/

3. Operating system

3.1 Adding R to the PATH environment variable on Windows

Reference: https://stackoverflow.com/questions/47539125/how-to-add-rtools-bin-to-the-system-path-in-r

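library(devtools)
# Prepend the R bin directory to PATH (this only affects the current R session)
Sys.setenv(PATH = paste("F:/software/R-3.6.1/bin", Sys.getenv("PATH"), sep=";"))
# Point BINPREF at the compiler directory used when building packages
Sys.setenv(BINPREF = "F:/software/R-3.6.1/mingw_$(WIN)/bin/")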

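3.2 How to quickly check the total size of a very large directory on Linux

Reference: https://www.v2ex.com/t/515218

Run the following inside the directory. It prints the size of each immediate subdirectory, and the last line is the total for the current directory:

du -h --max-depth=1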
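
When the total is needed inside a script, a rough equivalent can be written in Python. This is a minimal sketch, not a replacement for du: the function name dir_size_bytes is hypothetical, and it sums apparent file sizes rather than the disk blocks that du reports, so the two numbers can differ.

```python
import os

def dir_size_bytes(root):
    """Sum the sizes of all files under `root`, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```

Calling dir_size_bytes on a directory path returns the total in bytes; divide by 1024**3 to get GiB.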

3.3 Adjusting the VNC Viewer resolution

xrandr -s 1360x768

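3.4 What to do when the Jupyter Notebook on a server cannot be reached

First look up the server's IP address:

ifconfig

The output includes a line such as: inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255

Then start Jupyter bound to that address:

jupyter notebook --no-browser --ip=192.168.1.2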
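
If this has to be done often, the address lookup can be scripted. The sketch below pulls IPv4 addresses out of ifconfig-style text with a regular expression; the function name inet_addresses is hypothetical, and the pattern assumes the "inet 192.168.1.2" or older "inet addr:192.168.1.2" layouts, which may not cover every system.

```python
import re

def inet_addresses(ifconfig_output):
    """Return the IPv4 addresses found on `inet` lines of ifconfig-style text."""
    pattern = r"^\s*inet (?:addr:)?(\d+\.\d+\.\d+\.\d+)"
    return re.findall(pattern, ifconfig_output, flags=re.MULTILINE)

sample = "inet 192.168.1.2 netmask 255.255.255.0 broadcast 192.168.1.255"
# Skip loopback addresses, which are not reachable from other machines.
addresses = [a for a in inet_addresses(sample) if not a.startswith("127.")]
print("jupyter notebook --no-browser --ip=" + addresses[0])
```

In practice the text would come from running ifconfig, for example through subprocess.run with capture_output=True.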

3.5 How to use the MATLAB GUI on a server

ssh -X node02  # connect to node02 with X11 forwarding enabled
matlab         # start MATLAB

4. Perl

4.1 Listing installed Perl modules

find `perl -e 'print "@INC"'` -name '*.pm' -print

5. Python

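5.1 ImportError: /lib64/libm.so.6: version `GLIBC_2.23' not found

The workaround used here was to install TensorFlow from conda in a new environment:

conda create -n tf-cpu tensorflow

This installed Python 3.6.10 and tensorflow-base 2.2.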
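
Before reinstalling anything, it can help to confirm which glibc the system provides. The following sketch uses platform.libc_ver() from the standard library; the helper names version_tuple and glibc_satisfies are hypothetical, and the default of 2.23 is simply the version named in the error message above.

```python
import platform

def version_tuple(text):
    """Turn a dotted version such as '2.17' into (2, 17) for comparison."""
    return tuple(int(part) for part in text.split("."))

def glibc_satisfies(found, required="2.23"):
    """True if the glibc version `found` is at least `required`."""
    return version_tuple(found) >= version_tuple(required)

lib, version = platform.libc_ver()   # e.g. ('glibc', '2.17') on a glibc-based Linux
if lib == "glibc" and version:
    print("glibc", version, "is new enough:", glibc_satisfies(version))
else:
    print("glibc version could not be determined on this system")
```

Versions are compared as integer tuples because a plain string comparison would wrongly rank "2.9" above "2.23".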

5.2 The argparse module explained with examples

Reference: https://zhuanlan.zhihu.com/p/56922793
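
As a quick reminder of the basic pattern, here is a minimal argparse sketch with one positional argument, one typed option, and one boolean flag. The tool it describes and the argument names (fastq, --threads, --histogram) are made up for illustration and do not come from the referenced article.

```python
import argparse

def build_parser():
    """Build a parser for a hypothetical read-length reporting script."""
    parser = argparse.ArgumentParser(
        description="Report read lengths in a FASTQ file (hypothetical tool)."
    )
    parser.add_argument("fastq", help="input FASTQ file")
    parser.add_argument("-t", "--threads", type=int, default=1,
                        help="number of threads (default: 1)")
    parser.add_argument("--histogram", action="store_true",
                        help="also print the length distribution")
    return parser

# Passing a list parses it instead of sys.argv, which is handy for testing.
args = build_parser().parse_args(["reads.fastq", "--threads", "4"])
print(args.fastq, args.threads, args.histogram)
```

In a real script, parse_args() is called without arguments so that the values come from the command line.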

6. Metagenomic analysis pipeline tips

6.1 Installing the MetaPhlAn2 database

Reference: https://groups.google.com/g/metaphlan-users/c/7TfY_h-SELQ

# Download the database from the page below (already downloaded to /home1/jialh/tools/metaphlan2/metaphlan2/metaphlan2_databases/mpa_v20_m200.tar)
# https://bitbucket.org/biobakery/metaphlan2/downloads/
# Unpack the tar archive
tar -xvf mpa_v20_m200.tar
# Decompress the FASTA file (-k keeps the .bz2 file)
bzip2 -dk mpa_v20_m200.fna.bz2
# Build the bowtie2 index
bowtie2-build --threads 4 mpa_v20_m200.fna mpa_v20_m200

6.2 Using MetaGeneMark

References:
(1) MetaGeneMark (note that -m must be followed by a separator): https://www.jianshu.com/p/f9b085e30d94
(2) Updating the MetaGeneMark key: https://www.jianshu.com/p/bff284d04c3e

6.3 Microbiome and PCA

Reference: 223. Principal component analysis (PCA)
https://blog.csdn.net/woodcorpse/article/details/106866501

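6.4 LEfSe analysis of microbiome data

Reference: https://github.com/biobakery/biobakery/wiki/lefse
Notes:
(1) LEfSe runs in a Python 2.7 environment.
(2) Some script names have changed, as shown in the figure below:
[Figure: renamed LEfSe scripts]
Working directory: /home1/jialh/mNetwork/MNDnetwork/PRJEB17784/lefse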

lefse-format_input.py 03biom_transform.txt 03biom_transform.in -c 2 -s -1 -u 1 -o 1000000
run_lefse.py -l 3 03biom_transform.in 03biom_transform.res
lefse-plot_res.py --dpi 300 --feature_font_size 12 03biom_transform.res 03biom_transform.png

A possible error:

AttributeError: Unknown property axis_bgcolor

Cause: matplotlib 2.2.0 removed some properties, including axis_bgcolor (replaced by facecolor), so matplotlib has to be downgraded.
Fix: pip install matplotlib==1.5
Reference: https://www.yuque.com/shenweiyan/cookbook/kefse-install

7. R tips

Key learning resources:
(1) Data Analysis (R/Python/data analysis): https://www.zhihu.com/column/Data-AnalysisR
(2) R语言中文社区 (R Chinese community): https://www.zhihu.com/column/Ryuyanshequ
(3) 林茂廷, "ggplot2 介紹" (Introduction to ggplot2): https://bookdown.org/tpemartin/minicourse_ggplot2/#section-1.1

7.1 Merging tables on two columns

Reference: https://stackoverflow.com/questions/6709151/how-do-i-combine-two-data-frames-based-on-two-columns
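
In R this is typically done with merge(df1, df2, by = c("col1", "col2")). To show what joining on a pair of key columns means, here is a small Python sketch of an inner join over lists of dictionaries. The function name merge_on_two_columns and the sample data are hypothetical; real tables would normally go through merge in R or a data frame library.

```python
def merge_on_two_columns(left, right, keys):
    """Inner-join two lists of dict rows on a pair of key columns."""
    # Index the right-hand rows by their (key1, key2) tuple.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    merged = []
    for row in left:
        # Only rows whose key pair appears in both tables are kept.
        for match in index.get(tuple(row[k] for k in keys), []):
            merged.append({**match, **row})
    return merged

abundance = [
    {"sample": "S1", "taxon": "A", "count": 10},
    {"sample": "S2", "taxon": "A", "count": 3},
]
metadata = [{"sample": "S1", "taxon": "A", "group": "case"}]
print(merge_on_two_columns(abundance, metadata, keys=("sample", "taxon")))
```

Only the S1/A row survives, because it is the only key pair present in both tables.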

7.2 Plotting in R

(1) Using ggraph

A complete guide to using ggraph: https://r.bio-spring.info/2019/12/04/ggraph-manual/
Drawing network graphs with ggraph: https://www.shenxt.info/zh/post/2019-11-27-r-ggraph/

(2) Adding labels to a hierarchical edge bundling plot
References:

Hierarchical edge bundling plot: https://www.r-graph-gallery.com/311-add-labels-to-hierarchical-edge-bundling.html
R data visualization 21: edge bundling plots: https://www.jianshu.com/p/3990496e7e47

7.3 A first look at network construction and analysis

(1) Correlation matrix : R function to do all you need: http://www.sthda.com/english/wiki/wiki.php?id_contents=7572
(2) Converting edge lists into a weighted adjacency matrix: https://stackoverflow.com/questions/16584948
(3) Correlation between OTUs with SparCC: https://rachaellappan.github.io/16S-analysis/correlation-between-otus-with-sparcc.html
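
The edge-list conversion in item (2) can be sketched in a few lines of Python. This is an illustration of the idea rather than the method from the linked answer; the function name adjacency_matrix, the OTU labels, and the weights are made up.

```python
def adjacency_matrix(edges, directed=False):
    """Build a weighted adjacency matrix from (source, target, weight) edges."""
    # Collect every node that appears at either end of an edge.
    nodes = sorted({n for s, t, _w in edges for n in (s, t)})
    position = {node: i for i, node in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        matrix[position[source]][position[target]] = weight
        if not directed:
            # Undirected networks get a symmetric matrix.
            matrix[position[target]][position[source]] = weight
    return nodes, matrix

edges = [("OTU1", "OTU2", 0.8), ("OTU2", "OTU3", -0.5)]
nodes, matrix = adjacency_matrix(edges)
for node, row in zip(nodes, matrix):
    print(node, row)
```

Pairs that never appear in the edge list keep a weight of 0.0, which is the usual convention for "no edge" in correlation networks.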

8. Sequence analysis tips

8.1 Mean sequence length and length distribution

Mean read length. Reference: https://bioinformatics.stackexchange.com/questions/4911/calculating-read-average-length-in-a-fastq-file-with-bioawk-awk/4918

awk '{if(NR%4==2) {count++; bases += length} } END{print bases/count}' file.fastq

Length distribution. Reference: https://www.biostars.org/p/72433/

awk 'NR%4 == 2 {lengths[length($0)]++} END {for (l in lengths) {print l, lengths[l]}}' file.fastq
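
Both awk one-liners rely on the sequence being the second line of every four-line FASTQ record. The same logic can be sketched in Python to get the mean and the distribution in one pass. The function name read_length_stats and the two example reads are hypothetical, and the code assumes unwrapped four-line records, as the awk commands do.

```python
from collections import Counter
import io

def read_length_stats(handle):
    """Return (mean length, Counter of lengths) for a 4-line-per-record FASTQ stream."""
    lengths = Counter()
    for line_number, line in enumerate(handle, start=1):
        if line_number % 4 == 2:   # sequence line, same test as NR%4==2 in awk
            lengths[len(line.rstrip("\r\n"))] += 1
    reads = sum(lengths.values())
    bases = sum(length * count for length, count in lengths.items())
    mean = bases / reads if reads else 0.0
    return mean, lengths

# Two made-up reads of length 4 and 6.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
mean, lengths = read_length_stats(example)
print(mean, sorted(lengths.items()))
```

For a real file, pass an open file handle (or gzip.open(path, "rt") for compressed input) instead of the StringIO object.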
