
Genomic Data Processing 48: ART Usage Examples

Published: 2021-03-11 02:14:22 | Category: Big Data | Source: compiled from the web

Summary: For the relevant parameters, see the previous post.

1. Usage example 1:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -f 20 -o G38L100F20Nhs20

    ====================ART====================
             ART_Illumina ( 200

4. Generating a single read (-c 1):

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17,2016)       
     Contact: Weichun Huang <whduke@gmail.com> 
    -------------------------------------------

                  Single-end Simulation

Total CPU time used: 15.82

The random seed for the run: 1464918910

Parameters used during run
    Read Length:    100
    Genome masking 'N' cutoff frequency:    1 in 100
    Fold Coverage:            0X
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2000 Length 100 R1 (built-in profile) 

Output files

  FASTQ Sequence File:
    G38L100c1Nhs20.fq

  ALN Alignment File:
    G38L100c1Nhs20.aln

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.
cat: G38L100c1Nhs20.: No such file or directory
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.fq
@chr1-1
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
+
@C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)<D?HCHBFAE?GEDC5D5ECD<CD<DBADDBE
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.
G38L100c1Nhs20.aln  G38L100c1Nhs20.fq
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.aln
##ART_Illumina read_length 100
@CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 -rs 1464918910
@SQ chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 248956422
##Header End
>chr1 chr1-1 225496693 +
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
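
The .aln record shown for this run can also be read programmatically. The following is a minimal Python sketch, assuming the three-line record layout seen in this output (a header line starting with > that holds the reference name, read ID, start position, and strand, followed by the reference sequence and then the read sequence) and assuming the record contains no gap characters. The names AlnRecord, parse_aln_record, and mismatches are hypothetical helpers for illustration, not part of ART.

```python
from typing import List, NamedTuple, Tuple


class AlnRecord(NamedTuple):
    ref_name: str
    read_id: str
    start: int      # start position exactly as written in the .aln header
    strand: str
    ref_seq: str
    read_seq: str


def parse_aln_record(lines: List[str]) -> AlnRecord:
    """Parse one three-line record: header, reference sequence, read sequence."""
    header, ref_seq, read_seq = (line.strip() for line in lines)
    # Fields are whitespace-separated; split() accepts tabs or spaces.
    ref_name, read_id, start, strand = header.lstrip(">").split()
    return AlnRecord(ref_name, read_id, int(start), strand, ref_seq, read_seq)


def mismatches(rec: AlnRecord) -> List[Tuple[int, str, str]]:
    """Return (0-based offset in the read, reference base, read base) per substitution."""
    return [
        (i, r, q)
        for i, (r, q) in enumerate(zip(rec.ref_seq, rec.read_seq))
        if r != q
    ]


# The record from G38L100c1Nhs20.aln, copied from the output above.
record = parse_aln_record([
    ">chr1 chr1-1 225496693 +",
    "CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT",
    "CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT",
])
print(record.start, mismatches(record))  # 225496693 [(61, 'A', 'C')]
```

For this record the sketch reports a single simulated substitution at offset 61 of the read (reference A, read C).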

5. Verifying with bwa:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.sam
@SQ SN:chr1 LN:248956422
@PG ID:bwa  PN:bwa  VN:0.7.13-r1126 CL:bwa samse GRCH38chr1L3556522.fna G38L100c1Nhs20.sai G38L100c1Nhs20.fq
chr1-1  0   chr1    225496694   37  100M    *   0   0   CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT    @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)<D?HCHBFAE?GEDC5D5ECD<CD<DBADDBE XT:A:U NM:i:1 X0:i:1 X1:i:0 XM:i:1 XO:i:0 XG:i:0 MD:Z:61A38
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.aln
##ART_Illumina read_length 100
@CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 -rs 1464918910
@SQ chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 248956422
##Header End
>chr1 chr1-1 225496693 +
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT

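Comparing the two outputs shows that ART records the read start as 225496693 while bwa reports 225496694: ART's positions start from 0, consistent with ADAM, whereas bwa's start from 1.
Open question: how can the accuracy of aligners such as bwa be checked automatically?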
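
One possible way to approach this question is to treat the ART .aln file as ground truth and compare it with the aligner's SAM output, adding 1 to each ART start position before comparing. The following is a minimal Python sketch under several assumptions: only forward-strand reads are checked (minus-strand records are skipped rather than converted, since their .aln coordinates may be expressed relative to the other strand), contig names are ignored, and soft clipping is not handled. The function names truth_from_aln, positions_from_sam, and accuracy are hypothetical, and the SEQ and QUAL fields in the sample SAM line are abbreviated to * to keep the example short.

```python
from typing import Dict, Iterable


def truth_from_aln(lines: Iterable[str]) -> Dict[str, int]:
    """Map read ID -> start position from .aln header lines ('>ref read_id start strand')."""
    truth: Dict[str, int] = {}
    for line in lines:
        if not line.startswith(">"):
            continue  # skip file header lines and sequence lines
        _ref, read_id, start, strand = line[1:].split()
        if strand == "+":  # assumption: only forward-strand reads are compared
            truth[read_id] = int(start)
    return truth


def positions_from_sam(lines: Iterable[str]) -> Dict[str, int]:
    """Map read ID -> 1-based POS for every mapped record in SAM text."""
    found: Dict[str, int] = {}
    for line in lines:
        if line.startswith("@") or not line.strip():
            continue  # skip SAM header lines and blank lines
        fields = line.split()
        qname, flag, pos = fields[0], int(fields[1]), int(fields[3])
        if not flag & 0x4:  # flag bit 0x4 marks an unmapped read
            found[qname] = pos
    return found


def accuracy(truth: Dict[str, int], found: Dict[str, int], tolerance: int = 0) -> float:
    """Fraction of truth reads whose SAM POS equals the .aln start + 1, within tolerance."""
    if not truth:
        return 0.0
    correct = sum(
        1
        for read_id, start0 in truth.items()
        if read_id in found and abs(found[read_id] - (start0 + 1)) <= tolerance
    )
    return correct / len(truth)


aln_lines = ["##Header End", ">chr1\tchr1-1\t225496693\t+"]
sam_lines = [
    "@SQ\tSN:chr1\tLN:248956422",
    "chr1-1\t0\tchr1\t225496694\t37\t100M\t*\t0\t0\t*\t*",
]
print(accuracy(truth_from_aln(aln_lines), positions_from_sam(sam_lines)))  # 1.0
```

With real files, the same functions could be fed open file handles, for example open("G38L100c1Nhs20.aln") and open("G38L100c1Nhs20.sam"). The tolerance parameter is there because an aligner may legitimately report a start that is shifted by a few bases.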

6. Verifying with snap:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.snap.sam 
@HD VN:1.4  SO:unsorted
@RG ID:FASTQ    PL:Illumina PU:pu   LB:lb   SM:sm
@PG ID:SNAP PN:SNAP CL:single index G38L100c1Nhs20.fq -o G38L100c1Nhs20.snap.sam    VN:1.0beta.23
@SQ SN:chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38   LN:248956422
chr1-1  0   chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38  225496694   70  100M    *   0   0   CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT    @C@D@FFDFHHHHIJ.JBIJJGJGIJ:G47JHJ@IJJ91BJJIGHHHEIJDGD=IJJJBJJ'DG=3D)<D?HCHBFAE?GEDC5D5ECD<CD<DBADDBE PG:Z:SNAP NM:i:1 RG:Z:FASTQ PL:Z:Illumina PU:Z:pu LB:Z:lb SM:Z:sm
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ cat G38L100c1Nhs20.aln
##ART_Illumina read_length 100
@CM art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 1 -o G38L100c1Nhs20 -rs 1464918910
@SQ chr1 AC:CM000663.2 gi:568336023 LN:248956422 rl:Chromosome M5:6aef897c3d6ff0c78aff06ac189178dd AS:GRCh38 248956422
##Header End
>chr1 chr1-1 225496693 +
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGAAAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
CATATTTACCAATTAAAGTCACAAAATATTTCTCATTATTTATTCATGCAGGTAACTGAGACAAAGATAGTGCAGAAATCAACTTTAAATAAAAAATTAT
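
In the snap output, the reference name appears to be the full FASTA header with its spaces replaced by __, whereas bwa reports only chr1. To compare placements across the two SAM files, both names can be reduced to a common short form first. The following is a minimal Python sketch, assuming the __ separator seen in this output; short_contig_name is a hypothetical helper, and the two tuples simply restate the read ID, reference name, and position from the records in sections 5 and 6.

```python
def short_contig_name(rname: str) -> str:
    """Keep the text before the first '__' (gives 'chr1' for the snap name shown here)."""
    return rname.split("__", 1)[0]


# (read ID, reference name, 1-based position) taken from the two SAM records.
bwa_record = ("chr1-1", "chr1", 225496694)
snap_record = (
    "chr1-1",
    "chr1__AC:CM000663.2__gi:568336023__LN:248956422__rl:Chromosome"
    "__M5:6aef897c3d6ff0c78aff06ac189178dd__AS:GRCh38",
    225496694,
)

same_placement = (
    short_contig_name(bwa_record[1]) == short_contig_name(snap_record[1])
    and bwa_record[2] == snap_record[2]
)
print(same_placement)  # True
```

Both aligners place the read at 225496694 on chr1, one more than the 0-based start 225496693 recorded in the .aln file.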
