Genomic Data Processing 48: ART Usage Examples

Published: 2021-03-11 02:14:22 | Category: Big Data | Source: compiled from the web

For the relevant parameters, see the previous post.

1. Usage example 1 (single-end, 20X coverage):

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -f 20 -o G38L100F20Nhs20

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17,2016)       
     Contact: Weichun Huang <whduke@gmail.com> 
    -------------------------------------------

                  Single-end Simulation

Total CPU time used: 1162.71

The random seed for the run: 1464879720

Parameters used during run
    Read Length:    100
    Genome masking 'N' cutoff frequency:    1 in 100
    Fold Coverage:            20X
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2000 Length 100 R1 (built-in profile) 

Output files

  FASTQ Sequence File:
    G38L100F20Nhs20.fq

  ALN Alignment File:
    G38L100F20Nhs20.aln
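
As a rough sanity check on the run above, the number of single-end reads implied by a fold coverage can be estimated as coverage × genome length / read length. The sketch below is only an estimate under stated assumptions: the genome length of 248,956,422 bases is the GRCh38 chr1 length and is assumed here, because the source does not state the length of GRCH38chr1L3556522.fna, and the function name is hypothetical. ART may produce fewer reads than this, for example where regions are excluded by the 'N' masking cutoff.

```python
def expected_single_end_reads(genome_length: int, fold_coverage: int, read_length: int) -> int:
    """Estimate the read count from coverage = reads * read_length / genome_length."""
    if genome_length <= 0 or read_length <= 0:
        raise ValueError("genome_length and read_length must be positive")
    # Integer division: a fractional read cannot be generated.
    return fold_coverage * genome_length // read_length


if __name__ == "__main__":
    # ASSUMPTION: the reference is all of GRCh38 chr1 (248,956,422 bases).
    # The article does not state the reference length.
    assumed_genome_length = 248_956_422
    # -f 20 and -l 100 are taken from the command in example 1.
    print(expected_single_end_reads(assumed_genome_length, 20, 100))
```

Under that assumption the estimate is about 49.8 million reads, which gives a sense of why the FASTQ and ALN files grow to several gigabytes.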

2. Usage example 2 (paired-end, with SAM output):

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS25 -sam -i GRCH38chr1L3556522.fna -p -l 150 -f 20 -m 200 -s 10 -o paired_dat

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17,2016)       
     Contact: Weichun Huang <whduke@gmail.com> 
    -------------------------------------------

                  Paired-end sequencing simulation

Total CPU time used: 1070.33

The random seed for the run: 1464880583

Parameters used during run
    Read Length:    150
    Genome masking 'N' cutoff frequency:    1 in 150
    Fold Coverage:            20X
    Mean Fragment Length:     200
    Standard Deviation:       10
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2500 Length 150 R1 (built-in profile) 
    First Read:   HiSeq 2500 Length 150 R2 (built-in profile) 

Output files

  FASTQ Sequence Files:
     the 1st reads: paired_dat1.fq
     the 2nd reads: paired_dat2.fq

  ALN Alignment Files:
     the 1st reads: paired_dat1.aln
     the 2nd reads: paired_dat2.aln

  SAM Alignment File:
    paired_dat.sam
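
With -l 150, -m 200 and -s 10, the two reads of a pair are together longer than the mean fragment, so on average they overlap in the middle of the fragment. The sketch below works through that arithmetic. The function names are hypothetical, and the pair-count formula assumes that fold coverage counts the bases of both reads in a pair, which is an assumption about how coverage is defined rather than something the log confirms.

```python
def mean_pair_overlap(read_length: int, mean_fragment_length: int) -> int:
    """Bases covered by both reads of a pair, for a fragment of mean length."""
    # Two reads of read_length start at opposite ends of the fragment;
    # whatever exceeds the fragment length is sequenced twice.
    return max(0, 2 * read_length - mean_fragment_length)


def expected_read_pairs(genome_length: int, fold_coverage: int, read_length: int) -> int:
    """Estimate the number of pairs, ASSUMING coverage counts both reads of a pair."""
    if genome_length <= 0 or read_length <= 0:
        raise ValueError("genome_length and read_length must be positive")
    return fold_coverage * genome_length // (2 * read_length)


if __name__ == "__main__":
    # -l 150 and -m 200 are taken from the command in example 2.
    print(mean_pair_overlap(150, 200))
    # Hypothetical 3,000-base reference, used only to show the formula.
    print(expected_read_pairs(3000, 20, 150))
```

For the parameters in this run the mean overlap works out to 100 bases per pair, so a larger -m would be needed if non-overlapping pairs are wanted.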

List the files:

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ ll -h
total 50G
drwxrwxr-x 2 hadoop hadoop 4.0K  6月  2 23:16 ./
drwxrwxr-x 6 hadoop hadoop 4.0K  6月  2 22:59 ../
-rw-rw-r-- 1 hadoop hadoop  11G  6月  2 23:29 G38L100F20Nhs20.aln
-rw-rw-r-- 1 hadoop hadoop 9.4G  6月  2 23:29 G38L100F20Nhs20.fq
-rw-r--r-- 1 hadoop hadoop 241M  6月  2 23:00 GRCH38chr1L3556522.fna
-rw-rw-r-- 1 hadoop hadoop 2.5K  6月  2 23:09 GRCH38chr1L3556522.fna.amb
-rw-rw-r-- 1 hadoop hadoop  144  6月  2 23:09 GRCH38chr1L3556522.fna.ann
-rw-rw-r-- 1 hadoop hadoop 238M  6月  2 23:09 GRCH38chr1L3556522.fna.bwt
-rw-rw-r-- 1 hadoop hadoop  60M  6月  2 23:09 GRCH38chr1L3556522.fna.pac
-rw-rw-r-- 1 hadoop hadoop 119M  6月  2 23:10 GRCH38chr1L3556522.fna.sa
-rw-rw-r-- 1 hadoop hadoop 4.9G  6月  2 23:42 paired_dat1.aln
-rw-rw-r-- 1 hadoop hadoop 4.6G  6月  2 23:42 paired_dat1.fq
-rw-rw-r-- 1 hadoop hadoop 4.8G  6月  2 23:42 paired_dat2.aln
-rw-rw-r-- 1 hadoop hadoop 4.6G  6月  2 23:42 paired_dat2.fq
-rw-rw-r-- 1 hadoop hadoop  11G  6月  2 23:42 paired_dat.sam

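The generated files are all very large: at 20X coverage, the single-end run alone produced a 9.4G FASTQ file and an 11G ALN file from a 241M reference.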
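
To put a number on how much space the two runs consumed, the human-readable sizes from the listing can be converted and summed. This is a minimal sketch: the helper names are hypothetical, the sizes are copied from the `ll -h` output above, and because `ls -h` rounds each value the total is only approximate.

```python
# Binary unit multipliers, as used by `ls -h`.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

# Sizes of the files written by ART, copied from the listing above.
ART_OUTPUTS = {
    "G38L100F20Nhs20.aln": "11G",
    "G38L100F20Nhs20.fq": "9.4G",
    "paired_dat1.aln": "4.9G",
    "paired_dat1.fq": "4.6G",
    "paired_dat2.aln": "4.8G",
    "paired_dat2.fq": "4.6G",
    "paired_dat.sam": "11G",
}


def parse_size(text: str) -> float:
    """Convert a size such as '9.4G' or '144' to a number of bytes."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def total_gib(sizes: dict) -> float:
    """Sum a mapping of file name -> human-readable size, in GiB."""
    return sum(parse_size(value) for value in sizes.values()) / 1024**3


if __name__ == "__main__":
    print(f"ART outputs: about {total_gib(ART_OUTPUTS):.1f} GiB")
```

The ART outputs account for essentially all of the 50G reported for the directory; the reference and its index files add well under 1G.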

3. Specifying the number of reads generated per sequence with -c instead of a fold coverage (the output is much smaller):

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ art_illumina -ss HS20 -i GRCH38chr1L3556522.fna -l 100 -c 50 -o G38L100c50Nhs20

    ====================ART====================
             ART_Illumina (2008-2016)          
          Q Version 2.5.1 (Apr 17,2016)       
     Contact: Weichun Huang <whduke@gmail.com> 
    -------------------------------------------

                  Single-end Simulation

Total CPU time used: 15.96

The random seed for the run: 1464918709

Parameters used during run
    Read Length:    100
    Genome masking 'N' cutoff frequency:    1 in 100
    Fold Coverage:            0X
    Profile Type:             Combined
    ID Tag:                   

Quality Profile(s)
    First Read:   HiSeq 2000 Length 100 R1 (built-in profile) 

Output files

  FASTQ Sequence File:
    G38L100c50Nhs20.fq

  ALN Alignment File:
    G38L100c50Nhs20.aln

hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ ls
G38L100c50Nhs20.aln  G38L100F20Nhs20.aln  GRCH38chr1L3556522.fna      GRCH38chr1L3556522.fna.ann  GRCH38chr1L3556522.fna.pac  paired_dat1.aln  paired_dat2.aln  paired_dat.sam
G38L100c50Nhs20.fq   G38L100F20Nhs20.fq   GRCH38chr1L3556522.fna.amb  GRCH38chr1L3556522.fna.bwt  GRCH38chr1L3556522.fna.sa   paired_dat1.fq   paired_dat2.fq
hadoop@Master:~/cloud/adam/xubo/data/GRCH38Sub/cs-bwamem$ ll
total 51506772
drwxrwxr-x 2 hadoop hadoop        4096  6月  3 09:51 ./
drwxrwxr-x 6 hadoop hadoop        4096  6月  2 22:59 ../
-rw-rw-r-- 1 hadoop hadoop       11400  6月  3 09:52 G38L100c50Nhs20.aln
-rw-rw-r-- 1 hadoop hadoop       10428  6月  3 09:52 G38L100c50Nhs20.fq
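
One way to confirm how many reads a -c run actually wrote is to count the records in the FASTQ file. The sketch below assumes the common four-lines-per-record FASTQ layout (header, sequence, '+', qualities) with no wrapped sequence lines. The function name is hypothetical, and the file name is passed in by the caller.

```python
import sys


def count_fastq_records(path: str) -> int:
    """Count records in a FASTQ file, ASSUMING exactly four lines per record."""
    with open(path, "r") as handle:
        line_count = sum(1 for _ in handle)
    if line_count % 4 != 0:
        raise ValueError(f"{path}: {line_count} lines is not a multiple of 4")
    return line_count // 4


if __name__ == "__main__":
    # Example: python count_reads.py G38L100c50Nhs20.fq
    print(count_fastq_records(sys.argv[1]))
```

Streaming over the lines keeps memory use flat, so the same check also works on the multi-gigabyte files from the earlier runs.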
