Feeds:
Posts
Comments

Archive for July, 2012

echo “1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y M”|awk ‘{for(i=1;i<=25;i++)print “samtools view -h filename.bam|awk \x27{if($3 == \x22″”chr”$i”\x22 || $0 ~ \x22^@\x22)print}\x27|samtools view -bS – -o chr”$i”.bam”}’

note:
\x27 is the ascii of single quote (‘)

\x22 is the ascii of double quote(“)

Read Full Post »

1. ####ERROR MESSAGE: SAM/BAM file xxx.bam.bam is malformed: SAM file doesn’t have any read groups defined in the header. The GATK no longer supports SAM files without read groups

solution:

specify -r “@RG\tID:sample\tLB:sample\tPL:ILLUMINA\tSM:sample” when running bwa alignment

or  use picard (AddOrReplaceReadGroups.jar) to add read group such as "ID:1    PL:illumina     PU:NA   LB:BC   SM:140"
2.##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
Solution:
use picard (ReorderSam.jar) to reorder the bam file, however, this does not reorder the chromosome order in the bam header (The picard version I used was picard-tools-1.72), to solve this problem, run picard ReorderSam.jar first, and then run samtools redeader to replace the bam header (karyotypically ordered).
samtools redeader header.sam in.sam
Notice: the content in header.sam is tab separated
cat header.sam

@HD VN:1.0 SO:coordinate
@SQ SN:chr1 LN:249250621
@SQ SN:chr10 LN:135534747
@SQ SN:chr11 LN:135006516
@SQ SN:chr12 LN:133851895
@SQ SN:chr13 LN:115169878
@SQ SN:chr14 LN:107349540
@SQ SN:chr15 LN:102531392
@SQ SN:chr16 LN:90354753
@SQ SN:chr17 LN:81195210
@SQ SN:chr18 LN:78077248
@SQ SN:chr19 LN:59128983
@SQ SN:chr2 LN:243199373
@SQ SN:chr20 LN:63025520
@SQ SN:chr21 LN:48129895
@SQ SN:chr22 LN:51304566
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
@SQ SN:chr8 LN:146364022
@SQ SN:chr9 LN:141213431
@SQ SN:chrM LN:16571
@SQ SN:chrX LN:155270560
@SQ SN:chrY LN:59373566
@RG ID:1 PL:illumina PU:NA LB:BC SM:140




Read Full Post »

x ~ y
True if the string x matches the regexp denoted by y.

For example, there is the following text in a file, and I want to get the lines with exactly two “PASS”

1       2339427 .       C       T       .       .        1/1:.:PASS     1/1:.:PASS:.
1       2339428 .       G       T       .       .        1/1:.:PASS     1/1:.:. :.
1       2339428 .       A       C       .       .        1/1:.:PASS     1/1:.:.PASS:.

Here is the command line:
more fileName.txt|awk ‘{for(i=1;i<=NF;i++){if($i ~ “PASS”){A++}} {if(A==2)print $0}A=0}’

 

Read Full Post »