Todo list (week 7)

13 02 2006
1. Work out the complete UCRs for Fugu/Tetraodon/Mouse/Salmon (if the genome data is available);
2. Continue the SalmonHox project with Altuna;
3. UCSC training course;
4. Perl learning;
5. Design homepage in UiB;
6. Medicine for MM;
7. Make myself happy;

* Poor efficiency over the previous weekend.




Totally attracted by the style of CBRG, Oxford University. That will be my template!

Today I made an important modification to the script: when running blastall, we should turn off the filter option. By default, the query sequence is filtered with DUST for blastn and with SEG for the other programs.

    -F str (bl2seq, blast, blastall, blastpgp, blastcl3, impala, megablast, rpsblast)
        Filter options for DUST or SEG; defaults to T for bl2seq, blast, blastall, blastcl3, and megablast, and to F for blastpgp, impala, and rpsblast.

(Quoted from an explanation of the blastall options.)

Comments:
1. Never ignore or give up on an error; a chance to learn may be hidden in it.
2. Par is really a nice guy, calm and modest.
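As a reminder of what "turning off the filter" looks like in practice, here is a small Python sketch that assembles a legacy blastall command line with -F F. It is only an illustration: the helper name blastall_command and the database, query and output names are made up for this example, and the real script may build its command differently.

```python
import shlex


def blastall_command(program, database, query, output, filter_query=False):
    """Build a legacy blastall command line as an argument list.

    filter_query=False maps to "-F F", which disables DUST/SEG
    low-complexity filtering of the query sequence.
    """
    return [
        "blastall",
        "-p", program,    # e.g. blastn
        "-d", database,   # formatted BLAST database
        "-i", query,      # query FASTA file
        "-o", output,     # report file
        "-F", "T" if filter_query else "F",
    ]


# Hypothetical file and database names, for illustration only.
cmd = blastall_command("blastn", "fugu_genome", "ucr_candidates.fa", "ucr_vs_fugu.out")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The argument list can be handed to subprocess.run(cmd) as it stands, which avoids shell quoting problems.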

Note: Coordinate systems in genomics

1. axt format alignment file (as used at UCSC)

Example: the following segment from an axt file shows an alignment of the human assembly (the aligning assembly) to mouse chromosome 19 (the primary assembly).

    93 chr19 3220979 3221040 chr11 66089412 66089473 - 3301
    TGGTTTTGTTTTAAGTAAAAAGCAATACAAGCATATTGTGCAAAATTAGAAAGGCAAAAATG
    TTGTTTTGTTTTTACAAATAAGCGATATAAGCATACGATGCAAAGTTAAAGAGGCACAGATG

If the strand value is "-", the values of the aligning organism's start and end fields are relative to the reverse-complemented coordinates of its chromosome.

That is to say, on the minus strand of human chr11, positions 66089412 to 66089473 hold the sequence:

    TTGTTTTGTTTTTACAAATAAGCGATATAAGCATACGATGCAAAGTTAAAGAGGCACAGATG

These coordinates refer to the minus strand (also read from 5' to 3', with the 5' end at coordinate 1).

To check this, we convert the minus-strand coordinates to the plus strand and then fetch the sequence from the Genome Browser. First, get the length of human chr11:

    mysql> select max(end) from GENOME_chr11;
    +-----------+
    | max(end)  |
    +-----------+
    | 134452384 |
    +-----------+

Then:

    -(66089412, 66089473)  ==>  +(134452384 - 66089473 + 1, 134452384 - 66089412 + 1)

that is, +(68362912, 68362973).

    >hg17_dna range=chr11:68362912-68362973 5'pad=0 3'pad=0 revComp=FALSE strand=? repeatMasking=lower
    CATCTGTGCCTCTTTAACTTTGCATCGTATGCTTATATCGCTTATTTGTAAAAACAAAACAA

This is the corresponding sequence on the plus strand. If we take its reverse complement, we get

    >hg17_dna range=chr11:68362912-68362973 5'pad=0 3'pad=0 revComp=TRUE strand=? repeatMasking=lower
    TTGTTTTGTTTTTACAAATAAGCGATATAAGCATACGATGCAAAGTTAAAGAGGCACAGATG

which is exactly the same as the one in the alignment file.

In conclusion, coordinates in an axt file are strand-based: for the plus strand, coordinate 1 is the first base of the plus strand (its 5' end); for the minus strand, coordinate 1 is the first base of the minus strand (its 5' end).
Importantly, this base is exactly the complement of the base at the 3' end of the plus strand.

2. genome fasta file

We often download genome data in fasta format from UCSC, NCBI, etc. In our group, we use Par's AT module to convert it into a MySQL table, like this:

    perl ~/lib/perl/AT/SCRIPTS/ -u xianjund -d XJD_FR_AUG02 chrUn.fa

We often use that table to retrieve sequences. Par has provided many convenient interfaces based on the data. What needs our attention is that the sequence in the table uses plus-strand coordinates (from 5' to 3', starting at the first base of the plus strand). For example:

    mysql> select fragment_id,chr,start,end from GENOME_chr11 limit 10;
    +-------------+-------+--------+--------+
    | fragment_id | chr   | start  | end    |
    +-------------+-------+--------+--------+
    |           1 | chr11 |      1 |  50000 |

This is the sequence from 1 to 50000 on the plus strand.
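The minus-to-plus conversion worked through in section 1 can be sketched in a few lines of Python. This is only an illustration under the assumptions used in the note (1-based, inclusive coordinates, and the chromosome length taken from max(end)); the function names minus_to_plus and reverse_complement are made up for this sketch and are not part of Par's AT module.

```python
# Complement table for unambiguous DNA bases, preserving case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def minus_to_plus(start, end, chrom_length):
    """Convert a 1-based, inclusive minus-strand interval to plus-strand coordinates."""
    return chrom_length - end + 1, chrom_length - start + 1


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


# Values from the worked example in section 1 (hg17 chr11).
CHR11_LENGTH = 134452384
plus_start, plus_end = minus_to_plus(66089412, 66089473, CHR11_LENGTH)
print(plus_start, plus_end)  # 68362912 68362973

# The plus-strand sequence fetched for that range, reverse-complemented,
# should reproduce the human line of the axt alignment.
plus_seq = "CATCTGTGCCTCTTTAACTTTGCATCGTATGCTTATATCGCTTATTTGTAAAAACAAAACAA"
print(reverse_complement(plus_seq))
```

The same formula works in both directions, so applying minus_to_plus to the plus-strand interval gives back the original minus-strand coordinates.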

The last one is the most important! Happy Valentine's Day!

