Bacterial Viral Parasitic Host Genomics Crash Course in Omics Terminology and Concepts Jessica Kissinger June 4, 2007 Microarray Proteomic Genetics Immunology Genome assembly 5X random genome shotgun Library insert size Mated end pairs Contigs Scaffolds Shotgun DNA Sequencing (Technology) DNA target sample Primer End Reads (Mates) 550bp SHEAR SEQUENCE SIZE SELECT LIGATE & CLONE Vector e.g., 10Kbp ± 8% std.dev. Whole Genome Shotgun Sequencing Collect 10x sequence in a 1-to-1 ratio of two types of read pairs: Short Long ~ 70million reads for Human. 2Kbp 10Kbp Collect another 20X in clone coverage of 50Kbp end sequence pairs: ~ 1.2million pairs for Human. Early simulations showed that if repeats were considered black boxes, one could still cover 99.7% of the genome unambiguously. Clone-by-Clone Genome Sequencing Physical Mapping Shotgun Assembly Target Minimum Tiling Set (~33,000 BACs for human) BAC 5 BAC 3 2 separate processes Physical map of clones sequencing libraries must be made for every clone assembly problem easy and well understood 1
Pairs Give Order & Orientation Anatomy of a WGS Assembly Assembly without pairs results in contigs whose order and orientation are not known. Contig? Pairs, especially groups of corroborating ones, link the contigs into scaffolds where the size of gaps is well characterized. Consensus (15-30Kbp) Reads 2-pair Mean & Std.Dev. is known Read pair (mates) STS STS-mapped Scaffolds Contig Gap (mean & std. dev. Known) Consensus Chromosome Reads (of several haplotypes) Scaffold SNPs External Reads An Assembly of reads Reads and their trace files AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTGGTCACGTGT AACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACCAGCGAGGCGCTCGCTTTG CTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGAAACTGGCGGTCACAGGCACCAGATAAC GCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTCTTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGC GCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCATGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCG TTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTCAGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCC ACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTGTATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCG CACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGGACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTC AAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCGCAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGA AAATGACGCCATTTTTGGGAAAATCGGGGAACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCAC TTTCCGTCTCTCCTCCGTGCTTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGA GCACTGCTCTGAGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATG ACCCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGG GTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAAAGCATGCCTCGAAAGT TTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCTCCAAGAGTGTGGACGCTGTTCCACG TCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCCGGGGGGCACACAACTATCCTCAACTCTCGAACGA ACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGATTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCC GATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACTATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGT TTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATCAGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAG TCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAG AAACGTACTGAAACTGTATACATGTATATACAGATGTATGGATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGC GAGAGGGACGCCCGTTATGCTGTGTGATGTGGCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTC TATTCCCTTCACTGCGAAAGCGCGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCT CTCTTTCCTGTTGGTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCT GTACATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTG TGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACGACTTTGTGGTTCTC GAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAACACGAAGAGAAGGGAAAATGCGGA GAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTGTAGAAATAACTGCGACCCTGGAGACAGAGATG CCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAGTCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGC CTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGTTTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGT CCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTG ATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTG CGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCAAGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTT CTTCAGTCTTTCCATCTTCCCCTGTTACCTCTGTCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGA CAAGCAACTGTCATTTGTACGCGCCTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAA GAGGCGAAGGAACAAAGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTG TTTAGGCGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGTGAA TGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGTAGACTAACGCAC GAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGCGCGCACTGGTTTGCATGTGGC CTGAGAGAGCACAGATCGAAGGTGGGGTGATGTGGCGTCGCTGCAGAGAAACTCCGGCGAAGGCGACAGATAAAGGAGAGTGGAAATCATTGAACAGTGTCGGTCGTCTGTTGTTTCGC AGGGTCCTCGAAGAGTTGCTGTGGTTCATTCGCGGCGACACGAACGCAAACCATCTTTCTGAGAAGGGCGTGAAGGCAAGTCTACGTTGTACCTCTTGTCTCTGCCGAAGCTCAGATGT CTCCACGGCGTTGGTTTCTTTTCGTTTTTGCTTTCGTGGCATTACCATCGAGTCACCACTCATAGTTGCGTGTGTCTACATGTTTTCTAGAACGTCCGTTGTGTTGCCTCGTGGCGACC Six Frame Translation 2
Reading Frames & ORFs 5' 3' atgcccaagctgaatagcgtagaggggttttcatcatttgaggacgatgtataa 1 atg ccc aag ctg aat agc gta gag ggg ttt tca tca ttt gag gac gat gta taa M P K L N S V E G F S S F E D D V * 2 tgc cca agc tga ata gcg tag agg ggt ttt cat cat ttg agg acg atg tat C P S * I A * R G F H H L R T M Y 3 gcc caa gct gaa tag cgt aga ggg gtt ttc atc att tga gga cga tgt ata A Q A E * R R G V F I I * G R C I Display of Met and Stops 500 3 3 2 2 1 1-1 -1-2 -2-3 -3 1000 500 1000 AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTG GTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACC AGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGA AACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTC TTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCA TGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTC AGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTG TATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGG ACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCG CAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGG AACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGC TTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGAC CCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCC TGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAA AGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCT CCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCC GGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGA TTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACT ATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATC AGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAA GGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATG GATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTG GCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTG GTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTAC ATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGA GTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACG ACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAAC ACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTG TAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAG TCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGT TTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTG CCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTAC CTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCA AGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGC CTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAA AGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGG CGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGT GAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGT AGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGC AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTG GTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACC AGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGA AACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTC TTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCA TGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTC AGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTG TATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGG ACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCG CAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGG AACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGC TTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGAC CCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCC TGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAA AGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCT CCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCC GGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGA TTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACT ATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATC AGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAA GGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATG GATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTG GCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTG GTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTAC ATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGA GTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACG ACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAAC ACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTG TAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAG TCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGT TTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTG CCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTAC CTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCA AGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGC CTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAA AGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGG CGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGT GAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGT AGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGC AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTG GTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACC AGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGA AACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTC TTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCA TGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTC AGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTG TATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGG ACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCG CAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGG AACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGC TTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGAC CCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCC TGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAA AGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCT CCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCC GGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGA TTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACT ATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATC AGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAA GGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATG GATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTG GCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTG GTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTAC ATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGA GTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACG ACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAAC ACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTG TAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAG TCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGT TTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTG CCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTAC CTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCA AGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGC CTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAA AGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGG CGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGT GAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGT AGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGC ATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGACCCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCC ACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGGGTGGCT TCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATG GGACGGAAAACCTGGGAAAGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCC TCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGC AGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGAGGAGCGGGACTGTACGAG GCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTACATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCC CTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTGTGTTCGT TCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCA GACAACGGGGTACCCTACGACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGAGCA ACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCAT TGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCAT GTTCACTTTAGAGGCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATG ACCGAACGG >Translation Frame 1 MQKPVCLVVAMTPKRGIGINNGLPWPHLTTDFKHFSRVTKTTPEEASRLN GWLPRKFAKTGDSGLPSPSVGKRFNAVVMGRKTWESMPRKFRPLVDRLNI VVSSSLKEEDIAAEKPQAEGQQRVRVCASLPAALSLLEEEYKDSVDQIFV VGGAGLYEAALSLGVASHLYITRVAREFPCDVFFPAFPGDDILSNKSTAA QAAAPAESVFVPFCPELGREKDNEATYRPIFISKTFSDNGVPYDFVVLEK RRKTDDAATAEPSNAMSSLTSTRETTPVHGLQAPSSAAAIAPVLAWMDEE DRKKREQKELIRAVPHVHFRGHEEFQYLDLIADIINNGRTMDDRT 3
Terminology Different prediction methods often generate different results exon 1 exon 2 union intersection 4
1996 P. falciparum Chromosome 2 - past and present Chromosomal Synteny between organisms 1998 Other C. parvum lacks HXGPRT Apicomplexans C.p. Thymidine Kinase represents a lateral transfer from a bacterium Cryptosporidium Striepen et al, 2004 PNAS 101(9): 3154-3159 Comparative genomics of pyrimidine biosynthesis in Apicomplexa C. parvum genome encodes two unique salvage enzymes Striepen et al, 2004 PNAS 101(9): 3154-3159 Striepen et al, 2004 PNAS 101(9): 3154-3159 5
Evolutionary relationships Homology - related by evolutionary descent not equivalent to similarity Orthology - same gene in different organisms, e.g. alpha hemoglobin in humans and chimps Paralogy - genes within an organism related by gene duplication, e.g. alpha and beta hemoglobin in humans Xenology - genes related by gene transfer Functional Genomics Always has a time and space component Expression Profiles The pattern of expression of one or more genes over time or a set of experimental conditions, e.g. during development or a drug treatment or in a genetic mutant such as a gene knock-out. Microarrays cdna microarrays GeneChip in situ synthesized oligonucleotide arrays Oligomer (~70mer) arrays Experiments are almost always Competitions between conditions or stages cdna array grid cdna Microarrays Robotic microarrayer 6
Chip Oligo Array Hybridization mrna Probe labeling AAAAAAA TTTTTTT AAAAAAA Oligo dt Cy5 dutp Reverse transcriptase or Cy3 dutp TTTTTTT AAAAAAA Labeled cdna General Scanning ScanArray 3000 Scans, Red, Green and Merged 7
Ratios of experimental to control expression are often expressed as colors rather than numbers A Dendrogram of clustered expression profiles Clustered Microarray Data Genes with Similar Expression Profiles are Grouped together 8
Typical 2 D gel Protein Expression High throughput mass spectrometry How Tandem MS Works Direct identification of proteins from biological sample. Capillary liquid chromatography apparatus (LC) coupled with... Electrospray tandem mass spectroscopy (MS/MS) Sequest software links mass spectra with genomic sequence database. Complex mixture Protein Ionized Peptides Liquid chromatography Isolation Collision Inducted Dissociation (CID) Fragmentation Measurement Peptides I P A P K Individual mass spectra Tandem MS protein data I P A P K I P A P K I P A P K Combined mass spectrum I P A P K 9
Sequest Database Search Mass Spectrometer Protein Database Nucleic Acid Database EST Database Tandem Mass Spectrum Theoretical Mass Spectrum Peptide database ENNPCKLQYDYNTNVTHGFGQEYPCETDIVERFSDTEGAQCDKKKIKDNSEGACAPYRRL HVCVRNLENINDYSKINNKHNLLVEVCLAAKYEGESITGRYPQHQETNPDTKSQLCTVLA RSFADIGDIIRGKDLYRGGNTKEKKKRKKLEENLKTIFGHIYDELKNGKTNGEEELQKRY RGDKDNDFYQLREDWWDANRETVWKAITCNAGSYQYSQPTCGRGEIPYVTLSKCQCIAGE VPTYFDYVPQYLRWFEEWAEDFCRKKKKKIPNVKTNCRQVQRGKEKYCDRDGYNCDGTIR KQYIYRLDTDCTKCSLACKTFAEWIDNQKEQFDKQKQKYQNEISGGGGRRQKRSTHSTKE YEGYEKHFNEELRNEGKDVRSFLQLLSKEKICKERIQVGEETANYGNFENESNTFSHTEY CDRCPLCGVDCSSDNCRKKPDKSCDEQITDKEYPPENTTKIPKLTAEKRKTGILKKYEKF CKNSDGNNGGQIKKWECHYEKNDKDDGNGDINNCIQGDWKTSKNVYYPISYYSFFYGSII DMLNESIEWRERLKSCINDAKLGKCRKGCKNPCECYKRWVEKKKDEWDKIKEFFRKQKDL LKDIAGMDAGELLEFYLENIFLEDMKNANGDPKVIEKFKEILGKENEEVQDPLKTKKTID DFLEKELNEAKNCVEKNPDNECPKQKAPGDGAAPSDPPREDITHHDGEHSSDEDEEEEEE EEQQPPAEGTEQGEEKSESKEVVEQQETPQKDTEKTVPTTTPTVDVCDTVKTALADTGSL NAACSLKYVTGKNYGWRCIAPSGTTSGKDGAICVPPRTQELCLYYLKELSDTTQKGLREA FIKTAAQETYLLWQKYKEDKQNETASTELDIDDPQTQLNGGEIPEDFKRQMFYTFGDYRD LFLGRYIGNDLDKVNNNITAVFQNGDHIPNGQKTDRQRQEFWGTYGKDIWKGMLCALQEA GGKKTLTETYNYSNVTFNGHLTGTKLNEFASRPSFLRWMTEWGDQFCRERITQLQILKER CMVYQYNGDKGKDDKKEKCTEACTYYKEWLTNWQDNYKKQNQRYTEVKGTSPYKEDSDVK ESKYAHGYLRKILKNIICTSGTDIAYCNCMEGTSTTDSSNNDNIPESLKYPPIEIEEGCT CKDPSPGEVIPEKKVPEPKVLPKPPKLPKRQPKERDFPTPALKNAMLSSTIMWSIGIGFA TFTYFYLKKKTKSTIDLLRVINIPKSDYDIPTKLSPNRYIPYTSGKYRGKRYIYLEGDSG TDSGYTDHYSDITSSSESEYEELDINDIYAPRAPKYKTLIEVVLEPSGNNTTASGNNTPS DTQNDIQNDGIPSSKITDNEWNTLKDEFISQYLQSEQPNDVPNDYSSGDIPLNTQPNTLY FDNPDEKPFITSIHDRDLYSGEEYSYNVNMVNTNNDIPISGKNGTYSGIDLINDSLNSNN Correlation Analysis Note: ORFs in addition to predicted Genes must be searched Ranked Score of Matched Peptides A protein network Protein Networks, often established by Y2H studies Networks have edges and nodes Ontologies Terms to link equivalent concepts that have different names Gene Ontologies Sequence Ontologies BioMedical Ontologies Cell component Ontologies http://www.bioontology.org/ Community Databases NCBI -http://www.ncbi.nlm.nih.gov/ KEGG -http://www.genome.jp/kegg/pathway.html PFAM - http://pfam.janelia.org/ GO - http://www.geneontology.org/go.doc.shtml Interpro - http://www.ebi.ac.uk/interpro/ ProDom - http://prodom.prabi.fr/prodom/current/html/hom e.php Cyc - http://metacyc.org/ 10