It is like any other experiment!

What is a bioinformatics experiment?
  You need to know your data and input sources.
  You need to understand your methods and their assumptions.
  You need a plan to get from point A to point B.
  You need to understand your equipment.
  You need to be critical and understand potential sources of error.
  You need to interpret your results.
  Your results need to be reproducible.
  Your results should be testable.

Remember the Goal
Infectious Disease Paradigm
  [Diagram labels: Phenotype, Experimental systems, Genotype, Pathogen, Host, Vector]
  [Diagram labels: Bacterial, Viral, Parasitic, Host Genetics]

Know your data
  [Diagram labels: Immunology, Genomics, Microarray, Proteomic]

>Pfa3D7 chr1_000035 2001.01.03 GENOMIC Sanger
CCAGTTTGTGATGTTTCTCCTTGTTGCACATAGCCACTTCCTTTCTTCCTCTGTTAAAAAAGCTCCTCTCCCCAGAATACTCCCTTGTAGCACCATGATTGAATGACCACTCTTTCTTAAGTGAACTTCTTGAAAAC
[The slide fills the rest of the frame with further lines of nucleotide sequence from this record.]

[Figure: Typical 2-D gel]
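
One way to act on "know your data" for a FASTA record like the one above is to run a few basic checks before any analysis: how long the sequence is, what its base composition looks like, and whether it contains symbols other than A, C, G and T. The following is a minimal sketch, not a full FASTA parser. The function names `read_fasta` and `summarize` and the toy record are assumptions made for illustration and do not come from the slides.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(sequence):
    """Return length, GC fraction, and the number of non-ACGT symbols."""
    counts = Counter(sequence)
    length = len(sequence)
    gc = counts["G"] + counts["C"]
    unexpected = length - sum(counts[base] for base in "ACGT")
    return {
        "length": length,
        "gc_fraction": gc / length if length else 0.0,
        "unexpected": unexpected,
    }


# Hypothetical toy record, used only to exercise the functions.
example = """>toy_record hypothetical example
ACGTTGCA
NNGGCC
"""

records = read_fasta(example)
for header, seq in records:
    print(header, summarize(seq))
```

A summary that looks wrong for the organism or sequencing method named in the header (unexpected composition, many ambiguous bases, an implausible length) is a reason to go back to the data source before trusting any downstream result.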
Know your method
  Can you tell when something has gone wrong?
  Can you determine why?
Know your equipment
  Do you know how it works?
  Do you know how to fix it?
Know your procedure
  How do you get from point A to point B?
Remember the assumptions

Know your technique
  How were the results/data generated?
  What sources of error do these techniques produce?

  ESTs: A snapshot of all detectable RNAs [usually poly(A)+] present in a given cell type, tissue, disease state, experimental condition, or developmental stage.
  Microarrays: A snapshot of all detectable RNAs present in a given cell type, tissue, disease state, experimental condition, or developmental stage, relative to a control.
  Proteomics: A snapshot of all detectable proteins in a given cell type, tissue, disease state, experimental condition, or developmental stage.

Microarrays
cDNA Microarrays
  [Figure label: Robotic microarrayer]
  cDNA microarrays
  GeneChip in situ synthesized oligonucleotide arrays
  Oligomer (~70mer) arrays
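
The phrase "relative to a control" in the microarray definition above is commonly expressed as a log2 ratio of sample intensity to control intensity for each gene. The sketch below shows that arithmetic only. The function name `log2_ratios`, the `floor` parameter, and the gene names and intensities are all hypothetical, and real pipelines also apply background correction and normalization, which are not shown.

```python
import math


def log2_ratios(sample, control, floor=1.0):
    """Return log2(sample / control) for every gene present in both inputs.

    Intensities below `floor` are raised to `floor` so that a zero reading
    does not produce a division by zero or log of zero. This is an
    assumption of the sketch, not a recommended normalization.
    """
    ratios = {}
    for gene, s in sample.items():
        if gene not in control:
            continue  # cannot compare a gene that has no control reading
        c = control[gene]
        ratios[gene] = math.log2(max(s, floor) / max(c, floor))
    return ratios


# Hypothetical intensities for three genes.
sample = {"geneA": 800.0, "geneB": 100.0, "geneC": 0.0}
control = {"geneA": 200.0, "geneB": 400.0, "geneC": 50.0}

ratios = log2_ratios(sample, control)
for gene, value in sorted(ratios.items()):
    print(gene, round(value, 2))
```

A positive value means more signal in the sample than in the control and a negative value means less. The `geneC` case shows why the assumptions matter: its ratio is set entirely by the chosen floor, not by a real measurement.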
Chip Oligo Array Hybridization
  [Figure label: General Scanning ScanArray 3000]

What about protein expression?

SEQUEST Database Search
  [Diagram labels: Mass Spectrometer, Tandem Mass Spectrum, Protein Database, Nucleic Acid Database, EST Database, Theoretical Mass Spectrum, Correlation Analysis, Ranked Score of Matched Peptides]
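
The database-search diagram above follows a general pattern: predict a theoretical spectrum for each candidate peptide, compare it with the observed tandem mass spectrum, and rank the candidates by score. The sketch below illustrates that pattern with a deliberately simple shared-peak count. It is not SEQUEST's correlation score, and the function names, tolerance, and candidate peptides are assumptions chosen for illustration. The residue masses are standard monoisotopic values.

```python
# Monoisotopic residue masses in daltons.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def theoretical_peaks(peptide):
    """Singly charged b- and y-ion m/z values for a peptide."""
    masses = [MONO[aa] for aa in peptide]
    peaks = []
    for i in range(1, len(masses)):
        peaks.append(sum(masses[:i]) + PROTON)          # b ion: N-terminal prefix
        peaks.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal suffix
    return sorted(peaks)


def shared_peak_count(observed, theoretical, tol=0.5):
    """Count theoretical peaks that have an observed peak within `tol`."""
    return sum(1 for t in theoretical if any(abs(t - o) <= tol for o in observed))


def rank_candidates(observed, candidates, tol=0.5):
    """Score every candidate peptide and return (score, peptide), best first."""
    scored = [
        (shared_peak_count(observed, theoretical_peaks(p), tol), p)
        for p in candidates
    ]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


# Hypothetical candidates; the "observed" spectrum is simulated from one of them.
candidates = ["SAMPLER", "PEPTIDE", "GASPVTK"]
observed = theoretical_peaks("PEPTIDE")

ranking = rank_candidates(observed, candidates)
for score, peptide in ranking:
    print(score, peptide)
```

Note that the ranking always returns a top candidate, even when the true peptide is absent from the database, so a best score on its own is not evidence of a correct identification.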
Peptide database
ENNPCKLQYDYNTNVTHGFGQEYPCETDIVERFSDTEGAQCDKKKIKDNSEGACAPYRRL
[The slide fills the rest of the frame with further lines of amino acid sequence.]

Interpret your results
  Are they what you expected?
  Might your data have violated some assumptions?
You can't break it
  What is the worst that can happen?
    Crash the program?
    Restart the computer?
    Waste three hours of compute time?
    Learn something?
  Bioinformatics is a technique. It must be learned, and learning involves exploration and mistakes. The important thing is to learn from your mistakes.

You can misinterpret it!
  Biological data are not random.
  Molecules, or parts of molecules, do not evolve or behave independently.
  Data and computer programs were generated by humans.
  The computer ALWAYS has an answer; our job is to be sure it is the best possible answer given what we know.