Crash Course in Omics Terminology, Concepts & Data Types

Hasonló dokumentumok
Crash Course in Omics Terminology, Concepts & Data Types

Crash Course in Omics Terminology, Concepts & Data Types

Welcome! EuPathDB Workshop Crash Course in Omics Terminology, Concepts & Data Types

Phenotype. Genotype. It is like any other experiment! What is a bioinformatics experiment? Remember the Goal. Infectious Disease Paradigm

Welcome! EuPathDB Workshop 2010

Crash Course in Omics Terminology and Concepts

Crash Course in Omics Terminology and Concepts. Genome assembly. Clone-by-Clone Genome Sequencing. Shotgun DNA Sequencing (Technology)

Trinucleotide Repeat Diseases: CRISPR Cas9 PacBio no PCR Sequencing MFMER slide-1

Mapping Sequencing Reads to a Reference Genome

Supporting Information

Genome 373: Hidden Markov Models I. Doug Fowler

Supplementary Figure 1

discosnp demo - Peterlongo Pierre 1 DISCOSNP++: Live demo

Tutorial 1 The Central Dogma of molecular biology

SOLiD Technology. library preparation & Sequencing Chemistry (sequencing by ligation!) Imaging and analysis. Application specific sample preparation

Bioinformatics: Blending. Biology and Computer Science

Construction of a cube given with its centre and a sideline

Correlation & Linear Regression in SPSS

On The Number Of Slim Semimodular Lattices

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet Factor Analysis

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet Nonparametric Tests

Nan Wang, Qingming Dong, Jingjing Li, Rohit K. Jangra, Meiyun Fan, Allan R. Brasier, Stanley M. Lemon, Lawrence M. Pfeffer, Kui Li

Supporting Information

Supplementary Table 1. Cystometric parameters in sham-operated wild type and Trpv4 -/- rats during saline infusion and

Cluster Analysis. Potyó László

3. MINTAFELADATSOR KÖZÉPSZINT. Az írásbeli vizsga időtartama: 30 perc. III. Hallott szöveg értése

Flowering time. Col C24 Cvi C24xCol C24xCvi ColxCvi

EN United in diversity EN A8-0206/419. Amendment

Forensic SNP Genotyping using Nanopore MinION Sequencing

Statistical Inference

The beet R locus encodes a new cytochrome P450 required for red. betalain production.

STUDENT LOGBOOK. 1 week general practice course for the 6 th year medical students SEMMELWEIS EGYETEM. Name of the student:

Supporting Information

Performance Modeling of Intelligent Car Parking Systems

Sebastián Sáez Senior Trade Economist INTERNATIONAL TRADE DEPARTMENT WORLD BANK

Orvosi Genomtudomány 2014 Medical Genomics Április 8 Május 22 8th April 22nd May

Involvement of ER Stress in Dysmyelination of Pelizaeus-Merzbacher Disease with PLP1 Missense Mutations Shown by ipsc-derived Oligodendrocytes

Correlation & Linear Regression in SPSS

A modern e-learning lehetőségei a tűzoltók oktatásának fejlesztésében. Dicse Jenő üzletfejlesztési igazgató

Klaszterezés, 2. rész

Supplemental Table S1. Overview of MYB transcription factor genes analyzed for expression in red and pink tomato fruit.

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Correlation & Linear. Petra Petrovics.

First experiences with Gd fuel assemblies in. Tamás Parkó, Botond Beliczai AER Symposium

Rezgésdiagnosztika. Diagnosztika

Statistical Dependence

Angol Középfokú Nyelvvizsgázók Bibliája: Nyelvtani összefoglalás, 30 kidolgozott szóbeli tétel, esszé és minta levelek + rendhagyó igék jelentéssel

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Nonparametric Tests. Petra Petrovics.

Expansion of Red Deer and afforestation in Hungary

1. MINTAFELADATSOR KÖZÉPSZINT. Az írásbeli vizsga időtartama: 30 perc. III. Hallott szöveg értése

Modular Optimization of Hemicellulose-utilizing Pathway in. Corynebacterium glutamicum for Consolidated Bioprocessing of

Expression analysis of PIN genes in root tips and nodules of Lotus japonicus

Using the CW-Net in a user defined IP network

Can/be able to. Using Can in Present, Past, and Future. A Can jelen, múlt és jövő idejű használata

University of Bristol - Explore Bristol Research

Manuscript Title: Identification of a thermostable fungal lytic polysaccharide monooxygenase and

Supplemental Information. Investigation of Penicillin Binding Protein. (PBP)-like Peptide Cyclase and Hydrolase

Proxer 7 Manager szoftver felhasználói leírás

Széchenyi István Egyetem

Contact us Toll free (800) fax (800)

Utolsó frissítés / Last update: február Szerkesztő / Editor: Csatlós Árpádné

USER MANUAL Guest user

A cell-based screening system for RNA Polymerase I inhibitors

Cashback 2015 Deposit Promotion teljes szabályzat

10. Genomika 2. Microarrayek és típusaik

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Hypothesis Testing. Petra Petrovics.

7 th Iron Smelting Symposium 2010, Holland

ANGOL NYELV KÖZÉPSZINT SZÓBELI VIZSGA I. VIZSGÁZTATÓI PÉLDÁNY

Budapest By Vince Kiado, Klösz György

Create & validate a signature

Markerless Escherichia coli rrn Deletion Strains for Genetic Determination of Ribosomal Binding Sites

Választási modellek 3

már mindenben úgy kell eljárnunk, mint bármilyen viaszveszejtéses öntés esetén. A kapott öntvény kidolgozásánál még mindig van lehetőségünk

Architecture of the Trypanosome RNA Editing Accessory Complex, MRB1

ANGOL NYELVI SZINTFELMÉRŐ 2013 A CSOPORT. on of for from in by with up to at

A rosszindulatú daganatos halálozás változása 1975 és 2001 között Magyarországon

Supplementary materials to: Whole-mount single molecule FISH method for zebrafish embryo

A jövedelem alakulásának vizsgálata az észak-alföldi régióban az évi adatok alapján

Supplemental Information. RNase H1-Dependent Antisense Oligonucleotides. Are Robustly Active in Directing RNA Cleavage

ELIXIR-Magyarország: lehetőségek és kihívások: Bálint Bálint L, Debreceni Egyetem, ELIXIR-Magyarország oktatási koordinátor

Utolsó frissítés / Last update: Szeptember / September Szerkesztő / Editor: Csatlós Árpádné

A fő egészségügyi kihívások

Bird species status and trends reporting format for the period (Annex 2)

Gottsegen National Institute of Cardiology. Prof. A. JÁNOSI

Eredeti gyógyszerkutatás. ELTE TTK vegyészhallgatók számára Dr Arányi Péter 2009 március, 4.ea. Szerkezet optimalizálás (I.)

PIACI HIRDETMÉNY / MARKET NOTICE

Tudományos Ismeretterjesztő Társulat

Pletykaalapú gépi tanulás teljesen elosztott környezetben

DI-, TETRA- ÉS HEXAPLOID TRITICUM FAJOK GENOMJAINAK ELEMZÉSE ÉS AZOK ÖSSZEHASONLÍTÁSA FLUORESZCENS IN SITU HIBRIDIZÁCIÓVAL

kpis(ppk20) kpis(ppk46)

EEA, Eionet and Country visits. Bernt Röndell - SES

Tavaszi Sporttábor / Spring Sports Camp május (péntek vasárnap) May 2016 (Friday Sunday)


Lecture 11: Genetic Algorithms

Szoftver-technológia II. Tervezési minták. Irodalom. Szoftver-technológia II.

SQL/PSM kurzorok rész

Results of the project Sky-high schoolroom SH/4/10

Smaller Pleasures. Apróbb örömök. Keleti lakk tárgyak Répás János Sándor mûhelyébõl Lacquerware from the workshop of Répás János Sándor

JEROMOS A BARATOM PDF

A évi fizikai Nobel-díj

EPILEPSY TREATMENT: VAGUS NERVE STIMULATION. Sakoun Phommavongsa November 12, 2013

Átírás:

Crash Course in Omics Terminology, Concepts & Data Types 13 th Annual EuPathDB Workshop Jessie Kissinger June 17th, 2018

FungiDB MicrosporidiaDB AmoebaDB PlasmoDB ToxoDB CryptoDB PiroplasmaDB TrichDB GiardiaDB TriTrypDB 307 Organisms! Pawlowski BMC Biology 2013

Infectious Disease Paradigm Experimental systems Host Pathogen Vector

Protein Expression Genome and RNA Sequencing Annotation Climate Interactions BLAST Similarities parasitophorous! vacuole! ChIP-Chip SNPs Host distribution Structure host cell! nucleus! Expression profiling arraysfunction Putative Host(many platforms) (GO) Pathogen transcription factors! (FOS, EGR2, etc)! Models Orthologs Phylogenetics apoptosis! Subcellular Location Comparative mitochondrial function! cytokines, signaling, etc! Phenotype Genomics sterol biosynthesis! cell cycle, DNA replication/repair! Host response Modified from slide Provided by David Roos Pathways Isolates

Sequence Technologies/platforms Hybridization (Southern, Northern, Microarray, chip Arrays (Affymetrix), now.. bait + Seq Sanger sequencing, 454, Solexa/Illumina, Solid, Ion torrent, PacBio, Nanopore, 10X Genomics Chromium DNA sequencing technologies: 2006 2016 Elaine R Mardis NATURE PROTOCOLS VOL.12 NO.2 2017 213

30,000 ft View Genome Assembly Chromosomes I II III IV Scaffolds (Ordered and oriented Contigs) Contigs (Linked by paired-end reads NGS or large insert clones) 5X genome sequence means that sequences equivalent to 5X the genome size were generate e.g. Genome size = 10 Mbp, then 50Mbp of random sequences were generated

Anatomy of a WGS Assembly STS Chromosome STS-mapped Scaffolds Contig Read pair (mates) Gap (mean & std. dev. Known) Consensus Reads SNPs

Pairs Give Order & Orientation Scaffold Gaps in scaffolds are traditionally indicated by 100 N s 2-pair End Reads (Mates) 550bp Mean & Std.Dev. is known Primer NGS Plasmid Fosmid Distance? SEQUENCE Consensus Reads SNPs

30,000 ft View - Annotation I II III IV Genes A B B D F C E

AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTGGTCACGTGTA ACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACCAGCGAGGCGCTCGCTTTGCT CACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGAAACTGGCGGTCACAGGCACCAGATAACGCC CTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTCTTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCAC CTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCATGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTC GAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTCAGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACA CCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTGTATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTC AACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGGACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGC CCGCGGAAGATCCGATCTTGCTGCTGTTCGCAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGC CATTTTTGGGAAAATCGGGGAACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTC TCCTCCGTGCTTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGACCCCCAAGAGG GGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGGGTGGCTTCCCAGG AAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAAAGCATGCCTCGAAAGTTTAGACCCCTCGTG GACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCTCCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCC CAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCCGGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGC GAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGATTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTG TGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACTATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGT TACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATCAGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGC AGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATA CATGTATATACAGATGTATGGATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCT GTGTGATGTGGCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTGGTGCGTCCAG ACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTACATCACGCGTGTAGCCCGCGA GTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTGTGTTCGTTCCCTTTTGTCCGGAGCT CGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACGACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGC AGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAACACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAA GAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTGTAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCC TCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAGTCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCC ACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGTTTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGT TGCAGGCTCCTTCTTCGGCCGCAGCCATTGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAG GCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGAT TCTGTCCGCAAGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGCCTCCCTCCAC CGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAAAGAGATATTTCGGCGCATCT TTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGGCGTTGGTGTCATCTCCAAATTCGGCTGCAC TATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGTGAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGA CAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGTAGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGT TTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGCGCGCACTGGTTTGCATGTGGCCTGAGAGAGCACAGATCGAAGGTGGGGTGATGTGGCGTC GCTGCAGAGAAACTCCGGCGAAGGCGACAGATAAAGGAGAGTGGAAATCATTGAACAGTGTCGGTCGTCTGTTGTTTCGCAGGGTCCTCGAAGAGTTGCTGTGGTTCATTCGCGGCGACA CGAACGCAAACCATCTTTCTGAGAAGGGCGTGAAGGCAAGTCTACGTTGTACCTCTTGTCTCTGCCGAAGCTCAGATGTCTCCACGGCGTTGGTTTCTTTTCGTTTTTGCTTTCGTGGCA TTACCATCGAGTCACCACTCATAGTTGCGTGTGTCTACATGTTTTCTAGAACGTCCGTTGTGTTGCCTCGTGGCGACCGGCGCGAGTGTATGTACCCTGCGCTGTCAGAAGTTGATCCTT

Six Frame Translation ORF-finding ORFs Genes

ATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGACCCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCC ACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGGGTGGCT TCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATG GGACGGAAAACCTGGGAAAGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCC TCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGC AGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGAGGAGCGGGACTGTACGAG GCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTACATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCC CTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTGTGTTCGT TCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCA GACAACGGGGTACCCTACGACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGAGCA ACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCAT TGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCAT GTTCACTTTAGAGGCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATG ACCGAACGG >Translation Frame 1 MQKPVCLVVAMTPKRGIGINNGLPWPHLTTDFKHFSRVTKTTPEEASRLN GWLPRKFAKTGDSGLPSPSVGKRFNAVVMGRKTWESMPRKFRPLVDRLNI VVSSSLKEEDIAAEKPQAEGQQRVRVCASLPAALSLLEEEYKDSVDQIFV VGGAGLYEAALSLGVASHLYITRVAREFPCDVFFPAFPGDDILSNKSTAA QAAAPAESVFVPFCPELGREKDNEATYRPIFISKTFSDNGVPYDFVVLEK RRKTDDAATAEPSNAMSSLTSTRETTPVHGLQAPSSAAAIAPVLAWMDEE DRKKREQKELIRAVPHVHFRGHEEFQYLDLIADIINNGRTMDDRT

Terminology

30,000 ft View - Synteny I II III IV Species 1 A B B D F C E Species 2 A B B D F C E

Synteny among Plasmodia

Homology Early Globin Gene α-chain gene Gene Duplication β-chain gene α frog α chick α mouse β mouse β chick β frog PARALOGS ORTHOLOGS ORTHOLOGS HOMOLOGS

Synteny shows relationships in positioning: Ontologies show relationships in meaning The Gene Ontology GO provides terms to link genes with similar functions and/or locations in the cell. An ontology was needed because the cultural traditions in different organisms led to different gene naming schemes that made it difficult to identify orthologous genes with the same function.

For Example: D. melanogaster gene CG3340 annotated as: Kruppel and P. falciparum gene PF3D7_1209300 annotated a putative KROX1 Can both be annotated with GO term: GO:0003705 (RNA polymerase II distal enhancer sequence-specific DNA binding transcription factor activity) Both proteins, functionally, are Zinc Fingers despite their different names

Note that the Gene Ontologies themselves contain only information about terms in the ontology and their relationships to other terms

Expression Profiles (RNA and Protein) The pattern of expression of one or more genes over time or a set of experimental conditions, e.g. during development or a drug treatment or in a genetic mutant such as a gene knock-out. Always has a time and location component

RNA expression RNA-Seq (NGS) Little sequence bias Quantitative Usually are strandspecific Can be used to identify UTR s and exon splice junctions Expressed Sequence Tags, ESTs Usually represent partial cdna Often clustered Come from libraries that may, or may not be normalized Often used to identify genes in genomes and locations of introns SAGE tags Serial Analysis of Gene Expression

30,000 ft View RNA-Seq Annotation of genome features I II III IV A B B D F RNA-Seq reads C E FPKM = Fragments per kilobase of exon per million fragments mapped

Clustered Microarray Data Genes with Similar Expression Profiles are Grouped together

Genes can be located on either DNA strand Convention - Gene location = non-template strand, i.e. same as the mrna

Overview of transcription: Either strand can serve as a template for a gene Figure 8-4

Complex patterns of eukaryotic mrna splicing: What is a Gene? Figure 8-14

Protein Expression/Sequence Data MW-Isoelectric point MW Sequence/spans Technology 2D gel electrophoresis Mass spectrometry Tandem MS (MS-MS, LC MS-MS etc) Typical 2 D gel

How Tandem MS Works Liquid chromatography Collision Inducted Dissociation (CID) Complex mixture Protein + + + + + + + + + + + + + + Ionized Peptides Isolation Fragmentation Measurement Peptides

Tandem MS protein data

Sequest Database Search Mass Spectrometer Protein Database Nucleic Acid Database EST Database Tandem Mass Spectrum Theoretical Mass Spectrum Correlation Analysis Ranked Score of Matched Peptides

Peptide database ENNPCKLQYDYNTNVTHGFGQEYPCETDIVERFSDTEGAQCDKKKIKDNSEGACAPYRRL HVCVRNLENINDYSKINNKHNLLVEVCLAAKYEGESITGRYPQHQETNPDTKSQLCTVLA RSFADIGDIIRGKDLYRGGNTKEKKKRKKLEENLKTIFGHIYDELKNGKTNGEEELQKRY RGDKDNDFYQLREDWWDANRETVWKAITCNAGSYQYSQPTCGRGEIPYVTLSKCQCIAGE VPTYFDYVPQYLRWFEEWAEDFCRKKKKKIPNVKTNCRQVQRGKEKYCDRDGYNCDGTIR KQYIYRLDTDCTKCSLACKTFAEWIDNQKEQFDKQKQKYQNEISGGGGRRQKRSTHSTKE YEGYEKHFNEELRNEGKDVRSFLQLLSKEKICKERIQVGEETANYGNFENESNTFSHTEY CDRCPLCGVDCSSDNCRKKPDKSCDEQITDKEYPPENTTKIPKLTAEKRKTGILKKYEKF CKNSDGNNGGQIKKWECHYEKNDKDDGNGDINNCIQGDWKTSKNVYYPISYYSFFYGSII DMLNESIEWRERLKSCINDAKLGKCRKGCKNPCECYKRWVEKKKDEWDKIKEFFRKQKDL LKDIAGMDAGELLEFYLENIFLEDMKNANGDPKVIEKFKEILGKENEEVQDPLKTKKTID DFLEKELNEAKNCVEKNPDNECPKQKAPGDGAAPSDPPREDITHHDGEHSSDEDEEEEEE EEQQPPAEGTEQGEEKSESKEVVEQQETPQKDTEKTVPTTTPTVDVCDTVKTALADTGSL NAACSLKYVTGKNYGWRCIAPSGTTSGKDGAICVPPRTQELCLYYLKELSDTTQKGLREA FIKTAAQETYLLWQKYKEDKQNETASTELDIDDPQTQLNGGEIPEDFKRQMFYTFGDYRD LFLGRYIGNDLDKVNNNITAVFQNGDHIPNGQKTDRQRQEFWGTYGKDIWKGMLCALQEA GGKKTLTETYNYSNVTFNGHLTGTKLNEFASRPSFLRWMTEWGDQFCRERITQLQILKER CMVYQYNGDKGKDDKKEKCTEACTYYKEWLTNWQDNYKKQNQRYTEVKGTSPYKEDSDVK ESKYAHGYLRKILKNIICTSGTDIAYCNCMEGTSTTDSSNNDNIPESLKYPPIEIEEGCT CKDPSPGEVIPEKKVPEPKVLPKPPKLPKRQPKERDFPTPALKNAMLSSTIMWSIGIGFA TFTYFYLKKKTKSTIDLLRVINIPKSDYDIPTKLSPNRYIPYTSGKYRGKRYIYLEGDSG TDSGYTDHYSDITSSSESEYEELDINDIYAPRAPKYKTLIEVVLEPSGNNTTASGNNTPS DTQNDIQNDGIPSSKITDNEWNTLKDEFISQYLQSEQPNDVPNDYSSGDIPLNTQPNTLY FDNPDEKPFITSIHDRDLYSGEEYSYNVNMVNTNNDIPISGKNGTYSGIDLINDSLNSNN Note: ORFs in addition to predicted Genes must be searched

30,000 ft View - Proteomics I II III IV A B B D F C E Proteomic Sequence spans

Mass Spectrometry can be used to measure metabolic and other chemical compounds

Complex mixtures can be analyzed and interpreted

Metabolites can be linked to metabolic pathways and enzymes

Alleles and Phenotype Some phenotypes are caused by a single locus in the genome and a single allele at that locus (e.g. some flower colors, or Drosophila eye color) Other phenotypes (Type-I diabetes, heart disease are multi-locus or complex (i.e. many genes are involved, each potentially with many alleles)

Homologous chromsomes (in a diploid) A a AAGCCTCATC ACGCCTCATC SNP =Single Nucleotide Polymorphism

30,000 ft View- NGS SNPs I II III IV NGS sequence reads from different clinical/environ mental isolates = SNP

Population data Data Single Nucleotide Polymorphisms, SNPs Alleles Allele frequency Haplotypes Technology Chip-Seq NGS

Alleles have frequencies in different populations

Populations and alleles have geographic boundaries A parasite isolate comes from a particular population, a particular location and will have a specific haplotype (e.g. representation of alleles) often characterized via SNPs

Bioinformatics uses algorithms Algorithms are sets of rules for solving problems or identifying patterns Algorithms can be general or case specific and often need to be trained Computational analysis, like wet-bench analyses are only as good as the tools, techniques and material allow, and all interpretations come with caveats (like the experimental conditions, often call parameters in bioinformatics.

How to find an intron Usually begins with GT and end with AG Must be longer than 19 nucleotides Must contain a branchpoint A Donor GT often followed by a sequence pattern. This pattern is species-specific Acceptor AG often proceeded by pyrimidine stretch Has a mean length of X as is observed in this species

Different prediction methods often generate different results Prediction 1 Prediction 2 We provide lots evidence so that you can decide or design an experiment to confirm!

Metadata The next Frontier Data about the data are critical What makes a data set valuable? (The reason it was generated but often this is missing) Introducing the data set How can you find the data set you need? Pull down Menu? A search of data set properties? Data generator Clinical outcome Geographic location Phenotype

Data sharing standards

The End If you have questions, I and the other instructors will be around and we are happy to talk to you. These slides are available to you as a PDF on the workshop web site.