Welcome! EuPathDB Workshop Crash Course in Omics Terminology, Concepts & Data Types

Hasonló dokumentumok
Welcome! EuPathDB Workshop 2010

Crash Course in Omics Terminology and Concepts

Crash Course in Omics Terminology, Concepts & Data Types

Crash Course in Omics Terminology and Concepts. Genome assembly. Clone-by-Clone Genome Sequencing. Shotgun DNA Sequencing (Technology)

Crash Course in Omics Terminology, Concepts & Data Types

Crash Course in Omics Terminology, Concepts & Data Types

Phenotype. Genotype. It is like any other experiment! What is a bioinformatics experiment? Remember the Goal. Infectious Disease Paradigm

Trinucleotide Repeat Diseases: CRISPR Cas9 PacBio no PCR Sequencing MFMER slide-1

SOLiD Technology. library preparation & Sequencing Chemistry (sequencing by ligation!) Imaging and analysis. Application specific sample preparation

Mapping Sequencing Reads to a Reference Genome

discosnp demo - Peterlongo Pierre 1 DISCOSNP++: Live demo

Supplementary Figure 1

Bioinformatics: Blending. Biology and Computer Science

Tutorial 1 The Central Dogma of molecular biology

Genome 373: Hidden Markov Models I. Doug Fowler

Forensic SNP Genotyping using Nanopore MinION Sequencing

Supplementary Table 1. Cystometric parameters in sham-operated wild type and Trpv4 -/- rats during saline infusion and

Supplemental Table S1. Overview of MYB transcription factor genes analyzed for expression in red and pink tomato fruit.

Nan Wang, Qingming Dong, Jingjing Li, Rohit K. Jangra, Meiyun Fan, Allan R. Brasier, Stanley M. Lemon, Lawrence M. Pfeffer, Kui Li

Cluster Analysis. Potyó László

Flowering time. Col C24 Cvi C24xCol C24xCvi ColxCvi

Supporting Information

Supporting Information

Expression analysis of PIN genes in root tips and nodules of Lotus japonicus

Markerless Escherichia coli rrn Deletion Strains for Genetic Determination of Ribosomal Binding Sites

Correlation & Linear Regression in SPSS

Rezgésdiagnosztika. Diagnosztika

Supporting Information

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet Factor Analysis

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Correlation & Linear. Petra Petrovics.

tccattaattcgacagaccagagttaaataatccttgtatgccattgtgatcacatctacagttcagattttgtatttca

Correlation & Linear Regression in SPSS

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet Nonparametric Tests

(A F) H&E staining of the duodenum in the indicated genotypes at E16.5 (A, D), E18.5

The beet R locus encodes a new cytochrome P450 required for red. betalain production.

7 th Iron Smelting Symposium 2010, Holland

Limitations and challenges of genetic barcode quantification

Performance Modeling of Intelligent Car Parking Systems

Statistical Inference

10. Genomika 2. Microarrayek és típusaik

Involvement of ER Stress in Dysmyelination of Pelizaeus-Merzbacher Disease with PLP1 Missense Mutations Shown by ipsc-derived Oligodendrocytes

Supplemental Information. Investigation of Penicillin Binding Protein. (PBP)-like Peptide Cyclase and Hydrolase

A HUMÁN GENOM PROJEKT Sasvári-Székely Mária* Semmelweis Egyetem, Orvosi Vegytani, Molekuláris Biológiai és Pathobiokémiai Intézet

Klaszterezés, 2. rész

Supplementary materials to: Whole-mount single molecule FISH method for zebrafish embryo

Expansion of Red Deer and afforestation in Hungary

A cell-based screening system for RNA Polymerase I inhibitors

DI-, TETRA- ÉS HEXAPLOID TRITICUM FAJOK GENOMJAINAK ELEMZÉSE ÉS AZOK ÖSSZEHASONLÍTÁSA FLUORESZCENS IN SITU HIBRIDIZÁCIÓVAL

On The Number Of Slim Semimodular Lattices

EN United in diversity EN A8-0206/419. Amendment

Lecture 11: Genetic Algorithms

University of Bristol - Explore Bristol Research

SZTE-ELTE PROTEOMIKAI INNOVÁCIÓ: KONCEPCIÓ, EREDMÉNYEK, JÖVŐKÉP

A rosszindulatú daganatos halálozás változása 1975 és 2001 között Magyarországon

Manuscript Title: Identification of a thermostable fungal lytic polysaccharide monooxygenase and

Szoftver-technológia II. Tervezési minták. Irodalom. Szoftver-technológia II.

FAMILY STRUCTURES THROUGH THE LIFE CYCLE

CLUSTALW Multiple Sequence Alignment

Orvosi Genomtudomány 2014 Medical Genomics Április 8 Május 22 8th April 22nd May

Architecture of the Trypanosome RNA Editing Accessory Complex, MRB1

Whole mount in situ hybridization, sectioning, and immunostaining Ribonuclease protection assay (RPA) Collagen gel tube formation assay

GENERATÍV TEST (VIRÁGOS NÖVÉNYEK)

Proxer 7 Manager szoftver felhasználói leírás

Bevezetés a kvantum-informatikába és kommunikációba 2015/2016 tavasz

3. MINTAFELADATSOR KÖZÉPSZINT. Az írásbeli vizsga időtartama: 30 perc. III. Hallott szöveg értése

Diagnosztikai célú molekuláris biológiai vizsgálatok

AZ ÁRPA SZÁRAZSÁGTŰRÉSÉNEK VIZSGÁLATA: QTL- ÉS ASSZOCIÁCIÓS ANALÍZIS, MARKER ALAPÚ SZELEKCIÓ, TILLING

Cashback 2015 Deposit Promotion teljes szabályzat

Eredeti gyógyszerkutatás. ELTE TTK vegyészhallgatók számára Dr Arányi Péter 2009 március, 4.ea. Szerkezet optimalizálás (I.)

Nagy áteresztő képességű SNP mérések hasznosítása. Dr. Szalai Csaba

Modular Optimization of Hemicellulose-utilizing Pathway in. Corynebacterium glutamicum for Consolidated Bioprocessing of

Report on esi Scientific plans 7 th EU Framework Program. José Castell Vice-President ecopa, ES

Pletykaalapú gépi tanulás teljesen elosztott környezetben

4 vana, vanb, vanc1, vanc2

A modern e-learning lehetőségei a tűzoltók oktatásának fejlesztésében. Dicse Jenő üzletfejlesztési igazgató

USER MANUAL Guest user

PIACI HIRDETMÉNY / MARKET NOTICE

Ensemble Kalman Filters Part 1: The basics

FÖLDRAJZ ANGOL NYELVEN

Angol Középfokú Nyelvvizsgázók Bibliája: Nyelvtani összefoglalás, 30 kidolgozott szóbeli tétel, esszé és minta levelek + rendhagyó igék jelentéssel

Statistical Dependence

A évi fizikai Nobel-díj

Supplemental Information. RNase H1-Dependent Antisense Oligonucleotides. Are Robustly Active in Directing RNA Cleavage

Construction of a cube given with its centre and a sideline

pjnc-pgluc Codon-optimized Gaussia luciferase (pgluc) Vector backbone: JM83 pjnc-pgluc is resistant to ampicillin and neomycin High or low copy:

Computer Architecture

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Hypothesis Testing. Petra Petrovics.

(NGB_TA024_1) MÉRÉSI JEGYZŐKÖNYV

Miskolci Egyetem Gazdaságtudományi Kar Üzleti Információgazdálkodási és Módszertani Intézet. Nonparametric Tests. Petra Petrovics.

Szívkatéterek hajlékonysága, meghajlítása

INDEXSTRUKTÚRÁK III.

16F628A megszakítás kezelése

T-helper Type 2 driven Inflammation Defines Major Subphenotypes of Asthma

ELEKTRONIKAI ALAPISMERETEK ANGOL NYELVEN

Juhász Angéla MTA ATK MI Alkalmazott Genomikai Osztály SZEKVENCIA ADATBÁZISOK

Can/be able to. Using Can in Present, Past, and Future. A Can jelen, múlt és jövő idejű használata

VL Bioinformatik für Nebenfächler SS2018 Woche 2

1. MINTAFELADATSOR KÖZÉPSZINT. Az írásbeli vizsga időtartama: 30 perc. III. Hallott szöveg értése

Felnőttképzés Európában

Sebastián Sáez Senior Trade Economist INTERNATIONAL TRADE DEPARTMENT WORLD BANK

Átírás:

Welcome! EuPathDB Workshop 2012 Crash Course in Omics Terminology, Concepts & Data Types Jessie Kissinger June 17, 2012 1

Protein Expression! Interactions! Genome and Annotation! RNA Sequencing! Structure! BLAST Similarities! SNPs! Expression! ChIP-Chip! profiling arrays! Putative (many platforms)! Function (GO)! Phylogenetics! Orthologs! Subcellular Location! Comparative! Genomics! Phenotype! Pathways! Isolates! Genome assembly 5X random genome shotgun Library insert size Paired-end? (Mated end pairs) Contigs Scaffolds 2

Pairs Give Order & Orientation Scaffold Gaps in scaffolds are traditionally indicated by 100 N s 2-pair End Reads (Mates) 550bp Mean & Std.Dev. is known Primer Plasmid Fosmid Consensus Reads NGS Distance? SEQUENCE SNPs Anatomy of a WGS Assembly STS Chromosome STS-mapped Scaffolds Contig Read pair (mates) Gap (mean & std. dev. Known) Consensus Reads SNPs 3

4 AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTGGTCACGTGT AACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACCAGCGAGGCGCTCGCTTTG CTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGAAACTGGCGGTCACAGGCACCAGATAAC GCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTCTTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGC GCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCATGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCG TTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTCAGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCC ACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTGTATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCG CACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGGACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTC AAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCGCAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGA AAATGACGCCATTTTTGGGAAAATCGGGGAACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCAC TTTCCGTCTCTCCTCCGTGCTTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGA GCACTGCTCTGAGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATG ACCCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGG GTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAAAGCATGCCTCGAAAGT TTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCTCCAAGAGTGTGGACGCTGTTCCACG TCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCCGGGGGGCACACAACTATCCTCAACTCTCGAACGA ACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGATTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCC GATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACTATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGT TTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATCAGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAG TCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAG AAACGTACTGAAACTGTATACATGTATATACAGATGTATGGATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGC GAGAGGGACGCCCGTTATGCTGTGTGATGTGGCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTC TATTCCCTTCACTGCGAAAGCGCGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCT CTCTTTCCTGTTGGTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCT GTACATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTG TGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACGACTTTGTGGTTCTC GAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAACACGAAGAGAAGGGAAAATGCGGA GAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTGTAGAAATAACTGCGACCCTGGAGACAGAGATG CCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAGTCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGC CTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGTTTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGT CCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTG ATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTG CGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCAAGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTT CTTCAGTCTTTCCATCTTCCCCTGTTACCTCTGTCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGA CAAGCAACTGTCATTTGTACGCGCCTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAA GAGGCGAAGGAACAAAGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTG TTTAGGCGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGTGAA TGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGTAGACTAACGCAC GAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGCGCGCACTGGTTTGCATGTGGC CTGAGAGAGCACAGATCGAAGGTGGGGTGATGTGGCGTCGCTGCAGAGAAACTCCGGCGAAGGCGACAGATAAAGGAGAGTGGAAATCATTGAACAGTGTCGGTCGTCTGTTGTTTCGC AGGGTCCTCGAAGAGTTGCTGTGGTTCATTCGCGGCGACACGAACGCAAACCATCTTTCTGAGAAGGGCGTGAAGGCAAGTCTACGTTGTACCTCTTGTCTCTGCCGAAGCTCAGATGT CTCCACGGCGTTGGTTTCTTTTCGTTTTTGCTTTCGTGGCATTACCATCGAGTCACCACTCATAGTTGCGTGTGTCTACATGTTTTCTAGAACGTCCGTTGTGTTGCCTCGTGGCGACC Six Frame Translation ORF-finding ORFs Genes

5 AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTG GTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACC AGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGA AACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTC TTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCA TGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTC AGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTG TATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGG ACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCG CAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGG AACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGC TTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGAC CCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCC TGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAA AGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCT CCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCC GGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGA TTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACT ATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATC AGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAA GGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATG GATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTG GCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTG GTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTAC ATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGA GTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACG ACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAAC ACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTG TAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAG TCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGT TTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTG CCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTAC CTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCA AGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGC CTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAA AGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGG CGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGT GAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGT AGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGC AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTG GTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACC AGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGA AACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTC TTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCA TGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTC AGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTG TATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGG ACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCG CAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGG AACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGC TTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGAC CCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCC TGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAA AGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCT CCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCC GGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGA TTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACT ATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATC AGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAA GGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATG GATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTG GCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTG GTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTAC ATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGA GTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACG ACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAAC ACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTG TAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAG TCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGT TTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTG CCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTAC CTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCA AGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGC CTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAA AGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGG CGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGT GAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGT AGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGC

6 AAGCTTCGCCAGGCTGTAAATCCCGTGAGTCGTCCTCACAAATCATCAAGCAGGTGTCCTCAGGGAGACTGCCTGACTGAGTTATGCTAATTCCTTTCTACTTTGGCGTG GTCACGTGTAACCATATCCGAATCATTTCTCTAGCCCTACGAACAGGTAAGAGCGCTAGGGATGTCCGTGGAGTAGTGTGCTTACTCGATAATATTCAGTTGGGACTACC AGCGAGGCGCTCGCTTTGCTCACGCAATGCCTGAGACAGTTGCAGAATGAATGGTAACCGACAAACGCGTTCATATGCGTTTTCAAACTTAGTAGACGCGTACTGTCTGA AACTGGCGGTCACAGGCACCAGATAACGCCCTTGGCATCGGCATGTCTCGTACAGAGGTCCGTATGTAGTGCCACGACTTCTAAATCCGGCGACAGGCTGGTCTTTTGTC TTACCACGTATTAGCCCGCGTGCGATTTCTCGGAGCGCACCTGTTCAACACTAGAAAACGGAGTTTCCTGATCGAGAAGCCACCACCTTTCCAGAAGTTGAACGCTAGCA TGTCATTCGATTTTCACCCCCCGCGTAGTTCCTGTGTGTCATTCGTTGTCGAGACAACTCTGTCCCGCCCCGGTGCTGTTCCATATGCGTGACTTTCCCGCAATTTTTTC AGACTTTCAGGAAAGACAGGCTCCGGAACGATCTCGTCCATGACTGGTAAATCCACGACACCGCAATGGCCCCCAGCACCTCTATCTCTCGTGCCAGGGGACTAACGTTG TATGCGTCTGCGTCTTGTCTTTTTGCATTCGCTTTCCAAAAAAGAGAGCCATCCGTTCCCCCGCACATTCAACGCCGCGAGTGCGGTTTTTGTCTTTTTTGAGTGGTAGG ACGCTTTTCATGCGCGAACTACGTGGACATTAAGTTCCATTCTCTTTTTCGACAGCACGAAACCTTGCATTCAAACCCGCCCGCGGAAGATCCGATCTTGCTGCTGTTCG CAGTCCCAGTAGCGTCCTGTCGGCCGCGCCGTCTCTGTTGGTGGGCAGCCGCTACACCTGTTATCTGACTGCCGTGCGCGAAAATGACGCCATTTTTGGGAAAATCGGGG AACTTCATTCTTTAAAAGTATGCGGAGGTTTCCTTTTTCTTCTGTTCGTTTCTTTTTCTCGGGTTTGATAACCGTGTTCGATGTAAGCACTTTCCGTCTCTCCTCCGTGC TTTGTTCGACATCGAGACCAGGTGTGCAGATCCTTCGCTTGTCGATCCGGAGACGCGTGTCTCGTAGAACCTTTTCATTTTACCACACGGCAGTGCGGAGCACTGCTCTG AGTGCAGCAGGGACGGGTGAAGTTTCGCTTTAGTAGTGCGTTTCTGCTCTACGGGGCGTTGTCGTGTCTGGGAAGATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGAC CCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCCACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCC TGAACGGGTGGCTTCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATGGGACGGAAAACCTGGGAA AGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCCTGTGAGCACACACAGTAGTCGCCACACGCTGTTTGAGACGTGTCAATCT CCAAGAGTGTGGACGCTGTTCCACGTCTTCAAATGTTTCCCAACATCCGTCGTCTAGTAGACACACCAACAAAAAGCACACGGCGAATCTGCTCATCGGAGGGAGGAGCC GGGGGGCACACAACTATCCTCAACTCTCGAACGAACATATCCGGGGCCGCGAAGACGTCCAGTCTCTCAAATCCAACCCGGAACGCAAACATTTCTGCATCAAGTCACGA TTGCGCCGGTACCTCCATGTGTAAGCAGTTCCATGAAACCTCCGATATTACACACGACTGTGGATATGAATTATATGCAGATGCATATATACTGAGACGCCGATGCAACT ATAGGTTTCCTGGCCCTCCATGGATATTTCAGACCTTCCTCTCACATTTGGTTTGCCCGTACACCTCCGTTACGCTTTTTTTCTGGCTTTCTTCTTCGTCTCTGTTTATC AGCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGCAGCTCTCAGCCTTCTGGAGGAAGAGTACAA GGATTCTGTCGACCAGATTTTTGTCGTGGGTATGTTGTCCTAAACTCCTTGGAACTCCATTCTTGGTCAGAAACGTACTGAAACTGTATACATGTATATACAGATGTATG GATAATATCTAGAGAAGATACAGGGAAGACTGGCAAGGATGAAAAGACATGCAGCTTTAACGAAGCAGAGGGCATTGGCGAGAGGGACGCCCGTTATGCTGTGTGATGTG GCTGTGAATCTTACCTCGCCGTTTGACTTGCTGCAGCGCTTTGTCCACTTGAACGTGACTTCTTGTTTCTACCTTCCCCAACGCCTTCTATTCCCTTCACTGCGAAAGCG CGCTCAGTGGGCCGTCACCGAACACCCTTGGTTCTTTCGTTCAGCTGTTGTCCTCTTTCTCGCGTTGCTTCCTGTGGCGTCGTGGCTCGGCTTCTCTCTCTTTCCTGTTG GTGCGTCCAGACTATGTCGCCTGTTTCCCCACCCTTCTCGGCTTGTGCTTTCAGGAGGAGCGGGACTGTACGAGGCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTAC ATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCCCTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGA GTCTGTGTTCGTTCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCAGACAACGGGGTACCCTACG ACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGGTAAGAGGCAACCGAAGCGCGTAGATAAGAAAAACAACAAAGAGAAGGTGAAAC ACGAAGAGAAGGGAAAATGCGGAGAAACCGTGGATTTACAAAGATATCAAGAGCAATGCTTTGTGGAGATTTTTTTTAATTCAGTAGAGACACCCGCCGTGCGAGGTGTG TAGAAATAACTGCGACCCTGGAGACAGAGATGCCGCGAGTACACCACTTGTCGTTTTTCCTCCTATGTTCATGACGGGTGCTGAACGTCTATCGTACTTAATTGGAGGAG TCGTCTCCGAAGCAGCTTTGGCTGGCCATCCGTGTGTTTGCCTTGTTCCTGAAAAGCCAGAAGGCGCTCCACAGTGAGGCGATATACAGGGACGCCTACCGGAGCCCCGT TTTCTGCCTTTGTCGACTCTTGCAGAGCAACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCATTG CCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCATGTTCACTTTAGAGGCCATGAAGAATTCCAGTAC CTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATGACCGAACGGGTAACGGCGACTGCGAGAAAAAGCCACACCGTTTTCTCCTGTGATTCTGTCCGCA AGCCCTCTTTTGCTTCATCCACCCTTTGCTATTCTCCGCCGCCTTCCTTTTCTGCTCCATGTTCAATTCGTTCGCTTCTTCAGTCTTTCCATCTTCCCCTGTTACCTCTG TCATTCGTTTTCTTGCCTCTATTTAACTGTGTTCTACTCACAGTCTGCATTCCGCGATAGACGAGCTTCCACGTCTTGCGTCTCGACAAGCAACTGTCATTTGTACGCGC CTCCCTCCACCGTGAATCGGATTGTCGGTTCGCCGGTTCCTGGGTCAGAAAAGGCCTGCGCCAGTATTCTGAATAATACCCTTCGCCATTGTAAAGAGGCGAAGGAACAA AGAGATATTTCGGCGCATCTTTTGTGCGGCGCGTTTCCTCGTGCTTCACACCGATGCCCTTCTGTGCATGTCTTCTGCTCCTCGTCCTTCTCTCTTTTTCCCTGTTTAGG CGTTGGTGTCATCTCCAAATTCGGCTGCACTATGCGCTACTCGCTGGATCAGGCCTTTCCACTTCTCACCACAAAGCGTGTGTTCTGGAAAGGGTAAGGGCGTCTTCAGT GAATGCATATATTTGACTTCAGACATTCTTAACTGTTTGACAACCAACGTACAAATTTGTTTGTCCGTGTGCGTGTTCGACATGTCAAGTATGTGAAGAGTCGCTACTGT AGACTAACGCACGAACCAGATTTGTTTATCTGCATGCGCTGTGCACCCGTTTCTGAGTGTCTGGAGTTTCCGCAACCTTCCTTTGAATTTCTGGGTTCGTTTTTTTATGC ATGCAGAAACCGGTGTGTCTGGTCGTCGCGATGACCCCCAAGAGGGGCATCGGCATCAACAACGGCCTCCCGTGGCCCC ACTTGACCACAGATTTCAAACACTTTTCTCGTGTGACAAAAACGACGCCCGAAGAAGCCAGTCGCCTGAACGGGTGGCT TCCCAGGAAATTTGCAAAGACGGGCGACTCTGGACTTCCCTCTCCATCAGTCGGCAAGAGATTCAACGCCGTTGTCATG GGACGGAAAACCTGGGAAAGCATGCCTCGAAAGTTTAGACCCCTCGTGGACAGATTGAACATCGTCGTTTCCTCTTCCC TCAAAGAAGAAGACATTGCGGCGGAGAAGCCTCAAGCTGAAGGCCAGCAGCGCGTCCGAGTCTGTGCTTCACTCCCAGC AGCTCTCAGCCTTCTGGAGGAAGAGTACAAGGATTCTGTCGACCAGATTTTTGTCGTGGGAGGAGCGGGACTGTACGAG GCAGCGCTGTCTCTGGGCGTTGCCTCTCACCTGTACATCACGCGTGTAGCCCGCGAGTTTCCGTGCGACGTTTTCTTCC CTGCGTTCCCCGGAGATGACATTCTTTCAAACAAATCAACTGCTGCGCAGGCTGCAGCTCCTGCCGAGTCTGTGTTCGT TCCCTTTTGTCCGGAGCTCGGAAGAGAGAAGGACAATGAAGCGACGTATCGACCCATCTTCATTTCCAAGACCTTCTCA GACAACGGGGTACCCTACGACTTTGTGGTTCTCGAGAAGAGAAGGAAGACTGACGACGCAGCCACTGCGGAACCGAGCA ACGCAATGAGCTCCTTGACGTCCACGAGGGAGACAACTCCCGTGCACGGGTTGCAGGCTCCTTCTTCGGCCGCAGCCAT TGCCCCGGTGTTGGCGTGGATGGACGAAGAAGACCGGAAAAAACGCGAGCAAAAGGAACTGATTCGGGCCGTTCCGCAT GTTCACTTTAGAGGCCATGAAGAATTCCAGTACCTTGATCTCATTGCCGACATTATTAACAATGGAAGGACAATGGATG ACCGAACGG >Translation Frame 1 MQKPVCLVVAMTPKRGIGINNGLPWPHLTTDFKHFSRVTKTTPEEASRLN GWLPRKFAKTGDSGLPSPSVGKRFNAVVMGRKTWESMPRKFRPLVDRLNI VVSSSLKEEDIAAEKPQAEGQQRVRVCASLPAALSLLEEEYKDSVDQIFV VGGAGLYEAALSLGVASHLYITRVAREFPCDVFFPAFPGDDILSNKSTAA QAAAPAESVFVPFCPELGREKDNEATYRPIFISKTFSDNGVPYDFVVLEK RRKTDDAATAEPSNAMSSLTSTRETTPVHGLQAPSSAAAIAPVLAWMDEE DRKKREQKELIRAVPHVHFRGHEEFQYLDLIADIINNGRTMDDRT

Terminology Eukaryotic Relationships ca. 2005 Keeling, 2005 7

Synteny = large regions of chromosomes containing the same genes Synteny among Plasmodia 8

Homology Early Globin Gene α-chain gene Gene Duplication β-chain gene α frog α chick α mouse β mouse β chick β frog PARALOGS ORTHOLOGS ORTHOLOGS HOMOLOGS Evolutionary relationships Homology - related by evolutionary descent not equivalent to similarity Orthology - same gene in different organisms, e.g. alpha hemoglobin in humans and chimps Paralogy - genes within an organism related by gene duplication, e.g. alpha and beta hemoglobin in humans Xenology - genes related by gene transfer 9

RNA sequence/expression Data Data cdna Expressed Sequence Tags (EST) RNA-Seq Microarray Ditag (SAGE-tags) Small RNA s (various types) Technolgy Sanger Next-gen (454, Illumina, etc) Microarray-slides Microarray-chips SAGE Expression Profiles The pattern of expression of one or more genes over time or a set of experimental conditions, e.g. during development or a drug treatment or in a genetic mutant such as a gene knock-out. Always has a time and space component 10

Microarrays cdna microarrays GeneChip in situ synthesized oligonucleotide arrays Oligomer (~70mer) arrays Experiments are almost always Competitions between conditions or stages The RNA samples from the test and the control are labeled with different colors in a reverse-transcription reaction and then hybridized, together, competitively to a slide or chip containing gene sequences in multiple copies. 11

Ratios of experimental to control expression are often expressed as colors rather than numbers! Clustered Microarray Data Genes with Similar Expression Profiles are Grouped together 12

Other RNA expression Expressed Sequence Tags, ESTs Usually represent partial cdna Often clustered Come from libraries that may, or may not be normalized Often used to identify genes in genomes and locations of introns SAGE-tags (Serial Analysis of Gene Expression) Primary purpose is relative levels of gene expression RNA-Seq (NGS) Little sequence bias Quantitative Can be strandspecific Genes can be located on either DNA strand 13

Overview of transcription: Either strand can serve as a template for a gene Figure 8-4 Sequences of DNA and transcribed RNA Convention Gene location = non-template strand, i.e. same as the mrna 14

Complex patterns of eukaryotic mrna splicing: What is a Gene? Figure 8-14 Bioinformatics uses algorithms Algorithms are sets of rules for solving problems or identifying patterns Algorithms can be general or case specific and often need to be trained Computational analysis, like wet-bench analyses are only as good as the tools, techniques and material allow, and all interpretations come with caveats (like the experimental conditions, often call parameters in bioinformatics. 15

How to find an intron Usually begins with GT and end with AG Must be longer than 19 nucleotides Must contain a branchpoint A Donor GT often followed by a sequence pattern. This pattern is species-specific Acceptor AG often proceeded by pyrimidine stretch Has a mean length of X as is observed in this species 16

Different prediction methods often generate different results Prediction 1 Prediction 2 Protein Expression/Sequence Data MW-Isoelectric point MW Sequence/spans Technology 2D gel electrophoresis Mass spectrometry Tandem MS (MS-MS, LC MS-MS etc) Typical 2 D gel 17

High throughput mass spectrometry Direct identification of proteins from biological sample. Capillary liquid chromatography apparatus (LC) coupled with... Electrospray tandem mass spectroscopy (MS/MS) Sequest, Mascot, or other software links mass spectra with genomic sequence database. How Tandem MS Works Liquid chromatography Collision Inducted Dissociation (CID) Complex mixture Protein + + + + + + + + + + + + + + Ionized Peptides Isolation Fragmentation Measurement Peptides 18

Sequest Database Search Mass Spectrometer Protein Database Nucleic Acid Database EST Database Tandem Mass Spectrum Theoretical Mass Spectrum Correlation Analysis Ranked Score of Matched Peptides Peptide database ENNPCKLQYDYNTNVTHGFGQEYPCETDIVERFSDTEGAQCDKKKIKDNSEGACAPYRRL! HVCVRNLENINDYSKINNKHNLLVEVCLAAKYEGESITGRYPQHQETNPDTKSQLCTVLA! RSFADIGDIIRGKDLYRGGNTKEKKKRKKLEENLKTIFGHIYDELKNGKTNGEEELQKRY! RGDKDNDFYQLREDWWDANRETVWKAITCNAGSYQYSQPTCGRGEIPYVTLSKCQCIAGE! VPTYFDYVPQYLRWFEEWAEDFCRKKKKKIPNVKTNCRQVQRGKEKYCDRDGYNCDGTIR! KQYIYRLDTDCTKCSLACKTFAEWIDNQKEQFDKQKQKYQNEISGGGGRRQKRSTHSTKE! YEGYEKHFNEELRNEGKDVRSFLQLLSKEKICKERIQVGEETANYGNFENESNTFSHTEY! CDRCPLCGVDCSSDNCRKKPDKSCDEQITDKEYPPENTTKIPKLTAEKRKTGILKKYEKF! CKNSDGNNGGQIKKWECHYEKNDKDDGNGDINNCIQGDWKTSKNVYYPISYYSFFYGSII! DMLNESIEWRERLKSCINDAKLGKCRKGCKNPCECYKRWVEKKKDEWDKIKEFFRKQKDL! LKDIAGMDAGELLEFYLENIFLEDMKNANGDPKVIEKFKEILGKENEEVQDPLKTKKTID! DFLEKELNEAKNCVEKNPDNECPKQKAPGDGAAPSDPPREDITHHDGEHSSDEDEEEEEE! EEQQPPAEGTEQGEEKSESKEVVEQQETPQKDTEKTVPTTTPTVDVCDTVKTALADTGSL! NAACSLKYVTGKNYGWRCIAPSGTTSGKDGAICVPPRTQELCLYYLKELSDTTQKGLREA! FIKTAAQETYLLWQKYKEDKQNETASTELDIDDPQTQLNGGEIPEDFKRQMFYTFGDYRD! LFLGRYIGNDLDKVNNNITAVFQNGDHIPNGQKTDRQRQEFWGTYGKDIWKGMLCALQEA! GGKKTLTETYNYSNVTFNGHLTGTKLNEFASRPSFLRWMTEWGDQFCRERITQLQILKER! CMVYQYNGDKGKDDKKEKCTEACTYYKEWLTNWQDNYKKQNQRYTEVKGTSPYKEDSDVK! ESKYAHGYLRKILKNIICTSGTDIAYCNCMEGTSTTDSSNNDNIPESLKYPPIEIEEGCT! CKDPSPGEVIPEKKVPEPKVLPKPPKLPKRQPKERDFPTPALKNAMLSSTIMWSIGIGFA! TFTYFYLKKKTKSTIDLLRVINIPKSDYDIPTKLSPNRYIPYTSGKYRGKRYIYLEGDSG! TDSGYTDHYSDITSSSESEYEELDINDIYAPRAPKYKTLIEVVLEPSGNNTTASGNNTPS! DTQNDIQNDGIPSSKITDNEWNTLKDEFISQYLQSEQPNDVPNDYSSGDIPLNTQPNTLY! FDNPDEKPFITSIHDRDLYSGEEYSYNVNMVNTNNDIPISGKNGTYSGIDLINDSLNSNN! Note: ORFs in addition to predicted Genes must be searched 19

Homologous chromsomes (in a diploid) Loci, alleles and SNPs in a population A! a! AAGCCTCATC! ACGCCTCATC! SNP =Single Nucleotide Polymorphism 20

Alleles and Phenotype Some phenotypes are caused by a single locus in the genome and a single allele at that locus (e.g. some flower colors, or Drosophila eye color) Other phenotypes (Type-I diabetes, heart disease are multi-locus or complex (i.e. many genes are involved, each potentially with many alleles) Population data Data Single Nucleotide Polymorphisms, SNPs Alleles Allele frequency Haplotypes Technology Chip-Seq NGS 21

Alleles have frequencies in different populations Populations and alleles have geographic boundaries A parasite isolate comes from a particular population, a particular location and will have a specific haplotype (e.g. representation of alleles) often characterized via SNPs 22

Parasite Isolates Data Species, Strain, Isolate Location, Date SNP Sequence Allele phenotpe Technology PCR-RFLP Sequencing SNP chip GPS Infectious Disease Paradigm Experimental systems Host Pathogen Vector 23

24