Genome 373: Hidden Markov Models I. Doug Fowler


Review From Gene Prediction I

[Figure: gene structure diagram labeling the promoter, transcriptional start site, 5' untranslated region, start codon, open reading frame, 3' untranslated region, and transcriptional termination site]

We briefly revisited what a gene is and what the key parts of genes are.

Given a sequence, we want to be able to predict the major features of genes in the sequence (i.e. create gene models). We want a model that can predict whether each base in a sequence is in one of a known set of states (intergenic, start, exon, intron, stop).

[Figure: a DNA sequence shown twice, first unannotated and then annotated with Start, Exon 1, Intron 1, Exon 2, and Stop]

An Ad Hoc Model

We could just build an ad hoc model that would incorporate each of the pieces of information we talked about last time (e.g. start, stop, length of ORF, splice site motifs, etc.).

For example, we could label all starts, stops and potential ORFs. Then we could slide across the sequence in 100 base pair windows and compute the probability of splice site motifs. Finally, we could combine these two pieces of information to find genes.

What are the problems here? Many problems arise with this strategy:

How should we weight each part of the model?

What happens if we want to add new information (alternative splicing, etc.)?

Ad hoc models get messy very quickly!

An Overview of Markov Models

Markov models are a formal framework for assigning states to a linear sequence of symbols (like DNA).

[Figure: a short DNA sequence whose first bases are labeled "state = start" and whose last bases are labeled "state = stop"]

Markov models are probabilistic, meaning that we can use them to pick out the most likely states for a particular sequence (this is exactly what we want to do to find genes!).

Markov models have diverse applications in genomics including gene finding, sequence alignment, regulatory site identification, protein secondary structure prediction, etc.

Outline Markov Chains/Models Hidden Markov Models

Markov Chain

A Markov chain is a random process of transitions from one state to another in a state space.

[Figure: two states, A and T; each state returns to itself with probability 0.9 and switches to the other state with probability 0.1]

This model describes a Markov chain with two states, A and T. There are four possible transitions: A->A, A->T, T->A, T->T. The transitions describe the linear order in which we expect states to occur. This model describes a sequence composed of As and Ts, and you could get any sequence from this model.

[Figure: a variant of the two-state model] What type of sequence would this model describe? One that alternated between A and T.

[Figure: another variant, with a single self-transition of probability 0.9] And this one? Runs of one symbol, each interrupted by a single occurrence of the other.
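To make the "random process of transitions" concrete, here is a minimal Python sketch that samples sequences from a two-state chain. It assumes the transition values of the first model (stay with probability 0.9, switch with probability 0.1); the names `TRANSITIONS` and `sample_chain` are illustrative and not part of the lecture.

```python
import random

# Assumed transition probabilities for the two-state chain:
# stay in the same state with probability 0.9, switch with probability 0.1.
TRANSITIONS = {
    "A": {"A": 0.9, "T": 0.1},
    "T": {"A": 0.1, "T": 0.9},
}


def sample_chain(length, start="A", transitions=TRANSITIONS, rng=None):
    """Generate `length` symbols by repeatedly sampling the next state."""
    rng = rng or random.Random()
    sequence = [start]
    while len(sequence) < length:
        options = transitions[sequence[-1]]  # distribution over next states
        next_state = rng.choices(list(options), weights=list(options.values()))[0]
        sequence.append(next_state)
    return "".join(sequence)


# Long runs of A and T are expected with the default transition table.
print(sample_chain(40, rng=random.Random(0)))

# A chain whose only transitions are A->T and T->A must alternate.
print(sample_chain(8, transitions={"A": {"T": 1.0}, "T": {"A": 1.0}}))
```

Passing a different transition table is enough to reproduce the other two behaviors described above, without changing the sampling code.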

Markov Model

A Markov chain is a random process of transitions from one state to another in a state space. In other words, transitions between states are probabilistic.

Formally, a transition between two states s and t is associated with a probability, the transition probability $a_{st}$:

$a_{st} = P(x_i = t \mid x_{i-1} = s)$

This expresses a key property of a Markov chain: the probability of any symbol $x_i$ depends only on the previous symbol $x_{i-1}$. This is also referred to as the Markov property.
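As a worked illustration, the transition probabilities of the two-state A/T model (stay with probability 0.9, switch with probability 0.1) can be collected in a matrix. The row-for-current-state, column-for-next-state layout is a notational choice made here, not something taken from the slides.

$$
\begin{pmatrix} a_{AA} & a_{AT} \\ a_{TA} & a_{TT} \end{pmatrix}
=
\begin{pmatrix} 0.9 & 0.1 \\ 0.1 & 0.9 \end{pmatrix},
\qquad
\sum_{t} a_{st} = 1 \ \text{for each state } s
$$

Each row sums to 1 because the chain has to go somewhere from every state.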

Markov Model

Given that we start with an A, we can write down the probability of any sequence of symbols as a product with one factor per transition: 0.9 for each step that stays in the same state and 0.1 for each switch.

Formally, the probability of observing any particular sequence is the product of the transition probabilities for the sequence:

$P(\text{sequence}) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$

Here $P(x_1)$ is the probability for the beginning state, and the product runs over the transition probabilities into the second through the $L$th symbols.
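The product formula translates almost directly into code. The sketch below assumes the two-state A/T model (stay 0.9, switch 0.1) and takes $P(x_1) = 1$ by default, matching "given that we start with an A". The function name `chain_probability` and the example sequences are chosen here for illustration.

```python
# Assumed transition probabilities a[s][t] for the two-state A/T chain.
TRANSITIONS = {
    "A": {"A": 0.9, "T": 0.1},
    "T": {"A": 0.1, "T": 0.9},
}


def chain_probability(sequence, transitions=TRANSITIONS, p_first=1.0):
    """P(sequence) = P(x_1) * product over i = 2..L of a[x_{i-1}][x_i]."""
    probability = p_first
    # zip pairs each symbol with its successor: (x_1, x_2), (x_2, x_3), ...
    for previous, current in zip(sequence, sequence[1:]):
        probability *= transitions[previous][current]
    return probability


# One switch and six stays: 0.9**6 * 0.1
print(chain_probability("AAAATTTT"))
# Seven switches: 0.1**7
print(chain_probability("ATATATAT"))
```

For long sequences the product underflows quickly, so practical implementations usually sum log probabilities instead. That is a common convention rather than something stated in the lecture.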

Markov Model Can Tell Us the Most Likely Sequence

[Figure: two candidate eight-symbol sequences]

Which is the more likely sequence given our model? Clearly, the first is more likely to occur, and we can write down the exact probability of each! You all calculate them!

$P = 0.9^6 \times 0.1 = 0.053$

$P = 0.1^7 = 0.0000001$

And, starting with an A, what is the most likely eight-symbol sequence of all? The one that never leaves A:

$P = 0.9^7 \approx 0.48$
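For a sequence this short, the answer can be checked by brute force: score every eight-symbol sequence that starts with A and keep the best one. This is a sketch under the same assumed transition table (stay 0.9, switch 0.1); `most_likely_sequence` is a hypothetical helper name.

```python
from itertools import product

# Assumed transition probabilities for the two-state A/T chain.
TRANSITIONS = {
    "A": {"A": 0.9, "T": 0.1},
    "T": {"A": 0.1, "T": 0.9},
}


def most_likely_sequence(length, start="A"):
    """Enumerate all sequences of `length` symbols beginning with `start`."""
    best_sequence, best_probability = None, 0.0
    for tail in product("AT", repeat=length - 1):
        sequence = start + "".join(tail)
        probability = 1.0  # the first symbol is given, so P(x_1) = 1
        for previous, current in zip(sequence, sequence[1:]):
            probability *= TRANSITIONS[previous][current]
        if probability > best_probability:
            best_sequence, best_probability = sequence, probability
    return best_sequence, best_probability


sequence, probability = most_likely_sequence(8)
print(sequence, round(probability, 2))
```

The search visits 2^7 = 128 candidates, which is fine here but doubles with every extra symbol.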

Beginning and Ending States in a Markov Model

[Figure: the two-state model with an added begin state B and end state E]

We can add begin (B) and end (E) states with their own transition probabilities $a_{Bs}$ and $a_{sE}$.

What is the consequence of modeling the end state? We add sequence length to the model (there is a non-zero probability that the next state is the end state).
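To see how an end state brings sequence length into the model, suppose, purely as an illustration, that every state moves to E with the same probability $\tau$ (a hypothetical parameter; the slides do not give the end transition probabilities). After each symbol the chain then stops with probability $\tau$ and continues with probability $1 - \tau$, so the length $\ell$ of a generated sequence follows a geometric distribution:

$$
P(\text{length} = \ell) = (1 - \tau)^{\ell - 1}\, \tau, \qquad \text{mean length} = \frac{1}{\tau}
$$

For example, $\tau = 0.01$ would give sequences that are 100 symbols long on average.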

Outline Markov Chains Hidden Markov Models

What is Hidden in an HMM?

In our simple Markov model we had full knowledge of both the symbols ($x_i$) and the model states. In fact, they were identical and we talked about them interchangeably!

In a hidden Markov model (HMM), the model states are unknown (i.e. hidden from us). We will see that given a set of transition probabilities and a sequence of symbols we can use an HMM to identify the most likely sequence of states, and that this will let us solve our gene finding problem!

HMM for A- vs. T-Rich Regions

Let's extend our initial example to one where, given a sequence composed of As and Ts, we want to discriminate between A-rich and T-rich regions.

[Figure: two states, A-rich and T-rich; each state returns to itself with probability 0.9 and switches to the other with probability 0.1. Emission probabilities: A-rich emits A: 0.8, T: 0.2; T-rich emits A: 0.2, T: 0.8]

Now we have a model where there are two states: A-rich (a) and T-rich (t). The states no longer correspond directly to the symbols A or T. In an A-rich region, for example, we'll still observe some Ts and vice versa. Instead, the states are associated with emission probabilities that dictate the frequency with which A or T will be observed. That is, when in the A-rich state the model will emit an A 80% of the time and a T 20% of the time.

Formally, we denote the probability that we will see the symbol b when the model is in state k as

$e_k(b) = P(x_i = b \mid \pi_i = k)$

where $\pi$ is the sequence of model states.

Just like before, we can use the model to generate sequence.
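Generating from an HMM takes two random draws per position: one for the emitted symbol and one for the next state. The sketch below assumes the A-rich/T-rich parameters of this model (transitions 0.9/0.1, emissions 0.8/0.2) and, as an extra assumption, always starts in the A-rich state. The names `draw` and `generate` are illustrative.

```python
import random

# Assumed parameters: lowercase letters are hidden states, uppercase are symbols.
TRANSITIONS = {"a": {"a": 0.9, "t": 0.1}, "t": {"a": 0.1, "t": 0.9}}
EMISSIONS = {"a": {"A": 0.8, "T": 0.2}, "t": {"A": 0.2, "T": 0.8}}


def draw(distribution, rng):
    """Sample one key of `distribution` with probability equal to its value."""
    outcomes = list(distribution)
    weights = [distribution[outcome] for outcome in outcomes]
    return rng.choices(outcomes, weights=weights)[0]


def generate(length, start_state="a", rng=None):
    """Return (symbols, state_path), each a string of `length` characters."""
    rng = rng or random.Random()
    state = start_state
    symbols, path = [], []
    for _ in range(length):
        path.append(state)
        symbols.append(draw(EMISSIONS[state], rng))  # emit from current state
        state = draw(TRANSITIONS[state], rng)        # then move to next state
    return "".join(symbols), "".join(path)


symbols, path = generate(40, rng=random.Random(1))
print(symbols)
print(path)
```

Printing the two strings one above the other shows why the states are "hidden": an observer sees only the first line, and the second has to be inferred.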

HMM for A- vs. T-Rich Regions

However, now multiple state paths ($\pi$) could give rise to a particular sequence.

Sequence: AAAATTTT
State path #1: aaaatttt
State path #2: ttttaaaa

Given the model, transition probabilities, emission probabilities and a sequence of symbols, we can begin to think about the most likely state path. Intuitively, it's pretty easy to figure out. Which of these two is the most likely? State path #1 is a highly likely path; state path #2 is an unlikely path.

This is the basic idea of an HMM: figure out the most likely state path given a sequence, a model and transition probabilities.

Probability of a Given Sequence and State Path

Formally, the joint probability of a given sequence x and a state path $\pi$ is given by:

$P(x, \pi) = a_{0\pi_1} \prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}}$

where $a_{0\pi_1}$ is the probability of the initial state, $e_{\pi_i}(x_i)$ is the probability of emitting symbol $x_i$ in state $\pi_i$, and $a_{\pi_i \pi_{i+1}}$ is the probability of the transition from state $\pi_i$ to state $\pi_{i+1}$.

Example State Path Probability Calculation

Sequence: AAAATTTT
State path #1: aaaatttt
State path #2: ttttaaaa

Let's start at the beginning: i = 1, with the pair (A, a) for path 1 and (A, t) for path 2. We multiply the emission and transition probabilities, and continue doing that for the whole sequence and each state path, getting the joint probability of the observed sequence with each state path:

$P(\text{path}_1) = (0.8 \times 0.9) \cdots (0.8 \times 0.1) \cdots (0.8 \times 0.9) = 0.008$

$P(\text{path}_2) = (0.2 \times 0.9) \cdots (0.2 \times 0.1) \cdots (0.2 \times 0.9) = 1.2 \times 10^{-7}$
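The same calculation can be sketched in Python. The parameters are those of the A-rich/T-rich model. The `start_prob` and `end_prob` arguments stand in for $a_{0\pi_1}$ and the final transition term, and the values used below (1 and 0.9) are assumptions chosen to reproduce the factors shown on the slide. The function name `joint_probability` is illustrative.

```python
# Assumed parameters: lowercase letters are hidden states, uppercase are symbols.
TRANSITIONS = {"a": {"a": 0.9, "t": 0.1}, "t": {"a": 0.1, "t": 0.9}}
EMISSIONS = {"a": {"A": 0.8, "T": 0.2}, "t": {"A": 0.2, "T": 0.8}}


def joint_probability(symbols, path, start_prob=1.0, end_prob=1.0):
    """P(x, pi) = a_{0 pi_1} * prod_i e_{pi_i}(x_i) * a_{pi_i pi_{i+1}}."""
    if len(symbols) != len(path):
        raise ValueError("symbols and path must have the same length")
    probability = start_prob
    for i, (symbol, state) in enumerate(zip(symbols, path)):
        probability *= EMISSIONS[state][symbol]
        if i + 1 < len(path):
            probability *= TRANSITIONS[state][path[i + 1]]
        else:
            probability *= end_prob  # final term: transition out of the last state
    return probability


p1 = joint_probability("AAAATTTT", "aaaatttt", end_prob=0.9)
p2 = joint_probability("AAAATTTT", "ttttaaaa", end_prob=0.9)
print(f"{p1:.3f}")  # about 0.008
print(f"{p2:.1e}")  # about 1.2e-07
```

Grouping the factors by hand gives the same numbers: path 1 is $0.8^8 \times 0.9^7 \times 0.1$ and path 2 is $0.2^8 \times 0.9^7 \times 0.1$, so the two paths differ only in their emission terms.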

What Does the Most Likely Path Mean?

It turns out that state path #1 is the most likely path for this model. So, what can we say? That the first four positions in the sequence are likely from an A-rich region and the last four are from a T-rich region!
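For an eight-symbol sequence, the claim that state path #1 is the most likely can be checked by scoring all 2^8 = 256 possible state paths. This is a brute-force sketch, not an efficient algorithm. It assumes the A-rich/T-rich parameters and leaves out the start and end factors on the assumption that they are the same for every path, so they do not change the ranking. The helper names are hypothetical.

```python
from itertools import product

# Assumed parameters: lowercase letters are hidden states, uppercase are symbols.
TRANSITIONS = {"a": {"a": 0.9, "t": 0.1}, "t": {"a": 0.1, "t": 0.9}}
EMISSIONS = {"a": {"A": 0.8, "T": 0.2}, "t": {"A": 0.2, "T": 0.8}}


def path_score(symbols, path):
    """Product of the emission and state-to-state transition terms of P(x, pi)."""
    score = 1.0
    for i, (symbol, state) in enumerate(zip(symbols, path)):
        score *= EMISSIONS[state][symbol]
        if i + 1 < len(path):
            score *= TRANSITIONS[state][path[i + 1]]
    return score


def best_path_by_enumeration(symbols):
    """Try every state path of the right length and return the best-scoring one."""
    candidates = ("".join(states) for states in product("at", repeat=len(symbols)))
    return max(candidates, key=lambda path: path_score(symbols, path))


print(best_path_by_enumeration("AAAATTTT"))
```

The number of candidate paths doubles with every added position, so enumeration stops being practical well before sequences reach gene length.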

Example State Path Probability Calculation

Sequence: ATATTTTT
State path #1: ataattta

Now, you all take a minute and try to calculate the likelihood of this state path, given that the transition probabilities into the first state ($a_{0\pi_1}$) and to the end state are 1.

$P = 1 \times (0.8 \times 0.1)(0.8 \times 0.1)(0.8 \times 0.9)(0.2 \times 0.1)(0.8 \times 0.9)(0.8 \times 0.9)(0.8 \times 0.1)(0.2 \times 1)$

$P = 7.6 \times 10^{-7}$
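Grouping like factors makes the arithmetic in the product above easier to check. There are six emissions of 0.8 and two of 0.2, three transitions of 0.9 and four of 0.1, plus the start and end factors of 1:

$$
P = 0.8^{6} \times 0.2^{2} \times 0.9^{3} \times 0.1^{4}
  = 0.262144 \times 0.04 \times 0.729 \times 0.0001
  \approx 7.6 \times 10^{-7}
$$

The four state switches, each costing a factor of 0.1, are what make this path so much less likely than a path with a single switch.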

Summary

We learned that a Markov chain is a random process of transitions from one state to another in a state space, and that we could write down a model to describe a Markov chain.

We saw how a simple Markov model could generate the most likely sequence.

We learned that in a hidden Markov model, states are unknown to us and associated with a set of emission probabilities, so that many different state paths can generate a given sequence.

We saw how we could use an HMM to calculate the joint probability of a sequence and any (hidden) state path.

Next Time

The Viterbi Algorithm (or, how can we find the most probable state path?)

A toy gene finding example

Generate a gene finding HMM