Health informatics and biostatistics: Decision support
Antal Péter
Computational Biomedicine (Combine) workgroup, Department of Measurement and Information Systems, Budapest University of Technology and Economics
Topics and schedule
Week 1 (Sep 8): Health data; the data-analysis process. (AP)
Week 2 (Sep 15): Basics of the R data-analysis language I. (GA)
Week 3 (Sep 22): Basics of the R data-analysis language II. (GA)
Week 4 (Sep 29): Schönherz Kupa
Week 5 (Oct 6): Medical decision support. Biostatistics basics I: statistical sample, sampling, statistical power calculation, comparison of populations. (AP)
Weeks 6-7 (Oct 13, Oct 20): Biostatistics basics II: hypothesis testing and confidence intervals, common statistical tests. (HG)
Week 8 (Oct 27): Biostatistics basics III: survival analysis. (GA)
Week 9 (Nov 3): Biostatistics basics IV: the Bayesian approach. (GA)
Week 10 (Nov 10): Network medicine and systems biology. (HG)
Week 11 (Nov 17): Health-care coding systems, EESZT. (AP)
Week 12 (Nov 24): (no topic listed)
Week 13 (Dec 1): Biomarker research: biomarker types, the feature-selection problem; the multiple hypothesis testing problem and its solutions. (AP)
Week 14 (Dec 8): Homework presentations. (AP)
Bayesian decision support: overview
Probability basics, Bayes rule
Probabilistic graphical models, Bayesian networks
The notion of an optimal decision
Clinical applications of decision networks
An era of a new health care?
Medical decision-support and expert systems
Watson for Oncology assessment and advice cycle
www.avanteoconsulting.com/machine-learning-accelerates-cancer-research-discovery-innovation/
Foundations of medical decision support
Expert reasoning; expert decisions
Rational reasoning; rational decisions
Question types in medical decision support
Diagnostic inference: P(Diagnosis | passive observations); the diagnosis with the smallest expected loss given passive observations
Optimal information gathering: the effect of further information on the inference, P(Diagnosis | observations, new observation); the utility of further information
Therapeutic inference: P(Outcome | Observation, Intervention)
Counterfactual inference: P(ImaginedOutcome | Observation, Intervention, Outcome, ImaginedIntervention)
Question types in medical decision support II.
Observational: What is the probability that the patient recovers if he takes drug x?
Interventional: What is the probability that the patient recovers if we prescribe* drug x?
Counterfactual: Given that the patient did not recover on drug x, what would the probability of recovery have been if we had prescribed* drug x' instead of x?
Imagined observations and interventions:
We observed X=x, but imagine that x' had been observed: denoted X'=x'.
We set X=x, but imagine that x' had been set: denoted do(X'=x').
What is the relation of
Observational: p(Q=q | E=e, X=x)
Interventional: p(Q=q | E=e, do(X=x))
Counterfactual: p(Q'=q' | Q=q, E=e, do(X=x), do(X'=x'))
*: Assume that the patient is fully compliant.
Mathematical models for the question types
1. Representation of the joint distribution: small number of parameters
2. Representation of independencies: what is relevant for a diagnosis
3. Representation of causal relations: what is the effect of a treatment
4. Representation of possible worlds: quantitative and qualitative?
Passive (observational); active (interventional); imagined (counterfactual)
Bayesian methods
A unified quantitative framework for beliefs and decision preferences
Thomas Bayes (c. 1702-1761)
Bayesian interpretation of probability
Bayes rule
Bayesian statistics
Bayesian decision
Bayesian model averaging
Bayesian networks
Self-calibrating...
p(Model | Data) ∝ p(Data | Model) p(Model)
(G. E. P. Box: "all models are wrong, but some are useful")
Interpretations of probability
Sources of uncertainty: inherent uncertainty in the physical process; inherent uncertainty at the macroscopic level; ignorance; practical omissions.
Interpretations of probabilities: combinatoric; physical (propensities); frequentist: lim_{N→∞} p̂_N(A) = lim_{N→∞} N_A / N = p(A); personal/subjectivist; instrumentalist.
The three "as if" theorems: uncertainty by probabilities; preferences by a utility function; optimal action by the maximum expected utility principle.
Note: the axioms of probability theory are the same in every interpretation (Kolmogorov). Independence and the convergence of frequencies are empirical observations (e.g., the laws of large numbers are consequences of some assumptions about independencies).
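The frequentist limit above can be illustrated by simulation; a minimal sketch (the event probability 0.3 and the sample sizes are illustration values, not from the slide):

```python
import random

def relative_frequency(p_true, n, seed=0):
    """Simulate n independent trials of an event with probability p_true
    and return the relative frequency N_A / N."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < p_true)
    return hits / n

# The estimate p_hat_N(A) approaches p(A) as N grows (law of large numbers).
estimates = [relative_frequency(0.3, n) for n in (100, 10_000, 100_000)]
```

Note that the convergence itself rests on the independence of the trials, which is exactly the empirical assumption the slide points out.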
A chronology
[1713] Ars Conjectandi (The Art of Conjecture), Jacob Bernoulli: subjectivist interpretation of probabilities.
[1718] The Doctrine of Chances, Abraham de Moivre: the first textbook on probability theory. Forward predictions: given a specified number of white and black balls in an urn, what is the probability of drawing a black ball? (He famously also predicted his own death.)
[1764, posthumous] Essay Towards Solving a Problem in the Doctrine of Chances, Thomas Bayes: backward questions: given that one or more balls have been drawn, what can be said about the number of white and black balls in the urn?
[1812] Théorie analytique des probabilités, Pierre-Simon Laplace: the general Bayes rule.
[1921] Correlation and causation: S. Wright's path diagrams.
[-1950] Frequentist statistics: Ronald A. Fisher (J. Neyman and E. Pearson). "[Bayesianism is a] fallacious rubbish." His own approach was fiducial inference (~ Bayesian statistics), and he used informed priors in genetics.
Decision-theoretic background: probabilities + utilities
Decision situation:
actions a_i (e.g., which experiment)
outcomes o_j (e.g., a dataset)
probabilities of outcomes P(o_j | a_i)
utilities/losses of outcomes U(o_j), C(a_i) (e.g., QALY, micromort)
expected utilities EU(a_i) = Σ_j P(o_j | a_i) U(o_j)
Maximum Expected Utility principle (MEU): the best action is the one with maximum expected utility:
a* = argmax_i EU(a_i)
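The MEU principle can be sketched numerically. The two actions, two outcomes, and their utilities below are hypothetical illustration values, not from the lecture:

```python
# Hypothetical decision situation: utilities U(o_j) in arbitrary units,
# outcome probabilities P(o_j | a_i) per action.
utility = {"recover": 100.0, "worsen": 0.0}
p_given_action = {
    "treat": {"recover": 0.8, "worsen": 0.2},
    "wait":  {"recover": 0.5, "worsen": 0.5},
}

def expected_utility(action):
    """EU(a_i) = sum over j of P(o_j | a_i) * U(o_j)."""
    return sum(p * utility[o] for o, p in p_given_action[action].items())

# MEU: a* = argmax_i EU(a_i)
best = max(p_given_action, key=expected_utility)
```

With these numbers EU(treat) = 80 and EU(wait) = 50, so MEU selects "treat".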
Association vs. dependence vs. causal vs. regulatory networks
Bayesian networks
Directed acyclic graph (DAG):
node = random variable
edge = direct dependence (causal relation)
local model: P(X_i | Pa(X_i))
Three interpretations:
Axioms of probability
For any propositions A, B:
0 <= P(A) <= 1
P(true) = 1 and P(false) = 0
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
About the event space
Atomic events are mutually exclusive and exhaustive.
The single-variable case: Weather is one of <sunny, rainy, cloudy, snow>; P((Weather=sunny) ∨ (Weather=rainy)) = P(Weather=sunny) + P(Weather=rainy).
Challenges in the multivariate case: Weather is one of <sunny, rainy, cloudy, snow>; TemperatureOfRain is one of <icy, cold, warm>. If Weather is not rainy, does TemperatureOfRain need an extra value, NONE?
11/17/2017 A.I.
Classical vs. probabilistic logic: truth and beliefs
(table: the truth assignments to P1, P2, P3; whether each world satisfies the KB; and the probability p_KB assigned to each world, from which P(query | evidence) is computed)
Probability theory: Basic concepts Joint distribution Conditional probability Independence, conditional independence Bayes rule Marginalization/Expansion Chain rule
Joint (probability) distribution
Prior or unconditional probabilities of propositions, e.g., P(Cavity = true) = 0.1 and P(Weather = sunny) = 0.72, correspond to belief prior to the arrival of any (new) evidence.
A probability distribution gives values for all possible assignments: P(Weather) = <0.72, 0.1, 0.08, 0.1> (normalized, i.e., sums to 1).
A joint probability distribution for a set of random variables gives the probability of every atomic event on those random variables. P(Weather, Cavity) = a 4 × 2 table of values:
Weather =       sunny  rainy  cloudy  snow
Cavity = true   0.144  0.02   0.016   0.02
Cavity = false  0.576  0.08   0.064   0.08
Conditional probability
Conditional or posterior probabilities, e.g., P(cavity | toothache) = 0.8, i.e., given that toothache is all I know.
(Notation for conditional distributions: P(Cavity | Toothache) = a 2-element vector of 2-element vectors.)
If we know more, e.g., cavity is also given, then we have P(cavity | toothache, cavity) = 1.
New evidence may be irrelevant, allowing simplification, e.g., P(cavity | toothache, sunny) = P(cavity | toothache) = 0.8.
This kind of inference, sanctioned by domain knowledge, is crucial.
Conditional probability
Definition of conditional probability: P(a | b) = P(a ∧ b) / P(b) if P(b) > 0
The product rule gives an alternative formulation: P(a ∧ b) = P(a | b) P(b) = P(b | a) P(a)
A general version holds for whole distributions, e.g., P(Weather, Cavity) = P(Weather | Cavity) P(Cavity)
(View as a set of 4 × 2 equations, not matrix multiplication.)
The chain rule is derived by successive application of the product rule:
P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
= P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
= ... = ∏_{i=1}^n P(X_i | X_1, ..., X_{i-1})
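The product rule can be checked numerically on the weather/cavity joint distribution from the earlier slide; a minimal sketch:

```python
# Joint distribution P(Weather, Cavity) from the slide.
joint = {
    ("sunny", True): 0.144,  ("rainy", True): 0.02,
    ("cloudy", True): 0.016, ("snow", True): 0.02,
    ("sunny", False): 0.576, ("rainy", False): 0.08,
    ("cloudy", False): 0.064, ("snow", False): 0.08,
}

# Marginal P(Cavity) by summing Weather out of the joint.
p_cavity = {c: sum(p for (_, cc), p in joint.items() if cc == c)
            for c in (True, False)}

# Conditional P(Weather | Cavity) via the definition P(w | c) = P(w, c) / P(c).
cond = {(w, c): p / p_cavity[c] for (w, c), p in joint.items()}

# Product rule: P(w | c) P(c) reproduces every joint entry.
product_rule_ok = all(
    abs(cond[(w, c)] * p_cavity[c] - p) < 1e-12 for (w, c), p in joint.items())
```

For example P(sunny | cavity) = 0.144 / 0.2 = 0.72, matching the prior P(Weather = sunny) on the slide, so here Weather and Cavity happen to be independent.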
Bayes rule
p(Model | Data) ∝ p(Data | Model) p(Model)
p(Y | X) = p(X | Y) p(Y) / p(X), where p(X) = Σ_y p(X | Y=y) p(Y=y)
An algebraic triviality
A scientific research paradigm
A practical method for inverting causal knowledge into a diagnostic tool:
p(Cause | Effect) ∝ p(Effect | Cause) p(Cause)
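The causal-to-diagnostic inversion can be sketched numerically. The prevalence and test characteristics below are hypothetical illustration values, not from the slide:

```python
# Hypothetical screening example.
p_disease = 0.01          # p(Cause): prevalence
p_pos_given_d = 0.90      # p(Effect | Cause): sensitivity
p_pos_given_nd = 0.05     # p(Effect | no Cause): false-positive rate

# Marginalization: p(Effect) = sum over Cause of p(Effect | Cause) p(Cause)
p_pos = p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)

# Bayes rule: p(Cause | Effect) = p(Effect | Cause) p(Cause) / p(Effect)
p_d_given_pos = p_pos_given_d * p_disease / p_pos
```

Despite the sensitive test, the posterior is only about 0.15, because the low prior p(Cause) dominates; this is exactly the inversion the slide describes.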
Chain rule
The chain rule is derived by successive application of the product rule:
P(X_1, ..., X_n) = P(X_1, ..., X_{n-1}) P(X_n | X_1, ..., X_{n-1})
= P(X_1, ..., X_{n-2}) P(X_{n-1} | X_1, ..., X_{n-2}) P(X_n | X_1, ..., X_{n-1})
= ... = ∏_i P(X_i | X_1, ..., X_{i-1})
Inference: marginalization
Any question about observable events in the domain can be answered by the joint distribution.
Start with the joint probability distribution: for any proposition φ, sum the atomic events where it is true:
P(φ) = Σ_{ω: ω ⊨ φ} P(ω)
Inference by enumeration
Start with the joint probability distribution: for any proposition φ, sum the atomic events where it is true:
P(φ) = Σ_{ω: ω ⊨ φ} P(ω)
P(toothache) = 0.108 + 0.012 + 0.016 + 0.064 = 0.2
Inference by enumeration
Start with the joint probability distribution.
Can also compute conditional probabilities:
P(¬cavity | toothache) = P(¬cavity ∧ toothache) / P(toothache)
= (0.016 + 0.064) / (0.108 + 0.012 + 0.016 + 0.064) = 0.4
Normalization
The denominator can be viewed as a normalization constant α:
P(Cavity | toothache) = α P(Cavity, toothache)
= α [P(Cavity, toothache, catch) + P(Cavity, toothache, ¬catch)]
= α [<0.108, 0.016> + <0.012, 0.064>]
= α <0.12, 0.08> = <0.6, 0.4>
General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.
Inference by enumeration
Typically, we are interested in the posterior joint distribution of the query variables Y given specific values e for the evidence variables E.
Let the hidden variables be H = X \ (Y ∪ E).
Then the required summation of joint entries is done by summing out the hidden variables:
P(Y | E = e) = α P(Y, E = e) = α Σ_h P(Y, E = e, H = h)
The terms in the summation are joint entries because Y, E and H together exhaust the set of random variables.
Obvious problems:
1. Worst-case time complexity O(d^n), where d is the largest arity
2. Space complexity O(d^n) to store the joint distribution
3. How to find the numbers for the O(d^n) entries?
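Enumeration over the toothache joint can be sketched directly. The toothache entries appear on the slides; the ¬toothache entries below are the standard textbook (AIMA) values and are an assumption here:

```python
# Joint P(Cavity, Toothache, Catch); the four toothache entries are from the
# slides, the four ¬toothache entries are assumed textbook values.
joint = {
    (True,  True,  True): 0.108, (True,  True,  False): 0.012,
    (False, True,  True): 0.016, (False, True,  False): 0.064,
    (True,  False, True): 0.072, (True,  False, False): 0.008,
    (False, False, True): 0.144, (False, False, False): 0.576,
}

def prob(pred):
    """P(phi) = sum of the atomic events where phi is true."""
    return sum(p for event, p in joint.items() if pred(*event))

p_toothache = prob(lambda cav, tooth, catch: tooth)                    # 0.2
p_not_cavity_given_toothache = (
    prob(lambda cav, tooth, catch: tooth and not cav) / p_toothache)   # 0.4
```

This reproduces the slide's P(toothache) = 0.2 and P(¬cavity | toothache) = 0.4; the cost is one pass over all d^n joint entries, which is exactly the complexity problem listed above.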
Conditional independence
Probability theory = measure theory + independence
I_P(X;Y|Z), or (X ⊥ Y | Z)_P, denotes that X is independent of Y given Z:
P(X, Y | z) = P(X | z) P(Y | z) for all z with P(z) > 0.
(Almost) alternatively, I_P(X;Y|Z) iff P(X | Z, Y) = P(X | Z) for all z, y with P(z, y) > 0.
Other notation: D_P(X;Y|Z) =def= ¬I_P(X;Y|Z)
Contextual independence: the independence holds for some, but not all, z.
Homework:
Intransitivity: show that it is possible that D(X;Y), D(Y;Z), but I(X;Z).
Show that it is possible that I(X;Z), I(Y;Z), but D({X,Y};Z).
Naive Bayesian network
Assumptions:
1. Two types of nodes: a cause and its effects.
2. Effects are conditionally independent of each other given their cause.
Variables (nodes): Flu: present/absent; FeverAbove38C: present/absent; Coughing: present/absent.
Model:
P(Flu=present) = 0.001; P(Flu=absent) = 1 - P(Flu=present)
P(Fever=present | Flu=present) = 0.6; P(Fever=absent | Flu=present) = 1 - 0.6
P(Fever=present | Flu=absent) = 0.01; P(Fever=absent | Flu=absent) = 1 - 0.01
P(Coughing=present | Flu=present) = 0.3; P(Coughing=absent | Flu=present) = 1 - 0.3
P(Coughing=present | Flu=absent) = 0.02; P(Coughing=absent | Flu=absent) = 1 - 0.02
Structure: Flu → Fever, Flu → Coughing
Naive Bayesian network (NBN)
Decomposition of the joint:
P(Y, X_1, ..., X_n) = P(Y) ∏_i P(X_i | Y, X_1, ..., X_{i-1})  // by the chain rule
= P(Y) ∏_i P(X_i | Y)  // by the NBN assumption
2n+1 parameters!
Diagnostic inference: P(Y | x_{i1}, ..., x_{ik}) = P(Y) ∏_j P(x_{ij} | Y) / P(x_{i1}, ..., x_{ik})
If Y is binary, then the odds
P(Y=1 | x_{i1}, ..., x_{ik}) / P(Y=0 | x_{i1}, ..., x_{ik}) = P(Y=1)/P(Y=0) ∏_j P(x_{ij} | Y=1) / P(x_{ij} | Y=0)
Flu → Fever, Flu → Coughing:
p(Flu=present | Fever=absent, Coughing=present) ∝ p(Flu=present) p(Fever=absent | Flu=present) p(Coughing=present | Flu=present)
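The diagnostic inference above can be sketched with the CPT values from the flu slide (0.001 prior, fever 0.6/0.01, coughing 0.3/0.02):

```python
# Naive Bayes model from the slide: P(Flu) prior and P(finding | Flu) CPTs.
p_flu = 0.001
cpt = {
    "fever":    {True: 0.6, False: 0.01},   # P(Fever=present | Flu)
    "coughing": {True: 0.3, False: 0.02},   # P(Coughing=present | Flu)
}

def posterior_flu(findings):
    """P(Flu=present | findings); findings maps name -> present (True/False)."""
    score = {True: p_flu, False: 1 - p_flu}
    for flu in (True, False):
        for name, present in findings.items():
            p_present = cpt[name][flu]
            score[flu] *= p_present if present else 1 - p_present
    # Normalize the two unnormalized scores (the shared denominator cancels).
    return score[True] / (score[True] + score[False])
```

For Fever=absent, Coughing=present the posterior is about 0.006: despite the cough, the low prior keeps flu unlikely.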
Conditional probabilities, odds, odds ratios
Joint probabilities of Smoking (S) and Lung cancer (LC), with marginals:
        S          ¬S
LC      P(S,LC)    P(¬S,LC)    P(LC)
¬LC     P(S,¬LC)   P(¬S,¬LC)   P(¬LC)
        P(S)       P(¬S)
Conditional probabilities (e.g., the probability of LC given S): P(LC | S) = ???, P(LC | ¬S) = ???
Odds, mapping [0,1] to [0,∞): Odds(p) = p / (1 - p); O(LC | S) = ???, O(LC | ¬S) = ???
(plot: Odds(p) as a function of p)
Odds ratio (OR), independent of prevalence: OR(LC,S) = O(LC | S) / O(LC | ¬S)
Probabilities, odds, odds ratios
Smoking? Independence: the null model (H0).
Contingency table with marginals:
        ¬S   S
LC       1   4    5
¬LC      8   7   15
         9  11   20
Relative frequencies:
        ¬S    S
LC      .05   .2    .25
¬LC     .4    .35   .75
        .45   .55
Conditional probabilities: P(LC | ¬S) = 1/9 ≈ .11, P(LC | S) = 4/11 ≈ .36, P(LC) = .25
Odds, mapping [0,1] to [0,∞): Odds(p) = p / (1 - p); O(LC | ¬S) = 1/8 ≈ .12, O(LC | S) = 4/7 ≈ .57
Odds ratio: OR(LC,S) = O(LC | S) / O(LC | ¬S) ≈ 4.6
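The odds-ratio computation can be sketched from the slide's 2x2 counts (1 and 4 lung-cancer cases among 9 non-smokers and 11 smokers, respectively):

```python
def odds(p):
    """Map a probability in [0,1) to odds in [0, inf): p / (1 - p)."""
    return p / (1 - p)

# 2x2 counts as read from the slide's contingency table.
lc     = {"S": 4, "notS": 1}   # lung cancer
not_lc = {"S": 7, "notS": 8}   # no lung cancer

p_lc_s  = lc["S"]    / (lc["S"]    + not_lc["S"])     # 4/11 ~ 0.36
p_lc_ns = lc["notS"] / (lc["notS"] + not_lc["notS"])  # 1/9  ~ 0.11
OR = odds(p_lc_s) / odds(p_lc_ns)                     # ~ 4.6
```

The OR equals the cross-product ratio (4*8)/(1*7) of the table, which is why it does not depend on the prevalence of either margin.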
Bayesian networks
A simple, graphical notation for conditional independence assertions, and hence for compact specification of full joint distributions.
Syntax:
a set of nodes, one per variable
a directed, acyclic graph (link = "directly influences")
a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
In the simplest case, the conditional distribution is represented as a conditional probability table (CPT) giving the distribution over X_i for each combination of parent values.
Example I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar? Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls Network topology reflects "causal" knowledge: A burglar can set the alarm off An earthquake can set the alarm off The alarm can cause Mary to call The alarm can cause John to call
Example contd.
Compactness
A CPT for a Boolean X_i with k Boolean parents has 2^k rows for the combinations of parent values.
Each row requires one number p for X_i = true (the number for X_i = false is just 1-p).
If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers,
i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution.
For the burglary net: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
Semantics
The full joint distribution is defined as the product of the local conditional distributions:
P(X_1, ..., X_n) = ∏_{i=1}^n P(X_i | Parents(X_i))
e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)
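The factorization can be evaluated for the burglary network. The CPT numbers below are the standard textbook values; they are assumed here, since the slide figure carrying the tables did not survive extraction:

```python
# Assumed textbook CPTs for the burglary network.
p_b, p_e = 0.001, 0.002                       # P(Burglary), P(Earthquake)
p_a = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(Alarm=true | B, E)
p_j = {True: 0.90, False: 0.05}               # P(JohnCalls=true | Alarm)
p_m = {True: 0.70, False: 0.01}               # P(MaryCalls=true | Alarm)

# P(j, m, a, not-b, not-e) = P(j|a) P(m|a) P(a|not-b,not-e) P(not-b) P(not-e)
p = p_j[True] * p_m[True] * p_a[(False, False)] * (1 - p_b) * (1 - p_e)
```

With these numbers the atomic event has probability about 0.00063: ten CPT entries suffice to assign a probability to any of the 2^5 atomic events.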
Constructing Bayesian networks
1. Choose an ordering of variables X_1, ..., X_n
2. For i = 1 to n:
add X_i to the network
select parents from X_1, ..., X_{i-1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, ..., X_{i-1})
This choice of parents guarantees:
P(X_1, ..., X_n) = ∏_{i=1}^n P(X_i | X_1, ..., X_{i-1})  // chain rule
= ∏_{i=1}^n P(X_i | Parents(X_i))  // by construction
Preferences
Rational preferences
An irrational preference
Maximizing expected utility
Utilities
Utility scales
Microlife
A microlife is 30 minutes of your life expectancy; the speed of life is 48 microlives per day.
Gains/losses of about 1 microlife:
smoking 2 cigarettes
drinking 7 units of alcohol (e.g., 2 pints of strong beer)
each day of being 5 kg overweight
Further losses and gains:
X-ray: 2 microlives
whole-body CT scan: 180 microlives
improvement in health care: +12 microlives
https://understandinguncertainty.org/microlives
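Since a microlife is defined as 30 minutes of life expectancy, the bookkeeping is simple arithmetic; a small converter as a sketch (the function name is ours, the quantities are from the slide):

```python
MINUTES_PER_MICROLIFE = 30  # one microlife = 30 minutes of life expectancy

def hours_lost_per_year(microlives_per_day):
    """Convert a daily microlife loss into hours of life expectancy per year."""
    return microlives_per_day * MINUTES_PER_MICROLIFE * 365 / 60

# A whole-body CT scan at 180 microlives costs 180 * 30 / 60 = 90 hours.
ct_hours = 180 * MINUTES_PER_MICROLIFE / 60
```

A habit costing 1 microlife per day thus adds up to about 7.6 days of life expectancy per year.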
Quantitative decision support: decision networks I.
Decision networks II.
Decision networks III.
D. Timmerman, P. Antal, 2001: ~400 parameters
Bayesian network based decision support systems (DSS): phases of construction
I. Variables/nodes (concepts)
II. Values (descriptions)
III. Dependencies/edges
IV. Parameters/conditional probabilities
V. Utilities/losses
VI. Probabilistic inference
VII. Sensitivity of inference
November 17, 2017 A.I.
Decision support systems I: variables
Decision support systems II: values
Decision support systems III: dependencies
Decision support systems IV: conditional probabilities
Decision support systems V: utilities/losses
Decision support systems VI: inference
Decision support systems VII: sensitivity of inference
Sensitivity of the inference
(plot: P(Pathology = malignant | E = e) as a function of the accumulating evidence e)
Health-care applications of decision support systems
The question of responsibility:
Data collection: domain expert; knowledge engineer; designer of the protocol; implementer of the software.
Use of the data: inference engine; fusion of data and knowledge; knowledge engineer and statistician; software implementer; domain-expert user.
Shared responsibility with unclear boundaries leads to suboptimal (but accountable) human decision support.