(12) United States Patent
Yuen et al.
(10) Patent No.: US 8,092,994 B2
(45) Date of Patent: *Jan. 10, 2012
(54) HUMAN VIRUS CAUSING RESPIRATORY TRACT INFECTION AND USES THEREOF
(75) Inventors: Kwok Yung Yuen, Hong Kong (CN); Chiu Yat Patrick Woo, Hong Kong (CN); Kar Pui Susanna Lau, Hong Kong (CN); Kwok Hung Chan, Hong Kong (CN); Lit Man Poon, Hong Kong (CN); Joseph Sriyal Malik Peiris, Hong Kong (CN); Yi Guan, Hong Kong (CN)
(73) Assignee: Versitech Limited, Hong Kong (CN)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 232 days. This patent is subject to a terminal disclaimer.
(21) Appl. No.: 12/476,019
(22) Filed: Jun. 1, 2009
(65) Prior Publication Data: US 2009/0305282 A1, Dec. 10, 2009
Related U.S. Application Data
(63) Continuation of application No. 10/895,064, filed on Jul. 21, 2004, now Pat. No. 7,553,944.
(51) Int. Cl.: C12Q 1/68 (2006.01)
(52) U.S. Cl.: 435/6; 536/23.1; 536/24.3; 536/24.32
(58) Field of Classification Search: None. See application file for complete search history.
(56) References Cited
U.S. PATENT DOCUMENTS
2005/0266397 A1 12/2005 Ecker et al.
FOREIGN PATENT DOCUMENTS
WO WO2004096842 * 4/2004
OTHER PUBLICATIONS
Weiss, SR et al. "Characterization of Murine Coronavirus RNA by Hybridization with Virus-specific cDNA Probes" J. Gen. Virology; 64:127-133; 1983.*
* cited by examiner
Primary Examiner: Bo Peng
(74) Attorney, Agent, or Firm: Saliwanchik, Lloyd & Eisenschenk
(57) ABSTRACT
The present invention provides the complete genomic sequence of a novel human coronavirus, designated coronavirus HKU1 ("CoV-HKU1"), isolated in Hong Kong from a patient who had recently visited Shenzhen, China. The virus belongs to the family Coronaviridae of the order Nidovirales and is a single-stranded, positive-sense RNA virus.
The invention also provides the amino acid sequences deduced from the complete CoV-HKU1 genome. These nucleotide and deduced amino acid sequences are useful for preventing, diagnosing and/or treating CoV-HKU1 infection. The invention further provides immunogenic and vaccine preparations based on these sequences, using recombinant and chimeric forms of CoV-HKU1 as well as CoV-HKU1 subunits.
10 Claims, 119 Drawing Sheets
U.S. Patent Jan. 10, 2012 Sheet 1 of 119 US 8,092,994 B2
SEQ:1 (nucleotides 1-393) with SEQ:2 (deduced amino acids 1-130, read from nucleotide 2)

  1 TCGTGCTATGCCAAATATTTTGCGTATTGTTAGTAGTTTAGTTTTGGCCCGCAAACAT 58
  1   R  A  M  P  N  I  L  R  I  V  S  S  L  V  L  A  R  K  H  19

 59 GAATTTTGTTGTTCACATGGTGATAGATTTTATCGCCTTGCGAATGAATGTGCTCAAGTT 118
 20  E  F  C  C  S  H  G  D  R  F  Y  R  L  A  N  E  C  A  Q  V   39

119 TTGAGTGAAATAGTTATGTGTGGCGGTTGCTATTATGTTAAGCCTGGTGGTACTAGCAGT 178
 40  L  S  E  I  V  M  C  G  G  C  Y  Y  V  K  P  G  G  T  S  S   59

179 GGTGATGCAACTACTGCTTTTGCTAATTCTGTTTTTAATATATGTCAGGCTGTTACTGCT 238
 60  G  D  A  T  T  A  F  A  N  S  V  F  N  I  C  Q  A  V  T  A   79

239 AATGTTTGTTCTCTTATGGCCTGTAATGGCCATAAGATTGAAGATTTAAGTATACGCAAT 298
 80  N  V  C  S  L  M  A  C  N  G  H  K  I  E  D  L  S  I  R  N   99

299 TTACAAAAACGCTTATACTCTAATGTTTATCGTACAGATTATGTTGATTATACATTTGTT 358
100  L  Q  K  R  L  Y  S  N  V  Y  R  T  D  Y  V  D  Y  T  F  V  119

359 AATGAGTATTATGAATTTTTATGTAAGCATTTTAG 393
120  N  E  Y  Y  E  F  L  C  K  H  F    130
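SEQ:2 is the translation of SEQ:1 in the reading frame that begins at nucleotide 2; nucleotide 1 and the final two nucleotides fall outside complete codons. The Python sketch below checks that relationship against the printed listing. It assumes the standard genetic code (NCBI translation table 1), and the names `translate`, `SEQ1` and `SEQ2` are illustrative only, not part of the patent.

```python
# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# TCAG order, so codon (b1, b2, b3) maps to index 16*i + 4*j + k.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq: str, start: int = 1) -> str:
    """Translate the complete codons of seq, beginning at 1-based position start.

    Stop codons are written as '*'; a codon containing any character other than
    A, C, G or T becomes 'X'. A trailing partial codon is ignored.
    """
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(start - 1, len(seq) - 2, 3)
    )


# SEQ:1 and SEQ:2 as printed on Sheet 1, one printed line per string.
SEQ1 = (
    "TCGTGCTATGCCAAATATTTTGCGTATTGTTAGTAGTTTAGTTTTGGCCCGCAAACAT"
    "GAATTTTGTTGTTCACATGGTGATAGATTTTATCGCCTTGCGAATGAATGTGCTCAAGTT"
    "TTGAGTGAAATAGTTATGTGTGGCGGTTGCTATTATGTTAAGCCTGGTGGTACTAGCAGT"
    "GGTGATGCAACTACTGCTTTTGCTAATTCTGTTTTTAATATATGTCAGGCTGTTACTGCT"
    "AATGTTTGTTCTCTTATGGCCTGTAATGGCCATAAGATTGAAGATTTAAGTATACGCAAT"
    "TTACAAAAACGCTTATACTCTAATGTTTATCGTACAGATTATGTTGATTATACATTTGTT"
    "AATGAGTATTATGAATTTTTATGTAAGCATTTTAG"
)
SEQ2 = (
    "RAMPNILRIVSSLVLARKH"
    "EFCCSHGDRFYRLANECAQV"
    "LSEIVMCGGCYYVKPGGTSS"
    "GDATTAFANSVFNICQAVTA"
    "NVCSLMACNGHKIEDLSIRN"
    "LQKRLYSNVYRTDYVDYTFV"
    "NEYYEFLCKHF"
)

# SEQ:2 is read in the frame that begins at nucleotide 2 of SEQ:1.
deduced = translate(SEQ1, start=2)
print(len(SEQ1), len(deduced), deduced == SEQ2)  # 393 130 True
```

Building the codon table from the compact NCBI string avoids typing 64 dictionary entries by hand, which is where transcription errors usually creep in.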
SEQ:3
[In the drawing, each nucleotide line of SEQ:3 is followed by its translation in the three forward reading frames; those translation rows are too garbled in the scanned text to reproduce.]
1 GAATAAGAGCGAATTGCGTCCGTACCGTCTATCAGCTTACGATCTCTTGTCAGATCTCAT 60
61 TAAATCTAAACTTTTTAAACAAGATTCCCTGTTATCCATGCTTGTGAGTGTGGTTTAATC 120
121 ATAATCTTGTATTTTACTTTCCACACTTTTCATCTCTCTGCCAGTGACGTGTTGGTTGTC 180
181 CTCAGCGTCCCTCCCATAGGTCGCAATGATTAAAACCAGCAAATACGGTCTCGGCTTCAA 240
241 GTGGGCGCCAGAATTTCGTTGCCTGCTTCCGGATGCAGCGGAGGAGTTGGCTAGTCCTAT 300
301 GAAGTCAGATGAGGGTGGGTTATGCCCCTCTACTGGTCAAGCGATGGAAAGTGTTGGATT 360
361 CGTTTATGATAATCATGTGAAGATAGATTGTCGCTGCATTCTTGGACAAGAATGGCATGT 420
421 GCAGTCAAATCTTATCCGTGATATTTTTGTTCATGAAGATCTACATGTTGTAGAAGTTCT 480
481 AACTAAAACAGCCGTAAAGTCCGGTACGGCAATTTTAATTAAATCACCTTTGCATAGCTT 540
541 GGGTGGTTTTCCTAAAGGGTATGTTATGGGCTTGTTCCGTTCATACAAGACTAAACGTTA 600
601 TGTTGTACATCATCTTTCTATGACTACATCTACTACTAATTTTGGTGAAGATTTTTTGGG 660
661 TTGGATTGTACCTTTTGGTTTTATGCCATCTTATGTTCACAAATGGTTTCAATTCTGTAG 720
721 GTTGTATATTGAAGAGAGTGATTTAATAATTTCAAATTTTAAATTTGATGATTATGATTT 780
781 TAGTGTAGAAGATGCTTATGCTGAGGTTCATGCTGAGCCTAAAGGTAAATATTCACAAAA 840
841 AGCTTATGCTTTACTTAGACAATATCGTGGTATTAAACCCGTACTTTTTGTAGACCAGTA 900
901 TGGTTGTGACTATTCTGGTAAATTAGCAGATTGTCTTCAAGCTTATGGTCATTATTCTTT 960
961 GCAAGATATGAGACAAAAGCAGTCTGTATGGCTTGCCAATTGTGACTTTGATATTGTAGT 1020
1021 GGCTTGGCATGTAGTTCGTGATTCACGATTTGTTATGCGCCTGCAGACTATAGCTACTAT 1080
1081 TTGTGGTATTAAATATGTTGCACAACCTACAGAAGATGTAGTAGATGGAGATGTAGTTAT 1140
1141 ACGTGAACCTGTACATTTATTATCTGCTGATGCAATAGTTTTAAAGCTTCCTAGTTTGAT 1200
1201 GAAAGTTATGACTCATATGGATGATTTTTCTATTAAATCTATATATAATGTTGATTTGTG 1260
1261 TGATTGTGGTTTTGTTATGCAGTATGGTTATGTAGATTGTTTTAATGATAATTGTGATTT 1320
1321 TTATGGTTGGGTTTCAGGTAATATGATGGATGGTTTTTCTTGTCCATTGTGTTGTACAGT 1380
1381 TTATGACTCTAGCGAAGTTAAAGCCCAATCATCTGGTGTTATTCCTGAAAATCCTGTGTT 1440
1441 ATTTACTAATAGTACTGATACTGTTAACCATGATTCTTTTAATTTGTATGGTTATTCTGT 1500
1501 CACACCATTTGGTTCTTGTATATATTGGTCGCCGCGTCCTGGATTGTGGATTCCTATAAT 1560
1561 TAAATCTTCAGTCAAGTCTTATGATGATTTGGTTTATTCAGGTGTAGTAGGTTGTAAATC 1620
FIG. 2 CONT.
1621 TATTGTTAAAGAAACTGCTCTTATTACTCATGCACTTTACTTAGATTATGTTCAATGTAA 1680
1681 GTGTGGTAATCTTGAACAAAATCATATTCTTGGCGTTAATAATTCTTGGTGTAGGCAACT 1740
1741 GTTGCTTAATAGAGGTGATTATAATATGCTTCTAAAAAATATTGACTTGTTTGTTAAGCG 1800
1801 TCGTGCTGATTTTGCTTGCAAGTTTGCAGTTTGTGGAGATGGTTTTGTACCTTTTTTACT 1860
1861 AGATGGTTTAATTCCCCGTAGTTATTATCTAATTCAGAGTGGTATTTTCTTTACATCTTT 1920
1921 GATGTCTCAATTTTCACAAGAAGTTTCTGATATGTGTTTAAAAATGTGTATTTTGTTTAT 1980
1981 GGACAGAGTTTCAGTTGCTACATTTTATATAGACCATTATGTTAATAGGTTGGTTACTCA 2040
2041 ATTTAAGTTATTGGGTACTACACTTGTTAATAAAATGGTTAATTGGTTTAATACCATGTT 2100
2101 AGATGCTAGTGCACCTGCTACAGGCTGGCTTCTTTACCAATTATTGAATGGTCTTTTTGT 2160
2161 AGTATCTCAAGCCAACTTTAATTTTGTTGCTTTAATACCTGATTATGCTAAAATTTTAGT 2220
2221 TAATAAATTTTACACTTTTTTTAAGTTATTATTAGAGTGTGTTACAGTTGATGTTTTAAA 2280
2281 AGATATGCCTGTTCTTAAAACTATTAATGGTTTAGTTTGTATTGTAGGCAATAAGTTTTA 2340
2341 TAACGTTACTACAGGGTTAATTCCTGGTTTTGTTTTACCATGTAATGCACAGGAACAACA 2400
2401 AATTTATTTTTTTGAAGGCGTTGCAGAATCTGTTATAGTAGAAGATGATGTTATTGAGAA 2460
2461 TGTCAAATCTTCTTTATCATCTTATGAGTATTGTCAACCACCTAAATCTGTAGAAAAAAT 2520
2521 TTGTATTATAGATAATATGTACATGGGTAAGTGTGGTGATAAATTTTTCCCTATTGTCAT 2580
2581 GAATGATAAAAATATTTGTCTTTTAGATCAGGCTTGGCGTTTTCCATGTGCAGGTAGAAA 2640
2641 AGTTAATTTTAACGAGAAACCTGTTGTTATGGAGATTCCGTCTTTGATGACAGTTAAGGT 2700
2701 TATGTTTGATTTAGATTCTACTTTTGATGATATTTTAGGTAAAGTTTGTTCAGAATTTGA 2760
2761 AGTAGAAAAGGGTGTTACTGTAGATCATTTTGTTCCTGTTGTTTGTGATGCTATAGAGAA 2820
2821 TGCTTTAAACTCTTGTAAAGAGCATCCAGTGGTTGGTTATCAAGTTCGTGCATTTTTAAA 2880
2881 TAAACTTAATGAGAATGTTGTTTATTTATTTGATGAGGCTGGTGATGAAGCAATGGCCTC 2940
2941 TCGTATGTATTGTACTTTTGCTATTGAGGATGTTGAAGACGTTATCAGTAGTGAAGCTGT 3000
3001 CGAAGATACTATTGATGGTGTCGTTGAAGACACTATTAATGACGATGAAGATGTTGTTAC 3060
3061 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATGACGATGAAGATGTTGTTAC 3120
3121 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATGACGATGAAGATGTTGTTAC 3180
3181 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATGACGATGAAGATGTTGTTAC 3240
FIG. 2 CONT.
3241 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATGACGATGAAGATGTTGTTAC 3300
3301 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATGACGATGAAGATGTTGTTAC 3360
3361 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATGACGATGAAGATGTTGTTAC 3420
3421 TGGTGACAATGACGATGAAGATGTTGTTACTGGTGACAATAACGATGAAGAGATTGTTAC 3480
3481 TGGTGACAATGATGACCAAATTGTTGTTACTGGTGATGATGTAGATGATATTGAAAGTAT 3540
3541 TTATGACTTTGATACTTATAAAGCTCTTTTAGTTTTTAATGATGTCTATAATGATGCTTT 3600
3601 GTTTGTTAGTTATGGTTCTAGTGTTGAAACAGAAACATATTTTAAAGTTAATGGTTTATG 3660
3661 GTCACCTACTATTACACATACTAATTGTTGGTTGCGTTCTGTGTTACTTGTAATGCAGAA 3720
3721 ATTACCTTTTAAGTTTAAGGATTTAGCTATTGAAAATATGTGGTTATCTTATAAGGTGGG 3780
FIG. 2 CONT.
3781 TTATAATCAAAGTTTTGTTGATTATTTACTGACCACTATTCCTAAAGCTATTGTTTTGCC 3840
3841 TCAAGGTGGTTTTGTAGCTGATTTTGCTTATTGGTTTTTAAACCAGTTTGATATTAATGC 3900
3901 GTATGCTAATTGGTGTTGTTTAAAATGTGGTTTTTCTTTTGATTTAAATGGTTTGGATGC 3960
3961 TTTGTTTTTTTATGGAGATATTGTGTCTCATGTTTGTAAGTGTGGACATAATATGACTCT 4020
4021 AATAGCAGCGGACTTACCTTGTACATTACATTTTTCATTATTTGATGACAATTTTTGTGC 4080
4081 TTTTTGCACCCCTAAAAAAATTTTTATTGCTGCATGTGCTGTGGATGTAAACGTTTGTCA 4140
4141 TTCTGTAGCTGTTATAGGTGATGAACAAATAGATGGTAAGTTTGTTACTAAATTTAGTGG 4200
4201 TGATAAATTTGATTTTATAGTAGGTTATGGAATGTCATTTAGTATGTCTTCTTTTGAGTT 4260
4261 ACCTCAATTGTATGGTTTGTGTATAACACCTAATGTATGTTTTGTTAAAGGTGATATTAT 4320
4321 AAATGTTGCTAGACTTGTTAAAGCTGATGTTATTGTTAATCCTGCTAATGGGCATATGCT 4380
4381 CCATGGTGGTGGAGTTGCAAAAGCTATAGCTGTAGCTGCAGGTAAAAAATTTTCTAAAGA 4440
4441 AACTGCTGCTATGGTTAAATCTAAAGGTGTTTGCCAAGTAGGAGATTGTTATGTTTCTAC 4500
4501 CGGTGGTAAATTATGTAAAACAATTCTTAATATTGTAGGCCCTGATGCTAGACAAGATGG 4560
4561 AAGACAATCTTATGTTTTGTTAGCACGTGCTTATAAGCATCTTAATAATTATGATTGTTG 4620
4621 TTTGTCTACTCTCATATCGGCTGGTATATTTAGTGTTCCTGCTGATGTGTCATTAACTTA 4680
4681 CCTTCTAGGTGTTGTTGATAAACAAGTTATCCTTGTTAGTAATAATAAAGAAGATTTTGA 4740
4741 TATTATTCAAAAATGTCAAATTACTTCAGTTGTTGGTACTAAAGCATTGGCTGTTAGATT 4800
4801 AACTGCTAATGTAGGCCGTGTTATTAAATTTGAGACAGATGCATACAAACTTTTTTTGAG 4860
FIG. 2 CONT.
4861 TGGTGATGATTGTTTTGTTTCAAATTCTTCTGTTATACAAGAAGTTTTATTGCTTCGTCA 4920
4921 TGATATACAATTGAATAATGACGTTCGTGATTATTTGTTGTCTAAGATGACTAGTCTTCC 4980
4981 TAAAGATTGGCGTCTTATCAATAAATTTGATGTTATTAACGGTGTTAAAACTGTTAAGTA 5040
5041 TTTTGAGTGTCCTAATTCTATTTATATATGTAGTCAGGGTAAAGACTTTGGTTATGTATG 5100
5101 TGATGGTTCTTTTTATAAAGCAACTGTTAATCAAGTTTGTGTTTTATTAGCTAAGAAGAT 5160
5161 AGATGTTTTGCTTACTGTAGATGGTGTTAATTTTAAATCTATTTCTCTTACTGTAGGTGA 5220
5221 AGTTTTTGGTAAAATACTTGGTAATGTTTTCTGTGATGGCATTGATGTTACTAAGTTAAA 5280
5281 GTGTAGTGATTTTTATGCCGATAAAATTTTATATCAGTATGAAAATTTGTCTTTAGCTGA 5340
5341 TATTTCTGCTGTACAAAGTTCATTTGGGTTTGATCAGCAACAATTGCTTGCTTATTATAA 5400
FIG. 2 CONT.
5401 TTTTTTAACAGTATGTAAATGGTCTGTAGTTGTTAACGGTCCATTTTTTTCTTTTGAACA 5460
5461 GTCTCATAATAATTGTTATGTGAATGTAGCTTGTCTTATGTTGCAGCATATTAATCTTAA 5520
5521 ATTTAATAAATGGCAGTGGCAGGAAGCATGGTATGAATTTCGTGCTGGCAGACCACATAG 5580
5581 GTTAGTTGCTCTTGTTTTAGCTAAAGGTCATTTTAAATTTGATGAACCATCAGATGCTAC 5640
5641 TGATTTTATTCGTGTTGTTTTGAAACAAGCTGATTTATCAGGTGCAATTTGTGAATTAGA 5700
5701 ACTTATTTGTGATTGTGGTATTAAACAAGAAAGTCGTGTTGGTGTTGATGCTGTTATGCA 5760
5761 TTTTGGTACATTAGCAAAGACTGATCTTTTTAATGGTTATAAGATTGGCTGTAATTGTGC 5820
5821 AGGTAGAATTGTCCATTGTACTAAATTGAATGTACCATTTTTGATTTGTTCTAATACTCC 5880
5881 TCTGAGTAAGGATTTACCTGATGATGTTGTTGCAGCTAACATGTTTATGGGTGTAGGTGT 5940
FIG. 2 CONT.
5941 AGGCCATTATACACATTTGAAATGTGGTTCACCTTACCAACATTATGATGCTTGTAGTGT 6000
6001 TAAAAAATATACAGGTGTTAGTGGTTGTTTAACTGACTGCTTGTATCTTAAAAATTTAAC 6060
6061 CCAGACTTTTACATCTATGTTGACTAATTATTTTTTGGATGATGTTGAAATGGTTGCTTA 6120
6121 TAACCCTGATCTTTCACAATATTATTGTGATAATGGTAAGTATTATACAAAACCTATTAT 6180
6181 AAAGGCTCAGTTTAAACCATTTGCTAAAGTTGACGGTGTTTATACTAACTTTAAGTTAGT 6240
6241 TGGACATGATATTTGTGCTCAATTGAATGATAAGTTAGGTTTTAATGTAGATTTGCCGTT 6300
6301 TGTTGAGTACAAAGTAACAGTCTGGCCTGTAGCTACTGGTGATGTTGTTTTGGCATCTGA 6360
6361 TGATTTATATGTGAAACGTTATTTTAAAGGATGTGAAACTTTTGGTAAGCCTGTTATTTG 6420
6421 GTTTTGTCATGATGAAGCATCATTGAATTCTCTTACTTATTTTAATAAACCTAGTTTTAA 6480
6481 ATCTGAAAATAGATATAGTGTTTTGTCTGTTGATTCTGTATCTGAGGAGTCACAAGGTAA 6540
6541 TGTGGTTACTTCTGTTATCGAATCGCAGATTAGTACTAAAGAGGTTAAGTTAAAGGGTGT 6600
6601 TAGAAAGACTGTTAAAATAGAAGATGCTATTATTGTTAATGATGAAAATAGTTCTATTAA 6660
6661 GGTTGTTAAAAGTTTATCTTTAGTTGATGTTTGGGATATGTATTTGACAGGTTGTGATTA 6720
6721 TGTTGTTTGGGTTGCTAATGAATTGTCACGCCTAGTTAAATCACCAACAGTTAGGGAATA 6780
6781 TATACGATATGGTATTAAACCTATTACTATACCTATAGATTTGTTATGTTTAAGAGATGA 6840
6841 TAATCAAACTCTTTTAGTTCCTAAAATTTTTAAAGCAAGAGCTATAGAATTTTATCCTTT 6900
6901 TTTGAAGTGGTTGTTTATTTATGTTTTTAGTTTATTACATTTTACAAATGATAAAACCAT 6960
6961 TTTTTATACTACAGAAATAGCTTCTAAGTTTACTTTTAATTTGTTTTGTTTGGCTCTTAA 7020
7021 AAATGCTTTTCAGACATTTAGATGGAGTATATTTATAAAAGGTTTTCTTGTTGTAGCCAC 7080
7081 TGTGTTTTTGTTTTGGTTTAATTTTTTGTATATAAATGTTATTTTTAGTGACTTTTATCT 7140
7141 TCCTAATATTAGTGTTTTTCCTATTTTTGTGGGAAGAATTGTTATGTGGATAAAGGCTAC 7200
7201 TTTTGGTTTGGTTACAATTTGTGATTTTTATTCTAAGTTAGGTGTAGGTTTTACAAGTCA 7260
7261 TTTTTGTAATGGTAGTTTTATATGTGAATTGTGTCATTCTGGTTTTGATATGTTGGATAC 7320
7321 ATATGCAGCTATAGATTTTGTTCAGTATGAAGTAGATAGACGTGTTTTATTTGATTATGT 7380
7381 TAGTTTAGTCAAATTAATTGTTGAACTCGTTATTGGTTATTCATTATACACAGTATGGTT 7440
7441 TTATCCATTATTTTGTCTTATTGGTTTACAATTATTTACTACATGGTTGCCTGATTTGTT 7500
7501 TATGTTAGAAACTATGCATTGGTTGATTAGATTTATTGTATTTGTAGCTAATATGTTACC 7560
7561 TGCTTTTGTCTTCTTGCGGTTTTATATAGTTGTTACTGCTATGTATAAAGTAGTTGGTTT 7620
7621 TATTAGGCATATTGTCTATGGTTGTAATAAAGCTGGTTGTTTATTTTGTTATAAACGAAA 7680
7681 TTGTAGTGTTCGTGTTAAGTGTAGTACTATTGTTGGTGGTGTAATTCGTTATTATGATAT 7740
7741 TACTGCTAATGGTGGTACTGGTTTTTGTGTTAAACATCAATGGAATTGTTTTAATTGCCA 7800
7801 TTCTTTTAAACCAGGTAACACTTTTATAACTGTAGAAGCTGCTATAGAACTTTCTAAAGA 7860
7861 GCTTAAACCACCTGTAAATCCAACTGATGCTTCACATTATGTACTTACTGATATTAAGCA 7920
7921 AGTTGGTTGTATGATGCGTTTGTTCTATGATAGAGATGGACAGCGTGTTTACGATGATGT 7980
7981 TGATGCTACTTTATTTGTAGATATTAATAATCTGTTACATTCTAAAGTTAAAGTTGTTCC 8040
8041 TAATTTGTATGTAGTTGTAGTAGAGAGTGATGCTGATAGAGCTAATTTTCTGAATGCTGT 8100
FIG. 2 CONT.
8101 TGTGTTTTATGCACAATCATTGTATAGGCCTATATTACTTGTAGACAAAAAGTTAATTAC 8160
8161 TACAGCTTGTAATGGTATCTCTGTAACCCAGACTATGTTTGATGTTTATGTTGATACTTT 8220
8221 TATGTCTCATTTTGATGTTGATAGAAAGAGTTTTAATAATTTTGTTAACATTGCTCATGC 8280
8281 TTCTCTTAGAGAGGGTGTGCAATTAGAAAAGGTTTTAGATACTTTTGTGGGATGTGTACG 8340
8341 TAAATGTTGTTCCATTGATTCAGATGTTGAAACAAGATTTATTACTAAATCTATGATATC 8400
8401 TGCAGTAGCTGCTGGTTTGGAATTTACTGATGAAAATTATAACAATTTGGTACCTACATA 8460
8461 TTTAAAGAGTGATAATATTGTAGCTGCTGATTTAGGTGTTCTTATACAGAATGGTGCTAA 8520
8521 GCATGTACAGGGTAATGTTGCTAAGGCAGCTAATATTTCTTGTATATGGTTTATTGATGC 8580
8581 TTTTAATCAACTTACTGCTGATTTACAGCATAAATTAAAAAAAGCATGTGTTAAAACTGG 8640
8641 CTTGAAGTTAAAATTGACTTTTAATAAGCAAGAGGCAAGTGTCCCTATTCTTACAACACC 8700
8701 CTTTTCACTTAAAGGAGGTGTTGTATTGAGTAATTTGTTATATATATTATTTTTTGTTAG 8760
8761 TTTAATCTGTTTTATATTATTGTGGGCTTTATTGCCTACATATAGTGTTTATAAGTCTGA 8820
8821 TATTCATTTGCCTGCTTATGCTAGTTTTAAAGTTATTGATAATGGTGTTGTTAGAGATAT 8880
8881 TTCAGTTAATGATTTATGTTTTGCTAATAAATTTTTCCAATTTGATCAATGGTATGACTC 8940
8941 CACTTTTGGGTCTGTTTACTATCATAATTCTATGGATTGCCCTATTGTAGTGGCAGTTAT 9000
9001 CCATGAAGATATCGGTTCTACTATGTTTAATGTTCCTACTAAAGTTTTGAGACATGGCTT 9060
9061 TCATGTTTTACATTTTTTAACTTATGCATTTGCTAGTGATAGTGTTCAGTGCTATACACC 9120
9121 ACATATTCAGATTTCTTATAATGATTTTTATGCTAGTGGTTGTGTTTTATCATCTTTGTG 9180
9181 TACTATGTTTAAAAGAGGTGATGGTACACCACATCCTTATTGTTATTCAGATGGTGTTAT 9240
9241 GAAGAATGCTTCTTTGTATACATCTTTGGTTCCACATACACGTTATAGCCTTGCTAATTC 9300
9301 TAATGGTTTTATAAGATTTCCTGATGTTATTAGTGAAGGTATTGTACGTATTGTAAGAAC 9360
9361 GCGCTCTATCACTTATTGTAGAGTGGGTGCATGTGAATACGCCGAAGAGGGTATATGTTT 9420
9421 TAATTTTAATAGTTCCTGGGTTTTGAATAATGATTATTATAGAAGTATGCCTGGAACTTT 9480
9481 TTGTGGTACAGATCTTTTTGATTTGTTTTATCAATTTTTTAGTAGTTTAATTCGTCCTAT 9540
9541 AGATTTCTTTTCTCTTACTGCTAGTTCTATTTTTGGAGCTATATTGGCTATAGTTGTTGT 9600
9601 CTTGGTTTTTTATTATTTAATAAAACTTAAGCGTGCTTTTGGAGATTATACTAGTGTTGT 9660
9661 AGTTATAAATGTTGTTGTTTGGTGTATTAATTTTCTTATGCTTTTTGTTTTTCAAGTTTA 9720
9721 TCCTATTTGTGCATGTGTTTATGCTTGTTTTTATTTTTATGTAACATTGTATTTTCCTTC 9780
9781 TGAAATTAGTGTAATTATGCATTTGCAATGGATTGTTATGTATGGTGCTATAATGCCTTT 9840
9841 TTGGTTTTGTGTCACATATGTAGCTATGGTTATTGCAAACCATGTTTTATGGTTATTTTC 9900
9901 ATATTGTAGGAAAATTGGTGTTAATGTATGTAGTGATAGTACATTTGAAGAAACATCTCT 9960
9961 TACTACTTTTATGATTACTAAAGATTCTTATTGTAGATTAAAGAATTCTGTTTCTCATGT 10020
10021 TGCCTACAATAGATATTTGAGTTTGTATAATAAGTATCGTTACTATAGTGGTAAAATGGA 10080
10081 TACTGCTGCCTATAGAGAAGCGGCGTGTTCTCAGTTAGCTAAAGCTATGGAAACATTTAA 10140
10141 TCACAATAATGGTAATGATGTCTTATACCAACCTCCTACAGCATCTGTTTCTACATCTTT 10200
10201 TTTGCAATCAGGTATTGTAAAGATGGTATCTCCTACGTCAAAAATTGAACCTTGTATTGT 10260
10261 TAGTGTTACTTATGGTAGTATGACTTTGAATGGTTTATGGTTAGATGACAAAGTTTATTG 10320
10321 TCCTCGTCATGTTATATGTTCATCCTCTAATATGAACGAACCTGATTATTCTGCCTTATT 10380
10381 GTGTAGAGTTACTCTAGGTGATTTTACTATAATGTCTGGTCGGATGAGTTTAACAGTTGT 10440
10441 GTCTTACCAGATGCAGGGCTGTCAACTTGTTTTGACAGTCTCTTTACAAAATCCTTACAC 10500
10501 TCCAAAATATACTTTTGGTAATGTTAAACCTGGTGAAACTTTTACTGTTTTAGCTGCGTA 10560
10561 TAATGGCCGACCACAAGGGGCATTTCATGTTACTATGCGTACTAGTTATACTATTAAAGG 10620
10621 TTCTTTTTTGTGTGGGTCATGTGGATCTGTTGGTTATGTATTAACAGGTGATAGTGTTAA 10680
10681 GTTTGTATATATGCATCAATTAGAGCTCAGTACTGGTTGTCACACTGGCACTGATTTTAC 10740
10741 TGGTAATTTTTATGGTCCATATAGAGATGCTCAAGTTGTACAGTTGCCAGTTAAGGACTA 10800
10801 CGTCCAGACTGTTAATGTTATTGCTTGGCTCTATGCAGCTATACTTAATAATTGTGCTTG 10860
10861 GTTTGTACAAAATGATGTTTGTTCTACTGAAGATTTTAATGTTTGGGCTATGGCAAATGG 10920
10921 TTTTAGCCAAGTAAAAGCAGATCTTGTCTTAGATGCTTTGGCTTCAATGACAGGTGTTTC 10980
10981 TATTGAAACTTTATTGGCTGCTATTAAGCGTCTATATATGGGATTTCAAGGTCGTCAAAT 11040
11041 ACTAGGAAGTTGTACTTTTGAAGATGAATTGGCACCTTCTCACGTTTATCAACAATTGGC 11100
11101 TGGTGTTAAATTGCAATCTAAAACAAAAAGATTTATTAAAGAAACAATTTATTGGATTTT 11160
11161 GATATCTACATTTTTGTTTAGTTGTATAATTTCTGCATTTGTTAAATGGACTATATTTAT 11220
11221 GTATATTAATACACATATGATTGGTGTTACATTATGTGTACTTTGTTTTGTTAGTTTTAT 11280
11281 CATGTTACTAGTTAAACATAAGCATTTTTATTTGACTATGTATATAATTCCTGTACTCTG 11340
FIG. 2 CONT.
11341 TACCTTGTTTTATGTAAATTATTTAGTTGTTTATAAGGAAGGTTTTAGAGGTTTTACTTA 11400
11401 TGTCTGGCTCTCATATTTTGTTCCTGCTGTGAATTTTACTTATGTTTATGAAGTATTTTA 11460
11461 TGGTTGTATTTTATGTGTTTTTGCTATTTTTATAACTATGCATAGTATTAATCATGACAT 11520
11521 TTTTTCTTTGATGTTTTTGGTTGGTAGAATAGTTACTTTAATTTCTATGTGGTATTTTGG 11580
11581 GTCGAATTTAGAAGAGGATGTTTTGTTATTTATTACAGCCTTTTTAGGTACTTATACATG 11640
11641 GACCACTATTTTGTCATTAGCTATAGCAAAAATTGTTGCTAATTGGTTGTCTGTTAATAT 11700
11701 ATTTTATTTTACAGATGTACCTTATATTAAATTGATTCTCTTGAGTTACTTATTTATAGG 11760
11761 GTATATTTTATCTTGTTATTGGGGATTTTTCTCTCTTTTAAACAGTGTTTTTAGAATGCC 11820
11821 TATGGGTGTTTATAATTATAAAATTTCTGTTCAAGAATTGCGTTATATGAATGCTAATGG 11880
11881 CTTACGTCCACCTCGTAATAGTTTTGAGGCTATTTTGTTAAATTTAAAACTGCTTGGAAT 11940
11941 AGGTGGCGTGCCAGTTATTGAAGTCTCCCAAATTCAATCAAAATTGACTGATGTGAAATG 12000
12001 TGCTAATGTTGTTTTGTTAAATTGTTTACAGCATTTGCATGTTGCTTCTAATTCTAAGTT 12060
12061 GTGGCAGTATTGTAGTGTTTTACATAATGAAATACTATCTACTTCAGATTTGAGTGTAGC 12120
12121 TTTTGATAAGCTTGCTCAATTATTGATTGTTTTATTCGCCAATCCTGCTGCAGTTGATAC 12180
12181 TAAGTGTCTTGCAAGTATAGATGAAGTTAGCGATGATTATGTTCAAGATAGTACCGTTTT 12240
12241 GCAGGCTTTGCAAAGTGAGTTTGTAAATATGGCTAGTTTTGTTGAATATGAAGTCGCAAA 12300
12301 GAAAAATTTGGCTGATGCTAAAAATAGTGGTTCTGTTAATCAACAACAGATAAAACAGTT 12360
12361 AGAAAAAGCATGTAATATAGCTAAGTCTGTGTATGAACGTGATAAAGCTGTAGCTCGCAA 12420
12421 ACTTGAACGTATGGCAGACCTAGCACTTACTAACATGTATAAAGAGGCTCGGATTAATGA 12480
12481 TAAGAAGAGTAAAGTTGTTTCCGCTTTGCAGACAATGCTTTTTAGCATGGTTCGTAAATT 12540
12541 GGATAATCAGGCTTTAAATTCTATTCTGGATAATGCTGTTAAAGGTTGTGTACCTTTGAG 12600
12601 TGCTATTCCAGCATTGGCTGCTAATACTTTAACTATAGTAATACCAGATAAACAAGTTTT 12660
12661 TGATAAAGTTGTTGATAATGTTTATGTTACATATGCTGGTAGTGTATGGCATATACAGAC 12720
12721 TGTTCAAGATGCTGATGGTATTAATAAACAGTTAACTGATATTAGTGTTGATTCTAATTG 12780
12781 GCCTCTTGTTATCATTGCGAACAGGTATAATGAAGTTGCTAATGCTGTTATGCAGAATAA 12840
12841 TGAGTTGATGCCTCATAAATTAAAAATACAAGTTGTTAATAGTGGTTCTGATATGAATTG 12900
12901 TAATATTCCTACTCAATGTTATTATAATAATGGTAGTAGTGGTAGAATAGTTTATGCTGT 12960
12961 TCTTAGTGATGTTGATGGTCTTAAGTATACTAAGATAATGAAAGATGATGGAAATTGTGT 13020
13021 TGTTTTAGAGCTTGATCCTCCTTGTAAATTTTCTATACAAGATGTTAAGGGACTTAAAAT 13080
13081 TAAGTATCTTTATTTTATTAAAGGATGTAACACTTTAGCTAGAGGGTGGGTTGTTGGTAC 13140
13141 TTTATCTTCAACAATTAGATTGCAGGCTGGTGTTGCTACTGAGTATGCAGCTAATTCTTC 13200
13201 TATACTTTCATTATGTGCATTTTCTGTAGATCCTAAGAAAACTTATTTAGATTATATACA 13260
13261 ACAAGGTGGTGTACCTATAATTAATTGTGTTAAAATGCTCTGTGATCATGCTGGTACTGG 13320
13321 TATGGCCATTACTATTAAACCTGAGGCTACTATTAACCAAGATTCTTATGGTGGTGCCTC 13380
13381 AGTTTGTATTTATTGCCGTGCACGTGTAGAGCATCCAGATGTAGATGGTATATGTAAATT 13440
13441 ACGTGGTAAATTTGTACAAGTCCCTTTGGGTATAAAAGATCCTATTCTTTATGTGTTAAC 13500
13501 ACATGATGTTTGTCAAGTCTGTGGTTTTTGGAGAGATGGCAGTTGTTCCTGTCTAGGTTC 13560
13561 AAGTGTCGCTGTTCAATCTAAAGATTTAAATTTTTTAAACGGGTTCGGGGTACTAGTGTG 13620
13621 AATGCCCGGCTAGTACCCTGTGCTAGTGGTTTATCTACTGATGTTCAATTAAGGGCATTT 13680
13681 GACATTTGTAATACCAATAGAGCTGGTATAGGTTTATATTATAAAGTGAATTGTTGCCGT 13740
13741 TTTCAGCGTATAGATGACGACGGTAATAAATTGGATAAGTTCTTTGTTGTCAAAAGAACT 13800
13801 AATTTAGAAGTTTATAATAAAGAGAAAACTTATTATGAGTTGACTAAAAGTTGTGGTGTT 13860
13861 GTGGCTGAACATGATTTCTTTACATTTGATATTGATGGTAGTCGCGTGCCACATATAGTT 13920
13921 CGTAGGAATCTTTCAAAGTATACTATGTTAGATCTTTGCTATGCATTGCGTCATTTTGAT 13980
13981 CGTAATGATTGTTCAATATTGTGTGAAATTCTTTGTGAGTATGCTGATTGTAAAGAATCC 14040
14041 TACTTTTCTAAGAAAGATTGGTATGATTTTGTTGAAAATCCTGATATTATTAATATATAT 14100
14101 AAAAAATTAGGCCCTATTTTTAATAGAGCTTTACTTAATACTGTCATTTTTGCAGACACC 14160
14161 TTAGTTGAAGTAGGTTTAGTTGGTGTTTTAACTTTAGATAACCAAGATTTGTATGGTCAA 14220
14221 TGGTATGATTTTGGTGATTTTATACAAACAGCCCCAGGGTTTGGTGTGGCAGTTGCAGAT 14280
14281 TCTTACTATTCTTATATGATGCCTATGTTGACTATGTGTCATGTATTACATTGTGAATTA 14340
14341 TTTGTTAATGATAGTTATAGACAATTCGATCTTGTACAGTATGATTTTACTGATTACAAG 14400
14401 TTAGAGTTGTTTAATAAGTATTTTAAGTATTGGGGTATGAAGTATCATCCTAATACTGTG 14460
14461 GATTGTGATAATGATAGGTGTATTATTCATTGTGCTAATTTTAATATACTATTTAGTATG 14520
14521 GTTTTACCTAATACTTGTTTTGGTCCCCTTGTTAGACAAATTTTTGTAGATGGTGTACCG 14580
14581 TTTGTTGTTTCTATTGGTTACCATTACAAAGAGTTAGGTGTAGTTATGAACTTAGATGTT 14640
14641 GACACACACCGTTATCGTTTGTCTCTTAAAGATTTACTTCTTTATGCAGCAGATCCTGCT 14700
14701 ATGCACGTTGCATCTGCTAGTGCTCTGCTTGATTTACGAACTTGTTGTTTTAGTGTAGCT 14760
14761 GCCATTACAAGTGGTATAAAATTTCAAACTGTAAAACCAGGTAACTTTAACCAAGACTTT 14820
14821 TACGAGTTTGTTAAAAGTAAAGGCTTGTTTAAAGAGGGTAGTACAGTTGATTTGAAACAT 14880
14881 TTTTTCTTTACTCAAGATGGTAATGCTGCAATTACTGATTATAATTATTATAAGTATAAT 14940
14941 TTACCTACTATGGTTGATATTAAGCAGTTATTGTTTGTATTAGAAGTTGTTTATAAATAT 15000
15001 TTTGAAATTTATGATGGTGGTTGTATACCAGCATCACAAGTTATTGTTAATAATTATGAT 15060
15061 AAAAGTGCTGGTTATCCATTTAATAAATTTGGTAAAGCCAGACTTTATTATGAGGCATTA 15120
15121 TCATTTGAGGAACAGAATGAAATTTATGCATATACTAAACGTAATGTTCTGCCCACCTTA 15180
15181 ACTCAAATGAATTTAAAATATGCTATCAGTGCTAAGAATAGAGCTCGCACTGTAGCAGGT 15240
15241 GTTTCTATTCTTAGTACTATGACAGGCCGAATGTTCCATCAAAAATGTTTGAAGAGTATA 15300
15301 GCAGCTACCCGAGGTGTTCCTGTTGTTATAGGAACCACTAAATTTTATGGTGGTTGGGAC 15360
15361 GATATGTTACGTCATCTTATAAAGGATGTTGACAACCCTGTTCTTATGGGTTGGGATTAT 15420
15421 CCTAAATGTGATCGTGCTATGCCAAATATTTTGCGTATTGTTAGTAGTTTAGTTTTGGCC 15480
15481 CGCAAACATGAATTTTGTTGTTCACATGGTGATAGATTTTATCGCCTTGCGAATGAATGT 15540
15541 GCTCAAGTTTTGAGTGAAATAGTTATGTGTGGCGGTTGCTATTATGTTAAGCCTGGTGGT 15600
15601 ACTAGCAGTGGTGATGCAACTACTGCTTTTGCTAATTCTGTTTTTAATATATGTCAGGCT 15660
15661 GTTACTGCTAATGTTTGTTCTCTTATGGCCTGTAATGGCCATAAGATTGAAGATTTAAGT 15720
15721 ATACGCAATTTACAAAAACGCTTATACTCTAATGTTTATCGTACAGATTATGTTGATTAT 15780
15781 ACATTTGTTAATGAGTATTATGAATTTTTATGTAAGCATTTTAGTATGATGATTTTGAGT 15840
15841 GATGATGGTGTTGTCTGTTATAACTCTGATTATGCTAGTAAGGGTTATATAGCTAATATA 15900
15901 AGTGTTTTTCAACAAGTTTTGTACTATCAGAATAATGTCTTTATGTCTGAATCTAAATGT 15960
15961 TGGGTTGAAAATGATATTACTAATGGTCCTCATGAATTTTGTTCCCAACATACTATGTTA 16020
16021 GTTAAGATAGATGGTGATTATGTTTATTTACCATATCCAGATCCTTCTAGAATTTTAGGA 16080
16081 GCTGGTTGTTTTGTTGATGATTTATTGAAGACTGACAGTGTTCTTTTGATAGAGCGCTTT 16140
16141 GTAAGTCTAGCTATAGATGCTTACCCTTTAGTACATCATGAAAATGAAGAATACCAAAAA 16200
16201 GTCTTTCGTGTATATTTAGAATATATAAAAAAACTGTATAATGATCTTGGTACTCAGATC 16260
16261 TTAGATAGTTATAGTGTTATTTTAAGTACTTGTGATGGTTTAAAGTTTACTGAAGAATCA 16320
16321 TTTTACAAGAATATGTATTTAAAAAGTGCCGTGATCCAGAGTGTAGGTGCATGCGTTGTT 16380
16381 TGTTCATCACAAACTTCTTTGCGTTGTGGCAGTTGTATACGTAAGCCTTTGTTATGTTGT 16440
16441 AAATGTTGTTATGACCATGTTATCGCAACTAATCATAAATATGTTTTGAGTCTCTCACCT 16500
16501 TACGTTTGTAATGCACCTAACTGTGATGTGAGTGATGTCACCAAATTATATTTGGGCGGT 16560
16561 ATGTCTTACTATTGTGAAAACCATAAACCCCATTATTCATTTAAGTTAGTTATGAATGGT 16620
16621 ATGCTCTTTGGTTTGTATAAACAATCTTCCACGGGTTCACCTTATATAGATGATTTTAAT 16680
16681 AAGATAGCTAGTTGTAAATGGACAGAAGTTGATGATTATGTTCTGGCAAATGAGTGTATT 16740
16741 GAACGTTTAAAGTTATTTGCTGCAGAAACTCAAAAGGCAACTGAAGAGGCTTTTAAACAA 16800
16801 AGCTATGCTTCTGCTACCATTCAAGAGATTGTTAGTGATAGAGAAGTTATTTTGTGTTGG 16860
16861 GAGACAGGTAAAGTTAAACCACCACTTAATAAAAATTATGTTTTCACAGGCTACCATTTT 16920
16921 ACTAGTACTGGTAAGACAGTTTTAGGTGAGTATGTTTTTGATAAAAGTGAATTAACTAAC 16980
16981 GGTGTGTATTACCGCGCTACAACTACTTATAAACTTTCTATAGGTGATGTTTTTGTTTTA 17040
17041 ACATCACATTCTGTACCTAGTTTAAGTGCACCTACACTTGTCCCACAAGAGAACTATGCT 17100
17101 AGTATAAGATTTTCTAGTGTTTATAGTGTTCCATTGGTGTTTCAAAATAATGTTGCTAAT 17160
17161 TATCAGCACATTGGAATGAAACGTTATTGCACTGTTCAAGGTCCCCCTGGTACGGGAAAG 17220
17221 TCTCATCTTGCTATAGGTCTAGCTGTTTATTACTACACAGCACGTGTAGTTTATACTGCT 17280
17281 GCTAGTCATGCTGCTGTAGATGCATTGTGTGAAAAAGCTTATAAGTTTTTAAATATTAAC 17340
17341 GATTGTACACGTATTATTCCTGCTAAAGTTCGTGTAGATTGTTATGATAAGTTTAAAATT 17400
17401 AATGATACCACTTGTAAGTATGTTTTTACCACAATAAATGCATTACCAGAGTTGGTTACA 17460
17461 GATATTGTTGTTGTTGATGAAGTTAGTATGCTTACTAATTATGAATTGTCTGTTATAAAT 17520
17521 GCTCGTATTAAAGCTAAACATTATGTATATATTGGAGATCCTGCTCAATTACCTGCACCA 17580
17581 CGTGTGCTGTTGAGCAAGGGTTCTTTAGAACCTAGGCACTTCAATTCTATTACTAAAATA 17640
17641 ATGTGTTGTTTAGGTCCTGATATCTTTTTGGGAAATTGTTATAGGTGTCCTAAAGAAATT 17700
17701 GTAGAAACTGTTTCAGCATTGGTTTATGATAATAAACTCAAGGCTAAAAATGATAATAGT 17760
17761 TCATTATGTTTTAAAGTATATTTTAAGGGACAGACAACACATGAGAGTTCAAGTGCTGTA 17820
17821 AATATTCAACAGATATATCTAATTAGTAAATTTTTAAAAGCTAATCCAGTTTGGAATAGT 17880
17881 GCTGTTTTTATTAGTCCTTATAATAGTCAGAATTATGTTGCTAAGCGTGTTTTAGGTGTT 17940
17941 CAAACACAAACTGTAGATTCTGCTCAAGGTTCGGAATATGATTATGTTATATATTCACAA 18000
18001 ACAGCAGAAACAGCCCATTCTGTTAATGTTAATCGATTTAATGTTGCCATAACTAGAGCC 18060
18061 AAGAAGGGCATTTTTTGTGTTATGAGTAATATGCAATTATTTGAATCTCTTAATTTTATT 18120
18121 ACTCTACCTTTAGATAAAATTCAAAATCAAACTTTACCTCGTTTGCATTGCACAACTAAT 18180
18181 CTTTTTAAAGATTGTAGTAAAAGTTGCTTAGGTTATCATCCAGCGCATGCCCCCTCATTT 18240
18241 TTAGCAGTTGATGATAAATATAAGGTTAATGAAAATTTGGCTGTAAATTTAAATATTTGT 18300
18301 GAACCTGTTTTAACATATTCTCGTTTAATATCTCTTATGGGTTTTAAATTAGATTTGACT 18360
18361 CTTGATGGTTATTCTAAATTGTTTATTACTAAAGATGAAGCCATTAAACGTGTTAGAGGT 18420
18421 TGCCTTGGTTTTGATGTTGAGGGCGCTCATGCTACTCGCGAAAACATTGGAACAAACTTT 18480
18481 CCACTGCAAATAGGTTTTTCAACTGGTGTGGATTTTGTAGTTGAAGCTACTGGCTTATTT 18540
18541 GCTGAGAGAGATTGTTATACTTTTAAAAAAACTGTAGCTAAAGCTCCTCCTGGTGAAAAA 18600
18601 TTTAAACATTTAATACCCCTTATGTCAAAAGGTCAAAAGTGGGATATTGTTAGAATTAGA 18660
18661 ATTGTTCAAATGTTATCTGATTATCTTTTAGACCTTTCTGATAGTGTAGTATTTATTACT 18720
18721 TGGTCTGCCAGTTTTGAACTTACTTGTTTAAGGTATTTTGCTAAATTAGGCAGAGAGCTT 18780
18781 AATTGTAATGTGTGTTCTAATCGTGCTACATGCTACAATTCTAGAACTGGTTATTATGGT 18840
18841 TGTTGGCGCCATAGTTATACTTGTGATTATGTGTATAATCCACTTATTGTAGATATACAA 18900
18901 CAGTGGGGTTATACAGGTTCTTTAACTAGTAATCACGATATAATTTGTAATGTACATAAA 18960
18961 GGTGCACATGTTGCGTCAGCTGATGCAATTATGACTCGTTGTTTAGCAATCTATGATTGT 19020
19021 TTTTGTAAATCTGTTAATTGGAATTTAGAGTATCCAATAATTTCTAATGAGGTCAGTATA 19080
19081 AATACATCTTGTAGGTTATTGCAGCGTGTCATGCTTAAAGCTGCCATGCTATGTAATAGA 19140
19141 TACAACTTATGTTATGACATAGGCAATCCTAAAGGTTTAGCTTGTGTCAAAGATTATGAA 19200
19201 TTTAAATTTTATGATGCTTTTCCTGTAGCCAAGTCTGTTAAACAGTTATTTTATGTCTAT 19260
19261 GATGTGCATAAAGATAATTTTAAAGATGGTTTATGTATGTTTTGGAATTGTAATGTTGAT 19320
19321 AAATATCCATCTAATTCAATTGTTTGTAGATTTGACACTCGAGTGTTAAATAAATTAAAC 19380
19381 CTTCCTGGATGTAATGGTGGTAGTTTGTATGTTAATAAACATGCATTCCATACTAATCCT 19440
19441 TTTACTAGAACTGTTTTTGAAAATCTTAAGCCTATGCCTTTTTTCTATTATTCAGATACG 19500
19501 CCTTGTGTGTACGTAGATGGTTTAGAATCTAAACAAGTTGATTACGTTCCTTTAAGAAGC 19560
19561 GCCACTTGTATCACACGGTGTAATCTAGGTGGAGCTGTTTGTTCAAAGCATGCTGAAGAA 19620
19621 TATTGTAACTACCTTGAGTCTTATAATATAGTTACTACAGCAGGCTTTACTTTTTGGGTT 19680
19681 TATAAGAATTTTGATTTTTATAATTTATGGAACACTTTTACTACGTTACAGAGTTTAGAA 19740
19741 AACGTAATATATAACTTGGTTAATGTTGGTCATTATGATGGACGTACAGGTGAATTACCT 19800
19801 TGTGCTATTATGAATGACAAACTTCTTGTTAAGATTAATAATGTAGATACTGTTATTTTT 19860
19861 AAAAATAATACATCATTTCCTACTAATATAGCTGTTGAATTCTTTACAAAACGTAGTATC 19920
19921 CGGCACCACCCTGAACTTAAGATTCTTAGAAATTTGAACATTGATATTTGTTGGAAGCAT 19980
19981 GTCCTGTGGGATTATGTTAAAGATAGTTTGTTTTGTAGTTCCACTTATGGTGTTTGTAAA 20040
20041 TACACAGATTTGAAGTTCATCGAAAATTTGAATATACTTTTTGATGGTCGTGACACTGGC 20100
20101 GCTTTAGAAGCTTTTAGAAAAGCAAGAAATGGTGTTTTTATTAGTACTGAAAAATTAAGT 20160
20161 AGGTTATCAATGATTAAAGGTCCGCAACGAGCTGATTTAAATGGTGTGATTGTGGATAAA 20220
20221 GTTGGAGAACTCAAAGTTGAGTTTTGGTTCGCTATGAGAAAAGATGGTGACGATGTTATC 20280
20281 TTCAGCCGAACAGACAGCCTATGCTCAAGCCATTACTGGAGCCCACAAGGTAATCTAGGT 20340
20341 GGTAATTGCGCGGGTAATGTCATTGGTAATGATGCTCTAACACGTTTTACTATCTTTACT 20400
20401 CAGAGTCGTGTATTGTCAAGTTTTGAACCTCGCTCAGATTTAGAACGGGATTTTATTGAT 20460
20461 ATGGATGATAATCTGTTTATTGCTAAATATGGTTTAGAAGACTATGCATTTGATCATATA 20520
Sheet 40 of 119
20521 GTTTATGGTAGTTTTAACCATAAAGTTATAGGAGGTTTGCATTTGCTTATAGGCTTATTT 20580
20581 CGTAGGAAAAAAAAATCTAATTTGTTAATTCAAGAGTTTTTACAGTATGATTCTAGTATT 20640
20641 CATTCATATTTTATTACTGATCAGGAGTGTGGTAGTAGTAAGAGTGTTTGTACAGTTATT 20700
20701 GATTTATTATTAGATGATTTTGTTTCTATTGTTAAGTCATTAAATTTGAGTTGTGTTAGT 20760
20761 AAAGTTGTTAATATTAATGTTGATTTTAAGGATTTTCAATTTATGTTGTGGTGTAATGAT 20820
20821 AATAAAATTATGACTTTTTATCCTAAAATGCAAGCCACTAATGATTGGAAACCTGGCTAT 20880
20881 TCTATGCCTGTTTTGTATAAGTATTTGAATGTTCCATTAGAGAGAGTCTCTTTATGGAAT 20940
20941 TATGGTAAACCTATTAATTTGCCTACAGGCTGTATGATGAATGTTGCTAAGTACACTCAA 21000
21001 TTATGTCAGTATTTGAATACTACAACATTAGCTGTTCCTGTTAATATGCGTGTTTTACAT 21060
Sheet 41 of 119
21061 TTACGTGCAGGGTCTGATAAAGAAGTAGCTCCAGGTTCTGCTGTTTTAAGACAGTGGTTA 21120
21121 CCATCTGGTAGTATTCTTGTAGATAATGATTTAAACCCATTTGTTAGCGATAGTTTAGTT 21180
21181 ACTTATTTTGGAGATTGTATGACTTTACCATTTGATTGTCATTGGGATTTGATAATATCT 21240
21241 GATATGTATGATCCTCTTACTAAAAATATTGGTGATTATAATGTGAGTAAGGATGGGTTT 21300
21301 TTTACTTACATTTGTCATTTAATTCGTGATAAATTATCTTTGGGTGGTAGTGTAGCTATA 21360
21361 AAAATTACAGAGTTTTCTTGGAATGCTGATTTATATAAATTAATGAGTTGTTTTGCATTT 21420
21421 TGGACAGTTTTTTGTACTAATGTAAATGCTTCTTCTAGTGAAGGGTTTTTAATAGGTATA 21480
21481 AATTACCTGGGTAAATCTTCTTTTGAAATAGATGGCAATGTTATGCATGCTAACTATTTG 21540
21541 TTTTGGAGAAATAGTACAACATGGAATGGCGGTGCTTATAGTTTATTTGATATGACTAAA 21600
FIG. 2 CONT.
Sheet 42 of 119
21601 TTTTCTTTGAAATTGGCTGGCACTGCTGTTGTTAATTTAAGACCAGATCAATTAAATGAT 21660
21661 TTAGTTTATTCTCTTATTGAAAGAGGTAAATTATTAGTTCGCGATACGCGTAAAGAGATT 21720
21721 TTTGTTGGTGATAGTCTTGTAAATACTTGTTAGATCTCATTAAATCTAAACTATGTTAAT 21780
21781 TATTTTTTTATTTTTTTATTTCTGTTRTGGTTTTAATGAACCTCTTAATGTTGTGTCTCA 21840
21841 TTTAAACCATGACTGGTTTTTATTTGGTGATAGTCGTTCTGATTGTAACCATATTAATAA 21900
21901 TTTAAAAATTAAAAATTTTGATTATTTGGATATTCACCCTAGTTTGTGCAACAATGGTAA 21960
21961 GATTTCATCTAGTGCCGGTGATTCTATTTTTAAGAGTTTTCATTTCACTCGATTTTATAA 22020
22021 TTACACTGGCGAAGGTGATCAAATTATTTTTTATGAGGGTGTTAATTTTAATCCTTATCA 22080
22081 TAGATTTAAGTGTTTTCCTAATGGTAGTAATGATGTATGGCTTCTTAACAAGGTAAGATT 22140
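The row beginning at position 21781 contains the IUPAC ambiguity code R (A or G) in the codon TTR. The sheet still translates that codon as L, because both TTA and TTG encode leucine. The Python sketch below shows the idea. It expands every ambiguity code in a codon and reports an amino acid only when all expansions agree; otherwise it returns 'X'. The standard genetic code and the IUPAC code table are assumed, and the function names are illustrative.

```python
from itertools import product

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code (NCBI table 1) in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_ambiguous_codon(codon: str) -> str:
    """Return the amino acid shared by every expansion of `codon`, else 'X'."""
    choices = [IUPAC[base] for base in codon.upper()]
    residues = {CODON_TABLE["".join(p)] for p in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame, resolving ambiguity codes where possible."""
    return "".join(
        translate_ambiguous_codon(seq[i:i + 3])
        for i in range(frame, len(seq) - 2, 3)
    )


row_21781 = "TATTTTTTTATTTTTTTATTTCTGTTRTGGTTTTAATGAACCTCTTAATGTTGTGTCTCA"
print(translate(row_21781))  # frame 1
```

A codon such as ATR cannot be resolved this way, because ATA encodes isoleucine and ATG encodes methionine. The sketch therefore returns 'X' for it.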
Sheet 43 of 119
22141 TTATCGTGCCTTATATTCTAATATCCCCTTTTTTCGTTATCTTACTTTTGTTGATATTCC 22200
22201 TTATAATGTTTCTCTTTCTAAGTTTAATTCTTGTAAAAGTGATATTTTATCACTTAACAA 22260
22261 TCCTATTTTTATTAATTATTCTAAGGAAGTTTATTTTACTTTATTAGGTTGTTCTCTTTA 22320
22321 TTTAGTACCGCTTTGCCTTTTTAAATCTAACTTTAGTCAGTACTATTATAACATAGATAC 22380
22381 TGGCTCTGTTTATGGTTTTTCTAATGTTGTTTATCCTGATTTAGACTGTATTTATATTTC 22440
22441 TCTTAAACCAGGTTCTTATAAAGTTTCCACCACTGCACCTTTTTTATCCTTACCTACTAA 22500
22501 AGCTCTCTGTTTTGATAAATCTAAACAATTTGTACCTGTACAGGTTGTTGATTCTAGATG 22560
22561 GAACAACGAGCGTGCCTCAGATATTTCTTTATCTGTTGCATGTCAATTGCCATATTGTTA 22620
22621 TTTTCGCAATTCTTCTGCTAATTATGTTGGCAAGTATGATATTAACCACGGTGATAGTGG 22680
Sheet 44 of 119
22681 TTTTATTTCTATTTTATCTGGTCTTTTATATAATGTTTCTTGTATTTCATATTATGGTGT 22740
22741 ATTTTTATATGATAATTTTACATCCATTTGGCCCTATTATTCTTTTGGTAGGTGTCCTAC 22800
22801 ATCTTCTATTATTAAACATCCAATTTGTGTTTATGATTTTTTGCCTATTATTTTACAAGG 22860
22861 TATTTTATTATGTTTAGCTTTACTTTTTGTTGTTTTTCTATTATTTTTGTTATATAACGA 22920
22921 TAAATCTCATTAAATCTAAACATGTTATTAATTATTTTTATTTTGCCTACAACATTAGCT 22980
22981 GTTATAGGTGATTTTAATTGTACTAATTTTGCTATTAATGATTTAAACACCACAGTTCCT 23040
23041 CGCATAAGTGAGTATGTTGTGGATGTTTCTTATGGTTTGGGTACATATTATATACTTGAT 23100
23101 CGTGTTTATTTAAATACTACTATATTATTTACTGGTTATTTCCCTAAATCTGGTGCCAAT 23160
23161 TTTAGGGATCTATCTTTAAAAGGTACTACATATTTGAGTACTCTTTGGTATCAGAAACCC 23220
FIG. 2 CONT.
Sheet 45 of 119
23221 TTTTTATCTGATTTTAATAATGGTATTTTTTCTAGAGTTAAGAATACTAAGTTGTATGTT 23280
23281 AATAAAACTTTGTATAGTGAGTTTAGTACTATAGTTATAGGTAGTGTTTTTATTAACAAC 23340
23341 TCTTATACTATTGTTGTTCAACCTCATAATGGTGTTTTGGAGATTACAGCTTGTCAATAC 23400
23401 ACTATGTGTGAGTATCCTCATACTATTTGTAAATCTAAAGGTAGTTCTCGTAATGAATCT 23460
23461 TGGCATTTTGATAAATCTGAACCTTTGTGTCTGTTCAAGAAAAATTTTACTTATAATGTT 23520
23521 TCTACAGATTGGTTGTATTTTCATTTTTATCAAGAACGTGGCACTTTTTATGCTTATTAT 23580
23581 GCTGATTCTGGCATGCCTACTACTTTTTTATTTAGTTTGTATCTTGGTACTCTTTTATCT 23640
23641 CATTATTATGTTTTGCCTTTGACTTGTAATGCTATATCTTCTAATACTGATAATGAGACT 23700
23701 TTACAATATTGGGTCACACCTTTGTCTAAACGCCAATATCTTCTTAAATTTGACAACCGT 23760
Sheet 46 of 119
23761 GGTGTTATTACTAATGCTGTTGATTGTTCTAGTAGTTTCTTTAGCGAGATTCAATGTAAA 23820
23821 ACTAAATCTTTATTACCTAATACTGGTGTTTATGACTTATCTGGTTTTACTGTTAAGCCT 23880
23881 GTTGCAACTGTACATCGTCGTATTCCTGATTTACCTGATTGTGACATTGATAAATGGCTT 23940
23941 AACAATTTTAATGTACCCTCACCTCTTAATTGGGAACGTAAAATTTTTTCTAATTGCAAC 24000
24001 TTTAATTTGAGTACTTTGCTTCGTTTAGTTCATACTGATTCTTTTTCTTGTAATAATTTT 24060
24061 GATGAATCTAAGATATATGGTAGTTGTTTTAAGAGTATTGTTTTAGATAAATTTGCCATA 24120
24121 CCCAACTCCAGACGATCTGATTTGCAGTTGGGCAGTTCTGGTTTTCTGCAATCTTCTAAT 24180
24181 TATAAAATTGACACTACTTCTAGTTCTTGTCAATTGTATTATAGTTTGCCTGCAATTAAT 24240
24241 GTTACTATTAATAATTATAATCCTTCTTCTTGGAATAGAAGGTATGGTTTTAATAATTTT 24300
Sheet 47 of 119
24301 AATTTGAGCTCTCATAGTGTTGTTTACTCACGTTATTGTTTTTCTGTTAATAATACTTTT 24360
24361 TGTCCTTGTGCTAAACCTTCTTTTGCTTCAAGTTGCAAGAGTCATAAACCACCTTCTGCT 24420
24421 TCCTGTCCTATTGGTACTAATTATCGTTCTTGTGAGAGTACTACTGTACTCGACCACACT 24480
24481 GACTGGTGTAGGTGTTCTTGTTTACCTGATCCTATAACTGCTTATGACCCTAGGTCTTGT 24540
24541 TCTCAAAAAAAGTCTCTGGTTGGTGTTGGTGAACATTGTGCAGGGTTCGGTGTTGATGAA 24600
24601 GAAAAGTGTGGTGTATTGGATGGATCATATAATGTTTCTTGTCTTTGTAGTACTGATGCC 24660
24661 TTTCTAGGTTGGTCTTATGACACTTGCGTCAGTAACAACCGTTGTAATATTTTTTCTAAT 24720
24721 TTTATTTTAAATGGTATCAATAGTGGTACCACTTGTTCTAATGATTTATTGCAGCCTAAT 24780
24781 ACTGAAGTTTTTACTGATGTTTGTGTTGATTACGACCTTTATGGTATTACAGGACAAGGT 24840
Sheet 48 of 119
24841 ATTTTTAAAGAAGTTTCTGCTGTTTATTATAATAGTTGGCAAAATCTTTTGTATGATTCT 24900
24901 AATGGCAACATTATTGGTTTTAAAGATTTTGTTACTAATAAAACATATAATATTTTCCCT 24960
24961 TGTTATGCAGGAAGAGTTTCTGCTGCTTTTCATCAAAATGCTTCCTCTTTGGCTTTACTT 25020
25021 TATCGTAATTTAAAATGTAGCTATGTTTTGAATAATATTTCTTTAACTACTCAGCCATAT 25080
25081 TTTGATAGTTATCTTGGTTGCGTTTTTAATGCTGATAATTTAACTGATTATTCTGTTTCT 25140
25141 TCTTGTGCTCTTCGCATGGGTAGTGGTTTTTGTGTTGATTATAACTCACCTTCTTCTTCC 25200
25201 TCTTCGCGTCGTAAACGTAGAAGTATTTCTGCTTCTTATCGTTTTGTTACTTTTGAACCC 25260
25261 TTTAATGTCAGTTTTGTTAATGACAGTATTGAGTCTGTGGGTGGTCTTTATGAGATCAAA 25320
25321 ATTCCCACTAACTTTACTATACTTGCTCAAGAGCAATTTATTCAAACTAATTCTCCTAAA 25380
FIG. 2 CONT.
Sheet 49 of 119
25381 GTTACTATTGATTGTTCTTTATTTGTCTGTTCTAATTATGCAGCTTGCCATGACTTATTG 25440
25441 TCAGAGTATGGCACTTTTTGTGATAATATTAATAGTATTTTAGATGAAGTTAATGGTTTA 25500
25501 CTTGATACTACTCAATTGCATGTAGCTGATACTCTTATGCAAGGTGTCACACTTAGCTCC 25560
25561 AATCTTAATACTAATTTGCATTTTGATGTTGATAATATTAATTTTAAATCCCTAGTTGGA 25620
25621 TGTTTAGGTCCACACTGCGGTTCTTCTTCTCGTTCTTTTTTTGAAGATTTATTGTTTGAC 25680
25681 AAAGTTAAACTTTCAGATGTTGGTTTTGTTGAAGCTTATAACAATTGTACTGGTGGTAGT 25740
25741 GAAATTAGAGATCTTCTTTGTGTACAATCCTTTAATGGTATTAAAGTTTTGCCTCCTATT 25800
25801 TTGTCTGAATCTCAAATTTCTGGTTACACCACAGCCGCTACTGTTGCTGCTATGTTTCCA 25860
25861 CCATGGTCAGCAGCAGCTGGCATACCATTTTCTCTTAATGTACAATATAGAATTAATGGT 25920
Sheet 50 of 119
25921 TTGGGTGTTACTATGGATGTTCTTAATAAAAATCAAAAGTTGATAGCTACTGCTTTTAAT 25980
25981 AATGCTCTTCTTTCTATTCAGAATGGTTTTAGTGCTACCAACTCTGCACTTGCTAAAATA 26040
26041 CAAAGTGTTGTTAATTCTAATGCTCAAGCACTTAATAGTTTGTTACAGCAATTATTTAAT 26100
26101 AAATTTGGTGCAATTAGTTCTTCTTTACAACAAATTTTATCTCGTCTCCATGCTTTACAG 26160
26161 GCTCAGGTTCAGATTGATAGGCTTATTAATGGTCGTTTAACTGCTTTAAATGCTTATGTC 26220
26221 TCTCAACAGCTTAGTGATATTTCTCTTGTAAAATTTGGTGCTGCTTTAGCTATGGAGAAG 26280
26281 GTTAATGAGTGTGTTAAAAGTCAATCTCCTCGTATTAATTTTTGTGGTAATGGTAATCAT 26340
26341 ATTTTGTCATTAGTTCAAAATGCTCCTTATGGTTTGTTGTTTATGCATTTTAGTTATAAA 26400
26401 CCTATTTCTTTTAAAACTGTTTTAGTAAGTCCTGGTTTGTGTATATCAGGTGATGTAGGT 26460
Sheet 51 of 119
26461 ATTGCACCTAAACAAGGGTATTTTATTAAACATAATGATCATTGGATGTTCACTGGTAGT 26520
26521 TCTTACTATTATCCTGAACCAATTTCAGATAAAAATGTTGTTTTTATGAATACTTGTTCT 26580
26581 GTTAATTTTACTAAAGCGCCTCTTGTTTATTTGAATCATTCTGTACCAAAATTGTCTGAT 26640
26641 TTTGAATCTGAGTTATCTCATTGGTTTAAAAATCAAACATCCATTGCGCCTAATTTGACT 26700
26701 TTAAATCTTCATACTATTAATGCTACTTTTTTAGATTTGTATTATGAGATGAATCTTATT 26760
26761 CAAGAGTCTATTAAGTCTTTGAATAATAGTTATATCAATCTTAAAGATATAGGTACATAT 26820
26821 GAAATGTATGTAAAATGGCCTTGGTATGTTTGGCTACTAATTTCTTTTTCATTTATAATA 26880
26881 TTCCTTGTATTGCTCTTTTTTATATGTTGTTGTACTGGTTGTGGTTCTGCATGTTTTAGT 26940
26941 AAATGTCATAATTGTTGTGATGAGTATGGTGGTCATCATCATTTTGTTATCAAAACATCT 27000
Sheet 52 of 119
27001 CATGATGATTAGAATCTCTTGTCAGATCTCATTAAATCTAAACTTTATTTATGGACGTTT 27060
27061 GGAGACCTAGCTACACACATTCTCTTGTTATTAGAGAATTTCGTCTTACAAACCTTGAAC 27120
27121 ATTTGTGTCTAAAGTATAATTACTGTCAACCTATTGTTGGTTACTGTATTGTACCTTTAA 27180
27181 ATGTTTGGTGTCGCAAGTTTGGCAAATTTGCTTCTCACTTTACATTACGTAGTCACGATA 27240
27241 TTTCCCATAGTAATAATTTTGGTGTTGTAACTAGTTTTACTACTTATGGTAATACTGTTT 27300
27301 CTGAGGCTGTGTCTAGATTAGTTGAATCAGCTTCTGAATTTATTGTTTGGCGTGCAGAGG 27360
27361 CACTTAATAAGTATGGTTGATTTATTTTTCAATGATACTGCTTGGTACATAGGACAGATT 27420
27421 TTAGTTTTAGTTTTATTTTGTCTTATTTCTTTAATCTTTGTTGTTGCTTTTTTAGCAACT 27480
27481 ATTAAGCTTTGTATGCAACTTTGTGGTTTTTGTAATTTCTTTATTATTTCACCTTCGGCT 27540
FIG. 2 CONT.
Sheet 53 of 119
27541 TACGTTTATAAAAGAGGTATGCAGTTGTATAAGTCTTATAGTGAACAAGTTATACCACCC 27600
27601 ACTTCAGATTATTTAATCTAAATCTAAACATTATGAATAAATCTTTTCTTCCTCAATTTA 27660
27661 CTTCTGATCAAGCTGTTACATTCTTAAAAGAATGGAATTTCTCTTTGGGTGTAATACTAC 27720
27721 TTTTTATTACTATCATATTGCAGTTCGGTTATACGAGCCGTAGTATGTTTGTTTATCTTA 27780
27781 TCAAGATGATTATTCTTTGGCTTATGTGGCCATTGACTATCACCTTGACTATATTTAATT 27840
27841 GTTTTTATGCTTTGAATAATGCTTTTCTTGCATTTTCTATAGTGTTTACTATTATTTCTA 27900
27901 TTGTTATATGGATTCTTTATTTTGTTAATAGTATTCGGCTTTTTATTAGAACTGGCAGTT 27960
27961 GGTGGAGTTTTAATCCAGAGACCAATAATCTTATGTGTATTGATATGAAAGGCAAGATGT 28020
28021 TTGTTAGGCCAGTTATTGAGGACTATCACACATTAACTGCTACTGTTATTCGTGGTCATC 28080
Sheet 54 of 119
28081 TTTATATACAGGGTGTCAAACTTGGCACTGGTTATACTCTTTCAGATTTGCCCGTATATG 28140
28141 TTACTCTAGCTAACGTGCAAGTACTTTGTACCTATAAACGTGCCTTTTTAGATAAGTTAG 28200
28201 ATGTTAATAGTGGTTTTGCTGTTTTTGTTAAGTCTAAAGTTGGTAACTATCGTTTACCGT 28260
28261 CTAGTAAACCTAGTGGTATGGATACTGCCTTCTTAAGAGCTTAAATCTAAACTATTAGGA 28320
28321 TGTCTTATACTCCCGGTCATTATGCTGGAAGTAGAAGCTCCTCTGGAAATCGTTCAGGAA 28380
28381 TCCTCAAGAAAACTTCTTGGGCTGACCAATCTGAGCGAAATTACCAAACCTTTAATAGAG 28440
28441 GCAGAAAAACCCAACCTAAATTCACTGTGTCTACTCAACCACAAGGAAATACTATCCCAC 28500
28501 ATTATTCCTGGTTCTCCGGGATCACTCAATTTCAAAAAGGTAGAGACTTTAAATTTTCAG 28560
28561 ATGGTCAAGGAGTTCCCATTGCTTTCGGAGTACCCCCTTCTGAAGCAAAAGGATATTGGT 28620
Sheet 55 of 119
28621 ATAGACACAGCCGGCGTTCTTTTAAAACAGCTGATGGTCAACAAAAGCAGTTGTTACCGA 28680
28681 GATGGTATTTCTACTATCTCGGTACCGGCCCATATGCCAATGCATCCTATGGTGAATCCC 28740
28741 TCGAAGGGGTCTTCTGGGTTGCTAATCACCAAGCTGACACTTCTACTCCCTCCGATGTTT 28800
28801 CGTCAAGGGATCCTACTACTCAAGAAGCTATCCCTACTAGGTTTCCGCCTGGTACGATTT 28860
28861 TGCCTCAAGGCTATTATGTTGAAGGCTCAGGAAGGTCTGCTTCTAATAGTCGACCAGGTT 28920
28921 CACGTTCTCAATCACGTGGACCCAATAATCGTTCATTAACTAGAAGTAATTCTAATTTTA 28980
28981 GACATTCAGATTCTATAGTAAAACCTGATATGGCTGATGAGATCGCTAATCTTGTTTTAG 29040
29041 CCAAGCTTGGTAAAGATTCTAAACCTCAGCAACTCACTAAGCAAAATGCCAAGGAAATCA 29100
29101 GGCATAAAATTTTAACAAAACCTCGCCAAAAGCGAACTCCTAATAAACATTGTAATGTTC 29160
Sheet 56 of 119
29161 AACAGTGTTTTGGTAAAAGAGGACCTTCTCAAAATTTTGGTAATGCTGAAATGTTAAAGC 29220
29221 TTGGTACTAATGATCCTCAGTTTCCTATTCTTGCAGAATTAGCTCCTACACCAGGTGCTT 29280
29281 TTTTCTTTGGTTCTAAATTAGACTTGGTTAAAAGAGATTCCGAGGCTGACTCACCTGTTA 29340
29341 AAGATGTTTTTGAACTTCATTATTCTGGTTCTATTAGGTTTGATAGTACTTTACCAGGCT 29400
29401 TTGAGACAATTATGAAAGTTCTTGAAGAGAATTTAAATGCTTACGTTAATTCTAATCAGA 29460
29461 ACACTGATTCTGATTCGTTGAGTTCTAAACCTCAGCGTAAAAGAGGTGTTAAACAATTAC 29520
29521 CAGAACAGTTTGACTCTCTTAATTTAAGTGCTGGTACTCAGCACATTTCAAATGATTTTA 29580
29581 CTCCTGAGGATCATAGTTTACTTGCTACTCTTGATGATCCTTATGTAGAAGACTCTGTTG 29640
29641 CTTAATGAGAATGAATCCTAATTCGACACTAGGTGGTAACCCCTCGCTATTATTCGGAAT 29700
Sheet 57 of 119
29701 AGGACACTCTCTATCAGAATGAATTCTTGCTGTAATAACAGATAGAGTAGGTTGTTACAG 29760
29761 ACTATATATTAATTAGTAGAAATTTTATATTTAGACATTTGATTGTTAGAGTAGTTATAA 29820
29821 GGTTTAGCTGTAGTATAAACGCCTCCGGGAAGAGCTATCAATTGTAGTGTTTAATATATA 29880
29881 TATTAGTATATGATTGAAATTAATTATAGCCTTTTGGAGGAATTACAAAAAAAAAAAAAA 29940
29941 AA 29942
FIG. 2 CONT.
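Each row on these sheets gives the positions of its first and last nucleotide. The numbering can therefore be checked mechanically: a full row spans 60 bases, the final row (29941–29942) spans only two, and each row should start one position after the previous one ends. Below is a minimal Python sketch of such a check. It assumes rows are written as "start SEQUENCE end" and that the sequence uses A, C, G, T and the IUPAC ambiguity codes. `parse_row` and `assemble` are hypothetical helper names.

```python
import re

# "start SEQUENCE end", with the sequence drawn from ACGT plus IUPAC codes.
ROW = re.compile(r"^(\d+)\s+([ACGTRYSWKMBDHVN]+)\s+(\d+)$")


def parse_row(line: str) -> tuple[int, str, int]:
    """Split one row and check that its base count matches its position span."""
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError(f"not a sequence row: {line!r}")
    start, seq, end = int(m.group(1)), m.group(2), int(m.group(3))
    if end - start + 1 != len(seq):
        raise ValueError(
            f"row {start}: {len(seq)} bases but positions span {end - start + 1}"
        )
    return start, seq, end


def assemble(lines: list[str]) -> str:
    """Join rows into one sequence, requiring consecutive rows to be contiguous."""
    parts: list[str] = []
    expected = None
    for line in lines:
        start, seq, end = parse_row(line)
        if expected is not None and start != expected:
            raise ValueError(f"gap or overlap before position {start}")
        parts.append(seq)
        expected = end + 1
    return "".join(parts)


print(parse_row("29941 AA 29942"))
```

A row that lost bases in transcription fails the span check, which makes it a cheap way to spot extraction damage before the rows are concatenated.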
Sheet 58 of 119
SEQ: 1 CTTATTCTCGCTTAACGCAGGCATGGCAGATAGTCGAATGCTAGAGAACAGTCTAGAGTA 60
61 ATTTAGATTTGAAAAATTTGTTCTAAGGGACAATAGGTACGAACACTCACACCAAATTAG 120
121 TATTAGAACATAAAATGAAAGGTGTGAAAAGTAGAGAGACGGTCACTGCACAACCAACAG 180
181 CAGTCGCAGGGAGGGTATCCAGCGTTACTAATTTTGGTCGTTTATGCCAGAGCCGAAGTT 240
241 CACCCGCGGTCTTAAAGCAACCGACGAAGGCCTACGTCGCCTCCTCAACCGATCAGGATA 300
301 CTTCAGTCTACTCCCACCCAATACGGGGAGATGACCAGTTCGCTACCTTTCACAACCTAA 360
361 GCAAATACTATTAGTACACTTCTATCTAACAGCGACGTAAGAACCTGTTCTTACCGTACA 420
421 CGTCAGTTTAGAATAGGCACTATAAAAACAAGTACTTCTAGATGTACAACATCTTCAAGA 480
481 TTGATTTTGTCGGCATTTCAGGCCATGCCGTTAAAATTAATTTAGTGGAAACGTATCGAA 540
Sheet 59 of 119
541 CCCACCAAAAGGATTTCCCATACAATACCCGAACAAGGCAAGTATGTTCTGATTTGCAAT 600
601 ACAACATGTAGTAGAAAGATACTGATGTAGATGATGATTAAAACCACTTCTAAAAAACCC 660
661 AACCTAACATGGAAAACCAAAATACGGTAGAATACAAGTGTTTACCAAAGTTAAGACATC 720
721 CAACATATAACTTCTCTCACTAAATTATTAAAGTTTAAAATTTAAACTACTAATACTAAA 780
781 ATCACATCTTCTACGAATACGACTCCAAGTACGACTCGGATTTCCATTTATAAGTGTTTT 840
841 TCGAATACGAAATGAATCTGTTATAGCACCATAATTTGGGCATGAAAAACATCTGGTCAT 900
901 ACCAACACTGATAAGACCATTTAATCGTCTAACAGAAGTTCGAATACCAGTAATAAGAAA 960
961 CGTTCTATACTCTGTTTTCGTCAGACATACCGAACGGTTAACACTGAAACTATAACATCA 1020
1021 CCGAACCGTACATCAAGCACTAAGTGCTAAACAATACGCGGACGTCTGATATCGATGATA 1080
Sheet 60 of 119
1081 AACACCATAATTTATACAACGTGTTGGATGTCTTCTACATCATCTACCTCTACATCAATA 1140
1141 TGCACTTGGACATGTAAATAATAGACGACTACGTTATCAAAATTTCGAAGGATCAAACTA 1200
1201 CTTTCAATACTGAGTATACCTACTAAAAAGATAATTTAGATATATATTACAACTAAACAC 1260
1261 ACTAACACCAAAACAATACGTCATACCAATACATCTAACAAAATTACTATTAACACTAAA 1320
1321 AATACCAACCCAAAGTCCATTATACTACCTACCAAAAAGAACAGGTAACACAACATGTCA 1380
1381 AATACTGAGATCGCTTCAATTTCGGGTTAGTAGACCACAATAAGGACTTTTAGGACACAA 1440
1441 TAAATGATTATCATGACTATGACAATTGGTACTAAGAAAATTAAACATACCAATAAGACA 1500
1501 GTGTGGTAAACCAAGAACATATATAACCAGCGGCGCAGGACCTAACACCTAAGGATATTA 1560
1561 ATTTAGAAGTCAGTTCAGAATACTACTAAACCAAATAAGTCCACATCATCCAACATTTAG 1620
FIG. 3 CONT'D
Sheet 61 of 119
1621 ATAACAATTTCTTTGACGAGAATAATGAGTACGTGAAATGAATCTAATACAAGTTACATT 1680
1681 CACACCATTAGAACTTGTTTTAGTATAAGAACCGCAATTATTAAGAACCACATCCGTTGA 1740
1741 CAACGAATTATCTCCACTAATATTATACGAAGATTTTTTATAACTGAACAAACAATTCGC 1800
1801 AGCACGACTAAAACGAACGTTCAAACGTCAAACACCTCTACCAAAACATGGAAAAAATGA 1860
1861 TCTACCAAATTAAGGGGCATCAATAATAGATTAAGTCTCACCATAAAAGAAATGTAGAAA 1920
1921 CTACAGAGTTAAAAGTGTTCTTCAAAGACTATACACAAATTTTTACACATAAAACAAATA 1980
1981 CCTGTCTCAAAGTCAACGATGTAAAATATATCTCGTAATACAATTATCCAACCAATGAGT 2040
2041 TAAATTCAATAACCCATGATGTGAACAATTATTTTACCAATTAACCAAATTATGGTACAA 2100
2101 TCTACGATCACGTGGACGATGTCCGACCGAAGAAATGGTTAATAACTTACCAGAAAAACA 2160
Sheet 62 of 119
2161 TCATAGAGTTCGGTTGAAATTAAAACAACGAAATTATGGACTAATACGATTTTAAAATCA 2220
2221 ATTATTTAAAATGTGAAAAAAATTCAATAATAATCTCACACAATGTCAACTACAAAATTT 2280
2281 TCTATACGGACAAGAATTTTGATAATTACCAAATCAAACATAACATCCGTTATTCAAAAT 2340
2341 ATTGCAATCATGTCCCAATTAAGGACCAAAACAAAATGGTACATTACGTGTCCTTGTTGT 2400
2401 TTAAATAAAAAAACTTCCGCAACGTCTTAGACAATATCATCTTCTACTACAATAACTCTT 2460
2461 ACAGTTTAGAAGAAATAGTAGAATACTCATAACAGTTGGTGGATTTAGACATCTTTTTTA 2520
2521 AACATAATATCTATTATACATGTACCCATTCACACCACTATTTAAAAAGGGATAACAGTA 2580
2581 CTTACTATTTTTATAAACAGAAAATCTAGTCCGAACCGCAAAAGGTACACGTCCATCTTT 2640
2641 TCAATTAAAATTGCTCTTTGGACAACAATACCTCTAAGGCAGAAACTACTGTCAATTCCA 2700
Sheet 63 of 119
2701 ATACAAACTAAATCTAAGATGAAAACTACTATAAAATCCATTTCAAACAAGTCTTAAACT 2760
2761 TCATCTTTTCCCACAATGACATCTACTAAAACAACGACAACAAACACTACGATATCTCTT 2820
2821 ACGAAATTTGAGAACATTTCTCGTAGGTCACCAACCAATAGTTCAAGCACGTAAAAATTT 2880
2881 ATTTGAATTACTCTTACAACAAATAAATAAACTACTCCGACCACTACTTCGTTACCGGAG 2940
2941 AGCATACATAACATGAAAACGATAACTCCTACAACTTCTGCAATAGTCATCACTTCGACA 3000
3001 GCTTCTATGATAACTACCACAGCAACTTCTGTGATAATTACTGCTACTTCTACAACAATG 3060
3061 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG 3120
3121 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG 3180
3181 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG 3240
Sheet 64 of 119
3241 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG 3300
3301 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG 3360
3361 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG 3420
3421 ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTATTGCTACTTCTCTAACAATG 3480
3481 ACCACTGTTACTACTGGTTTAACAACAATGACCACTACTACATCTACTATAACTTTCATA 3540
3541 AATACTGAAACTATGAATATTTCGAGAAAATCAAAAATTACTACAGATATTACTACGAAA 3600
3601 CAAACAATCAATACCAAGATCACAACTTTGTCTTTGTATAAAATTTCAATTACCAAATAC 3660
3661 CAGTGGATGATAATGTGTATGATTAACAACCAACGCAAGACACAATGAACATTACGTCTT 3720
3721 TAATGGAAAATTCAAATTCCTAAATCGATAACTTTTATACACCAATAGAATATTCCACCC 3780
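Rows 3061 to 3420 print the same 30-nucleotide unit, ACCACTGTTACTGCTACTTCTACAACAATG, twice per row, and the first half of row 3421 repeats it once more before the copies begin to diverge. The Python sketch below counts the consecutive exact copies. It assumes the rows are transcribed correctly here and counts only perfect matches; the names `ROWS`, `UNIT` and `count_tandem_copies` are illustrative and do not come from the patent.

```python
# Rows 3061-3421 as printed in the listing (start position -> 60 bases).
ROWS = {
    3061: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG",
    3121: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG",
    3181: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG",
    3241: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG",
    3301: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG",
    3361: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTACTGCTACTTCTACAACAATG",
    3421: "ACCACTGTTACTGCTACTTCTACAACAATGACCACTGTTATTGCTACTTCTCTAACAATG",
}
# 30-nt unit taken from the first half of row 3061.
UNIT = "ACCACTGTTACTGCTACTTCTACAACAATG"


def count_tandem_copies(seq: str, unit: str) -> int:
    """Count consecutive exact copies of `unit` starting at the beginning of `seq`."""
    k = len(unit)
    n = 0
    # A slice past the end of `seq` is "", which never equals `unit`, so the loop stops.
    while seq[n * k:(n + 1) * k] == unit:
        n += 1
    return n


region_start = min(ROWS)
region = "".join(ROWS[pos] for pos in sorted(ROWS))
copies = count_tandem_copies(region, UNIT)
region_end = region_start + copies * len(UNIT) - 1
print(f"{copies} exact copies, positions {region_start}-{region_end}")
```

Counting only exact matches keeps the sketch simple. Following the degenerate copies in rows 3421 and 3481 would need a mismatch-tolerant comparison.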
Sheet 65 of 119
3781 AATATTAGTTTCAAAACAACTAATAAATGACTGGTGATAAGGATTTCGATAACAAAACGG 3840
3841 AGTTCCACCAAAACATCGACTAAAACGAATAACCAAAAATTTGGTCAAACTATAATTACG 3900
3901 CATACGATTAACCACAACAAATTTTACACCAAAAAGAAAACTAAATTTACCAAACCTACG 3960
3961 AAACAAAAAAATACCTCTATAACACAGAGTACAAACATTCACACCTGTATTATACTGAGA 4020
4021 TTATCGTCGCCTGAATGGAACATGTAATGTAAAAAGTAATAAACTACTGTTAAAAACACG 4080
4081 AAAAACGTGGGGATTTTTTTAAAAATAACGACGTACACGACACCTACATTTGCAAACAGT 4140
4141 AAGACATCGACAATATCCACTACTTGTTTATCTACCATTCAAACAATGATTTAAATCACC 4200
4201 ACTATTTAAACTAAAATATCATCCAATACCTTACAGTAAATCATACAGAAGAAAACTCAA 4260
4261 TGGAGTTAACATACCAAACACATATTGTGGATTACATACAAAACAATTTCCACTATAATA 4320
Sheet 66 of 119
4321 TTTACAACGATCTGAACAATTTCGACTACAATAACAATTAGGACGATTACCCGTATACGA 4380
4381 GGTACCACCACCTCAACGTTTTCGATATCGACATCGACGTCCATTTTTTAAAAGATTTCT 4440
4441 TTGACGACGATACCAATTTAGATTTCCACAAACGGTTCATCCTCTAACAATACAAAGATG 4500
4501 GCCACCATTTAATACATTTTGTTAAGAATTATAACATCCGGGACTACGATCTGTTCTACC 4560
4561 TTCTGTTAGAATACAAAACAATCGTGCACGAATATTCGTAGAATTATTAATACTAACAAC 4620
4621 AAACAGATGAGAGTATAGCCGACCATATAAATCACAAGGACGACTACACAGTAATTGAAT 4680
4681 GGAAGATCCACAACAACTATTTGTTCAATAGGAACAATCATTATTATTTCTTCTAAAACT 4740
4741 ATAATAAGTTTTTACAGTTTAATGAAGTCAACAACCATGATTTCGTAACCGACAATCTAA 4800
4801 TTGACGATTACATCCGGCACAATAATTTAAACTCTGTCTACGTATGTTTGAAAAAAACTC 4860
Sheet 67 of 119
4861 ACCACTACTAACAAAACAAAGTTTAAGAAGACAATATGTTCTTCAAAATAACGAAGCAGT 4920
4921 ACTATATGTTAACTTATTACTGCAAGCACTAATAAACAACAGATTCTACTGATCAGAAGG 4980
4981 ATTTCTAACCGCAGAATAGTTATTTAAACTACAATAATTGCCACAATTTTGACAATTCAT 5040
5041 AAAACTCACAGGATTAAGATAAATATATACATCAGTCCCATTTCTGAAACCAATACATAC 5100
5101 ACTACCAAGAAAAATATTTCGTTGACAATTAGTTCAAACACAAAATAATCGATTCTTCTA 5160
5161 TCTACAAAACGAATGACATCTACCACAATTAAAATTTAGATAAAGAGAATGACATCCACT 5220
5221 TCA AAnCCATTTTATGAACCATTACAAAAGACACTACCGTAACTACAATGATTCAATTT 5280
5281 CACATCACTAAAAATACGGCTATTTTAAAATATAGTCATACTTTTAAACAGAAATCGACT 5340
5341 ATAAAGACGACATGTTTCAAGTAAACCCAAACTAGTCGTTGTTAACGAACGAATAATATT 5400
Sheet 68 of 119
5401 AAAAAATTGTCATACATTTACCAGACATCAACAATTGCCAGGTAAAAAAAGAAAACTTGT 5460
5461 CAGAGTATTATTAACAATACACTTACATCGAACAGAATACAACGTCGTATAATTAGAATT 5520
5521 TAAATTATTTACCGTCACCGTCCTTCGTACCATACTTAAAGCACGACCGTCTGGTGTATC 5580
5581 CAATCAACGAGAACAAAATCGATTTCCAGTAAAATTTAAACTACTTGGTAGTCTACGATG 5640
5641 ACTAAAATAAGCACAACAAAACTTTGTTCGACTAAATAGTCCACGTTAAACACTTAATCT 5700
5701 TGAATAAACACTAACACCATAATTTGTTCTTTCAGCACAACCACAACTACGACAATACGT 5760
5761 AAAACCATGTAATCGTTTCTGACTAGAAAAATTACCAATATTCTAACCGACATTAACACG 5820
5821 TCCATCTTAACAGGTAACATGATTTAACTTACATGGPAAAAACTAAACAAGATTATGAGG 5880
5881 AGACTCATTCCTAAATGGACTACTACAACAACGTCGATTGTACAAATACCCACATCCACA 5940
Sheet 69 of 119
5941 TCCGGTAATATGTGTAAACTTTACACCAAGTGGAATGGTTGTAATACTACGAACATCACA 6000
6001 ATTTTTTATATGTCCACAATCACCAACAAATTGACTGACGAACATAGAATTTTTAAATTG 6060
6061 GGTCTGAAAATGTAGATACAACTGATTAATAAAAAACCTACTACAACTTTACCAACGAAT 6120
6121 ATTGGGACTAGAAAGTGTTATAATAACACTATTACCATTCATAATATGTTTTGGATAATA 6180
6181 TTTCCGAGTCAAATTTGGTAAACGATTTCAACTGCCACAAATATGATTGAAATTCAATCA 6240
6241 ACCTGTACTATAAACACGAGTTAACTTACTATTCAATCCAAAATTACATCTAAACGGCAA 6300
6301 ACAACTCATGTTTCATTGTCAGACCGGACATCGATGACCACTACAACAAAACCGTAGACT 6360
6361 ACTAAATATACACTTTGCAATAAAATTTCCTACACTTTGAAAACCATTCGGACAATAAAC 6420
6421 CAAAACAGTACTACTTCGTAGTAACTTAAGAGAATGAATAAAATTATTTGGATCAAAATT 6480
Sheet 70 of 119
6481 TAGACTTTTATCTATATCACAAAACAGACAACTAAGACATAGACTCCTCAGTGTTCCATT 6540
6541 ACACCAATGAAGACAATACCTTAGCGTCTAATCATGATTTCTCCAATTCAATTTCCCACA 6600
6601 ATCTTTCTGACAATTTTATCTTCTACGATAATAACAATTACTACTTTTATCAAGATAATT 6660
6661 CCAACAATTTTCAAATAGAAATCAACTACAAACCCTATACATAAACTGTCCAACACTAAT 6720
6721 ACAACAAACCCAACGATTACTTAACAGTGCGGATCAATTTAGTGGTTGTCAATCCCTTAT 6780
6781 ATATGCTATACCATAATTTGGATAATGATATGGATATCTAAACAATACAAATTCTCTACT 6840
6841 ATTAGTTTGAGAAAATCAAGGATTTTAAAAATTTCGTTCTCGATATCTTAAAATACCAAA 6900
6901 AAACTTCACCAACAAATAAATACAAAAATCAAATAATGTAAAATGTTTACTATTTTGGTA 6960
6961 AAAAATATGATGTCTTTATCGAAGATTCAAATGAAAATTAAACAAAACAAACCGAGAATT 7020
Sheet 71 of 119
7021 TTTACGAAAAGTCTGTAAATCTACCTCATATAAATATTTTCCAAAAGAACAACATCGGTG 7080
7081 ACACAAAAACAAAACCAAATTAAAAAACATATATTTACAATAAAAATCACTGAAAATAGA 7140
7141 AGGATTATAATCACAAAAAGGATAAAAACACCCTTCTTAACAATACACCTATTTCCGATG 7200
7201 AAAACCAAACCAATGTTAAACACTAAAAATAAGATTCAATCCACATCCAAAATGTTCAGT 7260
7261 AAAAACATTACCATCAAAATATACACTTAACACAGTAAGACCAAAACTATACAACCTATG 7320
7321 TATACGTCGATATCTAAAACAAGTCATACTTCATCTATCTGCACAAAATAAACTAATACA 7380
7381 ATCAAATCAGTTTAATTAACAACTTGAGCAATAACCAATAAGTAATATGTGTCATACCAA 7440
7441 AATAGGTAATAAAACAGAATAACCAAATGTTAATAAATGATGTACCAACGGACTAAACAA 7500
7501 ATACAATCTTTGATACGTAACCAACTAATCTAAATAACATAAACATCGATTATACAATGG 7560
Sheet 72 of 119
7561 ACGAAAACAGAACAACGCCAAAATATATCAACAATGACGATACATATTTCATCAACCAAA 7620
7621 ATAATCCGTATAACAGATACCAACATTATTTCGACCAACAAATAAAACAATATTTGCTTT 7680
7681 AACATCACAAGCACAATTCACATCATGATAACAACCACCACATTAAGCAATAATACTATA 7740
7741 ATGACGATTACCACCATCACCAAAAACACAATTTGTAGTTACCTTAACAAAATTAACGGT 7800
7801 AAGAAAATTTGGTCCATTGTGAAAATATTGACATCTTCGACGATATCTTGAAAGATTTCT 7860
7861 CGAATTTCCTCCACATTTAGGTTGACTACGAAGTGTAATACATCAATGACTATAATTCGT 7920
7921 TCAACCAACATACTACGCAAACAAGATACTATCTCTACCTGTCGCACAAATGCTACTACA 7980
7981 ACTACGATCAAATAAACATCTATAATTATTAGACAATGTAAGATTTCAATTTCAACAAGG 8040
8041 ATTAAACATACATCAACATCATCTCTCACTACGACTATCTCGATTAAAAGACTTACGACA 8100
Sheet 73 of 119
8101 ACACAAAATACGTGTTAGTAACATATCCGGATATAATGAACATCTGTTTTTCAATTAATG 8160
8161 ATGTCGAACATTACCATAGAGACATTGGGTCTGATACAAACTACAAATACAACTATGAAA 8220
8221 ATACAGAGTAAAACTACAACTATCTTTCTCAAAATTATTAAAACAATTGTAACGAGTACG 8280
8281 AAGAGAATCTCTCCCACACGTTAATCTTTTCCAAAATCTATGAAAACACCCTACACATGC 8340
8341 ATTTACAACAAGGTAACTAAGTCTACAACTTTGTTCTAAATAATGATTTAGATACTATAG 8400
8401 ACGTCATCGACGACCAAACCTTAAATGACTACTTTTAATATTGTTAAACCATGGATGTAT 8460
8461 AAATTTCTCACTATTATAACATCGACGACTAAATCCACAAGAATATGTCTTACCACGATT 8520
8521 CGTACATGTCCCATTACAACGATTCCGTCGATTATAAAGAACATATACCAAATAACTACG 8580
8581 AAAATTAGTTGAATGACGACTAAATGTCGTATTTAATTTTTTTCGTACACAATTTTGACC 8640
Sheet 74 of 119
8641 GAACTTCAATTTTAACTGAAAATTATTCGTTCTCCGTTCACAGGGATAAGAATGTTGTGG 8700
8701 GAAAAGTGAATTTCCTCCACAACATAACTCATTAAACAATATATATAATAAAAAACAATC 8760
8761 AAATTAGACAAAATATAATAACACCCGAAATAACGGATGTATATCACAAATATTCAGACT 8820
8821 ATAAGTAAACGGACGAATACCATCAAAATTTCAATAACTATTACCACAACAATCTCTATA 8880
8881 AAGTCAATTACTAAATACAAAACGATTATTTAAAAAGGTTAAACTAGTTACCATACTCAG 8940
8941 GTGAAAACCCAGACAAATGATAGTATTAAGATACCTAACGGGATAACATCACCGTCAATA 9000
9001 CCTACTTCTATAGCCAAGATGATACAAATTACAAGGATGATTTCA7aAACTCTGTACCGAA 9060
9061 AGTACAAAATGTAAAAAATTGAATACGTAAACGATCACTATCACAAGTCACGATATGTGG 9120
9121 TGTATAAGTCTAAAGAATATTACTAAAAATACGATCACCAACACAAAATAGTAGAAACAC 9180
Sheet 75 of 119
9181 ATGATACAAATTTTCTCCACTACCATGTGGTGTAGGAATAACAATAAGTCTACCACAATA 9240
9241 CTTCTTACGAAGAAACATATGTAGAAACCAAGGTGTATGTGCAATATCGGAACGATTAAG 9300
9301 ATTACCAAAATATTCTAAAGGACTACAATAATCACTTCCATAACATGCATAACATTCTTG 9360
9361 CGCGAGATACTGAATAACATCTCACCCACGTACACTTATGCGGCTTCTCCCATATACAAA 9420
9421 ATTAAAATTATCAAGGACCCAAAACTTATTACTAATAATATCTTCATACGGACCTTGAAA 9480
9481 AACACCATCTCTAGAAAAACTAAACAAAATAGTTAAAAAATCATCAAATTAAGCAGGATA 9540
9541 TCTAAAGAAAAGAGAATGACGATCAAGATAAAAACCTCGATATAACCGATATCAACAACA 9600
9601 CAACCAAAAAATAATAAATTATTTTGAATTCGCACGAAAACCTCTAATATGATCACAACA 9660
9661 TCAATATTTACAACAACAAACCACATAATTAAAAGAATACGAAAAACAAAAAGTTCAAAT 9720
Sheet 76 of 119
9721 AGGATAAACACGTACACAAATACGAACAAAAATAAAAATACATTGTAACATAAAAGGAAG 9780
9781 ACTTTAATCACATTAATACGTAAACGTTACCTAACAATACATACCACGATATTACGGAAA 9840
9841 AACCAAAACACAGTGTATACATCGATACCAATAACGTTTGGTACAAAATACCAATAAAAG 9900
9901 TATAACATCCTTTTAACCACAATTACATACATCACTATCATGTAAACTTCTTTGTAGAGA 9960
9961 ATGATGAAAATACTAATGATTTCTAAGAATAACATCTAATTTCTTAAGACAAAGACTACA 10020
10021 ACGGATGTTATCTATAAACTCAAACATATTATTCATAGCAATGATATCACCATTTTACCT 10080
10081 ATGACGACGGATATCTCTTCGCCGCACAAGAGTCAATCGATTTCGATACCTTTGTAAATT 10140
10141 AGTGTTATTACCATTACTACAGAATATGGTTGGAGGATGTCGTAGACAAAGATGTAGAAA 10200
10201 AAACGTTAGTCCATAACATTTCTACCATAGAGGATGCAGTTTTTAACTTGGAACATAACA 10260
Sheet 77 of 119
10261 ATCACAATGAATACCATCATACTGAAACTTACCAAATACCAATCTACTGTTTCAAATAAC 10320
10321 AGGAGCAGTACAATATACAAGTAGGAGATTATACTTGCTTGGACTAATAAGACGGAATAA 10380
10381 CACATCTCAATGAGATCCACTAAAATGATATTACAGACCACCCTACTCAAATTGTCAACA 10440
10441 CAGAATGGTCTACGTCCCGACAGTTGAACAAAACTGTCAGAGAAATGTTTTAGGAATGTG 10500
10501 AGGTTTTATATGAAAACCATTACAATTTGGACCACTTTGAAAATGACAAAATCGACGCAT 10560
10561 ATTACCGGCTGGTGTTCCCCGTAAAGTACAATGATACGCATCATCAATATGATAATTTCC 10620
10621 AAGAAAAAACACACCCAGTACACCTAGACAACCAATACATAATTGTCCACTATCACAATT 10680
10681 CAAACATATATACGTAGTTAATCTCGAGTCATGACCAACAGTGTGACCGTGACTAAAATG 10740
10741 ACCATTAAAAATACCAGGTATATCTCTACGAGTTCAACATGTCAACGGTCAATTCCTGAT 10800
Sheet 78 of 119
10801 GCAGGTCTGACAATTACAATAACGAACCGAGATACGTCGATATGAATTATTAACACGAAC 10860
10861 CAAACATGTTTTACTACAAACAAGATGACTTCTAAAATTACAAACCCGATACCGTTTACC 10920
10921 AAAATCGGTTCATTTTCGTCTAGAACAGAATCTACGAAACCGAAGTTACTGTCCACAAAG 10980
10981 ATAACTTTGAAATAACCGACGATAATTCGCAGATATATACCCTAAAGTTCCAGCAGTTTA 11040
11041 TGATCCTTCAACATGAAAACTTCTACTTAACCGTGGAAGACTGCAAATAGTTGTTAACCG 11100
11101 ACCACAATTTAACGTTAGATTTTGTTTTTCTAAATAATTTCTTTGTTAAATAACCTAAAA 11160
11161 CTATAGATGTAAAAACAAATCAACATATTAAAGACGTAAACAATTTACCTGATATAAATA 11220
11221 CATATAATTATGTGTATACTAACCACAATGTAATACACATGAAACAAAACAATCAAAATA 11280
11281 CTACAATGATCAATTTGTATTCGTAAAAATAAACTGATACATATATTAAGGACATGAGAC 11340
Sheet 79 of 119
11341 ATGGAACAAAATACATTTAATAAATCAACAAATATTCCTTCCAAAATCTCCAAAATGAAT 11400
11401 ACAGACCGAGAGTATAAAACAAGGACGACACTTAAAATGAATACAAATACTTCATAAAAAT 11460
11461 ACCAACATAAAATACACAAAAACGATAAAAATATTGATACGTATCATAATTAGTACTGTA 11520
11521 AAAAAGAAACTACAAAAACCAACCATCTTATCAATGAAATTAAAGATACACCATAAAACC 11580
11581 CAGCTTAAATCTTCTCCTACAAAACAATAAATAATGTCGGAAAAATCCATGAATATGTAC 11640
11641 CTGGTGATAAAACAGTAATCGATATCGTTTTTAACAACGATTAACCAACAGACAATTATA 11700
11701 TARAATAAAATGTCTACATGGAATATAATTTAACTAAGAGAACTCAATGAATAAATATCC 11760
11761 CATATAAAATAGAACAATAACCCCTAAAAAGAGAGAAAATTTGTCACAAAAATCTTACGG 11820
11821 ATACCCACAAATATTAATATTTTAAAGACAAGTTCTTAACGCAATATACTTACGATTACC 11880
Sheet 80 of 119
11881 GAATGCAGGTGGAGCATTATCAAAACTCCGATAAAACAATTTAAATTTTGACGAACCTTA 11940
11941 TCCACCGCACGGTCAATAACTTCAGAGGGTTTAAGTTAGTTTTAACTGACTACACTTTAC 12000
12001 ACGATTACAACAAAACAATTTAACAAATGTCGTAAACGTACAACGAAGATTAAGATTCAA 12060
12061 CACCGTCATAACATCACAAAATGTATTACTTTATGATAGATGAAGTCTAAACTCACATCG 12120
12121 AAAACTATTCGAACGAGTTAATAACTAACAAAATAAGCGGTTAGGACGACGTCAACTATG 12180
12181 ATTCACAGAACGTTCATATCTACTTCAATCGCTACTAATACAAGTTCTATCATGGCAAAA 12240
12241 CGTCCGAAACOTTTCACTCAAACATTTATACCGATCAAAACAACTTATACTTCAGCGTTT 12300
12301 CTTTTTAAACCGACTACGATTTTTATCACCAAGACAATTAGTTGTTGTCTATTTTGTCAA 12360
12361 TCTTTTTCGTACATTATATCGATTCAGACACATACTTGCACTATTTCGACATCGAGCGTT 12420
Sheet 81 of 119
12421 TGAACTTGCATACCGTCTGGATCGTGAATGATTGTACATATTTCTCCGAGCCTAATTACT 12480
12481 ATTCTTCTCATTTCAACAAAGGCGAAACGTCTGTTACGAAAAATCGTACCAAGCATTTAA 12540
12541 CCTATTAGTCCGAAATTTAACATAAGACCTATTACGACAATTTCCAACACATGGAAACTC 12600
12601 ACGATAAGGTCGTAACCGACGATTATGAAATTGATATCATTATGGTCTATTTGTTCAAAA 12660
12661 ACTATTTCAACAACTATTACAAATACAATGTATACGACCATCACATACCGTATATGTCTG 12720
12721 ACAAGTTCTACGACTACCATAATTATTTGTCAATTGACTATAATCACAACTAAGATTAAC 12780
12781 CGGAGAACAATAGTAACGCTTGTCCATATTACTTCAACGATTACGACAATACGTCTTATT 12840
12841 ACTCAACTACGGAGTATTTAATTTTTATGTTCAACAATTATCACCAAGACTATACTTAAC 12900
12901 ATTATAAGGATGAGTTACAATAATATTATTACCATCATCACCATCTTATCAAATACGACA 12960
Sheet 82 of 119
12961 AGAATCACTACAACTACCAGAATTCATATGATTCTATTACTTTCTACTACCTTTAACACA 13020
13021 ACAAAATCTCGAACTAGGAGGAACATTTAAAAGATATGTTCTACAATTCCCTGAATTTTA 13080
13081 ATTCATAGAAATAAAATAATTTCCTACATTGTGAAATCGATCTCCCACCCAACAACCATG 13140
13141 AAATAGAAGTTGTTAATCTAACGTCCGACCACAACGATGACTCATACGTCGATTAAGAAG 13200
13201 ATATGAAAGTAATACACGTAAAAGACATCTAGGATTCTTTTGAATAAATCTAATATATGT 13260
13261 TGTTCCACCACATGGATATTAATTAACACAATTTTACGAGACACTAGTACGACCATGACC 13320
13321 ATACCGGTAATGATAATTTGGACTCCGATGATAATTGGTTCTAAGAATACCACCACGGAG 13380
13381 TCAAACATAAATAACGGCACGTGCACATCTCGTAGGTCTACATCTACCATATACATTTAA 13440
13441 TGCACCATTTAAACATGTTCAGGGAAACCCATATTTTCTAGGATAAGAAATACACAATTG 13500
Sheet 83 of 119
13501 TGTACTACAAACAGTTCAGACACCAAAAACCTCTCTACCGTCAACAAGGACACATCCAAG 13560
13561 TTCACAGCGACAAGTTAGATTTCTAAATTTAAAAAATTTGCCCAAGCCCCATGATCACAC 13620
13621 TTACGGGCCGATCATGGGACACGATCACCAAATAGATGACTACAAGTTAATTCCCGTAAA 13680
13681 CTGTAAACATTATGGTTATCTCGACCATATCCAAATATAATATTTCACTTAACAACGGCA 13740
13741 AAAGTCGCATATCTACTGCTGCCATTATTTAACCTATTCAAGAAACAACAGTTTTCTTGA 13800
13801 TTAAATCTTCAAATATTATTTCTCTTTTGAATAATACTCAACTGATTTTCAACACCACAA 13860
13861 CACCGACTTGTACTAAAGAAATGTAAACTATAACTACCATCAGCGCACGGTGTATATCAA 13920
13921 GCATCCTTAGAAAGTTTCATATGATACAATCTAGAAACGATACGTAACGCAGTAAAACTA 13980
13981 GCATTACTAACAAGTTATAACACACTTTAAGAAACACTCATACGACTAACATTTCTTAGG 14040
Sheet 84 of 119
14041 ATGAAAACATTCTTTCTAACCATACTAAAACAACTTTTAGGACTATAATAATTATATATA 14100
14101 TTTTTTAATCCGGGATAAAAATTATCTCGAAATGAATTATGACAGTAAAAACGTCTGTGG 14160
14161 AATCAACTTCATCCAAATCAACCACAAAATTGAAATCTATTGGTTCTAAACATACCAGTT 14220
14221 ACCATACTAAAACCACTAAAATATGTTTGTCGGGGTCCCAAACCACACCGTCAACGTCTA 14280
14281 AGAATGATAAGAATATACTACGGATACAACTGATACACAGTACATAATCTAACACTTAAT 14340
14341 AAACAATTACTATCAATATCTGTTAAGCTAGAACATGTCATACTAAAATGACTAATGTTC 14400
14401 AATCTCAACAAATTATTCATAAAATTCATAACCCCATACTTCATAGTAGGATTATGACAC 14460
14461 CTAACACTATTACTATCCACATAATAAGTAACACGATTAAAATTATATGATAAATCATAC 14520
14521 CAAAATGGATTATGAACAAAACCAGGGGAACAATCTGTTTAAAAACATCTACCACATGGC 14580
Sheet 85 of 119
14581 AAACAACAAAGATAACCAATGGTAATGTTTCTCAATCCACATCAATACTTGAATCTACAA 14640
14641 CTGTGTGTGGCAATAGCAAACAGAGAATTTCTAAATGAAGAAATACGTCGTCTAGGACGA 14700
14701 TACGTGCAACGTAGACGATCACGAGACGAACTAAATGCTTGAACAACAAAATCACATCGA 14760
14761 CGGTAATGTTCACCATATTTTAAAGTTTGACATTTTGGTCCATTGAAATTGGTTCTGAAA 14820
14821 ATGCTCAAACAATTTTCATTTCCGAACAAATTTCTCCCATCATGTCAACTAAACTTTGTA 14880
14881 AAAAAGAAATGAGTTCTACCATTACGACGTTAATGACTAATATTAATAATATTCATATTA 14940
14941 AATGGATGATACCAACTATAATTCGTCAATAACAAACATAATCTTCAACAAATATTTATA 15000
15001 AAACTTTAAATACTACCACCAACATATGGTCGTAGTGTTCAATAACAATTATTAATACTA 15060
15061 TTTTCACGACCAATAGGTAAATTATTTAAACCATTTCGGTCTGAAATAATACTCCGTAAT 15120
Sheet 86 of 119
15121 AGTAAACTCCTTGTCTTACTTTAAATACGTATATGATTTGCATTACAAGACGGGTGGAAT 15180
15181 TGAGTTTACTTAAATTTTATACGATAGTCACGATTCTTATCTCGAGCGTGACATCGTCCA 15240
15241 CAAAGATAAGAATCATGATACTGTCCGGCTTACAAGGTAGTTTTTACAAACTTCTCATAT 15300
15301 CGTCGATGGGCTCCACAAGGACAACAATATCCTTGGTGATTTAAAATACCACCAACCCTG 15360
15361 CTATACAATGCAGTAGAATATTTCCTACAACTGTTGCGACAAGAATACCCAACCCTAATA 15420
15421 GGATTTACACTAGCACGATACGGTTTATAAAACGCATAACAATCATCAAATCAAAACCGG 15480
15481 GCGTTTGTACTTAAAACAACAAGTGTACCACTATCTAAAATAGCGGAACGCTTACTTACA 15540
15541 CGAGTTCAAAACTCACTTTATCAATACACACCGCCAACGATAATACAATTCGGACCACCA 15600
15601 TGATCGTCACCACTACGTTGATGACGAAAACGATTAAGACAAAAATTATATACAGTCCCA 15660
Sheet 87 of 119
15661 CAATGACGATTACAAACAAGAGAATACCGGACATTACCGGTATTCTAACTTCTAAATTCA 15720
15721 TATGCGTTAAATGTTTTTGCGAATATGAGATTACAAATAGCATGTCTAATACAACTAATA 15780
15781 TGTAAACAATTACTCATAATACTTAAAAATACATTCGTAAAATCATACTACTAAAACTCA 15840
15841 CTACTACCACAACAGACAATATTGAGACTAATACGATCATTCCCAATATATCGATTATAT 15900
15901 TCACAAAAAGTTGTTCAAAACATGATAGTCTTATTACAGAAATACAGACTTAGATTTACA 15960
15961 ACCCAACTTTTACTATAATGATTACCAGGAGTACTTAAAACAAGGGTTGTATGATACAAT 16020
16021 CAATTCTATCTACCACTAATACAAATAAATGGTATAGGTCTAGGAAGATCTTAAAATCCT 16080
16081 CGACCAACAAAACAACTACTAAATAACTTCTGACTGTCACAAGAAAACTATCTCGCGAAA 16140
16141 CATTCAGATCGATATCTACGAATGGGAAATCATGTAGTACTTTTACTTCTTATGGTTTTT 16200
Sheet 88 of 119
16201 CAGAAAGCACATATAAATCTTATATATTTTTTTGACATATTACTAGAACCATGACTCTAG 16260
16261 AATCTATCAATATCACAATAAAATTCATGAACACTACCAAATTTCAAATGACTTCTTAGT 16320
16321 AAAATGTTCTTATACATAAATTTTTCACGGCACTACGTCTCACATCCACGTACGCAACAA 16380
16381 ACAAGTAGTGTTTGAAGAAACGCAACACCGTCAACATATGCATTCGGAAACAATACAACA 16440
16441 TTTACAACAATACTGGTACAATACCGTTGATTAGTATTTATACAAAACTCACAGAGTGGA 16500
16501 ATGCAAACATTACGTGGATTGACACTACACTCACTACAGTGGTTTAATATAAACCCGCCA 16560
16561 TACAGAATGATAACACTTTTGGTATTTGGGGTAATAAGTAAATTCAATCAATACTTACCA 16620
16621 TACCAGAAACCAAACATATTTGTTAGAACGTGCCCAAGTGGAATATATCTACTAAAATTA 16680
16681 TTCTATCGATCAACATTTACCTGTCTTCAACTACTAATACAAGACCGTTTACTCACATAA 16740
Sheet 89 of 119
16741 CTTGCAAATTTCAATAAACGACGTCTTTGAGTTTTCCGTTGACTTCTCCGAAAATTTGTT 16800
16801 TCGATACGAAGACGATGGTAAGTTCTCTAACAATCACTATCTCTTCAATAAAACACAACC 16860
16861 CTCTGTCCATTTCAATTTGGTGGTGAATTATTTTTAATACAAAAGTGTCCGATGGTAAAA 16920
16921 TGATCATGACCATTCTGTCAAAATCCACTCATACAAAAACTATTTTCACTTAATTGATTG 16980
16981 CCACACATAATGGCGCGATGTTGATGAATATTTGAAAGATATCCACTACAAAAACAAAAT 17040
17041 TGTAGTGTAAGACATCGATCAAATTCACGTGGATGTGAACAGGGTGTTCTCTTGATACGA 17100
17101 TCATATTCTAAAAGATCACAAATATCACAAGGTAACCACAAAGTTTTATTACAACGATTA 17160
17161 ATAGTCGTGTAACCTTACTTTGCAATAACGTGACAAGTTCCAGGGGGACCATGCCCTTTC 17220
17221 AGAGTAGAACGATATCCAGATCGACAAATAATGATGTGTCGTGCACATCAAATATGACGA 17280
Sheet 90 of 119
17281 CGATCAGTACGACGACATCTACGTAACACACTTTTTCGAATATTCAAAAATTTATAATTG 17340
17341 CTAACATGTGCATAATAAGGACGATTTCAAGCACATCTAACAATACTATTCAAATTTTAA 17400
17401 TTACTATGGTGAACATTCATACAAAAATGGTGTTATTTACGTAATGGTCTCAACCAATGT 17460
17461 CTATAACAACAACAACTACTTCAATCATACGAATGATTAATACTTAACAGACAATATTTA 17520
17521 CGAGCATAATTTCGATTTGTAATACATATATAACCTCTAGGACGAGTTAATGGACGTGGT 17580
17581 GCACACGACAACTCGTTCCCAAGAAATCTTGGATCCGTGAAGTTAAGATAATGATTTTAT 17640
17641 TACACAACAAATCCAGGACTATAGAAAAACCCTTTAACAATATCCACAGGATTTCTTTAA 17700
17701 CATCTTTGACAAAGTCGTAACCAAATACTATTATTTGAGTTCCGATTTTTACTATTATCA 17760
17761 AGTAATACAAAATTTCATATAAAATTCCCTGTCTGTTGTGTACTCTCAAGTTCACGACAT 17820
Sheet 91 of 119
17821 TTATAAGTTGTCTATATAGATTAATCATTTAAAAATTTTCGATTAGGTCAAACCTTATCA 17880
17881 CGACAAAAATAATCAGGAATATTATCAGTCTTAATACAACGATTCGCACAAAATCCACAA 17940
17941 GTTTGTGTTTGACATCTAAGACGAGTTCCAAGCCTTATACTAATACAATATATAAGTGTT 18000
18001 TGTCGTCTTTGTCGGGTAAGACAATTACAATTAGCTAAATTACAACGGTATTGATCTCGG 18060
18061 TTCTTCCCGTAAAAAACACAATACTCATTATACGTTAATAAACTTAGAGAATTAAAATAA 18120
18121 TGAGATGGAAATCTATTTTAAGTTTTAGTTTGAAATGGAGCAAACGTAACGTGTTGATTA 18180
18181 GAAAAATTTCTAACATCATTTTCAACGAATCCAATAGTAGGTCGCGTACGGGGGAGTAAA 18240
18241 AATCGTCAACTACTATTTATATTCCAATTACTTTTAAACCGACATTTAAATTTATAAACA 18300
18301 CTTGGACAAAATTGTATAAGAGCAAATTATAGAGAATACCCAAAATTTAATCTAAACTGA 18360
Sheet 92 of 119
18361 GAACTACCAATAAGATTTAACAAATAATGATTTCTACTTCGGTAATTTGCACAATCTCCA 18420
18421 ACCCAACCAAAACTACAACTCCCGCGAGTACGATGAGCGCTTTTGTAACCTTGTTTGAAA 18480
18481 GGTGACGTTTATCCAAAAAGTTGACCACACCTAAAACATCAACTTCGATGACCGAATAAA 18540
18541 CGACTCTCTCTAACAATATGAAAATTTTTTTGACATCGATTTCGAGGAGGACCACTTTTT 18600
18601 AAATTTGTAAATTATGGGGAATACAGTTTTCCAGTTTTCACCCTATAACAATCTTAATCT 18660
18661 TAACAAGTTTACAATAGACTAATAGAAAATCTGGAAAGACTATCACATCATAAATAATGA 18720
18721 ACCAGACGGTCAAAACTTGAATGAACAAATTCCATAAAACGATTTAATCCGTCTCTCGAA 18780
18781 TTAACATTACACACAAGATTAGCACGATGTACGATGTTAAGATCTTGACCAATAATACCA 18840
18841 ACAACCGCGGTATCAATATGAACACTAATACACATATTAGGTGAATAACATCTATATGTT 18900
Sheet 93 of 119
18901 GTCACCCCAATATGTCCAAGAAATTGATCATTAGTGCTATATTAAACATTACATGTATTT 18960
18961 CCACGTGTACAACGCAGTCGACTACGTTAATACTGAGCAACAAATCGTTAGATACTAACA 19020
19021 AAAACATTTAGACAATTAACCTTAAATCTCATAGGTTATTAAAGATTACTCCAGTCATAT 19080
19081 TTATGTAGAACATCCAATAACGTCGCACAGTACGAATTTCGACGGTACGATACATTATCT 19140
19141 ATGTTGAATACAATACTGTATCCGTTAGGATTTCCAAATCGAACACAGTTTCTAATACTT 19200
19201 AAATTTAAAATACTACGAAAAGGACATCGGTTCAGACAATTTGTCAATAAAATACAGATA 19260
19261 CTACACGTATTTCTATTAAAATTTCTACCAAATACATACAAAACCTTAACATTACAACTA 19320
19321 TTTATAGGTAGATTAAGTTAACAAACATCTAAACTGTGAGCTCACAATTTATTTAATTTG 19380
19381 GAAGGACCTACATTACCACCATCAAACATACAATTATTTGTACGTAAGGTATGATTAGGA 19440
Sheet 94 of 119
19441 AAATGATCTTGACAAAAACTTTTAGAATTCGGATACGGAAAAAAGATAATAAGTCTATGC 19500
19501 GGAACACACATCCATCTACCAAATCTTAGATTTGTTCAACTAATGCAAGGAAATTCTTCG 19560
19561 CGGTGAACATAGTGTGCCACATTAGATCCACCTCGACAAACAAGTTTCGTACGACTTCTT 19620
19621 ATAACATTGATGGAACTCAGAATATTATATCAATGATGTCGTCCGAAATGAAAAACCCAA 19680
19681 ATATTCTTAAAACTAAAAATATTAAATACCTTGTGAAAATGATGCAATGTCTCAAATCTT 19740
19741 TTGCATTATATATTGAACCAATTACAACCAGTAATACTACCTGCATGTCCACTTAATGGA 19800
19801 ACACGATAATACTTACTGTTTCAACAACAATTCTAATTATTACATCTATGACAATAAAAA 19860
19861 TTTTTATTATGTAGTAAAGGATGATTATATCGACAACTTAACAAATGTTTTGCATCATAG 19920
19921 GCCGTGGTGGGACTTGAATTCTAAGAATCTTTAAACTTGTAACTATAAACAACCTTCGTA 19980
Sheet 95 of 119
19981 CAGGACACCCTAATACAATTTCTATCAAACAAAACATCAAGGTGAATACCACAAACATTT 20040
20041 ATGTGTCTAAACTTCAAGTAGCTTTTAAACTTATATGAAAAACTACCAGCACTGTGACCG 20100
20101 CGAAATCTTCGAAAATCTTTTCGTTCTTTACCACAAAAATAATCATGACTTTTTAATTCA 20160
20161 TCCAATAGTTACTAATTTCCAGGCGTTGCTCGACTAAATTTACCACACTAACACCTATTT 20220
20221 CAACCTCTTGAGTTTCAACTCAAAACCAAGCGATACTCTTTTCTACCACTGCTACAATAG 20280
20281 AAGTCGGCTTGTCTGTCGGATACGAGTTCGGTAATGACCTCGGGTGTTCCATTAGATCCA 20340
20341 CCATTAACGCGCCCATTACAGTAACCATTACTACGAGATTGTGCAAAATGATAGAAATGA 20400
20401 GTCTCAGCACATAACAGTTCAAAACTTGGAGCGAGTCTAAATCTTGCCCTAAAATAACTA 20460
20461 TACCTACTATTAGACAAATAACGATTTATACCAAATCTTCTGATACGTAAACTAGTATAT 20520
Sheet 96 of 119
20521 CAAATACCATCAAAATTGGTATTTCAATATCCTCCAAACGTAAACGAATATCCGAATAAA 20580
20581 GCATCCTTTTTTTTTAGATTAAACAATTAAGTTCTCAAAAATGTCATACTAAGATCATAA 20640
20641 GTAAGTATAAAATAATGACTAGTCCTCACACCATCATCATTCTCACAAACATGTCAATAA 20700
20701 CTAAATAATAATCTACTAAAACAAAGATAACAATTCAGTAATTTAAACTCAACACAATCA 20760
20761 TTTCAACAATTATAATTACAACTAAAATTCCTAAAAGTTAAATACAACACCACATTACTA 20820
20821 TTATTTTAATACTGAAAAATAGGATTTTACGTTCGGTGATTACTAACCTTTGGACCGATA 20880
20881 AGATACGGACAAAACATATTCATAAACTTACAAGGTAATCTCTCTCAGAGAAATACCTTA 20940
20941 ATACCATTTGGATAATTAAACGGATGTCCGACATACTACTTACAACGATTCATGTGAGTT 21000
21001 AATACAGTCATAAACTTATGATGTTGTAATCGACAAGGACAATTATACGCACAAAATGTA 21060
Sheet 97 of 119
21061 AATCCACGTCCCAGACTATTTCTTCATCGAGGTCCAAGACGACAAAATTCTGTCACCAAT 21120
21121 GGTAGACCATCATAAGAACATCTATTACTAAATTTGGGTAAACAATCGCTATCAAATCAA 21180
21181 TGAATAAAACCTCTAACATACTGAAATGGTAAACTAACAGTAACCCTAAACTATTATAGA 21240
21241 CTATACATACTAGGAGAATGATTTTTATAACCACTAATATTACACTCATTCCTACCCAAA 21300
21301 AAATCAATGTAAACACTAAATTAACCACTATTTAATAGAAACCCACCATCACATCGATAT 21360
21361 TTTTAATGTCTCAAAAGAACCTTACGACTAAATATATTTAATTACTCAACAAAACGTAAA 21420
21421 ACCTGTCAAAAAACATGATTACATTTACGAAGAAGATCACTTCCCAAAAATTATCCATAT 21480
21481 TTAATGGACCCATTTAGAAGAAAACTTTATCTACCGTTACAATACGTACGATTGATAAAC 21540
21541 AAAACCTCTTTATCATGTTGTACCTTACCGCCACGAATATCAAATAAACTATACTGATTT 21600
Sheet 98 of 119
21601 AAAAGAAACTTTAACCGACCGTGACGACAACAATTAAATTCTGGTCTAGTTAATTTACTA 21660
21661 AATCAAATAAGAGAATAACTTTCTCCATTTAATAATCAAGCGCTATGCGCATTTCTCTAA 21720
21721 AAACAACCACTATCAGAACATTTATGAACAATCTAGAGTAATTTAGATTTGATACAATTA 21780
21781 ATAAAAAAATAAAAAAATAAAGACAATACCAAAATTACTTGGAGAATTACAACACAGAGT 21840
21841 AAATTTGGTACTGACCAAAAATAAACCACTATCAGCAAGACTAACATTGGTATAATTATT 21900
21901 AAATTTTTAATTTTTAAAACTAATAAACCTATAAGTGGGATCAAACACGTTGTTACCATT 21960
21961 CTAAAGTAGATCACGGCCACTAAGATAAAAATTCTCAAAAGTAAAGTGAGCTAAAATATT 22020
22021 AATGTGACCGCTTCCACTAGTTTAATAAAAAATACTCCCACAATTAAAATTAGGAATAGT 22080
22081 ATCTAAATTCACAAAAGGATTACCATCATTACTACATACCGAAGAATTGTTCCATTCTAA 22140
U.S. Patent Jan. 10, 2012 Sheet 99 of 119 US 8,092,994 B2 22141 AATAGCACGGAATATAAGATTATACCGGAAAAAAGCAATAGAATGAAAACAACTATAAGG 22200 ED HR IN * Y PR K END * K Q Q YE KIT G * I RI HG K K TI K S K N I N * HA EYE L IA K KR * R V K T S I G 22201 AATATTACAAAGAGAAAGATTCAAATTAAGAACATTTTCACTATAAAATAGTGAATTGTT 22260 KY H K ER * T * N KY F H Y K I V * C RI IN R KR L K I R T F TIN * * K V * L T E HE L N L E Q L L S I K D S L L 22261 AGGATAAAAATAATTAATAAGATTCCTTCAAATAAAATGAAATAATCCAACAAGAGAAAT 22320 D * K * * N N * P L EN * K II. N NE K I RN K N il RI. F N I K S * * T T R K G I K I L * EL ST * K V K NP Q ER 22321 AAATCATGGCGAAACGGAAAAATTTAGATTGAAATCAGTCATGATAATATTGTATCTATG 22380 N L V A KG K * I * S * D T S N Y CI. Y I * Y R K A K K FR V K T LVII V Y I K T G S Q R K L DL K L * Y * * L MS V 22381 ACCGAGACAAATACCAAAAACATTACAACAAATAGGACTAAATCTGACATAAATATAAAG 22440 Q S Q K H N K * H Q K D Q ML S Y KY K S A R NIT K RI N N I RI * VT N IN PET * P K EL T T * OS K S Q I * I E 22441 AGAATTTGGTCCAAGAATATTTCAAAGGTGGTGACGTGGAAAAAATAGGAATGGATGATT 22500 E * V L N KY L RE W Q V K RI R V * * R K FE T RI F N G G SC RE * G * R S HI. G P E * L T E V V A GRE DR G V L 22501 TCGAGAGACAAAACTATTTAGATTTGTTAAACATGGACATGTCCAACAACTAAGATCTAC 22560 L E RN Q TI * VI Q V Q VP Q Q N * I F SET K IF R FL K Y R Y L N N IRS AR Q K S L DL C NT CT CT T SE L H 22561 CTTGTTGCTCGCACGGACTCTATAAAGAAATAGACAACGTACAGTTAACGGTATAACAAT 22620 S C R A RELY K K I Q Q MD IA MN N P V V L TO * IN R * RN C IL Q WIT FL SR A ES I E RD TA H * NC Y Q * 22621 AAAAGCGTTAAGAAGACGATTAATACAA.CCGTTCATACTATAATTGGTGCCACTATCACC 22680 NEC N K Q * N H Q CT MY * GREY H IRA I HR 51 1 N A LII N V V TI T K R LEE A L * T PLY S I LW PS L P I ate rsxi
U.S. Patent Jan. 10, 2012 Sheet 100 of 119 US 8,092,994 B2 22681 AAAATAAAGATAAAATAGACCAGAAAATATATTACAAAGAACATAAAGTATAATACCACA 22740 N * K * K I Q D K I Y H K KY KM N H H T K N RN * R T K * I I NE TN * I IT K I E I K D PR KY L T E Q I E Y * P T 22741 TAAAAATATACTATTAAAATGTAGGTAAACCGGGATAATAAGAAAACCATCCACAGGATG 22800 I K I H Y N * MW K A RN N K Q Y T D Y K * I I I K C G N P C II R K T PT R N K Y S L K V D M Q G * * SIC P L HG V 22801 TAGAAGATAATAATTTGTAGGTTAAACACAAATACTAAAAAACGGATAATAAAATGTTCC 22860 M K * * * VOL K H K H N K A * * K V L C R RN N FM W N TN I I K Q RN N * L D Eli L C G I Q T * S K K CII K C P 22861 ATAAAATAATACAAATCGAAATGAAAAACAACAAAAAGATAATAAAAACAATATATTGCT 22920 Y K II ML K V K Q Q K El I K T I Y R TN * * T * S * K K N N K * * K Q * IV I K N H K A K SK T T KR NH K NY L S 22921 ATTTAGAGTAATTTAGATTTGTACAATAATTAATAAAAATAAAACGGATGTTGTAATCGA 22980 VI EN FR F MN N il K I K G V V N A IF R ML DL C TI L * K * K A * L ML L D * * I * V H * * N N K N Q R CC * S 22981 CAATATCCACTAAAATTAACATGATTAAAACGATAATTACTAAATTTGTGGTGTCAAGGA 23040 TIPS K L Q V L K A IL SK F V VT G Q * L H N * NY * N Q * * H ML C W L E NY TI KIT S IR S NI I * V G C N R 23041 GCGTATTCACTCATACAACACCTACAAAGAATACCAAACCCATGTATAATATATGAACTA 23100 RN L S VT T STE * P K P V Y * 165 E CL H T H Q PH K K H NP Y MM Y V Q A VT L I N H IN R IT Q T CII Y K I 23101 GCACAAATAAATTTATGATGATATAATAAATGACCAATAAAGGGATTTAGACCACGGTTA 23160 R T * K F V V INN VP * KG LID PAL OH K N L Y * * II * Q NM G * IQ H W T NI * IS S Y * K S TIER FR PG I 23161 AAATCCCTAGATAGAAATTTTCCATGATGTATAAACTCATGAGAAACCATAGTCTTTGGG 23220 K L SR D K F P V V Y K L V R Q Y * F G N * P D I K L L Y * MM S YE K TO S V K 91 * R * FT SC I Q T S K P IL F G FIG. 3 CONT'D
U.S. Patent Jan. 10, 2012 Sheet 101 of 119 US 8,092,994 B2 23221 AAAAATAGACTAAAATTATTACCATAAAAAAGATCTCAATTCTTATGATTCAACATACAA 23280 K K D S K L L PIKE L T L F V L NY T R K I Q N Y H Y K K * L * S Y* VT H K * RI K II TN KR S N L I S L Q IN 23281 TTATTTTGAAACATATCACTCAAATCATGATATCAATATCCATCACAAAAATAATTGTTG 23340 L L V KY L S N LVI TI P L T KILL * Y F K T Y H T * Y * L * L Y H K * * C IFS Q IT L K T S Y NY T TN K NV V 23341 AGAATATGATAACAACAAGTTGGAGTATTACCACAAAACCTCTAATGTCGAACAGTTATG 23400 E * V ITT * G * L PT K S I VA Q * Y S K Y* Q Q EVE Y H H K P S * L K DI R I S N N N L EMIT N Q L N CS T L V 23401 TGATACACACTCATAGGAGTATGATAAACATTTAGATTTCCATCAAGAGCATTACTTAGA 23460 VI H S Y G * VI Q L DL P L EELS D C * T H T D E Y * KY I * L Y NE Y HI S H T L I R M S N T FR F PT R T IF R 23461 ACCGTAAAACTATTTAGACTTGGAAACACAGACAAGTTCTTTTTAAAATGAATATTACAA 23520 Q C KS LOSS K HEN L F F K V * L T K A N Q VI Q V K TOT * SF N * KY H PM K I FR FR Q T Q ELF I K S I I N 23521 AGATGTCTAACCAACATAAAAGTAAAAATAGTTCTTGCACCGTGAAAAATACGAATAATA 23580 E VS Q NY K * K * * SR P V K * A * * K * L N VT N E N K DLV H C K K H K N R CI P Q I K M K IL F TASK ISII 23581 CGACTAAGACCGTACGGATGATGAAAAAATAAATCAAACATAGAACCATGAGAAAATAGA 23640 A SEP MG V V K K N L K Y R P V R KS H Q N Q C A * * K K I * N T D Q YE K I SIR A HR S SK * K T Q I K T SK * R 23641 GTAATAATACAAAACGGAAACTGAACATTACGATATAGAAGATTATGACTATTACTCTGA 23700 * * * T KG KY Q LA ID EL VS L S V EN N H K A K SKY H * I K * Y Q Y HE M II N Q R Q ST IS IRE ISII L S 23701 AATGTTATAACCCAGTGTGGAAACAGATTTGCGGTTATAGAAGAATTTAAACTGTTGGCA 23760 K C Y Q TV GE DL RN Y R ELMS L R K YIN P * V K T * V SIDE * I Q CS * LIP DC R Q R F A L I K K F K V VT
U.S. Patent Jan. 10, 2012 Sheet 102 of 119 US 8,092,994 B2 23761 CCACAATAATGATTACGACAACTAACAAGATCATCAAAGAAATCGCTCTAAGTTACATTT 23820 PT IV LA T S Q E ILK K L S I* H L H H * * * H Q Q N N * Y NP * PS SIT TN N S IS NIT R T T E K AL N L T F 23821 TGATTTAGAAATAATGGATTATGACCACAAATACTGAATAGACCAAAATGACAATTCGGA 23880 V L OK NO L VP T * S K OP K VT L C F * I K I V * Y Q H K H S I Q N * Q * A S PR * * RI ST N IV * PT K S N L R 23881 CAACGTTGACATGTAGCAGCATAAGGACTAAATGGACTAACACTGTAACTATTTACCGAA 23940 TA VT C R RIG S KG S Q S M S L H S Q Q L Q V D DV E Q N V Q N H C Q VIA N C S Y MT TN RI * RI TV N IF P K 23941 TTGTTAAAATTACATGGGAGTGGAGAATTAACCCTTGCATTTTAAAAAAGATTAACGTTG 24000 L L K L T GE OR L Q SR LIKE L Q L * C N * H V R V E * N P VT F K K * N C VIP ITO * R K I PS T F NE R IA V 24001 AAATTAAACTCATGAAACGAAGCAAATCAAGTATGACTAAGAAAAAGAACATTATTAAAA 24060 K L K L V K SR K T * VS EKE 0 IL K S * N NY K A E N L E Y Q N K K KY Y N K I Q T S Q K T * N MS I R KR TI I K 24061 CTACTTAGATTCTATATACCATCAACAAAATTCTCATAACAAAATCTATTTWCGGTAT 24120 S SD L I Y P L Q K L LIT K S L N AM Q HI * S I H TN N * ST Q K LVI Q W ISP L Y ITT T KIT N N * IF KG Y 24121 GGGTTGAGGTCTGCTAGACTAAACGTCAACCCGTCAAGACCAAAAGACGTTAGAAGATTA 24180 OLE L RD S K C N PIE P KR CD EL V W SW VI Q NAT PC N Q NE A I K * G V G S S RI Q L Q AT PT K Q L PR I 24181 ATATTTTAACTGTGATGAAGATCAAGAACAGTTAACATAATATCAAACGGACGTTAATTA 24240 * L I S V V EL E Q * NY * L K GAIL NY F Q C * K * 14K DI TN Y N A Q L * IF N VS SR T PT L Q I I T Q PC 141 24241 CAATGATAATTATTAATATTAGGAAGAAGAACCTTATCTTCCATACCAAAATTATTAAAA 24300 TV ILL * L GEE Q FL L Y P K ILK H * * * Y NY DR K K ST FT H N * Y N N S N III I KR PP ISP IT K I I K
U.S. Patent Jan. 10, 2012 Sheet 103 of 119 US 8,092,994 B2 24301 TTAAACTCGAGAGTATCACAACAAATGAGTGCAATAACAAAAAGACAATTATTATGAAAA 24360 L K L E * L T T * E R * Q K E T L L V K * KS SE Y H Q KS V N NW K Q * Y Y K I Q AR MT N N V * TI T KR N I I 5K 24361 ACAGGAACACGATTTGGAAGAAAACGAAGTTCAACGTTCTCAGTATTTGGTGGAAGACGA 24420 Q G Q AL GE K A EL Q L L * L G GE A K D K H * V K K Q K L N Cs D Y V V K Q T R T SF RE KS * T AL TM F W RE S 24421 AGGACAGGATAACCATQATTAATAGCAAGAACACTCTCATGATGACATGAGCTGGTGTGA 24480 E Q G I P V L * REQ S L V VT SS WV KR D* Q Y* ND N K H S Y* QV R CC GTE NT S I IT R T L T S S YE V VS 24481 CTGACCACATCCACAAGAACAAATGGACTAGGATATTGACGAATACTGGGATCCAGAACA 24540 S Q EL H E Q KG S CI V A * 5 G L D Q Q STY TN K NV Q D * L Q K HG * T K V P T PT R T * R I R Y S S I V R PR T 24541 AGAGTTTTTTTCAGAGACCAACCACAACCACTTGTAACACGTCCCAAGCCACAACTACTT 24600 E * F F DR T PT PS C QA P N PT S S NE F FT E PQ H Q H V N H L TEN Q H ELF L R Q N TNT F M T C PH TN I F 24601 CTTTTCACACCACATAACCTACCTAGTATATTACAAAGAACAGAAACATCATGACTACGG 24660 SF H PT N S PD Y L T E Q R Q L V S A L FT H H I P HI MY H K K D KY Y Q H FL T T Y Q I S * II KR T K FT S I G 24661 AAAGATCCAACCAGAATACTGTGAACGCAGTCATTGTTGGCAACATTATAAAAAAGATTA 24720 KR P Q D * S V Q T L L L R Q LIKE L EEL NT K H C KR * Y CC NY Y K K * K * VP RI VS AD TV VT TIN K RI 24721 AAATAAAATTTACCATAGTTATCACCATGGTGAACAAGATTACTAAATAACGTCGGATTA 24780 K I K P PILL P V V Q EL S K NC CL N * K L H Y * Y NY W K N * H N IA A * K N * IT D I T T G STE I I * Q L R I 24781 TGACTTCAAAAATGACTACAAACACAACTAATGCTGGAAATACCATAATGTCCTGTTCCA 24840 VS T K VS T Q T S * SR * P I V PC P Y Q L K * Q H K H Q KR G K fl 3' * L V L SF N K S I NT N IV V KIT NC S L T icizni
U.S. Patent Jan. 10, 2012 Sheet 104 of 119 US 8,092,994 B2 24841 TAAAAATTTCTTCAAAGACGACAAATAATATTATCAACCGTTTTAGAAAACATACTAAGA 24900 I K L STE AT * * L L Q CF R KY S E Y K * L L K Q Q EN Y Y N A F D K THU N K F FUR S N I I I T P L I K Q I I R 24901 TTACCGTTGTAATAACCAAAATTTCTAAAACAATGATTATTTTGTATATTATAAAAGGGA 24960 L P L NIP K L S K TV L L VT L I K G * H CC * Q N * L N Q * * IF MY Y K G IA V N NT K F I K N S I F CII NE R 24961 ACAATACGTCCTTCTCAAAGACGACGAAAAGTAGTTTTACGAAGGAGAAACCGAAATGAA 25020 Q * A P L T E A A K * * F A HE K A KS K N H L FL K Q Q K ED F H KR K P K V TICS S N R S S KM L I S G R Q 5 * K 25021 ATAGCATTAAATTTTACATCGATACAAAACTTATTATAAAGAAATTGATGAGTCGGTATA 25080 * R L K F H L * T K F L I E K V V * G Y K D Y N L I IS H KS Y Y K K L * E AM IT I * F TA IN Q II N R * S S L WI 25081 AAACTATCAATAGAACCAACGCAAAAATTACGACTATTAAATTGACTAATAAGACAAAGA 25140 K S L * R P Q T K LAS L K VS * EVE N Q Y ND Q N R K * H Q Y N L Q N N Q K KIT I K TANK 1911 * S I I RN R 25141 AGAACACGAGAAGCGTACCCATCACCAAAAACACAACTAATATTGAGTGGAAGAAGAAGG 25200 E Q A HR NP L P K Q VS * LEG E HE K K H HE C PT H UK H Q NY S V K K K R T S K A H T T T K TN III * FR KG 25201 AGAAGCGCAGCATTTGCATCTTCATAAAGACGAAGAATAGCAAAACAATGAAAACTTGGG 25260 E ERR L R L LIE A E * R K TV KS G R K A D Y V Y F Y K Q K K D N Q * K Q V R R T T FT ST N KS RI T K US K F G 25261 AAATTACAGTCAAAACAATTACTGTCATAACTCAGACACCCACCAGAAATACTCTAGTTT 25320 K L T L K T L S L I EDT P P R * S I L R * H * N Q * H C Y Q TQP ND K H S KID T K NI VT N L RE TV K IL OF 25321 TAAGGGTGATTGAAATGATATCAACCAGTTCTCCTTAAATAAGTTTGATTAAGAGGATTT 25380 I G V L K V IT P * ES NI * V LEG L FEW * S * * L Q DL P I * E F * NE N GE V KS INTL L F KU L S I KR F
U.S. Patent Jan. 10, 2012 Sheet 105 of 119 US 8,092,994 B2 25381 CAATGATAACTAACAAGAAATAAACAGACAAGATTAATACGTCGAACGGTACTGAATAAC 25440 T V I S Q E K NT Q E L * A A Q W S K N L * * Q N N K I Q RN * N H L KG H S I MS NIT R * HOT R I I Cs A MV * Q 25441 AGTCTCATACCGTGAAAAACACTATTATAATTATCATAAAATCTACTTCAATTACCAAAT 25500 OS Y P V K Q S L I L L I K S ST L P K T L T SC K K NY Y * Y Y K L H L * H N * L I AS K TI I N IT N * I F N IT 25501 GAACTATGATGAGTTAACGTACATCGACTATGAGAATACGTTCCACAGTGTGAATCGAGG 25560 S S V V * N C TA S V R IC PT VS L E V Q Y * E IA H L Q YE * AL H * V * S HI S S L Q MV S I S K H L T DC K AG 25561 TTAGAATTATGATTAAACGTAAAACTACAACTATTATAATTAAAATTTAGGGATCAACCT 25620 L R L V L K C KS T S L I L K L D R T P W D * Y * N A N Q H Q Y Y * N * I G L Q I K IS I Q M K I N I I NIH F G * N S 25621 ACAAATCCAGGTGTGACGCCAAGAAGAAGAGCAAGAAAAAAACTTCTAAATAACAAACTG 25680 H HP G C Q PEE E RE K KS S K N N S IN L DV SR N K KR NH K Q L NIT Q T * TV VAT R R R T R K K F I * Q K V 25681 TTTCAATTTGAAAGTCTACAACCAAAACAACTTCGAATATTGTTAACATGACCACCATCA 25740 L T L SE ST PH T S A * L L Q VP P L CL * V K L H Q N Q Q L KY C NY Q NY F N F K * I N T K N F S I VI T ST T T 25741 CTTTAATCTCTAGAAGAAACACATGTTAGGAAATTACCATAATTTCAAAACGGAGGATAA 25800 S I L S R R Q T CD K L P I L T KG G I H F * L DR K H VI R * H Y * L K AS * F N S I K K T Y L G K I TN F N Q R. RN 25801 AACAGACTTAGAGTTTAAAGACCAATGTGGTGTCGGCGATGACAACGACGATACAAAGGT 25860 HO SD * IS P * V VA A VT A A IN G K T Q IS P K Q N CV L R * Q Q Q * T E Q R FR L NP. TV G C G SS N S S H K W 25861 GGTACCAGTCGTCGTCGACCGTATGGTAAAAGAGAATTACATGTTATATCTTAATTACCA 25920 G SD A A A PM G N ER L T C Y L I L P V MT L L L Q CV S H E * H VI Y F * H YJ P * CC SAY W KR K IV L I SW IT
U.S. Patent Jan. 10, 2012 Sheet 106 of 119 US 8,092,994 B2 25921 AACCCACAATGATACCTACAAGAATTATTTTTAGTTTTCAACTATCGATGACGAAAATTA 25980 K P TV I ST R Lb F * F N I A V A K L N P H * * P HE * Y F D FT S L * Q K * Q TN S H I N K IF ILL Q Y S S S K I 2S981 TTACGAGAAGAAAGATAAGTCTTACCAAAATCACGATGGTTGAGACGTGAACGATTTTAT 26040 LA R RE I * F P K LA V LEA S A L I Y HE E K * E S H N * H * W S Q V Q * F I S K KR N LIT K T SO V R C K S F Y 26041 GTTTCACAACAATTAAGATTACGAGTTCGTGAATTATCAAACAATGTCGTTAATAAATTA 26100 CL T T L EL A * AS L L K FCC N N L V F H Q * N * H EL V * Y N T VA I I * L TN N IRIS L C KIT Q * Lb * K I 26101 TTTAAACCACGTTAATCAAGAAGAAATGTTCTTTAAAATAGAGCAGAGCTACGAAATCTC 26160 L N PAIL E E K C S I K DR R SAKS VI Q H L * N K K V L F K IS DR H K L F K T C NT HR * L F N * R T E I S * L 26161 CGAGTCCAAGTCTAACTATCCGAATAATTACCAGCAAATTGACGAAATTTACGAATACAG 26220 A * T * IS L S I L PR K VA K F A * T PEP E S Q Y A * * ND N L Q Kb H K H S L N L N I P K N ITT * S S * 121 D 26221 AGAGTTGTCGAATCACTATAAAGAGAACATTTTAAACCACGACGAAATCGATACCTCTTC 26280 E * CS L St ER T F N PA A K A ISP REV A * H Y K E Q L I Q H Q Kb * P S R L L K TIN R KY F K T S S * S H Lb 26281 CAATTACTCACACAATTTTCAGTTAGAGGAGCATAATTAAAAACACCATTACCATTAGTA 26340 T L S H T L L * 130 RI L K Q PS PS P * H T H * F DIE E Y * N K H Y H Y D NIL TN F T L HR TN I K T TI TIM 26341 TAAAACAGTAATCAAGTTTTACGAGGAATACCAAACAACAAATACGTAAAATCAATATTT 26400 11< D NT * FAG * P K N N IC K L * L Y K TEL E FEE K H NT T * AN * NY N Q * * N L I SRI T Q Q K H M K TI F 26401 GGATAAAGAAAATTTTGACAAAATCATTCAGGACCAAACACATATAGTCCACTACATCCA 26460 G 1 EEL VT K T L OP K H ID PS VP V * K K * F Q K Lb D Q NT Y IL H EL RN R K F S N * VT R T Q TV * TI VT
U.S. Patent Jan. 10, 2012 Sheet 107 of 119 US 8,092,994 B2 26461 TAACGTGGATTTGTTCCCATAAAATAATTTGTATTACTAGTAACCTACAAGTGACCATCA 26520 I AG L C PIN IL CL $ * Q I N V P L Y Q V * V L TN * * V Y H El N S T * Q Y N C R FL P I K N FM I I M P HE ST T 26521 AGAATGATAATAGGACTTGGTTAAAGTCTATTTTTACAACAAAAATACTTATGAACAAGA 26580 E * * * G S G I ES L FT T K I F V Q E N K S N D Q V L K L Y F H Q K * S Y K N R VII R F W N * IF INN K H I S T R 26581 CAATTAAAATGATTTCGCGGAGAACAAATAAACTTAGTAAGACATGGTTTTAACAGACTA 26640 T L K V LAG FT * K F * E T G F ND S Q * N * * LA E Q K N SD N Q V LIT Q N I K S FR R K NI Q I MR Y W F Q RI 26641 AAACTTAGACTCAATAGACTAACCAATTTTTACTTTGTAGGTAACGCGGATTAAACTGA 26700 K SD S ND * Q N L. F * V D M A CL K V N Q I Q T I E NT * F D FM W Q A * N S K FR L * R HP K F IL CC N FRI Q S 26701 AATTTAGAAGTATGATAATTACGATGAAAAAATCTAAACATAATACTCTACTTAGAATAA 26760 K F R * V IL A V K K S K Y * S I FRI K L D E Y * * H * K K L N TN H S SD * * I K M S N ISSK * I Q I I L H I K N 26761 GTTCTCAGATAATTCAGAAACTTATTATCAATATAGTTAGAATTTCTATATCCATGTATA 26820 * SD IL D K FL L * IL FL SIP V Y EL T * * T KS Y Y NY * D * LILY M L L RN L R Q II TI D I K Fl VT CI 26821 CTTTACATACATTTTACCGGAACCATACAAACCGATGATTAAAGAAAAAGTAAATATTAT 26880 S I Y T F H G Q Y T Q 581 EKE N i l H FT H L I A K T H K A V L K K KM * L F H I Y F P R PIN P * * N R K * KY y 26881 AAGGAACATARCGAGAAAAAATATACAACAACATGACCAACACCAAGACGTACAAAATCA 26940 N R TN SK K I H Q Q VP Q PEA H K L IS Q IA R K * INN Y Q NH N Q MN * E K Y Q EN K VT T T ST T T R CT K T 26941 TTTACAGTATTAACAACACTACTCATACCACCAGTAGTACTAAAACAATAGTTTTGTAGA 27000 L H * L Q Q S S Y P P * * S K TI L V D Y ID Y N N H H PH H D D H N Q * * F M F TM ITT IL ITT MM I K ND F C R
U.S. Patent Jan. 10, 2012 Sheet 108 of 119 US 8,092,994 B2 27001 GTACTACTAATCTTAGAGAACAGTCTAGAGTAATTTAGATTTGAAATAAATACCTGCAAA 27060 * S S * FR K D SR M L DL S * K H V N E H H N SORT ID * * I * V K N I S T M I I L I E Q * I EN FR F K I * PR K 27061 CCTCTGGATCGATGTGTGTAAGAGAACAATAATCTCTTAAACCACAATGTTTGGAACTTC 27120 PS R A V CM R K N N S F K T N CV K F Q L G L * V C ER TI L S N PT V FR S S V * S C V N E Q * * L I Q H * L G Q L 27121 TAAACACAGATTTCATATTAATGACAGTTGGATAACAACCAATGACATAACATGGAAATT 27180 I Q T * L I I V T L RUNT VT N Y R S K HR F Y L * Q * G IT P * Q IT C K NT DL T Y N SD V * Q Q N S Y Q V K L 27181 TACAAACCACAGCGTTCAAACCGTTTAAACGAAGAGTGAAATGTAATGCATCACTGCTAT 27240 IN PTA L K A F K SR V K C * T TV I FT Q HR L NFL NAT * K V N R L * S H K T DC T Q CI Q K ES * MV Y DRY 27241 AAAGGGTATCATTATTAAAACCACAACATTGATCAAAATGATGAATACCATTATGACAAA TGATGAATACCATTATGACAAA 27300 N GM TI I K TN Y ST KS S IT IS N I E W ILL K PT TV L K V V * P L VT KG Y Y Y N Q H Q L * N * * K H Y Y OK 27301 GACTCCGACACAGATCTAATCAACTTAGTCGAAGACTTAAATAACAAACCCCACGTCTCC 27360 R IS HR S * N F * SR F K N N PT CL ES AT DINT SD A ES NIT Q R AS Q P Q T * II Q ILK Q I * Q K A H L P 27361 GTGAATTATTCATACCAACTAAATAAAAAGTTACTATGACGAACCATGTATCCTGTCTAA 27420 C K IL ITS K UK L S VA Q Y M PCI A S L L Y P Q N 11< * H Y Q K T CLV S V * IT H NI * KS 1155 P V VS IN 27421 AATCAAAATCAAAATAAAACAGAATAAAGAAATTAGAAACAACAACGAAAAAATCGTTGA 27480 K T K T K N Q RISE I K T TA K K A V K L K L K I K D * K K L R Q Q Q K K IL * N * N * K T K NE * D K N N S K * CS 27481 TAATTCGAAACATACGTTGAAACACCAAAAACATTAAACAAATAATAAAGTGGAAGCCGA 27540 II S Q IC S Q P K Q L K K II E GE A * * A KY A V K H UK Y N R * * K V K P NI K T H 114 T T K TIE K UN * KR S
U.S. Patent Jan. 10, 2012 Sheet 109 of 119 US 8,092,994 B2 27541 ATGCAAATATTTTCTCCATACGTCAACATATTCAGAATATCACTTGTTCAATATGGTGGG 27600 * T * L L P I C NY L D * L S C TI G G KR K Y FLY AT T Y T KY H V L * VT V N I F ST H L Q IL KIT FL NY HG 27601 TGAAGTCTAATAAATTAGATTTAGATTTGTAATACTTATTTAGAAAAGAAGGAGTTAAAT 27660 VHS * K I * I * V N H IF R K KR L K W K L N N L R FR FM I F L D KR G * N S * I I * DL DL C * S Y IKE E HI 27661 GAAGACTAGTTCGACAATGTAAGAATTTTCTTACCTTAAAGAGAAACCCACATTATGATG 27720 SRI L S NCR * F F P I ER Q T Y Y VHS * AT V N K F S H F K E K PT I S K Q DL Q * MR L L I S N R K P H L V V 27721 AAAAATAATGATAGTATAACGTCAAGCCAATATGCTCGGCATCATACAAACAAATAGAAT 27780 K K N S D Y Q LET I RAT T H K NI K S K IVI MN C NP * V L R LINT * R K * * * * IA T RN IS G Y Y T Q K D 27781 AGTTCTACTAATAAGAAACCGAATACACCGGTAACTGATAGTGGAACTGATATAAATTAA 27840 DL H N N K P K HP W Q SD G Q S Y K I IL I I I R Q S I HG N V I V K V IN L * S S * E K A * TAM S * * R S * I * N 27841 CAAAAATACGAAACTTATTACGAAAAGAACGTAAAAGATATCACAAATGATAATAAAGAT 27900 T K IS Q I I S K K C K R Y H K S N N R Q K * A K F LA K RAN HIT N V I I E N K H K S Y H K E Q M K * L T * * * K 27901 AACAATATACCTAAGAAATAAAACAATTATCATAAGCCGAAAAATAATCTTGACCGTCAA 27960 N NY P N K I K NIT NP K K N S S AT I TI HI R * K T L L I R SKI L V P L Q * I S E K N Q * Y YEA K * * F Q C N 27961 CCACCTCAAAATTAGGTCTCTGGTTATTAGAATACACATAACTATACTTTCCGTTCTACA 28020 PP T K I W L G I I K H TN I H F AL H Q H L K L G S V L L RI HIS I F P L I T S N * DL SHY D * T Y Q VS L CS T 28021 AACAATCCGGTCAATAACTCCTGATAGTGTGTAATTGACGATGACAATAAGCACCAGTAG 28080 K N P W N N LVI V C * S S S N NT TM NT L G TI 59 * * V N VA V TI HP * Q * AL * Q P 5 DC ML Q * Q * E H D D I LE1)I 1J
U.S. Patent Jan. 10, 2012 Sheet 110 of 119 US 8,092,994 B2 28081 AA ATATATGTCCCACAGTTTGAACCGTGACCAATATGAGAAAGTCTAAACGGGCATATAC 28140 K I Y LTD F K A S TI S K * I Q G y I R * IC PT L S P VP * V RE S K G T Y KY VP H * V Q C Q N YE K L N A RI H 28141 AATGACATCGATTCCACGTTCATGAAACATGGATATTTGCACGGAAAAATCTATTCAATC 28200 N S Y S L DLV K T GIFT G K * I L TV T ALT CT S Q V * LB A K KS L N * Q L * P A L V K Y BY V H R K L VT L 28201 TACAATTATCACCAAAACGACAAAAACAATTCAGATTTCAACCATTGATAGCAAATGGCA 28260 IN ITT K S N K N LB F NT V IT * R ST L L P K AT K T L DL T FL * R KG H * Y H N Q Q K Q * T * L Q Y SD N VT 28261 GATCATTTGGATCACCATACCTATGACGGAACAATTCTCGAATTTAGATTTGATAATCCT 28320 R T FR T T H IS G Q * S S L DL S N P DL L G L P I S VA K N LA * I * V IL * Y V * H VP Y Q R T L L K F R F * * S 28321 ACAGAATATGAGGGCCAGTAATACGACCTTCATCTTCGAGGAGACCTTTAGCAAGTCCTT 28380 H R IS G T MIS ST SAG R SIT * S ID * V G P * * A FL L LEE P F R IF T K YE RD N H Q F Y F SR Q F D N L F 28381 AGGAGTTCTTTTGAAGAACCCGACTGGTTAGACTCGCTTTAATGGTTTGGAAATTATCTC 28440 DEL F S R P S V LB L S I V L G K IS I R L F V E Q AS W D SR F * WV K L L G * SF K K P Q G I Q A F N G F R * Y L 28441 CGTCTTTTTGGGTTGGATTTAAGTGACACAGATGAGTTGGTGTTCCTTTATGATAGGGTG 28500 A S FG LB F ES HR S LW L S I SD W P L F V W G L N VT DV * G C P F VI G CF F G V * I * Q T * IV V L F Y * G V 28501 TAATAAGGACCAAGAGGCCCTAGTGAGTTAAAGTTTTTCCATCTCTGAAATTTAAAAGTC 28560 MI G PEG PD S L K L FT S VHF K C * E Q NE P IV * N * F P L S K L NE N N R TB R S * El IF L Y L S * I K L 28561 TACCAGTTCCTCAAGGGTAACGAAAGCCTCATGGGGGAAGACTTCGTTTTCCTATAACCA 28620 IT L S N G N SES Y GB R F CF SIP SF * FT G MA HP T G GE S A F P Y Q H DL L E W Q KR L V G K Q L L LINT
U.S. Patent Jan. 10, 2012 Sheet 111 of 119 US 8,092,994 B2 28621 TATCTGTGTCGGCCGCAAGAAAATTTTGTCGACTACCAGTTGTTTTCGTCAACAATGGCT 28680 IS V APT R K F C SIT L L L L Q * R Y L CL R RE K L V AS P * CF C N N G Y V C CAN K * FL Q SD V FAT TV S 28681 CTACCATAAAGATGATAGAGCCATGGCCGGGTATACGGTTACGTAGGATACCACTTAGGG 28740 SPIN VI ST GA WIG IC 0 1 T F G L MY K * * R P VP G Y A LAD * PS D IT ER SD R Y R GM NW H MR H H I G 28741 AGCTTCCCCAGAAGACCCAACGATTAGTGGTTCGACTGTGAAGATGAGGGAGGCTACAAA 28800 E F PD NP N S I V L S V SR S G GIN R S PT K Q TA L * WAS VS VS ES T AL PR R P Q * D G L Q C K * ER R H K 28801 GCAGTTCCCTAGGATGATGAGTTCTTCGATAGGGATGATCCAAAGGCGGACCATGCTAAA 28860 R * P I R S S L F SD R S P KR R T RN ED L SC V V * S Al G V LEG SF VI T L PD * * EL L * G * * TEA Q VS K 28861 ACGGAGTTCCGATAATACAACTTCCGAGTCCTTCCAGACGAAGATTATCAGCTGGTCCAA 28920 Q R L A ll N F A * 595 SRI T SW T KG * P * * TSP E PLO A EL L R CF A EL S N SQL S L FT Q K * Y DV L N 28921 GTGCAAGAGTTAGTGCACCTGGGTTATTAGCAAGTAATTCATCTTCATTAAGATTAAAAT 28980 * T St * PEG I I T * * T ST I RI K ER E * DR PG L L REEL L L L E L K V N NI V H V WY ON ML Y F Y N * N 28981 CTGTAAGTCTAAGATATCATTTTGGACTATACCGACTACTCTAGCGATTAGAACAAAATC 29040 S M * I R Y Y FRI H S I L D S I K N L CE S SI T F G S IA S S IA L R T K V N L N * L L V Q VP Q H SR * D Q K L 29041 GGTTCGAACCATTTCTAAGATTTGGAGTCGTTCAGTGATTCGTTTTACGGTTCCTTTAGT 29100 G L K T FIR FR L L DEL L I G L F D AL SF LEE L G * CT V L CF AL S I WA Q Y L N * VEAL * * A F NW P F 29101 CCGTATTTTAAAATTGTTTTGGAGCGGTTTTCGC'PTGAGGATTATTTGTAACATTACAAG 29160 PM F N * C FR ALL S SRI F MT IN L CL I K V F G R W FR V G L L C Q L T A Y F K L L V NSF A F E * Y V NYSE
U.S. Patent Jan. 10, 2012 Sheet 112 of 119 US 8,092,994 B2 29161 TTGTCACAAAACCATTTTCTCCTGGAAGAGTTTTAAAACCATTACGACTTTACAATTTCG 29220 L L T K T F S SR ELI K TI S F H * L * C H K P L L PG E * F K P L A SIN F VT N Q Y FL V K E F N Q Y H Q FT LA 29221 AACCATGATTACTAGGAGTCAAAGGATAAGAACGTCTTAATCGAGGATGTGGTCCACGAA 29280 K TSI I R L K RN K CF * SR C VT S S P V LEG * NC IRA SNAG V G P A Q Y * ED E T E * E Q L I L E * V L H K 29281 AAAAGAAACCAAGATTTAATCTGAACCAATTTTCTCTAAGGCTCCGACTGAGTGGACAAT 29340 K EXT R F * V Q N F S I CL S V * RN K K K P EL N SET L L SE GA SECT KR Q N * IL S P * FL NE P Q S V Q 29341 TTCTACAAAAACTTGAAGTAATAAGACCAAGATAATCCAAACTATCATGAAATGGTCCGA 29400 FINK F K M I R T R NP KITS * WA LET K S $ * * E P El L N S L V KG P L H K Q V EN N Q N * * T Q Y Y K V L S 29401 AACTCTGTTAATACTTTCAAGAACTTCTCTTAAATTTACGAATGCAATTAAGATTAGTCT 29460 K L C N H F N K F L I * ISV N I R IL KS VII F TESS F K F A * T L EL * Q S L * $ L E Q L S N L H K R * N * D S 29461 TGTGACTAAGACTAAGCAACTCAAGATTTGGAGTCGCATTTTCTCCACAATTTGTTAATG 29520 VS I R I R Q T R FELT PET NFL F VS ES EN L EL G * EL L PT L C N C Q N Q NT S N * V E A Y FL H * VI V 29521 GTCTTGTCAAACTGAGAGAATTAAATTCACGACCATGAGTCGTGTAAAGTTTACTAAAAT 29580 W FL K V R K I * T STE L V N * I I K G SC N SEE L K LA P V * CHEFS K L VT Q S E * N L H Q YEA C K LEN 29581 GAGGACTCCTAGTATCAAATGAACGATGAGAACTACTAGGAATACATCTTCTGAGACAAC 29640 EEL IN T * K SEE II RI Y F V RN V CS S * LEE AVE E G G * T SEE T E Q PD Y NV Q * E Q H D K H L L S Q Q 29641 GAATTACTCTTACTTAGGATTAAGCTGTGATCCACCATTGGGGAGCGATAATAAGCCTTA 29700 EL SFS D * ES V L H Y GE A ll R F A * ES H I R I R C * T TV GE * * E S K IL IF CL EVE P PLC ES N N PT
U.S. Patent Jan. 10, 2012 Sheet 113 of 119 US 8,092,994 B2 29701 TCCTGTGAGAGATAGTCTTACTTAAGAACGACATTATTGTCTATCTCATCCAACAATGTC 29760 L VS E I L I F S Q Q L L L Y L L N N C Y S V R * * F S N K S Y Y C I S Y T TV P CE RD S H IRA T I V S L T P Q * L 29761 TGATATATAATTAATCATCTTTAAAATATAAATCTGTAAACTAACAATCTCATCAATATT 29820 V I Y * N T S I K Y K S M Q N N S Y N Y S * I NIL L F K I N L C KIT L T TI S Y IL * Y F N * I * V N S Q * L L * L 29821 CCAAATCGACATCATATTTGCGGAGGCCCTTCTCGATAGTTAACATCACAAATTATATAT 29880 P K AT T IV G G P LA IL Q L T * Y I L N L Q L IF A E P FL * * NY H K IV T * S Y Y L R R R S S.S D I T T N L I Y 29881 ATAATCATATACTAACTTTAATTAATATCGGAAAACCTCCTTAATGTTTTTTTTTTTTTT 29940 Y * Y I I S I L * L R K S S N C F F F F IN T Y S Q F * NY G K P P IV F F F F IL. I H N F N I I A K Q L F * L F F F F 29941 TT 29942 F F
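FIG. 3 pairs each line of nucleotide sequence with deduced amino acid sequences. The sketch below shows one way such translations can be computed with the standard genetic code (NCBI translation table 1). It is illustrative only. It does not attempt to reproduce the figure, whose strand and frame layout cannot be recovered from the scanned text, and the function names (`translate`, `six_frame_translation`) are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in
# the conventional T, C, A, G order, which matches the amino acid string below.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def normalize(seq: str) -> str:
    """Upper-case the sequence and write uracil (U) as thymine (T)."""
    return seq.upper().replace("U", "T")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return normalize(seq).translate(_COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one forward reading frame (offset 0, 1 or 2).

    Stop codons are shown as '*'. Codons containing ambiguity codes such
    as N are shown as 'X'. A trailing partial codon is ignored.
    """
    seq = normalize(seq)
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(frame, len(seq) - 2, 3)
    )


def six_frame_translation(seq: str) -> dict[str, str]:
    """Translations of the three forward and three reverse-complement frames."""
    rc = reverse_complement(seq)
    frames = {f"+{f + 1}": translate(seq, f) for f in range(3)}
    frames.update({f"-{f + 1}": translate(rc, f) for f in range(3)})
    return frames
```

For example, `translate("ATGTTTTAA")` returns `"MF*"`, and the `"-1"` entry of `six_frame_translation("ATGTTTTAA")` is `"LKH"`. A dictionary lookup with an `"X"` fallback keeps the translation total even when the input contains ambiguity codes.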
[FIG. 4: Genome organization of CoV-HKU1 (29,942 nt). ORF 1a (206 to 13,600) and ORF 1b (13,600 to 21,753), joined by a frame shift, are shown with predicted cleavage products and domains, including p28, p65, PLP1, PLP2, AIR, 3CLpro, p10, p12, p23, replicase and helicase. Downstream ORFs: hemagglutinin esterase (21,773 to 22,933), spike glycoprotein (22,942 to 27,012), ORF4 (26,960 to 27,070), ORF5 (27,051 to 27,380), small envelope (27,373 to 27,621), membrane glycoprotein (27,633 to 28,304), nucleocapsid protein (28,320 to 29,645) and ORF9 (28,342 to 28,959). Amino acid sequence of AIR: {NDDEDVVTGD}14 NNDEEIVTGD NDDQIVVTGD.]
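FIG. 4 gives each downstream ORF as a pair of genome coordinates. Below is a minimal sketch of converting those pairs into nucleotide and predicted protein lengths. It assumes the coordinates are 1-based and inclusive and that each range ends with a stop codon. The pairing of labels to coordinates is read from the scanned figure, and the names used are illustrative only.

```python
# Coordinates as read from FIG. 4 (1-based, inclusive). Assumption: each
# range starts at the first base of the start codon and ends at the last
# base of the stop codon.
STRUCTURAL_ORFS = {
    "hemagglutinin esterase": (21_773, 22_933),
    "spike glycoprotein": (22_942, 27_012),
    "ORF4": (26_960, 27_070),
    "ORF5": (27_051, 27_380),
    "small envelope": (27_373, 27_621),
    "membrane glycoprotein": (27_633, 28_304),
    "nucleocapsid protein": (28_320, 29_645),
    "ORF9": (28_342, 28_959),
}


def orf_length(start: int, end: int) -> int:
    """Number of nucleotides in a 1-based, inclusive interval."""
    return end - start + 1


def predicted_protein_length(start: int, end: int) -> int:
    """Amino acids encoded, assuming the interval ends with a stop codon."""
    length = orf_length(start, end)
    if length % 3:
        raise ValueError(f"{start}-{end} is not a whole number of codons")
    return length // 3 - 1


for name, (start, end) in STRUCTURAL_ORFS.items():
    print(f"{name}: {orf_length(start, end)} nt, "
          f"{predicted_protein_length(start, end)} aa")
```

Under these assumptions, the spike interval (22,942 to 27,012) spans 4,071 nt, which is 1,356 amino acids followed by a stop codon. Every interval listed is a whole number of codons.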
[FIG. 5A: Phylogenetic trees based on 3CLpro, replicase and helicase of CoV-HKU1 and other coronaviruses, including HCoV-229E, HCoV-NL63, PEDV, PTGV, SARS-CoV, HCoV-OC43, BCoV, MHV and IBV. Groups G1, G2 and G3 are indicated. Scale bar: 0.1.]
[Further phylogenetic trees of coronaviruses, including BCoV, HCoV-OC43, HCoV-229E, HCoV-NL63, PTGV, FIPV, PEDV and IBV, with groups G1 and G2 indicated.]
[FIG. 6: Schematic of the CoV-HKU1 spike glycoprotein (S1, S2, S1/S2 cleavage site, HR1, HR2 and TM domains) and amino acid alignment with HCoV-OC43, MHV, SDAV, BCoV, PHEV and ECoV.]
[FIG. 7: CoV-HKU1 in nasopharyngeal aspirate (NPA; copies per ml) and serum IgG antibody titre (1:1000 to 1:8000) plotted against days from onset of illness (0 to 35).]
HUMAN VIRUS CAUSING RESPIRATORY TRACT INFECTION AND USES THEREOF

CROSS-REFERENCE TO RELATED APPLICATION

This application is a continuation of U.S. application Ser. No. 10/895,064, filed Jul. 21, 2004, now U.S. Pat. No. 7,553,944, which is hereby incorporated by reference in its entirety.

SEQUENCE LISTING

The Sequence Listing for this application is labeled "seqlist.txt", which was created on Jul. 21, 2004, and is 1,548 KB. The entire contents are incorporated herein by reference in their entirety.

1. INTRODUCTION

The present invention relates to a novel virus causing respiratory tract infection in humans ["coronavirus-hku1 (CoV-HKU1)"]. CoV-HKU1 has been identified as phylogenetically similar to known members of the Coronaviridae. The present invention relates to a nucleotide sequence comprising the complete genomic sequence of CoV-HKU1. The invention further relates to nucleotide sequences comprising a portion of the genomic sequence of CoV-HKU1. The invention also relates to the deduced amino acid sequences of the complete genome of CoV-HKU1. The invention further relates to the nucleic acids and peptides encoded by and/or derived from these sequences, and to their use in diagnostic and therapeutic methods, such as for immunogens. The invention further encompasses chimeric or recombinant viruses encoded by said nucleotide sequences and antibodies directed against polypeptides encoded by the nucleotide sequence. Furthermore, the invention relates to vaccine preparations comprising recombinant and chimeric forms of CoV-HKU1, as well as protein extracts and subunits of said virus.

2. BACKGROUND OF THE INVENTION

In January 2004, a 71-year-old Chinese man was admitted to hospital because of fever and chills for two days, associated with sore throat, rhinorrhoea, productive cough with purulent sputum, headache and nausea. He had a history of pulmonary tuberculosis more than 40 years earlier, complicated by cicatrization of the right upper lobe and bronchiectasis with chronic Pseudomonas aeruginosa colonization of the airways. He was a chronic smoker and also had chronic obstructive airway disease, hyperlipidemia and an asymptomatic abdominal aortic aneurysm. He had just returned from Shenzhen, China, three days before admission. During his three-day trip to Shenzhen, he had no history of contact with or consumption of wild animals. On admission, his oral temperature was 37.6°C. Physical examination showed tracheal deviation to the right and inspiratory crackles over the anterior left lower zone. His haemoglobin level was 14.7 g/dl and his total white cell count was 12.1×10⁹/L (neutrophils 9.7×10⁹/L, lymphocytes 1.6×10⁹/L, monocytes 0.5×10⁹/L). His platelet count was 303×10⁹/L. His liver and renal function tests were within normal limits. Chest radiograph showed right upper lobe collapse and new patchy infiltrates over the left lower zone. Blood culture was performed. Empirical oral amoxicillin/clavulanate and azithromycin were commenced. Nasopharyngeal aspirates were negative on direct antigen detection for respiratory viruses, on RT-PCR for influenza A virus, human metapneumovirus and SARS-CoV, and on viral culture. Sputum bacterial culture recovered only P. aeruginosa. Sputum mycobacterial culture was negative. Blood culture was negative. Paired sera showed no rise in antibody titres against Mycoplasma, Chlamydia, Legionella or SARS-CoV. His fever subsided two days after admission. His cough improved, and he was discharged after five days of hospitalization. Amoxicillin/clavulanate and azithromycin were continued for a total of seven days.

The present inventors were the group involved in the investigation of this patient. All tests for commonly recognized viruses and bacteria were negative in this patient. The etiologic agent responsible for this disease was unknown until the present inventors determined the complete genome of CoV-HKU1 from this patient, as disclosed herein. Namely, the present invention discloses a novel human virus that has been identified in a patient suffering from pneumonia. The invention is useful in both clinical and scientific research applications.

3. SUMMARY OF INVENTION

The present invention is based upon the inventors' complete genome sequencing of a novel virus ("CoV-HKU1") causing pneumonia in humans. The virus was discovered in a patient suffering from pneumonia in Hong Kong. The virus is a single-stranded RNA virus of positive polarity that belongs to the family Coronaviridae of the order Nidovirales. Accordingly, the invention relates to CoV-HKU1, which is phylogenetically related to known members of the Coronaviridae. In a specific embodiment, the invention provides the complete genomic sequence of CoV-HKU1. In a preferred embodiment, the virus comprises a nucleotide sequence of SEQ ID NO:1 and/or 3. In another specific embodiment, the invention provides nucleic acids isolated from the virus. The virus preferably comprises a nucleotide sequence of SEQ ID NO:1 and/or 3 in its genome. In a specific embodiment, the present invention provides isolated nucleic acid molecules comprising or, alternatively, consisting of the nucleotide sequence of SEQ ID NO:1, a complement thereof or a portion thereof, preferably at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, 300, 350, or more contiguous nucleotides of the nucleotide sequence of SEQ ID NO:1, or a complement thereof. In another specific embodiment, the present invention provides isolated nucleic acid molecules comprising or, alternatively, consisting of the nucleotide sequence of SEQ ID NO:3, a complement thereof or a portion thereof, preferably at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1,000, 1,050, 1,100, 1,150, 1,200, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 21,000, 22,000, 23,000, 24,000, 25,000, 26,000, 27,000, 28,000, 29,000 or more contiguous nucleotides of the nucleotide sequence of SEQ ID NO:3, or a complement thereof. Furthermore, in another specific embodiment, the invention provides isolated nucleic acid molecules which hybridize under stringent conditions, as defined herein, to a nucleic acid molecule having the sequence of SEQ ID NO:1 or 3, or a complement thereof. In preferred embodiments, such nucleic acid molecules encode amino acid sequences that have biological activities exhibited by the polypeptides encoded by the nucleotide sequence of SEQ ID NO:1 or 3. In another specific embodiment, the invention provides isolated polypeptides or proteins that are encoded by a nucleic acid molecule comprising or, alternatively, consisting of a nucleotide sequence that is at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, 300, 350, or more
3 US 8,092,994 B2 contiguous nucleotides of the nucleotide sequence of SEQ ID sequence of SEQ ID NO: 1 or 3 or a fragment thereof, includ- NO:1, or a complement thereof. In yet another specific ing the polypeptide having the amino acid sequence of SEQ embodiment, the invention provides isolated polypeptides or ID NO:2 or SEQ ID NOS:34-291 8 shown in FIGS. 2 and 3, or proteins that are encoded by a nucleic acid molecule com- encoded by a nucleic acid comprising a nucleotide sequence prising or, alternatively consisting of a nucleotide sequence 5 that hybridizes under stringent conditions to the nucleotide that is at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, sequence of SEQ ID NO:1 or 3 and/or any CoV-HKU1 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, epitope, having one or more biological activities of a polypep- 900, 950, 1,000, 1,050, 1,100, 1,150, 1,200, 2,000, 3,000, tide of the invention. The invention further provides antibod- 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000, ies that specifically bind cells or tissues that are infected by 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, io CoV-HKU1. Such antibodies include, but are not limited to 19,000, 20,000, 21,000, 22,000, 23,000, 24,000, 25,000, polyclonal, monoclonal, bi-specific, multi-specific, human, 26,000, 27,000, 28,000, 29,000 or more contiguous nucle- humanized, chimeric antibodies, single chain antibodies, Fab otides of the nucleotide sequence of SEQ ID NO:3, or a fragments, F(ab')2 fragments, disulfide-linked Fvs, intrabodcomplement thereof. The polypeptides or proteins include ies and fragments containing either a VL or VH domain or those having the amino acid sequences of SEQ ID NO:2 and 15 even a complementary determining region (CDR) that spe- SEQ ID NOS:34-2918 shown in FIGS. 2 and 3, respectively. cifically binds to a polypeptide of the invention. 
The invention further provides proteins or polypeptides that In one embodiment, the invention provides methods for are isolated from the CoV-HKU1, including viral proteins detecting the presence, activity or expression of the CoVisolated from cells infected with the virus but not present in HKU1 of the invention in a biological material, such as cells, comparable uninfected cells. The polypeptides or the proteins 20 blood, saliva, urine, and so forth. The increased or decreased of the present invention preferably have a biological activity activity or expression of the CoV-HKU1 in a sample relative of the protein (including antigenicity and/or immunogenic- to a control sample can be determined by contacting the ity) encoded by the nucleotide sequence that is at least 5, 10, biological material with an agent which can detect directly or 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, 300, 350, or more indirectly the presence, activity or expression of the CoVcontiguous nucleotides of the nucleotide sequence of SEQ ID 25 HKU1. In a specific embodiment, the detecting agents are the NO: 1. In other embodiments, the polypeptides or the proteins antibodies or nucleic acid molecules of the present invention. of the present invention have a biological activity of the Antibodies of the invention may also be used to detect and/or protein (including antigenicity and/or immunogenicity) treat other coronaviruses, such as Severe Acute Respiratory encoded by a nucleotide sequence that is at least 5, 10, 15, 20, Syndrome ("SARS") viruses. 25,30,35,40,45, 100, 150, 200, 300, 350, 400, 450, 500, 550, 30 In another embodiment, the invention provides vaccine 600, 650, 700, 750, 800, 850, 900, 950, 1,000, 1,050, 1,100, preparations, comprising the CoV-HKU1 recombinant and 1,150, 1,200, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, chimeric forms of said virus, or protein subunits of the virus. 
9,000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, In a specific embodiment, the present invention provides 16,000, 17,000, 18,000, 19,000, 20,000, 21,000, 22,000, methods of preparing recombinant or chimeric forms of CoV- 23,000, 24,000, 25,000, 26,000, 27,000, 28,000, 29,000 or 35 HKU1. In another specific invention, the vaccine preparamore contiguous nucleotides of the nucleotide sequence of tions of the present invention comprise a nucleic acid or SEQ ID NO:3, or a complement thereof. fragment of the CoV-HKU1, or nucleic acid molecules hav- In one aspect, the invention relates to the use of CoV- ing the sequence of SEQ ID NO: 1 or 3, or a fragment thereof. HKU1 for diagnostic methods. In a specific embodiment, the In another embodiment, the invention provides vaccine invention provides a method of detecting in a biological 40 preparations comprising one or more polypeptides isolated sample an antibody that immunospecifically binds to the from or produced from nucleic acid of CoV-HKU1. In a CoV-HKU1, or any proteins or polypeptides thereof. In specific embodiment, the vaccine preparations comprise a another specific embodiment, the invention provides a polypeptide of the invention encoded by the nucleotide method of detecting in a biological sample an antibody that sequence of SEQ ID NO: 1 or 3, or a fragment thereof, includimmunospecifically binds to the CoV-HKU1 -infected cells. 45 ing the polypeptides having the amino acid sequences of SEQ In yet another specific embodiment, the invention provides a ID NO:2 or SEQ ID NOS:34-2918 shown in FIGS. 2 and 3, method of screening for an antibody that immuno specifically respectively. Furthermore, the present invention provides binds and neutralizes CoV-HKU1. Such an antibody is useful methods for treating, ameliorating, managing or preventing for a passive immunization or immunotherapy of a subject respiratory tract infections caused by CoV-HKU1 by admininfected with CoV-HKU1. 
50 istering to a subject in need thereof the anti-viral agents of the The invention further relates to the use of the sequence present invention, alone or in combination with various antiinformation of the isolated virus for diagnostic methods. In a viral agents as well as adjuvants, and/or other pharmaceutispecific embodiment, the invention provides nucleic acid cally acceptable excipients. molecules which are suitable for use as primers consisting of In another aspect, the present invention provides methods or comprising the nucleotide sequence of SEQ ID NO: 1 or 3, 55 for preventing or inhibiting, under a physiological condition, a complement thereof, or at least a portion of the nucleotide binding to a host cell, or infection of a host cell, or replication sequence thereof. In another specific embodiment, the inven- in a host cell, of CoV-HKU1 or a virus comprising a nucleic tion provides nucleic acid molecules which are suitable for acid molecule comprising the nucleotide sequence of SEQ ID hybridization to CoV-HKU1 nucleic acid, including, but not NO: 1 or 3 or a complement thereof, by administering to the limited to, as PCR primers, Reverse Transcriptase primers, 6o host cell the anti-viral agents of the present invention, alone or probes for Southern or Northern analysis or other nucleic acid in combination with other anti-viral agents. In a specific hybridization analysis for the detection of CoV-HKU1 embodiment, the anti-viral agent of the invention includes the nucleic acids, e.g., consisting of or comprising the nucleotide immunogenic preparations of the invention or an antibody sequence of SEQ ID NO:1 or 3, a complement thereof, or a that immunospecifically binds CoV-HKU1 or any CoVportion thereof. 65 HKU1 epitope and/or neutralizes CoV-HKU1. 
In another The invention further provides antibodies that specifically specific embodiment, the anti-viral agent is a polypeptide or bind a polypeptide of the invention encoded by the nucleotide protein of the present invention or a nucleic acid molecule of
5 the invention. In a specific embodiment, the host cell is a mammalian cell, including a cell of human, primates, cows, horses, sheep, pigs, fowl (e.g., chickens), goats, cats, dogs, hamsters, mice and rats. Preferably a host cell is a primate cell, and most preferably a human cell. Furthermore, the present invention provides pharmaceutical compositions comprising anti-viral agents of the present invention and a pharmaceutically acceptable carrier. The invention also provides kits containing a pharmaceutical composition of the present invention. 3.1 Definitions US 8,092,994 B2 The term "an antibody or an antibody fragment that immunospecifically binds a polypeptide of the invention" as used herein refers to an antibody or a fragment thereof that immunospecifically binds to the polypeptide encoded by the nucleotide sequence of SEQ ID NO: 1 or 3, or a fragment thereof, and does not non-specifically bind to other polypeptides. An antibody or a fragment thereof that immunospecifically binds to the polypeptide of the invention may cross-react with other antigens. Preferably, an antibody or a fragment thereof that immuno specifically binds to a polypeptide of the invention does not cross-react with other antigens. An antibody or a fragment thereof that immunospecifically binds to the polypeptide of the invention, can be identified by, for example, immunoassays or other techniques known to those skilled in the art. An "isolated" or "purified" peptide or protein is substantially free of cellular material or other contaminating proteins from the cell or tissue source from which the protein is derived, or substantially free of chemical precursors or other chemicals when chemically synthesized. The language "substantially free of cellular material" includes preparations of a polypeptide/protein in which the polypeptide/protein is separated from cellular components of the cells from which it is isolated or recombinantly produced. 
Thus, a polypeptide/protein that is substantially free of cellular material includes preparations of the polypeptide/protein having less than about 30%, 20%, 10%, 5%, 2.5%, or 1% (by dry weight) of contaminating protein. When the polypeptide/protein is recombinantly produced, it is also preferably substantially free of culture medium, i.e., culture medium represents less than about 20%, 10%, or 5% of the volume of the protein preparation. When the polypeptide/protein is produced by chemical synthesis, it is preferably substantially free of chemical precursors or other chemicals, i.e., it is separated from chemical precursors or other chemicals which are involved in the synthesis of the protein. Accordingly, such preparations of the polypeptide/protein have less than about 30%, 20%, 10%, or 5% (by dry weight) of chemical precursors or compounds other than the polypeptide/protein fragment of interest. In a preferred embodiment of the present invention, polypeptides/proteins are isolated or purified.
An "isolated" nucleic acid molecule is one which is separated from other nucleic acid molecules which are present in the natural source of the nucleic acid molecule. Moreover, an "isolated" nucleic acid molecule, such as a cDNA molecule, can be substantially free of other cellular material, or culture medium when produced by recombinant techniques, or substantially free of chemical precursors or other chemicals when chemically synthesized. In a preferred embodiment of the invention, nucleic acid molecules encoding polypeptides/proteins of the invention are isolated or purified. The term "isolated" nucleic acid molecule does not include a nucleic acid that is a member of a library that has not been purified away from other library clones containing other nucleic acid molecules.
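The dry-weight limits in this definition reduce to a simple ratio. The following Python sketch is illustrative only. It computes the percentage of contaminating protein in a preparation and reports which of the recited "less than about" tiers the preparation meets. It assumes that the contaminant mass and the total dry mass are measured in the same units, and it treats "about" as a strict cut-off, which is a simplification. The function names and example masses are hypothetical and do not come from the patent.

```python
# Tiers recited in the definition: "less than about 30%, 20%, 10%, 5%, 2.5%, or 1%"
# contaminating protein by dry weight. "About" is treated as a strict cut-off here.
DRY_WEIGHT_TIERS = (30.0, 20.0, 10.0, 5.0, 2.5, 1.0)


def contaminant_percent(contaminant_mg: float, total_dry_mg: float) -> float:
    """Percentage of the preparation's dry weight that is contaminating protein."""
    if total_dry_mg <= 0:
        raise ValueError("total dry weight must be positive")
    if not 0 <= contaminant_mg <= total_dry_mg:
        raise ValueError("contaminant mass must lie between 0 and the total")
    return 100.0 * contaminant_mg / total_dry_mg


def purity_tiers_met(contaminant_mg: float, total_dry_mg: float) -> list[float]:
    """Every tier X for which the contaminant percentage is below X."""
    pct = contaminant_percent(contaminant_mg, total_dry_mg)
    return [tier for tier in DRY_WEIGHT_TIERS if pct < tier]


if __name__ == "__main__":
    # Hypothetical preparation: 0.8 mg contaminating protein in 20 mg dry weight.
    print(contaminant_percent(0.8, 20.0))   # 4.0
    print(purity_tiers_met(0.8, 20.0))      # [30.0, 20.0, 10.0, 5.0]
```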
The term "portion" or "fragment" as used herein refers to a fragment of a nucleic acid molecule containing at least about 10, 15, 25, 30, 35, 40, 45, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1,000, 1,050, 1,100, 1,150, 1,200, 1,250, 1,300, 1,350, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 21,000, 22,000, 23,000, 24,000, 25,000, 26,000, 27,000, 28,000, 29,000, or more contiguous nucleotides in length of the relevant nucleic acid molecule and having at least one functional feature of the nucleic acid molecule (or the encoded protein has one functional feature of the protein encoded by the nucleic acid molecule); or a fragment of a protein or a polypeptide containing at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 90, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300, 320, 340, 360, 380, 400, 500, 600, 700, 800, 900, 1,000, 1,500, 2,000, 2,500, 3,000, 3,500, 4,000, 4,100, 4,200, 4,300, 4,350, 4,360, 4,370, 4,380 amino acid residues in length of the relevant protein or polypeptide and having at least one functional feature of the protein or polypeptide.
The term "having a biological activity of the protein" or "having biological activities of the polypeptides of the invention" refers to the characteristics of polypeptides or proteins that have a common biological activity, a similar or identical structural domain, and/or sufficient amino acid identity to the polypeptide encoded by the nucleotide sequence of SEQ ID NO: 1 or 3, or the polypeptide having the amino acid sequence of SEQ ID NO:2, or a complement thereof. Such common biological activities of the polypeptides of the invention include antigenicity and immunogenicity.
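Two passages depend on a percent-identity calculation: the "sufficient amino acid identity" language above, and the 70% to 95% identity levels used in the definition of stringent conditions. The following minimal Python sketch assumes that the two sequences have already been aligned to equal length, with '-' marking gaps. It divides the number of identical non-gap columns by the number of columns that are not gaps in both sequences. This is only one of several conventions in use, and other conventions can give different figures. The sketch computes identity only and does not predict whether two molecules will actually hybridize. The function names are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention used here (one of several): identical non-gap columns divided by
    all columns that are not gaps in both sequences.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    a, b = a.upper(), b.upper()
    columns = [(x, y) for x, y in zip(a, b) if not (x == gap and y == gap)]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)


# Identity levels recited in the definition of "under stringent conditions".
IDENTITY_LEVELS = (70, 75, 80, 85, 90, 95)


def identity_levels_met(a: str, b: str) -> list[int]:
    """Which 'at least N% identity' levels a pair of aligned sequences reaches."""
    pid = percent_identity(a, b)
    return [level for level in IDENTITY_LEVELS if pid >= level]


if __name__ == "__main__":
    ref = "ATGGCTAGCTAA"
    var = "ATGGCAAGCTAA"  # toy sequences: one substitution over 12 columns
    print(round(percent_identity(ref, var), 2))  # 91.67
    print(identity_levels_met(ref, var))         # [70, 75, 80, 85, 90]
```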
The term "under stringent conditions" refers to hybridization and washing conditions under which nucleotide sequences having at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, or at least 95% identity to each other remain hybridized to each other. Such hybridization conditions are described in, for example but not limited to, Current Protocols in Molecular Biology, John Wiley & Sons, N.Y. (1989), 6.3.1-6.3.6; Basic Methods in Molecular Biology, Elsevier Science Publishing Co., Inc., N.Y. (1986), pp. 75-78 and 84-87; and Molecular Cloning, Cold Spring Harbor Laboratory, N.Y. (1982), pp. 387-389, and are well known to those skilled in the art. A preferred, non-limiting example of stringent hybridization conditions is hybridization in 6× sodium chloride/sodium citrate (SSC), 0.5% SDS at about 68° C., followed by one or more washes (e.g., about 5 to 30 min each) in 2×SSC, 0.5% SDS at room temperature. Another preferred, non-limiting example of stringent hybridization conditions is hybridization in 6×SSC at about 45° C., followed by one or more washes (e.g., about 5 to 30 min each) in 0.2×SSC, 0.1% SDS at about 45-65° C.
The term "variant" as used herein refers either to a naturally occurring genetic mutant of CoV-HKU1 or to a recombinantly prepared variation of CoV-HKU1, each of which contains one or more mutations in its genome compared to CoV-HKU1. The term "variant" may also refer either to a naturally occurring variation of a given peptide or to a recombinantly prepared variation of a given peptide or protein in which one or more amino acid residues have been modified by amino acid substitution, addition, or deletion.
4. DESCRIPTION OF FIGURES
FIG. 1 shows a partial DNA sequence (SEQ ID NO: 1) and its deduced amino acid sequence (SEQ ID NO:2) obtained from CoV-HKU1 that has 91% amino acid identity to the RNA-dependent RNA polymerase protein of known Coronaviruses.
FIG. 2 shows the entire genomic DNA sequence (SEQ ID NO:3) of CoV-HKU1 and its deduced amino acid sequences therefrom in three frames. An asterisk (*) indicates a stop codon, which marks the end of a peptide. The first-frame translation and amino acid sequences: SEQ ID NOS:34-456; the second-frame translation and amino acid sequences: SEQ ID NOS:457-723; and the third-frame translation and amino acid sequences: SEQ ID NOS:724-1318.
FIG. 3 shows the complement (SEQ ID NO: 1319) of the entire genomic DNA sequence (SEQ ID NO:3) of CoV-HKU1 in 3'-5' orientation and its deduced amino acid sequences therefrom in three frames. An asterisk (*) indicates a stop codon, which marks the end of a peptide. The first-frame translation and amino acid sequences: SEQ ID NOS:1319-1907; the second-frame translation and amino acid sequences: SEQ ID NOS:1908-2453; and the third-frame translation and amino acid sequences: SEQ ID NOS:2454-2918.
FIG. 4 shows the genome organization of CoV-HKU1. Arrows indicate the putative cleavage sites of the polyprotein encoded by ORF 1a and ORF 1b. The peptides are shown in SEQ ID NOS:15-17, respectively, in order of appearance.
FIG. 5A shows the phylogenetic analysis of the chymotrypsin-like protease (3CLpro), replicase (Rep), helicase (Hel), and hemagglutinin esterase (HE); and FIG. 5B shows that of the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins of CoV-HKU1. The trees were constructed by the neighbor-joining method using the Jukes-Cantor correction, and bootstrap values were calculated from 1000 trees. A total of 303, 928, 603, 386, 1356, 82, 223 and 441 amino acid positions in 3CLpro, Rep, Hel, HE, S, E, M, and N, respectively, were included in the analysis. The scale bar indicates the estimated number of substitutions per 10 amino acids.
FIG. 6 shows the important features of the S protein of CoV-HKU1 (residues 7-336 of SEQ ID NO:420) in comparison with those of other viruses, i.e., HCoV-OC43 (human coronavirus OC43; SEQ ID NO:21), MHV (murine hepatitis virus; SEQ ID NO:22), SDAV (rat sialodacryoadenitis encephalomyelitis virus; SEQ ID NO:23), BCoV (bovine coronavirus; SEQ ID NO:24), PHEV (porcine hemagglutinating encephalomyelitis virus; SEQ ID NO:25), and ECoV (equine coronavirus; SEQ ID NO:26). The cleavage site peptides are shown in residues 752-766 of SEQ ID NO:420 and SEQ ID NOS:28-33, respectively, in order of appearance.
FIG. 7 shows the sequential quantitative RT-PCR results (closed squares; copies/ml) for CoV-HKU1 in nasopharyngeal aspirates and the serum IgG antibody titers against the N protein of CoV-HKU1 (closed triangles).
FIG. 8 shows the Western blot analysis of purified recombinant CoV-HKU1 N protein antigen. Prominent immunoreactive protein bands of about 53 kDa were detected by the Western blot using the patient's sera obtained during the second and fourth weeks of the illness (lanes 2 and 3). Only very faint bands were observed with the serum samples obtained from the patient during the first week of the illness (lane 1) and from two healthy blood donors (lanes 4 and 5).
5. DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to the CoV-HKU1 that phylogenetically relates to known Coronaviruses. In a specific embodiment, CoV-HKU1 comprises a nucleotide sequence of SEQ ID NO: 1 and/or 3. In a specific embodiment, the present invention provides isolated nucleic acid molecules of the CoV-HKU1 comprising or, alternatively, consisting of the nucleotide sequence of SEQ ID NO: 1 and/or 3, a complement thereof or a portion thereof.
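The trees in FIG. 5 are described as neighbor-joining trees built from Jukes-Cantor-corrected distances, with bootstrap values calculated from 1000 trees. The following Python sketch illustrates those steps on toy data under several stated assumptions:

- The input is an existing amino acid alignment.
- Columns with a gap in either sequence are skipped pairwise.
- The Jukes-Cantor correction for k character states, d = -((k-1)/k) ln(1 - k p/(k-1)), is applied with k = 20.

This is not the software used to prepare FIG. 5, the function names are hypothetical, and tallying bootstrap support for each clade is left out.

```python
import math
import random


def jc_distance(a: str, b: str, states: int = 20) -> float:
    """Jukes-Cantor distance: d = -((k-1)/k) * ln(1 - k/(k-1) * p).

    p is the proportion of differing sites among columns without a gap in
    either sequence; k = 20 for amino acids (4 would be used for nucleotides).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(x != y for x, y in pairs) / len(pairs)
    ratio = (states - 1) / states
    if p >= ratio:
        return math.inf  # saturated: the correction is undefined
    return -ratio * math.log(1 - p / ratio)


def distance_matrix(seqs: list[str], states: int = 20) -> list[list[float]]:
    """Pairwise Jukes-Cantor distances for a list of aligned sequences."""
    return [[0.0 if i == j else jc_distance(a, b, states) for j, b in enumerate(seqs)]
            for i, a in enumerate(seqs)]


def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Neighbor-joining tree as a Newick string with branch lengths."""
    d = {x: {y: dist[i][j] for j, y in enumerate(names)} for i, x in enumerate(names)}
    while len(d) > 2:
        n = len(d)
        keys = list(d)
        r = {x: sum(d[x].values()) for x in keys}
        # Join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(((x, y) for k, x in enumerate(keys) for y in keys[k + 1:]),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"({i}:{li:.4f},{j}:{lj:.4f})"
        d[u] = {u: 0.0}
        for k in keys:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node.
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        for k in (i, j):
            del d[k]
        for row in d.values():
            row.pop(i, None)
            row.pop(j, None)
    a, b = list(d)
    # The last edge length is placed on one side of the final join.
    return f"({a}:{d[a][b]:.4f},{b}:0.0000);"


def bootstrap_alignment(seqs: list[str], rng: random.Random) -> list[str]:
    """One bootstrap replicate: resample alignment columns with replacement."""
    length = len(seqs[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(s[c] for c in cols) for s in seqs]


if __name__ == "__main__":
    names = ["A", "B", "C", "D"]  # toy taxa, not the FIG. 5 viruses
    seqs = ["MKVLAAGITRLE", "MKVLAAGVTRLE", "MKILAAGITKLE", "MKILAAGVTKLE"]
    print(neighbor_joining(names, distance_matrix(seqs)))
    rng = random.Random(0)
    replicate_trees = [
        neighbor_joining(names, distance_matrix(bootstrap_alignment(seqs, rng)))
        for _ in range(100)
    ]
    print(len(replicate_trees))
```

Each bootstrap replicate resamples alignment columns with replacement and rebuilds the tree. The support for a grouping would then be the fraction of replicate trees that contain it.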
In another specific embodiment, the invention provides isolated nucleic acid molecules which hybridize under stringent conditions, as defined herein, to a nucleic acid molecule having the sequence of SEQ ID NO: 1 or 3, or specific genes of known members of Coronaviridae, or a complement thereof. In another specific embodiment, the invention provides isolated polypeptides or proteins that are encoded by a nucleic acid molecule comprising a nucleotide sequence that is at least about 5, 10, 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, 300, 350, or more contiguous nucleotides of the nucleotide sequence of SEQ ID NO: 1, or a complement thereof. In yet another specific embodiment, the invention provides isolated polypeptides or proteins that are encoded by a nucleic acid molecule comprising or, alternatively, consisting of a nucleotide sequence that is at least 5, 10, 15, 20, 25, 30, 35, 40, 45, 100, 150, 200, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1,000, 1,050, 1,100, 1,150, 1,200, 2,000, 3,000, 4,000, 5,000, 6,000, 7,000, 8,000, 9,000, 10,000, 11,000, 12,000, 13,000, 14,000, 15,000, 16,000, 17,000, 18,000, 19,000, 20,000, 21,000, 22,000, 23,000, 24,000, 25,000, 26,000, 27,000, 28,000, 29,000 or more contiguous nucleotides of the nucleotide sequence of SEQ ID NO:3, or a complement thereof. The polypeptides or the proteins of the present invention preferably have one or more biological activities of the proteins encoded by the sequence of SEQ ID NO: 1 or 3, or of the native viral proteins containing the amino acid sequences encoded by the sequence of SEQ ID NO:1 or 3.
The invention further relates to the use of the sequence information of the isolated virus for diagnostic and therapeutic methods. In a specific embodiment, the invention provides the entire nucleotide sequence of CoV-HKU1 (SEQ ID NO:3), or fragments, or complement thereof.
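The surrounding text describes primers and probes that consist of a portion of SEQ ID NO:1 or 3 or of its complement. As an illustration, the Python sketch below checks whether a candidate oligonucleotide is an exact contiguous fragment of either strand of a genome string, and it reports two rough properties of the oligonucleotide. The Wallace rule (Tm ≈ 2 °C per A/T plus 4 °C per G/C) is only a rule of thumb for short oligonucleotides. The 15-nucleotide minimum and the 40% to 60% GC window in `screen` are common assumptions, not values taken from the patent. The genome in the example is a made-up toy sequence, not SEQ ID NO:3, and all function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_percent(oligo: str) -> float:
    """Percentage of G and C bases in the oligonucleotide."""
    oligo = oligo.upper()
    return 100.0 * sum(base in "GC" for base in oligo) / len(oligo)


def wallace_tm(oligo: str) -> int:
    """Rough Tm in deg C by the Wallace rule, 2*(A+T) + 4*(G+C).

    A rule of thumb for short oligos only; salt, oligo concentration and
    nearest-neighbour effects are ignored.
    """
    oligo = oligo.upper()
    return 2 * sum(b in "AT" for b in oligo) + 4 * sum(b in "GC" for b in oligo)


def locate(oligo: str, genome: str) -> list[tuple[str, int]]:
    """0-based start of every exact match on the given ('+') or complementary ('-') strand.

    Minus-strand positions are indices into the reverse-complement string.
    """
    oligo, genome = oligo.upper(), genome.upper()
    hits = []
    for strand, target in (("+", genome), ("-", reverse_complement(genome))):
        start = target.find(oligo)
        while start != -1:
            hits.append((strand, start))
            start = target.find(oligo, start + 1)
    return hits


def screen(oligo: str, genome: str, min_len: int = 15,
           gc_range: tuple[float, float] = (40.0, 60.0)) -> bool:
    """Hypothetical screening rule: long enough, present on a strand, moderate GC."""
    return (len(oligo) >= min_len
            and bool(locate(oligo, genome))
            and gc_range[0] <= gc_percent(oligo) <= gc_range[1])


if __name__ == "__main__":
    genome = "ATGCGTACCGTTAGC"          # toy sequence, not SEQ ID NO:3
    print(locate("CGTACC", genome))   # [('+', 3)]
    print(locate("GGTACG", genome))   # [('-', 6)]
    print(wallace_tm("CGTACC"), round(gc_percent("CGTACC"), 1))
```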
Furthermore, the present invention relates to a nucleic acid molecule that hybridizes to any portion of the genome of the CoV-HKU1 (SEQ ID NO:3) under stringent conditions. In a specific embodiment, the invention provides nucleic acid molecules which are suitable for use as primers consisting of or comprising the nucleotide sequence of SEQ ID NO:1 or 3, or a complement thereof, or a portion thereof. In another specific embodiment, the invention provides nucleic acid molecules which are suitable for use as hybridization probes for the detection of nucleic acids encoding a polypeptide of the invention, consisting of or comprising the nucleotide sequence of SEQ ID NO: 1 or 3, a complement thereof, or a portion thereof. The invention further encompasses chimeric or recombinant viruses or viral proteins encoded by said nucleotide sequences.
The invention further provides antibodies that specifically bind a polypeptide of the invention encoded by the nucleotide sequence of SEQ ID NO: 1 or 3, or a fragment thereof, or any CoV-HKU1 epitope, as well as the polypeptides having the amino acid sequences of SEQ ID NO:2 and SEQ ID NOS: 34-2918, respectively, shown in FIGS. 2 and 3. Such antibodies include, but are not limited to, polyclonal, monoclonal, bi-specific, multi-specific, human, humanized, chimeric antibodies, single chain antibodies, Fab fragments, F(ab')2 fragments, disulfide-linked Fvs, intrabodies and fragments containing either a VL or VH domain or even a complementary determining region (CDR) that specifically binds to a polypeptide of the invention.
In one embodiment, the invention provides methods for detecting the presence, activity or expression of the CoV-HKU1 of the invention in a biological material, such as cells, blood, saliva, urine, sputum, nasopharyngeal aspirates, and so forth. The presence of the CoV-HKU1 in a sample can be determined by contacting the biological material with an agent which can detect directly or indirectly the presence, activity or expression of the CoV-HKU1. In a specific embodiment, the detection agents are the antibodies of the present invention. In another embodiment, the detection agent is a nucleic acid of the present invention.
In another embodiment, the invention provides vaccine preparations comprising the CoV-HKU1 recombinant and chimeric forms of said virus, or subunits of the virus. The present invention further provides methods of preparing recombinant or chimeric forms of CoV-HKU1. In another specific embodiment, the vaccine preparations of the present invention comprise one or more nucleic acid molecules comprising or consisting of the sequence of SEQ ID NO: 1 and/or 3, or a fragment thereof. In another embodiment, the invention provides vaccine preparations comprising one or more polypeptides of the invention encoded by a nucleotide sequence comprising or consisting of the nucleotide sequence of SEQ ID NO: 1 and/or 3, or a fragment thereof, including the polypeptides having the amino acid sequences of SEQ ID NO:2 or SEQ ID NOS:34-2918 shown in FIGS. 2 and 3.
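FIGS. 2 and 3 list the peptides deduced from three-frame translations, with an asterisk marking each stop codon. The minimal Python sketch below illustrates that kind of translation. It assumes the standard genetic code and treats each run of residues between stop codons as one peptide; the patent does not state its exact segmentation rules. For the opposite strand, the sketch translates the reverse complement (the complement read 5' to 3'). Whether the frame numbering in FIG. 3 follows the same convention is not stated. The sequence in the example is a toy string, and the function names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons in TCAG order: TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    x + y + z: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, x in enumerate(BASES)
    for j, y in enumerate(BASES)
    for k, z in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand written 5' -> 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate_frame(seq: str, frame: int) -> str:
    """Translate reading frame 1, 2 or 3; stops become '*', ambiguous codons 'X'."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    seq = seq.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(seq[p:p + 3], "X")
                   for p in range(frame - 1, len(seq) - 2, 3))


def peptides(protein: str, min_len: int = 1) -> list[str]:
    """Split a frame translation at stop codons and drop pieces shorter than min_len."""
    return [piece for piece in protein.split("*") if len(piece) >= min_len]


def three_frame_peptides(seq: str, min_len: int = 1) -> dict[int, list[str]]:
    """Stop-delimited peptides for frames 1, 2 and 3 of one strand."""
    return {f: peptides(translate_frame(seq, f), min_len) for f in (1, 2, 3)}


if __name__ == "__main__":
    toy = "ATGGCTTAAGGCTGA"                 # toy sequence, not SEQ ID NO:3
    print(translate_frame(toy, 1))          # MA*G*
    print(three_frame_peptides(toy))        # {1: ['MA', 'G'], 2: ['WLKA'], 3: ['GLRL']}
    print(three_frame_peptides(reverse_complement(toy)))
```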
Furthermore, the present invention provides methods for treating, ameliorating, managing, or preventing respiratory tract infections by administering to a subject in need thereof the anti-viral agents of the present invention, alone or in combination with other antivirals [e.g., amantadine, rimantadine, ganciclovir, acyclovir, ribavirin, penciclovir, oseltamivir, foscarnet, zidovudine (AZT), didanosine (ddI), lamivudine (3TC), zalcitabine (ddC), stavudine (d4T), nevirapine, delavirdine, indinavir, ritonavir, vidarabine, nelfinavir, saquinavir, Relenza, Tamiflu, pleconaril, interferons, etc.], steroids and corticosteroids such as prednisone, cortisone, fluticasone and glucocorticoids, antibiotics, analgesics, bronchodilators, or other treatments for respiratory and/or viral infections.
In one aspect, the anti-viral agent of the present invention prevents or inhibits the binding of the virus or viral proteins to a host cell under a physiological condition, thereby preventing or inhibiting the infection of the host cell by the virus. In another aspect, the anti-viral agent of the invention prevents or inhibits replication of the viral nucleic acid molecules in the host cell under a physiological condition by interacting with the viral nucleic acid molecules or their transcription mechanisms. In a specific embodiment, the anti-viral agent of the invention includes the vaccine or immunogenic preparations of the invention or an antibody that immunospecifically binds CoV-HKU1 or any CoV-HKU1 epitope and may neutralize CoV-HKU1. In another specific embodiment, the anti-viral agent is a polypeptide or protein of the invention or a nucleic acid molecule of the invention.
In addition, the present invention provides a method of preventing or inhibiting replication in a host cell of a nucleic acid molecule having the nucleotide sequence of SEQ ID NO: 1 and/or 3, or inhibiting the activities of the polypeptides encoded by the nucleotide sequence of SEQ ID NO: 1 and/or 3, a complement thereof, or a portion thereof, including the polypeptides having the amino acid sequence of SEQ ID NO:2 or SEQ ID NOS:34-2918 shown in FIGS. 2 and 3, by administering to said host cell the anti-viral agent of the invention. In a specific embodiment, the host cell is a mammalian cell, such as a cell of humans, primates, horses, cows, sheep, pigs, goats, dogs, cats, avian species and rodents. Preferably, the cell is a primate cell and most preferably a human cell.
Furthermore, the present invention provides pharmaceutical compositions comprising anti-viral agents of the present invention and a pharmaceutically acceptable carrier. The present invention also provides kits comprising pharmaceutical compositions of the present invention.
5.1 Recombinant and Chimeric CoV-HKU1
The present invention encompasses recombinant or chimeric viruses encoded by viral vectors derived from the genome of CoV-HKU1 or natural variants thereof. In a specific embodiment, a recombinant virus is one derived from the CoV-HKU1. In a specific embodiment, the virus has a nucleotide sequence of SEQ ID NO:3. In another specific embodiment, a recombinant virus is one derived from a natural variant of CoV-HKU1. A natural variant of CoV-HKU1 has a sequence that is different from the genomic sequence (SEQ ID NO:3) of CoV-HKU1 due to one or more naturally occurring mutations, including, but not limited to, point mutations, rearrangements, insertions, deletions, etc., to the genomic sequence that may or may not result in a phenotypic change.
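The natural-variant definition above lists point mutations, rearrangements, insertions and deletions relative to SEQ ID NO:3. The following Python code is an illustrative sketch only. It walks a precomputed pairwise alignment, in which the reference and variant have equal length and '-' marks a gap, and it reports substitutions, insertions and deletions in 1-based reference coordinates. It assumes that the alignment already exists. It cannot detect rearrangements such as inversions or translocations, which require other methods. The names and toy sequences are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Difference:
    kind: str      # "substitution", "insertion" or "deletion"
    ref_pos: int   # 1-based; for an insertion, the reference base it follows
    ref: str
    alt: str


def call_differences(ref_aln: str, var_aln: str, gap: str = "-") -> list[Difference]:
    """List point substitutions and indels between two aligned sequences."""
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    diffs: list[Difference] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, v = ref_aln[i], var_aln[i]
        if r != gap and v != gap:
            ref_pos += 1
            if r != v:
                diffs.append(Difference("substitution", ref_pos, r, v))
            i += 1
        elif r == gap and v != gap:
            start = i  # merge a run of inserted bases into one event
            while i < n and ref_aln[i] == gap and var_aln[i] != gap:
                i += 1
            diffs.append(Difference("insertion", ref_pos, "", var_aln[start:i]))
        elif r != gap:
            start, first = i, ref_pos + 1  # merge a run of deleted bases
            while i < n and ref_aln[i] != gap and var_aln[i] == gap:
                i += 1
                ref_pos += 1
            diffs.append(Difference("deletion", first, ref_aln[start:i], ""))
        else:
            i += 1  # gap in both sequences: no information
    return diffs


if __name__ == "__main__":
    ref = "ATGGCT--AGCTAA"  # toy alignment, not CoV-HKU1 data
    var = "ATGACTGGAG--AA"
    for diff in call_differences(ref, var):
        print(diff)
```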
In accordance with the present invention, a viral vector which is derived from the genome of the CoV-HKU1 is one that contains a nucleic acid sequence that encodes at least a part of one ORF of the CoV-HKU1. In a specific embodiment, the ORF comprises or consists of a nucleotide sequence of SEQ ID NO:1 or a fragment thereof. In a specific embodiment, there is more than one ORF within the nucleotide sequence of SEQ ID NO:3, or a fragment thereof. In another embodiment, the polypeptides encoded by the ORF comprise or consist of amino acid sequences of SEQ ID NOS:34-2918 shown in FIGS. 2 and 3, or SEQ ID NO:2, or a fragment thereof. In accordance with the present invention, these viral vectors may or may not include nucleic acids that are non-native to the viral genome.
In another specific embodiment, a chimeric virus of the invention is a recombinant CoV-HKU1 which further comprises a heterologous nucleotide sequence. In accordance with the invention, a chimeric virus may be encoded by a nucleotide sequence in which heterologous nucleotide sequences have been added to the genome or in which endogenous or native nucleotide sequences have been replaced with heterologous nucleotide sequences.
According to the present invention, the chimeric viruses are encoded by the viral vectors of the invention which further comprise a heterologous nucleotide sequence. In accordance with the present invention, a chimeric virus is encoded by a viral vector that may or may not include nucleic acids that are non-native to the viral genome. In accordance with the invention, a chimeric virus is encoded by a viral vector to which heterologous nucleotide sequences have been added, inserted or substituted for native or non-native sequences. In accordance with the present invention, the chimeric virus may be encoded by nucleotide sequences derived from different strains or variants of CoV-HKU1.
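At the sequence level, the two kinds of chimeric construct described above can be modelled as simple string edits: adding a heterologous sequence, or replacing a native sequence with one. The Python sketch below is hypothetical and covers only that in-silico design step. It does not model the cloning or reverse-genetics work needed to recover a virus. It also flags edits whose net length change is not a multiple of three, because such an edit inside an ORF would shift the reading frame downstream. The toy genome and coordinates are illustrative assumptions, not CoV-HKU1 sequence.

```python
def replace_region(genome: str, start: int, end: int, insert: str) -> str:
    """Replace positions start..end (1-based, inclusive) with a heterologous insert."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError("start/end outside the genome")
    return genome[:start - 1] + insert + genome[end:]


def insert_after(genome: str, pos: int, insert: str) -> str:
    """Add a heterologous insert after position pos (0 means before the first base)."""
    if not 0 <= pos <= len(genome):
        raise ValueError("position outside the genome")
    return genome[:pos] + insert + genome[pos:]


def shifts_frame(removed_len: int, inserted_len: int) -> bool:
    """True if the net length change is not a multiple of 3, which would shift
    the reading frame of everything downstream within the same ORF."""
    return (inserted_len - removed_len) % 3 != 0


if __name__ == "__main__":
    toy = "ATGAAACCCGGGTAA"   # toy ORF (M K P G stop), not CoV-HKU1 sequence
    swapped = replace_region(toy, 4, 6, "GCTGCT")   # AAA -> heterologous GCTGCT
    print(swapped)                                   # ATGGCTGCTCCCGGGTAA
    print(shifts_frame(3, 6))                        # False
    added = insert_after(toy, 3, "GG")
    print(added, shifts_frame(0, 2))                 # ATGGGAAACCCGGGTAA True
```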
In particular, the chimeric virus is encoded by nucleotide sequences that encode antigenic polypeptides derived from different strains or variants of CoV-HKU1. A chimeric virus maybe of particular use for the generation 60 of recombinant vaccines protecting against two or more viruses (Tao et al., J. Virol. 72, 2955-2961; Durbin et al., 2000, J. Virol. 74, 6821-6831; Skiadopoulos et al., 1998, J. Virol. 72, 1762-1768 (1998); Teng et al., 2000, J. Virol. 74, 9317-9321). For example, it can be envisaged that a virus 65 vector derived from the CoV-HKU1 expressing one or more proteins of variants of CoV-HKU1, or vice versa, will protect a subject vaccinated with such vector against infections by
both the native CoV-HKU1 and the variant. Attenuated and replication-defective viruses may be of use for vaccination purposes with live vaccines, as has been suggested for other viruses.
In accordance with the present invention, the heterologous sequences to be incorporated into the viral vectors encoding the recombinant or chimeric viruses of the invention include sequences obtained or derived from different strains or variants of CoV-HKU1.
In certain embodiments, the chimeric or recombinant viruses of the invention are encoded by viral vectors derived from viral genomes wherein one or more sequences, intergenic regions, termini sequences, or portions of or entire ORFs have been substituted with a heterologous or non-native sequence. In certain embodiments of the invention, the chimeric viruses of the invention are encoded by viral vectors derived from viral genomes wherein one or more heterologous sequences have been inserted or added to the vector.
The selection of the viral vector may depend on the species of the subject that is to be treated or protected from a viral infection.
In accordance with the present invention, the viral vectors can be engineered to provide antigenic sequences which confer protection against infection by CoV-HKU1 and natural variants thereof. The viral vectors may be engineered to provide one, two, three or more antigenic sequences. In accordance with the present invention, the antigenic sequences may be derived from the same virus, from different strains or variants of the same type of virus, or from different viruses.
The expression products and/or recombinant or chimeric virions obtained in accordance with the invention may advantageously be utilized in vaccine formulations. The expression products and chimeric virions of the present invention may be engineered to create vaccines against a broad range of pathogens, including viral and bacterial antigens, tumor antigens, allergen antigens, and autoantigens involved in autoimmune disorders. In particular, the chimeric virions of the present invention may be engineered to create vaccines for the protection of a subject from infections with CoV-HKU1 and variants thereof.
In certain embodiments, the expression products and recombinant or chimeric virions of the present invention may be engineered to create vaccines against a broad range of pathogens, including viral antigens, tumor antigens and autoantigens involved in autoimmune disorders. One way to achieve this goal involves modifying existing CoV-HKU1 genes to contain foreign sequences in their respective external domains. Where the heterologous sequences are epitopes or antigens of pathogens, these chimeric viruses may be used to induce a protective immune response against the disease agent from which these determinants are derived.
Thus, the present invention relates to the use of viral vectors and recombinant or chimeric viruses to formulate vaccines against a broad range of viruses and/or antigens. The present invention also encompasses recombinant viruses comprising a viral vector derived from CoV-HKU1 or variants thereof that contains sequences resulting in a virus having a phenotype more suitable for use in vaccine formulations. The mutations and modifications can be in coding regions, in intergenic regions, and in the leader and trailer sequences of the virus.
The invention provides a host cell comprising a nucleic acid or a vector according to the invention. Plasmid or viral vectors containing the polymerase components of CoV-HKU1 are generated in prokaryotic cells for the expression of the components in relevant cell types (bacteria, insect cells, eukaryotic cells). Plasmid or viral vectors containing full-length or partial copies of the CoV-HKU1 genome will be generated in prokaryotic cells for the expression of viral nucleic acids in vitro or in vivo. The latter vectors may contain other viral sequences for the generation of chimeric viruses or chimeric virus proteins, may lack parts of the viral genome for the generation of replication-defective virus, and may contain mutations, deletions or insertions for the generation of attenuated viruses.
In addition, eukaryotic cells transiently or stably expressing one or more full-length or partial CoV-HKU1 proteins can be used. Such cells can be made by transfection (proteins or nucleic acid vectors), infection (viral vectors) or transduction (viral vectors) and may be useful for complementation of the above-mentioned wild-type, attenuated, replication-defective or chimeric viruses.
The viral vectors and chimeric viruses of the present invention may be used to modulate a subject's immune system by stimulating a humoral immune response, a cellular immune response, or tolerance to an antigen. As used herein, a subject means: humans, primates, horses, cows, sheep, pigs, goats, dogs, cats, avian species and rodents.
5.2 Formulation of Vaccines and Antivirals
In a preferred embodiment, the invention provides a proteinaceous molecule or CoV-HKU1-specific viral protein or functional fragment thereof encoded by a nucleic acid according to the invention. Useful proteinaceous molecules are, for example, derived from any of the genes or genomic fragments derivable from the virus according to the invention, including envelope protein (E protein), integral membrane protein (M protein), spike protein (S protein), nucleocapsid protein (N protein), hemagglutinin esterase (HE protein), and RNA-dependent RNA polymerase. Such molecules, or antigenic fragments thereof, as provided herein, are for example useful in diagnostic methods or kits and in pharmaceutical compositions such as subunit vaccines. Particularly useful are polypeptides encoded by the nucleotide sequence of SEQ ID NO: 1 or 3, including the polypeptides having the amino acid sequences of SEQ ID NOS:34-2918 in FIGS. 2 and 8, or SEQ ID NO:2, or antigenic fragments thereof for inclusion as antigen or subunit immunogen, but inactivated whole virus can also be used. Also particularly useful are those proteinaceous substances that are encoded by recombinant nucleic acid fragments of the CoV-HKU1 genome. Preferred are those that are within the preferred bounds and metes of ORFs, in particular for eliciting CoV-HKU1-specific antibody or T cell responses, whether in vivo (e.g., for protective or therapeutic purposes or for providing diagnostic antibodies) or in vitro (e.g., by phage display technology or another technique useful for generating synthetic antibodies).
The invention provides vaccine formulations for the prevention and treatment of infections with CoV-HKU1. In certain embodiments, the vaccine of the invention comprises recombinant and chimeric viruses of CoV-HKU1.
In another aspect, the present invention also provides DNA vaccine formulations comprising a nucleic acid or fragment of CoV-HKU1, or nucleic acid molecules having the sequence of SEQ ID NO: 1 or 3, or a fragment thereof. In another specific embodiment, the DNA vaccine formulations of the present invention comprise a nucleic acid or fragment thereof encoding antibodies that immunospecifically bind CoV-HKU1. In DNA vaccine formulations, a vaccine DNA comprises a viral vector (such as one derived from CoV-HKU1), a bacterial plasmid, or another expression vector bearing an insert comprising a nucleic acid molecule of the present invention operably linked to one or more control
elements, thereby allowing expression of the vaccinating proteins encoded by said nucleic acid molecule in a vaccinated subject. Such vectors can be prepared by recombinant DNA technology as recombinant or chimeric viral vectors carrying a nucleic acid molecule of the present invention. Various heterologous vectors have been described for DNA vaccination against viral infections. For example, the vectors described in the following references may be used to express CoV-HKU1 sequences instead of the sequences of the viruses or other pathogens described; in particular, vectors described for hepatitis B virus (Michel, M. L. et al., 1995, DNA-mediated immunization to the hepatitis B surface antigen in mice: Aspects of the humoral response mimic hepatitis B viral infection in humans, Proc. Natl. Acad. Sci. USA 92:5307-5311; Davis, H. L. et al., 1993, DNA-based immunization induces continuous secretion of hepatitis B surface antigen and high levels of circulating antibody, Human Molec. Genetics 2:1847-1851), HIV (Wang, B. et al., 1993, Gene inoculation generates immune responses against human immunodeficiency virus type 1, Proc. Natl. Acad. Sci. USA 90:4156-4160; Lu, S. et al., 1996, Simian immunodeficiency virus DNA vaccine trial in macaques, J. Virol. 70:3978-3991; Letvin, N. L. et al., 1997, Potent, protective anti-HIV immune responses generated by bimodal HIV envelope DNA plus protein vaccination, Proc. Natl. Acad. Sci. USA 94(17):9378-83), and influenza viruses (Robinson, H. L. et al., 1993, Protection against a lethal influenza virus challenge by immunization with a haemagglutinin-expressing plasmid DNA, Vaccine 11:957-960; Ulmer, J. B. et al., Heterologous protection against influenza by injection of DNA encoding a viral protein, Science 259:1745-1749), as well as bacterial infections, such as tuberculosis (Tascon, R. E. et al., 1996, Vaccination against tuberculosis by DNA injection, Nature Med. 2:888-892; Huygen, K.
et al., 1996, Immunogenicity and protective efficacy of a tuberculosis DNA vaccine, Nature Med. 2:893-898), and parasitic infections, such as malaria (Sedegah, M., 1994, Protection against malaria by immunization with plasmid DNA encoding circumsporozoite protein, Proc. Natl. Acad. Sci. USA 91:9866-9870; Doolan, D. L. et al., 1996, Circumventing genetic restriction of protection against malaria with multigene DNA immunization: CD8+ T cell-, interferon γ-, and nitric oxide-dependent immunity, J. Exper. Med. 183:1739-1746).
Many methods may be used to introduce the vaccine formulations described above. These include, but are not limited to, oral, intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, and intranasal routes. Alternatively, it may be preferable to introduce the chimeric virus vaccine formulation via the natural route of infection of the pathogen for which the vaccine is designed. The DNA vaccines of the present invention may be administered in saline solutions by injection into muscle or skin using a syringe and needle (Wolff, J. A. et al., 1990, Direct gene transfer into mouse muscle in vivo, Science 247:1465-1468; Raz, E., 1994, Intradermal gene immunization: The possible role of DNA uptake in the induction of cellular immunity to viruses, Proc. Natl. Acad. Sci. USA 91:9519-9523). Another way to administer DNA vaccines is the so-called "gene gun" method, whereby microscopic gold beads coated with the DNA molecules of interest are fired into the cells (Tang, D. et al., 1992, Genetic immunization is a simple method for eliciting an immune response, Nature 356:152-154). For general reviews of the methods for DNA vaccines, see Robinson, H. L., 1999, DNA vaccines: basic mechanism and immune responses (Review), Int. J. Mol. Med. 4(5):549-555; Barber, B., 1997, Introduction: Emerging vaccine strategies, Seminars in Immunology 9(5):269-270; and Robinson, H. L. et al., 1997, DNA vaccines, Seminars in Immunology 9(5):271-283.
5.3 Adjuvants and Carrier Molecules
CoV-HKU1-associated antigens are administered with one or more adjuvants. In one embodiment, the CoV-HKU1-associated antigen is administered together with a mineral salt adjuvant or mineral salt gel adjuvant. Such mineral salt and mineral salt gel adjuvants include, but are not limited to, aluminum hydroxide (ALHYDROGEL, REHYDRAGEL), aluminum phosphate gel, aluminum hydroxyphosphate (ADJU-PHOS), and calcium phosphate.
In another embodiment, the CoV-HKU1-associated antigen is administered with an immunostimulatory adjuvant. This class of adjuvants includes, but is not limited to, cytokines (e.g., interleukin-2, interleukin-7, interleukin-12, granulocyte-macrophage colony stimulating factor (GM-CSF), interferon-γ, interleukin-1β (IL-1β), and IL-1β peptide or Sclavo Peptide), cytokine-containing liposomes, triterpenoid glycosides or saponins (e.g., QuilA and QS-21, also sold under the trademarks STIMULON and ISCOPREP), Muramyl Dipeptide (MDP) derivatives, such as N-acetyl-muramyl-L-threonyl-D-isoglutamine (Threonyl-MDP, sold under the trademark TERMURTIDE), GMDP, N-acetyl-nor-muramyl-L-alanyl-D-isoglutamine, N-acetylmuramyl-L-alanyl-D-isoglutaminyl-L-alanine-2-(1'-2'-dipalmitoyl-sn-glycero-3-hydroxyphosphoryloxy)-ethylamine, muramyl tripeptide phosphatidylethanolamine (MTP-PE), unmethylated CpG dinucleotides and oligonucleotides, such as bacterial DNA and fragments thereof, LPS, monophosphoryl Lipid A (3D-MLA, sold under the trademark MPL), and polyphosphazenes.
In another embodiment, the adjuvant used is a particulate adjuvant, including, but not limited to, emulsions, e.g., Freund's Complete Adjuvant, Freund's Incomplete Adjuvant, squalene or squalane oil-in-water adjuvant formulations, such as SAF and MF59, e.g., prepared with block copolymers, such as L-121 (polyoxypropylene/polyoxyethylene) sold under the trademark PLURONIC L-121, liposomes, virosomes, cochleates, and immune stimulating complex, which is sold under the trademark ISCOM.
In another embodiment, a microparticulate adjuvant is used. Microparticulate adjuvants include, but are not limited to, biodegradable and biocompatible polyesters, homo- and copolymers of lactic acid (PLA) and glycolic acid (PGA), poly(lactide-co-glycolide) (PLGA) microparticles, polymers that self-associate into particulates (poloxamer particles), soluble polymers (polyphosphazenes), and virus-like particles (VLPs) such as recombinant protein particulates, e.g., hepatitis B surface antigen (HBsAg).
Yet another class of adjuvants that may be used includes mucosal adjuvants, including but not limited to heat-labile enterotoxin from Escherichia coli (LT), cholera holotoxin (CT) and cholera toxin B subunit (CTB) from Vibrio cholerae, mutant toxins (e.g., LTK63 and LTR72), microparticles, and polymerized liposomes.
In other embodiments, any of the above classes of adjuvants may be used in combination with each other or with other adjuvants. Non-limiting examples of combination adjuvant preparations that can be used to administer the CoV-HKU1-associated antigens of the invention include liposomes containing immunostimulatory protein, cytokines, or T-cell and/or B-cell peptides, or microbes with or without entrapped IL-2, or microparticles containing enterotoxin. Other adjuvants known in the art are also included within the scope of the invention (see Vaccine Design: The Subunit and Adjuvant Approach, Chap. 7, Michael F. Powell and Mark J.
Newman (eds.), Plenum Press, New York, 1995, which is incorporated herein in its entirety).
The effectiveness of an adjuvant may be determined by measuring the induction of antibodies directed against an immunogenic polypeptide containing a CoV-HKU1 polypeptide epitope, where the antibodies result from administering this polypeptide in vaccines that also comprise the various adjuvants.
The polypeptides may be formulated into the vaccine in neutral or salt forms. Pharmaceutically acceptable salts include the acid addition salts (formed with free amino groups of the peptide), which are formed with inorganic acids, such as, for example, hydrochloric or phosphoric acids, or organic acids such as acetic, oxalic, tartaric, maleic, and the like. Salts formed with free carboxyl groups may also be derived from inorganic bases, such as, for example, sodium, potassium, ammonium, calcium, or ferric hydroxides, and such organic bases as isopropylamine, trimethylamine, 2-ethylamino ethanol, histidine, procaine and the like.
The vaccines of the invention may be multivalent or univalent. Multivalent vaccines are made from recombinant viruses that direct the expression of more than one antigen.
Many methods may be used to introduce the vaccine formulations of the invention. These include, but are not limited to, oral, intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous and intranasal routes, and scarification (scratching through the top layers of skin, e.g., using a bifurcated needle).
The patient to which the vaccine is administered is preferably a mammal, most preferably a human, but can also be a non-human animal including but not limited to cows, horses, sheep, pigs, fowl (e.g., chickens), goats, cats, dogs, hamsters, mice and rats.
5.4 Preparation of Antibodies
Antibodies which specifically recognize a polypeptide of the invention, such as, but not limited to, polypeptides comprising the sequence of SEQ ID NO:2 or any of SEQ ID NOS:34-2918 or a CoV-HKU1 epitope, or antigen-binding fragments thereof, can be used for detecting, screening, and isolating the polypeptide of the invention or fragments thereof, or similar sequences that might encode similar enzymes from other organisms. For example, in one specific embodiment, an antibody which immunospecifically binds a CoV-HKU1 epitope, or a fragment thereof, can be used for various in vitro detection assays, including enzyme-linked immunosorbent assays (ELISA), radioimmunoassays, Western blot, etc., for the detection of a polypeptide of the invention or, preferably, CoV-HKU1 in samples, for example, a biological material, including cells, cell culture media (e.g., bacterial cell culture media, mammalian cell culture media, insect cell culture media, yeast cell culture media, etc.), blood, plasma, serum, tissues, sputum, nasopharyngeal aspirates, etc.
Antibodies specific for a polypeptide of the invention or any epitope of CoV-HKU1 may be generated by any suitable method known in the art. Polyclonal antibodies to an antigen of interest, for example, the CoV-HKU1 epitopes or polypeptides encoded by a nucleotide sequence of SEQ ID NO: 1 or 3, including the polypeptides shown in FIG. 2 (SEQ ID NOS:34-1318), FIG. 8 (SEQ ID NOS:1319-2918), as well as SEQ ID NO:2, can be produced by various procedures well known in the art. For example, an antigen can be administered to various host animals including, but not limited to, rabbits, mice, rats, etc., to induce the production of antisera containing polyclonal antibodies specific for the antigen. Various adjuvants may be used to increase the immunological response, depending on the host species. These include, but are not limited to, Freund's (complete and incomplete) adjuvant, mineral gels such as aluminum hydroxide, surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole limpet hemocyanins, dinitrophenol, and potentially useful adjuvants for humans such as BCG (Bacille Calmette-Guerin) and Corynebacterium parvum. Such adjuvants are also well known in the art (see Section 5.3, supra).
Monoclonal antibodies can be prepared using a wide variety of techniques known in the art, including hybridoma, recombinant, and phage display technologies, or a combination thereof. For example, monoclonal antibodies can be produced using hybridoma techniques including those known in the art and taught, for example, in Harlow et al., Antibodies: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 2nd ed. 1988); Hammerling et al., in: Monoclonal Antibodies and T-Cell Hybridomas, pp. 563-681 (Elsevier, N.Y., 1981) (both of which are incorporated by reference in their entireties). The term "monoclonal antibody" as used herein is not limited to antibodies produced through hybridoma technology. The term "monoclonal antibody" refers to an antibody that is derived from a single clone, including any eukaryotic, prokaryotic, or phage clone, and not the method by which it is produced.
Methods for producing and screening for specific antibodies using hybridoma technology are routine and well known in the art. In a non-limiting example, mice can be immunized with an antigen of interest or a cell expressing such an antigen. Once an immune response is detected (e.g., antibodies specific for the antigen are detected in the mouse serum), the mouse spleen is harvested and splenocytes are isolated. The splenocytes are then fused by well-known techniques to any suitable myeloma cells. Hybridomas are selected and cloned by limiting dilution. The hybridoma clones are then assayed by methods known in the art for cells that secrete antibodies capable of binding the antigen. Ascites fluid, which generally contains high levels of antibodies, can be generated by inoculating mice intraperitoneally with positive hybridoma clones.
Antibody fragments which recognize specific epitopes may be generated by known techniques. For example, Fab and F(ab')2 fragments may be produced by proteolytic cleavage of immunoglobulin molecules, using enzymes such as papain (to produce Fab fragments) or pepsin (to produce F(ab')2 fragments). F(ab')2 fragments contain the complete light chain, and the variable region, the CH1 region and the hinge region of the heavy chain.
The antibodies of the invention or fragments thereof can also be produced by any method known in the art for the synthesis of antibodies, in particular by chemical synthesis or, preferably, by recombinant expression techniques.
The nucleotide sequence encoding an antibody may be obtained from any information available to those skilled in the art (i.e., from GenBank, the literature, or by routine cloning and sequence analysis). Suppose a clone containing a nucleic acid encoding a particular antibody or an epitope-binding fragment thereof is not available, but the sequence of the antibody molecule or epitope-binding fragment is known. In that case, a nucleic acid encoding the immunoglobulin may be chemically synthesized or obtained from a suitable source by PCR amplification using synthetic primers hybridizable to the 3' and 5' ends of the sequence. Suitable sources include an antibody cDNA library, or a cDNA library generated from, or nucleic acid (preferably poly A+ RNA) isolated from, any tissue or cells expressing the antibody, such as hybridoma cells selected to express an antibody. Alternatively, the nucleic acid may be obtained by cloning using an oligonucleotide probe specific for the particular gene sequence to identify, e.g., a
cDNA clone from a cDNA library that encodes the antibody. Amplified nucleic acids generated by PCR may then be cloned into replicable cloning vectors using any method well known in the art.
Once the nucleotide sequence of the antibody is determined, it may be manipulated using methods well known in the art for the manipulation of nucleotide sequences, e.g., recombinant DNA techniques, site-directed mutagenesis, PCR, etc. (see, for example, the techniques described in Sambrook et al., supra; and Ausubel et al., eds., 1998, Current Protocols in Molecular Biology, John Wiley & Sons, NY, which are both incorporated by reference herein in their entireties). This manipulation generates antibodies having a different amino acid sequence, for example by introducing amino acid substitutions, deletions, and/or insertions into the epitope-binding domain regions of the antibodies or into any portion of the antibodies, which may enhance or reduce their biological activities.
Recombinant expression of an antibody requires construction of an expression vector containing a nucleotide sequence that encodes the antibody. Once a nucleotide sequence encoding an antibody molecule, a heavy or light chain of an antibody, or a portion thereof has been obtained, the vector for the production of the antibody molecule may be produced by recombinant DNA technology using techniques well known in the art, as discussed in the previous sections. Methods which are well known to those skilled in the art can be used to construct expression vectors containing antibody coding sequences and appropriate transcriptional and translational control signals. These methods include, for example, in vitro recombinant DNA techniques, synthetic techniques, and in vivo genetic recombination. Any of the following may be cloned into such a vector for expression: the nucleotide sequence encoding the heavy-chain variable region, the light-chain variable region, both the heavy-chain and light-chain variable regions, an epitope-binding fragment of the heavy- and/or light-chain variable region, or one or more complementarity determining regions (CDRs) of an antibody. The expression vector thus prepared can then be introduced into appropriate host cells for the expression of the antibody. Accordingly, the invention includes host cells containing a polynucleotide encoding an antibody specific for the polypeptides of the invention or fragments thereof.
The host cell may be co-transfected with two expression vectors of the invention, the first vector encoding a heavy chain-derived polypeptide and the second vector encoding a light chain-derived polypeptide. The two vectors may contain identical selectable markers, which enable equal expression of heavy and light chain polypeptides, or different selectable markers to ensure maintenance of both plasmids. Alternatively, a single vector may be used which encodes, and is capable of expressing, both heavy and light chain polypeptides. In such situations, the light chain should be placed before the heavy chain to avoid an excess of toxic free heavy chain (Proudfoot, Nature, 322:52, 1986; and Kohler, Proc. Natl. Acad. Sci. USA, 77:2197, 1980). The coding sequences for the heavy and light chains may comprise cDNA or genomic DNA.
In another embodiment, antibodies can also be generated using various phage display methods known in the art. In phage display methods, functional antibody domains are displayed on the surface of phage particles which carry the polynucleotide sequences encoding them. In a particular embodiment, such phage can be utilized to display antigen-binding domains, such as Fab and Fv or disulfide-bond stabilized Fv, expressed from a repertoire or combinatorial antibody library (e.g., human or murine). Phage expressing an antigen-binding domain that binds the antigen of interest can be selected or identified with antigen, e.g., using labeled antigen or antigen bound or captured to a solid surface or bead. Phage used in these methods are typically filamentous phage, including fd and M13. The antigen-binding domains are expressed as a recombinantly fused protein to either the phage gene III or gene VIII protein. Examples of phage display methods that can be used to make the immunoglobulins, or fragments thereof, of the present invention include those disclosed in Brinkman et al., J. Immunol. Methods, 182:41-50, 1995; Ames et al., J. Immunol. Methods, 184:177-186, 1995; Kettleborough et al., Eur. J. Immunol., 24:952-958, 1994; Persic et al., Gene, 187:9-18, 1997; Burton et al., Advances in Immunology, 57:191-280, 1994; PCT application No. PCT/GB91/01134; PCT publications WO 90/02809; WO 91/10737; WO 92/01047; WO 92/18619; WO 93/11236; WO 95/15982; WO 95/20401; and U.S. Pat. Nos. 5,698,426; 5,223,409; 5,403,484; 5,580,717; 5,427,908; 5,750,753; 5,821,047; 5,571,698; 5,427,908; 5,516,637; 5,780,225; 5,658,727; 5,733,743 and 5,969,108; each of which is incorporated herein by reference in its entirety.
As described in the above references, after phage selection, the antibody coding regions from the phage can be isolated and used to generate whole antibodies, including human antibodies, or any other desired fragments. These can be expressed in any desired host, including mammalian cells, insect cells, plant cells, yeast, and bacteria, e.g., as described in detail below. For example, techniques to recombinantly produce Fab, Fab' and F(ab')2 fragments can also be employed using methods known in the art such as those disclosed in PCT publication WO 92/22324; Mullinax et al., BioTechniques, 12(6):864-869, 1992; Sawai et al., AJRI, 34:26-34, 1995; and Better et al., Science, 240:1041-1043, 1988 (each of which is incorporated by reference in its entirety). Examples of techniques which can be used to produce single-chain Fvs and antibodies include those described in U.S. Pat. Nos. 4,946,778 and 5,258,498; Huston et al., Methods in Enzymology, 203:46-88, 1991; Shu et al., PNAS, 90:7995-7999, 1993; and Skerra et al., Science, 240:1038-1040, 1988.
Once an antibody molecule of the invention has been produced by any of the methods described above, it may then be purified by any method known in the art for purification of an immunoglobulin molecule. Examples include chromatography (e.g., ion exchange, affinity, particularly by affinity for the specific antigen after Protein A or Protein G purification, and sizing column chromatography), centrifugation, differential solubility, or any other standard technique for the purification of proteins. Further, the antibodies of the present invention or fragments thereof may be fused to heterologous polypeptide sequences described herein or otherwise known in the art to facilitate purification.
For some uses, including in vivo use of antibodies in humans and in vitro detection assays, it may be preferable to use chimeric, humanized, or human antibodies. A chimeric antibody is a molecule in which different portions of the antibody are derived from different animal species, such as antibodies having a variable region derived from a murine monoclonal antibody and a constant region derived from a human immunoglobulin. Methods for producing chimeric antibodies are known in the art. See, e.g., Morrison, Science, 229:1202, 1985; Oi et al., BioTechniques, 4:214, 1986; Gillies et al., J. Immunol. Methods, 125:191-202, 1989; U.S. Pat. Nos. 5,807,715; 4,816,567; and 4,816,397, which are incorporated herein by reference in their entireties. Humanized antibodies are antibody molecules from non-human species that bind the desired antigen. They have one or more complementarity determining regions (CDRs) from the non-human species and framework regions from a human immunoglobulin molecule. Often, framework residues in the human framework regions will be substituted with the corresponding residue from the CDR donor antibody to alter, preferably improve, antigen binding. These framework substitutions are identified by methods well known in the art, e.g., by modeling of the interactions of the CDR and framework residues to identify framework residues important for antigen binding, and by sequence comparison to identify unusual framework residues at particular positions. See, e.g., Queen et al., U.S. Pat. No. 5,585,089; Riechmann et al., Nature, 332:323, 1988, which are incorporated herein by reference in their entireties. Antibodies can be humanized using a variety of techniques known in the art including, for example, CDR-grafting (EP 239,400; PCT publication WO 91/09967; U.S. Pat. Nos. 5,225,539; 5,530,101 and 5,585,089), veneering or resurfacing (EP 592,106; EP 519,596; Padlan, Molecular Immunology, 28(4/5):489-498, 1991; Studnicka et al., Protein Engineering, 7(6):805-814, 1994; Roguska et al., Proc. Natl. Acad. Sci. USA, 91:969-973, 1994), and chain shuffling (U.S. Pat. No. 5,565,332), all of which are hereby incorporated by reference in their entireties.
Completely human antibodies are particularly desirable for therapeutic treatment of human patients. Human antibodies can be made by a variety of methods known in the art, including the phage display methods described above using antibody libraries derived from human immunoglobulin sequences. See U.S. Pat. Nos. 4,444,887 and 4,716,111; and PCT publications WO 98/46645; WO 98/50433; WO 98/24893; WO 98/16654; WO 96/34096; WO 96/33735; and WO 91/10741, each of which is incorporated herein by reference in its entirety.
Human antibodies can also be produced using transgenic mice which are incapable of expressing functional endogenous immunoglobulins, but which can express human immunoglobulin genes. For an overview of this technology for producing human antibodies, see Lonberg and Huszar, Int. Rev. Immunol., 13:65-93, 1995. For a detailed discussion of this technology for producing human antibodies and human monoclonal antibodies and protocols for producing such antibodies, see, e.g., PCT publications WO 98/24893; WO 92/01047; WO 96/34096; WO 96/33735; European Patent No. 0 598 877; U.S. Pat. Nos. 5,413,923; 5,625,126; 5,633,425; 5,569,825; 5,661,016; 5,545,806; 5,814,318; 5,885,793; 5,916,771; and 5,939,598, which are incorporated by reference herein in their entireties. In addition, companies such as Abgenix, Inc. (Fremont, Calif.), Medarex (NJ) and Genpharm (San Jose, Calif.) can be engaged to provide human antibodies directed against a selected antigen using technology similar to that described above.
Completely human antibodies which recognize a selected epitope can be generated using a technique referred to as "guided selection." In this approach a selected non-human monoclonal antibody, e.g., a mouse antibody, is used to guide the selection of a completely human antibody recognizing the same epitope (Jespers et al., Bio/technology, 12:899-903, 1988).
Antibodies fused or conjugated to heterologous polypeptides may be used in in vitro immunoassays and in purification methods (e.g., affinity chromatography) well known in the art. See, e.g., PCT publication Number WO 93/21232; EP 439,095; Naramura et al., Immunol. Lett., 39:91-99, 1994; U.S. Pat. No. 5,474,981; Gillies et al., PNAS, 89:1428-1432, 1992; and Fell et al., J. Immunol., 146:2446-2452, 1991, which are incorporated herein by reference in their entireties.
Antibodies may also be attached to solid supports. These are particularly useful for immunoassays or purification of the polypeptides of the invention or fragments, derivatives, analogs, or variants thereof, or of similar molecules having similar enzymatic activities to the polypeptide of the invention. Such solid supports include, but are not limited to, glass, cellulose, polyacrylamide, nylon, polystyrene, polyvinyl chloride or polypropylene.
5.5 Pharmaceutical Compositions and Kits
The present invention encompasses pharmaceutical compositions comprising anti-viral agents of the present invention. In a specific embodiment, the anti-viral agent is an antibody which immunospecifically binds CoV-HKU1 or variants thereof, or any proteins derived therefrom. In another specific embodiment, the anti-viral agent is a polypeptide or nucleic acid molecule of the invention. The pharmaceutical compositions have utility as an anti-viral prophylactic agent and may be administered to a subject who has been exposed or is expected to be exposed to a virus.
Various delivery systems are known and can be used to administer the pharmaceutical composition of the invention. Examples include encapsulation in liposomes, microparticles and microcapsules, recombinant cells capable of expressing the mutant viruses, and receptor-mediated endocytosis (see, e.g., Wu and Wu, 1987, J. Biol. Chem. 262:4429-4432). Methods of introduction include but are not limited to intradermal, intramuscular, intraperitoneal, intravenous, subcutaneous, intranasal, epidural, and oral routes. The compounds may be administered by any convenient route, for example by infusion or bolus injection, or by absorption through epithelial or mucocutaneous linings (e.g., oral mucosa, rectal and intestinal mucosa, etc.), and may be administered together with other biologically active agents. Administration can be systemic or local. In a preferred embodiment, it may be desirable to introduce the pharmaceutical compositions of the invention into the lungs by any suitable route. Pulmonary administration can also be employed, e.g., by use of an inhaler or nebulizer, and formulation with an aerosolizing agent.
In a specific embodiment, it may be desirable to administer the pharmaceutical compositions of the invention locally to the area in need of treatment. Without limitation, this may be achieved by local infusion during surgery, or by topical application, e.g., in conjunction with a wound dressing after surgery. It may also be achieved by injection, or by means of a catheter, a suppository, a nasal spray, or an implant. Such an implant may be made of a porous, non-porous, or gelatinous material, including membranes, such as silastic membranes, or fibers. In one embodiment, administration can be by direct injection at the site (or former site) of infected tissues.
In another embodiment, the pharmaceutical composition can be delivered in a vesicle, in particular a liposome (see Langer, 1990, Science 249:1527-1533; Treat et al., in Liposomes in the Therapy of Infectious Disease and Cancer, Lopez-Berestein and Fidler (eds.), Liss, New York, pp. 353-365 (1989); Lopez-Berestein, ibid., pp. 317-327; see generally ibid.).
In yet another embodiment, the pharmaceutical composition can be delivered in a controlled release system. In one embodiment, a pump may be used (see Langer, supra; Sefton, 1987, CRC Crit. Ref. Biomed. Eng. 14:201; Buchwald et al., 1980, Surgery 88:507; and Saudek et al., 1989, N. Engl. J. Med. 321:574). In another embodiment, polymeric materials can be used (see Medical Applications of Controlled Release, Langer and Wise (eds.), CRC Press, Boca Raton, Fla. (1974); Controlled Drug Bioavailability, Drug Product Design and Performance, Smolen and Ball (eds.), Wiley, New York
(1984); Ranger and Peppas, J. Macromol. Sci. Rev. Macromol. Chem. 23:61 (1983); see also Levy et al., 1985, Science 228:190; During et al., 1989, Ann. Neurol. 25:351; Howard et al., 1989, J. Neurosurg. 71:105). In yet another embodiment, a controlled release system can be placed in proximity of the composition's target, i.e., the lung, thus requiring only a fraction of the systemic dose (see, e.g., Goodson, in Medical Applications of Controlled Release, supra, vol. 2, pp. 115-138 (1984)). Other controlled release systems are discussed in the review by Langer (Science 249:1527-1533 (1990)).

The pharmaceutical compositions of the present invention comprise a therapeutically effective amount of recombinant or chimeric CoV-HKU1, and a pharmaceutically acceptable carrier. In a specific embodiment, the term "pharmaceutically acceptable" means approved by a regulatory agency of the Federal or a state government or listed in the U.S. Pharmacopeia or other generally recognized pharmacopeia for use in animals, and more particularly in humans. The term "carrier" refers to a diluent, adjuvant, excipient, or vehicle with which the pharmaceutical composition is administered. Such pharmaceutical carriers can be sterile liquids, such as water and oils, including those of petroleum, animal, vegetable or synthetic origin, such as peanut oil, soybean oil, mineral oil, sesame oil and the like. Water is a preferred carrier when the pharmaceutical composition is administered intravenously. Saline solutions and aqueous dextrose and glycerol solutions can also be employed as liquid carriers, particularly for injectable solutions. Suitable pharmaceutical excipients include starch, glucose, lactose, sucrose, gelatin, malt, rice, flour, chalk, silica gel, sodium stearate, glycerol monostearate, talc, sodium chloride, dried skim milk, glycerol, propylene glycol, water, ethanol and the like.
The composition, if desired, can also contain minor amounts of wetting or emulsifying agents, or pH buffering agents. These compositions can take the form of solutions, suspensions, emulsions, tablets, pills, capsules, powders, sustained-release formulations and the like. The composition can be formulated as a suppository, with traditional binders and carriers such as triglycerides. Oral formulations can include standard carriers such as pharmaceutical grades of mannitol, lactose, starch, magnesium stearate, sodium saccharine, cellulose, magnesium carbonate, etc. Examples of suitable pharmaceutical carriers are described in "Remington's Pharmaceutical Sciences" by E. W. Martin. The formulation should suit the mode of administration.

In a preferred embodiment, the composition is formulated in accordance with routine procedures as a pharmaceutical composition adapted for intravenous administration to human beings. Typically, compositions for intravenous administration are solutions in sterile isotonic aqueous buffer. Where necessary, the composition may also include a solubilizing agent and a local anesthetic such as lignocaine to ease pain at the site of the injection. Generally, the ingredients are supplied either separately or mixed together in unit dosage form, for example, as a dry lyophilized powder or water-free concentrate in a hermetically sealed container such as an ampoule or sachet indicating the quantity of active agent. Where the composition is to be administered by infusion, it can be dispensed with an infusion bottle containing sterile pharmaceutical grade water or saline. Where the composition is administered by injection, an ampoule of sterile water for injection or saline can be provided so that the ingredients may be mixed prior to administration.

The pharmaceutical compositions of the invention can be formulated as neutral or salt forms.
Pharmaceutically acceptable salts include those formed with free amino groups such as those derived from hydrochloric, phosphoric, acetic, oxalic, tartaric acids, etc., and those formed with free carboxyl groups such as those derived from sodium, potassium, ammonium, calcium, ferric hydroxides, isopropylamine, triethylamine, 2-ethylamino ethanol, histidine, procaine, etc.

The amount of the pharmaceutical composition of the invention which will be effective in the treatment of a particular disorder or condition will depend on the nature of the disorder or condition, and can be determined by standard clinical techniques. In addition, in vitro assays may optionally be employed to help identify optimal dosage ranges. The precise dose to be employed in the formulation will also depend on the route of administration and the seriousness of the disease or disorder, and should be decided according to the judgment of the practitioner and each patient's circumstances. However, suitable dosage ranges for intravenous administration are generally about 20-500 micrograms of active compound per kilogram body weight. Suitable dosage ranges for intranasal administration are generally about 0.01 pg/kg body weight to 1 mg/kg body weight. Effective doses may be extrapolated from dose-response curves derived from in vitro or animal model test systems.

Suppositories generally contain active ingredient in the range of 0.5% to 10% by weight; oral formulations preferably contain 10% to 95% active ingredient.

The invention also provides a pharmaceutical pack or kit comprising one or more containers filled with one or more of the ingredients of the pharmaceutical compositions of the invention.
Optionally associated with such container(s) can be a notice in the form prescribed by a governmental agency regulating the manufacture, use or sale of pharmaceuticals or biological products, which notice reflects approval by the agency of manufacture, use or sale for human administration.

In a preferred embodiment, the kit contains an anti-viral agent of the invention, e.g., an antibody specific for the polypeptides encoded by a nucleotide sequence of SEQ ID NO: 1 or 3, or any CoV-HKU1 epitope, or a polypeptide or protein of the present invention, including those shown in FIG. 2 (SEQ ID NOS:34-1318), FIG. 8 (SEQ ID NOS:1319-2918), and SEQ ID NO:2, or a nucleic acid molecule of the invention, alone or in combination with adjuvants, antivirals, antibiotics, analgesics, bronchodilators, or other pharmaceutically acceptable excipients.

The present invention further encompasses kits comprising a container containing a pharmaceutical composition of the present invention and instructions for use.

5.6 Detection Assays

The present invention provides a method for detecting an antibody which immunospecifically binds to CoV-HKU1 in a biological sample, for example blood, serum, plasma, saliva, urine, etc., from a patient suffering from respiratory tract infection. In a specific embodiment, the method comprises contacting the sample with the polypeptides or protein encoded by the nucleotide sequence of SEQ ID NO: 1 and/or 3, including the polypeptides having the amino acid sequences of SEQ ID NOS:34-1318 shown in FIG. 2, SEQ ID NOS:1319-2918 shown in FIG. 8, or SEQ ID NO:2, directly immobilized on a substrate, and detecting the virus-bound antibody directly or indirectly with a labeled heterologous anti-isotype antibody.
In another specific embodiment, the sample is contacted with a host cell comprising a nucleic acid molecule having the nucleotide sequence of SEQ ID NO: 1 or 3 and expressing the polypeptides encoded thereby, and the bound antibody can be detected by immunofluorescent assay.
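As an illustration of how an immunoassay readout of this kind can be scored, the sketch below applies the rule used in the ELISA of Section 6.2: a cutoff equal to the mean OD450 of healthy-donor sera plus three standard deviations, with the titer taken as the highest serum dilution whose mean duplicate OD450 exceeds that cutoff. It is a minimal sketch under those assumptions, not the inventors' analysis software; the function names and the OD readings in the usage lines are hypothetical.

```python
from statistics import mean, stdev


def cutoff_from_controls(control_ods, k=3.0):
    """Cutoff = mean OD450 of negative-control sera + k sample standard deviations."""
    return mean(control_ods) + k * stdev(control_ods)


def cutoff_from_summary(mean_od, sd_od, k=3.0):
    """Same rule when only the control mean and SD are reported."""
    return mean_od + k * sd_od


def endpoint_titer(dilution_ods, cutoff):
    """Highest dilution factor whose mean duplicate OD450 exceeds the cutoff.

    dilution_ods maps a dilution factor (2000 means 1:2000) to the OD450
    readings at that dilution. Assumes OD falls as the serum is diluted.
    Returns None if no dilution tested is above the cutoff.
    """
    positive = [d for d, ods in dilution_ods.items() if mean(ods) > cutoff]
    return max(positive) if positive else None


if __name__ == "__main__":
    igg_cutoff = cutoff_from_summary(0.178, 0.070)  # healthy-donor IgG statistics
    igm_cutoff = cutoff_from_summary(0.224, 0.117)  # healthy-donor IgM statistics
    print(round(igg_cutoff, 3), round(igm_cutoff, 3))

    # Hypothetical duplicate OD450 readings for one serum, for illustration only.
    readings = {1000: [0.90, 0.92], 2000: [0.60, 0.62], 4000: [0.45, 0.47],
                8000: [0.40, 0.41], 16000: [0.30, 0.31]}
    print(f"IgG titer 1:{endpoint_titer(readings, igg_cutoff)}")
```

With the rounded summary statistics reported in Section 6.2, the rule gives 0.388 (IgG) and 0.575 (IgM), within 0.001 of the reported 0.387 and 0.576; the small difference presumably reflects rounding of the published means and standard deviations.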
An exemplary method for detecting the presence or absence of a polypeptide or nucleic acid of the invention in a biological sample involves obtaining a biological sample from various sources and contacting the sample with a compound or an agent capable of detecting an epitope or nucleic acid (e.g., mRNA, genomic RNA) of CoV-HKU1 such that the presence of CoV-HKU1 is detected in the sample. A preferred agent for detecting CoV-HKU1 mRNA or genomic RNA of the invention is a labeled nucleic acid probe capable of hybridizing to mRNA or genomic RNA encoding a polypeptide of the invention. The nucleic acid probe can be, for example, a nucleic acid molecule comprising or consisting of the nucleotide sequence of SEQ ID NO: 1 or 3, or a portion thereof, or a complement thereof, such as an oligonucleotide of at least 15, 20, 25, 30, 50, 100, 250, 500, 750, 1,000 or more contiguous nucleotides in length and sufficient to specifically hybridize under stringent conditions to a CoV-HKU1 mRNA or genomic RNA.

In another preferred specific embodiment, the presence of CoV-HKU1 is detected in the sample by a reverse transcription polymerase chain reaction (RT-PCR) using primers constructed based on a partial nucleotide sequence of the genome of CoV-HKU1 or a genomic nucleic acid sequence of SEQ ID NO:3, or based on a nucleotide sequence of SEQ ID NO: 1. In a non-limiting specific embodiment, preferred primers to be used in an RT-PCR method are 5'-GGTTGGGACTATCCTAAGTGTGA-3' (SEQ ID NO:4) and 5'-CCATCATCAGATAGAATCATCATA-3' (SEQ ID NO:5), used in the presence of 3 mM MgCl2, and the thermal cycles are, for example, but not limited to, 94° C. for 8 min followed by 40 cycles of 94° C. for 1 min, 50° C. for 1 min, and 72° C. for 1 min.
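A quick sanity check on a primer pair such as SEQ ID NOS:4 and 5 is to locate both binding sites on a template and compute the expected product length. The sketch below does this by exact string matching only, which is a simplifying assumption (real primer-design tools tolerate mismatches and score binding thermodynamically). The template in the usage lines is synthetic, built so that the product matches the 440 bp size reported for this primer pair in Section 6.3; it is not CoV-HKU1 sequence.

```python
FORWARD = "GGTTGGGACTATCCTAAGTGTGA"   # SEQ ID NO:4
REVERSE = "CCATCATCAGATAGAATCATCATA"  # SEQ ID NO:5

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def predicted_amplicon(template, fwd, rev):
    """Return (1-based start, length) of the exact-match product, or None.

    The forward primer must appear as written on the template's plus strand;
    the reverse primer anneals to that strand, so its reverse complement
    must appear downstream of the forward site.
    """
    template = template.upper()
    start = template.find(fwd.upper())
    if start < 0:
        return None
    site = reverse_complement(rev)
    end = template.find(site, start + len(fwd))
    if end < 0:
        return None
    return start + 1, end + len(site) - start


if __name__ == "__main__":
    # Synthetic template: 3 flanking bases, forward site, 393-base filler,
    # reverse-primer site, 3 flanking bases. Not CoV-HKU1 sequence.
    template = "TTT" + FORWARD + "A" * 393 + reverse_complement(REVERSE) + "GGG"
    print(predicted_amplicon(template, FORWARD, REVERSE))
```

The 23-nt forward primer, 393 filler bases and 24-nt reverse site add up to the 440 bp product; against the real genome the same call would be made with the genomic DNA copy of the target region.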
In a more preferred specific embodiment, the present invention provides a real-time quantitative PCR assay to detect the presence of CoV-HKU1 in a biological sample by subjecting the cDNA obtained by reverse transcription of the total RNA extracted from the sample to PCR reactions using specific primers, such as those having the nucleotide sequences of SEQ ID NOS:4 and 5, and a fluorescent dye, such as SYBR Green I, which fluoresces when bound nonspecifically to double-stranded DNA. The fluorescence signals from these reactions are captured at the end of the extension steps as PCR product is generated over a range of thermal cycles, thereby allowing quantitative determination of the viral load in the sample based on an amplification plot.

A preferred agent for detecting CoV-HKU1 is an antibody that specifically binds a polypeptide of the invention or any CoV-HKU1 epitope, preferably an antibody with a detectable label. Antibodies can be polyclonal or, more preferably, monoclonal. An intact antibody, or a fragment thereof (e.g., Fab or F(ab')2), can be used.

The term "labeled", with regard to the probe or antibody, is intended to encompass direct labeling of the probe or antibody by coupling (i.e., physically linking) a detectable substance to the probe or antibody, as well as indirect labeling of the probe or antibody by reactivity with another reagent that is directly labeled. Examples of indirect labeling include detection of a primary antibody using a fluorescently labeled secondary antibody and end-labeling of a DNA probe with biotin such that it can be detected with fluorescently labeled streptavidin.

The detection method of the invention can be used to detect mRNA, protein (or any epitope), or genomic RNA in a sample in vitro as well as in vivo. For example, in vitro techniques for detection of mRNA include northern hybridizations, in situ hybridizations, RT-PCR, and RNase protection.
In vitro techniques for detection of an epitope of CoV-HKU1 include enzyme linked immunosorbent assays (ELISAs), Western blots, immunoprecipitations and immunofluorescence. In vitro techniques for detection of genomic RNA include northern hybridizations, RT-PCR, and RNase protection. Furthermore, in vivo techniques for detection of CoV-HKU1 include introducing into a subject organism a labeled antibody directed against the polypeptide. For example, the antibody can be labeled with a radioactive marker whose presence and location in the subject organism can be detected by standard imaging techniques, including autoradiography.

In a specific embodiment, the methods further involve obtaining a control sample from a control subject, contacting the control sample with a compound or agent capable of detecting CoV-HKU1, e.g., a polypeptide of the invention or mRNA or genomic RNA encoding a polypeptide of the invention, such that the presence of CoV-HKU1 or the polypeptide or mRNA or genomic RNA encoding the polypeptide is detected in the sample, and comparing the absence of CoV-HKU1 or the polypeptide or mRNA or genomic RNA encoding the polypeptide in the control sample with the presence of CoV-HKU1, or the polypeptide or mRNA or genomic RNA encoding the polypeptide, in the test sample.

The invention also encompasses kits for detecting the presence of CoV-HKU1 or a polypeptide or nucleic acid of the invention in a test sample. The kit, for example, can comprise a labeled compound or agent capable of detecting CoV-HKU1 or the polypeptide or a nucleic acid molecule encoding the polypeptide in a test sample and, in certain embodiments, a means for determining the amount of the polypeptide or mRNA in the sample (e.g., an antibody which binds the polypeptide or an oligonucleotide probe which binds to DNA or mRNA encoding the polypeptide). Kits can also include instructions for use.
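The real-time assay described in this section derives viral load from amplification plots, and a kit may include a means for determining the amount of target. The text does not say how threshold cycle (Ct) values are converted to copy numbers; a common approach, assumed here, is a standard curve fitted to a dilution series of a plasmid standard of known copy number. The sketch also back-calculates copies per ml of specimen from the volumes carried between steps, assuming complete recovery at every step. All function names and all numbers in the usage lines are illustrative; the elution volume in particular is an assumption because it is not stated for the real-time assay.

```python
import math


def fit_standard_curve(copies, cts):
    """Least-squares line Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """E = 10**(-1/slope) - 1; a slope of about -3.32 means roughly 100%."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve for an unknown sample's Ct."""
    return 10 ** ((ct - intercept) / slope)


def copies_per_ml_specimen(copies_per_pcr, specimen_ul, elution_ul,
                           rna_into_rt_ul, rt_volume_ul, cdna_into_pcr_ul):
    """Scale copies in one PCR back to copies per ml of the original specimen.

    Assumes 100% recovery in extraction and reverse transcription.
    """
    per_rt_reaction = copies_per_pcr * rt_volume_ul / cdna_into_pcr_ul
    per_eluate = per_rt_reaction * elution_ul / rna_into_rt_ul
    return per_eluate * 1000.0 / specimen_ul


if __name__ == "__main__":
    # Hypothetical standard series (copies per reaction) and their Ct values.
    standards = [1e2, 1e3, 1e4, 1e5, 1e6]
    cts = [31.4, 28.1, 24.8, 21.5, 18.2]
    slope, intercept = fit_standard_curve(standards, cts)
    unknown = copies_from_ct(26.0, slope, intercept)  # hypothetical sample Ct
    # 140 ul specimen, 10 ul RNA into a 20 ul RT, 2 ul cDNA per PCR (Section 6.3);
    # the 50 ul elution volume is an assumption for this illustration.
    per_ml = copies_per_ml_specimen(unknown, 140, 50, 10, 20, 2)
    print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.1%}")
    print(f"{unknown:.3g} copies/PCR, about {per_ml:.3g} copies/ml of specimen")
```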
For antibody-based kits, the kit can comprise, for example: (1) a first antibody (e.g., attached to a solid support) which binds to a polypeptide of the invention or a CoV-HKU1 epitope; and, optionally, (2) a second, different antibody which binds to either the polypeptide or the first antibody and is conjugated to a detectable agent.

For oligonucleotide-based kits, the kit can comprise, for example: (1) an oligonucleotide, e.g., a detectably labeled oligonucleotide, which hybridizes to a nucleic acid sequence encoding a polypeptide of the invention or to a sequence within the CoV-HKU1 genome, or (2) a pair of primers useful for amplifying a nucleic acid molecule containing a CoV-HKU1 sequence. The kit can also comprise, e.g., a buffering agent, a preservative, or a protein stabilizing agent. The kit can also comprise components necessary for detecting the detectable agent (e.g., an enzyme or a substrate). The kit can also contain a control sample or a series of control samples which can be assayed and compared to the test sample. Each component of the kit is usually enclosed within an individual container, and all of the various containers are within a single package along with instructions for use.

6. EXAMPLES

The following examples illustrate the identification of the novel CoV-HKU1. These examples should not be construed as limiting.

Methods and Results

As a general reference, Wiedbrauk DL & Johnston SLG (Manual of Clinical Virology, Raven Press, New York, 1993) was used.

6.1 Clinical Subject

The patient is an in-patient of the United Christian Hospital in Hong Kong. Nasopharyngeal aspirates were collected from
the patient weekly from the first to the fifth week of the illness, stool and urine in the first and second weeks of the illness, and sera in the first, second, and fourth weeks of the illness.

6.2 Antibody Detection

To produce a fusion plasmid for protein purification, primers 5'-TTTTCCTTTTGCGGCCGCTTAAGCAACAGAGTCTTCTA-3' (SEQ ID NO:6) and 5'-CGGAATTCGATGTCTTATACTCCCGGT-3' (SEQ ID NO:7) were used to amplify the gene encoding the N protein of CoV-HKU1 by RT-PCR. The sequence coding for amino acid residues 1 to 441 of the N protein was amplified and cloned into the EcoRI and NotI sites of expression vector pET-28b(+) (Novagen, Madison, Wis., USA) in frame and downstream of the series of six histidine residues. The (His)6-tagged (SEQ ID NO:27) recombinant N protein was expressed in E. coli and purified using the Ni2+-loaded HiTrap Chelating System (Amersham Pharmacia, USA) according to the manufacturer's instructions.

Western blot analysis was performed as follows: Two hundred ng of purified (His)6-tagged (SEQ ID NO:27) recombinant N protein of CoV-HKU1 were loaded into each well of a sodium dodecyl sulfate (SDS)-10% polyacrylamide gel and subsequently electroblotted onto a nitrocellulose membrane (Bio-Rad, Hercules, Calif., USA). The blot was cut into strips, and the strips were incubated separately with 1:2000 dilutions of serum samples obtained during the first, second, and fourth weeks of the patient's illness. Serum samples of two healthy blood donors were used as controls. Antigen-antibody interaction was detected with an ECL fluorescence system (Amersham Life Science, Buckinghamshire, UK). Several prominent immunoreactive bands were visible for serum samples collected during the second and fourth weeks of the patient's illness (FIG. 7, lanes 2 and 3).
The sizes of the largest bands were about 53 kDa, consistent with the expected size of 52.8 kDa for the full-length (His)6-tagged (SEQ ID NO:27) N protein, whereas the other bands were consistent with degradation products of the (His)6-tagged (SEQ ID NO:27) N protein. Only very faint bands were observed for serum samples obtained from the patient during the first week of the illness (FIG. 7, lane 1) and from two healthy blood donors (FIG. 7, lanes 4 and 5).

ELISA was performed using the recombinant N protein of CoV-HKU1 prepared as described above. Each well of a Nunc immunoplate (Roskilde, Denmark) was coated with 20 ng of purified (His)6-tagged (SEQ ID NO:27) recombinant N protein for 12 h and then blocked in phosphate-buffered saline with 2% bovine serum albumin. The serum samples obtained from the patient during the first, second, and fourth weeks of the illness were serially diluted, added to the wells of the (His)6-tagged (SEQ ID NO:27) recombinant N protein-coated plates in a total volume of 100 µl per well, and incubated at 37° C. for 2 h. After five washes with washing buffer, 100 µl per well of 1:4000 diluted horseradish peroxidase-conjugated goat anti-human IgG antibody (Zymed Laboratories Inc., South San Francisco, Calif., USA) was added to the wells and incubated at 37° C. for 1 h. After five further washes with washing buffer, 100 µl of diluted 3,3',5,5'-tetramethylbenzidine (Zymed Laboratories Inc.) was added to each well and incubated at room temperature for 15 min. One hundred microliters of 0.3 M H2SO4 was then added, and the absorbance of each well at 450 nm was measured. Each sample was tested in duplicate, and the mean absorbance for each serum was calculated.

Box titration was carried out with different dilutions of (His)6-tagged (SEQ ID NO:27) recombinant N protein coating antigen and serum obtained from the fourth week of the patient's illness.
The results identified 20 ng and 80 ng of purified (His)6-tagged recombinant N protein per ELISA well as the ideal amounts for plate coating, and 1:1000 and 1:20 as the optimal serum dilutions, for IgG and IgM detection, respectively.

To establish the baseline for the tests, serum samples (diluted 1:1000 and 1:20 for IgG and IgM, respectively) from 100 healthy blood donors were tested in the CoV-HKU1 antibody ELISA. For these 100 sera, the mean ELISA OD450 values for IgG and IgM detection were 0.178 and 0.224, with standard deviations of 0.070 and 0.117, respectively. Absorbance values of 0.387 and 0.576 (each equal to the mean value of the healthy controls plus three times the standard deviation) were selected as the cutoff values for IgG and IgM, respectively. Using these cutoff values, the IgG titers of the patient's serum samples obtained during the first, second, and fourth weeks of the illness were <1:1000, 1:2000, and 1:8000, respectively (FIG. 6), and the corresponding IgM titers were 1:20, 1:40, and 1:80 (data not shown).

6.3 RT-PCR and Real Time Quantitative PCR

RT-PCR Assay

An RT-PCR was developed to detect the CoV-HKU1 sequence in NPA samples. Total RNA from clinical samples was reverse transcribed using random hexamers, and cDNA was amplified using primers 5'-GGTTGGGACTATCCTAAGTGTGA-3' (SEQ ID NO:4) and 5'-CCATCATCAGATAGAATCATCATA-3' (SEQ ID NO:5), which were constructed based on the RNA-dependent RNA polymerase-encoding sequence (SEQ ID NO:1) of CoV-HKU1, in the presence of 2.5 mM MgCl2 (94° C. for 8 min followed by 40 cycles of 94° C. for 1 min, 50° C. for 1 min, and 72° C. for 1 min). A typical RT-PCR protocol is summarized as follows:

1. RNA Extraction

RNA from 140 µl of NPA samples was extracted with the QIAquick viral RNA extraction kit and eluted in 50 µl of elution buffer.

2. Reverse Transcription

RNA                                      11.5 µl
0.1 M DTT                                2 µl
5x buffer                                4 µl
10 mM dNTP                               1 µl
Superscript II, 200 U/µl (Invitrogen)    1 µl
Random hexamers, 0.3 µg/µl               0.5 µl

Reaction condition: 42° C., 50 min; 94° C., 3 min; hold at 4° C.
3. PCR

cDNA generated by random primers was amplified in a 50 µl reaction as follows:

cDNA                                                      2 µl
10 mM dNTP                                                0.5 µl
10x buffer                                                5 µl
25 mM MgCl2                                               5 µl
25 pM Forward primer                                      0.5 µl
25 pM Reverse primer                                      0.5 µl
AmpliTaq Gold polymerase, 5 U/µl (Applied Biosystems)     0.25 µl
Water                                                     36.25 µl

Thermal-cycle condition: 95° C., 10 min, followed by 40 cycles of 95° C., 1 min; 50° C., 1 min; 72° C., 1 min.

4. Primer Sequences

Primers were designed based on the RNA-dependent RNA polymerase-encoding sequence (SEQ ID NO:1) of CoV-HKU1.

Forward primer: 5'-GGTTGGGACTATCCTAAGTGTGA-3' (SEQ ID NO:4)
Reverse primer: 5'-CCATCATCAGATAGAATCATCATA-3' (SEQ ID NO:5)
Product size: 440 bp

Real-Time Quantitative PCR Assay

Total RNA from 140 µl of nasopharyngeal aspirate (NPA) was extracted with the QIAamp virus RNA mini kit (Qiagen) as instructed by the manufacturer. Ten µl of eluted RNA was reverse transcribed with 200 U of Superscript II reverse transcriptase (Invitrogen) in a 20 µl reaction mixture containing 0.15 µg of random hexamers, 10 mmol/l DTT, and 0.5 mmol/l dNTP, as instructed. Complementary DNA was then amplified in SYBR Green I fluorescence reaction mixtures (Roche, Ind.). Briefly, 20 µl reaction mixtures containing 2 µl of cDNA, 3.5 mmol/l MgCl2, 0.25 µmol/l forward primer [5'-GGTTGGGACTATCCTAAGTGTGA-3' (SEQ ID NO:4)] and 0.25 µmol/l reverse primer [5'-CCATCATCAGATAGAATCATCATA-3' (SEQ ID NO:5)] were thermal-cycled in a LightCycler (Roche) with the PCR program [95° C., 10 min, followed by 50 cycles of 95° C., 10 min; 57° C., 5 sec; 72° C., 9 sec]. Plasmids containing the target sequence were used as positive controls. Fluorescence signals from these reactions were captured at the end of the extension step in each cycle. To determine the specificity of the assay, PCR products (440 base pairs) were subjected to a melting curve analysis at the end of the assay (65° C. to 95° C., 0.1° C. per second) (data not shown).

The amount of CoV-HKU1 RNA in the nasopharyngeal aspirates was followed weekly. Quantitative RT-PCR showed that the amounts of CoV-HKU1 RNA were 8.5×10^5 and 9.6×10^6 copies per ml in two nasopharyngeal aspirates collected in the first week of the illness, 1.5×10^5 copies per ml of NPA, respectively, at two time points collected in the second week of the illness, but CoV-HKU1 RNA was undetectable in the NPA collected in the third, fourth and fifth weeks of the illness (FIG. 6). CoV-HKU1 RNA was also undetectable in the urine and stool of the patient collected in the first and second weeks of the illness.

Discussion

The genome of CoV-HKU1 is a 29942-nucleotide-long, polyadenylated RNA. Its G+C content is 32%, the lowest among all known coronaviruses with genome sequences available, and its GC skew is 0.19. Table 1 compares the genomic features of CoV-HKU1 and other coronaviruses.

TABLE 1
Group      Coronavirus   Size (bases)   G + C content   GC skew
Group 1    HCoV-229E     27317          0.38            0.13
           PEDV          28033          0.42            0.09
           HCoV-NL63     27553          0.34            0.16
Group 2    CoV-HKU1      29942          0.32            0.19
           HCoV-OC43     30738          0.37            0.18
           BCoV          31028          0.37            0.17
           MHV           31357          0.42            0.14
Group 3    IBV           27608          0.38            0.14
SARS-CoV   SARS-CoV      29751          0.41            0.02

HCoV-229E = human coronavirus 229E; PEDV = porcine epidemic diarrhea virus; HCoV-NL63 = human coronavirus NL63; HCoV-OC43 = human coronavirus OC43; MHV = murine hepatitis virus; BCoV = bovine coronavirus; IBV = infectious bronchitis virus; SARS-CoV = SARS coronavirus; GC skew = (G - C)/(G + C).

The genome organization is the same as that of other coronaviruses, with the characteristic gene order 5'-replicase, S, E, M, N-3'. Both the 5' and 3' ends contain short untranslated regions. The 5' end of the genome consists of a putative 5' leader sequence. A putative transcription regulatory sequence (TRS) motif, 5'-CUAAAC-3', was found at the 3' end of the leader sequence and precedes each translated ORF except ORF 4 and ORF 6, which encodes the putative E protein. Table 2 shows the putative transcription regulatory sequences in the genome of CoV-HKU1.

TABLE 2
No. of bases
upstream of AUG   ORF                      TRS sequence                                    SEQ ID NO.
-140              Leader                   UUAAAUCUAAACUUUUUAA (127) AUG
-7                Hemagglutinin esterase   UUAAAUCUAAACUAUG                                9
-6                Spike                    UUAAAUCUAAACAUG                                 10
-13               ORF 5                    UUAAAUCUAAACUUUAUUUAUG                          11
-9                Membrane                 CUAAAUCUAAACAUUAUG                              12
-13               Nucleocapsid             UUAAAUCUAAACUAUUAGGAUG                          13
-35               ORF 9                    UUAAAUCUAAACUAUUAGGAUGUCUUAUACUCCCGGUCAUUAUG    14

As in SDAV (sialodacryoadenitis virus) and MHV (mouse hepatitis virus), ORF 6 may share the same TRS with ORF 5, suggesting that translation of the E protein is cap-independent, possibly via an internal ribosomal entry site. The 3' untranslated region contains a predicted
pseudoknot structure 59-119 bp downstream of the N gene. This pseudoknot structure is highly conserved among coronaviruses and plays a role in coronavirus RNA replication. The coding potential of the CoV-HKU1 genome is shown in FIG. 3 and Table 3, and the phylogenetic analyses of the chymotrypsin-like protease (3CLpro), replicase, helicase, haemagglutinin-esterase (HE), S, E, M and N are shown in FIGS. 4A and 4B.

TABLE 3
ORFs         Start-end (base)   No. of bases   No. of amino acids   Frame   Candidate TRS
ORF 1a       206-13600          13395          4465                 +2
ORF 1b       13600-21753        8154           2717                 +1
HE (ORF 2)   21773-22933        1161           386                  +2      Strong
S (ORF 3)    22942-27012        4071           1356                 +1      Strong
ORF 4        26960-27070        111            36                   +2      None
ORF 5        27051-27380        330            109                  +3      Strong
E (ORF 6)    27373-27621        249            82                   +1      None
M (ORF 7)    27633-28304        672            223                  +3      Strong
N (ORF 8)    28320-29645        1326           441                  +3      Strong
ORF 9        28342-28959        618            205                  +1      Strong

The replicase 1a ORF (bases 206-13600) and replicase 1b ORF (bases 13600-21753) occupy 21.5 kb of the CoV-HKU1 genome. As in other coronaviruses, a frameshift interrupts the protein-coding region and separates the 1a and 1b ORFs. The replicase ORF encodes a number of putative proteins, including papain-like protease (PLP) with two copies of the PLP domain, PLP1pro and PLP2pro, 3CLpro, replicase, helicase, and other proteins of unknown function. These proteins are produced by proteolytic cleavage of a large polyprotein (FIG. 3). The order of the resulting putative proteins is the same as that in the MHV genome. This polyprotein is synthesized by a -1 ribosomal frameshift at a conserved site (UUUAAAC) upstream of a pseudoknot structure at the junction of ORF 1a and ORF 1b. This ribosomal frameshift would result in a polyprotein of 7182 amino acids, which has 75-77% amino acid identity with the polyproteins of other Group 2 coronaviruses and 43-47% amino acid identity with those of non-group 2 coronaviruses.
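The coordinates in Table 3 and the G+C and GC skew figures defined under Table 1 follow from simple arithmetic, which the sketch below reproduces as a consistency check. Its conventions are assumptions, not statements from the text: coordinates are 1-based and inclusive, frames are numbered +1 to +3 from the first base of the genome, and each stop-terminated ORF loses one codon when converted to amino acids, while ORF 1a, which runs into the frameshift site, is counted without that subtraction (this is what the table's 13395 bases and 4465 amino acids imply).

```python
ORFS = {  # name: (start, end), 1-based inclusive coordinates from Table 3
    "ORF 1a": (206, 13600),
    "ORF 1b": (13600, 21753),
    "HE (ORF 2)": (21773, 22933),
    "S (ORF 3)": (22942, 27012),
    "ORF 4": (26960, 27070),
    "ORF 5": (27051, 27380),
    "E (ORF 6)": (27373, 27621),
    "M (ORF 7)": (27633, 28304),
    "N (ORF 8)": (28320, 29645),
    "ORF 9": (28342, 28959),
}
# ORF 1a runs into the -1 frameshift site, so no stop codon is subtracted
# (inferred from 13395 bases -> 4465 residues in Table 3).
NO_STOP = {"ORF 1a"}


def orf_length(start, end):
    return end - start + 1


def reading_frame(start):
    """Frame +1, +2 or +3 relative to base 1 of the genome."""
    return (start - 1) % 3 + 1


def amino_acids(name):
    start, end = ORFS[name]
    codons = orf_length(start, end) // 3
    return codons if name in NO_STOP else codons - 1


def gc_content(seq):
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


def gc_skew(seq):
    """GC skew = (G - C) / (G + C), as defined under Table 1."""
    s = seq.upper()
    g, c = s.count("G"), s.count("C")
    return (g - c) / (g + c)


if __name__ == "__main__":
    for name, (start, end) in ORFS.items():
        print(f"{name:11s} {orf_length(start, end):6d} bases "
              f"{amino_acids(name):5d} aa  frame +{reading_frame(start)}")
    print("frameshifted polyprotein:", amino_acids("ORF 1a") + amino_acids("ORF 1b"), "aa")
    print("replicase span:", orf_length(206, 21753), "bases")
```

With these conventions the sketch reproduces every length, frame and amino acid count in Table 3, the 4465 + 2717 = 7182 residues of the frameshifted polyprotein, and a replicase span of 21548 bases (the 21.5 kb quoted above). `gc_content` and `gc_skew` are the two quantities tabulated in Table 1 and would be applied to a complete genome sequence.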
The replicase gene of CoV-HKU1, which encodes 928 amino acids, has 87-89% amino acid identity with the replicase of other Group 2 coronaviruses and 54-65% amino acid identity with the replicase of non-group 2 coronaviruses (Table 4 and FIG. 4A). Table 4 shows the amino acid identities between the predicted chymotrypsin-like protease (3CLpro), replicase (Rep), helicase (Hel), hemagglutinin-esterase (HE), spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins of CoV-HKU1 and the corresponding proteins of other coronaviruses.

TABLE 4
Pairwise amino acid identity (%)
Group      Virus       3CLpro   Rep   Hel   HE   S    E    M    N
1          HCoV-229E   45       54    55    -    31   26   35   28
           PEDV        44       56    55    -    30   34   37   37
           PTGV        45       57    57    -    32   34   37   27
           CCoV        -        -     -     -    31   32   36   27
           HCoV-NL63   43       54    54    -    30   28   32   28
2          HCoV-OC43   82       87    88    57   60   54   76   58
           MHV         85       89    87    50   58   55   78   60
           BCoV        84       88    88    56   61   55   76   57
           SDAV        -        -     -     50   61   60   77   62
           ECoV        -        -     -     53   61   56   78   59
           PHEV        -        -     -     54   61   54   77   57
3          IBV         41       60    57    -    32   28   38   27
SARS-CoV   SARS-CoV    48       65    63    -    33   27   34   31

HCoV-229E = human coronavirus 229E; PEDV = porcine epidemic diarrhea virus; PTGV = porcine transmissible gastroenteritis virus; CCoV = canine enteric coronavirus; HCoV-NL63 = human coronavirus NL63; HCoV-OC43 = human coronavirus OC43; MHV = murine hepatitis virus; BCoV = bovine coronavirus; SDAV = rat sialodacryoadenitis coronavirus; ECoV = equine coronavirus NC99; PHEV = porcine hemagglutinating encephalomyelitis virus; IBV = infectious bronchitis virus; SARS-CoV = SARS coronavirus.

The catalytic histidine and cysteine residues, conserved in the 3CLpro of all coronaviruses, are present in the predicted 3CLpro of CoV-HKU1 (His3375 and Cys3479 of ORF 1a).
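The patent does not state how the pairwise identities in Table 4 were computed. A common convention, assumed in the sketch below, counts identical residues over the aligned positions where neither sequence has a gap; other tools divide by the full alignment length or by the shorter sequence, so figures from different tools are not directly comparable. The usage lines compare the short PLP repeat units NDDEDVVTGD, NNDEEIVTGD and NDDQIVVTGD (SEQ ID NOS:15-17), which are already aligned position by position; a full-length comparison would first need a proper alignment.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identical residues / ungapped aligned positions, as a percentage.

    Both inputs must come from the same alignment (equal length, gaps as '-').
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped aligned positions")
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


if __name__ == "__main__":
    repeat = "NDDEDVVTGD"  # SEQ ID NO:15
    for other in ("NNDEEIVTGD", "NDDQIVVTGD"):  # SEQ ID NOS:16 and 17
        print(repeat, other, f"{percent_identity(repeat, other):.0f}%")
```

NDDEDVVTGD and NNDEEIVTGD share 7 of 10 residues (70%), and NDDEDVVTGD and NDDQIVVTGD share 8 of 10 (80%).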
In the N-terminal region of the putative PLP (amino acid residues 945 to 1104 of ORF 1a), there are 14 tandem copies of a 30-base repeat encoding NDDEDVVTGD (SEQ ID NO: 15), followed by two 30-base regions that encode NNDEEIVTGD (SEQ ID NO: 16) and NDDQIVVTGD (SEQ ID NO: 17), located upstream of the first copy of the PLP domain, PLP1pro. This repeat is not observed in other coronaviruses.

ORF 2 (bases 21773-22933) encodes the predicted HE glycoprotein of 386 amino acids. The HE protein of CoV-HKU1 has 50-57% amino acid identities with the HE proteins of other Group 2 coronaviruses (Table 4 and FIG. 4A). PFAM and InterProScan analyses of the ORF show that amino acid residues 1 to 349 of the predicted protein belong to the haemagglutinin esterase family (PFAM accession no.: PF03996 and INTERPRO accession no. IPR007142). This family contains membrane glycoproteins that are present on the viral surface and are involved in the cell infection process. It contains haemagglutinin chain 1 (HE1) and haemagglutinin chain 2 (HE2) and forms a homotrimer, each monomer consisting of two chains linked by a disulphide bond. Furthermore, PFAM and InterProScan analyses of the ORF show that amino acid residues 122 to 236 of the predicted protein constitute the haemagglutinin domain of the HE-fusion glycoprotein family (PFAM accession no.: PF02710 and INTERPRO accession no. IPR003860). HE is also present in other Group 2 coronaviruses and in influenza C virus. SignalP analysis reveals a signal peptide probability of 0.738, with a cleavage site between residues 13 and 14. Although TMpred and TMHMM analyses of the ORF show four and three transmembrane domains, respectively, PHDhtm analysis of the ORF shows only one transmembrane domain, at positions 354 to 376. This agrees with the single transmembrane region reported at the C terminus of the HE of BCoV (bovine coronavirus) and puffinosis virus.
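The HE predictions above can be laid out as a simple segment map: the SignalP cleavage between residues 13 and 14 and the single PHDhtm helix at 354-376 in a 386-residue protein. The Python helper `segment_protein` below is a hypothetical illustration that takes both predictions at face value. Reading the long N-terminal segment as the surface-exposed ectodomain and the short C-terminal segment as a cytoplasmic tail is an assumption. It rests on the C-terminal transmembrane region described for BCoV HE and is not a result stated for CoV-HKU1.

```python
def segment_protein(length, signal_end=None, tm_segments=()):
    """Split residues 1..length into labelled, 1-based inclusive segments.

    signal_end: last residue of a cleaved signal peptide, if any.
    tm_segments: (start, end) pairs for predicted transmembrane helices.
    """
    segments = []
    pos = 1
    if signal_end is not None:
        segments.append(("signal peptide", 1, signal_end))
        pos = signal_end + 1
    for start, end in sorted(tm_segments):
        if start > pos:
            segments.append(("non-TM", pos, start - 1))
        segments.append(("TM", start, end))
        pos = end + 1
    if pos <= length:
        segments.append(("non-TM", pos, length))
    return segments


# HE of CoV-HKU1: 386 residues, SignalP cleavage between residues 13 and 14,
# single PHDhtm transmembrane helix at 354-376.
he = segment_protein(386, signal_end=13, tm_segments=[(354, 376)])
for label, start, end in he:
    print(f"{label:<15}{start:>4}-{end:<4}({end - start + 1} residues)")
```

The map yields a 13-residue signal peptide, a 340-residue segment (14-353), a 23-residue helix and a 10-residue C-terminal segment (377-386). To compare against another predictor's helices, only the `tm_segments` argument needs to change.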
PrositeScan analysis of the HE protein of CoV-HKU1 reveals eight potential N-linked glycosylation sites (six NXS and two NXT), located at positions 83 (NYT), 110 (NGS), 145 (NVS), 168 (NYS), 193 (NFS), 286 (NSS), 314 (NVS), and 328 (NFT). The putative active site for neuraminate O-acetyl-esterase activity, FGDS (SEQ ID NO: 18), is located at positions 31-34.

ORF 3 (bases 22942-27012) encodes the predicted S glycoprotein (PFAM accession no. PF01601) of 1356 amino acids. The S protein of CoV-HKU1 has 58-61% amino acid
identities with the S proteins of other Group 2 coronaviruses, but has fewer than 35% amino acid identities with the S proteins of Group 1, Group 3, and SARS-CoV (Table 4 and FIG. 4B). InterProScan analysis predicts it to be a type I membrane glycoprotein. Important features of the S protein of CoV-HKU1 are depicted in FIG. 5. PrositeScan analysis of the S protein of CoV-HKU1 reveals 28 potential N-linked glycosylation sites (12 NXS and 16 NXT). SignalP analysis reveals a signal peptide probability of 0.909, with a cleavage site between residues 13 and 14. By multiple alignment with the S proteins of other Group 2 coronaviruses, a potential cleavage site is identified after RRKRR (SEQ ID NO: 19), between residues 760 and 761, where S would be cleaved into S1 and S2. Immediately upstream of RRKRR (SEQ ID NO: 19), there is a series of five serine residues that is not present in any other known coronavirus (FIG. 5). Most of the S protein (residues 15 to 1300) is exposed on the outside of the virus, with a transmembrane domain at the C terminus (TMHMM analysis of the ORF shows one transmembrane domain at positions 1301 to 1356), followed by a cytoplasmic tail rich in cysteine residues. Two heptad repeats (HR), located at residues 982 to 1083 (HR1) and 1250 to 1297 (HR2), are identified by multiple alignment with other coronaviruses. In MHV, it has been confirmed that the receptor bound by the S protein is CEACAM1, a member of the carcinoembryonic antigen (CEA) family of glycoproteins in the immunoglobulin superfamily. Furthermore, it has been shown by site-directed mutagenesis that three conserved regions (sites I, II, and III) and some amino acid residues (Thr62, Thr212, Tyr214, and Tyr216 in MHV) in the N-terminal region of the S protein are particularly important for its receptor-binding activity. Multiple alignment with the N-terminal 330 amino acids of the S protein of MHV and other Group 2 coronaviruses shows that these conserved regions and amino acids are present in CoV-HKU1 (FIG. 5). This suggests that the receptor for CoV-HKU1 could be a member of the CEA family on the surface of cells in the respiratory tract. On the other hand, for HCoV-OC43, it has been shown in vitro that the receptor for the S protein is a sialic acid. However, the amino acid residues on the S protein of HCoV-OC43 that are important for receptor binding are not well defined.

ORF 4 (bases 26960-27070) encodes a predicted protein of 36 amino acids. This ORF overlaps with the ORF that encodes the S protein. This ORF is not present in other coronaviruses, and BlastP analysis of the ORF does not return any hits.

ORF 5 (bases 27051-27380) encodes a predicted protein of 109 amino acids. This ORF overlaps with the ORF that encodes the E protein. PFAM analysis of the ORF shows that the predicted protein is a member of the coronavirus non-structural protein NS2 family (PFAM accession no.: PF04753). TMpred and TMHMM analyses do not reveal any transmembrane helix. This predicted protein of CoV-HKU1 has 44-51% amino acid identities with the corresponding proteins of other Group 2 coronaviruses.

ORF 6 (bases 27373-27621) encodes the predicted E protein of 82 amino acids. The E protein of CoV-HKU1 has 54-60% amino acid identities with the E proteins of other Group 2 coronaviruses, but has fewer than 35% amino acid identities with the E proteins of Group 1, Group 3, and SARS-CoV (Table 4 and FIG. 4B). PFAM and InterProScan analyses of the ORF show that the predicted E protein is a member of the non-structural protein NS3/small envelope protein E (NS3_envE) family (PFAM accession no.: PF02723). SignalP analysis predicts the presence of a transmembrane anchor (probability 0.995). TMpred analysis of the ORF shows two transmembrane domains, at positions 16 to 34 and 39 to 59, and TMHMM analysis of the ORF shows two transmembrane domains, at positions 10 to 32 and 39 to 58, consistent with the anticipated association of the E protein with the viral envelope. Both programs predict that the N and C termini are located on the surface of the virus.

ORF 7 (bases 27633-28304) encodes the predicted M protein of 223 amino acids. The M protein of CoV-HKU1 has 76-78% amino acid identities with the M proteins of other Group 2 coronaviruses, but has fewer than 40% amino acid identities with the M proteins of Group 1, Group 3, and SARS-CoV (Table 4 and FIG. 4B). PFAM analysis of the ORF shows that the predicted M protein is a member of the coronavirus matrix glycoprotein (Corona_M) family (PFAM accession no.: PF01635). SignalP analysis predicts the presence of a transmembrane anchor (probability 0.926). TMpred analysis of the ORF shows three transmembrane domains, at positions 21 to 42, 53 to 74, and 77 to 98. TMHMM analysis of the ORF shows three transmembrane domains, at positions 20 to 39, 46 to 68, and 78 to 100. The N-terminal 19-20 amino acids are located on the outside of the virus, and the C-terminal hydrophilic domain of 123-125 amino acids is on the inside.

ORF 8 (bases 28320-29645) encodes the predicted N protein (PFAM accession no.: PF00937) of 441 amino acids. The N protein of CoV-HKU1 has 57-62% amino acid identities with the N proteins of other Group 2 coronaviruses, but has fewer than 40% amino acid identities with the N proteins of Group 1, Group 3, and SARS-CoV (Table 4 and FIG. 4B).

ORF 9 (bases 28342-28959) encodes a hypothetical protein (N2) of 205 amino acids within the ORF that encodes the predicted N protein. PFAM analysis of the ORF shows that the predicted protein is a member of the coronavirus nucleocapsid I protein (Corona_I) family (PFAM accession no.: PF03187). This hypothetical N2 protein of CoV-HKU1 has 32-39% amino acid identities with the N2 proteins of other Group 2 coronaviruses.

We report the characterization and complete genome sequence of a novel coronavirus detected in the nasopharyngeal aspirates of patients with pneumonia. The clinical significance of the virus in the first patient was evident from the high viral loads in his nasopharyngeal aspirates during the first week of illness, which coincided with his acute symptoms. The viral load decreased during the second week of the illness and was undetectable in the third week. In addition, the fall in viral load was accompanied by recovery from the illness and by the development of a specific antibody response to the recombinant N protein of the virus. As with other recently discovered viruses, such as hepatitis C virus, GB virus C, transfusion transmitted virus, and SEN virus, the present virus could not be recovered in cell culture using standard cell lines. This may be related to the inherently low recovery rate of coronaviruses. Human coronaviruses are particularly difficult to culture in vitro. Many decades after the recognition of HCoV-229E and HCoV-OC43, only a handful of primary virus isolates are available, and organ culture is required for primary isolation of HCoV-OC43. In our experience, SARS-CoV can be recovered from fewer than 20% of patients with serologically and RT-PCR documented SARS-CoV pneumonia. It is therefore not surprising that the new coronavirus CoV-HKU1 has so far proven difficult to culture in vitro. After the discovery of CoV-HKU1 in the first patient, we conducted a preliminary study on 400 nasopharyngeal aspirates that were collected last year during the SARS epidemic period. Among these 400 nasopharyngeal aspirates, CoV-HKU1 was detected in one specimen, with a viral load comparable to that of the first patient. These results have suggested that CoV-HKU1 is not
only incidentally found in one patient, but a previously unrecognized coronavirus associated with pneumonia.

Genomic analysis reveals that CoV-HKU1 is a Group 2 coronavirus. The genome organization of CoV-HKU1 agrees with those of other coronaviruses. It has the characteristic gene order 5'-replicase, S, E, M, N-3', short untranslated regions at both the 5' and 3' ends, a conserved 5' coronavirus core leader sequence, putative TRS upstream of multiple ORFs, and a conserved pseudoknot in the 3' untranslated region. In contrast to coronaviruses of other groups, CoV-HKU1 contains features characteristic of Group 2 coronaviruses, including the presence of HE, ORF 5, and N2. Phylogenetic analysis of the 3CLpro, replicase, helicase, S, E, M, and N proteins showed that these genes of CoV-HKU1 cluster with the corresponding genes of other Group 2 coronaviruses. However, the proteins of CoV-HKU1 formed distinct branches in the phylogenetic trees. This indicates that CoV-HKU1 is a distinct member of the group and is not closely related to any other known Group 2 coronavirus (FIGS. 4A and 4B).

Beyond the phylogenetic analysis of the putative proteins, CoV-HKU1 exhibits certain features that distinguish it from other Group 2 coronaviruses. Compared with other Group 2 coronaviruses, CoV-HKU1 has a deletion of about 800 bp between the replicase ORF 1b and the HE ORF 2. In other Group 2 coronaviruses, including MHV, SDAV, HCoV-OC43 and BCoV, an ORF of 798-837 bp (273-278 amino acids) is present between the replicase 1b ORF and the HE ORF 2. This ORF encodes a protein of the coronavirus non-structural protein NS2a family (PFAM accession no.: PF05213). The absence of this ORF in CoV-HKU1 indicates that it is probably a non-essential coronavirus gene. In addition to the deletion, the N-terminal region of the putative PLP in ORF 1a contains 14 tandem copies of a 30-bp repeat that codes for a highly acidic domain.
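The repeat description is internally consistent, and this can be checked directly. There are 14 copies of the decapeptide NDDEDVVTGD (SEQ ID NO: 15) plus the two variants (SEQ ID NOs: 16 and 17), or 16 x 30 = 480 bases. That gives 16 x 10 = 160 residues, exactly the span of residues 945 to 1104 of ORF 1a. The region also contains no basic residues. The Python sketch below assembles an idealized region from the three motifs, assuming no spacer residues between them; it is not the deposited ORF 1a sequence. `longest_tandem_run` is an illustrative helper.

```python
import re

UNIT = "NDDEDVVTGD"            # SEQ ID NO: 15, present as 14 tandem copies
VARIANTS = ("NNDEEIVTGD",      # SEQ ID NO: 16
            "NDDQIVVTGD")      # SEQ ID NO: 17

# Idealized region: 14 copies followed by the two variants, with no spacer
# residues (an assumption; this is not the deposited ORF 1a sequence).
region = UNIT * 14 + "".join(VARIANTS)


def longest_tandem_run(seq: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` found in `seq`."""
    runs = re.finditer(f"(?:{re.escape(unit)})+", seq)
    return max((len(m.group(0)) // len(unit) for m in runs), default=0)


copies = longest_tandem_run(region, UNIT)
acidic = sum(region.count(res) for res in "DE")   # Asp + Glu
basic = sum(region.count(res) for res in "KRH")   # Lys + Arg + His

print(copies, "tandem copies")
print(len(region), "residues; expected", 1104 - 945 + 1)
print(f"{acidic} acidic (D/E) and {basic} basic (K/R/H) residues")
```

The idealized region has 77 aspartate or glutamate residues out of 160 and no lysine, arginine or histidine. That is a simple, assumption-laden illustration of why the domain is described as highly acidic.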
Similar repeats, with different amino acid compositions, have been found in the genomes of humans, rats and parasites, but have not been found in other coronaviruses. The function of these repeats is not well understood, although some authors have suggested that the repeats could be important antigens and that their biological role may be related to their special three-dimensional structures. The vitellaria antigenic protein of Clonorchis sinensis contains 23 tandem copies of a 30-bp repeat that codes for DGGAQPPKSG (SEQ ID NO: 20). In the case of Plasmodium falciparum, it has been shown that the antigenicity of the circumsporozoite protein is due to its repeating epitope structure. It has also been suggested that tandemly repeated peptides may induce a strong humoral immune response in the infected host and thus may also be useful in serological diagnosis. Further experiments should be performed to delineate the antigenic properties, biological role, and possible clinical usefulness of the repeat in the PLP of CoV-HKU1.

The geographical, political, and economic location of Hong Kong makes it a unique place for the study of emerging infectious diseases. As the gateway to southern China, with thousands of people crossing the border by land and by air every day, Hong Kong has a high potential for importing and exporting infectious diseases to and from China, the countries of Southeast Asia, and the rest of the world. In 1997, the first 18 human cases of avian influenza A H5N1 virus infection were reported in Hong Kong. In early 2003, two cases of human infection with avian influenza A (H5N1) acquired in Fujian were diagnosed in Hong Kong. These cases provided an early warning of the threat to humans and poultry that followed in Southeast Asia in 2004.
For the SARS epidemic, both epidemiological and genomic evidence revealed that the disease had first occurred in southern China in November 2002. Even so, it did not receive much international attention until it spread to Hong Kong and, through Hong Kong, to Singapore, Toronto, Vietnam, and the United States of America. As for emerging bacterial infections, 50% of the patients with gastroenteritis associated with the recovery of Laribacter hongkongensis had a recent history of travel to southern China. In this report, one of the patients also had a recent history of travel to Shenzhen, China, before developing the respiratory illness. We speculate that he might have contracted the virus in Shenzhen. More intensive surveillance for emerging infectious pathogens in this locality is warranted.

7. MARKET POTENTIAL

The genome of CoV-HKU1 has been completely sequenced. This allows the development of the various diagnostic tests described hereinabove. In addition, this virus contains genetic information that is extremely important and valuable for clinical and scientific research applications.

8. EQUIVALENTS

Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific embodiments of the invention described herein. Such equivalents are intended to be encompassed by the following claims.

All publications, patents and patent applications mentioned in this specification are herein incorporated by reference into the specification to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. Citation or discussion of a reference herein shall not be construed as an admission that such is prior art to the present invention.

SEQUENCE LISTING

The patent contains a lengthy "Sequence Listing" section.
A copy of the "Sequence Listing" is available in electronic form from the USPTO web site (http://seqdata.uspto.gov/?pagerequest=docdetail&docid=us08092994b2). An electronic copy of the "Sequence Listing" will also be available from the USPTO upon request and payment of the fee set forth in 37 CFR 1.19(b)(3).
What is claimed:

1. A method for detecting the presence of a first nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO: 1 or a fragment thereof or a full length complement thereof in a biological sample, said method comprising:
(a) contacting the biological sample with a second nucleic acid molecule that selectively binds to said first nucleic acid molecule, wherein the second nucleic acid molecule comprises at least 45 contiguous nucleotides of SEQ ID NO: 1 or of a full length complement of a sequence comprising at least 45 contiguous nucleotides of SEQ ID NO: 1; and
(b) detecting whether the second nucleic acid binds to a nucleic acid molecule in the sample under conditions of strict hybridization.

2. The method of claim 1, wherein the second nucleic acid molecule that binds to said first nucleic acid molecule comprises the nucleotide sequence of SEQ ID NO: 1 or a full length complement of a sequence comprising at least 45 contiguous nucleotides of SEQ ID NO: 1.

3. The method of claim 1, wherein the second nucleic acid molecule comprises at least 100, 150, 200, 300, or 350 contiguous nucleotides of the nucleotide sequence of SEQ ID NO: 1 or of a full length complement of a sequence comprising at least 100, 150, 200, 300, or 350 contiguous nucleotides of SEQ ID NO: 1.

4. A method for detecting the presence of a first nucleic acid molecule comprising the nucleotide sequence of SEQ ID NO: 3 or a fragment thereof or a full length complement thereof in a biological sample, said method comprising:
(a) contacting the biological sample with a second nucleic acid molecule that selectively binds to said first nucleic acid molecule, wherein the second nucleic acid molecule comprises at least 45 contiguous nucleotides of SEQ ID NO: 1 or 3 or of a full length complement of a sequence comprising at least 45 contiguous nucleotides of SEQ ID NO: 1 or 3; and
(b) detecting whether the second nucleic acid molecule binds to a nucleic acid molecule in the sample under conditions of strict hybridization.

5. The method of claim 4, wherein the compound that binds to said second nucleic acid molecule comprises the nucleotide sequence of SEQ ID NO: 1 or 3, or a full length complement of a sequence comprising at least 45 contiguous nucleotides of SEQ ID NO: 1 or 3.

6. The method of claim 4, wherein the second nucleic acid molecule comprises at least 100, 150, 200, 300, or 350 contiguous nucleotides of the nucleotide sequence of SEQ ID NO: 1 or of a full length complement of a sequence comprising at least 100, 150, 200, 300, or 350 contiguous nucleotides of SEQ ID NO: 1.

7. The method of claim 4, wherein the second nucleic acid molecule comprises at least 100, 150, 200, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, or 29000 contiguous nucleotides of the nucleotide sequence of SEQ ID NO: 3 or of a full length complement of a sequence comprising at least 100, 150, 200, 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1200, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 11000, 12000, 13000, 14000, 15000, 16000, 17000, 18000, 19000, 20000, 21000, 22000, 23000, 24000, 25000, 26000, 27000, 28000, or 29000 contiguous nucleotides of SEQ ID NO: 3.

8. A method for identifying a subject infected with CoV-HKU1, comprising:
(a) obtaining total RNA from a biological sample obtained from the subject;
(b) reverse transcribing the total RNA to obtain cDNA; and
(c) amplifying the cDNA using a set of primers derived from the nucleotide sequence of SEQ ID NO: 1 or 3, or from a full length complement of SEQ ID NO: 1 or 3.

9. The method of claim 8, wherein the set of primers comprises first and second primers, said first and second primers comprising the nucleotide sequences of SEQ ID NOS: 4 and 5, respectively.

10. The method of claim 8, wherein the set of primers comprises first and second primers, said first and second primers comprising the nucleotide sequences of SEQ ID NOS: 6 and 7, respectively.