Reference: Besemer J. and Borodovsky M., Heuristic approach to deriving models for gene finding, NAR, 1999, Vol. 27, No. 19, pp. 3911-3920.
The models used by GeneMark.hmm 2.0 and GeneMark 2.4 are derived from parameters measured in the input sequence and from knowledge gained by studying various bacterial genomes. These models have been shown to predict genes accurately in bacterial, viral and plasmid DNA sequences. Please note that email is the only way to receive output for sequences larger than 1 MB.
UPDATE (June 1, 2001): The web site has been redesigned and moved to a new, more powerful server.
Sequence title: Fri May 3 12:57:39 EDT 2002
Length: 10079 bp
G+C percentage: 42.17%
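The G+C percentage reported here (and again as "GC Content" in the GeneMark 2.4 header) is the share of G and C bases in the sequence. A minimal Python sketch, assuming ambiguous symbols such as N are left out of the denominator (the program's own handling of them is not stated):

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among the A/C/G/T bases of `seq` (case-insensitive).

    Assumption: ambiguous symbols (N, R, Y, ...) are excluded from the count.
    """
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / acgt


# Two decimals, as in the report headers.
print(f"G+C percentage: {gc_percent('ATGCGCNNTA'):.2f}%")
```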
GeneMark.hmm PROKARYOTIC (Version 2.1)
Sequence file name: sequence, RBS: N
Model file name: heuristic_no_rbs.mat
Model organism: Heuristic_model
Fri May 3 12:57:50 2002
Predicted genes
Gene #  Strand  LeftEnd  RightEnd  Gene Length  Class
1 + <2 43 42 1
2 - 77 550 474 1
3 - 850 1128 279 1
4 + 1359 1583 225 1
5 + 1580 1822 243 1
6 + 1815 3719 1905 1
7 + 3716 3910 195 1
8 + 3907 4212 306 1
9 + 4212 4643 432 1
10 + 4681 5271 591 1
11 + 5523 6332 810 1
12 + 6345 6659 315 1
13 + 6659 6781 123 1
14 + 6781 7008 228 1
15 + 7079 7918 840 1
16 + 7918 8925 1008 1
17 + 8925 9407 483 1
18 + 9400 9783 384 1
19 + 9922 10077 156 1
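Each length in the table equals RightEnd - LeftEnd + 1, and dividing it by 3 gives the codon count that reappears as the "amino acids" figure in the translations below, with the stop codon counted as '*' (for example 77..550 is 474 bp, or 158 codons). A small Python sketch that parses rows and checks this arithmetic; the reading of the "<" prefix on gene 1 as "gene extends past the start of the sequence" is an assumption:

```python
from typing import NamedTuple


class Gene(NamedTuple):
    number: int
    strand: str          # "+" direct, "-" complement
    left: int            # 1-based, inclusive
    right: int           # 1-based, inclusive
    length: int          # as printed in the table
    gene_class: int
    left_partial: bool   # "<" prefix (assumed: gene runs past the sequence start)


def parse_gene_row(line: str) -> Gene:
    """Parse a row of the 'Predicted genes' table, e.g. '1 + <2 43 42 1'."""
    number, strand, left, right, length, gene_class = line.split()
    return Gene(int(number), strand, int(left.lstrip("<")), int(right),
                int(length), int(gene_class), left.startswith("<"))


def check(gene: Gene) -> int:
    """Verify the printed length and return the number of codons (stop included)."""
    span = gene.right - gene.left + 1
    assert span == gene.length, f"gene {gene.number}: {span} != {gene.length}"
    if not gene.left_partial:  # a truncated gene need not be a whole number of codons
        assert span % 3 == 0, f"gene {gene.number}: length not a multiple of 3"
    return span // 3


for row in ["1 + <2 43 42 1", "2 - 77 550 474 1", "6 + 1815 3719 1905 1"]:
    g = parse_gene_row(row)
    print(g.number, g.strand, check(g), "codons")
```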
GENEMARK PREDICTIONS
Sequence: Fri May 3 12:57:39 EDT 2002
Sequence file: gm_sequence
Sequence length: 10079
GC Content: 42.17%
Window length: 96
Window step: 12
Threshold value: 0.500
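The window parameters describe a sliding-window scan: a 96 bp window advanced 12 bp at a time, with regions reported when their coding probability exceeds 0.500 (the ORF list below uses "> 0.50"). A Python sketch of the window positions and the threshold test, assuming windows that would overrun the sequence end are simply dropped; the program may treat the final window differently:

```python
from typing import Iterator, List, Sequence, Tuple


def windows(seq_len: int, length: int = 96, step: int = 12) -> Iterator[Tuple[int, int]]:
    """Yield 1-based inclusive (start, end) windows of `length` bp, advanced by `step` bp.

    Assumption: a window that would run past the sequence end is not emitted.
    """
    start = 1
    while start + length - 1 <= seq_len:
        yield start, start + length - 1
        start += step


def coding_windows(probs: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of windows whose coding probability is strictly above `threshold`."""
    return [i for i, p in enumerate(probs) if p > threshold]


wins = list(windows(10079))
print(len(wins), wins[0], wins[-1])           # 832 (1, 96) (9973, 10068)
print(coding_windows([0.2, 0.51, 0.5, 0.9]))  # [1, 3]
```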
---
Matrix: Heuristic model
Matrix author: MB/JDB
Matrix order: 2
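"Matrix order: 2" means each base is scored conditionally on the two bases before it (a second-order Markov chain); for coding DNA the probabilities also depend on the codon position, giving a three-periodic chain. The sketch below shows how such a model scores a stretch of DNA against a non-coding model. The table layout and the toy probabilities are hypothetical and are not the contents or format of heuristic_no_rbs.mat:

```python
import itertools
import math
from typing import Dict, Tuple

BASES = "ACGT"

# Hypothetical layout: coding[(phase, context, base)] and noncoding[(context, base)]
# hold P(base | two preceding bases); phase is the codon position 0, 1 or 2.
CodingModel = Dict[Tuple[int, str, str], float]
NoncodingModel = Dict[Tuple[str, str], float]


def log_odds(seq: str, coding: CodingModel, noncoding: NoncodingModel) -> float:
    """Log-likelihood ratio of `seq` (read in frame from its first base) under a
    three-periodic second-order coding chain versus a second-order non-coding chain."""
    score = 0.0
    for i in range(2, len(seq)):
        context, base = seq[i - 2:i], seq[i]
        phase = i % 3  # codon position of seq[i], taking seq[0] as a first codon position
        score += math.log(coding[(phase, context, base)])
        score -= math.log(noncoding[(context, base)])
    return score


def toy_models(p_a: float = 0.4) -> Tuple[CodingModel, NoncodingModel]:
    """Toy models: the coding chain favours A in every phase; non-coding is uniform."""
    p_other = (1.0 - p_a) / 3
    coding = {(ph, a + b, c): (p_a if c == "A" else p_other)
              for ph in range(3)
              for a, b, c in itertools.product(BASES, repeat=3)}
    noncoding = {(a + b, c): 0.25 for a, b, c in itertools.product(BASES, repeat=3)}
    return coding, noncoding


coding, noncoding = toy_models()
print(round(log_odds("AAAAA", coding, noncoding), 4))  # 3 * ln(1.6)
```

A positive score means the stretch looks more like the coding model in this frame; turning scores into the per-frame probabilities GeneMark prints requires priors over all frames, which this sketch leaves out.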
List of Open reading frames predicted as CDSs, shown with alternate starts
(regions from start to stop codon w/ coding function >0.50)
Left end  Right end  DNA strand  Coding frame  Avg prob  Start prob
--------  ---------  ----------  ------------  --------  ----------
77 250 complement fr 1 0.61 0.06
850 1128 complement fr 3 0.64 0.77
850 1059 complement fr 3 0.61 0.11
1359 1583 direct fr 3 0.55 0.56
1440 1583 direct fr 3 0.61 0.22
1779 3719 direct fr 3 0.66 0.10
1782 3719 direct fr 3 0.66 0.15
1800 3719 direct fr 3 0.67 0.31
1815 3719 direct fr 3 0.68 0.69
2052 3719 direct fr 3 0.70 0.07
2175 3719 direct fr 3 0.73 0.54
2202 3719 direct fr 3 0.72 0.58
3716 3910 direct fr 2 0.66 0.70
3749 3910 direct fr 2 0.74 0.55
3809 3910 direct fr 2 0.62 0.04
4087 4212 direct fr 1 0.86 0.68
4212 4643 direct fr 3 0.76 0.95
4374 4643 direct fr 3 0.81 0.04
4398 4643 direct fr 3 0.80 0.03
4407 4643 direct fr 3 0.79 0.03
4681 5271 direct fr 1 0.67 0.25
4684 5271 direct fr 1 0.67 0.27
4750 5271 direct fr 1 0.66 0.05
5523 6332 direct fr 3 0.67 0.01
5613 6332 direct fr 3 0.74 0.28
5661 6332 direct fr 3 0.76 0.13
5778 6332 direct fr 3 0.79 0.36
5790 6332 direct fr 3 0.79 0.42
5799 6332 direct fr 3 0.78 0.31
6345 6659 direct fr 3 0.64 0.19
6387 6659 direct fr 3 0.76 0.58
6498 6659 direct fr 3 0.67 0.07
6781 7008 direct fr 1 0.55 0.90
6796 7008 direct fr 1 0.56 0.72
6820 7008 direct fr 1 0.53 0.29
7079 7918 direct fr 2 0.83 0.01
7202 7918 direct fr 2 0.89 0.16
7208 7918 direct fr 2 0.89 0.15
7256 7918 direct fr 2 0.88 0.14
7918 8925 direct fr 1 0.80 0.97
8059 8925 direct fr 1 0.82 0.12
8086 8925 direct fr 1 0.83 0.32
8104 8925 direct fr 1 0.83 0.43
8925 9407 direct fr 3 0.87 0.91
8940 9407 direct fr 3 0.89 0.44
8970 9407 direct fr 3 0.88 0.02
8976 9407 direct fr 3 0.88 0.01
9400 9783 direct fr 1 0.77 0.82
9538 9783 direct fr 1 0.76 0.30
9556 9783 direct fr 1 0.75 0.34
9580 9783 direct fr 1 0.72 0.59
9819 9941 direct fr 3 0.55 0.17
9825 9941 direct fr 3 0.55 0.29
9922 10077 direct fr 1 0.52 0.63
9985 10077 direct fr 1 0.92 ....
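Rows that share a stop codon are alternate starts of the same ORF: the shared end is the right end on the direct strand and the left end on the complement strand. A Python sketch that groups the rows this way and keeps the start with the highest start probability; rows printed with "...." are treated as having no start probability (an assumption):

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (left, right, strand, frame, avg_prob, start_prob); start_prob is None for "....".
Row = Tuple[int, int, str, int, float, Optional[float]]


def stop_key(row: Row) -> Tuple[str, int]:
    """Coordinate of the shared stop codon: right end (direct) or left end (complement)."""
    left, right, strand = row[0], row[1], row[2]
    return (strand, right) if strand == "direct" else (strand, left)


def best_start_per_stop(rows: List[Row]) -> Dict[Tuple[str, int], Row]:
    """For each stop codon, keep the alternate start with the highest start probability."""
    groups: Dict[Tuple[str, int], List[Row]] = defaultdict(list)
    for row in rows:
        groups[stop_key(row)].append(row)
    return {key: max(group, key=lambda r: -1.0 if r[5] is None else r[5])
            for key, group in groups.items()}


rows: List[Row] = [
    (850, 1128, "complement", 3, 0.64, 0.77),
    (850, 1059, "complement", 3, 0.61, 0.11),
    (1779, 3719, "direct", 3, 0.66, 0.10),
    (1815, 3719, "direct", 3, 0.68, 0.69),
    (2202, 3719, "direct", 3, 0.72, 0.58),
]
for key, row in best_start_per_stop(rows).items():
    print(key, row[0], row[1], row[5])
```

For these rows the rule picks 850..1128 and 1815..3719, matching GeneMark.hmm genes 3 and 6. It is only a reading aid, not the program's start-selection method: for the stop at 6332 the highest start probability (0.42) belongs to 5790, yet GeneMark.hmm reports 5523..6332.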
List of Regions of interest
(regions from stop to stop codon w/ a signal in between)
LEnd REnd Strand Frame
-------- -------- ----------- -----
77 637 complement fr 1
145 462 complement fr 3
850 1158 complement fr 3
1350 1583 direct fr 3
1565 1822 direct fr 2
1776 3719 direct fr 3
3085 3429 direct fr 1
3680 3910 direct fr 2
3877 4212 direct fr 1
4206 4643 direct fr 3
4606 5271 direct fr 1
5520 6332 direct fr 3
6333 6659 direct fr 3
6650 6781 direct fr 2
6772 7008 direct fr 1
7026 7172 direct fr 3
7073 7918 direct fr 2
7912 8925 direct fr 1
8871 9407 direct fr 3
9373 9783 direct fr 1
9732 9941 direct fr 3
9910 10077 direct fr 1
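The frame labels in the ORF and region tables are consistent with taking frame = ((left - 1) mod 3) + 1 on the direct strand and the same formula applied to the right end on the complement strand (for example 1359..1583 direct is fr 3 and 77..637 complement is fr 1); this rule is inferred from the rows, not documented. The Python sketch below uses it and also finds direct-strand stretches between in-frame stop codons. It does not test for the coding "signal in between", and the convention that a region starts at the first base after the previous stop is an assumption:

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def frame_of(left: int, right: int, strand: str) -> int:
    """Frame label (1-3) as inferred from the tables: from the left end on the
    direct strand, from the right end on the complement strand."""
    anchor = left if strand == "direct" else right
    return (anchor - 1) % 3 + 1


def stop_to_stop(seq: str, frame: int) -> List[Tuple[int, int]]:
    """Direct-strand stretches between consecutive in-frame stop codons.

    Codons start at 1-based positions p with (p - 1) % 3 + 1 == frame. Returns
    1-based (left, right) pairs: left is the first base after the previous stop
    (or the first codon of the frame), right is the last base of the next stop.
    """
    seq = seq.upper()
    first = frame - 1  # 0-based index of the first codon in this frame
    left = first
    regions = []
    for i in range(first, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            regions.append((left + 1, i + 3))
            left = i + 3
    return regions


print(frame_of(1359, 1583, "direct"), frame_of(77, 637, "complement"))  # 3 1
print(stop_to_stop("ATGAAATAAGGGTGA", 1))  # [(1, 9), (10, 15)]
```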
POSSIBLE SEQUENCE FRAMESHIFTS DETECTED
From frame  To frame  At base
----------  --------  -------
3 1 288 +/- 11 bp (complement)
3 1 3144 +/- 11 bp (direct)
1 3 3264 +/- 11 bp (direct)
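Each frameshift line pairs a change in the best-scoring frame with an approximate position. As an illustration only (not GeneMark's procedure), the Python sketch below flags the points where a hypothetical per-window "best frame" call changes between consecutive windows and reports the two window centres that bracket the change:

```python
from typing import List, Sequence, Tuple


def frame_switches(best_frame: Sequence[int],
                   centre: Sequence[int]) -> List[Tuple[int, int, int, int]]:
    """Return (from_frame, to_frame, last_centre, first_centre) wherever the
    best-scoring frame differs between consecutive windows."""
    out = []
    for i in range(1, len(best_frame)):
        if best_frame[i] != best_frame[i - 1]:
            out.append((best_frame[i - 1], best_frame[i], centre[i - 1], centre[i]))
    return out


# Hypothetical per-window calls, window centres 12 bp apart.
print(frame_switches([3, 3, 3, 1, 1], [100, 112, 124, 136, 148]))  # [(3, 1, 124, 136)]
```

On real data a single noisy window would trigger a report, so a practical detector would require the new frame to persist over several windows before calling a shift.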
>Translation: 77..550 (reverse), 158 amino acids
MTDFKQLLFRAGFMNFGKLDRRAAMEFLFINSERTLERWIAENKPCPRAVAMLKQRINGG
MALHKDWGGFYICRGGYLWTPRGKKYDASYINKLDFLQSSVRYNESHVNALQNQIDHLHD
LVAASETLKTIGNDLIKMSDSLALKEIVMKYGDKQRA*
>Translation: 850..1128 (reverse), 93 amino acids
MKITTSIELLDWFKSVVDIDSDYMVSKLTGIPKQTLSTVRTGNSEFSDYTALKLLLVGEH
PEPLKGMALLEAHKAERNGNEEQAKLWRKSVA*
>Translation: 1359..1583 (direct), 75 amino acids
MYIDATQYRNDDEFTQYAKGKVAQLRLMLNSKKSALQKDKELQQQAKAQESALAGEELRR
RALSLATQNRMVTL*
>Translation: 1580..1822 (direct), 81 amino acids
MSQRGISKGLLVCHSVRLAKVWVDQIEISIPLGVEPEPKQIEGISDLIDSLHSYDCSVCS
VVAHKLDDAMCRWCELLLDA*
>Translation: 1815..3719 (direct), 635 amino acids
MRSIAIGHEPIKGGLKPIELIRPVPFCSPAFGFDKAAEYAKEQLKKIPTTYLRRHAAKLY
AARFNSTTAKNPEKSANIFMRELVKRVDQIVTRSPLNITELQRDKRRKDKAKQLALICQQ
MGIVDFDKSMTLEQATALVISKYQKLAEFTINQIDTAPAYSTYEAFMKKGGNPEILADKL
EIAIRRMTCDKWWQRKLNRARDMTLEHLNITLGLVNKKKSPYASLQAVNEFKYAKKSQQK
WLDSMQLESDDGETTLDLAEVFKGSVSNPEIRRVELMVRIRGYEEYAQEQGMKAVFYTIT
APSKYHANSKKYNNATPKETQAFLVNQWAKARAELNKIDVPVFGVRVVEPHHDATPHWHM
LLFMLPEHEQVTTKALRGYAMQIDGDEKGAEQARFTAENIDPSKGSAVGYIAKYISKNIN
ANHIEGEKDNETGGEFNNENGLVLNVGAWASRWRIRQFQFVGGASVGVWREIRRAKPEML
DKSTDVLREIFSAADNSQFAQFINLMGGAFAKRSERPIQISRVADGLNEYGEEKKRVVGL
ESCSNVLKTRLMRFALKKRSDSDAPWSTENNCNHPANDWQIGAGDRLNPIAFIPKEIRNN
VLRGATYYEVDEQLKTITEFKVKNNQLTQESIGL*
>Translation: 3716..3910 (direct), 65 amino acids
MNRDIYTQIETVGVLEKRVEKAGTFELKAAAMALAKAQRKLSVLLAKGMAELEDRLKIAE
LRTK*
>Translation: 3907..4212 (direct), 102 amino acids
MNPTIAKITCPLCGNDEATVHRQKDRKKKLYYRCTGATFADGCGTIQCTGASGQAFISKN
MKPLNGVESEDAAIEAAEDAKAEQVKPNKKRSFLDFLVDDE*
>Translation: 4212..4643 (direct), 144 amino acids
MPAAKKQIEEKPEVEQDLGAPDFSDLLDDDEKTLIDSVVNDDDESDELTDDAIGMAVGEL
VGMGVMFLTDYLAERRGEHWNVSTKELKQLAKAVDGSVPDTELSPAWALVAVSVGMFAPR
VVVDIQLNKRKVIEVENDDKKAD*
>Translation: 4681..5271 (direct), 197 amino acids
VVGATGSGKSAFIRDQVDFKGARVLAWDVDEDYRLPRVRSIKQFEKLVKKSGFGAIRCAL
TVEPTEENFERFCQLVFAISHAGAPMVVIVEELADVARIGKASPHWGQLSRKGRKYGVQL
YVATQSPQEIDKTIVRQCNFKFCGALNSASAWRSMADNLDLSTREIKQLENIPKKQVQYW
LKDGTRPTEKKTLTFK*
>Translation: 5523..6332 (direct), 270 amino acids
MRSFLNLNSIPNVAAGNSCSIKLPIGQTYEVIDLRYSGVTPSQIKNVRVELDGRLLSTYK
TLNDLILENTRHKRKIKAGVVSFHFVRPEMKGVNVTDLVQQRMFALGTVGLTTCEIKFDI
DEAAAGPKLSAIAQKSVGTAPSWLTMRRNFFKQLNNGTTEIADLPRPVGYRIAAIHIKAA
GVDAVEFQIDGTKWRDLLKKADNDYILEQYGKAVLDNTYTIDFMLEGDVYQSVLLDQMIQ
DLRLKIDSTMDEQAEIIVEYMGVWSRNGF*
>Translation: 6345..6659 (direct), 105 amino acids
MNTSVPTSVPTNQSVWGNVSTGLDALISGWARVEQIKAAKASTGQGRVEQAMTPELDNGA
AVVVEAPKKAAQPSETLVFGVPQKTLLLGFGGLLVLGLVMRGNK*
>Translation: 6659..6781 (direct), 41 amino acids
MQKPSGKGLKYFAYGVAISAAGAILAEYVRDWMRKPKAKS*
>Translation: 6781..7008 (direct), 76 amino acids
MLGALMGVAGGAPMGGASPMGGMPSIASSSSAETGQQTQSGNFTGGGINFGSNNNNQLLI
VGAVVIGLFLVIKRK*
>Translation: 7079..7918 (direct), 280 amino acids
MGLFGGGNSKSTSNQTTNNENTNIATQGDNLGAVINGNGNSVTMTDHGLVDALVDIGGYM
SDSTQAAFGAASDMAYSSTEFAGQAITDGFDYAEGVNRDSLDMAEGINRDSLNFGRDALS
VTGDLMTDAMQYSSDAMLASIEGNAGLAGQVMDASTTMTGQSLNFGLDTFSGAMDSLNQS
NNNMALLAEFTSNQSTDLARDSMAFGADLMAQYQDNISASNYDAREHMLDASKTAMQFAD
NMSRSDGQQLAKDSNKTLMIGIVAVSAAVGLYAISKGVN*
>Translation: 7918..8925 (direct), 336 amino acids
MIVKKKLAAGEFAETFKNGNNITIIKAVGELVLRAYGADGGEGLRTIVRQGVSIKGMNYT
SVMLHTEYAQEIEYWVGDLDYSFQEQTTKSRDVNSFQIPLRDGVRELLPEDASRNRASIK
SPVDIWIGGENMTALNGIVDGGRKFEAGQEFQINTFGSVNYWVSDEEIRVFKEYSARAKY
AQNEGRTALEANNVPFFDIDVPPELDGVPFSLKARVRHKSKGVDGLGDYTSISVKPAFYI
TEGDETTDTLIKYTSYGSTGSHSGYDFDDNTLDVMVTLSAGVHRVFPVETELDYDAVQEV
QHDWYDESFTTFIEVYSDDPLLTVKGYAQILMERT*
>Translation: 8925..9407 (direct), 161 amino acids
MKKAHMFLATAAALGVAMFPTQINEAARGLRNNNPLNIKEGSDGGAQWEGEHELDLDPTF
EEFKTPVHGIRAGARILRTYAVKYGLESIEGIIARWAPEEENDTENYINFVANKTGIPRN
QKLNDETYPAVISAMIDMENGSNPYTYDEIKKGFEWGFYG*
>Translation: 9400..9783 (direct), 128 amino acids
MANFLTKNFVWILAAGVGVWFYQKADNAAKTATKPIADFLAELQFLVNGSNYVKFPNAGF
VLTRDALQDDFIAYDDRIKAWLGTHDRHKDFLAEILDHERRVKPVYRKLIGNIIDASTIR
AASGVEL*
>Translation: 9922..10077 (direct), 52 amino acids
MFSTLAKYLAVKLLTETFIKRVCLATAKHLANKSENTLDNELIDALEDALN*
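Each translation record is the codon-by-codon translation of a predicted gene, with the reverse entries read from the reverse complement and the stop shown as '*'. The 4681..5271 record begins with V, which suggests the start codon (here apparently GTG) is translated literally rather than as M. A Python sketch under those assumptions, using the standard genetic code; the bacterial code (NCBI table 11) assigns the same amino acids and differs only in which codons may act as starts. The function name and the "X" placeholder for unknown codons are illustrative choices:

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (first base slowest).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate_gene(genome: str, left: int, right: int, strand: str) -> str:
    """Translate genome[left..right] (1-based, inclusive) codon by codon.

    The start codon is translated literally (GTG -> V); stops appear as '*';
    codons containing non-ACGT symbols become 'X'.
    """
    dna = genome[left - 1:right].upper()
    if strand == "reverse":
        dna = dna.translate(COMPLEMENT)[::-1]  # reverse complement
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


print(translate_gene("ccATGGCTTAAcc", 3, 11, "direct"))  # MA*
print(translate_gene("TTAAGCCAT", 1, 9, "reverse"))      # MA*
```

The length of the returned string, stop included, is the "amino acids" count printed in each header.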