Heuristic Approach for Gene Prediction in Prokaryotes
Reference: Besemer J. and Borodovsky M., Heuristic approach to deriving models for gene finding, NAR, 1999, Vol. 27, No. 19, pp. 3911-3920.

The models used by GeneMark.hmm 2.0 and GeneMark 2.4 are derived from parameters measured on the input sequence itself, combined with knowledge gained from the study of various bacterial genomes. These models have been shown to accurately predict genes in bacterial, viral and plasmid DNA sequences. Please note that for sequences larger than 1 MB, output can be received only by email.

UPDATE (June 1, 2001): The web site has been redesigned and moved to a new, more powerful server.


Gene Prediction Results

Information on input sequence

Sequence title: Fri May  3 12:57:39 EDT 2002
Length:         10079 bp
G+C percentage: 42.17 %
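
The length and G+C percentage reported above can be recomputed from the raw sequence. The following is a minimal sketch, not the server's own code: the function name gc_percentage is hypothetical, and dividing by the number of A/C/G/T bases (ignoring ambiguous symbols such as N) is an assumption, since the page does not say how such symbols are counted.

```python
def gc_percentage(sequence: str) -> float:
    """Return G+C as a percentage of the A/C/G/T bases in the sequence."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


# Made-up 10-base example, not the submitted sequence.
example = "ATGCGCTTAA"
print(f"Length:         {len(example)} bp")
print(f"G+C percentage: {gc_percentage(example):.2f} %")
```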

Parse predicted by GeneMark.hmm 2.0

GeneMark.hmm PROKARYOTIC (Version 2.1)
Sequence file name: sequence,	RBS: N
Model file name: heuristic_no_rbs.mat
Model organism: Heuristic_model
Fri May  3 12:57:50 2002

Predicted genes
   Gene    Strand    LeftEnd    RightEnd       Gene     Class
    #                                         Length
    1        +          <2          43           42        1
    2        -          77         550          474        1
    3        -         850        1128          279        1
    4        +        1359        1583          225        1
    5        +        1580        1822          243        1
    6        +        1815        3719         1905        1
    7        +        3716        3910          195        1
    8        +        3907        4212          306        1
    9        +        4212        4643          432        1
   10        +        4681        5271          591        1
   11        +        5523        6332          810        1
   12        +        6345        6659          315        1
   13        +        6659        6781          123        1
   14        +        6781        7008          228        1
   15        +        7079        7918          840        1
   16        +        7918        8925         1008        1
   17        +        8925        9407          483        1
   18        +        9400        9783          384        1
   19        +        9922       10077          156        1
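
In the table above, every Gene Length equals RightEnd - LeftEnd + 1 and is a multiple of 3. A short sketch of how the rows could be parsed and checked follows. The Gene tuple, the regular expression, and the function names are illustrative assumptions and not part of GeneMark.hmm; a leading "<" (or ">") is read here as marking a gene that runs past the end of the submitted sequence.

```python
import re
from typing import NamedTuple


class Gene(NamedTuple):
    number: int
    strand: str
    left: int
    right: int
    length: int
    gene_class: int
    partial: bool  # True when a coordinate carries "<" or ">"


# One table row: gene number, strand, left end, right end, length, class.
ROW = re.compile(
    r"^\s*(\d+)\s+([+-])\s+(<?)(\d+)\s+(>?)(\d+)\s+(\d+)\s+(\d+)\s*$"
)


def parse_gene_rows(text: str) -> list[Gene]:
    genes = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, blank line, or anything that is not a data row
        num, strand, lt, left, gt, right, length, cls = m.groups()
        genes.append(
            Gene(int(num), strand, int(left), int(right),
                 int(length), int(cls), bool(lt or gt))
        )
    return genes


def length_mismatches(genes: list[Gene]) -> list[Gene]:
    """Rows whose reported length disagrees with their coordinates."""
    return [g for g in genes if g.right - g.left + 1 != g.length]


# Three rows copied from the table above.
sample = """\
    1        +          <2          43           42        1
    2        -          77         550          474        1
    6        +        1815        3719         1905        1
"""
genes = parse_gene_rows(sample)
print(len(genes), "rows parsed;", len(length_mismatches(genes)), "length mismatches")
```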

Listing of GeneMark Predictions

                              GENEMARK PREDICTIONS

Sequence: Fri May  3 12:57:39 EDT 2002
Sequence file: gm_sequence
Sequence length: 10079
GC Content:  42.17%
Window length: 96
Window step: 12
Threshold value: 0.500
---
Matrix: Heuristic model
Matrix author: MB/JDB
Matrix order: 2
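
The window length and step listed above describe a sliding-window scan. As a rough illustration only, the sketch below lists the 1-based left ends of all full-length windows, assuming the first window starts at base 1 and that windows which do not fit entirely inside the sequence are skipped. How GeneMark actually positions windows and treats the sequence ends is not stated on this page, so both points are assumptions, and window_starts is a hypothetical name.

```python
def window_starts(sequence_length: int, window: int = 96, step: int = 12) -> list[int]:
    """1-based left ends of every full window that fits in the sequence."""
    if window > sequence_length:
        return []
    return list(range(1, sequence_length - window + 2, step))


starts = window_starts(10079)  # sequence length reported above
last_left = starts[-1]
print(len(starts), "windows; last window spans", last_left, "to", last_left + 96 - 1)
```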

List of Open reading frames predicted as CDSs, shown with alternate starts
(regions from start to stop codon w/ coding function >0.50)

Left      Right     DNA         Coding Avg   Start
end       end       Strand      Frame  Prob  Prob
--------  --------  ----------  -----  ----  ----

      77       250  complement  fr 1   0.61  0.06  

     850      1128  complement  fr 3   0.64  0.77  
     850      1059  complement  fr 3   0.61  0.11  

    1359      1583  direct      fr 3   0.55  0.56  
    1440      1583  direct      fr 3   0.61  0.22  

    1779      3719  direct      fr 3   0.66  0.10  
    1782      3719  direct      fr 3   0.66  0.15  
    1800      3719  direct      fr 3   0.67  0.31  
    1815      3719  direct      fr 3   0.68  0.69  
    2052      3719  direct      fr 3   0.70  0.07  
    2175      3719  direct      fr 3   0.73  0.54  
    2202      3719  direct      fr 3   0.72  0.58  

    3716      3910  direct      fr 2   0.66  0.70  
    3749      3910  direct      fr 2   0.74  0.55  
    3809      3910  direct      fr 2   0.62  0.04  

    4087      4212  direct      fr 1   0.86  0.68  

    4212      4643  direct      fr 3   0.76  0.95  
    4374      4643  direct      fr 3   0.81  0.04  
    4398      4643  direct      fr 3   0.80  0.03  
    4407      4643  direct      fr 3   0.79  0.03  

    4681      5271  direct      fr 1   0.67  0.25  
    4684      5271  direct      fr 1   0.67  0.27  
    4750      5271  direct      fr 1   0.66  0.05  

    5523      6332  direct      fr 3   0.67  0.01  
    5613      6332  direct      fr 3   0.74  0.28  
    5661      6332  direct      fr 3   0.76  0.13  
    5778      6332  direct      fr 3   0.79  0.36  
    5790      6332  direct      fr 3   0.79  0.42  
    5799      6332  direct      fr 3   0.78  0.31  

    6345      6659  direct      fr 3   0.64  0.19  
    6387      6659  direct      fr 3   0.76  0.58  
    6498      6659  direct      fr 3   0.67  0.07  

    6781      7008  direct      fr 1   0.55  0.90  
    6796      7008  direct      fr 1   0.56  0.72  
    6820      7008  direct      fr 1   0.53  0.29  

    7079      7918  direct      fr 2   0.83  0.01  
    7202      7918  direct      fr 2   0.89  0.16  
    7208      7918  direct      fr 2   0.89  0.15  
    7256      7918  direct      fr 2   0.88  0.14  

    7918      8925  direct      fr 1   0.80  0.97  
    8059      8925  direct      fr 1   0.82  0.12  
    8086      8925  direct      fr 1   0.83  0.32  
    8104      8925  direct      fr 1   0.83  0.43  

    8925      9407  direct      fr 3   0.87  0.91  
    8940      9407  direct      fr 3   0.89  0.44  
    8970      9407  direct      fr 3   0.88  0.02  
    8976      9407  direct      fr 3   0.88  0.01  

    9400      9783  direct      fr 1   0.77  0.82  
    9538      9783  direct      fr 1   0.76  0.30  
    9556      9783  direct      fr 1   0.75  0.34  
    9580      9783  direct      fr 1   0.72  0.59  

    9819      9941  direct      fr 3   0.55  0.17  
    9825      9941  direct      fr 3   0.55  0.29  

    9922     10077  direct      fr 1   0.52  0.63  
    9985     10077  direct      fr 1   0.92  ....  
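
The list above groups alternate starts that share one stop codon: on the direct strand they share a right end, and on the complement strand they share a left end. One simple way to read such a group is to pick the row with the highest Start Prob. The sketch below does that; it is only an illustrative reading aid and not the procedure GeneMark.hmm uses to produce its parse. The function names are hypothetical, and a Start Prob printed as "...." is treated as missing.

```python
from collections import defaultdict


def parse_orf_rows(text: str) -> list[dict]:
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows: left, right, strand, "fr", frame, avg prob, start prob.
        if len(parts) != 7 or parts[3] != "fr" or not parts[0].isdigit():
            continue
        left, right, strand, _, frame, avg_prob, start_prob = parts
        rows.append({
            "left": int(left),
            "right": int(right),
            "strand": strand,
            "frame": int(frame),
            "avg_prob": float(avg_prob),
            "start_prob": None if set(start_prob) == {"."} else float(start_prob),
        })
    return rows


def best_starts(rows: list[dict]) -> dict:
    """Map (strand, stop coordinate) to the start with the highest Start Prob."""
    groups = defaultdict(list)
    for row in rows:
        stop = row["right"] if row["strand"] == "direct" else row["left"]
        groups[(row["strand"], stop)].append(row)
    best = {}
    for key, candidates in groups.items():
        top = max(
            candidates,
            key=lambda r: -1.0 if r["start_prob"] is None else r["start_prob"],
        )
        best[key] = top["left"] if top["strand"] == "direct" else top["right"]
    return best


# Four rows copied from the list above.
sample = """\
     850      1128  complement  fr 3   0.64  0.77
     850      1059  complement  fr 3   0.61  0.11
    1359      1583  direct      fr 3   0.55  0.56
    1440      1583  direct      fr 3   0.61  0.22
"""
rows = parse_orf_rows(sample)
for (strand, stop), start in best_starts(rows).items():
    print(strand, "stop", stop, "best start", start)
```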

List of Regions of interest
(regions from stop to stop codon w/ a signal in between)

   LEnd      REnd    Strand      Frame
 --------  --------  ----------- -----
       77       637  complement  fr 1
      145       462  complement  fr 3
      850      1158  complement  fr 3
     1350      1583  direct      fr 3
     1565      1822  direct      fr 2
     1776      3719  direct      fr 3
     3085      3429  direct      fr 1
     3680      3910  direct      fr 2
     3877      4212  direct      fr 1
     4206      4643  direct      fr 3
     4606      5271  direct      fr 1
     5520      6332  direct      fr 3
     6333      6659  direct      fr 3
     6650      6781  direct      fr 2
     6772      7008  direct      fr 1
     7026      7172  direct      fr 3
     7073      7918  direct      fr 2
     7912      8925  direct      fr 1
     8871      9407  direct      fr 3
     9373      9783  direct      fr 1
     9732      9941  direct      fr 3
     9910     10077  direct      fr 1

POSSIBLE SEQUENCE FRAMESHIFTS DETECTED
 From   To
 Frame  Frame  At base...
 -----  -----  ----------
   3      1          288 +/- 11 bp  (complement)
   3      1         3144 +/- 11 bp  (direct)
   1      3         3264 +/- 11 bp  (direct)


Protein translations of predicted genes

>Translation: 77..550 (reverse), 158 amino acids
MTDFKQLLFRAGFMNFGKLDRRAAMEFLFINSERTLERWIAENKPCPRAVAMLKQRINGG
MALHKDWGGFYICRGGYLWTPRGKKYDASYINKLDFLQSSVRYNESHVNALQNQIDHLHD
LVAASETLKTIGNDLIKMSDSLALKEIVMKYGDKQRA*

>Translation: 850..1128 (reverse), 93 amino acids
MKITTSIELLDWFKSVVDIDSDYMVSKLTGIPKQTLSTVRTGNSEFSDYTALKLLLVGEH
PEPLKGMALLEAHKAERNGNEEQAKLWRKSVA*

>Translation: 1359..1583 (direct), 75 amino acids
MYIDATQYRNDDEFTQYAKGKVAQLRLMLNSKKSALQKDKELQQQAKAQESALAGEELRR
RALSLATQNRMVTL*

>Translation: 1580..1822 (direct), 81 amino acids
MSQRGISKGLLVCHSVRLAKVWVDQIEISIPLGVEPEPKQIEGISDLIDSLHSYDCSVCS
VVAHKLDDAMCRWCELLLDA*

>Translation: 1815..3719 (direct), 635 amino acids
MRSIAIGHEPIKGGLKPIELIRPVPFCSPAFGFDKAAEYAKEQLKKIPTTYLRRHAAKLY
AARFNSTTAKNPEKSANIFMRELVKRVDQIVTRSPLNITELQRDKRRKDKAKQLALICQQ
MGIVDFDKSMTLEQATALVISKYQKLAEFTINQIDTAPAYSTYEAFMKKGGNPEILADKL
EIAIRRMTCDKWWQRKLNRARDMTLEHLNITLGLVNKKKSPYASLQAVNEFKYAKKSQQK
WLDSMQLESDDGETTLDLAEVFKGSVSNPEIRRVELMVRIRGYEEYAQEQGMKAVFYTIT
APSKYHANSKKYNNATPKETQAFLVNQWAKARAELNKIDVPVFGVRVVEPHHDATPHWHM
LLFMLPEHEQVTTKALRGYAMQIDGDEKGAEQARFTAENIDPSKGSAVGYIAKYISKNIN
ANHIEGEKDNETGGEFNNENGLVLNVGAWASRWRIRQFQFVGGASVGVWREIRRAKPEML
DKSTDVLREIFSAADNSQFAQFINLMGGAFAKRSERPIQISRVADGLNEYGEEKKRVVGL
ESCSNVLKTRLMRFALKKRSDSDAPWSTENNCNHPANDWQIGAGDRLNPIAFIPKEIRNN
VLRGATYYEVDEQLKTITEFKVKNNQLTQESIGL*

>Translation: 3716..3910 (direct), 65 amino acids
MNRDIYTQIETVGVLEKRVEKAGTFELKAAAMALAKAQRKLSVLLAKGMAELEDRLKIAE
LRTK*

>Translation: 3907..4212 (direct), 102 amino acids
MNPTIAKITCPLCGNDEATVHRQKDRKKKLYYRCTGATFADGCGTIQCTGASGQAFISKN
MKPLNGVESEDAAIEAAEDAKAEQVKPNKKRSFLDFLVDDE*

>Translation: 4212..4643 (direct), 144 amino acids
MPAAKKQIEEKPEVEQDLGAPDFSDLLDDDEKTLIDSVVNDDDESDELTDDAIGMAVGEL
VGMGVMFLTDYLAERRGEHWNVSTKELKQLAKAVDGSVPDTELSPAWALVAVSVGMFAPR
VVVDIQLNKRKVIEVENDDKKAD*

>Translation: 4681..5271 (direct), 197 amino acids
VVGATGSGKSAFIRDQVDFKGARVLAWDVDEDYRLPRVRSIKQFEKLVKKSGFGAIRCAL
TVEPTEENFERFCQLVFAISHAGAPMVVIVEELADVARIGKASPHWGQLSRKGRKYGVQL
YVATQSPQEIDKTIVRQCNFKFCGALNSASAWRSMADNLDLSTREIKQLENIPKKQVQYW
LKDGTRPTEKKTLTFK*

>Translation: 5523..6332 (direct), 270 amino acids
MRSFLNLNSIPNVAAGNSCSIKLPIGQTYEVIDLRYSGVTPSQIKNVRVELDGRLLSTYK
TLNDLILENTRHKRKIKAGVVSFHFVRPEMKGVNVTDLVQQRMFALGTVGLTTCEIKFDI
DEAAAGPKLSAIAQKSVGTAPSWLTMRRNFFKQLNNGTTEIADLPRPVGYRIAAIHIKAA
GVDAVEFQIDGTKWRDLLKKADNDYILEQYGKAVLDNTYTIDFMLEGDVYQSVLLDQMIQ
DLRLKIDSTMDEQAEIIVEYMGVWSRNGF*

>Translation: 6345..6659 (direct), 105 amino acids
MNTSVPTSVPTNQSVWGNVSTGLDALISGWARVEQIKAAKASTGQGRVEQAMTPELDNGA
AVVVEAPKKAAQPSETLVFGVPQKTLLLGFGGLLVLGLVMRGNK*

>Translation: 6659..6781 (direct), 41 amino acids
MQKPSGKGLKYFAYGVAISAAGAILAEYVRDWMRKPKAKS*

>Translation: 6781..7008 (direct), 76 amino acids
MLGALMGVAGGAPMGGASPMGGMPSIASSSSAETGQQTQSGNFTGGGINFGSNNNNQLLI
VGAVVIGLFLVIKRK*

>Translation: 7079..7918 (direct), 280 amino acids
MGLFGGGNSKSTSNQTTNNENTNIATQGDNLGAVINGNGNSVTMTDHGLVDALVDIGGYM
SDSTQAAFGAASDMAYSSTEFAGQAITDGFDYAEGVNRDSLDMAEGINRDSLNFGRDALS
VTGDLMTDAMQYSSDAMLASIEGNAGLAGQVMDASTTMTGQSLNFGLDTFSGAMDSLNQS
NNNMALLAEFTSNQSTDLARDSMAFGADLMAQYQDNISASNYDAREHMLDASKTAMQFAD
NMSRSDGQQLAKDSNKTLMIGIVAVSAAVGLYAISKGVN*

>Translation: 7918..8925 (direct), 336 amino acids
MIVKKKLAAGEFAETFKNGNNITIIKAVGELVLRAYGADGGEGLRTIVRQGVSIKGMNYT
SVMLHTEYAQEIEYWVGDLDYSFQEQTTKSRDVNSFQIPLRDGVRELLPEDASRNRASIK
SPVDIWIGGENMTALNGIVDGGRKFEAGQEFQINTFGSVNYWVSDEEIRVFKEYSARAKY
AQNEGRTALEANNVPFFDIDVPPELDGVPFSLKARVRHKSKGVDGLGDYTSISVKPAFYI
TEGDETTDTLIKYTSYGSTGSHSGYDFDDNTLDVMVTLSAGVHRVFPVETELDYDAVQEV
QHDWYDESFTTFIEVYSDDPLLTVKGYAQILMERT*

>Translation: 8925..9407 (direct), 161 amino acids
MKKAHMFLATAAALGVAMFPTQINEAARGLRNNNPLNIKEGSDGGAQWEGEHELDLDPTF
EEFKTPVHGIRAGARILRTYAVKYGLESIEGIIARWAPEEENDTENYINFVANKTGIPRN
QKLNDETYPAVISAMIDMENGSNPYTYDEIKKGFEWGFYG*

>Translation: 9400..9783 (direct), 128 amino acids
MANFLTKNFVWILAAGVGVWFYQKADNAAKTATKPIADFLAELQFLVNGSNYVKFPNAGF
VLTRDALQDDFIAYDDRIKAWLGTHDRHKDFLAEILDHERRVKPVYRKLIGNIIDASTIR
AASGVEL*

>Translation: 9922..10077 (direct), 52 amino acids
MFSTLAKYLAVKLLTETFIKRVCLATAKHLANKSENTLDNELIDALEDALN*
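
The translations above can be reproduced from the gene coordinates: take the bases from the left end to the right end, reverse-complement them for genes on the reverse strand, and read codons with the standard genetic code, printing stop codons as "*". The amino acid counts in the headers equal the gene length divided by 3, so they include the final stop codon. The following is a minimal sketch under those assumptions; translate_gene and reverse_complement are hypothetical names, the start codon is translated literally (as in the 4681..5271 entry, which begins with V), and unknown codons are written as X by choice.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate_gene(sequence: str, left: int, right: int, strand: str) -> str:
    """Translate the gene at 1-based inclusive coordinates left..right.

    strand is "+" for direct and "-" for reverse.
    """
    segment = sequence[left - 1:right].upper()
    if strand == "-":
        segment = reverse_complement(segment)
    codons = (segment[i:i + 3] for i in range(0, len(segment) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Made-up 13-base example with a tiny gene at positions 3..11.
toy = "CCATGAAATAGGG"
print(translate_gene(toy, 3, 11, "+"))
# The same gene seen on the opposite strand of the reverse-complemented toy.
print(translate_gene(reverse_complement(toy), 3, 11, "-"))
```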



Web pages maintained by GeneMark administrator, gte851w@prism.gatech.edu. Please send any suggestions for improvements or problems to the web page maintainer.