Menu

Tree [995a78] master /
 History

HTTPS access


File Date Author Commit
 example 2024-12-19 yye yye [9a6de6] release 1.32
 train 2014-03-14 Wazim MohammedIsmail Wazim MohammedIsmail [33ce27] Initial commit
 FragGeneScan 2024-12-19 yye yye [9a6de6] release 1.32
 Makefile 2014-08-07 Wazim Mohammed Ismail Wazim Mohammed Ismail [2115a6] Thread support added. Fixed memory leaks
 README 2024-12-19 yye yye [9a6de6] release 1.32
 hmm.h 2022-08-01 Jeffrey Barrick Jeffrey Barrick [a53ed3] Fixed clang compile and bad_alloc error; works ...
 hmm_lib.c 2024-12-19 yye yye [9a6de6] release 1.32
 hmm_lib.o 2024-12-19 yye yye [9a6de6] release 1.32
 run_FragGeneScan.pl 2024-12-19 yye yye [9a6de6] release 1.32
 run_hmm.c 2024-12-19 yye yye [9a6de6] release 1.32
 run_hmm.o 2024-12-19 yye yye [9a6de6] release 1.32
 util_lib.c 2022-08-01 Jeffrey Barrick Jeffrey Barrick [a53ed3] Fixed clang compile and bad_alloc error; works ...
 util_lib.h 2022-08-01 Jeffrey Barrick Jeffrey Barrick [a53ed3] Fixed clang compile and bad_alloc error; works ...
 util_lib.o 2024-12-19 yye yye [9a6de6] release 1.32

Read Me

Installation
=============
To install FragGeneScan, please follow the steps as below:

1. Untar the downloaded file "FragGeneScan.tar.gz". This will automatically generate the directory "FragGeneScan".
   Replace FragGeneScan with specific version that you downloaded.
   Most recent version: FragGeneScan1.32 (updated on Dec 19, 2024)

2. Make sure that you also have a C compiler such as "gcc"

3. Run "makefile" to compile and build excutable "FragGeneScan"
	make clean
	make fgs

A note before you run FragGeneScan
====================
   You may directly call FragGeneScan instead of calling run_FragGeneScan.pl. 
   All the additional functionalities included in this script in earlier releases are now included in the main function.
   run_FragGeneSan.pl is still included in this package just in case.


Running the program
====================

1.  To run FragGeneScan, 

./FragGeneScan -s [seq_file_name] -o [output_file_name] -w [1 or 0] -t [train_file_name] -p [num_thread]


[seq_file_name]: sequence file name including the full path
[output_file_name]: output file name including the full path
[whole_genome]: 1 if the sequence file has complete genomic sequences, or contigs
		0 if the sequence file has short sequence reads
[train_file_name]: file name that contains model parameters; this file should be in the "train" directory. 
		   Note that four files containing model parameters already exist in the "train" directory. 
		   [complete] for complete genomic sequences or short sequence reads without sequencing error
		   [sanger_5] for Sanger sequencing reads with about 0.5% error rate
		   [sanger_10] for Sanger sequencing reads with about 1% error rate
		   [454_5] for 454 pyrosequencing reads with about 0.5% error rate
		   [454_10] for 454 pyrosequencing reads with about 1% error rate
		   [454_30] for 454 pyrosequencing reads with about 3% error rate
		   [illumina_5] for Illumina sequencing reads with about 0.5% error rate
		   [illumina_10] for Illumina sequencing reads with about 1% error rate
[num_thread]: number of thread used in FragGeneScan. Default 1.

If you want to use the perl wrapper, here is the command:
 ./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name] -complete=[1 or 0] -train=[train_file_name] -thread=[num_thread]

2. To test FragGeneScan with a complete genomic sequence,

./FragGeneScan -s ./example/NC_000913.fna -o ./example/NC_000913-fgs -w 1 -t complete -p 1

[NC_000913.fna]: this sequence file has the complete genomic sequence of E.coli
(NCBI gene predictions for this genome are available under the same folder example/)


3. To test FragGeneScan with sequencing reads,

./FragGeneScan -s ./example/NC_000913-454.fna -o ./example/NC_000913-454-fgs -w 0 -t 454_10 -p 1

[NC_000913-454.fna]: this sequence file has simulated reads (pyrosequencing, average length = 400 bp and sequencing error = 1%) generated using Metasim

For illumina reads, please use illumina_5 or illumina_10 as the train model.


4. To test FragGeneScan with assembly contigs,
./FragGeneScan -s ./example/contigs.fna -o ./example/contigs-fgs -w 1 -t complete -p 1

Note: -w 1 (i.e., complete=1) & -t complete (i.e., train=complete) are used as the parameters.

Output
============
Upon completion, FragGeneScan generates four files. 

1. The first file is "[output_file_name].out", which lists the coordinates of putative genes. This file consists of five columns (start position, end position, strand, frame, score).  For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome
108     440     -       3       1.378688
337     2799    +       1       1.303498
2801    3733    +       2       1.317386
3734    5020    +       2       1.293573
5234    5530    +       2       1.354725
5683    6459    -       1       1.290816
6529    7959    -       1       1.326412
8238    9191    +       3       1.286832
9306    9893    +       3       1.317067


2. The second file is  '[output_file_name].ffn", which lists nucleotide sequences corresponding to the putative genes in "[output_file_name].out". For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e
nd=338 strand=-
GTTGTTACCTCGTTACCTTTGGTCGAAAAAAAAAGCCCGCACTGTCAGGTGCGGGCTTTTTTCTGTGTTTCCTGTACGCGTCAGCCCGCACCGTTACCTG
TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTTCATGGATGTTGTGTACTCTGTAATTTTTATCTGTCTGTGCGCTATGCCTATATTGGT
TAAAGTATTTAGTGACCTAAGTCAA
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e
nd=2799 strand=+
TTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCC
TCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTAT
TTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACAT
GTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCG


3. The third file is '[output_file_name].faa", which lists amino acid sequences corresponding to the putative genes in "[output_file_name].out". For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e
nd=338 strand=-
VVTSLPLVEKKSPHCQVRAFFCVSCTRQPAPLPVVMVMVVVMVVLMRFMDVVYSVIFICLCAMPILVKVFSDLSQ
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e
nd=2799 strand=+
LKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKH
VLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLG
RNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRDEDE
LPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVG
DGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGV
ANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKF
LYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEP
VLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTA
AGVFADLLRTLSWKLGV
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=2801
end=3733 strand=+
VKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSAC
SVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCI
AHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRLD
TAGARVLEN

4. [output_file_name].gff gene prediction results in gff format.


License
============
Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang.
You may redistribute this software under the terms of the GNU General Public License.