Menu

Tree [995a78] master /
 History

HTTPS access


File Date Author Commit
 example 2024-12-19 yye yye [9a6de6] release 1.32
 train 2014-03-14 Wazim MohammedIsmail Wazim MohammedIsmail [33ce27] Initial commit
 FragGeneScan 2024-12-19 yye yye [9a6de6] release 1.32
 Makefile 2014-08-07 Wazim Mohammed Ismail Wazim Mohammed Ismail [2115a6] Thread support added. Fixed memory leaks
 README 2024-12-19 yye yye [9a6de6] release 1.32
 hmm.h 2022-08-01 Jeffrey Barrick Jeffrey Barrick [a53ed3] Fixed clang compile and bad_alloc error; works ...
 hmm_lib.c 2024-12-19 yye yye [9a6de6] release 1.32
 hmm_lib.o 2024-12-19 yye yye [9a6de6] release 1.32
 run_FragGeneScan.pl 2024-12-19 yye yye [9a6de6] release 1.32
 run_hmm.c 2024-12-19 yye yye [9a6de6] release 1.32
 run_hmm.o 2024-12-19 yye yye [9a6de6] release 1.32
 util_lib.c 2022-08-01 Jeffrey Barrick Jeffrey Barrick [a53ed3] Fixed clang compile and bad_alloc error; works ...
 util_lib.h 2022-08-01 Jeffrey Barrick Jeffrey Barrick [a53ed3] Fixed clang compile and bad_alloc error; works ...
 util_lib.o 2024-12-19 yye yye [9a6de6] release 1.32

Read Me

Installation
=============
To install FragGeneScan, please follow the steps as below:

1. Untar the downloaded file "FragGeneScan.tar.gz". This will automatically generate the directory "FragGeneScan".
   Replace FragGeneScan with specific version that you downloaded.
   Most recent version: FragGeneScan1.32 (updated on Dec 19, 2024)

2. Make sure that you also have a C compiler such as "gcc"

3. Run "makefile" to compile and build excutable "FragGeneScan"
	make clean
	make fgs

A note before you run FragGeneScan
====================
   You may directly call FragGeneScan instead of calling run_FragGeneScan.pl. 
   All the additional functionalities included in this script in earlier releases are now included in the main function.
   run_FragGeneSan.pl is still included in this package just in case.


Running the program
====================

1.  To run FragGeneScan, 

./FragGeneScan -s [seq_file_name] -o [output_file_name] -w [1 or 0] -t [train_file_name] -p [num_thread]


[seq_file_name]: sequence file name including the full path
[output_file_name]: output file name including the full path
[whole_genome]: 1 if the sequence file has complete genomic sequences, or contigs
		0 if the sequence file has short sequence reads
[train_file_name]: file name that contains model parameters; this file should be in the "train" directory. 
		   Note that four files containing model parameters already exist in the "train" directory. 
		   [complete] for complete genomic sequences or short sequence reads without sequencing error
		   [sanger_5] for Sanger sequencing reads with about 0.5% error rate
		   [sanger_10] for Sanger sequencing reads with about 1% error rate
		   [454_5] for 454 pyrosequencing reads with about 0.5% error rate
		   [454_10] for 454 pyrosequencing reads with about 1% error rate
		   [454_30] for 454 pyrosequencing reads with about 3% error rate
		   [illumina_5] for Illumina sequencing reads with about 0.5% error rate
		   [illumina_10] for Illumina sequencing reads with about 1% error rate
[num_thread]: number of thread used in FragGeneScan. Default 1.

If you want to use the perl wrapper, here is the command:
 ./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name] -complete=[1 or 0] -train=[train_file_name] -thread=[num_thread]

2. To test FragGeneScan with a complete genomic sequence,

./FragGeneScan -s ./example/NC_000913.fna -o ./example/NC_000913-fgs -w 1 -t complete -p 1

[NC_000913.fna]: this sequence file has the complete genomic sequence of E.coli
(NCBI gene predictions for this genome are available under the same folder example/)


3. To test FragGeneScan with sequencing reads,

./FragGeneScan -s ./example/NC_000913-454.fna -o ./example/NC_000913-454-fgs -w 0 -t 454_10 -p 1

[NC_000913-454.fna]: this sequence file has simulated reads (pyrosequencing, average length = 400 bp and sequencing error = 1%) generated using Metasim

For illumina reads, please use illumina_5 or illumina_10 as the train model.


4. To test FragGeneScan with assembly contigs,
./FragGeneScan -s ./example/contigs.fna -o ./example/contigs-fgs -w 1 -t complete -p 1

Note: -w 1 (i.e., complete=1) & -t complete (i.e., train=complete) are used as the parameters.

Output
============
Upon completion, FragGeneScan generates four files. 

1. The first file is "[output_file_name].out", which lists the coordinates of putative genes. This file consists of five columns (start position, end position, strand, frame, score).  For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome
108     440     -       3       1.378688
337     2799    +       1       1.303498
2801    3733    +       2       1.317386
3734    5020    +       2       1.293573
5234    5530    +       2       1.354725
5683    6459    -       1       1.290816
6529    7959    -       1       1.326412
8238    9191    +       3       1.286832
9306    9893    +       3       1.317067


2. The second file is  '[output_file_name].ffn", which lists nucleotide sequences corresponding to the putative genes in "[output_file_name].out". For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e
nd=338 strand=-
GTTGTTACCTCGTTACCTTTGGTCGAAAAAAAAAGCCCGCACTGTCAGGTGCGGGCTTTTTTCTGTGTTTCCTGTACGCGTCAGCCCGCACCGTTACCTG
TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTTCATGGATGTTGTGTACTCTGTAATTTTTATCTGTCTGTGCGCTATGCCTATATTGGT
TAAAGTATTTAGTGACCTAAGTCAA
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e
nd=2799 strand=+
TTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCC
TCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTAT
TTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACAT
GTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCG


3. The third file is '[output_file_name].faa", which lists amino acid sequences corresponding to the putative genes in "[output_file_name].out". For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 e
nd=338 strand=-
VVTSLPLVEKKSPHCQVRAFFCVSCTRQPAPLPVVMVMVVVMVVLMRFMDVVYSVIFICLCAMPILVKVFSDLSQ
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 e
nd=2799 strand=+
LKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKH
VLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLG
RNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRDEDE
LPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVG
DGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGV
ANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKF
LYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEP
VLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTA
AGVFADLLRTLSWKLGV
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=2801
end=3733 strand=+
VKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSAC
SVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCI
AHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRLD
TAGARVLEN

4. [output_file_name].gff gene prediction results in gff format.


License
============
Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang.
You may redistribute this software under the terms of the GNU General Public License.
Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.