FragGeneScan Code
Brought to you by:
yuzhenye
File | Date | Commit |
---|---|---|
example | 2024-12-19 | [9a6de6] release 1.32 |
train | 2014-03-14 | [33ce27] Initial commit |
FragGeneScan | 2024-12-19 | [9a6de6] release 1.32 |
Makefile | 2014-08-07 | [2115a6] Thread support added. Fixed memory leaks |
README | 2024-12-19 | [9a6de6] release 1.32 |
hmm.h | 2022-08-01 | [a53ed3] Fixed clang compile and bad_alloc error; works ... |
hmm_lib.c | 2024-12-19 | [9a6de6] release 1.32 |
hmm_lib.o | 2024-12-19 | [9a6de6] release 1.32 |
run_FragGeneScan.pl | 2024-12-19 | [9a6de6] release 1.32 |
run_hmm.c | 2024-12-19 | [9a6de6] release 1.32 |
run_hmm.o | 2024-12-19 | [9a6de6] release 1.32 |
util_lib.c | 2022-08-01 | [a53ed3] Fixed clang compile and bad_alloc error; works ... |
util_lib.h | 2022-08-01 | [a53ed3] Fixed clang compile and bad_alloc error; works ... |
util_lib.o | 2024-12-19 | [9a6de6] release 1.32 |
Installation
=============

To install FragGeneScan, please follow the steps below:

1. Untar the downloaded file "FragGeneScan.tar.gz". This will automatically generate the directory "FragGeneScan". Replace FragGeneScan with the specific version that you downloaded.
   Most recent version: FragGeneScan1.32 (updated on Dec 19, 2024)

2. Make sure that you also have a C compiler such as "gcc".

3. Run "make" to compile and build the executable "FragGeneScan":

    make clean
    make fgs

A note before you run FragGeneScan
====================

You may call FragGeneScan directly instead of calling run_FragGeneScan.pl. All the additional functionality that this script provided in earlier releases is now included in the main program. run_FragGeneScan.pl is still included in this package just in case.

Running the program
====================

1. To run FragGeneScan,

    ./FragGeneScan -s [seq_file_name] -o [output_file_name] -w [1 or 0] -t [train_file_name] -p [num_thread]

   [seq_file_name]: sequence file name including the full path
   [output_file_name]: output file name including the full path
   [whole_genome] (the value given to -w):
       1 if the sequence file has complete genomic sequences, or contigs
       0 if the sequence file has short sequence reads
   [train_file_name]: file name that contains model parameters; this file should be in the "train" directory.
   Note that the following files containing model parameters already exist in the "train" directory:
       [complete] for complete genomic sequences or short sequence reads without sequencing error
       [sanger_5] for Sanger sequencing reads with about 0.5% error rate
       [sanger_10] for Sanger sequencing reads with about 1% error rate
       [454_5] for 454 pyrosequencing reads with about 0.5% error rate
       [454_10] for 454 pyrosequencing reads with about 1% error rate
       [454_30] for 454 pyrosequencing reads with about 3% error rate
       [illumina_5] for Illumina sequencing reads with about 0.5% error rate
       [illumina_10] for Illumina sequencing reads with about 1% error rate
   [num_thread]: number of threads used by FragGeneScan. Default 1.

   If you want to use the Perl wrapper, here is the command:

    ./run_FragGeneScan.pl -genome=[seq_file_name] -out=[output_file_name] -complete=[1 or 0] -train=[train_file_name] -thread=[num_thread]

2. To test FragGeneScan with a complete genomic sequence,

    ./FragGeneScan -s ./example/NC_000913.fna -o ./example/NC_000913-fgs -w 1 -t complete -p 1

   [NC_000913.fna]: this sequence file has the complete genomic sequence of E. coli
   (NCBI gene predictions for this genome are available under the same folder example/)

3. To test FragGeneScan with sequencing reads,

    ./FragGeneScan -s ./example/NC_000913-454.fna -o ./example/NC_000913-454-fgs -w 0 -t 454_10 -p 1

   [NC_000913-454.fna]: this sequence file has simulated reads (pyrosequencing, average length = 400 bp and sequencing error = 1%) generated using MetaSim

   For Illumina reads, please use illumina_5 or illumina_10 as the train model.

4. To test FragGeneScan with assembly contigs,

    ./FragGeneScan -s ./example/contigs.fna -o ./example/contigs-fgs -w 1 -t complete -p 1

   Note: -w 1 (i.e., complete=1) and -t complete (i.e., train=complete) are used as the parameters.

Output
============

Upon completion, FragGeneScan generates four files.

1. The first file is "[output_file_name].out", which lists the coordinates of putative genes. This file consists of five columns (start position, end position, strand, frame, score).
   For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome
108     440     -       3       1.378688
337     2799    +       1       1.303498
2801    3733    +       2       1.317386
3734    5020    +       2       1.293573
5234    5530    +       2       1.354725
5683    6459    -       1       1.290816
6529    7959    -       1       1.326412
8238    9191    +       3       1.286832
9306    9893    +       3       1.317067

2. The second file is "[output_file_name].ffn", which lists nucleotide sequences corresponding to the putative genes in "[output_file_name].out".

   For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 end=338 strand=-
GTTGTTACCTCGTTACCTTTGGTCGAAAAAAAAAGCCCGCACTGTCAGGTGCGGGCTTTTTTCTGTGTTTCCTGTACGCGTCAGCCCGCACCGTTACCTG
TGGTAATGGTGATGGTGGTGGTAATGGTGGTGCTAATGCGTTTCATGGATGTTGTGTACTCTGTAATTTTTATCTGTCTGTGCGCTATGCCTATATTGGT
TAAAGTATTTAGTGACCTAAGTCAA
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 end=2799 strand=+
TTGAAGTTCGGCGGTACATCAGTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAGGGGCAGGTGGCCACCGTCC
TCTCTGCCCCCGCCAAAATCACCAACCACCTGGTGGCGATGATTGAAAAAACCATTAGCGGCCAGGATGCTTTACCCAATATCAGCGATGCCGAACGTAT
TTTTGCCGAACTTTTGACGGGACTCGCCGCCGCCCAGCCGGGGTTCCCGCTGGCGCAATTGAAAACTTTCGTCGATCAGGAATTTGCCCAAATAAAACAT
GTCCTGCATGGCATTAGTTTGTTGGGGCAGTGCCCGGATAGCATCAACGCTGCGCTGATTTGCCGTGGCGAGAAAATGTCGATCGCCATTATGGCCGGCG

3. The third file is "[output_file_name].faa", which lists amino acid sequences corresponding to the putative genes in "[output_file_name].out".

   For example,

>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=108 end=338 strand=-
VVTSLPLVEKKSPHCQVRAFFCVSCTRQPAPLPVVMVMVVVMVVLMRFMDVVYSVIFICLCAMPILVKVFSDLSQ
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=343 end=2799 strand=+
LKFGGTSVANAERFLRVADILESNARQGQVATVLSAPAKITNHLVAMIEKTISGQDALPNISDAERIFAELLTGLAAAQPGFPLAQLKTFVDQEFAQIKH
VLHGISLLGQCPDSINAALICRGEKMSIAIMAGVLEARGHNVTVIDPVEKLLAVGHYLESTVDIAESTRRIAASRIPADHMVLMAGFTAGNEKGELVVLG
RNGSDYSAAVLAACLRADCCEIWTDVDGVYTCDPRQVPDARLLKSMSYQEAMELSYFGAKVLHPRTITPIAQFQIPCLIKNTGNPQAPGTLIGASRDEDE
LPVKGISNLNNMAMFSVSGPGMKGMVGMAARVFAAMSRARISVVLITQSSSEYSISFCVPQSDCVRAERAMQEEFYLELKEGLLEPLAVTERLAIISVVG
DGMRTLRGISAKFFAALARANINIVAIAQGSSERSISVVVNNDDATTGVRVTHQMLFNTDQVIEVFVIGVGGVGGALLEQLKRQQSWLKNKHIDLRVCGV
ANSKALLTNVHGLNLENWQEELAQAKEPFNLGRLIRLVKEYHLLNPVIVDCTSSQAVADQYADFLREGFHVVTPNKKANTSSMDYYHQLRYAAEKSRRKF
LYDTNVGAGLPVIENLQNLLNAGDELMKFSGILSGSLSYIFGKLDEGMSFSEATTLAREMGYTEPDPRDDLSGMDVARKLLILARETGRELELADIEIEP
VLPAEFNAEGDVAAFMANLSQLDDLFAARVAKARDEGKVLRYVGNIDEDGVCRVKIAEVDGNDPLFKVKNGENALAFYSHYYQPLPLVLRGYGAGNDVTA
AGVFADLLRTLSWKLGV
>gi|49175990|ref|NC_000913.2| Escherichia coli str. K-12 substr. MG1655, complete genome start=2801 end=3733 strand=+
VKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEPRENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSAC
SVVAALMAMNEHCGKPLNDTRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGIKVSTAEARAILPAQYRRQDCI
AHGRHLAGFIHACYSRQPELAAKLMKDVIAEPYRERLLPGFRQARQAVAEIGAVASGISGSGPTLFALCDKPETAQRVADWLGKNYLQNQEGFVHICRLD
TAGARVLEN

4. The fourth file is "[output_file_name].gff", which lists the gene prediction results in GFF format.

License
============

Copyright (C) 2010 Mina Rho, Yuzhen Ye and Haixu Tang.
You may redistribute this software under the terms of the GNU General Public License.
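
Appendix: reading the ".out" file
====================

The five-column layout described under Output (start position, end position, strand, frame, score) can be read with a few lines of Python. This is only a minimal sketch and is not part of the FragGeneScan package. The names `Gene` and `parse_fgs_out` are hypothetical, and the code assumes that columns are separated by whitespace and that each sequence begins with a ">" header line, as in the example above.

```python
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    """One predicted gene (hypothetical record type for this sketch)."""
    start: int
    end: int
    strand: str
    frame: int
    score: float


def parse_fgs_out(text: str) -> Dict[str, List[Gene]]:
    """Group predicted genes by the sequence header they appear under.

    Assumes whitespace-separated columns in the order
    start, end, strand, frame, score. Any extra columns are ignored.
    """
    genes: Dict[str, List[Gene]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: everything after ">" names the sequence.
            current = line[1:]
            genes.setdefault(current, [])
            continue
        fields = line.split()
        if current is None or len(fields) < 5:
            # Skip lines that do not match the assumed layout.
            continue
        start, end, strand, frame, score = fields[:5]
        genes[current].append(
            Gene(int(start), int(end), strand, int(frame), float(score))
        )
    return genes


if __name__ == "__main__":
    sample = ">seq1\n108 440 - 3 1.378688\n337 2799 + 1 1.303498\n"
    for header, predicted in parse_fgs_out(sample).items():
        print(header, len(predicted), "genes")
```

To use it on real output, pass the contents of "[output_file_name].out" to `parse_fgs_out`. Reading only the first five fields keeps the sketch tolerant of additional columns.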