Name   Modified   Size   Info   Downloads / Week
release 1 2015-03-13
clip_adapter.pl 2015-04-20 15.8 kB
discard_redundant_sequences.pl 2015-04-20 3.3 kB
FASTQ_to_FASTA.pl 2015-04-20 3.1 kB
filter_simple_repeats.pl 2015-04-20 6.8 kB
length_cutoff.pl 2015-04-20 4.7 kB
map_sequences.pl 2015-04-20 5.1 kB
merge_FASTA.pl 2015-04-20 3.0 kB
q_analyzer.pl 2015-04-20 8.1 kB
q_filter.pl 2015-04-20 7.7 kB
readme.txt 2015-04-20 3.1 kB
reformat.pl 2015-04-20 361 Bytes
remove_TAGs.pl 2015-04-20 7.8 kB
reverse_complement.pl 2015-04-20 3.9 kB
sort_by_TAGs.pl 2015-04-20 6.6 kB
split_FASTA.pl 2015-04-20 4.5 kB
basic_analyses.pl 2015-04-20 5.1 kB
Totals: 17 Items   89.0 kB 0
            - NGS tools for the novice -

This toolbox comprises simple and handy Perl scripts
for processing next-generation sequencing (NGS) data.
The Perl scripts are command-line based and thus well
suited for automated sequence analysis pipelines.


NGS tools for the novice is provided by David Rosenkranz,
Institute of Anthropology, Johannes Gutenberg University
Mainz, Germany.

Author contact: rosenkrd@uni-mainz.de


The complete toolbox is packed in NGS-toolbox.zip.

List of tools (21.02.2012):

- basic_analyses.pl
  Counts the number of sequences, shows length distribution and
  calculates overall base composition

- discard_redundant_sequences.pl
  Discards redundant sequences from the dataset. FASTA titles
  will state the abundance of each sequence.

- FASTQ_to_FASTA.pl
  Converts sequence files from FASTQ to FASTA format

- filter_simple_repeats.pl
  Filters sequences that contain or consist solely of stretches
  of simple repeats (homo- and/or dipolymeric stretches).

- length_cutoff.pl
  Applies a user-defined length cutoff. Sequences will be sorted
  into three output files (shorter than the minimum length, longer
  than the maximum length, and within the minimum-maximum range).

- map_sequences.pl
  Maps sequences to an arbitrary number of reference sequences from
  one or several files.

- merge_FASTA.pl
  Concatenates an arbitrary number of FASTA files.

- q_filter.pl
  Filters sequence reads based on Phred quality scores. Several
  options for the filtering process are available. Low quality
  ends of sequence reads (indicated by B for Illumina 1.5+ or #
  for Illumina 1.8+) can be clipped prior to the filtering process.

- q_analyzer.pl
  Analyzes FASTQ files (Illumina or Sanger format) based on Phred
  quality scores. Outputs helpful statistics like average overall
  read accuracy and average positional Phred score.

- remove_TAGs.pl
  Removes TAG sequences from input files. Several options for removal
  (e.g. only the TAG, everything preceding the TAG but not the TAG
  itself, etc.) are available. Sequences will automatically be sorted
  by TAG.

- reverse_complement.pl
  Manipulates sequences and makes them reverse, complementary or
  reverse complementary.

- sort_by_TAGs.pl
  Sorts sequences by TAG without removing the TAG. Several options
  for TAG tracing are available (sequence has to start/end with TAG
  etc.).

- split_FASTA.pl
  Splits a FASTA file into several output files. The user can set a
  maximum number of sequences per output file or determine a fixed
  number of output files per input file.
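
To illustrate what the format conversion and the reverse complement
operation listed above amount to, here is a minimal Python sketch. It
is not the code of FASTQ_to_FASTA.pl or reverse_complement.pl. The
function names are hypothetical, and the sketch assumes four-line
FASTQ records (no wrapped sequence lines) and the bases A, C, G, T, N.

```python
def fastq_to_fasta(fastq_text):
    """Convert four-line FASTQ records to FASTA text."""
    lines = fastq_text.strip().splitlines()
    fasta = []
    for i in range(0, len(lines), 4):
        title, sequence = lines[i], lines[i + 1]
        fasta.append(">" + title[1:])  # replace the leading '@' with '>'
        fasta.append(sequence)         # '+' line and quality line are dropped
    return "\n".join(fasta) + "\n"


# Translation table for complementing bases (assumption: A, C, G, T, N only).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(sequence):
    """Complement every base, then reverse the string."""
    return sequence.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(fastq_to_fasta("@read1\nAACG\n+\nIIII\n"), end="")
    print(reverse_complement("AACG"))
```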

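The Phred-based tools listed above rely on the standard relation
between a Phred score Q and the error probability P = 10^(-Q/10).
The following Python sketch shows how quality characters can be
decoded, how an average read accuracy can be derived, and how a
low-quality tail marked by # or B can be clipped. It is an
illustration under stated assumptions, not the code of q_filter.pl or
q_analyzer.pl. The function names are hypothetical. The ASCII offsets
are 33 for Sanger and Illumina 1.8+ and 64 for Illumina 1.5+.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def mean_read_accuracy(quality_string, offset=33):
    """Average per-base accuracy (1 - error probability) of one read."""
    scores = phred_scores(quality_string, offset)
    return sum(1 - error_probability(q) for q in scores) / len(scores)


def clip_low_quality_tail(sequence, quality_string, marker="#"):
    """Remove the trailing run of marker characters and the matching bases.

    Use marker="#" for Illumina 1.8+ and marker="B" for Illumina 1.5+.
    """
    keep = len(quality_string.rstrip(marker))
    return sequence[:keep], quality_string[:keep]


if __name__ == "__main__":
    print(phred_scores("II#"))
    print(mean_read_accuracy("IIII"))
    print(clip_low_quality_tail("ACGTAC", "IIII##"))
```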

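Collapsing redundant sequences, as described for
discard_redundant_sequences.pl above, can be sketched in Python as
follows. This is not the script's own code. The function name is
hypothetical, and the title format (the bare read count after ">") is
an assumption made for illustration; the readme only says that titles
refer to the sequence abundance.

```python
from collections import Counter


def collapse_redundant(sequences):
    """Return FASTA text with one record per distinct sequence.

    Each title holds the number of times the sequence occurred
    (assumed title format). Records appear in first-seen order.
    """
    counts = Counter(sequences)
    return "".join(f">{n}\n{seq}\n" for seq, n in counts.items())


if __name__ == "__main__":
    reads = ["ACGT", "TTTT", "ACGT"]
    print(collapse_redundant(reads), end="")
```
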
Brief instructions for each Perl script are embedded within the script.
You can also browse the local Wiki or visit the project homepage at:

http://www.uni-mainz.de/FB/Biologie/Anthropologie/472_ENG_HTML.php



IMPORTANT NOTE / DISCLAIMER:
It is strongly recommended to work in a separate folder. Create
backup copies of all your datasets in a separate folder. Files may
be overwritten without asking the user for confirmation!
We assume no liability for loss of data or for the correctness of
results.
Source: readme.txt, updated 2015-04-20