Hello, I am using a compute node with 512GB memory allowed. My mpileup files are ~300GB each (normal and tumor). Is it possible to run VarScan somatic caller such that we don't store all data on the node and instead write it to a tmp file or tmp dir? I am not being able to complete VarScan somatic calling currently. Thanks!
Hello, i am trying to compare 2 SNP tables generated through varScan using java -jar VarScan.jar compare ${folder}/X.varScan.snp ${folder}/Y.varScan.snp unique1 ${folder}/X_unique.varScan.snp But i keep getting the warning "Warning: Unable to parse chrom/position from chrom position" I have tried everything i can think of to fix it but nothing works. It tells me how many positions are unique or shared but doesnt write this into an output file (output generated is empty) Thanks!
Thank you for your question. To me, it looks like you have your commands backwards: You run samtools mpileup FIRST, and pipe the output of that into VarScan. Also, keep in mind that you don't need to run two separate mpileup commands to generate the two-sample mpileup VarScan needs. You can give Samtools mpileup two BAM files. Also, a minor point, always provide an argument to VarScan option flags (e.g. use --mpileup 1). Here's my recommended restructuring of your command: samtools mpileup -B -q...
I want to run samtools and pipe the mpileup output into a VarScan call using the following conditions: 30 mapping score 30 phred quality score output-snp format min coverage = 50 min tumor coverage = 3 p-value = 0.01 min variant frequency = 4% Process substitution: java -jar VarScan.v2.3.9.jar somatic <(samtools mpileup -B -q 30 -Q 30 normal_sor t.bam) <(samtools mpileup -B -q 30 -Q 30 tumor_sort.bam) output -mpileup --strand-filter 1 --output-snp --min-coverage 50 --min-coverage-tumor 3 --p-value...
Hello .. I am trying to analyze exome copy number variation by GISTIC2, which requires ref arm size file and have downloaded the cytoband.txt.gx file for hg38 from UCSC. After removing of all unnecessary lines such as having alt or Chr mt or ChrUN etc I am trying to summarize using R and fail to do so. Kindly help me in creating my ref-arm-size file. Thank You.
The docs recommend piping samtools mpileup into mpileup2cns. However, it is possible for a empty or nearly empty bam file to produce no mpileup. In this case mpileup2cns will eventually time out, or hang indefinitely. It seems to be safe to use an intermediate file samtools mpileup >mpileup.tsv followed by java -jar varscan.jar mpileup2cns mpileup.tsv
Hello, and thanks for asking. The short answer is no. The VarScan readcounts command has been around forever, and it does facilitate some specialized analyses. However, it does not replace the bam-readcount functionality required for false positive filtering (though this is high on the priority list for the next release).
I have the same confusion, did you get the answer ?
Hi, I'm using VarScan to variant call metagenomics samples. However I'm not sure how to interpret the ambiguous calls in my context. What does it mean when the ouput is: A->R with 20% alt frequency for example? My understanding is that, in this case, 20% of the reads have the ambiguous call but is VarScan saying that 50% of those are ref (A->A) and 50% are alt (A->G)? or that VarScan cannot 'decide' if these 20% of the reads have A or G (i.e. could be up to 100% alt)? I could not find the answer...
Hi, Dan, I'm new to bioinformatics. Could you explain in detail how to prepare BED-like var file for bamreadcount. As suggested for SNV that I use print $1,$2,$2, but for indels I didn’t understand exactly how to proceed. I believe this is the cause of my problem.
Hi, Vanessa, I guess you need to be more specific. I've included below a general rule for raising question in GATK forum. If you could bring up more detail, Dan, myself or other users might be able to help you better. Thanks. Vic Writing a New Post Couldn’t find a solution from the steps above? Make a new post! Read these instructions to determine the category of your post and what information we need so that you can get your question answered: 1. GATK Errors An error message from a GATK Tool that...
perl script_FPfilter sample.varScan.snp.filter.bed varScan.variants.readcounts –output-basename varScan.variants.filter I get the following output: 27841 variants 27841 failed to get readcounts for variant allele 0 had read position < 0.1 0 had strandedness < 0.01 0 had var_count < 4 0 had var_freq < 0.05 0 had mismatch qualsum difference > 50 0 had mapping quality difference > 30 0 had read length difference > 25 0 had var_distance_to_3' < 0.2 0 passed the strand filter Apparently, the problem is...
I did it, but my problem remains.
I did it, but my problem remains.
I did it, but my problem remains.
Hi, Vanessa, Yes, I did. Please see above Dan's message: "Hello, thanks for your question. The input for VarScan fpfilter is NOT from VarScan readcounts. It instead requires a separate program, bam-readcount. This is a dependency we hope to fix with a future release, but if you use bam-readcount to generate the readcounts, you should be good to go." Basically, you need to generate a Varscan format somatic calling instead of VCF format of it; You then need to download another software called "bam-readcount"...
Do you able to solve the problem of failure in the result because no read count was returned? I'm having the same problem.
I'm new to variant calling and I'm having trouble running VarScan2 v2.4.4 (Support Protocol 1) which can be found in the following online document: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4278659 / The authors advise using the fpfilter.pl accessory script (https://sourceforge.net/projects/varscan/files/scripts/). When I run the following command: perl script_FPfilter sample.varScan.snp.filter.bed varScan.variants.readcounts –output-basename varScan.variants.filter I get the following output:...
Hi, In my example below, this is a short subset of SS=1 for SS=Germline. In general how is this SS determined? More specifically why would variants with FREQ very different from 50% (heterozygous) and 100% (homozygous) be classified as Germline, and even more so with these FREQ and with low SPV be given classification SS=1 for Germline. chr1 29902 . AAGCCAGACGC A . PASS DP=260;SS=1;SSC=30;GPV=1;SPV=0.00082866;AC=1;AN=2 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:225:214:10:4.46%:145,69,4,6 chr1 185125 . A G ....
Hello, under the documentation for somaticFilter it states: "This command filters somatic mutation calls to remove clusters of false positives and SNV calls near indels." What does removing clusters of false positives mean? How are false positives and clusters of these defined? Thanks Kelly
Great, is it possible to specify and output directory for these files? Also after test running I am confused. All the variants in my indel file that had a comma in the alt column are gone from the 3 output files. Does proccessSomatic not include any of these where the normal and tumor have different indels at a position? EDIT: sorry some do occur in the output files but not many. Only about 3.8% of all the indels that have two alt (comma separated) made it to the three output files for my first test....
Great, is it possible to specify and output directory for these files? Also after test running I am confused. All the variants in my indel file that had a comma in the alt column are gone from the 3 output files. Does proccessSomatic not include any of these where the normal and tumor have different indels at a position?
Great, is it possible to specify and output directory for these files?
Brian, If you extract Somatic calls only, you'll get only variants that are found in the tumor. You can use the processSomatic command to facilitate this extraction.
Thanks Dan, but then how would one extract just the tumor from the vcf, i.e. I don't want any information for the normal because I used an unmatched normal for all my tumors since they are really HIV patients and not tumors. So I want to just obtain a VCF for the tumor and remove the normal entirely. I was thinking I would first do an awk on the alt column and remove the first alt only if comma separated and then with bcftools just ^ the NORMAL to remove it. Is this was someone would do? Also I just...
Thanks but then how would one extract just the tumor from the vcf, i.e. I don't want any information for the normal because I used an unmatched normal for all my tumors since they are really HIV patients and not tumors. So I want to just obtain a VCF for the tumor and remove the normal entirely. I was thinking I would first do an awk on the alt column and remove the first alt only if comma separated and then with bcftools just ^ the NORMAL to remove it. Is this was someone would do? Also I just started...
Brian, the ref column is the reference base. The alts correspond to alternate allele numbers 1 and 2, so you're correct that alt 1 is the alternate allele in the normal and alt 2 is the alternate allele in the tumor IN THIS CASE. The numbers representing alts are sample-agnostic, so it would also be possible for the normal to be 1/2. If there's not a comma-separated list, then the alt allele is the alternate seen, not necessarily in the tumor, but for this position. You have to look at the genotype...
Sorry also to clarify if there is not comma separated list then the variant is for the tumor?
Hi, I want to confirm the output from Varscan somatic. If I have matched normal and tumor and both normal and tumor have variants, I assume based off running same thing with table output, the first alt is for the normal and the second alt is for the tumor. So for the first one the normal is CT>C and the tumor is CT>CTT. Example chr17 5529758 . CT C,CTT . PASS DP=20;SS=5;SSC=0;GPV=1E0;SPV=7.0898E-1 GT:GQ:DP:RD:AD:FREQ:DP4 0/1:.:10:8:2:20%:8,0,2,0 0/2:.:10:8:2:20%:8,0,2,0 chr17 6659795 . TTC T,TTCTCTCTCTC...
Hi, I came across a strange problem and it seems that --output-vcf 1 option disabled --strand-filter: So, if I do samtools mpileup -B -f hg19.fa -q 1 normal tumor | java -jar VarScan.v2.4.4.jar somatic -mpileup var.vcf --min-var-freq 0.08 --p-value 0.10 --strand-filter 1 --output-vcf 1, 0 were removed by the strand filter; But if I remove --output-vcf 1 from the above code, 582878 were removed by the strand filter. Also, is there a way to convert Varscan format into vcf 4.1? I need vcf 4.1 for Gemini....
Hi, Dan, 1. BED-like var file for bam-readcount Yes. It works. Then, I ran a test: combined somatic output snp head -20 and indel head -20; awk a BED-like file; all 38 variants have bam-readcounts; but fpfilter says only 30 had a bam-readcount result and 8 failed because no readcounts were returned. I checked failed variants without readcount, but there're readcounts in readcount file, e.g failed "chr1 10439 A -C 33 26 44.07% /-C 14 5 26.32% /-C Germline 1.1034090888899382E-11 0.9525820540573707...
1. BED-like var file for bamreadcount For SNVs it's straightforward as you say: chromosome, start, stop, and start equals stop. Indels are more challenging because of how they can be represented at different positions. I usually create a BED-like file for indels in which I print one line for every position from (start - indel_sizse - 1) to (stop + indel_size + 1). Bam-readcounts will return counts for all positions, and VarScan will use the best match to obtain metrics for filtering. 2. Running FPfilter...
Hi, Dan, it worked for my snp.Somatic.hc. I have four further questions: 1) To generate a bed-like var file for bam-readcount: if it's a snp, say, chr1 100 A C..., I do print $1,$2,$2 and get chr1 100 100 for var; however, for an indel: chr1 100 A +C..., do I still print $1,$2,$2 because it's an insertion; then, only for a deletion: chr1 100 A -C..., I print $1,($2+1),($2+1). Why isn't ($2-1)? What if it's a long deletion: chr1 100 A -CCC..., do I print $1,($2+1),($2+1) or $1,($2+1),($2+4)? 2) I...
Hi, Dan, thanks so much for prompt response. I'll try it out and let you know.
Hello, thanks for your question. The input for VarScan fpfilter is NOT from VarScan readcounts. It instead requires a separate program, bam-readcount. This is a dependency we hope to fix with a future release, but if you use bam-readcount to generate the readcounts, you should be good to go.
Hi, Does anyone know why I get no readcounts when do fpfilter? 10438 variants in input file 0 had a bam-readcount result 0 had reads1>=2 0 passed filters 10438 failed filters 10438 failed because no readcounts were returned I followed STD Varscan steps: 1) samtools mpileup -B -f ref.fa -q 1 normal.bam tumor.bam | varscan somatic; 2) varscan processSomatic; 3) samtools mpileup -B -f ref.fa tumor.bam | varscan readcounts --variants-file; 4) varscan fpfilter var.file readcount.file. var.file looks like:...
This was mentioned within the manual, the command (mpileup2cns) reports all positions that met the miniumum coverage (reference, SNP, or indel), However, when I used the command below and expect to output all of the position(11578) include monomorphic position which met the miniumum coverage,but in the output vcf Only variants will be reported(7268). I wonder what change am I to make to get all of the 11578 position genotype that met the miniumum coverage , Thank you very much. java -jar VarScan.jar...
Update: problem soved. the parameter --mpileup 1 has to be added into the command. This was mentioned within the manual but not the protocol article. This is really basic. I was following the instructions from the protocol Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection I have a pair of normal tumor samples to analyze so I followed the instruction to mpileup them together with samtools. A combined, mpileup'ed file was generated. However, when I called the function VarScan...
This is really basic. I was following the instructions from the protocol Using VarScan 2 for Germline Variant Calling and Somatic Mutation Detection I have a pair of normal tumor samples to analyze so I followed the instruction to mpileup them together with samtools. A combined, mpileup'ed file was generated. However, when I called the function VarScan somatic, I put in one input, followed by the name of my designated output filename. I keep getting error notice that I finally figured out it's expecting...
I´ve just stumbled upon a variant whose REF and ALT fields are identical (REF=A and ALT=A). Is this supported in the VCF standard?
I just stumbled upon a variant whose REF and ALT fields are identical (REF=A and ALT=A). Is this supported in the VCF standard?
There is a variant result from varscan fpfilter using default parameter like below : 17 7579362 . AACC A . MinMMQSdiff DP=473;SOMATIC;SS=2;SSC=255;GPV=1E0;SPV=6.8578E-82 GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:89:89:0:0%:68,21,0,0 1/1:.:384:14:370:96.35%:5,9,244,126 the used bam-readcount result is 17 7579363 A 405 =:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 A:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 C:0:0.00:0.00:0.00:0:0:0.00:0.00:0.00:0:0.00:0.00:0.00 G:1:60.00:32.00:0.00:1:0:0.01:0.01:32.00:1:0.01:135.00:0.01...
No problem! It should be an easy fix for the next release. For the time being I am using awk to replace the ambiguous codes with N's in the REF column of the VCF files. It seems to do the trick.
No problem! It should be an easy fix for the next release. For the time being I am using awk to replace the ambiguous codes with N's in the VCF files. It seems to do the trick.
No problem! It should be an easy fix for the next release. For the time being I am using awk to replace the ambiguous codes for N's. in the VCF files. It seems to do the trick.
Thank you for providing the examples; now I see what you were trying to describe. An ambiguity code in the reference is an unusual edge case, and a possibility that I had not considered (or coded for) in VarScan. It should be possible to add this compatibility for the next release.
Hi Dan, thanks for your answer! I invoked samtools mpileup like this: samtools mpileup -C 50 -B -q 1 -Q 15 -f genome.fasta sample.bam > sample.pileup VarScan's output (2 selected entries) chr2 233142970 . Y C . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:25:0:0:2:0 chr2 233143729 . Y T . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:36:0:0:2:0 Samtools mpileup's...
I am running Varscan 2.4.4 (mpileup2cns) using mpileups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but the VCF output is malformed as it contains IUPAC codes in the REF column. The reference genome contains IUPAC codes but none of the reads do, would it be possible to not report variants called on these?
Hi Dan, thanks for your answer! I invoked samtools mpileup like this: samtools mpileup -C 50 -B -q 1 -Q 15 -f genome.fasta sample.bam > sample.pileup VarScan's output (2 selected entries) chr2 233142970 . Y C . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:25:0:0:2:0 chr2 233143729 . Y T . PASS ADP=2;WT=0;HET=0;HOM=1;NC=0 GT:GQ:SDP:DP:RD:AD:FREQ:PVAL:RBQ:ABQ:RDF:RDR:ADF:ADR 1/1:0:2:2:0:2:100%:9.8E-1:0:36:0:0:2:0 Samtools mpileup's...
Jose, That's an odd one! Would you mind providing a few lines of the relevant VCF output, and the mpileup for those positions?
I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains IUPAC codes in the REF column. The reference genome contains IUPAC codes but none of the reads do, would it be possible to not report variants called on these?
I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains IUPAC codes in the REF column. The reference genome contains IUPAC codes, would it be possible to not report variants called on these?
I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains "Y" characters in the REF column. Is this expected? Needless to say that none of the reads contain "Y" characters.
I am running Varscan 2.4.4 (mpileup2cns) using pile-ups computed with samtools 1.9. It is RNA-seq data that has been processed following the GATK4 best-practices for calling variants on RNA-seq datasets. The execution completes fine but VCF output is malformed as it contains "Y" characters in the REF column. Is this expected? Needless to say that none of the reads contain "Y" characters.
Thank you very much for the answer! This explains why I got slightly different results for indels but identical results for LOH.
Dear Yuyu Liu, Thank you for your question. I should first emphasize that using this parameter may not be necessary, as VarScan is designed around the notion that tumors are usually not pure. The tumor-purity option essentially adjusts the minimum variant allele frequency (--min-var-freq) downward, such that a lower VAF threshold is used when calling the consensus base for the tumor. The actual code looks something like this: tumorMinVarFreq = (minVarFreq * tumorPurity); Thus, if your normal --min-var-freq...
I am running Varscan for LOH analysis. I estimated that my samples have a tumor purity of 0.4. I ran the program with tumor purity of 1.0 and 0.4. The results didn't change at all. Is this normal? How does the tumor purity option change the calculations?
Hi, I am wondering if Varscan2 is compatible with Nanopore data. If yes, are there any tips or reminders for user like myself? Have a nice day. Cheers, Jonathan
Brandon, thanks for the question. The differences that you see all come down to read counting. The denominator for the VAF calculation is the total number of reads counted for any allele at that position, regardless of which allele is being counted. If >2 alleles are observed, it's very possible that the denominator accounts for base calls that you may not be aware of in the VCF, i.e. base calls that support neither the reference nor the main ALT allele. Consensus calling mode typically only reports...
Hello, I use the following command for poolseq data: VarScan.v2.4.3.jar mpileup2cns --min-coverage 8 --p-value 0.05 --min-var-freq 0.000625 --strand-filter 1 --min-freq-for-hom 0.80 --min-avg-qual 20 --output-vcf 1 > out.vcf I then use GATK VariantsToTable to write the file to .txt. I then use the TYPE column to distinguish either SNP or INDEL. I put each TYPE into separate files. I then further filter the SNP data for biallelic loci. I've noticed that for the TYPE=SNP data, I have inconsistencies...
Hello, I use the following command for poolseq data: VarScan.v2.4.3.jar mpileup2cns --min-coverage 8 --p-value 0.05 --min-var-freq 0.000625 --strand-filter 1 --min-freq-for-hom 0.80 --min-avg-qual 20 --output-vcf 1 > out.vcf I then use GATK VariantsToTable to write the file to .txt. I then use the TYPE column to distinguish either SNP or INDEL. I put each TYPE into separate files. I've noticed that for the TYPE=SNP data, I have inconsistencies for FREQ, DP, AD, and RD. Sometimes AD + RD != DP (but...
Hello, I use the following command: VarScan.v2.4.3.jar mpileup2cns --min-coverage 8 --p-value 0.05 --min-var-freq 0.000625 --strand-filter 1 --min-freq-for-hom 0.80 --min-avg-qual 20 --output-vcf 1 > /scratch/lindb/JP_pangenome/JP_pooled/varscan/JP_pooled_varscan_bedfile_0000.vcf I then use GATK VariantsToTable to write the file to .txt. I then use the TYPE column to distinguish either SNP or INDEL. I put each TYPE into separate files. I've noticed that for the TYPE=SNP data, I have inconsistencies...
Thank you for your question. I'm not sure I have enough information to answer entirely. Are you working with WES or WGS data? By de novo mutations, you mean ones that are present in the child but not the parents, I presume. There should be 1-2 in the exome and perhaps 60-80 genome-wide. Are you using VarScan mpileup2cns on the trio, and then looking for variants that appear heterozygous only in the proband? That would be my recommendation, for starters. I would then apply VarScan FPfilter with the...
Hello, I am currently using VarScan v2.4.4 to call SNP variants. There are two things which are really confusing. The first is How the P-value is calculated. what I know so far is that it is based on the supporting reads for ALT and REF. The confusing part is Fisher exact test needs at least 2X2 table. So, what are the other two data? How can I verify the P-value I got from VarScan by using the information from samtools by my own. The second is How to deal with 1/2 GT. I found that there is a situation...
Hi we perform WES on trios, (mother, father and child with cancer). we are very much interested in catching de novo mutations in the children. We have tried two sets of values for fpfilter: set 1: Fewer than 3 variant-supporting reads Variant allele frequency below 0.05 Relative average read position < 0.15 Average distance to effective 3' end < 0.15 Average mismatch quality sum for variant reads > 100 Average mapping quality of variant reads < 30 Average base quality of variant reads < 30 Strand...
Hi we perform WES on trios, (mother, father and child with cancer). we are very much interested in catching de novo mutations in the children. We have tried two sets of values for fpfilter: set 1: Fewer than 3 variant-supporting reads Variant allele frequency below 0.05 Relative average read position < 0.15 Average distance to effective 3' end < 0.15 Average mismatch quality sum for variant reads > 100 Average mapping quality of variant reads < 30 Average base quality of variant reads < 30 Strand...
Dear VarScan users, We are writing an NIH grant application for funding to harden, improve, and expand VarScan functionality. If you have feature requests or suggestions for improving our tool, please share them here or send them to me directly. Also, if you or someone at your institution uses VarScan and would be willing to sign a letter of support (LOS) for our application, please drop me a line at dankoboldt at gmail. I will send you our specific aims and a template letter that you can customize....
Hi, how can I limit the analysis using a target design BED file ? Thanks Gianfilippo
Hi, is the varscan "readcount" command replacing the bam-readcount utility ? Thanks Gianfilippo
I have some files,when I run the perl script different file will have different result. For example: Use of uninitialized value $input in <handle> at mergeSegments.pl line 127. readline() on unopened filehandle at mergeSegments.pl line 127. Can't use an undefined value as a symbol reference at mergeSegments.pl line 215. This is some files run the script appear this error and no anything output. Meanwhile a tsv file is generated when the error information in the previous reply. Please tell me the...
Thank you . I have create my ref-arm-size file. But when I run the perl script still have some error.example: Use of uninitialized value $input in <handle> at mergeSegments.pl line 127. readline() on unopened filehandle at mergeSegments.pl line 127. Can't use an undefined value as a symbol reference at mergeSegments.pl line 215. I exam my ref-arm-size file , it have all chr. I don't know the reason of this question. Hope you can help me !</handle>
Hello colleagues I am working on haploid genomes and I would like to know with which parameter I could launch "varscan mpileup2snp" to consider haploid variants. Thank you in advance
Hello, I'm using Varscan in a project that we are working on, and it is crucial for me to know how Varscan2 pileup2cns decides on the most frequent non-reference allele to report. I understand that it calls a consensus variant for each sample based on the greatest number of supporting reads but what happens when two different alleles share the same number of supporting reads. Thank you.
Hi, I am using Varscan2 (version 2.4.3) to call somatic variants and I would have a question regarding the fpfilter. I thought that through the parameter "--max-somatic-p" I could apply a threshold to the SPV field values of my variants but it seems to me that it is not really responsive, even if I use very low values for the parameter "--max-somatic-p-depth". I had a look at the fpfilter code (I am an absolute Java beginner) and it seems to me that "--max-somatic-p" is not really used for processing...
Hello, thanks in advance for the help If I use samtools mpileup -B -q 1 -f hg38.fa recal_reads08.bam recal_reads06.bam recal_reads02.bam > trio.mpileup dad mum child And I want to call the snp variants from mpileup file java -jar VarScan.v2.3.9.jar mpileup2snp trio.mpileup --output-vcf > varscansnp.vcf in the results of vcf file I get CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample1 Sample2 Sample3 Sample1 correspond to recal_reads08.bam? Because I have the feeling is not in the same order as...
Sara, Thanks again for bringing this to my attention. It turns out the somatic P-value threshold was not being applied in the FP filter. This has been fixed for the next release (v2.4.4).
Changhan, thanks for clarifying!
Manal, you have to run it on each pair. I'm not sure what you're asking about the read depth; this is one of the output fields provided by VarScan, but only reported where VarScan calls a variant.
Zhou, Your ref-arm-sizes file is wrong: it should have one line for each chromosome arm.
Harshmi, I'm not sure what to make of the columsn, which don't match normal VarScan output formats. If you're still having trouble with this issue, would you mind posting your mpileup for the relevant line, as well as the VarScan version and output line?
Guillaume, thank you for your question and I apologize for the delay in replying. In your example, the read counts are somewhat confusing because it's an insertion. Based on the VarScan output, there are 237 reads supporting the reference, 1692 supporting the insertion, and 86 reads supporting other alleles. Getting a look at your mpileup is the only way to be sure; would you mind sharing it?
Zhou, the ref-arm-sizes file is one that you create, designating the coordinates of the "p" and "q" arms of each chromosome. It should be tab-delimited with four columns: chromosome, start, stop, p/q. For example: chr1 1 125000000 p chr1 125000001 249250621 q If you need to find out the coordinates of p/q arms, the "cytoBand.txt.gz" file in the UCSC Genome Database provides this information.
Can you tell me how the chr-arm file is obtained? Thank you very much!
Can you tell me how the chr-arm file is obtained? Thank you very much!
Hello, I have used varscan with java -jar VarScan.v2.3.9.jar mpileup2cns RS.pileup --vcf-sample-list sample.list --min-reads2 3 --min-coverage 15 --p-value 0.9 --strand-filter 0 --variants --output-vcf > RS.vcf The vcf file is attached to this message. When I look at the results, I see that there is a field called AD (allele depth), a field called DP and a field called FREQ. In most cases, FREQ = AD/DP. However in some cases, this doesn't seem to be the case. Furthermore in those cases, I can't see...
How do you adjust the parameters of VARSCAN2 SOMATIC according to your own data?please help me! Thank you
Hi I have 5 normal files and 14 meningioma files Can I use VarScan to read all files at once or I have to run it on each pair (normal -meningioma) also , is it possible to detect the read deapth ? I need your help please thank you
I missed filtering steps which I made. It's my mistake.
Hi, I am confused. I am using VarScan2 (2.3.9) to identify and profile somatic mutations from cfDNAs and matched normal blood samples. When I used 0.005 for –-min-var-freq, I could not find some variants with higher VAF. However, when I used 0.05 for --min-var-freq, I found previously unfound variants with VAF over 0.05. For example, A variant (chr2 29606628 . C G . PASS DP=1775;SOMATIC;SS=2;SSC=255;GPV=1E0;SPV=1.136E-98; GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:447:444:0:0%:240,204,0,0 0/1:.:1328:700:601:46.2%:349,351,313,288)...
Hi, I am confused. I am using VarScan2 (2.3.9) to identify and profile somatic mutations from cfDNAs and matched normal blood samples. When I used 0.005 for –-min-var-freq, I could not find some variants with higher VAF. However, when I used 0.05 for --min-var-freq, I found previously unfound variants with VAF over 0.05. For example, A variant (chr2 29606628 . C G . PASS DP=1775;SOMATIC;SS=2;SSC=255;GPV=1E0;SPV=1.136E-98; GT:GQ:DP:RD:AD:FREQ:DP4 0/0:.:447:444:0:0%:240,204,0,0 0/1:.:1328:700:601:46.2%:349,351,313,288)...
Dear Sara, That certainly is the intent of the parameter, but it's possible the logic wasn't implemented. I'll take a look at the code and get back to you here.
Hi, I am using Varscan2 (version 2.4.3) to call somatic variants and I would have a question regarding the fpfilter. I thought that through the parameter "--max-somatic-p" I could apply a threshold to the values of my SPV field of my variants but it seems to me that it is not really responsive, even if I use very low values for the parameter "--max-somatic-p-depth". I had a look at the fpfilter code (I am an absolute Java beginner) and it seems to me that "--max-somatic-p" is not really used for...
On another topic, whe we use samtools 1.9 (last version) for the mpileup for the commands mpileup2snp and mpileup2indel, and then we use those archives for the filter command, the command stops due to parsing errors in the snp.vcf archive (pointing to the FREQ fiel of the Format Column). But when we use the same archives with samtools 1.8, this problem do not appear.
Good Morning we are having some issues with the Filter command. It seems that the option --min-strand2 it is not working correctly. We are launching the command with the option by default (1, that is, as we understand, that the variant has to be in at least one of the strands to pass the filter), but it is filtering out some variants by the strand2 filter. If we use other numbers for this filter, the same number of variants (and the same variants) are being filtered out. Is this a bug, or we are...
Dear Dan, many thanks for your reply. How can I merge my vcf files? I tried to use CombineVariants to combine my 6 vcf files, but I get this error : Unable to access jarfile/ GenomeAnalysisTK.jar In fact there is no file namely GenomeAnalysisTK.jar. Do you know how can I fix this problem? Another problem is about the format of vcf file, it's not the same as accepted format for ANNOVAR. How can I convert them to annovar format? About the vcf file could you please tell me why the third colum (ID) is...
Dear Dan, many thanks for your reply. How can I merge my vcf files? I tried to use CombineVariants to combine my 6 vcf files, but I get this error : Unable to access jarfile/ GenomeAnalysisTK.jar In fact there is no file namely GenomeAnalysisTK.jar. Do you know how can I fix this problem? Another problem is about the format of vcf file, it's not the same as accepted format for ANNOVAR. How can I convert them to annovar format? About the vcf file could you please tell me why the third colum (ID) is...
I already generated a mpileup file for varscan2. I'm using mpileup2snp. I want to output non-variant positions as well as snps. What should I do? Thanks.
Sorry for my lateness in answering this, but the VarScan native output uses IUPAC Ambiguity codes. If you want VCF-style output, use the parameter --output-vcf 1.
Cristina, I sent you a more detailed explanation by message, but in general you should NOT do this. When calling de novo mutations, it's essential to know that you have sufficient coverage in both parents. If you combine them into a single BAM, all of the coverage could be from one parent.
Raheleh, thanks for your question. I'm not sure I have enough information to provide detailed guidance, but most studies with paired tumor-normal data are most interested in somatic alterations. To that end, you should work with the FPfilter-passed somatic mutation calls from each tumor (one for SNVS, one for indels). You might combine the SNVs and indels into a single file for convenience. I can't think of a good reason you'd want to combine somatic calls from multiple patients into a single file....