- So I’ve been trying to get a new program, Methyl Extract, to work. It takes .sam files from Bismark or BSMap and does both methylation calling as well as SNP calling, which will be useful downstream for calculating relatedness of our samples.
The instructions seemed fairly clear, with Methyl Extract just being a perl script ran from the command line, but I’ve gotten some odd results/lack of results making me think that there’s either an issue with the program, or how I call it.
- My call in R looks like this:
which is derived from the manual’s example SNP and Methylation call, with the addition of the flagW and flagC parameters, which indicate that we’re using paired end reads as well as optional output in .wig and .bed formats (wigOut and bedOut arguments).
- Initially, everything seemed to go well, with methylation and snp calling by scaffold working
- Later on in the run, it started failing to find scaffolds with an engrish level error message that is difficult to understand what is actaully failing.
- I took a moment to check if the scaffold actually existed in the reference genome, and it does.
- It still, however outputs files. Lots of files. About 60,000 in this particular run.
- These are split in to two main groups both of which have no headers The first are the CG files (of which there are three types, .wig, .bed, and .output) which are the results of methylation calls. According to the manual, the headers would be:
- Scaffold
- Position
- Context
-
Watson Methylcytosines
- Watson Coverage
- Avg Watson PHRED score
-
Crick Methylcytosines
- Crick Coverage
- Avg Crick PHRED score
Secondly, there are the SNP calls.
- These are supposed to follow a .vcf format, with the vcf file headers shown here in an example. Note it’s also missing any tag for sample the SNP came from, which likely makes the file in it’s current state useless for analysis.
I’m re-running the MethylExtract program now with just a single .sam input to see if it’s some oddity in the way MethylExtract concatenates fasta/q files, or perhaps something else. Hopefully this will produce something meaningful. Currently, all of the output files are residing in owl at owl.fish.washington.edu/scaphapoda/Sean/MethylExtractOut