Today I started working on identifying transposable elements in the Ostrea lurida genome via RepeatMasker.
RepeatMasker is a Linux-based program that identifies transposable elements, satellites, and regions of low DNA complexity in an existing FASTA file. The program can be found at http://www.repeatmasker.org/RMDownload.html
First, download the requisite files and dependencies. For me this was RMBlast, TRF, and the repeat database from GIRI, which requires user authentication provided by the GIRI people (a roughly 3 hour turnaround time for me). dtrx can be installed from your favorite package manager, e.g. apt-get install dtrx
Installation Walkthrough:
Note: This was done on a MacBook Pro running Ubuntu 16.04, roughly following the repeatmasker.org instructions.
-
Check perl version, as RepeatMasker requires version 5.8.0 or greater.
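A quick way to make that check from the command line, as a minimal sketch: Perl stores its own version in the built-in variable `$]` (5.8.0 is 5.008), so a one-liner can compare against it. The `perl_ok` variable name is just mine for illustration.

```shell
# Exit status 0 means the installed Perl is at least 5.8.0.
if perl -e 'exit($] >= 5.008 ? 0 : 1)'; then
    perl_ok=yes
else
    perl_ok=no
fi

# Report the numeric version string Perl uses internally, plus the verdict.
echo "Perl $(perl -e 'print $]') new enough for RepeatMasker: $perl_ok"
```

Running plain `perl -v` and reading the version by eye works just as well.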
-
Unzip RepeatMasker into /usr/local.
-
Unzip blast into /usr/local.
-
Unzip rmblast into /usr/local.
-
Copy the rmblast files into the blast directory.
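The unpack-and-copy steps above look roughly like this. This is a sketch under assumptions, not the exact commands I ran: the archive names, the `blast` and `rmblast` directory names, and the placeholder files are all hypothetical, and a temporary directory stands in for /usr/local so it runs without sudo.

```shell
# Stand-in for /usr/local so the sketch runs without sudo.
prefix=$(mktemp -d)
work=$(mktemp -d)

# Build two tiny placeholder archives; the real ones come from the download pages.
mkdir -p "$work/blast/bin" "$work/rmblast/bin"
echo "placeholder" > "$work/blast/bin/blastn"
echo "placeholder" > "$work/rmblast/bin/rmblastn"
tar -czf "$work/blast.tar.gz" -C "$work" blast
tar -czf "$work/rmblast.tar.gz" -C "$work" rmblast

# Unpack both archives into the prefix.
tar -xzf "$work/blast.tar.gz" -C "$prefix"
tar -xzf "$work/rmblast.tar.gz" -C "$prefix"

# Copy the contents of the rmblast directory into the blast directory.
cp -R "$prefix/rmblast/." "$prefix/blast/"

ls "$prefix/blast/bin"
```

For the real install, swap the temporary prefix for /usr/local and prefix the tar and cp commands with sudo.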
-
At this point, I switched to using dtrx (do the right extraction) instead of tar/gunzip. Just a quality of life thing. I then extracted the repeat libraries obtained from GIRI. Note: Be better than I am and use the force overwrite (-o) argument for dtrx. RepeatMasker comes with a Libraries directory, so without that argument dtrx will make a second Libraries directory instead of overwriting the existing one. The subsequent pictures show how to fix that.
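If you end up with a duplicate directory like I did, one way to clean it up is to copy the new files over the originals and delete the extra directory. This is a sketch of that idea, not necessarily the exact fix from the screenshots: the `Libraries.1` directory name and both file names are hypothetical, and a temporary directory stands in for the RepeatMasker install location.

```shell
# Stand-in for the RepeatMasker install directory so the sketch runs anywhere.
rm_home=$(mktemp -d)

# The original Libraries directory plus the duplicate created by extraction.
# "Libraries.1" and both file names are hypothetical placeholders.
mkdir -p "$rm_home/Libraries" "$rm_home/Libraries.1"
echo "old" > "$rm_home/Libraries/example_old.embl"
echo "new" > "$rm_home/Libraries.1/example_new.embl"

# Copy the new files over the originals, then remove the duplicate directory.
cp -R "$rm_home/Libraries.1/." "$rm_home/Libraries/"
rm -r "$rm_home/Libraries.1"

ls "$rm_home/Libraries"
```

Passing -o to dtrx in the first place avoids the whole detour.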
-
Updating the Dfam libraries: the picture below shows dtrx both with and without the -o argument. I took the time to delete the extra copy of the Dfam library for cleanliness. Note: I’m not sure this step was necessary for me, as the Dfam libraries only cover human-related transposable elements.
-
Config time! There are a few little traps, but overall it went smoothly.
-
First screen, just enter through.
-
Path to the Perl interpreter: the default worked for me.
- RepeatMasker installation location: again, the default worked for me.
- Next, it asks for the TRF installation location, and an interesting bug came up. TRF ships as a file named “trf.linux64”, but RepeatMasker expects a file named just “trf” when inspecting the directory given.
- Initially, I tried to rename the file, but that was a no-go, so I set up a symlink (symbolic link) to fool RM into thinking TRF existed as it expected.
- The next step asks for the search engine location, and whether you’d like it to be the default.
- Config done! Err… not so much. The configuration removes some files, which requires sudo as they live in /usr/local.
So, just rerun the config using sudo perl ./configure.
- Success! And a new error. Apparently we were missing a required Perl module. Easy fix as follows.
- Success for real!
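For reference, the TRF symlink workaround from the config steps can be sketched like this. The temporary directory and the placeholder file are assumptions so the snippet runs anywhere; in practice `trf_dir` would be wherever you put the real trf.linux64 download.

```shell
# Stand-in for the directory where TRF was downloaded.
trf_dir=$(mktemp -d)

# Placeholder for the real trf.linux64 binary.
printf '#!/bin/sh\necho "trf placeholder"\n' > "$trf_dir/trf.linux64"
chmod +x "$trf_dir/trf.linux64"

# Symlink so a file called "trf" exists where the configure script looks.
ln -s "$trf_dir/trf.linux64" "$trf_dir/trf"

ls -l "$trf_dir"
```

With the link in place, give that directory to the configure script when it asks for TRF, and run the script as sudo perl ./configure so it can modify files under /usr/local.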