CpG OE Analysis for DNA Methylation

Historical DNA Methylation Analysis: CpG O/E

Background on CpG O/E:

Based on scripts and pipeline instructions from:

1. Preparing M.capitata and P.acuta CDS files

Download from Rutgers

wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.cds.fna.gz 
wget http://cyanophora.rutgers.edu/Pocillopora_acuta/Pocillopora_acuta_HIv2.genes.cds.fna.gz 

Format of these files:

Pocillopora_acuta_HIv2.genes.cds.fna:

>Pocillopora_acuta_HIv2___RNAseq.g24143.t1a
ATGCCTCTTTCTTCCACAGTAGTTGGTTGTTTTCCAAAGCCTTCCTATCTGCACATTCCAGACTGGTTCCAAACCTCCCACAGTGGAACATTCACCGAGGATTACAATCGCTTCCTCGAAACCTCATCCAAGGAAGAAACAGAAGACCTGATCAAGAGAGCTACAAAAGATATAATTGATATCCAAGCCGAAGCTGGCATTGATGTGATCAGTGACGGTGAGTTACGGAGAGAATCTTACATCTTGCACTTCTGTCGGGCGCTAAAAGGGTTAGACTTCCACAATTTGTTTTCTAAAAACTGTCGCGATGGTGCTTGTGTCACGGACGTTCCCCGTATCATTGGCCGGGTGACACCGCGAGAAAGCGAAGCGTGGGTATGGAAAGAGTGGAAATTTTCTCAAGATTTGTCTCCGTTGCCCATGAAAATTACTATACCGGGACCGATGACAATAATAAACTCCACCGAGGATCAGTATTACAATGATGACAGGGCGCTAGGGAAAGTTTTGGCAGAGATAATAAACAGTGAGATCAAGGCTCTTGTTTCCGCTGGATGTAGATGTATTCAGATACCGGACTGGTTCCAGCCAGCGTACGCCGAAAATTTCCCAAGGGAGTACAATCGCTTTTTGCAGAGTTTTTCATCCACAGACATAGAAGAATTGTTTATCAGAGCTATTCGAGAAGTGATTGAAATACAGAGCAAAGCCGGTCTTGATGTGATAACAGATGGGGAAGTGCGGCGTGAAAACTACATCCATTACTTTTGTAGAAAGCTGAATGGCTTTGATTTCAAAAGTTTATGTCTGACGTCGATTCGCGACGACGCTGCAACAATTCTATTGCCACAAATTGTGGGAGAGGTATCAGCGCAGGAAAATGAAGGTTGGGTTTGGAAAGAATGGAAGAGTTCACAGGTTTTATCCAAACTTCCTGTGAAAATTACTCTTCCGGGACCCATGACAATTGCGCACACAGTCGTCGATCAATACTACAAGGATCAAAAGGTTCTAGGTGCTGTGTTGACAAAAATTTTAAACAGGGAAATTAAAGCTCTTGTATCGGTTGGCTGTAAGGTTATTCAGGTTGATGAACCAGTGTTAATGCGATTTCCTGATGAGGCTTTGGATTATGGAATTGATCAACTTTCAACTTGTTTTGACGGAGTTACATCTCCTGATGTAGTCAAGGCTGTTCATTTGTGTTGCGGTTATCCAACCTACTTGGATCAGGATGACTATAAGAAGGCAGACAAGGATGGATACCTGAGACTGGCTGATAAACTAGATGCTGCAGGCTTTGATGAGATTTCAATTGAGGATGCTCATAGGCACAATGATTTATCATTATTTGAGCATTTTAAAAAGAGCAAGGTTGTTCTAGGTGTTGTTAAGATTGCTAGCAGCAAAGTAGAGAGTGTCGAGGAGATTCGCAGCCGTCTAAAGCAAGTACTGACTGTTTTACCAGCAGAACGATTGGTTGTTGCTCCAGACTGTGGTTTGGGATTTCTTCCAACTGATCTGGCAAAGCAGAAACTTTTTAATATGGTACAAGCGGCAAAGTCTTTACCCTAA

Montipora_capitata_HIv3.genes.cds.fna:

>Montipora_capitata_HIv3___RNAseq.g7985.t1
ATGGCAGAGGCTGCTTCACGGACTTGTGTGAAAGTTACGGGCGAACAGTGTTCCAACAAATGGAAGAAACTGGAGGAGAAGTATAAAAAAGTCAAGGAACACAACGACAAAACGGGTAACGACAGGAAAGAGTGCGACTTCCAAGAAGAGCTGGAAGAGTTTTTCGGAACTGATCCGAAGATAATTCCAACGGCTACCGTTTCTTCTTTGACAGCAAAGCTGGCCACAGGCAATACGTCAACGGATGATGAGGAGGAATTACCTCTGGCAAACTGCCGAGAGCCACCAAAGAAGAAGAAGAAACGCCGCTCGAAGTCGTCCGCTACTGAAATGATTGAGTTTCTGACCGAGTACAAGGAAGAAAAGCGGAAAGAGGAAGAAGCAAAAATTGTCCTTGCGCAGAAAATGCATAATGAAAAATTGGGCATAATGGAGAGATTCCTTGACGTCCTGTCGAAGAAAGCAAATTGA
>Montipora_capitata_HIv3___RNAseq.g36632.t1
ATGGCACTCACTGAGGACCAACGGTCGTGTCGTAGCAGTTTTCTGTCAGGGACTCCAGGATCAGGGGCTGATGTAGCTCCACCAGCTGAAAAAAAAAACCTAGAGCTTACAGAGCAAGTCAACGCTCTGCCCGAAGAGATTCGTAAGCTGAAAGAACACATTACTGAACATACTAGCTCTCCGGACCGCAGTGCACGGTCAACAGAAGCGGCTTTGCAGTTTTACAGTGATGCACATGACGAGTCTCAAACCTTCCAAGCAGATGTGAGTAAGGAACTGAAACGCTTTAGTTCTTGGCTCGCTGAAATCAGCACCAGAGTTGATGAAATCGGGAAGGCAATTGATAGTATGTGTGAATATAGCTATCAATACAATGTAAAGATCGTTGGGATGCCCGAGCTAAGTGAACAAGAGTCTTACTCTCAAACAATTTCTTTATGCGTCAAATTATTCTCTGATATGGGAGCCGATGTTTCTCTTCACGATATAGACATTGCACATCGTGTGCCGTAG

Number of sequences from each:

fgrep -c ">" Pocillopora_acuta_HIv2.genes.cds.fna: 33730 fgrep -c ">" Montipora_capitata_HIv3.genes.cds.fna: 54384

Converting FASTA to tabular format and placing output file in analyses directory

cd /data/putnamlab/estrand/CpG_OE for the following commands

perl -e '$count=0; $len=0; while(<>) {s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) {print "\n"} s/ |$/\t/; $count++; $_ .= "\t";} else {s/ //g; $len += length($_)} print $_;} print "\n"; warn "\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n";' \
../Pocillopora_acuta_HIv2.genes.cds.fna > PA.v2.fasta2tab

Output: Converted 33730 FASTA records in 67460 lines to tabular format Total sequence length: 45794064

perl -e '$count=0; $len=0; while(<>) {s/\r?\n//; s/\t/ /g; if (s/^>//) { if ($. != 1) {print "\n"} s/ |$/\t/; $count++; $_ .= "\t";} else {s/ //g; $len += length($_)} print $_;} print "\n"; warn "\nConverted $count FASTA records in $. lines to tabular format\nTotal sequence length: $len\n\n";' \
../Montipora_capitata_HIv3.genes.cds.fna > MC.v3.fasta2tab

Output: Converted 54384 FASTA records in 108768 lines to tabular format Total sequence length: 63481968

head -2 MC.v3.fasta2tab:

Montipora_capitata_HIv3___TS.g31918.t1          ATGTCATGTTATGCTATTCGATCTCATCTAGATACTGTTGCCAAGGCTTACTATGTCGGTTTGAGTCAAGCGATCGAGGATAATCCTTCACGAAATTCTAAGCTTTGCATCCTTCGATCCCAACTTGTAACACGCAGTCGTCAATTCATCCTCCTTTTGAGAGAATGCACCACCTCCTACACTGCATCATGCCTTGTACCCGACGAAAAGAGCAGCACACTTCGTGATGCCCTCGCCCGATTAGCAGTGGGTCTACACCCCCTCGATGGACCTCAAGCTGTCATTCGGGTCGACCCTGCACCGGGATTTGTCTCACTGAAAACCACACATGCTTTACAGCCTCTTGGCATCTCAGTCGAAGTTGGTCGCATCAAAAACTCCAACAAAAATCCTGTTGCCGAACGAGCTGTGCTGGAACTACAAGAAGAGCTGTTACGTCAGGAGCCTGGCGGAGGCCCTGTTACCGAACTCGGCCTTGCTGTTGCCACTGCCCGCCTGAACTCGCGACTCCGCAGCCAAGGCCTGTCCTCACGGGAGCTTTGGACACAGTGCAACCAGTGCACCAATGAGCAGATACCCGTCAATGACGTACAACACATTGTAGCCAAGCACCAAGCTCGCCAAACCAACCATCCTTTCAGCGAAGCGGCCAAAGGTGGCTACCGCCCACTAGCTCCCGTTGCCCCCTTGCAGGTTGGAGACGTGGTGTATGTCAAGTCTGACCGAGATAAGTCACGCGCCTGTGATCGCTATATCATTGTTAGCATCGACGGCGAATGGTGTTTCATTAAGAAGTTCTCTGGCTCAAAACTCAGGGCTACATCTTACAAGGTGAAGCTAGCCGAGTGCTACACTGTACCACATACCCTCCCACCACCCTCGTACCAGTCTGTCGTCCCTCCACTTGACGAAGACGAATGTGTGGAGATTCCTGAACAGCCCCAGCCTTGTCAAGAGCCCTCTGCCCCACCAGACCTGTTACGTCCGCCCAACCCAGATCCTCCAGACTCCACCCCGGACCAAGCCGAACAGCAGGAAGACTCGACCCCCCTTGAAACTGATCGTGTTCCTGTTTCAATTTCATGTGAACCAAGACCCCAACGGGCACGCCGGGCCCCCGCATACTTACAGGAGTACATTCTCAACTAG
Montipora_capitata_HIv3___RNAseq.g39902.t1              ATGCAGACGTTTCGCATGGGATACAACACGCCGGTGTCTGCTCAGCGTTTCTTGGGAGTTTATGACGACTTGCGCTTGATTGTAACTCAACTCGCAAACTCTTGGGTGAAAATTGGGGGAAATGGAGAAGTTGCAGAAGATTTTGTTATTTGTGGCAATAAAGTAGGCAATAAGAGGGCACGAGAGAAACTGAACACTATTGTGGCCGGTGTCAGCTGTAAAACGATTGACATCAAACAGCCTGGGCTGAGGAAATTCTATACCAGTGGTAAAGCAGATCTTCTGGAAAAATCGATAGAGAAAAACCATGAATGCGCGATCCTTGTCCAGAAGAACTTTGACTCGAAGCGAGAGGAAGGTAAAGTACAAGGTGCACAAGTAGGTTCTGACTCCACAAGCAGTGATGACGATGATGATGAGGCTGTCATTAGTGGTGATGATGATATTTATGATGAGTGGGAAGCAGTTGTAAGTGAAACTGATCGATCATCTTTCGCCACCAAAACTGGCCACCATGTTTCGTGGAGAGCTGGAATCATTGAGGCTGAAAAGGTGAGTTAA

head -2 PA.v2.fasta2tab:

Pocillopora_acuta_HIv2___RNAseq.g24143.t1a              ATGCCTCTTTCTTCCACAGTAGTTGGTTGTTTTCCAAAGCCTTCCTATCTGCACATTCCAGACTGGTTCCAAACCTCCCACAGTGGAACATTCACCGAGGATTACAATCGCTTCCTCGAAACCTCATCCAAGGAAGAAACAGAAGACCTGATCAAGAGAGCTACAAAAGATATAATTGATATCCAAGCCGAAGCTGGCATTGATGTGATCAGTGACGGTGAGTTACGGAGAGAATCTTACATCTTGCACTTCTGTCGGGCGCTAAAAGGGTTAGACTTCCACAATTTGTTTTCTAAAAACTGTCGCGATGGTGCTTGTGTCACGGACGTTCCCCGTATCATTGGCCGGGTGACACCGCGAGAAAGCGAAGCGTGGGTATGGAAAGAGTGGAAATTTTCTCAAGATTTGTCTCCGTTGCCCATGAAAATTACTATACCGGGACCGATGACAATAATAAACTCCACCGAGGATCAGTATTACAATGATGACAGGGCGCTAGGGAAAGTTTTGGCAGAGATAATAAACAGTGAGATCAAGGCTCTTGTTTCCGCTGGATGTAGATGTATTCAGATACCGGACTGGTTCCAGCCAGCGTACGCCGAAAATTTCCCAAGGGAGTACAATCGCTTTTTGCAGAGTTTTTCATCCACAGACATAGAAGAATTGTTTATCAGAGCTATTCGAGAAGTGATTGAAATACAGAGCAAAGCCGGTCTTGATGTGATAACAGATGGGGAAGTGCGGCGTGAAAACTACATCCATTACTTTTGTAGAAAGCTGAATGGCTTTGATTTCAAAAGTTTATGTCTGACGTCGATTCGCGACGACGCTGCAACAATTCTATTGCCACAAATTGTGGGAGAGGTATCAGCGCAGGAAAATGAAGGTTGGGTTTGGAAAGAATGGAAGAGTTCACAGGTTTTATCCAAACTTCCTGTGAAAATTACTCTTCCGGGACCCATGACAATTGCGCACACAGTCGTCGATCAATACTACAAGGATCAAAAGGTTCTAGGTGCTGTGTTGACAAAAATTTTAAACAGGGAAATTAAAGCTCTTGTATCGGTTGGCTGTAAGGTTATTCAGGTTGATGAACCAGTGTTAATGCGATTTCCTGATGAGGCTTTGGATTATGGAATTGATCAACTTTCAACTTGTTTTGACGGAGTTACATCTCCTGATGTAGTCAAGGCTGTTCATTTGTGTTGCGGTTATCCAACCTACTTGGATCAGGATGACTATAAGAAGGCAGACAAGGATGGATACCTGAGACTGGCTGATAAACTAGATGCTGCAGGCTTTGATGAGATTTCAATTGAGGATGCTCATAGGCACAATGATTTATCATTATTTGAGCATTTTAAAAAGAGCAAGGTTGTTCTAGGTGTTGTTAAGATTGCTAGCAGCAAAGTAGAGAGTGTCGAGGAGATTCGCAGCCGTCTAAAGCAAGTACTGACTGTTTTACCAGCAGAACGATTGGTTGTTGCTCCAGACTGTGGTTTGGGATTTCTTCCAACTGATCTGGCAAAGCAGAAACTTTTTAATATGGTACAAGCGGCAAAGTCTTTACCCTAA
Pocillopora_acuta_HIv2___RNAseq.g24143.t1b              ATGTCTCAATGTTTTGTTCAGGGAGCATTTATGCCGGAGCATCCACAGACATCCTTTTGCAGGGCGGATTTTGTCGTTCGAGTGAAAGTGCTGGGAATGATCAAAAGGGACGACGAGGTACCCACAACGCAAGCTTCAGCGGGTGATAAGTTTACGCTTCCTCCCACGACTGAGACGCTTGTAGCGGACACAGACGCCCCAGAAACGAAGACTGCTTCCCCTCCGAGGGTTCAGAAAATTGGTGATTTACCGCCACAGAGCAATAACAGCGATGTTGGGTCGCTTCTAGGTTTCATCTTTGGCCGGCCCAAGCGAAGTATCGGAATACGAGGTGCCCCACCTCCTGGAGGAGAAATGCGGTACCGTCCAATAGGCGGGAAAGGCCGCGACAAGGGAATGATGTACTATGATGTTCTAATAAAGAAAATCTTCAGGGGTGAAGACAGAGTGCGGAGAACCAAAAGAACTTTTGTTGATGCCGCCAATCCATCTCGATTGTTTGCAAGGATTTACTTCTCCAGTCATCACGGTCAGGACCTGGTACCGGGTGATGCTTACATGCTTTCTGGTAGAATCATGGGTAATGAATTGGTTCTCCCATCGCGAGGCTGCTGGATGCAAAAATGGCGTGACCTGTCCAGAGATCAAACTCACGGTCTGAGTAAGGTCTATGCGGAAAACTGCGAGTGTATGATCCAGTTTTGCATTCCTGGTCTTTGTGCATCTATGCCTGAGTCTCAGTTGCAGTGTAACTGGGAATTGCAAAATTTTAGGCAACCAAGAAGGGATTGTGGAGCAATGCACAACACATGCGTCAATAAACATGGTCAATGCCAGTGGAAAATAGGCAAAGGATATGATTCTTGTACAAAACCCTCGGGACCTCTCCCATAA

Add column with length of sequence

perl -e '$col = 2;' -e 'while (<>) { s/\r?\n//; @F = split /\t/, $_; $len = length($F[$col]); print "$_\t$len\n" } warn "\nAdded column with length of column $col for $. lines.\n\n";' \
MC.v3.fasta2tab > MC.v3.tab_1

## Added column with length of column 2 for 54384 lines.

perl -e '$col = 2;' -e 'while (<>) { s/\r?\n//; @F = split /\t/, $_; $len = length($F[$col]); print "$_\t$len\n" } warn "\nAdded column with length of column $col for $. lines.\n\n";' \
PA.v2.fasta2tab > PA.v2.tab_1

## Added column with length of column 2 for 33730 lines.

Example output:

Montipora_capitata_HIv3___TS.g31918.t1          ATGTCATGTTATGCTATTCGATCTCATCTAGATACTGTTGCCAAGGCTTACTATGTCGGTTTGAGTCAAGCGATCGAGGATAATCCTTCACGAAATTCTAAGCTTTGCATCCTTCGATCCCAACTTGTAACACGCAGTCGTCAATTCATCCTCCTTTTGAGAGAATGCACCACCTCCTACACTGCATCATGCCTTGTACCCGACGAAAAGAGCAGCACACTTCGTGATGCCCTCGCCCGATTAGCAGTGGGTCTACACCCCCTCGATGGACCTCAAGCTGTCATTCGGGTCGACCCTGCACCGGGATTTGTCTCACTGAAAACCACACATGCTTTACAGCCTCTTGGCATCTCAGTCGAAGTTGGTCGCATCAAAAACTCCAACAAAAATCCTGTTGCCGAACGAGCTGTGCTGGAACTACAAGAAGAGCTGTTACGTCAGGAGCCTGGCGGAGGCCCTGTTACCGAACTCGGCCTTGCTGTTGCCACTGCCCGCCTGAACTCGCGACTCCGCAGCCAAGGCCTGTCCTCACGGGAGCTTTGGACACAGTGCAACCAGTGCACCAATGAGCAGATACCCGTCAATGACGTACAACACATTGTAGCCAAGCACCAAGCTCGCCAAACCAACCATCCTTTCAGCGAAGCGGCCAAAGGTGGCTACCGCCCACTAGCTCCCGTTGCCCCCTTGCAGGTTGGAGACGTGGTGTATGTCAAGTCTGACCGAGATAAGTCACGCGCCTGTGATCGCTATATCATTGTTAGCATCGACGGCGAATGGTGTTTCATTAAGAAGTTCTCTGGCTCAAAACTCAGGGCTACATCTTACAAGGTGAAGCTAGCCGAGTGCTACACTGTACCACATACCCTCCCACCACCCTCGTACCAGTCTGTCGTCCCTCCACTTGACGAAGACGAATGTGTGGAGATTCCTGAACAGCCCCAGCCTTGTCAAGAGCCCTCTGCCCCACCAGACCTGTTACGTCCGCCCAACCCAGATCCTCCAGACTCCACCCCGGACCAAGCCGAACAGCAGGAAGACTCGACCCCCCTTGAAACTGATCGTGTTCCTGTTTCAATTTCATGTGAACCAAGACCCCAACGGGCACGCCGGGCCCCCGCATACTTACAGGAGTACATTCTCAACTAG       1149
Montipora_capitata_HIv3___RNAseq.g39902.t1              ATGCAGACGTTTCGCATGGGATACAACACGCCGGTGTCTGCTCAGCGTTTCTTGGGAGTTTATGACGACTTGCGCTTGATTGTAACTCAACTCGCAAACTCTTGGGTGAAAATTGGGGGAAATGGAGAAGTTGCAGAAGATTTTGTTATTTGTGGCAATAAAGTAGGCAATAAGAGGGCACGAGAGAAACTGAACACTATTGTGGCCGGTGTCAGCTGTAAAACGATTGACATCAAACAGCCTGGGCTGAGGAAATTCTATACCAGTGGTAAAGCAGATCTTCTGGAAAAATCGATAGAGAAAAACCATGAATGCGCGATCCTTGTCCAGAAGAACTTTGACTCGAAGCGAGAGGAAGGTAAAGTACAAGGTGCACAAGTAGGTTCTGACTCCACAAGCAGTGATGACGATGATGATGAGGCTGTCATTAGTGGTGATGATGATATTTATGATGAGTGGGAAGCAGTTGTAAGTGAAACTGATCGATCATCTTTCGCCACCAAAACTGGCCACCATGTTTCGTGGAGAGCTGGAATCATTGAGGCTGAAAAGGTGAGTTAA   561

wc MC.v3.tab_1 = 54384 163152 66110515 MC.v3.tab_1 wc PA.v2.tab_1 = 33730 101190 47386752 PA.v2.tab_1

Create only sequence file - The file used to count Cs and Gs will only include the sequence

awk '{print $2}' MC.v3.tab_1 > MC.v3.tab_2

awk '{print $2}' PA.v2.tab_1 > PA.v2.tab_2

Counts CGs - both cases

-F fs = –field-separator=fs

echo "CG" | awk -F\CG '{print NF-1}' MC.v3.tab_2 > mcap_CG

echo "CG" | awk -F\CG '{print NF-1}' PA.v2.tab_2 > pacuta_CG

Count C’s

echo "C" | awk -F\[Cc] '{print NF-1}' MC.v3.tab_2 > mcap_C

echo "C" | awk -F\[Cc] '{print NF-1}' PA.v2.tab_2 > pacuta_C

Count G’s

echo "C" | awk -F\[Gg] '{print NF-1}' MC.v3.tab_2 > mcap_G

echo "C" | awk -F\[Gg] '{print NF-1}' PA.v2.tab_2 > pacuta_G

Combining counts

paste MC.v3.tab_1 \
mcap_CG \
mcap_C \
mcap_G \
> mcap_comb


head -1 mcap_comb

Montipora_capitata_HIv3___TS.g31918.t1          ATGTCATGTTATGCTATTCGATCTCATCTAGATACTGTTGCCAAGGCTTACTATGTCGGTTTGAGTCAAGCGATCGAGGATAATCCTTCACGAAATTCTAAGCTTTGCATCCTTCGATCCCAACTTGTAACACGCAGTCGTCAATTCATCCTCCTTTTGAGAGAATGCACCACCTCCTACACTGCATCATGCCTTGTACCCGACGAAAAGAGCAGCACACTTCGTGATGCCCTCGCCCGATTAGCAGTGGGTCTACACCCCCTCGATGGACCTCAAGCTGTCATTCGGGTCGACCCTGCACCGGGATTTGTCTCACTGAAAACCACACATGCTTTACAGCCTCTTGGCATCTCAGTCGAAGTTGGTCGCATCAAAAACTCCAACAAAAATCCTGTTGCCGAACGAGCTGTGCTGGAACTACAAGAAGAGCTGTTACGTCAGGAGCCTGGCGGAGGCCCTGTTACCGAACTCGGCCTTGCTGTTGCCACTGCCCGCCTGAACTCGCGACTCCGCAGCCAAGGCCTGTCCTCACGGGAGCTTTGGACACAGTGCAACCAGTGCACCAATGAGCAGATACCCGTCAATGACGTACAACACATTGTAGCCAAGCACCAAGCTCGCCAAACCAACCATCCTTTCAGCGAAGCGGCCAAAGGTGGCTACCGCCCACTAGCTCCCGTTGCCCCCTTGCAGGTTGGAGACGTGGTGTATGTCAAGTCTGACCGAGATAAGTCACGCGCCTGTGATCGCTATATCATTGTTAGCATCGACGGCGAATGGTGTTTCATTAAGAAGTTCTCTGGCTCAAAACTCAGGGCTACATCTTACAAGGTGAAGCTAGCCGAGTGCTACACTGTACCACATACCCTCCCACCACCCTCGTACCAGTCTGTCGTCCCTCCACTTGACGAAGACGAATGTGTGGAGATTCCTGAACAGCCCCAGCCTTGTCAAGAGCCCTCTGCCCCACCAGACCTGTTACGTCCGCCCAACCCAGATCCTCCAGACTCCACCCCGGACCAAGCCGAACAGCAGGAAGACTCGACCCCCCTTGAAACTGATCGTGTTCCTGTTTCAATTTCATGTGAACCAAGACCCCAACGGGCACGCCGGGCCCCCGCATACTTACAGGAGTACATTCTCAACTAG       1149    60      362     249
paste PA.v2.tab_1 \
pacuta_CG \
pacuta_C \
pacuta_G \
> pacuta_comb

head -1 pacuta_comb

Pocillopora_acuta_HIv2___RNAseq.g24143.t1a              ATGCCTCTTTCTTCCACAGTAGTTGGTTGTTTTCCAAAGCCTTCCTATCTGCACATTCCAGACTGGTTCCAAACCTCCCACAGTGGAACATTCACCGAGGATTACAATCGCTTCCTCGAAACCTCATCCAAGGAAGAAACAGAAGACCTGATCAAGAGAGCTACAAAAGATATAATTGATATCCAAGCCGAAGCTGGCATTGATGTGATCAGTGACGGTGAGTTACGGAGAGAATCTTACATCTTGCACTTCTGTCGGGCGCTAAAAGGGTTAGACTTCCACAATTTGTTTTCTAAAAACTGTCGCGATGGTGCTTGTGTCACGGACGTTCCCCGTATCATTGGCCGGGTGACACCGCGAGAAAGCGAAGCGTGGGTATGGAAAGAGTGGAAATTTTCTCAAGATTTGTCTCCGTTGCCCATGAAAATTACTATACCGGGACCGATGACAATAATAAACTCCACCGAGGATCAGTATTACAATGATGACAGGGCGCTAGGGAAAGTTTTGGCAGAGATAATAAACAGTGAGATCAAGGCTCTTGTTTCCGCTGGATGTAGATGTATTCAGATACCGGACTGGTTCCAGCCAGCGTACGCCGAAAATTTCCCAAGGGAGTACAATCGCTTTTTGCAGAGTTTTTCATCCACAGACATAGAAGAATTGTTTATCAGAGCTATTCGAGAAGTGATTGAAATACAGAGCAAAGCCGGTCTTGATGTGATAACAGATGGGGAAGTGCGGCGTGAAAACTACATCCATTACTTTTGTAGAAAGCTGAATGGCTTTGATTTCAAAAGTTTATGTCTGACGTCGATTCGCGACGACGCTGCAACAATTCTATTGCCACAAATTGTGGGAGAGGTATCAGCGCAGGAAAATGAAGGTTGGGTTTGGAAAGAATGGAAGAGTTCACAGGTTTTATCCAAACTTCCTGTGAAAATTACTCTTCCGGGACCCATGACAATTGCGCACACAGTCGTCGATCAATACTACAAGGATCAAAAGGTTCTAGGTGCTGTGTTGACAAAAATTTTAAACAGGGAAATTAAAGCTCTTGTATCGGTTGGCTGTAAGGTTATTCAGGTTGATGAACCAGTGTTAATGCGATTTCCTGATGAGGCTTTGGATTATGGAATTGATCAACTTTCAACTTGTTTTGACGGAGTTACATCTCCTGATGTAGTCAAGGCTGTTCATTTGTGTTGCGGTTATCCAACCTACTTGGATCAGGATGACTATAAGAAGGCAGACAAGGATGGATACCTGAGACTGGCTGATAAACTAGATGCTGCAGGCTTTGATGAGATTTCAATTGAGGATGCTCATAGGCACAATGATTTATCATTATTTGAGCATTTTAAAAAGAGCAAGGTTGTTCTAGGTGTTGTTAAGATTGCTAGCAGCAAAGTAGAGAGTGTCGAGGAGATTCGCAGCCGTCTAAAGCAAGTACTGACTGTTTTACCAGCAGAACGATTGGTTGTTGCTCCAGACTGTGGTTTGGGATTTCTTCCAACTGATCTGGCAAAGCAGAAACTTTTTAATATGGTACAAGCGGCAAAGTCTTTACCCTAA     157553      297     380

2. Calculate CpG O/E

pacuta_comb and mcap_comb = final files from step #1

use ^ instead of ** for exponent

awk '{print $1, "\t", (($4)/($5*$6))*(($3^2)/($3-1))}' mcap_comb > mcap_ID_CpG

head mcap_ID_CpG:

Montipora_capitata_HIv3___TS.g31918.t1   0.765493
Montipora_capitata_HIv3___RNAseq.g39902.t1       0.766505
Montipora_capitata_HIv3___RNAseq.g22918.t1       1.10518
Montipora_capitata_HIv3___RNAseq.g7985.t1        1.02165
Montipora_capitata_HIv3___RNAseq.g36632.t1       0.797079
Montipora_capitata_HIv3___RNAseq.g20158.t1       0.306959
Montipora_capitata_HIv3___RNAseq.g27719.t1       0.558765
Montipora_capitata_HIv3___RNAseq.g34935.t1       0.538187
Montipora_capitata_HIv3___RNAseq.g14270.t1       1.04623
Montipora_capitata_HIv3___RNAseq.g4696.t1        0.818074

awk '{print $1, "\t", (($4)/($5*$6))*(($3^2)/($3-1))}' pacuta_comb > pacuta_ID_CpG

head pacuta_ID_CpG:

Pocillopora_acuta_HIv2___RNAseq.g24143.t1a       0.740103
Pocillopora_acuta_HIv2___RNAseq.g24143.t1b       0.766137
Pocillopora_acuta_HIv2___RNAseq.g22918.t1        0.827021
Pocillopora_acuta_HIv2___RNAseq.g18333.t1        0.494065
Pocillopora_acuta_HIv2___RNAseq.g7985.t1         0.608104
Pocillopora_acuta_HIv2___RNAseq.g13511.t1        0.570192
Pocillopora_acuta_HIv2___RNAseq.g27719.t1        0.192805
Pocillopora_acuta_HIv2___RNAseq.g14270.t1        0.647634
Pocillopora_acuta_HIv2___TS.g15308.t1    0.489107
Pocillopora_acuta_HIv2___RNAseq.g2057.t1         0.537751

3. Functional annotation of Mcap and Pacuta

Genome functional annotation:

  • Emma Strand Mcap v3 functional annotation: https://github.com/emmastrand/EmmaStrand_Notebook/blob/master/_posts/2022-11-02-M.capitata-Genome-v3-Functional-Annotation.md
  • Emma Strand Pacuta v2 functional annotation: https://github.com/emmastrand/EmmaStrand_Notebook/blob/master/_posts/2022-11-17-Pocillopora-acuta-v2-Functional-Genome-Annotation.md

RNASeq gene names functional annotation from Rutgers:

wget http://cyanophora.rutgers.edu/Pocillopora_acuta/Pocillopora_acuta_HIv2.genes.EggNog_results.txt.gz
wget http://cyanophora.rutgers.edu/montipora/Montipora_capitata_HIv3.genes.EggNog_results.txt.gz

Sort files in EggNog functional annotation and CpG OE results files

sort -i pacuta_ID_CpG > pacuta_ID_CpG_sorted

Pocillopora_acuta_HIv2___RNAseq.10002_t          0.874359
Pocillopora_acuta_HIv2___RNAseq.10010_t          0.824372
Pocillopora_acuta_HIv2___RNAseq.1016_t   0.852742
Pocillopora_acuta_HIv2___RNAseq.10171_t          0.711555
Pocillopora_acuta_HIv2___RNAseq.10212_t          0.625815
Pocillopora_acuta_HIv2___RNAseq.10213_t          1.22103
Pocillopora_acuta_HIv2___RNAseq.10263_t          0.781719
Pocillopora_acuta_HIv2___RNAseq.10287_t          0.733141
Pocillopora_acuta_HIv2___RNAseq.10316_t          1.05813
Pocillopora_acuta_HIv2___RNAseq.10405_t          0.640849

sort -i mcap_ID_CpG > mcap_ID_CpG_sorted:

Montipora_capitata_HIv3___RNAseq.10136_t         0.615473
Montipora_capitata_HIv3___RNAseq.10187_t         1.07151
Montipora_capitata_HIv3___RNAseq.10207_t         0.566746
Montipora_capitata_HIv3___RNAseq.10214_t         0.475799
Montipora_capitata_HIv3___RNAseq.10304_t         0.893919
Montipora_capitata_HIv3___RNAseq.10384_t         0.707401
Montipora_capitata_HIv3___RNAseq.10476_t         0.666366
Montipora_capitata_HIv3___RNAseq.10478_t         0.666366
Montipora_capitata_HIv3___RNAseq.1048_t          0.827187
Montipora_capitata_HIv3___RNAseq.10547_t         0.613498

sort -i Pocillopora_acuta_HIv2.genes.EggNog_results.txt > CpG_OE/Pocillopora_acuta_HIv2.genes.EggNog_results_sorted.txt

sort -i Montipora_capitata_HIv3.genes.EggNog_results.txt > CpG_OE/Montipora_capitata_HIv3.genes.EggNog_results_sorted.txt

Combine functional annotation file with CpG sorted file

join mcap_ID_CpG_sorted Montipora_capitata_HIv3.genes.EggNog_results_sorted.txt | awk '{print $1, "\t", $2}' > Mcap_cpg_anno:

Montipora_capitata_HIv3___RNAseq.10187_t         1.07151
Montipora_capitata_HIv3___RNAseq.10207_t         0.566746

join pacuta_ID_CpG_sorted Pocillopora_acuta_HIv2.genes.EggNog_results_sorted.txt | awk '{print $1, "\t", $2}' > Pacuta_cpg_anno:

Pocillopora_acuta_HIv2___RNAseq.10002_t          0.874359
Pocillopora_acuta_HIv2___RNAseq.10010_t          0.824372

Export data to local computer

M. capitata is for Bleaching Pairs project.

scp emma_strand@ssh3.hac.uri.edu:/data/putnamlab/estrand/CpG_OE/Mcap_cpg_anno /Users/emmastrand/MyProjects/HI_Bleaching_Timeseries/Dec-July-2019-analysis/output/WGBS/Mcap_cpg_anno.txt

P. acuta is for Holobiont Integration project.

scp emma_strand@ssh3.hac.uri.edu:/data/putnamlab/estrand/CpG_OE/Pacuta_cpg_anno /Users/emmastrand/MyProjects/Acclim_Dynamics_molecular/data/WGBS/Pacuta_cpg_anno.txt

Next steps

The dataframe I need is:

Gene_ID CpG_ratio Annotation

The annotation is the GOSlim terms from the functional annotation pipeline I use.

1.) R script to analyze CpG Density 2.) Identify which Gene IDs (and annotations) are in weakly and strongly methylated sets 3.) Interpret what P. acuta and M. capitata traditionally heavily methylate (or not) 4.) Track % methylation on those two groups of genes (weakly and strongly) through time, treatment (might have to collapse temperature)

CpG ratio is only at the species level not individual because the timescale changes in nucleotides is larger than the shorter-term effect of environment or phenotype in the case of bleaching pairs.

Written on November 19, 2022