Genome-wide association study between DSE polymorphism and Poly-A usage in the human population
Hiren Karathia
Sridhar Hannenhalli
Transcription & Polyadenylation (Poly-A)
[Figure: transcription and polyadenylation overview. Image sources: http://geneed.nlm.nih.gov/images/transcription_sm.jpg, http://www.mun.ca/biology/scarr/iGen3_05-11b_Figure-L.jpg]
Objectives
Genome-wide estimation of alternative Poly-A (PA) usage on the 3’UTR
 
Genome-wide prediction of DSE (Downstream Sequence Element) motifs and investigation of their polymorphisms
Population-wide correlation study between PA usage and DSE polymorphisms
Annotation status of Poly-A sites on the 3’UTRs of the human genome (hg19, 2009)
37% - Multiple Poly-A points
Target of the analysis
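Assuming the 37% refers to the share of annotated 3’UTRs that carry more than one Poly-A site, that figure can be reproduced with a simple count. The sketch below assumes the annotation has already been reduced to (transcript ID, Poly-A position) pairs; this input format and the function name `fraction_multi_pa` are hypothetical, not taken from the slides.

```python
from collections import defaultdict


def fraction_multi_pa(pa_sites):
    """Fraction of 3'UTRs with more than one distinct Poly-A site.

    pa_sites: iterable of (transcript_id, pa_position) pairs
    (hypothetical input format; duplicate positions are counted once).
    """
    sites_per_utr = defaultdict(set)
    for transcript_id, position in pa_sites:
        sites_per_utr[transcript_id].add(position)
    if not sites_per_utr:
        return 0.0
    multi = sum(1 for sites in sites_per_utr.values() if len(sites) > 1)
    return multi / len(sites_per_utr)


if __name__ == "__main__":
    example = [("UTR_A", 100), ("UTR_A", 450), ("UTR_B", 80), ("UTR_C", 60), ("UTR_C", 60)]
    print(fraction_multi_pa(example))
```

Using a set per transcript keeps a duplicated annotation row from turning a single-site UTR into a "multiple" one.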
RNA-Seq processing for Human Samples
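Sample FASTQ files → BWA (alignment) → SAM file
SAM file → Samtools → BAM file
BAM files → Samtools → merged BAM file
Merged BAM file → Samtools → sorted BAM file
Sorted BAM file → Picard tool → de-duplicated BAM file
De-duplicated BAM file → Samtools → indexed BAM
Indexed BAM → BEDTools → coverage
Coverage → Python script → relative usage of PAs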
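The processing chain above can be written out as concrete commands. The following is a minimal sketch that only builds the command strings and does not run anything. Several details are assumptions not stated on the slide: the `bwa mem` subcommand, each FASTQ being a separate single-end run that is later merged, the file naming scheme, the `picard.jar` location, removing (rather than only marking) duplicates, and a BED file of PA/3’UTR intervals for BEDTools.

```python
import shlex


def build_pipeline(sample, fastqs, reference, pa_regions_bed, picard_jar="picard.jar"):
    """Return the shell commands for one sample in pipeline order; nothing is executed."""
    q = shlex.quote
    cmds, bams = [], []
    for i, fastq in enumerate(fastqs):
        sam, bam = f"{sample}_{i}.sam", f"{sample}_{i}.bam"
        # BWA alignment; `bwa mem` writes SAM to stdout
        cmds.append(f"bwa mem {q(reference)} {q(fastq)} > {q(sam)}")
        # Samtools: SAM -> BAM
        cmds.append(f"samtools view -b -o {q(bam)} {q(sam)}")
        bams.append(bam)
    merged = f"{sample}.merged.bam"
    sorted_bam = f"{sample}.sorted.bam"
    dedup = f"{sample}.dedup.bam"
    metrics = f"{sample}.dup_metrics.txt"
    # Samtools: merge the per-FASTQ BAMs, then coordinate-sort
    cmds.append(f"samtools merge {q(merged)} " + " ".join(q(b) for b in bams))
    cmds.append(f"samtools sort -o {q(sorted_bam)} {q(merged)}")
    # Picard: remove duplicate reads
    cmds.append(f"java -jar {q(picard_jar)} MarkDuplicates I={q(sorted_bam)} "
                f"O={q(dedup)} M={q(metrics)} REMOVE_DUPLICATES=true")
    # Samtools: index the de-duplicated BAM
    cmds.append(f"samtools index {q(dedup)}")
    # BEDTools: read counts over the PA / 3'UTR intervals (input for the Python step)
    cmds.append(f"bedtools coverage -a {q(pa_regions_bed)} -b {q(dedup)} "
                f"> {q(sample + '.coverage.txt')}")
    return cmds


if __name__ == "__main__":
    for cmd in build_pipeline("HG00146", ["HG00146_lane1.fastq", "HG00146_lane2.fastq"],
                              "hg19.fa", "pa_regions.bed"):
        print(cmd)
```

Generating the commands as strings keeps the sketch testable without the tools installed. Piping the printed lines into a shell or a workflow manager is left to the reader.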
Symbol  Group of samples                                             Male  Female  DNA  RNA
BR      British in England and Scotland                              1     1
FI      Finnish in Finland                                           1     1
UT      Utah residents with Northern and Western European ancestry   1     1
YO      Yoruba in Ibadan, Nigeria                                    1     1
Differential Expression of UTR
Cuffdiff tools
Python script
De-novo assembly
Genome-wide estimation of alternative Poly-A (PA) usage on the 3’UTR
[Figure: 3’UTR downstream of the stop codon, showing PA1 and PA2 coverage, the PA1 and PA2 junctions, complete UTR coverage, and the cleaved 3’UTR]

$$\text{PA1 usage} = \frac{\text{Coverage}(\text{stop codon} \rightarrow \text{PA1 junction}) \,/\, \text{Distance}(\text{stop codon} \rightarrow \text{PA1 junction})}{\text{Coverage}(\text{complete 3'UTR}) \,/\, \text{Distance}(\text{complete 3'UTR})}$$

$$\text{PA2 usage} = \frac{\text{Coverage}(\text{stop codon} \rightarrow \text{PA2 junction}) \,/\, \text{Distance}(\text{stop codon} \rightarrow \text{PA2 junction})}{\text{Coverage}(\text{complete 3'UTR}) \,/\, \text{Distance}(\text{complete 3'UTR})}$$
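The PA usage formula divides the length-normalised coverage of the stop codon → PA junction segment by the length-normalised coverage of the complete 3’UTR. The minimal sketch below assumes that "Coverage" is a summed read count (or summed per-base depth) over the segment and that "Distance" is the segment length in nucleotides. Both readings are assumptions, as is the function name.

```python
import math


def pa_usage(pa_segment_coverage, pa_segment_length, utr_coverage, utr_length):
    """Length-normalised coverage of the stop-codon -> PA segment relative to the complete 3'UTR.

    Coverage is assumed to be a summed read count or depth; lengths are in nucleotides.
    Returns NaN when the complete 3'UTR has no coverage.
    """
    if pa_segment_length <= 0 or utr_length <= 0:
        raise ValueError("segment lengths must be positive")
    utr_density = utr_coverage / utr_length
    if utr_density == 0:
        return math.nan
    return (pa_segment_coverage / pa_segment_length) / utr_density


if __name__ == "__main__":
    # Proximal (PA1) segment of 300 nt vs. a 1000 nt complete 3'UTR
    print(pa_usage(600, 300, 1000, 1000))
```

Because the proximal segment is shared by every isoform, its density can exceed that of the full UTR, so values above 1 are expected rather than an error.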
Prediction of DSE
[Figure: Sample A RNA-Seq and DNA-Seq reads aligned to the coding and template strands of DNA; the de novo assembled 3’UTR fragment is used for DSE motif prediction]
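The slides do not say how DSE motifs are predicted in the assembled 3’UTR fragment. As a hedged illustration only, the sketch uses a simple rule: report runs made only of G/T on the coding strand, where T stands for U. These runs must lie downstream of the Poly-A site, in line with DSEs commonly being described as U- or GU-rich. The rule, the 100 nt window, and the 9 nt minimum length are all hypothetical parameters. The minimum length was chosen to match the smallest motif length in the statistics table.

```python
import re


def find_dse_candidates(seq, pa_index, window=100, min_len=9):
    """Hypothetical DSE scan: G/T-only runs downstream of a Poly-A site.

    seq: coding-strand sequence (DNA alphabet, T for U).
    pa_index: 0-based index of the Poly-A (cleavage) position in seq.
    Returns (distance_from_pa_nt, motif_length_nt, motif) tuples; distance 1 is the first downstream base.
    """
    downstream = seq[pa_index + 1: pa_index + 1 + window].upper()
    pattern = re.compile(r"[GT]{%d,}" % min_len)
    return [(m.start() + 1, m.end() - m.start(), m.group())
            for m in pattern.finditer(downstream)]


if __name__ == "__main__":
    fragment = "CCCCA" + "CC" + "TTGTGTTTGT" + "AC"
    print(find_dse_candidates(fragment, pa_index=4))
```

A real predictor would more likely score GU/U content with a position weight matrix or tolerate mismatches. The exact-run rule only keeps the example short and checkable.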
Frequency of Poly-A usage in the samples
Correlation between different PA usages within a human sample
[Figure: scatter plots of PA usage for PA1 vs. PA2 and PA2 vs. PA3]
PA1 vs. PA2: r = -0.643; p = 0.0
PA2 vs. PA3: r = -0.182; p = 1.06e-33
Correlation of PA usage and corresponding DSE polymorphism
[Figure: Utah male vs. British female; high vs. low PA usage for GG and GT genotypes, with corresponding DSE panels]
Correlation of PA usage and corresponding DSE polymorphism
[Figure: Finnish female vs. Nigerian (Yoruba) male; high vs. low PA usage for GG and GT genotypes, with corresponding DSE panels]
Functional enrichment of genes associated with differential PA usage and polymorphic DSEs in the population
Thank you !!
Differential expression of the complete 3’UTR
[Figure: differential expression plots for sample pairs HG00266 vs. HG00146, HG00146 vs. HG00155, HG00267 vs. HG00146, and HG00266 vs. HG00267; Venn diagram of samples]
Inter- and intra-group correlation of PA usage
[Figure: Pearson correlation of PA1 usage between sample pairs]
BR1 vs. BR2: r = 0.8; p = 0.0
FN1 vs. FN2: r = 0.8; p = 0.0
BR1 vs. FN1: r = 0.98; p = 0.0
Statistics of predicted DSE motifs
Sample  PA type   Motif length (nt)     Distance (nt)
                  Mean  Max  Min        Mean  Max  Min
BR-1    Single    12    79   9          30    89   1
BR-1    Multiple  12    52   9          34    89   1
BR-2    Single    12    62   9          31    89   1
BR-2    Multiple  12    52   9          34    89   1
FN-1    Single    12    90   9          35    89   1
FN-1    Multiple  12    54   9          39    89   1
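The table summarises predicted DSE motifs per sample and PA type (single vs. multiple PA sites). A minimal sketch of that aggregation, assuming each prediction is available as a (sample, PA type, motif length, distance) tuple, is shown below. The input format and function name are hypothetical. The table's means appear to be rounded to integers, while the sketch keeps the exact mean.

```python
from collections import defaultdict
from statistics import mean


def dse_summary(motifs):
    """Mean/max/min motif length and distance per (sample, PA type).

    motifs: iterable of (sample, pa_type, motif_length_nt, distance_nt) tuples.
    """
    groups = defaultdict(lambda: {"len": [], "dist": []})
    for sample, pa_type, length, distance in motifs:
        groups[(sample, pa_type)]["len"].append(length)
        groups[(sample, pa_type)]["dist"].append(distance)
    summary = {}
    for key, vals in groups.items():
        summary[key] = {
            "mean_len": mean(vals["len"]), "max_len": max(vals["len"]), "min_len": min(vals["len"]),
            "mean_dist": mean(vals["dist"]), "max_dist": max(vals["dist"]), "min_dist": min(vals["dist"]),
        }
    return summary


if __name__ == "__main__":
    toy = [("BR-1", "Single", 9, 1), ("BR-1", "Single", 15, 59), ("BR-1", "Multiple", 12, 34)]
    for key, stats in dse_summary(toy).items():
        print(key, stats)
```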
Pending:
Find polymorphisms in the DSEs
Find the correlation between PA usage and DSE polymorphism
Alternative Poly-A selection mechanism
Complete 3’UTR coverage vs. alternative 3’UTR coverage
[Figure panels: differential expression of complete 3’UTR usage; differential expression of PA usage]
Polyadenylation usage on the 3’UTR
[Figure: 3’UTR downstream of the stop codon (with an intron), showing PA1 and PA2 coverage, the PA1 and PA2 junctions, complete UTR coverage, and the cleaved 3’UTR]

$$\text{Relative PA1 usage} = \frac{\text{PA1 coverage}}{\text{Longest UTR coverage}}$$

$$\text{Relative PA2 usage} = \frac{\text{PA2 coverage}}{\text{Longest UTR coverage}}$$
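Relative PA usage divides each PA's coverage by the coverage of the longest UTR. A minimal sketch that computes it from `bedtools coverage` output is shown below. It relies on `bedtools coverage` appending four columns per interval: overlapping read count, covered bases, interval length, and covered fraction. It also assumes the read count is the "coverage" in question. The BED name convention `GENE|LABEL` (e.g. `GENE1|PA1`, `GENE1|UTR` for the longest UTR) is hypothetical.

```python
from collections import defaultdict


def relative_pa_usage(coverage_lines, longest_label="UTR"):
    """Relative PA usage = PA read count / longest-UTR read count, per gene.

    coverage_lines: tab-separated `bedtools coverage` rows whose 4th column is
    'GENE|LABEL' (hypothetical naming); the overlap count is the 4th-from-last field.
    Genes whose longest UTR has zero reads are skipped.
    """
    counts = defaultdict(dict)
    for line in coverage_lines:
        fields = line.rstrip("\n").split("\t")
        gene, label = fields[3].split("|")
        counts[gene][label] = int(fields[-4])  # reads overlapping this interval
    usage = {}
    for gene, by_label in counts.items():
        longest = by_label.get(longest_label, 0)
        if longest == 0:
            continue
        usage[gene] = {lab: c / longest for lab, c in by_label.items() if lab != longest_label}
    return usage


if __name__ == "__main__":
    rows = [
        "chr1\t100\t400\tGENE1|PA1\t0\t+\t60\t300\t300\t1.0",
        "chr1\t100\t700\tGENE1|PA2\t0\t+\t70\t500\t600\t0.8333333",
        "chr1\t100\t900\tGENE1|UTR\t0\t+\t80\t700\t800\t0.875",
    ]
    print(relative_pa_usage(rows))
```

Unlike the length-normalised PA usage defined earlier, this ratio uses raw counts, so it stays between 0 and 1 whenever the PA interval is contained in the longest UTR.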
[Figure: + and - strand reads mapped relative to the gene strand (RNA strand) and the template strand (DNA strand)]
Locations of multiple annotated PA sites on the 3’UTR
[Figure: PA1 and PA2 junctions downstream of the stop codon, with the cleaved 3’UTR, in two configurations: PAs on the same exon and PAs on multiple exons]
[Figure: Poly-A location vs. length of 3’UTR]
r = 0.2578; p = 8.44e-111