Identification of regulatory links between transcription and RNA processing with long-read sequencing

STAR Protocols release

About the Documentation

All tools required to analyze transcriptional coupling are provided in the following R packages:

install.packages("devtools")
devtools::install_github("hilgers-lab/LATER")
devtools::install_github("hilgers-lab/LASER")

Preparing the data

Data analysis starts from BAM files of aligned long-read sequencing data. The alignments can be produced with minimap2 and sorted with samtools:

minimap2 -ax splice -u f genome.fa long_read.fastq.gz | samtools sort -@ 4 -o output.bam
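The alignment command above can be wrapped in a small shell function that also indexes the sorted BAM, since IGV expects an index next to the file. This is only a sketch and not part of the protocol: the function name `align_long_reads` and the thread argument are illustrative assumptions. The options are those of the command above plus `-t` (minimap2 threads) and `samtools index`.

```shell
# Hypothetical helper (not part of LATER): align, sort, and index in one call.
# Arguments: genome FASTA, reads FASTQ, output BAM, threads (default 4).
align_long_reads() {
  genome=$1
  reads=$2
  out_bam=$3
  threads=${4:-4}

  # Spliced alignment; -u f looks for splice sites on the forward
  # transcript strand only. samtools sort orders reads by coordinate.
  minimap2 -ax splice -u f -t "$threads" "$genome" "$reads" |
    samtools sort -@ "$threads" -o "$out_bam" - || return 1

  # Write the .bai index that IGV needs beside the BAM file.
  samtools index "$out_bam"
}
```

Example call, using the file names from the command above: `align_long_reads genome.fa long_read.fastq.gz output.bam 4`.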

LATER

5′-3′ database creation

refExons <- "dm6.ref.gtf"
isoformData <- prepareIsoformDatabase(refExons,
             tss.window=50,
             tes.window=150)

Complementing reference annotation with databases

refTSS <- "TSS_reference_database_dmel.bed"
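# ref_tss_annot and reference_annotation are not defined in this snippet;
# both must already exist in the R session before this call.
isoformData <- addPromoterDatabase(refTSS, ref_tss_annot,
             reference_annotation,
             window = 50)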

Counting full-length reads

bamPath <- system.file("exdata/testBam.bam", package = 'LATER')
countData <- countLinks(bamPath, isoformData)

To explore the reads in IGV, the alignment .bam file can be subset by read ID. Export the read IDs using:

readr::write_tsv(readAssignments(countData), "read_assignments.txt")

Then, in a bash terminal, subset the BAM file with samtools. The file passed to -N must contain one read name per line, and -b requests BAM output:

samtools view -b -N read_assignments.txt -o filtered_output.bam output.bam
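`samtools view -N` reads a plain list of read names, one per line, whereas a table written by `readr::write_tsv` starts with a header row by default and may carry extra columns. The sketch below reduces the exported table to a name list. It assumes, without confirmation from the LATER documentation, that the read ID sits in the first tab-separated column. The helper name `extract_read_ids` and the file `read_ids.txt` are illustrative.

```shell
# Hypothetical helper (not part of LATER).
# Assumption: the input is a tab-separated table with one header row.
# $1 = input TSV, $2 = output file, $3 = 1-based read ID column (default 1)
extract_read_ids() {
  # Drop the header, keep the chosen column, remove duplicate IDs.
  tail -n +2 "$1" | cut -f "${3:-1}" | sort -u > "$2"
}

# Usage (requires samtools; file names follow the commands above):
#   extract_read_ids read_assignments.txt read_ids.txt
#   samtools view -b -N read_ids.txt -o filtered_output.bam output.bam
#   samtools index filtered_output.bam
```

If the read IDs are stored in a different column, pass its index as the third argument.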

Estimate promoter dominance

Promoter dominance can be estimated using the following code:

gene_bias_estimates <- estimatePromoterDominance(countData, isoformData, method="chisq")

The results can be explored using the following functions:

results(gene_bias_estimates)
dominance(gene_bias_estimates)

Additional documentation

Additional LATER documentation can be accessed via:

vignette("LATER")

LASER

Detailed LASER documentation and explanations are provided with the LASER package (hilgers-lab/LASER).