Biology 3594A Lecture Notes - Lecture 3: Scatter Plot, Oncogene, Ribosomal RNA

Jeffrey (1990): how can we take the chaos game and use it to look at how DNA sequences are organized?
This is testable, but the relevant components are covered in the lecture
In 1990, DNA sequencing was becoming more popular; a yeast chromosome was sequenced around this time
People wanted to look at patterns in DNA sequences; they had a language but not the tools we have today
Pattern: frequency of A vs. C, G, T (biases in single nucleotides)
Typically, sequences do not contain equal proportions of A, T, C and G, and they are not random: there are runs of Cs, As, etc. (biases)
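The single-nucleotide biases and runs described above can be measured directly. A minimal Python sketch; the function names and the toy sequence are hypothetical and not from the lecture.

```python
from collections import Counter

def base_composition(seq: str) -> dict:
    """Fraction of A, C, G, T in seq (non-ACGT symbols such as N are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: (counts[b] / total if total else 0.0) for b in "ACGT"}

def longest_run(seq: str, base: str) -> int:
    """Length of the longest uninterrupted run of `base` in seq."""
    best = current = 0
    for b in seq.upper():
        current = current + 1 if b == base else 0
        best = max(best, current)
    return best

toy = "AAAACCCCCGTATTTTTTAG"  # hypothetical toy sequence, not real genomic data
print(base_composition(toy))  # unequal proportions: a single-nucleotide bias
print(longest_run(toy, "T"))  # a run of Ts
```

A random sequence with equal base proportions would give about 0.25 for each base; large deviations, or runs much longer than chance predicts, are the kind of bias the lecture describes.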
Biases in subsequence structure
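Local and global structures: patterns within a few hundred nucleotides (local) up to the whole genome (global); a genomic signature can tell whether a sequence comes from a human genome, a viral genome, etc.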
Local: splice site junctions are conserved, as are promoter sequences for transcription factors
By 1990, we could generate pictures
Jeffrey: take the chaos game, apply it to DNA as a language, and find patterns
2D scatter plots (X, Y coordinates) of nucleotides
Double scoop pattern; fractal nature
Different organisms have different genomes and different patterns
Standard layout: C, G, A and T at their respective corners (vertices)
This is a 2D scatter plot
The origin is the dot in the middle; plotting starts there
Sequence on the left: 4 nucleotides (5’ to 3’)
First nucleotide: plot the point halfway between the origin and the vertex labeled with that nucleotide
Next nucleotide (G): plot halfway between the first point and the G vertex
T: plot halfway between the blue dot and the T vertex
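The plotting rule above (start at the centre, then move halfway toward the vertex of each successive nucleotide) can be sketched in a few lines. This assumes a unit square with the corner layout commonly used in CGR work (A bottom-left, C top-left, G top-right, T bottom-right); the lecture figure may arrange the corners differently, and the four-nucleotide sequence is hypothetical.

```python
# Assumed corner layout on the unit square; the lecture's figure may differ.
VERTICES = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq, start=(0.5, 0.5)):
    """Chaos game representation: one point per nucleotide, read 5' to 3'."""
    x, y = start  # the origin is the dot in the middle of the square
    points = []
    for base in seq.upper():
        if base not in VERTICES:
            continue  # assumption: skip N and other ambiguity codes
        vx, vy = VERTICES[base]
        # move halfway from the previous point toward this nucleotide's vertex
        x, y = (x + vx) / 2, (y + vy) / 2
        points.append((x, y))
    return points

print(cgr_points("CGAT"))
# [(0.25, 0.75), (0.625, 0.875), (0.3125, 0.4375), (0.65625, 0.21875)]
```

To draw the scatter plot itself, the returned points can be passed to a plotting function such as matplotlib's `pyplot.scatter`.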
Two points next to each other on the plot are not necessarily near one another in the actual DNA sequence
But a run of As would produce points next to each other, getting closer and closer to the A vertex
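Why neighbouring points reflect shared recent history rather than sequence position: each step halves the distance to a vertex, so the last k nucleotides pin the point inside one cell of side 2^-k. A sketch under the same assumed corner layout; the helper names and example sequences are hypothetical.

```python
# Assumed corner layout on the unit square; the lecture's figure may differ.
VERTICES = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_last_point(seq):
    """Final CGR point after reading seq from the centre of the square."""
    x, y = 0.5, 0.5
    for base in seq.upper():
        vx, vy = VERTICES[base]
        x, y = (x + vx) / 2, (y + vy) / 2
    return x, y

def cgr_cell(point, k):
    """(column, row) of the 2^k x 2^k grid cell containing point."""
    n = 2 ** k
    return int(point[0] * n), int(point[1] * n)  # coordinates are positive, so int() floors

# A run of As halves the distance to the A vertex at (0, 0) at every step.
for n in range(1, 5):
    print(n, cgr_last_point("A" * n))

# Different histories that end in the same three bases (GAC) land in the same 1/8 x 1/8 cell.
print(cgr_cell(cgr_last_point("TTTTGAC"), 3), cgr_cell(cgr_last_point("CCAGAC"), 3))  # (1, 5) (1, 5)
```

The two GAC-ending sequences share no positions in common, yet their final points fall in the same cell: closeness on the plot means a shared suffix, not closeness along the DNA.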
The number of points in each of the 4 quadrants is a one-to-one representation of single-nucleotide frequencies
The 16 subquadrants give the proportions of dinucleotides
The 64 sub-subquadrants give the proportions of trinucleotides
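Counting points per cell turns the plot into a table of subsequence frequencies (often called a frequency CGR). A sketch, again assuming the corner layout A bottom-left, C top-left, G top-right, T bottom-right; the function names and toy sequence are hypothetical. The check at the end confirms that the count in a word's cell equals a direct count of that word.

```python
from collections import Counter

# Assumed corner layout on the unit square; the lecture's figure may differ.
VERTICES = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def fcgr_counts(seq, k):
    """Count CGR points per cell of a 2^k x 2^k grid (row 0 = bottom edge, the A/T side)."""
    n = 2 ** k
    grid = [[0] * n for _ in range(n)]
    x, y = 0.5, 0.5
    for i, base in enumerate(seq.upper()):
        vx, vy = VERTICES[base]
        x, y = (x + vx) / 2, (y + vy) / 2
        # Only points with at least k bases of history encode a complete k-mer.
        if i + 1 >= k:
            grid[int(y * n)][int(x * n)] += 1
    return grid

def kmer_cell(kmer):
    """(row, col) of the cell owned by kmer at resolution k = len(kmer)."""
    n = 2 ** len(kmer)
    x, y = 0.5, 0.5
    for base in kmer.upper():
        vx, vy = VERTICES[base]
        x, y = (x + vx) / 2, (y + vy) / 2
    return int(y * n), int(x * n)

def kmer_counts(seq, k):
    """Direct count of overlapping k-mers, for comparison with the grid."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

toy = "ACGTTGCAACGGT"  # hypothetical sequence, not from the lecture
grid = fcgr_counts(toy, 2)  # 4 x 4 = 16 subquadrants: dinucleotides
row, col = kmer_cell("CG")
print(grid[row][col], kmer_counts(toy, 2)["CG"])  # prints 2 2
```

With k = 1 the grid is the 4 quadrants (single nucleotides), with k = 2 the 16 subquadrants, and with k = 3 the 64 sub-subquadrants.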
Red algal genome: fractals and repeating patterns
Fewer As and GAs
Vertebrate-like pattern: sparse points caused by underrepresentation of CG, called the double scoop
TG has one of the highest frequencies
C-to-T mutation: we've learned about methylation and deamination; a methylated C in CG deaminates to T, a C-to-T transition that turns CG into TG
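One common way to put a number on the CG underrepresentation behind the double scoop is the dinucleotide odds ratio, the observed frequency of XY divided by the product of the frequencies of X and Y. This measure is not named in the lecture; it is offered here as an assumption-labelled illustration, and the function name and test sequences are hypothetical.

```python
from collections import Counter

def dinucleotide_odds_ratio(seq, xy):
    """rho(XY) = f(XY) / (f(X) * f(Y)); well below 1 means XY is rarer than expected."""
    seq = seq.upper()
    if len(seq) < 2:
        return float("nan")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    f_x = mono[xy[0]] / len(seq)
    f_y = mono[xy[1]] / len(seq)
    if f_x == 0 or f_y == 0:
        return float("nan")  # undefined when either base is absent
    f_xy = di[xy] / (len(seq) - 1)
    return f_xy / (f_x * f_y)

print(dinucleotide_odds_ratio("CCCCGGGG", "CG"))  # 4/7: CG rarer than C and G frequencies predict
print(dinucleotide_odds_ratio("CGCGCGCG", "CG"))  # 16/7: CG overrepresented
```

Applied to a vertebrate sequence, the deamination process described above would be expected to push the CG ratio well below 1 and the TG ratio above 1, which is what the sparse double-scoop regions and dense TG region of the plot show.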
k = 10: subsequences of length up to 10 (10 levels of subdivision)
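Worked arithmetic for the resolution levels, assuming (as in the quadrant description above) that each halving of the grid adds one nucleotide of context; the symbol $N_k$ is notation introduced here:

```latex
\begin{align*}
N_k &= 2^k \times 2^k = 4^k \text{ cells, each of side } 2^{-k}\\
k = 1, 2, 3 &: \quad N_1 = 4,\; N_2 = 16,\; N_3 = 64\\
k = 10 &: \quad N_{10} = 4^{10} = 1{,}048{,}576 \text{ cells, each of side } 2^{-10} = 1/1024
\end{align*}
```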
When a subsequence is completely absent from a genome, it leaves a hole in the CGR (chaos game representation)
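Finding the holes amounts to listing every possible k-letter word and keeping those that never occur. A minimal sketch; the function name and example sequences are hypothetical.

```python
from itertools import product

def absent_kmers(seq, k):
    """All k-letter words over ACGT that never occur in seq (the empty cells, or holes, of the CGR)."""
    seq = seq.upper()
    present = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    words = ("".join(p) for p in product("ACGT", repeat=k))
    return [w for w in words if w not in present]

print(absent_kmers("AAAA", 1))           # ['C', 'G', 'T']
print(len(absent_kmers("ACGTACGT", 2)))  # 12 of the 16 dinucleotides never occur
```

Because the cell of a k-mer contains the cells of every longer word ending in it, a word that is absent at resolution k stays absent at every finer resolution, so the hole persists as k grows.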
Oncogenes and rRNA genes have different origins and selective pressures, so they show different patterns
Viral sequences that affect the human genome share the human pattern; it’s like camouflage
Viral sequences that exist autonomously don’t have this pattern
Technology now exists to analyze and compare these images
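The lecture does not say which image-analysis technology is used. One simple approach, assumed here purely for illustration, is to treat each frequency image as a vector of k-mer frequencies and compute a Euclidean distance between two sequences' vectors. The function names and parameters are hypothetical.

```python
import math
from collections import Counter
from itertools import product

def kmer_profile(seq, k):
    """Relative frequencies of all 4^k words in a fixed ACGT order (a frequency CGR flattened to a vector)."""
    seq = seq.upper()
    windows = (seq[i:i + k] for i in range(len(seq) - k + 1))
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence has no complete ACGT k-mers")
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]

def profile_distance(seq_a, seq_b, k=3):
    """Euclidean distance between two k-mer profiles; 0 means identical composition."""
    a, b = kmer_profile(seq_a, k), kmer_profile(seq_b, k)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

print(profile_distance("AAAA", "CCCC", k=1))  # sqrt(2): completely different composition
```

The order in which cells are flattened does not change a Euclidean distance, so this is equivalent to comparing the two frequency images cell by cell. Real comparisons of genomic signatures would use much longer sequences and often other distance measures.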
Human mitochondrion: triangle effect where G is sparse
Structure: subsequence composition (single nucleotides, dinucleotides, trinucleotides, etc.)
Subsequence structure isn’t gene-specific: the plots reflect the genome, not the gene
If you take the same gene from a plant and a human, the patterns are very different
A sequence has the flexibility to be genic (code for a gene) but also genomic (be identifiable as part of a human or plant genome)
Species-specific: related species have similar patterns, but species can be distinguished with further analysis
Not regional (exceptions: oncogenes and rRNA genes)
Protein-coding genes reflect the genomic pattern
Lec 3 Genomic Signatures
May 12, 2018
11:55 PM