MGY299Y1 Lecture Notes - Deoxyribonuclease, Receiver Operating Characteristic, Encode

Molecular Genetics and Microbiology
Course Code: MGY299Y1
Timothy Hughes

Most of the functional DNA appears to be regulatory elements
Top-down science
K562: a common cell line used in research
RNA-seq, PET (paired-end tags), and CAGE are the most commonly used methods for transcript identification
DNase-seq is the big one for identifying chromatin accessibility
- Used for genome sequencing
- Take long pieces of DNA and sequence just their ends; the part in the middle, between the two ends, can then be inferred
- The size of the transcript can also be determined
- Take fragments and clone them into vectors that have restriction sites (type IIS enzymes: they don't cut at the
recognition site, they reach over the site and cut beside it)
- Yellow parts are constant
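To make the paired-end idea above concrete, here is a minimal sketch in Python. It assumes the two end tags have already been mapped to a reference; the `Tag` type and the functions `infer_fragment` and `fragment_length` are hypothetical names for illustration, not part of any real pipeline. Note that for a spliced transcript the span computed this way is the genomic span, which includes introns.

```python
from typing import NamedTuple


class Tag(NamedTuple):
    """A mapped end tag (hypothetical type; coordinates are 0-based, end-exclusive)."""
    chrom: str
    start: int
    end: int


def infer_fragment(tag_a: Tag, tag_b: Tag) -> Tag:
    """Infer the span of the original fragment from its two mapped end tags.

    Only the ends were sequenced; the middle is assumed to be the reference
    sequence lying between the two tags.
    """
    if tag_a.chrom != tag_b.chrom:
        raise ValueError("end tags map to different chromosomes")
    start = min(tag_a.start, tag_b.start)
    end = max(tag_a.end, tag_b.end)
    return Tag(tag_a.chrom, start, end)


def fragment_length(fragment: Tag) -> int:
    """Length of the inferred span on the reference."""
    return fragment.end - fragment.start


# Illustrative coordinates only (not real data).
left = Tag("chr1", 1000, 1020)
right = Tag("chr1", 2480, 2500)
span = infer_fragment(left, right)
print(span, fragment_length(span))
```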
CAGE= Cap Analysis of Gene Expression
- ?
- RNA-seq is more accurate for expression levels: the techniques above are useful for identifying the location of a
sequence, but not for identifying the sequence itself
- MNase (micrococcal nuclease) cuts chromatin into nucleosomal DNA fragments (it cuts the linker DNA between nucleosomes)
- DNase I can cut nucleosomal DNA but can't get into compact chromatin
- DNase I accessible = open chromatin
- DHS (DNase hypersensitive site)
- Whole genome is coated with nucleosomes
- Not against the rules to have nucleosomes in the empty space because DNase can cut nucleosomal DNA
- MNase often picks up the regions flanking the empty space: nucleosomes carrying specific modifications flank
regulatory sites and give distinct nucleosome peaks in the assay, and this is where MNase cuts
- Integration: compare tracks to each other
- Highly constrained bases should be functional
- Major highlights: GENCODE is the human gene annotation (manually curated)
- More TSSs (transcription start sites) than protein-coding genes
- More red = higher degree of code correlation. It is hard to find motifs in hypersensitive sites. TFs are just choosing
hypersensitive sites using their inherent preferences. One TF that's not following the rule is ZNF274 (C2H2 zinc
fingers recruit a repressor to repress transcription), which is totally different from all the rest. It's also not
associated with hypersensitive sites
- Downer: not many sequence-specific TFs were covered. There are about 1,500 human TFs, but they only used 87.
- C2H2 zinc fingers make up about half of human TFs
- There are 20,000 human genes, but there are 3 million different enhancers on top of these.
- A single TF binding site is not enough information because there are many binding sites in DNA. Most sequences don't
have binding sites, so if a sequence has one, it may be expressed.
- ***CTCF: insulator-binding protein
- For each cell type we know the top 200 genes
- Linear regression: ?
- Area under the ROC curve? The ROC curve displays the false and true positive rates, with each point corresponding to
a specific threshold. Random guesses will fall on the diagonal. AUC of 0.5 = random, 1 = perfect.
- 10,000 genes are lowly expressed. Low specificity
- Vertical = true positives, horizontal = false positives (e.g. one dot could be 20 true positives and 80 false positives)
- ****post-transcriptional regulation (ON THE TEST)
- RNAs are not degraded randomly or at a uniform rate
- Degradation is controlled by miRNAs or RNA-binding proteins
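Going back to the ROC bullets earlier in these notes, the following is a minimal Python sketch of how a ROC curve and its area could be computed from prediction scores and true labels. It is an illustration only, not the method used in the lecture or in ENCODE; the function names `roc_points` and `auc` and the toy scores are assumptions.

```python
def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) points.

    Each point corresponds to one threshold: everything scoring at or above
    the threshold is called positive. Tied scores are handled together.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Lower the threshold past every item sharing this score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (made up): a perfect ranking and an uninformative one.
perfect = auc(roc_points([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0]))
uninformative = auc(roc_points([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0]))
print(perfect, uninformative)
```

When every item gets the same score, the curve is just the diagonal from (0, 0) to (1, 1), which is why a random predictor has an area of 0.5.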