ReplicationDomain - Detailed Protocol


What follows is an early draft of what is now published as: Ryba T, Battaglia D, Pope BD, Hiratani I, Gilbert DM.

Genome-scale analysis of replication timing: from bench to bioinformatics.

Nat Protoc. 2011 Jun;6(6):870-95.

Genome-Scale Analysis of Replication Timing: from Bench to Bioinformatics

Tyrone Ryba, Dana Battaglia, Benjamin D. Pope, Ichiro Hiratani and David M. Gilbert*

1Department of Biological Science, Florida State University, Tallahassee, FL 32306
2Biological Macromolecules Laboratory, National Institute of Genetics, Japan
*Corresponding author
Email: gilbert@bio.fsu.edu

Running title: Genome-Scale Analysis of Replication Timing



1. Abstract

Replication timing profiles have been shown to be cell type-specific and reflect genome organization changes upon differentiation. In this work we describe details of the methodology used to analyze replication timing genome wide in mammalian cells. In this protocol, asynchronously cycling cells are pulse labeled with the nucleotide analog 5-bromo-2-deoxyuridine (BrdU) and then sorted into S-phase fractions based on DNA content using flow cytometry. BrdU-labeled DNA from each fraction is immunoprecipitated, amplified, differentially labeled, and co-hybridized to a whole-genome CGH microarray.

We also present a guide to the computational analysis of array data. Subjects include initial normalization, scaling, and data quality measures, loess (local polynomial) smoothing of replication timing values, segmentation of data into domains, and assignment of timing values to gene promoters. After these preprocessing steps, we cover various common downstream analyses, such as k-means and hierarchical clustering, as well as methods to relate changes in the replication program to gene expression and other genetic and epigenetic datasets. We recently used these methods to describe the dynamics of widespread replication timing changes in mouse and human cell culture models of development1, 2, 3.


Introduction

Although the mechanisms that specify the timing and placement of origin firing in higher eukaryotes remain a mystery, all eukaryotes have a defined replication timing program that is largely conserved between closely related species4, including human and mouse3, 5. Analyses of replication timing in various cell types have yielded insights into genome organization and repackaging events during development, suggesting an important role for the timing program itself or 3D genome organization in regulating developmental gene expression. In this article, we describe approaches for measuring replication timing genome-wide in mammalian cells. As data processing and analysis are often a bottleneck in these studies, the second half of the protocol covers several common methods for downstream analysis.


Part I: The wet side

This portion of the protocol describes how to derive raw data from mammalian cells for genome-wide replication timing analysis. Because DNA replication is a cell cycle regulated event, some form of synchronization is necessary. For multi-cellular organisms, most synchronization schemes are cumbersome and only applicable to a limited number of cell lines6, 7. Hence, in cases where multiple lines are to be compared, retroactive synchronization using a fluorescence activated cell sorter (FACS) to select cells based upon the increase in DNA content during S-phase is preferred because it can be applied to any proliferating cell population that can be dissociated into a single celled suspension. This method has been applied to Drosophila8, 9, many mouse and human cell types1, 2, 3, Arabidopsis10, and was recently used to compare replication timing across 14 yeast mutants11. The original conception of the method12, 13 labeled cells with BrdU for a fraction of S-phase and then sorted cells into several different time points during S-phase. BrdU-substituted DNA could then be isolated either based on its increased density or using anti-BrdU-antibodies and specific loci could be examined by hybridization or PCR. With microarray analysis, replication of the entire genome can be queried in a single array hybridization by limiting the analysis to two differentially labeled samples, allowing all probes to be assigned one internally normalized relative replication timing value. The two most popular permutations of the FACS method of retroactive synchronization are described. In one method, BrdU-labeled cells are sorted into early and late S-phase populations, BrdU labeled DNA is immunoprecipatated with an anti-BrdU antibody, and DNA synthesized either early or late is used as the microarray target. Since immunoprecipitation (BrdU-IP) substantially enriches for DNA synthesized in each half of S-phase, this method produces a high signal to noise ratio. However, the efficacy of the BrdU-IP can be variable and must be carefully monitored. In the second method, unlabeled cells are sorted into total S-phase vs. G1-phase populations and DNA from these stages is differentially labeled and used as the target. This obviates BrdU-IP but quantification relies entirely on the 2-fold copy number increase during S-phase. Both methods have been used and produce similar results; a direct comparison in the same cell line has been done in one study1. In both methods, DNA from each fraction is differentially labeled with Cy3 and Cy5 dyes and then cohybridized to a whole-genome oligonucleotide microarray (Nimblegen Systems). The ratio of the abundance of each probe in each fraction is then used to generate a replication-timing profile.


BrdU Incorporation and FACs sorting

The nucleotide analog 5-bromo-2-deoxyuridine (BrdU) can be used to pulse-label newly synthesized DNA during S-phase. For mammalian cell types that have 8-12 hour S phases, incubation with BrdU for two hours has been empirically determined to provide sufficient incorporation to ensure successful BrdU-IP in subsequent steps yet be short enough to identify even subtle differences in replication timing, such as between cells with one vs. two early replicating X chromosomes2. Success has also been achieved with BrdU labeling times as short as one hour, but BrdU-IP can be problematic as there is very little substituted DNA relative to background of unsubstituted DNA that will contribute to noise in the BrdU-IP. The BrdU-labeling times for cells with significantly different S-phase lengths, such as amphibian12 or fly9 cells, should be adjusted appropriately.

For first time users, it is recommended that at least 5 x 106 cells be used; however, with experience suffice sufficient S phase cells, as few as 0.5-1 x 106 starting cells can be successfully profiled. The important parameter is to obtain 20,000 - 30,000 cells in each of the early and late S-phase fractions. As described in protocol 1, ethanol-fixed cells can be stained with propidium iodide (PI) and sorted based on DNA content. Alternative fluorochromes that do not require RNAse digestion, such as chromomycin A3, can also be used with ethanol fixed cells12, 13. Some cell types tend to clump or produce a lot of cellular debris when fixed in ethanol. For these cell types, the fixation step can be skipped and DNA can be stained in permeabilized cells with DAPI, as described in protocol 2. The advantage of protocol 1 is that cells can fixed in ethanol can be stored at -20°C (empirically determined to be the optimal temperature) or shipped to collaborators. Shipping should be done on dry ice, with a partition between the tube and the dry ice to prevent cell freezing. All steps, particularly storage, should be performed in the dark since BrdU-substituted DNA is light sensitive.


REAGENTS

  • 1mg/mL BrdU (5-bromo-2-deoxyuridine)( Sigma Aldrich, B5002) (All solutions are in ddH20 unless otherwise indicated)
  • Cell culture medium appropriate for cell type
  • 1 X PBS (10 X Phosphate Buffered Saline, 137 mM NaCl , 2.7mMKCl,10mM Na2HPO4, 2mM KH2PO4, diluted with dH2O)
  • 0.2X Trypsin-EDTA (diluted with PBS from a 1X solution, Mediatech 25-053-Cl) or Accutase (Innovative Cell Technologies AT104)
  • Fetal Bovine Serum (FBS)
  • DAPI
  • Ethanol
  • 1 mg/mL Propidium Iodide(PI, Sigma P4179-100MG dissolved in ddH2O and filtered)
  • 10 mg/mL RNase A
  • SDS-PK buffer (ddH2O with 50mM Tris-HCl, pH8.0/ 10mM EDTA/ 1M NaCl/ 0.5%SDS)
  • 20mg/mL Proteinase K (dissolved in ddH2O)
  • 20mg/mL Glycogen (Fermentas RO561)
  • Tris-saturated Phenol-chloroform
  • Chloroform
  • Isopropanol
  • 70% ethanol
  • 1 X TE
  • 10X IP Buffer (ddH20 with 0.1M sodium phosphate, pH 7.0 / 1.4M NaCl / 0.5% Triton X-100)
  • 1 X IP Buffer (10 X IP buffer diluted in ddH20)
  • Anti-BrdU antibody (BD Biosciences Pharmigen, Cat.#555627; 0.5mg/ml diluted to 12.5 g/ml in PBS)
  • 10 M ammonium acetate
  • Rabbit anti-mouse IgG (Sigma, Cat#M-7023)
  • Anti-Mouse IgG-AlexaFluor488 (Invitrogen/Molecular Probes, Cat#A-11029)
  • Digestion Buffer (ddH20 with 50mM Tris-HCl, pH 8.0 / 10mM EDTA / 0.5% SDS)
  • Standard PCR and electrophoresis equipment and reagents
  • PCR primers for BrdU-IP quality verification (See steps 45-49) CRITICAL
  • 2 mg/mL glycogen
  • Anti-BrdU antibody (BD Biosciences Pharmigen, Cat.#555627, undiluted)
  • 0.1M HCl / 0.5% Triton X-100 in ddH2O
  • 0.1M Sodium Tetraborate, (Na2B4O7 10H2O), pH 8.5
  • 0.5% Tween20 / 1% BSA in PBS
  • 0.1% Triton X-100 in PBS
  • 0.5% Triton X-100 in PBS
  • GenomePlex Complete Whole Genome Amplification Kit CRITICAL
  • GenomePlex WGA Reamplification Kit CRITICAL
  • QIAquick PCR Purification Kit CRITICAL
  • NimbleGen Dual-Color DNA Labeling Kit (cat. no. 05223547001) CRITICAL
  • NimbleGen Hybridization Kit (cat. no. 05583683001) CRITICAL
  • NimbleGen Wash Buffer Kit (cat. no. 05584507001) CRITICAL

EQUIPMENT

  • Nylon mesh 37 micron (Small Parts, CMN-0040-D)
  • 5mL round bottom polystyrene tube (Falcon 352054)
  • Hemacytometer
  • Vortex Genie
  • 15mL round bottom tube (Falcon 2059)
  • FACS Aria cell sorter (or comparable sorter)
  • Sonicator
  • Appropriate NimbleGen Arrays and Mixers CRITICAL
  • Appropriate NimbleGen Hybridization System CRITICAL
  • Appropriate NimbleGen microarray scanner CRITICAL

REAGENT SETUP

  • SDS-PK buffer : To make 50mL, combine 34mL ddH2O, 2.5mL 1M Tris-HCl pH8.0, 1mL 0.5 M EDTA, 10mL 5 M NaCl and 2.5mL 10% SDS. Warm to 56°C before use to completely dissolve SDS.
  • 10X IP Buffer: To make 50mL, combine 28.5mL ddH2O, 5mL 1M Sodium Phosphate pH 7.0, 14 mL 5M NaCl, and 2.5mL 10% Triton X-100.
  • Digestion Buffer: To make 50mL, combine 44mL ddH2O, 2.5mL 1M Tris-HCl pH8.0, 1mL 0.5M EDTA, 2.5mL 10% SDS.

PROCEDURE

PROTOCOL 1. BrdU labeling and staining of cells for FACS

This protocol can be performed using PI staining and ethanol fixation prior to sorting (option A), or, for cells that break or clump in ethanol, by collecting cells following pulse labeling with BrdU and treating with a DAPI staining solution containing Triton X-100 to permeabilize cell membranes (option B). The drawback of option B is that cells need to be sorted immediately following staining.

(A) Labeling and PI staining of cells for FACs following ethanol fixation. TIMING 3.5 hours

  1. For adherent cells, remove cell culture medium and replenish with fresh medium. Add BrdU at a final concentration of 50M. For suspension cells, add BrdU to the cell culture medium at a final concentration of 50M.
  2. Incubate cells for two hours in a carbon dioxide incubator at 37°C, 5% CO2.
  3. For adherent cells, rinse gently with ice-cold PBS twice. For suspension cells, collect cells in a 15mL tube and proceed directly to step 6.
  4. Detach adherent cells using 0.2X Trypsin-EDTA for 2-3 minutes or Accutase for 3-6 minutes. CRITICAL STEP Incubate cells at 37°C with the enzyme treatment and/or use gentle trituration if necessary to achieve a single cell suspension, as this is necessary for accurate FACS sorting.
  5. Add 5mL of cell culture medium (containing FBS if trypsin has been used) to the cell culture dish or flask, pipette gently, and transfer contents to a 15mL round bottom tube.
  6. Count the number of cells collected using a hemacytometer.
  7. Centrifuge at approximately 200 g for 5 minutes.
  8. Aspirate supernatant carefully and resuspend cells in 2.5 mL of ice-cold PBS containing 1% FBS.
  9. Add 7.5 mL of ice-cold 100% ethanol dropwise while gently vortexing. CRITICAL STEP Note that gentle vortexing is required.
  10. Seal the cap of the 15 mL tube with parafilm and mix gently but thoroughly.
  11. PAUSE POINT Proceed to the next step to begin PI staining or stop here and store tubes in the dark at -20°C. indefinitely.
  12. Resuspend the BrdU-labeled, ethanol fixed cells by tapping and inverting the tube.
  13. Transfer 4 x 106 - 8 x 106 cells to a 5 mL polystyrene round bottom tube.
  14. Centrifuge at approximately 200 g for 5 minutes.
  15. Decant supernatant carefully
  16. Add 2 mL of PBS with 1% FBS and mix well by tapping the tube.
  17. Centrifuge at approximately 200 g for 5 minutes.
  18. Decant supernatant carefully.
  19. Resuspend cell pellet in PBS with 1% FBS to achieve a solution of 3 x 106 cells/mL.
  20. Add 1mg/mL propidium iodide to a final concentration of 50 g/ mL.
  21. Add 10 mg/mL RNase A to a final concentration of 250 g/mL.
  22. Tap the tube to mix and incubate for 20 to 30 minutes at room temperature in the dark.
  23. Filter cells by pipeting them through 37 micron nylon mesh into a 5 mL polystyrene round bottom tube.
  24. Keep samples on ice in the dark and proceed directly to FACS sorting.

(B) BrdU labeling and DAPI staining of cells for FACS. TIMING 3 hours

  1. Remove cell culture medium from growing cells and replace with cell culture medium containing BrdU at a final concentration of 50uM.
  2. Incubate cells for two hours in a carbon dioxide incubator at 37°C, 5% CO2.
  3. Rinse adherent cells gently with ice-cold PBS twice.
  4. Detach adherent cells using 0.2X Trypsin-EDTA for 2-3 minutes or Accutase for 3-6 minutes. CRITICAL STEP Incubate cells at 37 °C with the enzyme treatment and/or use gentle trituration if necessary to achieve a single cell suspension, as this is necessary for accurate FACS sorting.
  5. Add 5mL of cell culture medium (containing FBS if trypsin has been used) to the cell culture dish or flask, pipette gently, and transfer contents to a 15mL tube. Count the number of cells collected using a hemacytometer.
  6. Centrifuge at approximately 200 g for 5 minutes.
  7. Aspirate the supernatant carefully.
  8. Add 5mL of ice-cold PBS and pipette gently but thoroughly.
  9. Centrifuge at approximately 200 g for 5 minutes.
  10. Aspirate supernatant carefully.
  11. Resuspend the cell pellet in DAPI staining solution (1 X PBS with 0.1% Triton X-100 and 2g/mL DAPI) to achieve a solution of 5 x 106 10 x 106 cells/mL.
  12. Filter cells by pipeting them through 37 micron nylon mesh into a 5 mL polystyrene round bottom tube.
  13. Keep samples on ice in the dark and proceed directly to FACS sorting.

PROTOCOL 3. FACS sorting fractions of S-phase. TIMING 1 hours

This protocol describes the use of flow cytometry to sort cells based on DNA content. Electrostatic sorting is performed as described here on a BD (Becton-Dickinson) FACS Aria flow cytometer. During FACS analysis forward and side scatter analyses should be used to select an appropriate population of cells free of doublets or cell debris, both of which can hinder accurate sorting of desired populations. Lasers used in this protocol include 488 Blue to detect PI or 407 Violet to detect DAPI in cells that have been stained for DNA content. Two separate fractions of S phase, early and late, are chosen to be collected, but more can be collected if desired12, 13.

  1. Run sample on FACS Aria cell sorter (Any comparable cell sorter can be used) CRITICAL STEP It is very important to keep all samples chilled on ice or at 4°C during FACS analysis to avoid cell cycle progression in the absence of BrdU. Protect samples from light.
  2. Use forward and side scatter information to select the desired population of cells to be included in the sort.
  3. Create a histogram that plots cell count on the y-axis and DNA content (fluorochrome intensity) on the x-axis. See Fig.1
  4. Select two distinct S-phase populations to be sorted into separate fractions as indicated in Fig.1.
  5. Store cell immediately on ice in the dark until all samples have been sorted. Cells are sorted into fresh 5ml round bottom tubes and kept at 4°C during the sort.
  6. Centrifuge at 400g for 10 minutes at 4°C. Alternatively, if fewer than 150,000 cells have been collected for each fraction, proceed directly to step 8.
  7. Decant supernatant gently, only once. CRITICAL STEP The cell pellet can easily become loose. Some residual sheath fluid can be left in the tube.
  8. Add 1 mL of SDS-PK buffer containing 0.2mg/mL Proteinase K and 0.05mg/mL glycogen for every 100,000 cells collected and mix vigorously by tapping the tube.
  9. Incubate samples in a 56°C water bath for 2 hours.
  10. Mix sample thoroughly and aliquot 200l, equivalent to approximately 20,000 cells, per 1.5 mL tube.
  11. Stop here and store samples at -20°C in the dark or proceed directly to step 12. PAUSE POINT Samples can be stored for at least 6 months before use.
  12. Add 200l SDS-PK buffer with 0.05mg/mL glycogen to each aliquot and proceed directly to BrdU immunoprecipitation.
Fig.1

Fig.1 A typical cell cycle profile for a mammalian fibroblast population obtained during FACS analysis by plotting cell count vs. DNA content. In this example, cellular DNA was stained with PI, so DNA content is represented by PI intensity. A G1 peak, representing cells with 2N DNA content, and a G2/M peak, representing cells that have undergone replication and therefore possess a 4N DNA content are labeled. The area between these two peaks is representative of cells in S phase, and can be sorted into two fractions, as indicated here, to obtain early and late S phase samples.


Immunoprecipitation of BrdU labeled DNA

This protocol begins with cells that have been pulse labeled with BrdU and sorted into different cell cycle stages by FACS. Here DNA from these cells is sonicated into fragments ranging from 250bp to 2kb and then immunoprecipitated using an anti-BrdU antibody. Immunoprecipitated DNA is then purified and ready for use in further steps of replication timing analysis for a time period of up to one month. If samples have been stored at -20°C prior to beginning the immunopreciptation, thaw samples in a 56°C water bath and add 200l of SDS-PK Buffer pre-warmed to 56°C with 0.05mg/mL glycogen to each sample prior to performing the phenol-chloroform extraction in step 1.


PROTOCOL 4. BrdU-IP. TIMING 2 days

This protocol describes the use of flow cytometry to sort cells based on DNA content. Electrostatic sorting is performed as described here on a BD (Becton-Dickinson) FACS Aria flow cytometer. During FACS analysis forward and side scatter analyses should be used to select an appropriate population of cells free of doublets or cell debris, both of which can hinder accurate sorting of desired populations. Lasers used in this protocol include 488 Blue to detect PI or 407 Violet to detect DAPI in cells that have been stained for DNA content. Two separate fractions of S phase, early and late, are chosen to be collected, but more can be collected if desired12, 13.

  1. Extract once with phenol-chloroform, collecting the upper phase in a 1.5mL tube.
  2. Extract once with chloroform, collecting the upper phase in a 1.5mL tube.
  3. Add 1 volume of isopropanol and mix well.
  4. Store at -20°C for 20 minutes.
  5. Centrifuge at 16.1 g for 30 minutes at 4°C.
  6. Discard the supernatant and add 750l 70% ethanol to the pellet
  7. Centrifuge at 16.1 g for 5 minutes at 4°C, then remove all ethanol and let the pellet dry.
  8. Resuspend the pellet in 500l 1x TE. PAUSE POINT Stop here and store overnight at 4°C or proceed directly to step 9.
  9. Sonicate DNA to an average size of ~0.7-0.8 kb.
  10. Incubate sample at 95°C for 5 minutes to heat denature.
  11. Cool sample on ice for 2 minutes.
  12. Add 60l of 10X IP Buffer to a clean 1.5mL tube.
  13. Add the denatured DNA to the tube containing 60l 10X IP Buffer.
  14. Add 40l of 12.5 g/ml anti-BrdU antibody.
  15. Incubate for 20 minutes at room temperature with constant rocking. CRITICAL STEP Foil tubes to keep samples in the dark.
  16. Add 20g of rabbit anti-mouse IgG.CRITICAL STEP Foil tubes to keep samples in the dark.
  17. Incubate for 20 minutes at room temperature with constant rocking.
  18. Centrifuge at 16.1 g for 5 minutes at 4°C
  19. Remove supernatant completely. CRITICAL STEP If pellet becomes loose, then centrifuge sample again in order to completely remove the supernatant without disturbing the pellet. Several centrifugations may be necessary to completely remove supernatant.
  20. Add 750l of 1X IP Buffer that has been chilled on ice.
  21. Centrifuge at 16.1 g for 5 minutes at 4°C.
  22. CRITICAL STEP Remove supernatant completely, as in step 19.
  23. Resuspend the pellet in 200l digestion buffer with freshly added 0.25mg/mL proteinase K. PAUSE POINT Incubate samples overnight at 37°C.
  24. Add 100l of fresh digestion buffer with freshly added 0.25mg/mL proteinase K.
  25. Incubate samples for 60 minutes at 56°C.
  26. Extract once with phenol-chloroform, collecting the upper phase in a 1.5mL tube.
  27. Extract once with chloroform, collecting the upper phase in a 1.5mL tube.
  28. Add 0.625l of 20mg/mL glycogen, 100l of 10 M ammonium acetate, and 750l of 100% ethanol and mix well.
  29. Store at -20°C for 20 minutes.
  30. Centrifuge at 16.1 g for 30 minutes at 4°C.
  31. Remove supernatant, rinse pellet with 70% ethanol and dry.
  32. Resuspend the pellet in 80l 1X TE. PAUSE POINT Store DNA at 4°C for up to one month.

Quality control check of early and late S-phase DNA

Due to the sensitivity and large number of steps involved, BrdU-IP is one of the trickiest parts of the protocol. Thus to ensure quality, BrdU-IPs are screened by PCR amplification using primers specific to DNA markers of known relative replication time (i.e. early or late). Although real-time PCR can be performed, we find gel electrophoresis to be sufficient to evaluate enrichment of DNA in each IP sample. Importantly, PCR results can vary between aliquots of the same sample. Furthermore, replication timing can vary between cell types2, 3. Consistency across multiple samples from the same cell type is therefore the best way to verify quality. We typically use the primer sets listed below to screen a few IPs from both early and late S phase fractions.

Among the mouse sequences listed below, mitochondrial DNA replicates throughout the cell cycle14 and is equally represented in early and late S-phase fractions. Alpha-globin, Pou5f1 and Mmp15 are generally early replicating markers, whereas beta-globin, Zfp42, Dppa2, Ptn, Mash1, and Akt3 are generally late replicating markers. Note that some of these genes are known to switch replication timing at some point during development; for instance, Zfp42 and Dppa2 are early replicating in ESCs whereas they are late replicating in all somatic cell types examined to date. Therefore, as mentioned, consistency across multiple samples from the same cell type is usually the most reliable way to assess the quality of IP samples.

Among the human sequences listed below, mitochondrial DNA is equally represented in early and late S-phase fractions, while alpha-globin, MMP15 and BMP1 are generally early replicating markers. PTGS2, NETO1, SLITRK6, ZFP42 and DPPA2 are generally late replicating. Markers amplified, sizes and primer sequences are as follows:

Mouse test regions

Intergenic mitochondrial DNA, 346 bp Forward 5'-GACATCTGGTTCTTACTTCA-3' Reverse 5'-GTTTTTGGGGTTTGGCATTA-3' Intergenic mitochondrial DNA, 346 bp Forward 5'-GACATCTGGTTCTTACTTCA-3' Reverse 5'-GTTTTTGGGGTTTGGCATTA-3'
Intergenic mitochondrial DNA, 346 bp Forward 5'-GACATCTGGTTCTTACTTCA-3' Reverse 5'-GTTTTTGGGGTTTGGCATTA-3' Alpha-globin, 439 bp Forward 5'-AAGGGGAGCAGAGGCATCA-3' Reverse 5'-AGGGCTTGGGAGGGACTG-3'
Beta-globin, 369 bp Forward 5'-CAGTAAGCCACAGATCCTATTG-3' Reverse 5'-CCCATAGTGACTATTGACTGTG-3' Pou5f1, 194 bp Forward 5'-CCCTCCCTAAGTGCCAGTTTCT-3' Reverse 5'-GTAATCGCCCTCAGCAGTGTCT-3'
Mmp15, 360 bp Forward 5'-AACAGAAGGCCTGCCTTGAC-3' Reverse 5'-TGCATAGCACGACAGCATTG-3' Zfp42, 211 bp Forward 5'-TGAGATTAGCCCCGAGACTGAG-3' Reverse 5'-CGTCCCCTTTGTCATGTACTCC-3'
Dppa2, 199 bp Forward 5'-CCACAGGAAGACAGGAAGCAGT-3' Reverse 5'-AGCCAGACAGGAGCCCTAGAGT-3' Ptn, 230 bp Forward 5'-CTGGAATGAGTTACTGACGGGG-3' Reverse 5'-CTGGCCCCACTGTGTAATAAGC-3'
Mash1, 182 bp Forward 5'-GAAGATGAGCAAGGTGGAGACG-3' Reverse 5'-AGTAGGACGAGACCGGAGAACC-3' Akt3, 173 bp Forward 5'-GAAGTGTGGGTTGAACCTCTGG-3' Reverse 5'-GCACCCTCTCCACTGTTCTGAT-3'

Human test regions

Intergenic mitochondrial DNA, 168 bp Forward 5'-CCTAGGAATCACCTCCCATTCC-3' Reverse 5'-GTGTTTAAGGGGTTGGCTAGGG-3' Alpha-globin, 257 bp Forward 5'-GACCCTCTTCTCTGCACAGCTC-3' Reverse 5'-GCTACCGAGGCTCCAGCTTAAC-3'
Beta-globin, 241 bp Forward 5'-CCTGAGGAGAAGTCTGCCGTTA-3' Reverse 5'-GAACCTCTGGGTCCAAGGGTAG-3' MMP15, 249 bp Forward 5'-CAGGCCTCTGGTCTCTGTCATT-3' Reverse 5'-AGAGCTGAGAAACCACCACCAG-3'
BMP1, 177 bp Forward 5'-GATGAAGCCTCGACCCCTAGAT-3' Reverse 5'-ACCCGTCAGAGACGAACTTGAG-3' hPTGS2, 230 bp Forward 5'-GTTCTAGGCTGGTGTCCCATTG-3' Reverse 5'-CTTTCTGTACTGCGGGTGGAAC-3'
hNETO1, 286 bp Forward 5'-GGAGGTGGAATGCTAGGGACTT-3' Reverse 5'-GCTGAGTGTGGCCTTAAGAGGA-3' hSLITRK6, 281 bp Forward 5'-GGAGAACATGCCTCCACAGTCT-3' Reverse 5'-GTCCTGGAAGTTGAGTGGATGG-3'
hZFP42, 233 bp Forward 5'-CTTGTGGGGACACCCAGATAAG-3' Reverse 5'-AACCACCTCCAGGCAGTAGTGA-3' hDPPA2, 168bp Forward 5'-AGGTGGACAGCGAAGACAGAAC-3' Reverse 5'-GGCCATCAGCAGTGTCCTAAAC-3'

PROTOCOL 5. PCR method for quality control of BrdU-immunoprecipitation TIMING 5 hours

  1. Prepare enough PCR master-mix to screen all IP samples with each primer set. Each reaction requires:

    9.63 uL nuclease free water
    1.25 uL 10X Buffer
    0.25 uL 10mM dNTPs
    0.06 uL X U/mL Taq Polymerase
    0.31 uL F/R 20uM combined primers***
     
  2. Aliquot 11.5 uL master-mix per tube and add 1 uL IP sample.
  3. Run samples in thermocycler with an initial denaturation step of 94°C for 2 min, followed by 38 cycles of 94°C for 45 sec, 60°C for 45 sec and 72°C for 2 min, and then a final elongation step of 72°C for 5 min.
  4. Add 2.5 uL 6x loading dye to every 12.5 uL reaction and load 6 uL onto 1.5% agarose gel.
  5. Score each IP based on anticipated enrichment of amplicon DNA. CRITICAL STEP Before proceeding, verify sample quality with corresponding primer sets listed above.
  6. If several IPs of the same sample and S-phase fraction pass the screening, pool equal amounts of each IP to a final volume of 50 uL (e.g. If two IPs pass, combine 25 uL of each to pool).

*** Mitochondrial primer sets should be used at 1.0 uM concentration instead of 0.5 uM. For this master-mix add 0.63 uL F/R 20uM combined primers and 9.31 uL nuclease-free water per reaction.


Amplification methods for immunoprecipitated single-stranded DNA

Once purified by immunoprecipitation and screened for sample quality, BrdU incorporated DNA must be amplified to obtain sufficient amounts for array hybridization. If multiple samples pass PCR screening, DNA from parallel immunoprecipitations can be pooled together and used as the starting material for whole genome amplification. Otherwise, a single screened immunoprecipitation is used. Whole genome amplification (WGA) is performed using GenomePlex Complete Whole Genome Amplification Kit*** (Sigma Catalog Number WGA2) followed by GenomePlex WGA Reamplification Kit (Sigma Catalog Number WGA3). Amplified samples are then loaded onto a gel to determine size range and screened once more by PCR to ensure no bias was introduced during amplification.

***GenomePlex Single Cell Whole Genome Amplification Kit (Sigma Catalog Number WGA4) could also be used15, but in our hands WGA2 IP samples "biased" after WGA2 did not improve after WGA4. Thus, we continue to use WGA2 for consistency.


PROTOCOL 6. Whole Genome Amplification TIMING 8 hours

  1. Precipitate DNA fractions by adding 1.25 uL 2 mg/mL glycogen, 20 uL 10M ammonium acetate and 150 uL ethanol to each 50 uL IP sample (If pooling multiple samples, 50 uL total volume should still be used). Mix well, let stand at -20°C for 20 min, then centrifuge for 30 min at max speed and 4°C.
  2. Rinse pellets with 70% ethanol, air dry and resuspend in 10 uL nuclease free water.
  3. Transfer the 10uL samples to 0.2 mL PCR tubes and follow GenomePlex Complete Whole Genome Amplification Kit (Sigma Catalog Number WGA2) procedure from the Library Preparation step (i.e. skip Fragmentation)16.
  4. Purify entire WGA products using QIAquick PCR Purification Kit (Qiagen Catalog Number 28106). Elute in 30 uL nuclease free water pre-warmed to 65°C and determine concentration using Nanodrop.
  5. Dilute WGA samples to appropriate concentration (we use 1 uL DNA of 20 ng/uL) and follow GenomePlex WGA Reamplification Kit (Sigma Catalog Number WGA3) Reamplification Procedure A.
  6. Purify entire reamplified WGA products using QIAquick PCR Purification Kit as done in step 54.
  7. Screen purified final products using same PCR method used to screen BrdU-IPs as described in protocol 5.

Alternative to Steps 1-57: S/G1 Method TIMING 1 day

In this method, cells are sorted into two fractions, G1 phase and S phase, based on DNA content, and replication timing is derived from the 2-fold copy number increase for early vs. late replicating sequences in pure S-phase populations. DNA analysis using flow cytometry can be performed simply by the use of a single DNA-binding fluorescent dye such as PI or DAPI as originally described (woodfine, 2004). Although this method is adequate, simultaneous measurement of BrdU-labeled DNA by performing BrdU/PI double staining for cell cycle analysis, can discriminate G1 and early-S cells much more efficiently than by PI-only staining. In addition, some cell types, particularly those derived from differentiated stem cells or primary tissues, can produce debris that interferes with good S phase sorting and a short BrdU label described here can eliminate debris, which is not labeled with BrdU. The advantage of this method is that it eliminates the need for BrdU-IP and whole genome amplification steps (described below), which need to be carefully controlled. However, direct comparisons have shown that this method produces a lower signal to noise ratio than Protocol 11.

A much shorter BrdU pulse label is used in this protocol at lower concentration, because we are only trying to identify the cells in S phase. With longer BrdU labeling times, G2/M cells become labeled. It should be noted that we originally used the standard protocol for BrdU/PI analysis provided by Becton-Dickenson, which is fine for analysis. However, we found that the high concentration of HCl in this method sheared S phase (BrdU-substituted) DNA to very small fragments that precluded subsequent steps of the protocol. By titrating the HCl, we found that 0.1M HCl produced the optimal compromise between good S vs. G1 separation and minimal DNA shearing.

For BrdU/PI double staining, correction of spectral overlap is critical for successful experiments. Spectral overlap exists between emission spectra of PI and FITC/AlexaFluor488 (for BrdU). Without correction, the BrdU/PI plots typically look like Fig. xxA. For this correction, the adjustment of the ratio between PI and AlexaFluor488 (or FITC) gains can significantly reduce the skewing shown in Fig. xxA. Subtraction of the FITC signal from the PI signal (=compensation) may also be required. To perform these corrections, a BrdU-only control is required, prepared by staining BrdU-labeled cells without the addition of PI. A PI-only control also helps, prepared by staining non BrdU-labeled cells for BrdU and PI. [Note: BrdU-labeled specimen stained for PI only does not reflect background signals derived from the anti-BrdU antibody and thus is not as good as unlabeled cells.] This step can be time-consuming but is critical for successful sorting. We suggest that you first adjust the gains of FSC and SSC, then adjust the PI and AlexaFluor488 gains by trial and error to get the best possible BrdU/PI plot. You may be able to obtain a reasonable BrdU/PI plot without compensation, but if not, compensate by subtracting FITC signal from PI signal. The lower the percentage subtracted, the better. Fig.1

Fig.2 2D Cell Cycle Sorting for S and G1 phases.


Cells labeled with BrdU and stained as described in Protocol 7, and then analyzed on a FACS.

A. A typical non-corrected BrdU/PI plot. Note how the plot is skewed to the right due to spectral overlap.

B. A corrected BrdU/PI plot . Sorting windows for nicely separated G1, S and G2/M phases of the cell cycle are indicated.


PROTOCOL 7. S/G1 FACS Sorting TIMING 4 hours

  1. For adherent cells, remove cell culture medium from exponentially growing cells and replace with cell culture medium containing BrdU at a final concentration of 10 uM. For suspension cells, add BrdU to the cell culture medium at a final concentration of 10M. In order to obviate the amplification step prior to labeling and array hybridization, start with 6 million cells. One should also prepare a small sample of non BrdU-labeled, EtOH-fixed cells for PI-only control and set aside a small number of BrdU-labeled cells for BrdU-only control (for adjustment/compensation, see TROUBLESHOOTING).
  2. Incubate cells for 30 minutes in a carbon dioxide incubator at 37C, 5% CO2
  3. Fix as in protocol 1, steps 3-11. PAUSE POINT Cells can be stored as in protocol 1
  4. Aliquot (multiples of) < 3 x 106 cells in a 1.5mL tube(s), centrifuge for 5 min., 200 g at room temperature. Centrifugation and removal of supernatant is much easier with 1.5 mL tubes as the pellets are very loose.
  5. Aspirate supernatant completely with P200. (will be effective if spun down ~3 sec once again to discard residual supernatant same for all aspiration procedures below)
  6. Loosen pellet by brief vortexing.
  7. Add 1ml of 0.1M HCl/0.5% Triton X-100, resuspend by tapping.
  8. Incubate 15 min,room temperature in the dark.
  9. Centrifuge 5 min, 200 g at room temperature, then aspirate supernatant completely.
  10. Add 1ml of 0.1M Sodium Tetraborate and resuspend by tapping.
  11. Centrifuge 5 min, 200 g at room temperature, then aspirate supernatant completely.
  12. Add 0.15 g anti-BrdU antibody in 0.5 ml 0.5% Tween20/1% BSA/PBS and resuspend by tapping.
  13. Incubate for 30 min at room temperature in the dark.
  14. Centrifuge 5 min, 200 g at room temperature, then aspirate supernatant completely.
  15. Add 0.5 ml of 0.5% Tween20/1% BSA/PBS.
  16. Centrifuge 5 min, 200 g at room temperature, then aspirate supernatant completely.
  17. Add 1 g anti-Mouse IgG-AlexaFluor488 in 100 l 0.5% Tween20/1% BSA/PBS and resuspend by tapping (or when 1-2 x 106 cells are used, add 0.5 g anti-Mouse IgG-AlexaFluor488 in 50 l).
  18. Incubate for 30 min at room temperature in the dark.
  19. Centrifuge 5 min, 200 g at room temperature, then aspirate supernatant completely.
  20. Add 0.5 ml of 0.5% Tween20/1% BSA/PBS.
  21. Centrifuge 5 min, 200 g at room temperature, then aspirate supernatant completely.
  22. Resuspend the pellet in 1 ml of 5g/ml PI in PBS (For BrdU-only control, just add PBS).
  23. Transfer to a round bottom 5mL tube (i.e. Falcon 2054).
  24. Adjust concentration to 2 x 106 /ml by adding 5 g/ml PI in PBS
  25. Bring to Flow lab for sorting (on ice, dark)
  26. Filter with 37 m mesh filter.

Labeling and hybridization of amplified samples

In order to avoid the ambiguity inherent to generalized methods, this protocol is based specifically on NimbleGen products. During our earliest attempts to profile replication timing genome-wide, NimbleGen outperformed other platforms we tested and became our platform of choice. Others have successfully applied this method to additional platforms8, 9, including deep sequencing of BrdU-IP DNA17. Currently, mammalian replication timing data generated from microarray hybridization and deep sequencing is of equal quality1, 3, while the microarray method remains more cost-effective and the bioinformatics are considerably less demanding for the typical laboratory. Future advances reducing BrdU-labeling times and sequencing limitations may make this method more worthwhile and accessible18.

Beyond platform choice, array design is also an important consideration. The nature of your study should be a guide in selecting between the variety of available standard and custom designs. For our genome-wide studies in both mouse and human, 385K and 3X720K Comparative Genomic Hybridization (CGH) tiling arrays have sufficient probe densities, showing no disadvantage compared to high density 2.1M CGH tiling arrays1, 2, but have considerable cost and convenience advantages. Once a platform is chosen, the hybridization step is fairly straightforward. Array scanning should be carried out according to manufacturer recommendations avoiding unnecessary laser exposure. Though care should be taken to align channels with respect to signal intensity frequencies, minor differences between channels usually do not impact smoothed timing profiles after normalization.


PROTOCOL 8. Labeling and hybridizing TIMING 1-3 days

Method

  1. Differentially label reamplified early and late WGA DNA fractions with Cy3 and Cy5 dyes following the Sample Labeling Instructions for NimbleGen Dual-Color DNA Labeling Kit (cat. no. 05223547001). Briefly, 1 ug of early or late replicating DNA is labeled with either Cy5 or Cy3 random 7-mer dyes by Klenow reaction, precipitated with isopropanol, and resuspended in nuclease-free water and quantified. As instructed, combine equal quantities of labeled early and late fraction DNA (specific quantity will depend upon array design).
  2. Transfer labeled samples to hybridization buffer, denature at 95C for 5 min, prepare array(s) with appropriate mixer(s) (specific mixer will depend upon array design), load samples on array(s) and hybridize using NimbleGen Hybridization Kit (cat. no. 05583683001) at 42C for 24 to 72 hours depending on the array feature size .
  3. Following Hybridization, wash array(s) as directed using NimbleGen Wash Buffer Kit (cat. no. 05584507001).
  4. Scan array(s) and save images as .tif files using appropriate NimbleGen microarray scanner and software package. Open .tif files in NimbleScan software, overlay grid on the scanned image (i.e. grid cells correspond to probes), and generate two .pair files per experiment, which contain the signal intensity of all probes on the array for either Cy3 or Cy5, for normalization and downstream analysis. Specific instructions for these steps can be found in NimbleGen Arrays Users Guide, CGH Analysis***.
  5. If desired, arrays can be stripped for reuse using NimbleGen Array Reuse Kit 40 (contact NimbleGen customer service for ordering information). Scan array to confirm successful stripping.

***In our hybridization experiments, we used NimbleGen scanner GenePix 4000B and the accompanying NimbleGen Arrays Users Guide, CGH Analysis v5.1. Newer equipment is accompanied with a newer version of the User's Guide and is operated in a slightly different manner. In v5.1 of the User's Guide, Chapter 3 covers Sample Labeling, Chapter 4 covers Hybridization and Washing, Chapter 5 covers Two-Color Array Scanning, and step 6 of Chapter 6 gives details on the generation of .pair reports.


Part II: Normalization and computational analysis of replication timing datasets

General methods for normalizing and analyzing microarray experiments for chromatin modifications or transcription at gene promoters have been described in detail in other works1-3. In this section of the protocol we focus on the methods specifically useful for replication timing analysis using whole-genome comparative genome hybridization (CGH) microarrays22. Similar to two-color microarray designs comparing an experimental sample to a reference, the replication timing experiments we describe employ a two-channel design comparing early versus late fraction enrichment for each target. Typically, we include two dye-swap replicates per sample to address bias due to dye-specific effects, such as more rapid photobleaching of Cy5 dye than Cy3.

All of the analysis described here uses the R framework for statistical computing5-7. Through user-submitted packages that facilitate a wide variety of methods, R has become an indispensible tool for many common computational tasks. Although R has an initially steep learning curve due to its command line interface, help is available in many locations and forms, including books8-10, online manuals (http://cran.r-project.org/), and mailing lists aggregated in the R Mailing Lists Archive (http://tolstoy.newcastle.edu.au/R/). Help can also be found within R itself; other than the commands presented here, str() is often helpful for viewing the structure of variables and datasets created along the way, and the ? operator (e.g., ?data.frame() ) provides a help page for the corresponding function.

Finally, while we present methods for extracting information from microarray experiments using a particular set of commands, these are intended as a set of guidelines or starting points rather than a cookbook, and a vast array of alternatives are available at every step.


Quality control

The overall quality of a microarray experiment can be examined from many angles. For replication timing analysis on CGH arrays we focus on six qualities important for reliable results:

  1. High quality of starting material
  2. Comparable signal intensity distributions for red and green channels
  3. Unbiased signal ratios with respect to signal intensity
  4. A high overall signal-to-noise ratio of the experiment
  5. Lack of artifacts in raw and false-color microarray images
  6. High correlations between replicate experiments

Here, we highlight steps 2-6 in verifying the array data in this section, with starting material verified as outlined in Quality Control of S-phase DNA. Many of the diagnostics for array quality control are most naturally checked in the course of normalizing a dataset, and are presented here in that context. Diagnostic plots and commands that examine the properties outlined above are covered in quality control sections 1-6 below.


Normalization

Normalization the process of transforming microarray datasets to remove systematic bias from true biological signal is a complex but vital part of any microarray study. Normalization methods are tailored to the platform and design of each experiment, but even within a platform a wealth of alternatives are available. Here, we describe a flexible method for normalizing two-color NimbleGen tiling CGH arrays for replication timing analysis. Our philosophy is to minimize the number of transformations applied to the data and apply only minimally invasive global methods for removing bias and scaling datasets to allow comparisons between them. We use the R package LIMMA (linear models for microarray data), also available with a user interface through the limmaGUI package, for normalization and scaling11,12. The steps for this process are straightforward, and are outlined below. To illustrate the method, we use two biological replicate datasets of mouse L1210 lymphoblast cells, available at [address].


1.) Create RGL files from the original NimbleGen .pair files. These "Red-Green lists" contain columns for both red (Cy5) and green (Cy3) channel signal intensities. When reading large tables in R, such as .pair files, explicitly noting the number of rows and data type of each column as illustrated in lines X-Y will save a substantial amount of memory and calculation time. Occasionally, line 2 below will set the genomic position columns of large datasets as an integer type, which lacks the memory space to store large numbers. If so, set the type manually with > classes[X] = "numeric" (where X is the column number containing position information) after line 2. The "nrows" in lines 3 and 4 can be a modest overestimate; the correct number of rows will be present in the final table, but an estimate allows the system to allocate the correct amount of memory.

1| tab5rows <- read.delim("318990_4L1210LymphoblastP1_532.pair", header = TRUE, nrows = 5, skip=1)
2| classes <- sapply(tab5rows, class)
3| mLymph1Cy3 <- read.delim("318990_4L1210LymphoblastP1_532.pair", header=T, nrows=390000, comment.char = "", colClasses=classes, skip=1)
4| mLymph1Cy5 <- read.delim("318990_4L1210LymphoblastP1_635.pair", header=T, nrows=390000, comment.char = "", colClasses=classes, skip=1)
5| mLymph1 <- data.frame(S_Cy5=mLymph1Cy5[,10],S_Cy3=mLymph1Cy3[,10])
6| mLymph2 <- data.frame(S_Cy5=mLymph2Cy5[,10],S_Cy3=mLymph2Cy3[,10])
7| write.table(mLymph1, file="318990_4L1210LymphoblastP1.rgl.txt", row.names=F, quote=F, sep="\t", eol="\r\n")
8| write.table(mLymph2, file="319048_4L1210LymphoblastP1-2.rgl.txt", row.names=F, quote=F, sep="\t", eol="\r\n")


2.) Create a "targets" text file that describes the target files for normalization.

We will name this file "T.txt". Note that, to be read correctly, the file should be tab-delimited and contain only one carriage return at the end of the final line. Place this file in the same directory as the raw .pair files and .rgl files generated above.

SlideNumberNameFileNameCy3Cy5
1mLymph1318990_4L1210LymphoblastP1.rgl.txtlateearly
2mLymph2319048_4L1210LymphoblastP1-2.rgl.txtearlylate

3.) Install the LIMMA package. Install a current version of the LIMMA package according to the instructions at http://bioinf.wehi.edu.au/limma/ , or using the command line interface:

9| source("http://www.bioconductor.org/biocLite.R")
10| biocLite("limma")
11| biocLite("statmod")


4.) Perform loess and scale normalization using LIMMA. This process is more straightforward than many two-color normalization methods, since NimbleGen arrays do not have print tip groups, spot background areas, or mismatch spots that must be accounted for. Loess normalization (normalizeWithinArrays) corrects the internal dependence of red-green ratios on their intensity independently for each array, and is examined further in sections QC#1 and QC#3. Scale normalization (normalizeBetweenArrays) equalizes the distribution of timing values between multiple samples for comparisons, and is verified in section QC#2. As with ChIP-chip methods13,14, this type of scale normalization may not be appropriate for examining subsets of the genome where large unbalanced timing changes are expected (e.g., timing of the X chromosome before and after inactivation), but is ideal for whole-genome analyses.

12| library(limma)
13| setwd("D:\\RT project\\Raw datasets")
14| t = readTargets("T.txt", row.names="Name")
15| r = read.maimages(t, source="generic",columns=list(R="S_Cy5", G="S_Cy3"))
16| MA.l = normalizeWithinArrays(r, method="loess")
17| MA.q = normalizeBetweenArrays(MA.l, method="scale")

This process generates three 'MAList'-type data objects: one that stores the raw array data (r), one that stores the samples after within-array normalization (MA.l), and one that stores the same information after scale normalization between arrays (MA.q). We examine these further in the quality control sections below.


QC # 1 Distribution of spot intensities for red and green channels

The distributions of red and green spot intensities should be fairly well aligned , and have tails with high (above 211, or 2048) signal. Experiments in which signal intensity drops off more sharply will often show higher levels of noise in the final dataset.

18| plotDensities(r)
19| plotDensities(MA.l)
20| plotDensities(MA.q)

Raw After loess normalization After scale normalization


QC #2 Distribution of ratio values before and after normalization

Boxplots of the Cy5/Cy3 ratios for each experiment may be slightly different before normalization (and after within-array normalization), but 1st and 3rd quartiles (the box boundaries) of all experiments should be equal after between-array normalization.

21| boxplot(MA.l$M~col(MA.l$M), names=colnames(MA.l$M))
22| boxplot(MA.q$M~col(MA.q$M), names=colnames(MA.q$M))

After within-array normalization After between-array normalization


QC #3 Ratio vs. intensity plots

Create MA plots to examine dependence of ratios on signal intensities. Points will often be skewed to low Cy5/Cy3 ratios at low intensities due to photobleaching of Cy5, but should be corrected after within-array loess normalization in Limma. This bias is the most common artifact for NimbleGen arrays, but refer to [ref] for discussion of other common types of bias diagnosed with MA plots.

23| plotMA(r, array=1) # Raw data
24| plotMA(MA.l, array=1) # After within-array normalization

Raw data After within-array normalization


5.) Create an intermediate file containing the normalized datasets. At this stage we can write a file containing the results of normalization and remove the other data from memory. (Alternatively, "rm(list=ls())"at line X will clear all variables.) This tab-delimited text file will be further processed to sort and average the normalized datasets and check other quality control measures.

25| write.table(MA.q$M, file="Loess_mLymph_112909.txt", quote=F, row.names=F, sep="\t")
26| rm(r, MA.l, MA.q, mLymphCy3, mLymphCy5, mLymph1, mLymph2); gc(reset=T)


6.) Assign position and chromosome information to the normalized datasets. This can be accomplished using the original .pair files, which typically contain this information in columns "POSITION" and "SEQ_ID" respectively.

27| tab5rows <- read.table("Loess_mLymph_112909.txt", header = TRUE, nrows = 5)
28| classes <- sapply(tab5rows, class)
29| RT = read.table("Loess_mLymph_112909.txt", header=T, nrows=390000, comment.char = "", colClasses=classes)
30| tab5rows <- read.delim("318990_4L1210LymphoblastP1_635.pair", header = T, nrows = 5, skip=1)
31| classes <- sapply(tab5rows, class)
32| a = read.delim("318990_4L1210LymphoblastP1_635.pair", header=T, nrows=390000, comment.char = "", colClasses=classes, skip=1)
33| RT$CHR=a$SEQ_ID
34| RT$POSITION =a$POSITION

Some data formats, such as HD2 triplex arrays, contain a different format of SEQ_ID column with chromosome and chromosome endpoints combined (e.g., "chr11:1-134452384"). In these cases use the following lines to parse chromosome labels from the PROBE_ID column:

35| b = a$PROBE_ID
36| b = as.character(b)
37| x = strsplit(b, "FS")
38| y = unlist(x) # chr [1:1439379] "CHR10" "057882816" "CHR10"
39| y1 = y[c(TRUE, FALSE)] # chr [1:719690] "CHR10" "CHR10" "CHR10"
40| y2 = y[c(FALSE,TRUE)] # chr [1:719689] "057882816" "104954995"
41| y2 = as.numeric(y2) # num [1:719690] 5.79e+07 1.05e+08 7.31e+07
42| RTa= data.frame(CHR=y1,POSITION=y2, RT, stringsAsFactors=F)


7.) Plot timing values across a chromosome. This serves to verify the orientation for early/late domains, as well as the overall technical quality of the experiments.

43| RTb = subset(RT, RT$CHR == "chr1")
44| par(mar=c(3.1,4.1,1,1),mfrow=c(2,1))
45| plot(RTb[,4]~RTb$POSITION,pch=19,cex=0.2,col="grey",ylim=c(-3,3))
46| plot(RTb[,5]~RTb$POSITION,pch=19,cex=0.2,col="grey",ylim=c(-3,3))


Fig.X

Fig.X Replication timing values across chromosome 1.


Notice in Figure X that, due to the photobleaching of Cy5 diagnosed in QC#1, timing is skewed toward early values in replicate 1 and late values in replicate 2, demonstrating the practical advantages of dye-swap replicates. In lines X, also note that 1 and 2 are the column indices of the first and second replicate; additional experiments can be plotted together using higher row numbers for mfrow in line X. Using known regions of early or late replication, we can now verify that the timing values are properly oriented. If not, we reverse them by multiplying the appropriate data columns by -1.

47| RT[,c(1,2)] = RT[,c(1,2)] * -1 # Reverses early/late orientation

For dye-swap replicates, LIMMA also has more advanced functions for identifying probes affected by dye effects in topTable(), though differentially affected probes are generally distributed equally in early and late domains and contribute to noise rather than systematic error.


8.) Average replicate datasets. This is quite straightforward:

48| RT$mLymphAve = (RT[,1] + RT[,2]) / 2


QC #4 Autocorrelation function

Since nearby loci should replicate almost synchronously, the correlation of timing between neighboring probes, or autocorrelation, is a useful measure of overall data quality. The autocorrelation function (acf) describes the correlation between neighboring data points as a function of their genomic distance. High-quality datasets will have a correlation between nearest neighbor timing values of R = 0.60-0.80. This measure of signal-to-noise ratio will improve as more replicates with equivalent states are averaged. Before checking the autocorrelation, it is important to ensure that the data is sorted by chromosome and genomic position. This can be accomplished with

49| RT = RT[order(RT$CHR, RT$POSITION),]

Note that the order of mouse chromosomes by the default sorting method will be 1, 10-19, 2-9, X, then Y. This order itself is unimportant, but should be consistent across experiments to prevent errors in downstream analysis. Next, plot the autocorrelation and output the second (nearest neighbor) autocorrelation to the console:

50| acf(RT[,1],lag=1000)$acf[2] # Replicate 1: R = 0.742
51| acf(RT[,2],lag=1000)$acf[2] # Replicate 2: R = 0.665
52| acf(RT$mLymphAve, lag=1000)$acf[2] # Averaged 1 and 2: R = 0.762



QC #5 False-color array image to check for spatial artifacts

Most probes on tiling microarray designs are not (collinear), but instead randomly distributed with respect to genomic location. Therefore, spatial artifacts in the scanned images should not much affect timing values in any particular location in the genome, but may reduce the overall signal to noise ratio of the experiment if they cover a substantial portion of the array. To check for spatial artifacts, we examine the original .tiff images, as shown below.



Downstream analysis
Static properties of the timing program in a given cell type

1.) Loess smoothing

Loess is a locally weighted polynomial smoothing method that proves very useful for deriving an overall replication profile from noisier raw data. Rather than fitting a global function using all available data, loess models the fit in individual segments, similar to a moving window with a higher weight for data points near the center. The smoothness of the fitted line can be set by the bandwidth parameter a, which represents the fraction of points used for each local fit (line X). While we typically set this parameter to the equivalent of 300 kilobases for studies of human and mouse replication timing, the optimal smoothing span is expected to be different for different systems or replication fork speeds, and should be determined empirically, typically with the smallest span that still reproduces most of the features between replicate profiles.

53|chrs = levels(RT$CHR); str(chrs) # Create a list of all chromosomes
54|chrs = chrs[-22] # Remove chrY_random
55|AllLoess = NULL
56|for (chr in chrs) { # For each chromosome,
57| RTl = NULL # Create a variable to store loess-smoothed values
58| RTb = subset(RT, RT$CHR == chr) # Subset the dataset to a single chromosome
59| lspan = 300000/(max(RTb$POSITION)-min(RTb$POSITION)) # Set smoothing span
60| cat("Current chromosome: ", chr, "\n") # Output current chromosome/dataset
61| RTla = loess(RTb$X318990_4L1210LymphoblastP1.rgl ~ RTb$POSITION, span = lspan)
62| RTlb = loess(RTb$X319048_4L1210LymphoblastP1.2.rgl ~ RTb$POSITION, span = lspan)
63| RTlc = loess(RTb$mLymphAve ~ RTb$POSITION, span = lspan)
64| RTl = data.frame(CHR=RTb$CHR, POSITION=RTb$POSITION, RTla$fitted, RTlb$fitted, RTlc$fitted)
65| AllLoess = rbind(AllLoess, RTl)
66| }
67| x = as.data.frame(AllLoess)
68| names(x)[4:5] = c("x300smo_4L1210LymphoblastR1", "x300smo_4L1210LymphoblastR2")

We can now check the results of loess smoothing using the following:

69| RTc = subset(RT, CHR == "chr1")
70| LSc= subset(LS, CHR == "chr1")
71| par(mar=c(2.2,5.1,1,1),mfrow=c(3,1), col="grey", pch=19, cex=0.5, cex.lab=1.8, xaxs="i")
72| plot(RTc$X318990_4L1210LymphoblastP1.rgl~RTc$POSITION, ylab="mLymph R1", xaxt="n")
73| lines(LSc$x300smo_4L1210LymphoblastR1~LSc$POSITION, col="blue3", lwd=3)
74| plot(RTc$X319048_4L1210LymphoblastP1.2.rgl~RTc$POSITION, ylab="mLymph R2", xaxt="n")
75| lines(LSc$x300smo_4L1210LymphoblastR2~LSc$POSITION, col="blue3", lwd=3)
76| plot(RTc$mLymphAve~RTc$POSITION, xlab="Coordinate (bp)", ylab="mLymph ave")
77| lines(LSc$x300smo_4L1210LymphoblastAve~LSc$POSITION, col="blue3", lwd=3)


Fig.X2

Fig.X2 Raw (gray) and loess-smoothed (blue) values along chromosome 1.


QC #6 Correlations between replicate experiments

Once loess smoothing is completed and the technical quality of the array data is established, we can compare biological replicate experiments to establish the relative level of biological similarity between samples. To isolate biological rather than array quality differences, we typically use loess-smoothed averaged-replicate data, rather than individual, raw or normalized data, for these correlations:

78| cor(x[,c(3:5)]

  Rep1 Rep2 Ave
Lymphoblast Rep1 1.000 0.978 0.995
Lymphoblast Rep2 0.978 1.000 0.994
Lymphoblast Ave 0.995 0.994 1.000

The cor() function defaults to Pearson correlation but other methods are available (see ?cor in R). If missing values are present, add "na.rm=T" to remove them.



2.) Segmentation

Segmentation is a useful way to define the boundaries of replication domains and determine their average timing. Biologically, these segments appear to correspond to domains of coordinately regulated, synchronously firing origins that may be part of replication factories. We perform segmentation using the DNACopy algorithm designed by Venkatraman et al.33, which has been shown to perform favorably compared to available alternatives for CGH copy number analysis34, 35, 36.

79| library(DNAcopy)
80| mLymph = CNA(RT$mLymphAve, RT$CHR, RT$POSITION, data.type="logratio", sampleid = "mLymph")
81| Seg.mLymph = segment(mLymph, nperm=10000, alpha=1e-15, undo.splits="sdundo", undo.SD=1.5, verbose=2)
82| write.table(Seg.mLymph$output, row.names=F, quote=F, file="Mouse lymphoblast segments.txt", sep="\t")

The result of this process is a 'CNA' object called Seg.mLymph, which contains the raw data and segmentation breakpoints assigned by circular binary segmentation37. The number of segments assigned can be examined using str(Seg.mLymph$output), and visualized using various functions built into DNAcopy.

83| par(ask=T,mar=c(3.1,4.1,1,1)) # Set figure margins
84| plot(Seg.mLymph, plot.type="c") # Plot each chromosome separately
85| plot(Seg.mLymph, plot.type="s") # Plot overview of all chromosomes
86| plot(subset(Seg.mLymph,chromlist="chr2"), pch=19, pt.cols=c("gray","gray"), xmaploc=T, ylim=c(-3,3), main="")


Fig.X3

Fig.X3 Raw (gray) and segmented (red) replication timing data along chromosome 2.


As before, a tab-delimited text file containing segment, and average replication timing values can be created:

87| write.table(Seg.mLymph$output, row.names=F, quote=F, sep=\t)

Though the goal of separating the genome into regions of synchronous replication is valuable in theory and very useful in practice, applying segmentation brings several challenges. The first is that not all domains of replication identified by the algorithm are bonafide synchronously replicating domains; in particular, regions where replication timing transitions from early to late can create artifactual "transition segments" that arise from the slope between early and late domains. If necessary, these can be filtered out by virtue of the fact that such segments are much smaller than typical domains and have replication timing values that are intermediate between their neighboring segments.

The second segmentation challenge is the sensitivity of most algorithms to both the properties of the data and internal parameters that can significantly affect the results. For DNACopy, the average number and size of segments are affected by the signal-to-noise ratio of the dataset: high-quality data will naturally have more significant transitions in timing and be assigned more segments. To control for this, two approaches are (possible):

1) Perform segmentation with parameters adjusted to offset quality difference between datasets (For DNAcopy, adjusting "undo.SD" is the most direct method)

2) Add a small amount of Gaussian noise to higher-quality datasets to equalize their autocorrelation before performing segmentation with equivalent parameters

In general, approach #1 is more subjective unless the effect of different levels of noise (autocorrelation levels) is thoroughly tested, making approach #2 preferable when only small quality differences separate the arrays (indeed, it is usually easier to compare segmentation results from arrays with lower, but more uniform, quality). For this purpose, we adapt the "jitter" function of Chambers et al.27, substituting Gaussian noise for uniformly distributed noise.

88| addNoise = function(x, factor = 1, amount = NULL) {
89| if (length(x) == 0)
90| return(x)
91| if (!is.numeric(x))
92| stop("'x' must be numeric")
93| z <- diff(r <- range(x[is.finite(x)]))
94| if (z == 0)
95| z <- abs(r[1])
96| if (z == 0)
97| z <- 1
98| if (is.null(amount)) {
99| d <- diff(xx <- unique(sort.int(round(x, 3 -
100| floor(log10(z))))))
101| d <- if (length(d))
102| min(d)
103| else if (xx != 0)
104| xx/10
105| else z/10
106| amount <- factor/5 * abs(d)
107| }
108| else if (amount == 0)
109| amount <- factor * (z/50)
110| x + stats::rnorm(length(x), 0, amount)
111| }
112| RT$mLymphR1noise = addNoise(as.numeric(RT$mLymphR1), factor=300, amount=.848)
113| RT$mLymphR2noise = addNoise(as.numeric(RT$mLymphR2), factor=300, amount=.968)
114|acf(RT$mLymphR1noise)$acf[2] #0.8274651
115|acf(RT$mLymphR2noise)$acf[2] #0.8274654

3.) Domain size analysis (boxplots of sizes)

After segmentation, the sizes of replication domains can be extracted from the segmented dataset to examine average sizes examined together or separately for domains with early or late timing.

116| LymphR1 = Seg.mLymphR1$output; LymphR2 = Seg.mLymphR2$output
117| LymphR1$size = LymphR1$loc.end - LymphR1$loc.start
118| LymphR2$size = LymphR2$loc.end - LymphR2 $loc.start
119| LymphR1Early = subset(LymphR1, LymphR1$seg.mean > 0)
120| LymphR2Late = subset(LymphR2, LymphR2$seg.mean < 0)
121| boxplot(LymphR1Early$size, LymphR2Early$size, LymphR1Late$size, LymphR2Late$size)

Dynamic changes in the timing program

To examine changes in the replication program during differentiation, several useful methods leverage the segmentation and loess smoothing methods introduced above. Alternatives, such as support vector machines and hidden Markov models, are reviewed by Diaz-Uriarte et al.38. Since no single method is sufficient to fully describe the type, degree, and distribution of timing changes during development, we cover several complementary ways to measure these properties and explore the relationships between cell types. These include:

1) The amount of the genome changing replication timing (percent changes analysis)

2) The degree and relationships of RT changes between cell types (clustering approaches)

3) The properties of domains that change timing upon differentiation (switching domain analysis)

1.) Percent changes analysis: The amount of the genome with differential timing between two or more cell types can be analyzed using an arbitrary, percentile, or significance-based cutoff for RT changes. Since the significance approach is affected by quality differences between datasets, and percentile cutoffs necessarily include an arbitrary fraction of the genome, we recommend scaling datasets to equivalent ranges and applying an empirical cutoff for biologically measurable timing changes to quantify these genome-wide. As most methods for quantifying timing changes are sensitive to scale differences, datasets should be scaled and normalized together in LIMMA as shown for mouse lymphoblast replicates 1 and 2.

122| RTd1 = RT$mLymphR1 - RT$mLymphR2
123| mLength = length(RTd1) # Total number of probes
124| s = 0.67 # Cutoff for significant changes
125| sum(abs(RTd1)>s)/mLength # Percentage changing, R1 vs. R2
126| sum(RTd1 < -s)/mLength # EtoL subset: 1.6%
127| sum(RTd1 > s)/mLength # LtoE subset: 1.3%

2.) Clustering approaches. Cluster analysis is a traditional staple for microarrays, particularly for analysis of differentially expressed genes. For replication profiles, our two most common methods are hierarchical clustering, which agglomerates experiments bottom-up from the most similar pairs of replication profiles to gradually more inclusive clusters, and k-means clustering, which partitions a set of timing profiles into a pre-set k number of similar patterns. For k-means clustering, we have used the programs Cluster39 and TreeView (http://rana.lbl.gov/EisenSoftware.htm) and refer readers to the corresponding guides. For hierarchical clustering, to compute clusters based on the stability of connections between cell types, we use the "pvclust" package in R40, which performs hierarchical clustering with p-values ascribed to each cluster. Since many clustering algorithms are computationally intensive, and to improve the precision of individual RT measurements, we typically average timing datasets into windows of approximately 200kb prior to clustering:

128| mLymph.R1 = NULL; mLymph.R2 = NULL
129| for (x in 1:10994) {
130| z1 = x * 35 # For 385k arrays,
131| z2 = (x+1) * 35 # 5.8kb median spacing * 35 = 203kb
132| mLymph.R1[x] = mean(RT$mLymphR1[z1:z2])
133| mLymph.R2[x]= mean(RT$mLymphR2[z1:z2])
134| cat("Current window: ", x, "\n")
135| }
136| RT = data.frame(mLymph.R1, mLymph.R2)
137| library(pvclust)
138| cluster.bootstrap <- pvclust(RT, nboot=1000, method.dist="abscor")
139| plot(cluster.bootstrap)
140| pvrect(cluster.bootstrap)

There are several reasons to take care when interpreting the results of hierarchical clustering: 1) a wide variety of topologies are possible for a single dendrogram, since any node can be flipped without changing the connections between clusters, 2) agglomerative clusters can change substantially when new experiments are added, and 3) the exact connections produced (though usually not the overall structure of the dendrogram) often change for different clustering algorithms or distance metrics. By default, pvclust measures distances between cell types using the common Euclidean distance metric. A wide variety of other options and other clustering approaches are also available through R; see http://cran.r-project.org/web/views/Cluster.html for an overview.

3.) Properties of RT switching domains. It is often useful to define the boundaries of domains that switch to earlier or later replication timing (switching domains) and analyze the properties of genetic and epigenetic elements within them. To compute these domains, we first subtract the normalized (not loess smoothed) values of the two experiments to be compared, then segment the resulting profile of timing changes as shown below:

141| dRT = CNA(RT$NPCave-RT$ESCave, RT$CHR, RT$POSITION, data.type="logratio", sampleid= "NPC-ESC dRT")
142| Seg.dRT = segment(dRT, nperm=10000, alpha=1e-15, undo.splits = "sdundo", undo.SD=1.5, verbose=2)
143| dRTdom = Seg.dRT$output
144| quantile(dRTdom$seg.mean, probs = c(0.05, 0.95)) # Top 5% >
145| quantile(dRTdom$seg.mean, probs = c(0.40, 0.60)) # Middle 20%
146| LtoEdom = subset(dRTdom, dRTdom$seg.mean > 1.28552)
147| EtoLdom = subset(dRTdom, dRTdom$seg.mean < -1.32328)
148| middleDom = subset(dRTdom, dRTdom$seg.mean > -0.14808 & dRTdom$seg.mean < 0.23698)

Comparison and alignment to outside datasets

While much useful information can be derived from the properties of replication domains themselves, the vast amount of publicly accessible data made available through initiatives such as ENCODE41, 42, 43 and public repositories like GEO44, 45 allow the timing program to be compared to a wide array of genome-wide or gene-centric data. However, the differences in formats between ChIP-chip, ChIP-seq, and other approaches requires some care in processing, and almost any dataset has specific quirks Here, we cover methods that are generally useful for comparing replication timing to other epigenetic datasets.

1.) Assignment of replication timing values to gene promoters

Although the main purpose of loess is to derive an overall smoothed replication timing profile, the smoothed data object produced is valuable for other tasks as well, since it can be interrogated at any set of genomic coordinates. This is especially helpful for comparing datasets from different array platforms, since it allows us to calculate timing (or other) values at a shared set of coordinates. Using this approach, we can assign timing data to NCBI's RefSeq gene promoter locations (http://www.ncbi.nlm.nih.gov/RefSeq/) as outlined below:

149| chrs = levels(RefSeq$CHR) #[DEFINE RefSeq]
150| AllSm = NULL # Store smoothed data for all chromosomes
151| ChrSm = NULL # Store smoothed data for current chromosome
152| for(chr in chrs) {
153| RTc = subset(RT, CHR == chr)
154| RSc = subset(RefSeq, CHR == chr)
155| cat("Current chromosome: ", chr, "\n")
156| lspan = 300000/(max(RTc$POSITION)-min(RTc$POSITION))
157|
158| smLym1 = loess(RT$mLymphR1 ~ RT$POSITION, span = lspan)
159| smLym2 = loess(RT$mLymphR2 ~ RT$POSITION, span = lspan)
160| smLym3 = loess(RT$mLymphAve ~ RT$POSITION, span = lspan)
161|
162| Lym1 = predict(smLym1, RSc$TSS)
163| Lym2 = predict(smLym2, RSc$TSS)
164| Lym3 = predict(smLym3, RSc$TSS)
165|
166| ChrSm = data.frame(CHR=chr,POSITION= RSc$TSS, Lym1, Lym2, Lym3)
167| AllSm = rbind(AllSm, ChrSm)
168| }
169| write.table(AllSm, Mouse lymphoblast RT at RefSeq gene positions.txt, quote=F, sep=\t)

The TSS column in the RefSeq gene table represents transcription start sites. These are equivalent to the txStart position in the original RefSeq table for genes with forward (+) orientation, and txEnd for genes in reverse (-) orientation.

2.) Assignment of histone and other epigenetic marks to gene promoters

Through the ENCODE project and other initiatives, a wide array of epigenetic datasets have been made available at GEO, the UCSC Genome Browser, and similar data repositories. Just as timing can be assigned to gene promoters as outlined above, these datasets can be used to assign values at any set of genomic loci, including gene promoters or the boundaries of replication domains. However, in all cases, special care must be paid to ensure that datasets are compared in compatible genomic builds and equivalent cell types, and that the method of quantification is consistent with the methods and goals of the studies involved.

Below is a generic method for assigning values from epigenetic mark datasets to promoter regions surrounding gene transcription start sites. Since histone marks are regulated in much smaller units than replication timing, we typically assign individual points or averages to promoters rather than a loess-smoothed value (however, the latter may be feasible with sufficiently high resolution and small span). Two files are required: One with columns describing the genomic coordinate, orientation, and chromosome of each gene (RefSeq), and another with the coordinate, chromosome, and data value for each mark (Marks). For our example, we match several modifications from a generic genomewide ChIP-seq experiment to windows +500 to -2500 bases from RefSeq gene promoters.

170| chrs = levels(Marks$CHR); AllGenes = NULL; AllHist = NULL
171| for (chr in chrs) {
172| RSc = subset(RefSeq, CHR == chr)
173| RTc = subset(Marks, CHR == chr)
174| for(m in 1:nrow(RSc)) {
175| if(RSc[m,]$Strand == "+") {
176| RTcSub = subset(RTc, (RTc$Start < RSc[m,]$txStart +500) & (RTc$Start > RSc[m,]$txStart - 2500))
177| AllHist = rbind(AllHist, apply(RTcSub, 2, max)[3:12])
178| AllGenes = rbind(AllGenes, RSc[m,]$Gene)
179| }
180| if(RSc[m,]$Strand == "-") {
181| RTcSub = subset(RTc, (RTc$Start < RSc[m,]$txEnd +2500) & (RTc$Start > RSc[m,]$txEnd - 500))
182| AllHist = rbind(AllHist, apply(RTcSub, 2, max)[3:12])
183| AllGenes = rbind(AllGenes, RSc[m,]$Gene)
184|
185| }
186| cat("Chromosome:", chr, " Gene:", m, "/", nrow(RSc), "\n")
187| }
188| }
189| OutFile = data.frame(cbind(AllGenes, AllHist), stringsAsFactors=F)
190| write.table(OutFile, file="Histone modifications at RefSeq gene positions.txt", row.names=F, quote=F, sep="\t")

Note that line X assigns the highest value within the promoter window to the gene. Other approaches to assign histone modification values to genes include averaging the number of reads within the body of genes46, individually analyzing equally spaced bins across open reading frames47, and assessing promoters with significant binding above background48. Bear in mind that the transcription start site may not be the best target for all modifications; indeed, for H3K36me3 marking transcription elongation, values at the transcription endpoint or exon 5' ends may better represent overall enrichment49. For some biological questions, additional information can be gleaned by analyzing reads separately for forward and reverse strands. In general, the methods of quantification should be informed by the biological properties of the marks to be quantified and the questions to be addressed.

3.) Integration of epigenetic mark values over replication domains (density vs. timing)

Given that the magnitude of correlations between genetic properties generally increases when measured in larger windows [ref], it is important to quantify these relationships in windows consistent with biologically regulated unit sizes. The method below can be used to compute correlations between domainwide replication timing values and the average level of epigenetic marks within timing domains segmented above.

191| Seg.RT = read.table("Lymph-1 segments.txt",header=T)
192| chrs = levels(Seg.RT$chrom)
193| MarksData = NULL # Store averaged mark data in domains
194| RTData = NULL
195| dom = 0
196| for(chr in chrs) { 197| Seg.RTb = subset(Seg.RT, Seg.RT$chrom == chr)
198| MarksB = subset(Marks, Marks$CHR == chr)
199| for (i in 1:dim(Seg.RTb)[1]) {
200| # Find subset of marks in RT domain
201| cat("Current chr:", chr, " Domain:", i, "\n")
202| MarksD = subset(MarksB, MarksB$Start > Seg.RTb[i,]$loc.start & MarksB$Start < Seg.RTb[i,]$loc.end)
203| MarksD = MarksD[,3:12]
204| MarksD[,1:10] = MarksD[,1:10] - MarksD[,1]
205| MarksData = rbind(MarksData, apply(MarksD,2, "mean"))
206| dom = dom + 1
207| }
208| }
209| cor(Seg.RT$seg.mean, data.frame(MarksData))
210| plot(Seg.RT$seg.mean, data.frame(MarksData)[1])




  1. Hiratani, I. et al. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS biology 6, e245(2008).
  2. Hiratani, I. et al. Genome-wide dynamics of replication timing revealed by in vitro models of mouse embryogenesis. Genome research 20, 155-69(2010).
  3. Ryba, T. et al. Evolutionarily conserved replication timing profiles predict long-range chromatin interactions and distinguish closely related cell types. Genome research 20, 761-70(2010).
  4. Pope, B.D., Hiratani, I. & Gilbert, D.M. Domain-wide regulation of DNA replication timing during mammalian development. Chromosome research : an international journal on the molecular, supramolecular and evolutionary aspects of chromosome biology 18, 127-36(2010).
  5. Yaffe, E. et al. Comparative analysis of DNA replication timing reveals conserved large-scale chromosomal architecture. PLoS genetics 6, e1001011(2010).
  6. Farkash-Amar, S. et al. Global organization of replication time zones of the mouse genome. Genome research 18, 1562(2008).
  7. Karnani, N. et al. Pan-S replication patterns and chromosomal domains defined by genome-tiling arrays of ENCODE genomic areas. Genome research 17, 865-76(2007).
  8. Schbeler, D. et al. Genome-wide DNA replication profile for Drosophila melanogaster: a link between transcription and replication timing. Nature genetics 32, 438-42(2002).
  9. Schwaiger, M. et al. Chromatin state marks cell-type- and gender-specific replication of the Drosophila genome. Genes & development 23, 589-601(2009).
  10. Lee, T. et al. Arabidopsis thaliana chromosome 4 replicates in two phases that correlate with chromatin state. PLoS genetics 6, e1000982(2010).
  11. Koren, A., Soifer, I. & Barkai, N. MRC1-dependent scaling of the budding yeast DNA replication timing program. Genome research 20, 781-90(2010).
  12. Gilbert, D.M. Temporal order of replication of Xenopus laevis 5S ribosomal RNA genes in somatic cells. Proceedings of the National Academy of Sciences of the United States of America 83, 2924-8(1986).
  13. Gilbert, D.M. & Cohen, S.N. Bovine papilloma virus plasmids replicate randomly in mouse fibroblasts throughout S phase of the cell cycle. Cell 50, 59-68(1987).
  14. Aladjem, M.I. et al. Replication initiation patterns in the beta-globin loci of totipotent and differentiated murine cells: evidence for multiple initiation regions. Molecular and cellular biology 22, 442-52(2002).
  15. Acevedo, L.G. et al. Genome-scale ChIP-chip analysis using 10,000 human cells. BioTechniques 43, 791-7(2007).
  16. O'Geen, H. et al. Comparison of sample preparation methods for ChIP-chip assays. BioTechniques 41, 577-80(2006).
  17. Hansen, R.S. et al. Sequencing newly replicated DNA reveals widespread plasticity in human replication timing. Proceedings of the National Academy of Sciences of the United States of America 107, 139-44(2010).
  18. Pombo, A. & Gilbert, D.M. Nucleus and gene expression: the structure and function conundrum. Current opinion in cell biology (2010).doi:10.1016/j.ceb.2010.04.010
  19. Royce, T.E. et al. Extrapolating traditional DNA microarray statistics to tiling and protein microarray technologies. Methods in enzymology 411, 282-311(2006).
  20. Bolstad, B.M. et al. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics (Oxford, England) 19, 185-93(2003).
  21. Yang, Y.H. et al. Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic acids research 30, e15(2002).
  22. Pollack, J.R. et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nature genetics 23, 41-6(1999).
  23. Gentleman, R.C. et al. Bioconductor: open software development for computational biology and bioinformatics. Genome biology 5, R80(2004).
  24. Team, R.D. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing Vienna Austria ISBN 3, 2673(R Foundation for Statistical Computing: 2008).
  25. Crawley, M.J. The R Book. 950(Wiley: 2007).
  26. Chambers, J.M. Software for Data Analysis: Programming with R. 498(Springer: 2008).
  27. Spector, P. Data Manipulation with R. 154(Springer Publishing Company, Incorporated: 2008).
  28. Wettenhall, J.M. & Smyth, G.K. limmaGUI: a graphical user interface for linear modeling of microarray data. Bioinformatics (Oxford, England) 20, 3705-6(2004).
  29. Smyth, G.K. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical applications in genetics and molecular biology 3, 3(2004).
  30. Peng, S. et al. Normalization and experimental design for ChIP-chip data. BMC bioinformatics 8, 219(2007).
  31. Gottardo, R. Modeling and analysis of ChIP-chip experiments. Methods in molecular biology (Clifton, N.J.) 567, 133-43(2009).
  32. Venkatraman, E.S. & Olshen, A.B. A faster circular binary segmentation algorithm for the analysis of array CGH data. Bioinformatics (Oxford, England) 23, 657-63(2007).
  33. Dellinger, A.E. et al. Comparative analyses of seven algorithms for copy number variant identification from single nucleotide polymorphism arrays. Nucleic acids research (2010).doi:10.1093/nar/gkq040
  34. Willenbrock, H. & Fridlyand, J. A comparison study: applying segmentation to array CGH data for downstream analyses. Bioinformatics (Oxford, England) 21, 4084-91(2005).
  35. Lai, W.R. et al. Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics (Oxford, England) 21, 3763-70(2005).
  36. Olshen, A.B. et al. Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics (Oxford, England) 5, 557-72(2004).
  37. Daz-Uriarte, R. & Rueda, O.M. ADaCGH: A parallelized web-based application and R package for the analysis of aCGH data. PloS one 2, e737(2007).
  38. Eisen, M.B. et al. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the United States of America 95, 14863-8(1998).
  39. Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics (Oxford, England) 22, 1540-2(2006).
  40. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science (New York, N.Y.) 306, 636-40(2004).
  41. Birney, E. et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799-816(2007).
  42. Rosenbloom, K.R. et al. ENCODE whole-genome data in the UCSC Genome Browser. Nucleic acids research 38, D620-5(2010).
  43. Edgar, R., Domrachev, M. & Lash, A.E. Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic acids research 30, 207-10(2002).
  44. Barrett, T. et al. NCBI GEO: archive for high-throughput functional genomic data. Nucleic acids research 37, D885-90(2009).
  45. Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823-37(2007).
  46. Salcedo-Amaya, A.M. et al. Dynamic histone H3 epigenome marking during the intraerythrocytic cycle of Plasmodium falciparum. Proceedings of the National Academy of Sciences of the United States of America 106, 9655-60(2009).
  47. Guenther, M.G. et al. A chromatin landmark and transcription initiation at most promoters in human cells. Cell 130, 77-88(2007).
  48. Hon, G., Wang, W. & Ren, B. Discovery and annotation of functional chromatin signatures in the human genome. PLoS computational biology 5, e1000566(2009).

News & Events

x

Datasets need to be updated.

Click on X (right-top) or Click Out from the popup to close the popup!