Introducing BxChIPSeq 2.0: Analyzing ChIP-Seq Results Has Never Been This Easy!
Finally, you can unleash the full potential of your ChIP-Seq data quickly and easily. With the biologist-friendly web interface powered by BxChIPSeq 2.0, you can focus on the biology without worrying about hardware, software, or algorithms. Best of all, the analysis results are simple to use and there is no steep learning curve.
What is BxChIPSeq 2.0?
You have spent a lot of time figuring out the right conditions and antibody for chromatin immunoprecipitation, and you have paid precious grant money for the sequencing run. What's next? Your ChIP-Seq data contain rich biological information, but you are only looking at the tip of the iceberg if all you have is an Excel spreadsheet listing peaks with some annotations. You can do a lot more, including:
With BxChIPSeq 2.0, all you need to do is send us your raw sequence data, and within a week you will have access to all the analyses mentioned above for your data on a secure website. You can log into your webpage anytime from anywhere, and you can share the results with your team members and collaborators by giving them access to your webpage.
This convenient service is also very affordable. The price for each ChIP-Seq analysis is only $199, about 10% of what you have already spent on running ChIP experiments and generating sequencing data. With this small investment, you can delve into deeper layers of your data and easily get 5x or 10x more biological insights from your ChIP-Seq experiments.
With the BxChIPSeq 2.0 service, you can access all the data analysis outputs from a secure webpage built from your raw ChIP-Seq data. You can display your data in the UCSC Genome Browser, view DNA motifs, and identify enriched biological functions and pathways. Click each of the tabs below to learn more about the data outputs.
Genome Browsers
DNA Binding Motif
Gene Ontology
Genomic Features
Peak Report
Review your ChIP-Seq Data in Genome Browsers
It's very useful to visualize your ChIP-Seq data in a genome browser. You can confirm binding events by examining sequence tag coverage from IP and control data, and you can review peaks in the genomic context of genes, conservation, expression, and other biological information.
Compatible with UCSC Browser
With the BxChIPSeq 2.0 service, you can open your data in the UCSC browser with a single click. The UCSC browser loads the appropriate file directly from your data webpage, so you don't need to upload a large file as a custom track. You can also share the link with your collaborators so they can view the data directly.
Review ChIP-Seq data in UCSC Browser
In this example, we are showing the sequence coverage (normalized tag counts) from the ChIP sample and the peak calls as two custom tracks. Here you can browse the data along with other useful tracks provided by UCSC, such as genes, conservation with other species, repeats, SNPs, and many more.
You can add all your data to the UCSC Genome Browser by clicking one link at a time on the webpage for your own data.
Compatible with Integrative Genomics Viewer (IGV)
Another popular genome browser is the Integrative Genomics Viewer (IGV) from the Broad Institute. Many researchers like to use IGV because it is faster for large data sets, and researchers can view raw sequence reads with it.
We provide output files (a TDF file for sequence coverage and a BED file for peaks) that can be loaded into IGV.
Review ChIP-Seq data in IGV
In this example, we are showing the sequence coverage (normalized tag counts) from two ChIP samples and the input sample, as well as the peak calls in IGV.
Identify DNA Binding Motifs for Transcription Factors
One great use of ChIP-Seq data is to perform a motif search to identify DNA sequences that might be responsible for the factor occupying a region. These DNA sequences are commonly called motifs or cis-elements. A motif search can also shed light on other factors that may work together with the transcription factor you used in ChIP, because their binding sites will also be enriched in the peaks due to co-regulation.
The BxChIPSeq 2.0 service generates motif search results and lists them on a webpage where you can easily browse and search for more information.
Motif Report
In this case, we searched for motifs from a ChIP-Seq experiment for the TCF7L2 transcription factor in the HepG2 cell line. The top motif matches the known motif for tcf3, another factor in the same family. A few other motifs also rank high, suggesting possible co-regulation by those other factors (FOXA1, HNF1A, GATA1, etc.) with TCF7L2.
Functional Implication of Genes Regulated by the Factor
For each ChIP-Seq experiment, researchers can identify many genes that are likely regulated by the factor (or histone modification) because peaks occur within the promoters of these genes. A natural next step is to see whether these genes share common themes such as biological function, cellular location, or protein domains. This kind of information can shed light on the biological role of the factor in the tissue tested.
BxChIPSeq 2.0 will search for enrichment in multiple commonly used functional categories, including Gene Ontology, KEGG Pathways, InterPro, WikiPathways, etc. The reports can be accessed from the webpage or downloaded for viewing in Excel.
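To make "enrichment" concrete, here is a minimal Python sketch of a one-sided hypergeometric test, a common way to score whether a functional category is over-represented among target genes. This is an illustration only and not necessarily the statistic used by BxChIPSeq 2.0; the function name and all numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(total_genes, genes_in_category, target_genes, targets_in_category):
    """Probability of seeing at least `targets_in_category` category members
    when `target_genes` genes are drawn at random from `total_genes`."""
    max_overlap = min(genes_in_category, target_genes)
    denominator = comb(total_genes, target_genes)
    numerator = sum(
        comb(genes_in_category, k)
        * comb(total_genes - genes_in_category, target_genes - k)
        for k in range(targets_in_category, max_overlap + 1)
    )
    return numerator / denominator


# Hypothetical numbers: 20,000 annotated genes, 200 of them in one category,
# 500 target genes from the peak list, 15 of which fall in that category.
p_value = enrichment_p_value(20000, 200, 500, 15)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value suggests that the category appears among the target genes more often than expected by chance. When many categories are tested at once, a multiple-testing correction is normally applied as well.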
Gene Ontology Report
Enriched Biological Processes (from Gene Ontology) for target genes from a ChIP-Seq experiment for TCF7L2 transcription factor in HepG2 cell line.
Protein Domain Report
Enriched protein domains (from InterPro) for target genes from a ChIP-Seq experiment for TCF7L2 transcription factor in HepG2 cell line.
Where in the Genome Do the Peaks Occur More Often?
A typical ChIP-Seq experiment will report many peaks around the genome. Depending on the transcription factor, the peaks may occur mostly at promoters, or they may be located in exons, introns, UTRs, CpG islands, or intergenic regions.
BxChIPSeq 2.0 will search for enriched genomic features and create a webpage and a text report.
Genomic Feature Report
Enriched genomic features for peaks from a ChIP-Seq experiment for TCF7L2 transcription factor in HepG2 cell line. Here the peaks tend to occur more often in gene-rich regions and at promoters, consistent with TCF7L2's role as a transcription factor.
The List of All Peaks with Detailed Annotations
Despite all the graphic reports, many researchers still need a comprehensive list of the peaks from a ChIP-Seq experiment to view in Excel. BxChIPSeq 2.0 creates a detailed peak report that contains a wealth of information to help researchers analyze the data. As a friendly reminder, it's always useful to use the list in combination with a genome browser and the other reports provided by BxChIPSeq to get the most out of your data.
Peak Report with Annotations
Here the peak annotation file is shown open in Excel. The columns can be divided into four major categories, covering peak, annotations, nearest gene and sequence tag count.
To demonstrate the capabilities of BxChIPSeq 2.0, we downloaded two sets of ChIP-Seq experiment data from the NIH SRA Database: one uses Illumina sequencing for transcription factor binding, and the other uses SOLiD sequencing for histone modification.
Illumina Sequencing For Transcription Factor Binding
TCF7L2 transcription factor in HepG2 cell line. The experiment contains two technical replicates for ChIP experiments, and one input control.
Study summary: GSE31477: ENCODE Transcription Factor Binding Sites by ChIP-Seq from Stanford/Yale/USC/Harvard (SRP007993)
Instrument model: Illumina Genome Analyzer IIx
Processing pipeline: Base Caller v
Species: Human (hg19)
Notes: ChIP-Seq experiment data for TCF7L2 transcription factor in HepG2 cell line from NIH SRA Database
SOLiD Sequencing For Histone Modification
Histone H3K4me3 modification in mouse brain. Brain tissue from a 10-week-old male BALB/c mouse was used in the study. The experiment contains a single run for the ChIP experiment and no input control.
Instrument model: ABI SOLiD System 3.0
Spot Descriptor: Forward
Species: Mouse (mm9)
Notes: ChIP-Seq experiment data for H3K4me3 modification in mouse brain from NIH SRA Database
BxChIPSeq 2.0 generates a webpage for each ChIP-Seq experiment, plus an extra page for the control sample so you can view control tracks in genome browsers as well. Let's use the TCF7L2A page as an example.
We have put notes in red to help you get started.
Example of BxChIPSeq 2.0 Output
Tutorial: Review data in UCSC Genome Browser
If you want to see the sequence tag coverage and peak calls for the TCF7L2A ChIP experiments in the UCSC Genome Browser, just click the "Review in UCSC Genome browser" links. Be patient, as it may take some time for the UCSC browser to load the files from your data page, especially the large sequence coverage file. Once the files are loaded, you will have two custom tracks, one for sequence tag coverage and the other for peaks. Now you can go to any genomic region or search for your favorite gene within the UCSC Genome Browser. You can also add other annotation tracks hosted at UCSC.
Review ChIP-Seq data in UCSC Browser
If you also want to display the input channel in the browser, go back to the home page, open the input webpage, and click the "Review in UCSC Genome browser" link under sequence coverage files. Wait for the browser to load the file, and you will have both tracks.
But wait, it looks like there are many peaks in the input channel! On careful examination, you will see that this is because UCSC auto-scaled the sequence tag coverage tracks, which makes the noise from the input channel look artificially tall in the display.
Review ChIP-Seq data in UCSC Browser with Default Settings (Input Track too high due to auto-scaling)
What you need is the same display range for the two tracks. To set it, move the mouse over the ChIP track, right-click (or Control-click on a Mac), and select Configure. A new window will pop up; enter 35 as the maximum of the vertical viewing range and click OK.
Open Configure Display Settings Window for Custom Track
Configure Display Settings for Custom Track in UCSC Browser
Next do the same for the input channel to set the vertical viewing range to 0-35. Now you have set the vertical display range to be the same for ChIP and Input tracks, and the data will be comparable between the two tracks. You can clearly see the very strong peak in ChIP track, and almost no signal from input control.
Review ChIP-Seq data in UCSC Browser with Correct Settings (Same Display Range for ChIP and Input Tracks)
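If you build your own custom track definitions, the same fixed range can be written into the track line, since UCSC track lines for coverage tracks accept the `autoScale` and `viewLimits` settings. The Python sketch below assembles such a line. It assumes the coverage file is in bigWig format, which is not stated here, and the track name and URL are hypothetical placeholders.

```python
def coverage_track_line(name, data_url, view_max=35):
    """Build a UCSC custom track line with auto-scaling turned off and a
    fixed vertical viewing range of 0 to view_max."""
    return (
        f'track type=bigWig name="{name}" bigDataUrl={data_url} '
        f"autoScale=off viewLimits=0:{view_max}"
    )


# Hypothetical track names and URLs, for illustration only.
chip_line = coverage_track_line("TCF7L2A ChIP", "https://example.org/chip_coverage.bw")
input_line = coverage_track_line("Input", "https://example.org/input_coverage.bw")
print(chip_line)
print(input_line)
```

Using the same `view_max` for the ChIP and input tracks keeps the two displays directly comparable, just like the manual Configure step.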
Now you can add more tracks or change the view, and enjoy exploring your ChIP-Seq data in UCSC genome browser.
Tutorial: Review data in IGV
Integrative Genomics Viewer (IGV) is adopted by many researchers working with next-gen sequencing data due to its speed and capability to handle large data sets. BxChIPSeq provides output files (TDF file for sequence coverage, bed file for peaks) that can be loaded into IGV.
Step 1. Install IGV. Go to the IGV website, register for a free account with your email address, then download and install IGV on your computer. Launching IGV with 750 MB of memory is enough for most users, but if your computer can handle it, launch with more memory to make it run faster.
Step 2. Download the TDF files for sequence coverage and the BED files for peaks. Save them to a folder you can easily locate. You may need to move the files from your browser's default download location to your destination folder.
Step 3. Launch IGV and select the genome build (e.g. hg19 or mm9) that matches your data. You can find the genome build information on the webpage for your ChIP-Seq data.
Step 4. Load the TDF and bed files.
Load the TDF and bed files to IGV Browser
Step 5. Adjust display range. We need to make the display range the same for all ChIP and Input tracks.
Open Set Data Range Option in IGV
In the Data Range window, enter 35 as the maximum. The sequence tag count file has been normalized to 10 million tags for all experiments, and we have found that 35 is a good starting point. However, you may want to increase this to 100-200 for very tall peaks, or decrease this number to ~10 to view weak peaks better.
Set Data Range for Sequence Tag Count Tracks
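The normalization to 10 million tags is what makes one data range meaningful across samples sequenced to different depths. As a rough sketch of the idea (the function name and numbers below are hypothetical, and this is not necessarily the exact procedure used by BxChIPSeq), raw tag counts can be rescaled like this in Python:

```python
def normalize_coverage(raw_counts, total_tags, target_tags=10_000_000):
    """Rescale raw tag counts so that every sample is expressed as if it
    had been sequenced to target_tags total tags."""
    return [count * target_tags / total_tags for count in raw_counts]


# Hypothetical samples: a ChIP run with 25 million tags and an input run
# with 5 million tags, each with raw counts at the same three positions.
chip_normalized = normalize_coverage([50, 10, 0], total_tags=25_000_000)
input_normalized = normalize_coverage([3, 2, 1], total_tags=5_000_000)
print(chip_normalized)
print(input_normalized)
```

After rescaling, a value of 35 means the same thing in every track, which is why a shared maximum works well as a starting point.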
Step 6. Enter a genomic range or a gene name in the search box in IGV. For example, for the demo ChIP-Seq experiment data for the TCF7L2 transcription factor in the HepG2 cell line, you can enter the gene name VAV3 to see three strong binding sites at or near this gene.
View ChIP-Seq Data in IGV with Same Data Range for All Tracks
With IGV, you have many options for displaying your data. Please see the IGV User Guide for more information.
Tutorial: View the Annotated Peaks File in Excel and Find All Target Genes
This tutorial can also be applied to other text files (e.g. gene ontology output files) from the BxChIPSeq output.
To learn more about the columns for the annotated peak file, see the Peak Report for details.
Step 1. Save the text file to your local drive. You can do this by right-clicking (or Control-clicking on a Mac) the Annotated Peak File link and choosing "save link as" or "save target as". Remember where you saved the file.
Step 2. Start Excel. From the Open menu, go to the folder where you saved the peak file, choose all files (*.*) as the file type, and open the text file (e.g. TCF7L2A_peaks.annotated.txt).
Import Peak Report Text File to Excel
Step 3. Turn on Auto filter in Excel.
Now that you have the annotated peak file open in Excel, let's make it easier to use. Turn on auto filter to access many useful features quickly.
To turn on autofilter, select all the cells, and hit Ctrl-Shift-L.
Turn on AutoFilter in Excel
Step 4. Find all target genes whose promoters are occupied by the factor
With auto filter, you can do a lot of sorting and filtering in Excel. Here we will show you how to quickly identify all the target genes for a factor.
In the peak report file, if a peak falls near a promoter of a gene, it is listed in the annotation field. To filter for peaks that fall into promoters, do the following for column H.
Create Custom AutoFilter to Select Promoters
Enter "promoter" in the filter for annotations, and Excel will display only the genes whose promoters are occupied by the factor.
All Genes with Peaks at Promoters
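If you prefer a script to Excel, the same promoter filter can be sketched in a few lines of Python. The sketch assumes the annotated peak file is tab-delimited with a header row, and the column names "Annotation" and "Gene Name" are assumptions; check the header of your own file and adjust them. The sample rows are made up for illustration.

```python
import csv
import io


def promoter_target_genes(lines, annotation_col="Annotation", gene_col="Gene Name"):
    """Return the unique gene names, in file order, for peaks whose
    annotation mentions a promoter."""
    reader = csv.DictReader(lines, delimiter="\t")
    genes = []
    for row in reader:
        if "promoter" in row[annotation_col].lower():
            gene = row[gene_col]
            if gene and gene not in genes:
                genes.append(gene)
    return genes


# Made-up rows standing in for an annotated peak file.
sample = io.StringIO(
    "PeakID\tAnnotation\tGene Name\n"
    "peak1\tpromoter-TSS (NM_000001)\tGENE_A\n"
    "peak2\tintron (NM_000002, intron 1 of 5)\tGENE_B\n"
    "peak3\tpromoter-TSS (NM_000003)\tGENE_C\n"
)
target_genes = promoter_target_genes(sample)
print(target_genes)
```

To run it on a real report, open the downloaded file (e.g. TCF7L2A_peaks.annotated.txt) with `open(path, newline="")` and pass the file object in place of `sample`.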
Demo Data
In order to access the demo data, please register for a free account.
A. It's very simple. After we receive your order, we will give you instructions for sending your raw sequence data to us by FTP. Once we receive the data, we will build a secure website containing all analysis results from your data within a week. You will receive a link and password to view your data.
Q. What data format do you need for the raw data?
A. Typically, researchers send us the raw FASTQ files. We can also take Sequence Read Archive format (.sra) files, or aligned SAM files from programs such as Bowtie.
Q. How long do you host my data on the website?
A. One year. After one year, you can pay a nominal fee to keep the data on the website, which is convenient if you often use the link to load custom tracks into the UCSC Genome Browser.
Q. Can I download all the data from the website?
A. Absolutely. You can download all the data to your local drive. The advantage of the website is that you can access it from any computer, anywhere with an internet connection.
Q. Can I compare between two ChIP experiments? I have drug treated and untreated ChIP samples, and I want to see what peaks are induced by the drug.
A. Yes, you can simply create a new analysis in the Sample Submission Form, with the drug-treated ChIP data as the ChIP run and the untreated ChIP data as the control run.
Q. Is the website secure? Can I share the data with my collaborators?
A. Only you have the URL and password to your data webpage. You can share the user name and password with your team members or collaborators. If you want the data deleted from the website, please inform us after you have downloaded a local copy.
Q. Where can I learn more about the technology and methods behind the BxChIPSeq 2.0 service?
A. To learn more about some of the tools and technologies related to BxChIPSeq 2.0, please check these references.
1. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. Epub 2009 Mar 4. PubMed PMID: 19261174; PubMed Central PMCID: PMC2690996.
Bowtie is a fast tool to align sequence reads to the genome.
2. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006. PubMed PMID: 12045153; PubMed Central PMCID: PMC186604.
3. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010 May 28;38(4):576-89. PubMed PMID: 20513432; PubMed Central PMCID: PMC2898526.
There are many tools for peak finding in ChIP-Seq data. Homer stands out because it provides one of the most comprehensive feature sets and detailed annotations of peaks and genes.
Integrative Genomics Viewer (IGV) is adopted by many researchers working with next-gen sequencing data due to its speed and capability to handle large data sets.