Process next-generation sequencing data online

Finally, you can unleash the full potential of your ChIP-Seq data quickly and easily. With BxChIPSeq 2.0, you can focus on the biology without worrying about hardware, software, or algorithms. Best of all, the analysis results are easy to understand, and there is no steep learning curve.

Introducing BxChIPSeq 2.0

Analyzing ChIP-Seq Results Has Never Been This Easy!

What is BxChIPSeq 2.0?

BxChIPSeq 2.0 helps you to extract the rich biological information in your sequencing run. You can expand your spreadsheet to include much more:

  • Confirm binding events by examining sequence tag coverage from IP and control.
  • Put peaks in context of genes, conservation, expression, published ChIP-Seq data sets, SNP markers and many more by reviewing your data in a genome browser.
  • Identify DNA binding motifs for the factor or histone modification you are investigating.
  • Figure out whether the genes regulated by the factor are enriched for specific functions, biological pathways, or protein domains.
  • Know where peaks tend to occur in the genome by identifying enriched genomic features like promoters, exons, introns, or intergenic regions.

With BxChIPSeq 2.0, all you need to do is send us your raw sequence data, and within a week you will have access to all the analyses mentioned above for your data on a secure website. You can log into your webpage anytime from anywhere, and you can share the results with your team members and collaborators by giving them access to your webpage.

Simple Pricing: Each ChIP-Seq analysis is only $199.00, about 10% of what you have already spent on running ChIP experiments and generating sequencing data. With this small investment, you can delve into deeper layers of your data, and easily get 5x or 10x more biological insights from your ChIP-Seq experiments.

What You Get

With the BxChIPSeq 2.0 service, you can access all the data analysis outputs from a secure webpage built from your raw ChIP-Seq data. You can display your data in the UCSC Genome Browser, view DNA motifs, and identify enriched biological functions and pathways. Click each of the tabs below to learn more about the data outputs.

Review your ChIP-Seq Data in Genome Browsers

It's very useful to visualize your ChIP-Seq data in a genome browser. You can confirm binding events by examining sequence tag coverage from IP and control data, and you can review peaks in the genomic context of genes, conservation, expression, and other biological information.

Compatible with UCSC Browser

With the BxChIPSeq 2.0 service, you can go to the UCSC browser to view your data with a single click. The UCSC browser will load the appropriate file directly from your data webpage, so you don't need to upload a large file as a custom track. You can also share the link with your collaborators so they can view the data directly.
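
As a rough illustration of how such a one-click link can work, the UCSC Genome Browser accepts a `db` parameter for the genome build and an `hgt.customText` parameter that points to a remotely hosted track file. The Python sketch below builds a link of that form. The file address and the function name are made-up placeholders, and the links on your own data webpage may be constructed differently.

```python
from urllib.parse import urlencode

# Public entry point of the UCSC Genome Browser.
UCSC_HGTRACKS = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def ucsc_custom_track_link(genome_build: str, track_file_url: str) -> str:
    """Return a UCSC link that loads a custom track from a remote file."""
    query = urlencode({"db": genome_build, "hgt.customText": track_file_url})
    return f"{UCSC_HGTRACKS}?{query}"


# Hypothetical address, for illustration only; use the links on your data page.
example_track_url = "https://example.org/mydata/TCF7L2A.peaks.bed"
link = ucsc_custom_track_link("hg19", example_track_url)
print(link)
```

Anyone who opens the resulting link sees the same track, which is why a link like this is convenient to share with collaborators.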

Review ChIP-Seq data in UCSC Browser

In this example, we are showing the sequence coverage (normalized tag counts) from ChIP sample and the peak calls as two custom tracks. Here you can browse the data along with other useful tracks provided by UCSC, like genes, conservation with other species, repeats, SNPs, and many more.

You can add all your data to UCSC genome browser by clicking one link at a time from the webpage for your own data.

Compatible with Integrative Genomics Viewer (IGV)

Another popular genome browser is the Integrative Genomics Viewer (IGV) from the Broad Institute. Many researchers like to use IGV because it is faster for large data sets, and researchers can view raw sequence reads with it.

We provide output files (TDF file for sequence coverage, bed file for peaks) that can be loaded into IGV.

Review ChIP-Seq data in IGV

In this example, we are showing the sequence coverage (normalized tag counts) from two ChIP samples and the input sample, as well as the peak calls in IGV.

Identify DNA Binding Motifs for Transcription Factors

One great use of ChIP-Seq data is to perform a motif search to identify DNA sequences that might be responsible for the factor occupying a given region. These DNA sequences are commonly called motifs or cis-elements. A motif search can also shed light on other factors that may work together with the transcription factor you used in ChIP, because their binding sites will also be enriched in the peaks due to co-regulation.

The BxChIPSeq 2.0 service generates motif search results and lists them on a webpage where you can easily browse and search for more information.

Motif Report

In this case, we searched for motifs in a ChIP-Seq experiment for the TCF7L2 transcription factor in the HepG2 cell line. The top motif matches the known motif for tcf3, another factor in the same family. A few other motifs also rank high, suggesting possible co-regulation by those other factors (FOXA1, HNF1A, GATA1, etc.) with TCF7L2.

Functional Implication of Genes Regulated by the Factor

For each ChIP-Seq experiment, researchers can identify many genes that are likely regulated by the factor (or histone modification) because peaks occur within the promoters of these genes. A natural next step is to see whether these genes share common themes such as biological function, cellular location, or protein domains. This kind of information can shed light on the biological role of the factor in the tissue tested.

BxChIPSeq 2.0 will search for enrichment in multiple commonly used functional categories, including Gene Ontology, KEGG Pathways, InterPro, WikiPathways, etc. The reports can be accessed from the webpage or downloaded for viewing in Excel.
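
To give a feel for what "enrichment" means here, the sketch below uses a hypergeometric test, one common way to ask whether a functional category is over-represented among target genes. This page does not state which statistic BxChIPSeq 2.0 uses, so treat the choice of test, the function name, and the numbers as assumptions made purely for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population: int, category_size: int,
                           sample_size: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` category genes by chance.

    population    -- number of genes considered in total
    category_size -- genes in the population that belong to the category
    sample_size   -- number of target genes (e.g. genes with promoter peaks)
    overlap       -- target genes that belong to the category
    """
    upper = min(category_size, sample_size)
    total = comb(population, sample_size)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, sample_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 300-gene category,
# 500 target genes, 25 of which fall in the category.
p_value = hypergeom_enrichment_p(20_000, 300, 500, 25)
print(f"enrichment p-value: {p_value:.3g}")
```

A small p-value suggests the category appears among the target genes more often than random sampling would explain. When many categories are tested at once, a multiple-testing correction is normally applied as well.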

Gene Ontology Report

Enriched Biological Processes (from Gene Ontology) for target genes from a ChIP-Seq experiment for TCF7L2 transcription factor in HepG2 cell line.

Protein Domain Report

Enriched protein domains (from InterPro) for target genes from a ChIP-Seq experiment for TCF7L2 transcription factor in HepG2 cell line.

Where in the Genome Do the Peaks Occur More Often?

A typical ChIP-Seq experiment will report many peaks around the genome. Depending on the transcription factor, the peaks may occur mostly at promoters, or they may be located in exons, introns, UTRs, CpG islands, or intergenic regions.

BxChIPSeq 2.0 will search for enriched genomic features and create both a webpage and a text report.
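
One simple way to think about feature enrichment is to compare the share of peaks that fall in a feature with the share of the genome that the feature covers. The sketch below computes a log2 ratio of the two. It is an illustrative calculation only: this page does not document the exact statistic used in the report, and the function name and the counts in the example are hypothetical.

```python
from math import log2


def feature_log2_enrichment(peaks_in_feature: int, total_peaks: int,
                            feature_bp: int, genome_bp: int) -> float:
    """log2 of (observed fraction of peaks) / (fraction of genome covered).

    Positive values mean peaks land in the feature more often than its
    size alone would predict; negative values mean less often.
    """
    if peaks_in_feature == 0:
        return float("-inf")  # no peaks in the feature at all
    # Multiply before dividing to keep the ratio exact for integer inputs.
    ratio = (peaks_in_feature * genome_bp) / (total_peaks * feature_bp)
    return log2(ratio)


# Hypothetical counts: 3,000 of 10,000 peaks fall in promoters,
# and promoters cover 30 Mb of a 3,000 Mb genome.
score = feature_log2_enrichment(3_000, 10_000, 30_000_000, 3_000_000_000)
print(f"promoter log2 enrichment: {score:.2f}")
```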

Genomic Feature Report

Enriched genomic features for peaks from a ChIP-Seq experiment for TCF7L2 transcription factor in HepG2 cell line. Here the peaks tend to occur more often in gene-rich regions and promoters, consistent with TCF7L2's role as a transcription factor.

The List of All Peaks with Detailed Annotations

Despite all the graphic reports, many researchers still need a comprehensive list of the peaks from a ChIP-Seq experiment to view in Excel. BxChIPSeq 2.0 creates a detailed peak report that contains a wealth of information to help researchers analyze the data. As a friendly reminder, it's always useful to use the list in combination with a genome browser and the other reports provided by BxChIPSeq to get the most out of your data.

Peak Report with Annotations

Here the peak annotation file is shown open in Excel. The columns can be divided into four major categories, covering peak, annotations, nearest gene and sequence tag count.

Demo Data

Welcome to Demo Data Page for BxChIPSeq 2.0

To demonstrate the capabilities of BxChIPSeq 2.0, we downloaded two sets of ChIP-Seq experiment data from the NIH SRA Database: one uses Illumina sequencing for transcription factor binding, and the other uses SOLiD sequencing for histone modification.

Illumina Sequencing For Transcription Factor Binding

TCF7L2 transcription factor in HepG2 cell line. The experiment contains two technical replicates for ChIP experiments, and one input control.

  • Study summary: GSE31477: ENCODE Transcription Factor Binding Sites by ChIP-Seq from Stanford/Yale/USC/Harvard (SRP007993)
  • Instrument model: Illumina Genome Analyzer IIx
  • Processing pipeline: Base Caller v
  • Species: Human (hg19)
  • Notes: ChIP-Seq experiment data for TCF7L2 transcription factor in HepG2 cell line from NIH SRA Database

Experiment | Control | ChIPSeq Name | SRA Accession | # of Spots | # of Bases
TCF7L2A | Input | TCF7L2A | SRR340077 | 26,710,376 | 961.6M
TCF7L2B | Input | TCF7L2B | SRR340078 | 25,005,577 | 900.2M
Input | N/A | Input | SRR353506 | 28,007,793 | 896.2M

SOLiD Sequencing For Histone Modification

Histone H3K4me3 modification in mouse brain. Brain tissue from a 10-week-old male BALB/c mouse was used in the study. The experiment contains a single run for the ChIP experiment and no input control.

  • Instrument model: ABI SOLiD System 3.0
  • Spot Descriptor: Forward
  • Species: Mouse (mm9)
  • Notes: ChIP-Seq experiment data H3K4me3 modification of mouse brain from NIH SRA Database

Experiment | Control | ChIPSeq Name | SRA Accession | # of Spots | # of Bases
H3K4me3_Brain | N/A | H3K4me3_Brain | SRX119340 | 52,457,979 | 2.6G

Understanding the BxChIPSeq Report

1. Review Summary

Example of BxChIPSeq 2.0 Output

BxChIPSeq 2.0 generates a webpage for each ChIP-Seq experiment, plus an extra page for the control sample so you can view control tracks in genome browsers as well. Let's use the TCF7L2A page as an example.

We have put notes in red to help you get started.

2. Tutorial: Review data in UCSC Genome Browser

Review ChIP-Seq data in UCSC Browser

If you want to see the sequence tag coverage and peak calls for TCF7L2A ChIP experiments in UCSC genome browser, just click the "Review in UCSC Genome browser" links. Be patient, as it may take some time for UCSC browser to load the file from your data page, especially for the large sequence coverage file. Once the files are loaded, you will have two custom tracks, one for sequence tag coverage, the other for peaks. Now you can go to any genomic region, or search for your favorite gene within UCSC genome browser. You can also add other annotation tracks hosted at UCSC.

Review ChIP-Seq data in UCSC Browser with Default Settings (Input Track too high due to auto-scaling)

Now if you also want to display the input channel in the browser, go back to the home page, open the input webpage, and click the "Review in UCSC Genome browser" link under sequence coverage files. Wait for the browser to load the file, and now you have both tracks.

But wait, it looks like there are many peaks in the input channel! On closer examination, you will see that this is because UCSC auto-scaled the sequence tag coverage tracks, which makes the noise in the input channel look artificially tall in the display.

Open Configure Display Settings Window for Custom Track

What you need to do is use the same display range for the two tracks. To do this, move the mouse over the ChIP track, right-click (or Control-click on a Mac), and select Configure. In the new window that pops up, enter 35 for the maximum of the vertical viewing range and click OK.

Review ChIP-Seq data in UCSC Browser with Correct Settings (Same Display Range for ChIP and Input Tracks)

Next do the same for the input channel to set the vertical viewing range to 0-35. Now you have set the vertical display range to be the same for ChIP and Input tracks, and the data will be comparable between the two tracks. You can clearly see the very strong peak in ChIP track, and almost no signal from input control.

Now you can add more tracks or change the view, and enjoy exploring your ChIP-Seq data in UCSC genome browser.
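
If you prefer to fix the display range in the track definition rather than through the configuration window, UCSC custom track lines for coverage data accept `autoScale` and `viewLimits` settings. The sketch below assumes the coverage file is hosted in bigWig format at a hypothetical address. The function name and URLs are illustrative only and are not part of the BxChIPSeq output.

```python
def coverage_track_line(name: str, big_data_url: str, max_height: int = 35) -> str:
    """Build a UCSC custom track line with a fixed vertical viewing range."""
    return (
        f'track type=bigWig name="{name}" bigDataUrl={big_data_url} '
        f"autoScale=off viewLimits=0:{max_height}"
    )


# Hypothetical file locations; the same 0-35 range is used for both tracks
# so that ChIP and input signals are drawn on the same scale.
chip_line = coverage_track_line("TCF7L2A coverage", "https://example.org/mydata/TCF7L2A.bw")
input_line = coverage_track_line("Input coverage", "https://example.org/mydata/Input.bw")
print(chip_line)
print(input_line)
```

Turning auto-scaling off in both track lines has the same effect as entering 0-35 by hand for each track.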

3. Tutorial: Review data in IGV

Integrative Genomics Viewer (IGV) is adopted by many researchers working with next-gen sequencing data due to its speed and capability to handle large data sets. BxChIPSeq provides output files (TDF file for sequence coverage, bed file for peaks) that can be loaded into IGV.

Step 1. Install IGV. Go to the IGV website, register for a free account with your email address, and then download and install IGV on your computer. Launching IGV with 750 MB of memory will be enough for most users, but if your computer can handle it, launch with more memory to make it run faster.

Step 2. Download the TDF files for sequence coverage files and the bed files for peaks. Save them to a folder you can easily locate. You may need to move the files from the default place where your browser saves files to your destination folder.

Step 3. Launch IGV, select the appropriate genome build (e.g. hg19 or mm9) that matches your data. You can find the genome build information in the webpage for your ChIP-Seq data.

Load the TDF and bed files to IGV Browser

Step 4. Load the TDF and bed files.

Open Set Data Range Option in IGV

Step 5. Adjust display range. We need to make the display range the same for all ChIP and Input tracks.

Set Data Range for Sequence Tag Count Tracks

In the Data Range window, enter 35 as the maximum. The sequence tag count file has been normalized to 10 million tags for all experiments, and we have found that 35 is a good starting point. However, you may want to increase this to 100-200 for very tall peaks, or decrease this number to ~10 to view weak peaks better.
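
The normalization mentioned above is simple arithmetic: a raw tag count is rescaled as if the experiment contained exactly 10 million tags. The sketch below shows the calculation. It uses the number of spots from the demo data table as a stand-in for the total tag count, which is an assumption (the number of tags actually used is typically lower), and the raw count of 94 is a made-up value.

```python
NORMALIZATION_TARGET = 10_000_000  # all experiments are scaled to 10 million tags


def normalize_tag_count(raw_count: float, total_tags: int,
                        target: int = NORMALIZATION_TARGET) -> float:
    """Rescale a raw tag count to a library of `target` tags."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return raw_count * target / total_tags


# Assumption: spot counts from the demo data table stand in for total tags.
chip_total_tags = 26_710_376   # TCF7L2A
input_total_tags = 28_007_793  # Input

# Hypothetical raw count of 94 tags at one position in each sample.
chip_height = normalize_tag_count(94, chip_total_tags)
input_height = normalize_tag_count(94, input_total_tags)
print(f"ChIP: {chip_height:.1f}  Input: {input_height:.1f}")
```

Because every track is on the same 10-million-tag scale, one display range can be used for all ChIP and input tracks.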

View ChIP-Seq Data in IGV with Same Data Range for All Tracks

Step 6. Now you can enter a genomic range or a gene name in the search box in IGV. For example, for the demo ChIP-Seq experiment data for TCF7L2 transcription factor in HepG2 cell line, you can enter the gene name VAV3 to see three strong binding sites at or near this gene.

With IGV, you have many options to display your data. Please see the IGV's User Guide for more information.
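
If you repeat these steps often, IGV can also run a plain-text batch script. The sketch below writes one using the batch commands `new`, `genome`, `load`, and `goto`. The file names are hypothetical placeholders for the TDF and bed files you downloaded, and the helper function is ours, not part of IGV or BxChIPSeq.

```python
def igv_batch_script(genome_build: str, track_files: list, locus: str) -> str:
    """Return the text of an IGV batch script that loads tracks and jumps to a locus."""
    lines = ["new", f"genome {genome_build}"]
    lines += [f"load {path}" for path in track_files]
    lines.append(f"goto {locus}")
    return "\n".join(lines) + "\n"


# Hypothetical file names; replace them with the files saved in Step 2.
demo_files = ["TCF7L2A.tdf", "Input.tdf", "TCF7L2A.peaks.bed"]
script = igv_batch_script("hg19", demo_files, "VAV3")
print(script)
```

Save the printed text to a file and run it as a batch script from within IGV. The data range still needs to be set as described in Step 5.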

4. View the annotated peaks file in Excel and find all target genes

This tutorial can also be applied to other text files (e.g. gene ontology output files) from the BxChIPSeq output.

To learn more about the columns for the annotated peak file, see the Peak Report for details.

Import Peak Report Text File to Excel

Step 1. Save the text file to your local drive. You can do this by right-clicking (or Control-clicking on a Mac) the Annotated Peak File link and choosing "save link as" or "save target as". Remember where you saved the file.

Step 2. Open the saved text file in Excel.

Turn on AutoFilter in Excel

Step 3. Turn on Auto filter in Excel.

Now that you have the annotated peak file open in Excel, let's make it easier to use. Turn on auto filter to access many useful features quickly.

To turn on autofilter, select all the cells, and hit Ctrl-Shift-L.

Create Custom AutoFilter to Select Promoters

Step 4. Find all target genes whose promoters are occupied by the factor

With auto filter, you can do a lot of sorting and filtering in Excel. Here we will show you how to quickly identify all the target genes for a factor.

In the peak report file, if a peak falls near a promoter of a gene, it is listed in the annotation field. To filter for peaks that fall into promoters, do the following for column H.

All Genes with Peaks at Promoters

Enter promoter in the filter for annotations, and Excel will now only display genes whose promoters are occupied by the factor.
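
The same filter can be applied outside Excel. The Python sketch below reads a tab-delimited peak report and keeps the genes whose annotation mentions a promoter. The column names ("Annotation", "Gene Name") and the sample rows are assumptions for illustration, so check the header of your own annotated peak file before adapting it.

```python
import csv
import io

# Made-up rows that mimic a tab-delimited peak report; not real results.
SAMPLE_REPORT = (
    "PeakID\tChr\tStart\tEnd\tAnnotation\tGene Name\n"
    "peak1\tchr1\t1000\t1200\tpromoter-TSS (NM_000001)\tGENE_A\n"
    "peak2\tchr1\t5000\t5300\tintron (NM_000002, intron 2 of 5)\tGENE_B\n"
    "peak3\tchr2\t800\t950\tpromoter-TSS (NM_000003)\tGENE_C\n"
    "peak4\tchr3\t100\t300\tIntergenic\tGENE_A\n"
)


def promoter_target_genes(report_file, annotation_col="Annotation",
                          gene_col="Gene Name"):
    """Return unique gene names, in file order, for peaks annotated as promoter."""
    reader = csv.DictReader(report_file, delimiter="\t")
    genes = []
    for row in reader:
        is_promoter = "promoter" in row[annotation_col].lower()
        if is_promoter and row[gene_col] not in genes:
            genes.append(row[gene_col])
    return genes


target_genes = promoter_target_genes(io.StringIO(SAMPLE_REPORT))
print(target_genes)
```

To run it on a real report, pass an open file instead of the sample text, for example `open("peaks.txt", newline="")`, where the file name is whatever you chose in Step 1.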

Demo Data

In order to access the demo data, please sign up for a free account.

If you already have a BioInfoRx account, please sign in first.

Frequently Asked Questions

Q. How does the service work?

A. It's very simple. After we receive your order, we will give you instructions for transferring your raw sequence data to us by FTP. Once we receive the data, we will build a secure website containing all analysis results from your data within a week. You will receive a link and password to view your data.

Q. What data format do you need for the raw data?

A. Typically researchers send us the raw fastq files. We can also take the sequence read archive data format (.sra), or aligned SAM files from a program like Bowtie.

Q. How long do you host my data on the website?

A. One year. After one year, you can pay a nominal fee to keep the data on the website, which is convenient if you often use the link to load custom tracks into the UCSC genome browser.

Q. Can I download all the data from the website?

A. Absolutely. You can download all the data to your local drive. The advantage of the website is that you can access it from any computer, anywhere with an internet connection.

Q. Can I compare between two ChIP experiments? I have drug treated and untreated ChIP samples, and I want to see what peaks are induced by the drug.

A. Yes, you can simply create a new analysis in the Sample Submission Form, with the drug treated data as the ChIP run and the untreated ChIP data as the control run.

Q. Is the website secure? Can I share the data with my collaborators?

A. Only you have the URL and passwords to your data webpage. You can share the user name and password with your team members or collaborators. If you want the data deleted from the website, please let us know after you've downloaded a local copy.

Q. Where can I find more background on the technology and methods behind the BxChIPSeq 2.0 service?

A. To learn more about some of the tools and technologies related to BxChIPSeq 2.0, please check these references.

1. Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. Epub 2009 Mar 4. PubMed PMID: 19261174; PubMed Central PMCID: PMC2690996.

Bowtie is a fast tool to align sequence reads to the genome.

2. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res. 2002 Jun;12(6):996-1006. PubMed PMID: 12045153; PubMed Central PMCID: PMC186604.

The UCSC Genome Browser is one of the most widely used tools for viewing genomes.

3. Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre C, Singh H, Glass CK. Simple combinations of lineage-determining transcription factors prime cis-regulatory elements required for macrophage and B cell identities. Mol Cell. 2010 May 28;38(4):576-89. PubMed PMID: 20513432; PubMed Central PMCID: PMC2898526.

There are many tools for peak finding in ChIP-Seq data. Homer stands out because it provides one of the most comprehensive feature sets and detailed annotations of the peaks and genes.

4. Robinson JT, Thorvaldsdottir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nat Biotechnol. 2011 Jan;29(1):24-6. PubMed PMID: 21221095.

Integrative Genomics Viewer (IGV) is adopted by many researchers working with next-gen sequencing data due to its speed and capability to handle large data sets.