the Functional Annotation Tool in DAVID 2006 [33]. Enrichment for functions was calculated using the default background sets provided in DAVID. DAVID uses the Fisher exact test to measure functional enrichment in annotation categories from numerous public databases (e.g., KEGG pathways, GO terms, SP_PIR keywords). Enrichment for chromosomal locations was found using DAVID by searching only for enriched chromosomal cytobands. Genes were also clustered according to functional similarity using the Functional Annotation Clustering tool in DAVID. Many of the Additional Files showing gene annotation were modified from DAVID output.

TF Coregulators with WT1

The set of potential TFs that may coregulate genes with WT1 was selected from the pool of factors whose classifiers had a measured PPV of 0.6 or greater. For each of the remaining TFs, the hypergeometric test was used to determine whether the number of targets overlapping with WT1 was significant. Given 18660 genes in our study, 369 predicted targets for WT1 (known and new), and x targets predicted for a second TF, we ask what is the likelihood that y ≤ x genes are shared targets of the TF and WT1. The test was implemented using the Matlab statistics toolbox [214].

Positive Binding Targets

Known binding sites for human TFs were parsed from several public databases in January 2006. The databases used are ORegAnno [221], TRRD [222], Transfac [223], Ensembl [224], and the Eukaryotic Promoter Database [225]. Many binding sites were also manually curated from literature sources. Several large-scale experimental binding studies were also examined to identify binding sites [2,32,226-229]. In all cases, binding sites found outside of the sequence region studied (i.e., 2 kb upstream, 5′ UTR, introns, and 3′ UTR) were excluded.
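The hypergeometric overlap test described under TF Coregulators with WT1 can be illustrated with a short sketch. The original analysis used the Matlab statistics toolbox; the Python version below is a stand-in, not the authors' code. It assumes a one-sided upper-tail test (the probability of observing at least y shared targets by chance), which the text does not state explicitly. The function name and the example values of x and y are hypothetical.

```python
from math import comb


def hypergeom_overlap_pvalue(population, wt1_targets, tf_targets, shared):
    """Upper-tail hypergeometric p-value, P(X >= shared).

    X is the number of WT1 targets found among `tf_targets` genes drawn
    without replacement from `population` genes, of which `wt1_targets`
    are WT1 targets. The one-sided tail is an assumption of this sketch.
    """
    upper = min(wt1_targets, tf_targets)  # largest possible overlap
    if shared > upper:
        return 0.0
    total = comb(population, tf_targets)  # all possible target sets of size x
    tail = sum(
        comb(wt1_targets, k) * comb(population - wt1_targets, tf_targets - k)
        for k in range(max(shared, 0), upper + 1)
    )
    return tail / total


# 18660 genes and 369 WT1 targets come from the text;
# x = 200 and y = 10 are hypothetical values for a second TF.
p_value = hypergeom_overlap_pvalue(18660, 369, 200, 10)
print(f"P(overlap >= 10) = {p_value:.3g}")
```

Exact integer binomial coefficients are used so that the sketch needs only the standard library; for very small p-values the final division may underflow to 0.0.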
Lists of literature-curated binding sites with PubMed references and a spreadsheet of binding interactions parsed from the above databases can be downloaded in Additional File 2.

Motif Discovery

Motif discovery was performed on WT1 known targets and new predictions. Sequence data for each gene extended from 1 kb upstream to 0.5 kb downstream of the transcription start site. The sequence data were downloaded from the human promoter extraction database at Cold Spring Harbor Laboratory [230]. Motif discovery was performed with Weeder [204] and Oligo-analysis [1], available at the RSAtools website [202]. The full raw output from Weeder and

1. k-mers

This feature is similar to that used in [213] on the yeast genome, and results in a feature set very similar to the spectrum kernel described in [216-218]. The frequency of k-mer counts in intergenic regions can discriminate between genes that are bound by a TF and those that are not. The appearances of all k-mers (lengths 4, 5, and 6) are tallied in a gene's promoter region, 5′ UTR, introns, and 3′ UTR. The set of counts is assembled into the attribute vector for the gene. For each gene, the counts for 4-mers, 5-mers, and 6-mers are normalized separately to mean 0 and standard deviation 1. This is separate from the feature normalization that occurs prior to SVM training. k-mers are counted separately in each regulatory region mentioned above, and the counts are then summed. k-mer counting, which was used, in part, in datasets 1 and 3, was performed using code modified from a script kindly provided by Dr. William Stafford Noble of the University of Washington.

2. k-mer overrepresentation

This method calculates the significance of occurrences of each k-mer in a gene's