Data Preparation

Experiment Data Format

Experiment data is the data we need from the experiment that isn't specific to an analysis. This includes the locations of tested elements and associated elements, if applicable.

The experiment data file is a tab-separated list of values. We often call it "tested_elements.tsv", but the name can be anything as long as it's also in the metadata file. The columns are as follows:

chrom: Tested element chromosome
start: Tested element start location (0-indexed, half-open)
end: Tested element end location (0-indexed, half-open)
strand: Tested element strand
parent_chrom: The chromosome of an associated element that is the "parent" of the tested element
parent_start: Parent element start location (0-indexed, half-open)
parent_end: Parent element end location (0-indexed, half-open)
parent_strand: Parent element strand
facets: Facets are used for categorization and filtering when searching. In the file, they are key=value pairs separated by semicolons (;), where the key is the name of a facet and the value is one of that facet's values. For example, the facet might be "Assays" and the facet value might be "Flow-FISH CRISPR Screen", written as Assays=Flow-FISH CRISPR Screen. These are the current facets. If you want a new facet, or a new value for an existing facet, please let us know!
misc: Any miscellaneous data you'd like included with each item, in the same format as the facets. It won't be used for searching; it is only stored in the database.
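
As an illustration of the key=value format shared by the facets and misc columns, here is a minimal Python sketch. The helper names format_pairs and parse_pairs are hypothetical and not part of any existing tool. The sketch assumes that keys and values never contain a semicolon, and that keys never contain an equals sign, since no escaping rule is described here.

```python
def format_pairs(pairs):
    """Join a dict into the key=value;key=value form used by facets and misc."""
    return ";".join(f"{key}={value}" for key, value in pairs.items())


def parse_pairs(field):
    """Split a facets or misc field back into a dict; a blank field gives {}."""
    pairs = {}
    for item in field.split(";"):
        if not item.strip():
            continue  # skip empty pieces, e.g. from a blank field
        # Split on the first "=" only, so a value may itself contain "=".
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


# Hypothetical usage with the facet example from the text.
facets_field = format_pairs({"Assays": "Flow-FISH CRISPR Screen"})
```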

If there are any columns you don't need, leave them blank rather than removing them.
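
The following Python sketch shows one way to write an experiment data file that keeps every column, even when some are unused. It relies on the standard csv module; the function name write_experiment_rows and the example coordinates are hypothetical. Whether the file should begin with a header row is not stated here, so the header is an assumption and can be switched off.

```python
import csv
import io

# Column order as listed above.
EXPERIMENT_COLUMNS = [
    "chrom", "start", "end", "strand",
    "parent_chrom", "parent_start", "parent_end", "parent_strand",
    "facets", "misc",
]


def write_experiment_rows(handle, rows, header=True):
    """Write dict rows as tab-separated values; missing columns stay blank."""
    writer = csv.DictWriter(
        handle,
        fieldnames=EXPERIMENT_COLUMNS,
        delimiter="\t",
        restval="",           # unused columns are written as empty fields
        lineterminator="\n",
    )
    if header:  # assumption: the file starts with a header row
        writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical element with no parent: 0-indexed, half-open [1000, 1500).
buffer = io.StringIO()
write_experiment_rows(buffer, [{
    "chrom": "chr1", "start": 1000, "end": 1500, "strand": "+",
    "facets": "Assays=Flow-FISH CRISPR Screen",
}])
lines = buffer.getvalue().split("\n")
```

The parent_* and misc fields are still present in the output as empty strings between tabs, so every row has the same ten columns.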

Analysis Data Format

Analysis data is the data we need that is specific to a particular analysis. This includes the locations of tested elements, the effect sizes, and the p-values.

The analysis data file is a tab-separated list of values. We often call it "observations.tsv", but the name can be anything as long as it's also in the metadata file. The columns are as follows:

chrom: Tested element chromosome
start: Tested element start location (0-indexed, half-open)
end: Tested element end location (0-indexed, half-open)
strand: Tested element strand
gene_name: The name of the targeted gene (optional; not all experiments target specific genes)
gene_ensembl_id: The Ensembl ID of the targeted gene (optional; required if gene_name has a value)
raw_p_val: The p-value of the observation
adj_p_val: The adjusted (e.g., Bonferroni corrected) p-value of the observation
effect_size: The size of the observed effect
facets: See the facets explanation in the Experiment Data Format section above
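
Before submitting an analysis data file, a few of the rules above can be checked automatically. The Python sketch below is one possible check, not an official validator. The function name check_observation is hypothetical, and it assumes each row has already been read into a dict of strings keyed by column name. It also assumes that a tested element is non-empty (end greater than start) and that blank p-value fields are acceptable; neither is stated above.

```python
def check_observation(row):
    """Return a list of problems found in one analysis data row."""
    problems = []

    # 0-indexed, half-open coordinates: start >= 0 and end > start (assumed).
    start, end = int(row["start"]), int(row["end"])
    if start < 0 or end <= start:
        problems.append("start/end must satisfy 0 <= start < end")

    # gene_ensembl_id is required whenever gene_name has a value.
    if row.get("gene_name") and not row.get("gene_ensembl_id"):
        problems.append("gene_ensembl_id is required when gene_name is set")

    # A p-value is a probability, so it must lie in [0, 1].
    for column in ("raw_p_val", "adj_p_val"):
        text = row.get(column, "")
        if text == "":
            continue  # assumption: a blank p-value field is allowed
        if not 0.0 <= float(text) <= 1.0:
            problems.append(f"{column} must be between 0 and 1")

    return problems


# Hypothetical row: a gene name is given but its Ensembl ID is blank.
example = {
    "chrom": "chr1", "start": "1000", "end": "1500", "strand": "+",
    "gene_name": "GENE1", "gene_ensembl_id": "",
    "raw_p_val": "0.001", "adj_p_val": "0.02",
    "effect_size": "-1.5", "facets": "",
}
example_problems = check_observation(example)
```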