Data Preparation

Experiment Data Format

Experiment data is the data we need from the experiment that isn't specific to an analysis. This includes the locations of tested elements and associated elements, if applicable.

The experiment data file is a tab-separated list of values. We often call it "tested_elements.tsv", but the name can be anything as long as it's also in the metadata file. The columns are as follows:

chrom
Tested element chromosome
start
Tested element start location (0-indexed, half-open)
end
Tested element end location (0-indexed, half-open)
strand
Tested element strand
parent_chrom
The chromosome of an associated element that is the "parent" of the tested element
parent_start
Parent element start location (0-indexed, half-open)
parent_end
Parent element end location (0-indexed, half-open)
parent_strand
Parent element strand
facets
Facets are used for categorization and filtering when searching. In the file, they are key-value pairs written as key=value, with a ; separating pairs. The key is the name of a facet and the value is a specific value of that facet. For example, the facet might be "Assays" and the facet value might be "Flow-FISH CRISPR Screen". These are the current facets. If you want a new facet, or a new value for an existing facet, please let us know!
misc
This is any miscellaneous data you'd like included with each item. It should be in the same format as the facets. It won't be used for searching; it will just be stored in the database.
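
As a rough illustration of the key=value;key=value form used by the facets and misc columns, here is a minimal Python sketch. The function names format_pairs and parse_pairs are hypothetical, and the handling of separator characters is an assumption: this document does not describe any escaping, so the sketch simply rejects keys or values that contain the separators.

```python
def format_pairs(pairs):
    """Join a dict into the key=value;key=value form described above."""
    for key, value in pairs.items():
        # Assumption: no escaping is defined, so separators inside a key
        # or value are rejected rather than encoded.
        if "=" in str(key) or ";" in str(key) or ";" in str(value):
            raise ValueError(f"separator character in pair: {key!r}={value!r}")
    return ";".join(f"{key}={value}" for key, value in pairs.items())


def parse_pairs(text):
    """Split a facets or misc cell back into a dict. A blank cell gives {}."""
    pairs = {}
    for item in text.split(";"):
        if not item.strip():
            continue  # skip blank cells and stray trailing separators
        key, _, value = item.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


cell = format_pairs({"Assays": "Flow-FISH CRISPR Screen"})
print(cell)
print(parse_pairs(cell))
```

The same two helpers could serve both columns, since misc uses the same format as facets.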

If you don't need a column, leave it blank; don't remove it.
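
One possible way to produce an experiment data file that keeps unused columns blank is sketched below with Python's csv module. The function name write_tested_elements and the sample row are hypothetical. Whether the file should start with a header row is not stated here, so the header parameter is an assumption you may need to change.

```python
import csv
import io

# Column order taken from the list above.
COLUMNS = [
    "chrom", "start", "end", "strand",
    "parent_chrom", "parent_start", "parent_end", "parent_strand",
    "facets", "misc",
]


def write_tested_elements(rows, handle, header=True):
    """Write rows (dicts) as tab-separated lines with every column present."""
    writer = csv.DictWriter(
        handle,
        fieldnames=COLUMNS,
        delimiter="\t",
        restval="",           # columns missing from a row are written blank
        lineterminator="\n",
    )
    if header:  # assumption: the need for a header row is not documented
        writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Hypothetical element with no parent element and no misc data.
rows = [{
    "chrom": "chr1", "start": 1000, "end": 1500, "strand": "+",
    "facets": "Assays=Flow-FISH CRISPR Screen",
}]
buffer = io.StringIO()
write_tested_elements(rows, buffer)
print(buffer.getvalue())
```

Each written line still has ten tab-separated fields, so the parent and misc columns appear as empty fields instead of being dropped. To write a real file, pass a handle opened with open("tested_elements.tsv", "w", newline="") in place of the StringIO buffer.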

Analysis Data Format

Analysis data is the data we need that is specific to a particular analysis. This includes the locations of tested elements, effect sizes, and p-values.

The analysis data file is a tab-separated list of values. We often call it "observations.tsv", but the name can be anything as long as it's also in the metadata file. The columns are as follows:

chrom
Tested element chromosome
start
Tested element start location (0-indexed, half-open)
end
Tested element end location (0-indexed, half-open)
strand
Tested element strand
gene_name
The name of the targeted gene (optional; not all experiments target specific genes)
gene_ensembl_id
The Ensembl ID of the targeted gene (optional; required if gene_name has a value)
raw_p_val
The p-value of the observation
adj_p_val
The adjusted (e.g., Bonferroni corrected) p-value of the observation
effect_size
The size of the observed effect
facets
See the facets explanation above.
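
Before submitting an analysis data file, it may help to check each row against the column descriptions above. The sketch below is only an illustration: check_row is a hypothetical helper, the sample values are made up for the example, and two of the checks are assumptions not stated in this document (that both p-value columns are always filled in, and that start must be strictly less than end).

```python
def check_row(row):
    """Return a list of problems found in one observation row (a dict of strings)."""
    problems = []

    # Stated above: gene_ensembl_id is required if gene_name has a value.
    if row["gene_name"] and not row["gene_ensembl_id"]:
        problems.append("gene_ensembl_id is required when gene_name is set")

    # P-values are probabilities, so they should fall between 0 and 1.
    # Assumption: both columns are always filled in.
    for column in ("raw_p_val", "adj_p_val"):
        try:
            p_value = float(row[column])
        except ValueError:
            problems.append(f"{column} is not a number")
            continue
        if not 0.0 <= p_value <= 1.0:
            problems.append(f"{column} is outside the range 0 to 1")

    # Assumption: a 0-indexed, half-open interval with start >= end is empty,
    # so it is reported as a problem.
    try:
        if int(row["start"]) >= int(row["end"]):
            problems.append("start must be less than end")
    except ValueError:
        problems.append("start and end must be integers")

    return problems


# Illustrative row; the values are examples, not real results.
row = {
    "chrom": "chr13", "start": "100", "end": "250", "strand": "+",
    "gene_name": "BRCA2", "gene_ensembl_id": "",
    "raw_p_val": "0.003", "adj_p_val": "0.04", "effect_size": "-1.2",
    "facets": "",
}
print(check_row(row))
```

Here the row names a gene without an Ensembl ID, so the returned list contains one problem. A row with both gene columns blank passes that check, since not all experiments target specific genes.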