Data Preparation
Experiment Data Format
Experiment data is the data we need from the experiment that isn't specific to an analysis. This includes
the locations of tested elements and associated elements, if applicable.
The experiment data file is a tab-separated list of values. We often call it "tested_elements.tsv", but the name
can be anything as long as it's also in the metadata file.
The columns are as follows:
- chrom
- Tested element chromosome
- start
- Tested element start location (0-indexed, half-open)
- end
- Tested element end location (0-indexed, half-open)
- strand
- Tested element strand
- parent_chrom
- The chromosome of an associated element that is the "parent" of the tested element
- parent_start
- Parent element start location (0-indexed, half-open)
- parent_end
- Parent element end location (0-indexed, half-open)
- parent_strand
- Parent element strand
- facets
- Facets are used for categorization and filtering when searching. In the file they are key-value pairs in the form key=value, with a ; separating pairs. The key is the name of a facet and the value is a specific value of that facet. For example, the facet might be "Assays" and the facet value might be "Flow-FISH CRISPR Screen".
These are the current facets. If you want a new facet, or want to add a new facet value to an existing facet, please let us know!
- misc
- Any miscellaneous data you'd like included with each item. It should be in the same format as the facets. It won't be used for searching; it will just be stored in the database.
If you don't need a column, leave it blank; don't remove it from the file.
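As an illustration of the layout described above, here is a minimal Python sketch that builds the facets and misc strings and writes one row of an experiment data file. It is not the project's own tooling. The helper names (format_pairs, parse_pairs, write_experiment_rows), the example coordinates, the "+" strand value, the misc key, and the presence of a header row are all assumptions made for the example.

```python
import csv
import io

# Column order taken from the list above.
EXPERIMENT_COLUMNS = [
    "chrom", "start", "end", "strand",
    "parent_chrom", "parent_start", "parent_end", "parent_strand",
    "facets", "misc",
]


def format_pairs(pairs):
    """Join a dict into the key=value;key=value form used by facets and misc."""
    return ";".join(f"{key}={value}" for key, value in pairs.items())


def parse_pairs(text):
    """Split a key=value;key=value string back into a dict; blank gives {}."""
    if not text:
        return {}
    return dict(item.split("=", 1) for item in text.split(";"))


def write_experiment_rows(rows, handle):
    """Write rows as tab-separated values, leaving unused columns blank."""
    writer = csv.DictWriter(
        handle,
        fieldnames=EXPERIMENT_COLUMNS,
        delimiter="\t",
        restval="",  # columns missing from a row are written as empty fields
        lineterminator="\n",
    )
    writer.writeheader()  # assumption: the file starts with a header row
    writer.writerows(rows)


# Hypothetical row: the coordinates and the misc key are made up.
row = {
    "chrom": "chr1",
    "start": 1000,
    "end": 1200,
    "strand": "+",
    "facets": format_pairs({"Assays": "Flow-FISH CRISPR Screen"}),
    "misc": format_pairs({"note": "example"}),
}

buffer = io.StringIO()
write_experiment_rows([row], buffer)
tsv_text = buffer.getvalue()
```

The parent columns are absent from the example row, so they are written as empty fields rather than dropped, which keeps every row at ten columns.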
Analysis Data Format
Analysis data is the data we need that is specific to a particular analysis. This includes the locations of tested elements,
the effect sizes, and the p-values.
The analysis data file is a tab-separated list of values. We often call it "observations.tsv", but the name can be anything as
long as it's also in the metadata file. The columns are as follows:
- chrom
- Tested element chromosome
- start
- Tested element start location (0-indexed, half-open)
- end
- Tested element end location (0-indexed, half-open)
- strand
- Tested element strand
- gene_name
- The name of the targeted gene (optional; not all experiments target specific genes)
- gene_ensembl_id
- The Ensembl ID of the targeted gene (optional, unless gene_name has a value, in which case it is required)
- raw_p_val
- The p-value of the observation
- adj_p_val
- The adjusted (e.g., Bonferroni-corrected) p-value of the observation
- effect_size
- The size of the observed effect
- facets
- See the facets explanation in the Experiment Data Format section above
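Before submitting an analysis data file, it can help to check each row against the column descriptions above. The following Python sketch shows one possible pre-submission check; it is not an official validator. The function names, the header row, the requirement that end be greater than start, and the sample values (including the placeholder gene name GENE1) are assumptions made for the example.

```python
import csv
import io

# Column order taken from the list above.
ANALYSIS_COLUMNS = [
    "chrom", "start", "end", "strand",
    "gene_name", "gene_ensembl_id",
    "raw_p_val", "adj_p_val", "effect_size", "facets",
]


def check_observation(row):
    """Return a list of problems found in one row (a dict of strings)."""
    missing = [c for c in ANALYSIS_COLUMNS if row.get(c) is None]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # 0-indexed, half-open: start is included and end is not, so a
    # non-empty interval needs end > start (assumed here).
    if int(row["start"]) < 0 or int(row["end"]) <= int(row["start"]):
        problems.append("start/end is not a valid half-open interval")
    if row["gene_name"] and not row["gene_ensembl_id"]:
        problems.append("gene_ensembl_id is required when gene_name is set")
    for column in ("raw_p_val", "adj_p_val"):
        if not 0.0 <= float(row[column]) <= 1.0:
            problems.append(f"{column} must be between 0 and 1")
    return problems


def check_file(handle):
    """Map file line numbers to problems; assumes a header row on line 1."""
    report = {}
    reader = csv.DictReader(handle, delimiter="\t")
    for line_number, row in enumerate(reader, start=2):
        problems = check_observation(row)
        if problems:
            report[line_number] = problems
    return report


# Hypothetical sample data: one valid row and one row with two problems.
header = "\t".join(ANALYSIS_COLUMNS)
good = "\t".join(["chr1", "1000", "1200", "+", "", "",
                  "0.001", "0.01", "-0.5", "Assays=Flow-FISH CRISPR Screen"])
bad = "\t".join(["chr1", "1200", "1000", "+", "GENE1", "",
                 "0.001", "0.01", "-0.5", ""])
report = check_file(io.StringIO("\n".join([header, good, bad]) + "\n"))
```

Here report contains an entry only for line 3, the row whose end precedes its start and whose gene_name has no matching gene_ensembl_id.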