Here are some notes on the Jackson labs database. The schema viewer at: http://www.informatics.jax.org/schema/ is helpful for understanding their database. Galt has created a local copy of the database using some scripts around sqsh. The main tables we need for VisiGene are: GXD_Assay - Gene Expression Database Assay. Includes info on in situs and other assays such as northern blots. MRK_Marker - Contains information on markers, many of which are genes. PRB_Probe - Contains primers sometimes for RNA/DNA hybridization probes. GXD_InSituResultImage & GXD_InSituResult - link image to specimen GXD_Specimen - Describes sex, age, links to genotype IMG_Image - Describe image (but not how to get to image file). Getting from gene to image: MRK_Marker.symbol MRK_Marker._Marker_key GXD_Assay._Marker_key GXD_Assay._Assay_key [GXD_Assay._ImagePane_key (But this is NULL some places where image exists)] GXD_Specimen._Assay_key GXD_Specimen._Specimen_key GXD_InSituResult._Specimen_key GXD_InSituResult._Result_key GXD_InSituResultImage._Result_key GXD_InSituResultImage._ImagePane_key IMG_ImagePane._ImagePane_key IMG_ImagePane._Image_key IMG_Image._Image_key ACC_Accession._Object_key where prefixPart="PIX:" select MRK_Marker.symbol as gene,BIB_Refs._primary as author, BIB_Refs.journal as journal,GXD_Specimen.sex as sex, GXD_Specimen.ageMin as ageMin,GXD_Specimen.ageMax as ageMax, GXD_StructureName.structure as structure, GXD_Expression.expressed as exp, GXD_Pattern.pattern as pattern, IMG_ImagePane.paneLabel as pane,IMG_Image._Image_key as imageKey, ACC_Accession.accID as fileKey from MRK_Marker,BIB_Refs,GXD_Assay,GXD_Expression, GXD_Structure,GXD_StructureName, GXD_Specimen,GXD_InSituResult,GXD_Pattern, GXD_InSituResultImage,IMG_ImagePane, IMG_Image,ACC_Accession where MRK_Marker._Marker_key = GXD_Assay._Marker_key and GXD_Assay._Refs_key = BIB_Refs._Refs_key and GXD_Assay._Assay_key = GXD_Specimen._Assay_key and GXD_Assay._Assay_key = GXD_Expression._Assay_key and GXD_Expression._Structure_key = GXD_Structure._Structure_key and GXD_Structure._StructureName_key = GXD_StructureName._StructureName_key and GXD_Specimen._Specimen_key = GXD_InSituResult._Specimen_key and GXD_InSituResult._Result_key = GXD_InSituResultImage._Result_key and GXD_InSituResult._Pattern_key = GXD_Pattern._Pattern_key and GXD_InSituResultImage._ImagePane_key = IMG_ImagePane._ImagePane_key and IMG_ImagePane._Image_key = IMG_Image._Image_key and IMG_Image._Image_key = ACC_Accession._Object_key and ACC_Accession.prefixPart = "PIX:" limit 1000; select MRK_Marker.symbol,_Specimen_key,GXD_Assay._Assay_key,sex,age,ageMin,ageMax from MRK_Marker, GXD_Specimen, GXD_Assay where _Refs_key = 39558 and MRK_Marker._Marker_key = GXD_Assay._Marker_key and GXD_Specimen._Assay_key = GXD_Assay._Assay_key order by _Specimen_key limit 100; select MRK_Marker.symbol as gene,GXD_Assay._Assay_key as assay, GXD_InSituResult._Result_key as isr, BIB_Refs._primary as author, BIB_Refs.journal as journal,GXD_Specimen.sex as sex, GXD_Specimen.ageMin as ageMin,GXD_Specimen.ageMax as ageMax, IMG_ImagePane.paneLabel as pane,IMG_Image._Image_key as imageKey, ACC_Accession.accID as fileKey from MRK_Marker,BIB_Refs,GXD_Assay, GXD_Specimen,GXD_InSituResult, GXD_InSituResultImage,IMG_ImagePane, IMG_Image,ACC_Accession where MRK_Marker._Marker_key = GXD_Assay._Marker_key and GXD_Assay._Refs_key = BIB_Refs._Refs_key and GXD_Assay._Assay_key = GXD_Specimen._Assay_key and GXD_Specimen._Specimen_key = GXD_InSituResult._Specimen_key and GXD_InSituResult._Result_key = GXD_InSituResultImage._Result_key and GXD_InSituResultImage._ImagePane_key = IMG_ImagePane._ImagePane_key and IMG_ImagePane._Image_key = IMG_Image._Image_key and IMG_Image._Image_key = ACC_Accession._Object_key and ACC_Accession.prefixPart = "PIX:" and BIB_Refs._Refs_key = 39558; limit 1000; Main table for expression is GXD_Assay. We want cases where the field _AssayType_key is 1 - RNA In Situ 6 - Immunohistochemistry 9 - In Situ reporter (knock in) of which there currently are 9003 rows. This table mostly is pointers to other tables. Here's the fields and what they point to: GXD_Assay FIELDS _Assay_key Primary key for this table (GXD_Assay) _AssayType_key Key to GXD_AssayType _Refs_key Key to BIB_Refs which includes journal, author, etc. _Marker_key Key to MRK_Marker. Table includes symbol (gene symbol) when MRK_Marker._Marker_Type_key=1 _ProbePrep_key Key to GXD_ProbePrep. References PRB_Probe, GXD_Label, GXD_LabelCoverage, GXD_VisualizationMethod. _AntibodyPrep_key Key to GXD_AntibodyPrep. References GXD_Antibody, GXD_Label. Either this or ProbePrep is 0. _ImagePane_key Key to IMG_ImagePane. References IMG_Image, which has _ReporterGene_key Seems to be zero in all cases _CreatedBy_key Also need: GXD_InSituResultImage FIELDS _Result_key _ImagePane_key And GXD_InSituResult FIELDS _Result_key _Specimen_key Key to GXD_Specimen _Strength_key Key to GXD_Strength _Pattern_key Key to GXD_Pattern sequenceNum resultNote USEFUL QUERIES Here's a useful query to sample the GXD_Assay table: select _AssayType_key, _Refs_key, _Marker_key, _ProbePrep_key, _AntibodyPrep_key, _ImagePane_key, _ReporterGene_key, _CreatedBy_key, _ModifiedBy_key from GXD_Assay where _AssayType_key in (1,6,9) and (_AntibodyPrep_key != 0 || _ProbePrep_key<3000) order by rand() limit 10; And the GXD_Marker table: select _Marker_key as marker, _Organism_key as org, _Marker_Status_key as status, _Marker_Type_key as type, _CurationState_key as cura, symbol, name, chromosome as chrom, cytogeneticOffset as cyto, _CreatedBy_key from MRK_Marker order by rand() limit 10; REPORTER GENES 74771 - lacZ 74771 - GFP