Regulon
Combination of a module, its associated gene, and descriptors of their spatial relationship. The descriptors are the relative region (HEAD, TAIL, intron, exon) and the offset (number of bases away from the start or end of the gene). A module has no inherent strandedness, so there is no sense in which the module is on the same or the opposite strand as the gene. What does matter in comparisons is whether the optimal comparison of a putatively orthologous module pair preserves or switches strand. This logic will be used to score the Regulon::Similarity.

RegulonFunctionality
An estimate, based on heuristic reasoning, of the probability of true regulation. In the Bayesian spirit of "sweep nothing under the rug", arbitrary rules like "let's only consider modules within 10,000 bases of a gene as potential regulators" are expressed in probabilistic terms as P(D|F), where D is an integer offset variable (displacement) and F is a boolean variable (site is functional). Here, P(D|F=true) would be 1/10,000 for all values of D between 0 and 10,000 (and zero beyond that range). Together with P(S|F), where S describes the module sequence, we can calculate

    P(S,D|F) = P(D|F) * P(S|F)

if and only if S and D are conditionally independent given F, that is, P(S,D,F) = P(S,F) * P(D,F) / P(F). This is reasonable, since there is no reason a priori to suspect that the sequence variants in a module are correlated with the module's position relative to the gene it regulates. Then, by Bayes' rule,

    P(F|S,D) = P(S,D|F) * P(F) / P(S,D)
             = P(D|F) * P(S|F) * P(F) / [ P(S) * P(D) ]

where the last step additionally assumes P(S,D) = P(S) * P(D). Should P(D) be uniform?

RegulonSimilarity
A score describing the similarity of two regulatory relations involving two orthologous genes. The score should be a weighted combination of the module similarity score, the difference in offsets, and the difference in relative regions.
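
As a rough illustration of the weighted combination described for RegulonSimilarity, the sketch below averages three terms. Everything in it is an assumption made for illustration and not part of these notes: the name RegulonPlacement, the weights, the linear offset penalty, the offset_scale of 10,000 bases, and the expectation that the module similarity score already lies in [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegulonPlacement:
    """Hypothetical record of where a module sits relative to its gene."""
    region: str  # one of "HEAD", "TAIL", "intron", "exon"
    offset: int  # bases from the start or end of the gene


def regulon_similarity(module_similarity, a, b,
                       w_module=1.0, w_offset=0.1, w_region=0.5,
                       offset_scale=10_000):
    """Weighted average of three terms (assumed functional form).

    module_similarity is assumed to be in [0, 1]; the weights and
    offset_scale are placeholder values, not tuned ones.
    """
    total = w_module + w_offset + w_region
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Offset agreement: 1.0 for identical offsets, falling linearly to 0.0
    # once the offsets differ by offset_scale bases or more.
    offset_term = 1.0 - min(abs(a.offset - b.offset) / offset_scale, 1.0)
    # Region agreement: 1.0 if both modules sit in the same relative region.
    region_term = 1.0 if a.region == b.region else 0.0
    return (w_module * module_similarity
            + w_offset * offset_term
            + w_region * region_term) / total
```

Setting w_offset to 0 drops the offset term entirely, which is one way to express that offsets should carry little or no weight.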
More than likely, little emphasis will be placed on the difference in offsets, since most researchers see that orthologous genes have very different patterns of modules. The difference in relative region might be more important, though it may already be captured by the regulatory probability estimate, which assigns zero probability to many 'undesirable' relationships.

Combinatorial search
Given two genes, of which the one with fewer modules has N and the other has M, find among all one-to-one pairings (bijections between the N modules of one gene and N of the M modules of the other) the pairing with the highest GeneRegulonSimilarity. This requires evaluating M!/(M-N)! pairings and pre-calculating M*N summands. Since this is a completely manageable number of summands, each GeneRegulonSimilarity requires just summing N numbers. There is really no optimization needed here.

The GeneRegulonSimilarity has two nice properties. First, it disregards weak modules even if they are well conserved. Second, it scales proportionally to the number of modules.
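
The combinatorial search can be sketched as a brute-force loop over all M!/(M-N)! pairings. This is a minimal illustration and not a committed design: the function names and the layout of the summand table (N rows by M columns, with N at least 1) are assumptions, and how each summand is computed is left open here.

```python
from itertools import permutations
from math import factorial


def count_pairings(m, n):
    """Number of one-to-one pairings of N modules with N of M modules: M!/(M-N)!."""
    return factorial(m) // factorial(m - n)


def best_pairing(summands):
    """Exhaustively find the pairing with the highest summed score.

    summands[i][j] is the precomputed contribution of pairing module i of the
    gene with fewer modules (N rows) with module j of the other gene
    (M columns). Returns (best_score, pairing), where pairing[i] is the
    column chosen for row i.
    """
    n = len(summands)
    m = len(summands[0])
    if n > m:
        raise ValueError("rows must belong to the gene with fewer modules")
    best_score, best = float("-inf"), None
    # permutations(range(m), n) yields every ordered choice of n distinct
    # columns, which is exactly the M!/(M-N)! candidate pairings.
    for cols in permutations(range(m), n):
        score = sum(summands[i][j] for i, j in enumerate(cols))
        if score > best_score:
            best_score, best = score, cols
    return best_score, best


# Hypothetical summands for N = 2 and M = 3.
table = [[0.75, 0.25, 0.5],
         [0.25, 0.5, 0.125]]
score, pairing = best_pairing(table)  # score == 1.25, pairing == (0, 1)
```

Each candidate costs only N additions, but the number of candidates grows factorially, so the exhaustive loop stays practical only while M!/(M-N)! is small.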