Style Guide for Python Code

CODE CONVENTIONS

Follow the Python coding conventions laid out by the Python Style Guide, except for the
UCSC Genomics Group specific conventions outlined below.

    http://www.python.org/dev/peps/pep-0008/

INDENTATION AND SPACING

Each block of code is indented by 4 spaces from the previous block. Do not use tabs to
separate blocks of code. This indentation convention differs from the C coding style
found in src/README, which uses 4-space indents with 8-space tabs.

Common editor configurations for disallowing tabs are:

    vim: Add "set expandtab" to .vimrc
    emacs: Add "(setq-default indent-tabs-mode nil)" to .emacs

Lines are no more than 100 characters wide.

INTERPRETER DIRECTIVE

The first line of any UCSC Python script should be:

    #!/usr/bin/env python2.7

This line invokes the python2.7 found in the user's PATH. It ensures that scripts
developed by UCSC can be distributed, and it explicitly states which Python version was
used to develop the scripts.

NAMING CONVENTIONS

Use mixedCase for symbol names: the leading character is not capitalized and the first
letter of each successive word is capitalized. (Classes are an exception, see below.)
Non-UCSC Python code may follow other conventions and does not need to be adapted to
these naming conventions.

Abbreviations follow the rules in src/README:

Abbreviation of words is strongly discouraged. Words of five letters or fewer should
generally not be abbreviated. If a word is abbreviated, in general it is abbreviated to
its first three letters:

    tabSeparatedFile -> tabSepFile

In some cases, for local variables, abbreviating to a single letter for each word is
okay:

    tabSeparatedFile -> tsf

In complex cases you may treat the abbreviation itself as a word, so that only its first
letter is capitalized:

    genscanTabSeparatedFile -> genscanTsf

Numbers are considered words. You would represent "chromosome 22 annotations" as
"chromosome22Annotations" or "chr22Ann". Note the capitalized 'A' after the 22.

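The naming rules above can be illustrated with a small sketch. The helper toMixedCase is
hypothetical and exists only for this illustration; it is not part of any UCSC library.
It joins a list of words into one mixedCase name, treating numbers and abbreviations as
ordinary words.

```python
def toMixedCase(words):
    """Join words into one mixedCase name (hypothetical helper, illustration only)."""
    if not words:
        return ""
    # The leading word is never capitalized.
    firstWord = words[0].lower()
    # Capitalize only the first letter of each later word. Digits have no case,
    # so a number such as "22" passes through unchanged.
    laterWords = [word[:1].upper() + word[1:].lower() for word in words[1:]]
    return firstWord + "".join(laterWords)


# A word abbreviated to its first three letters.
tabSepFile = toMixedCase(["tab", "sep", "file"])
# Numbers count as words, so the word after the number is capitalized.
chr22Ann = toMixedCase(["chr", "22", "ann"])
# An abbreviation treated as a word: only its first letter is capitalized.
genscanTsf = toMixedCase(["genscan", "TSF"])
```

The variable names in the sketch follow the same convention they demonstrate.
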
Packages and Modules

In Python, a package is represented as a directory with an __init__.py file in it, and
contains some number of modules, which are represented as files with a .py extension. A
module may in turn contain any number of related classes and functions. This differs
from Java, where one file corresponds to one class: in Python it is reasonable to treat
one module as similar to a whole namespace in Java. In general, try to keep modules on
the order of hundreds of lines.

Internal packages and modules should have short names in mixedCase, with no spaces or
underscores. A good example of this style is the ucscGb package:

    ucscGb/
        __init__.py
        ra.py
        cv.py
        ...

For more information:

    http://docs.python.org/tutorial/modules.html

Imports

The preferred way to import something in Python is by specifying its containing module:

    import os
    from ucscGb import ra

Then, the qualified name can be used:

    someRa = ra.RaFile()

Do not import as below, as this may cause local naming conflicts:

    from ucscGb.ra import RaFile
    from ucscGb.track import *

Imports should follow the structure:

    1. Each import should be on a separate line, unless modules are from the same
       package:

        import os
        import sys
        from ucscGb import ra, track, qa

    2. Imports should be at the top of the file. Each section should be separated by a
       blank line:

        a. standard library imports
        b. third party package/module imports
        c. local package/module imports

For more information, see the "Imports" section:

    http://www.python.org/dev/peps/pep-0008/

Classes

CapitalCase names. Note the leading capital letter to distinguish between a ClassName
and a functionName. Underscores are not used, except for private internal classes, where
the name is preceded by double underscores which Python recognizes as private.

Methods

mixedCase names. The leading character is not capitalized, but all successive words are
capitalized.
Underscores are not used, except for private internal methods, where the name is
preceded by double underscores which Python recognizes as private. In general, try to
keep methods around 20 lines.

Functions

mixedCase names. The leading character is not capitalized, but all successive words are
capitalized. In general, try to keep functions around 20 lines.

Variables

mixedCase names. Underscores are not used, except for private internal variables, where
the name is preceded by double underscores which Python recognizes as private.

COMMENTING

Note: The choice of automated documentation tool is still being worked out.

Automated documentation is carried out using the Epydoc tool:

    http://epydoc.sourceforge.net/

Comments should follow these conventions:

    1. Every module should have a paragraph at the beginning. Single-class modules may
       omit the paragraph in favor of a class comment.

    2. Use Python's docstring convention to embed comments, using """triple double
       quotes""":

        http://www.python.org/dev/peps/pep-0257/

    3. Use Epytext markup language conventions when commenting:

        http://epydoc.sourceforge.net/epytext.html

    4. Use Epytext field tags to describe specific properties of objects:

        Structure:
            a. Fields must be placed at the end of a docstring.
            b. Each field is distinguished by the following pattern:
                @tag: body
                @tag arg: body
            c. All blocks pertaining to a field must have equal indentation, greater
               than or equal to the field tag's indentation.
            d. Optional field tags to use:
                i. @param - Description of a parameter to a function
                ii. @return - Description of a function's return value

        def exampleFunction(inputFile):
            """
            This paragraph describes the object.

            @param inputFile: Input file name
            @return: This is a description of the function's return value
            """

For more information and supported fields, see:

    http://epydoc.sourceforge.net/fields.html#fields

TESTING

Testing is carried out using the unittest module in Python:

    http://docs.python.org/library/unittest.html

This module allows for self-running scripts, which are self-contained and should provide
their own input and output directories and files. The scripts themselves are composed of
one or more classes, all of which inherit from unittest.TestCase and contain one or more
methods that use various asserts or failure checks to determine whether a test passes.

Structure:

    1. At the start of a script, import the unittest module:

        import unittest

    2. A test case is created as a subclass of unittest.TestCase:

        class TestSomeFunctions(unittest.TestCase):

    3. Test method names should begin with 'test' so that the test runner is aware of
       which methods are tests:

        def testSomeSpecificFunction(self):

    4. Define a setUp method to run before each test:

        def setUp(self):

    5. Define a tearDown method to run after each test:

        def tearDown(self):

    6. To invoke tests with a simple command-line interface, add the following lines:

        if __name__ == '__main__':
            unittest.main()

For other ways to run tests, see:

    http://docs.python.org/library/unittest.html
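
The testing structure described above might be put together as in the following minimal
sketch. The function countLines and the class TestCountLines are hypothetical names made
up for this illustration. The test case creates its own temporary input directory in
setUp and removes it in tearDown, so it stays self-contained. The suite is run here
through a loader and runner so that the sketch also works when pasted into an
interactive session; a standalone script would more likely end with the
unittest.main() lines from the guide.

```python
import os
import shutil
import tempfile
import unittest


def countLines(fileName):
    """Hypothetical function under test: return the number of lines in a file."""
    with open(fileName) as inputFile:
        return sum(1 for line in inputFile)


class TestCountLines(unittest.TestCase):

    def setUp(self):
        # Runs before each test: create a private directory and an input file.
        self.testDir = tempfile.mkdtemp()
        self.inputFile = os.path.join(self.testDir, "input.tab")
        with open(self.inputFile, "w") as outputFile:
            outputFile.write("chr1\t100\nchr2\t200\n")

    def tearDown(self):
        # Runs after each test: remove everything the test created.
        shutil.rmtree(self.testDir)

    def testCountsEveryLine(self):
        self.assertEqual(countLines(self.inputFile), 2)

    def testEmptyFileHasNoLines(self):
        emptyFile = os.path.join(self.testDir, "empty.tab")
        open(emptyFile, "w").close()
        self.assertEqual(countLines(emptyFile), 0)


# Collect every method whose name starts with 'test' and run the suite.
suite = unittest.TestLoader().loadTestsFromTestCase(TestCountLines)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because each test gets a fresh directory from setUp, the two test methods do not depend
on the order in which the runner executes them.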