Benchmarks on PyTables Undo/Redo
================================

This is a small report on the performance of the Undo/Redo feature in
PyTables. A small script (see undo_redo.py) has been written in order to
check different scenarios for Undo/Redo, like creating single nodes, copying
children from one group to another, and creating attributes.

Undo/Redo is independent of object size
---------------------------------------

First of all, note that the cost of the Undo/Redo feature is independent of
the size of the objects being treated. For example, the times for 10 objects
(flag -n), each one with 10 elements (flag -s), are:

$ time python2.4 undo_redo.py -n 10 -i 2 -s 10 data.nobackup/undo_redo.h5
Time for Undo, Redo (createNode): 0.213686943054 s, 0.0727670192719 s
Time for Undo, Redo (createNode): 0.271666049957 s, 0.0740389823914 s
Time for Undo, Redo (copy_children): 0.296227931976 s, 0.161941051483 s
Time for Undo, Redo (copy_children): 0.363519906998 s, 0.162662982941 s
Time for Undo, Redo (set_attr): 0.208750009537 s, 0.0732419490814 s
Time for Undo, Redo (set_attr): 0.27628993988 s, 0.0736088752747 s

real    0m5.557s
user    0m4.354s
sys     0m0.729s

Note how all tests take more or less the same amount of time. This is
because a move operation is used as the central tool to implement the
Undo/Redo feature, and such a move operation has a constant cost,
independent of the size of the objects. For example, using objects with
1000 elements each, we can see that the Undo/Redo speed is not affected:

$ time python2.4 undo_redo.py -n 10 -i 2 -s 1000 data.nobackup/undo_redo.h5
Time for Undo, Redo (createNode): 0.213760137558 s, 0.0717759132385 s
Time for Undo, Redo (createNode): 0.276151895523 s, 0.0724079608917 s
Time for Undo, Redo (copy_children): 0.308417797089 s, 0.168260812759 s
Time for Undo, Redo (copy_children): 0.382102966309 s, 0.168042898178 s
Time for Undo, Redo (set_attr): 0.209735155106 s, 0.0740969181061 s
Time for Undo, Redo (set_attr): 0.279798984528 s, 0.0770981311798 s

real    0m5.835s
user    0m4.585s
sys     0m0.736s

Undo/Redo times grow linearly with the number of objects involved
-----------------------------------------------------------------

Secondly, the time for undoing/redoing is proportional (linearly) to the
number of objects involved in the process (set by -n):

$ time python2.4 undo_redo.py -n 100 -i 2 -s 10 data.nobackup/undo_redo.h5
Time for Undo, Redo (createNode): 2.27267885208 s, 0.779091119766 s
Time for Undo, Redo (createNode): 2.31264209747 s, 0.766252040863 s
Time for Undo, Redo (copy_children): 3.01871585846 s, 1.63346219063 s
Time for Undo, Redo (copy_children): 3.07704997063 s, 1.62615203857 s
Time for Undo, Redo (set_attr): 2.18017196655 s, 0.809293985367 s
Time for Undo, Redo (set_attr): 2.23039293289 s, 0.809432029724 s

real    0m48.395s
user    0m40.385s
sys     0m6.914s

Compared with the first run, using 10 times more objects makes the
Undo/Redo times roughly 10 times larger (for instance, 2.27 s vs 0.21 s for
the createNode undo).

A note on actual performance and room for improvement
-----------------------------------------------------

Finally, note how the Undo/Redo capability of PyTables is pretty fast. The
next benchmark makes 1000 undos and 1000 redos for create_array:

$ time python2.4 undo_redo.py -n 1000 -i 2 -t createNode -s 1000 data.nobackup/undo_redo.h5
Time for Undo, Redo (createNode): 22.7840828896 s, 7.9872610569 s
Time for Undo, Redo (createNode): 22.2799329758 s, 7.95833396912 s

real    1m32.307s
user    1m16.598s
sys     0m15.105s

That is, an undo takes approximately 23 milliseconds while a redo takes
approximately 8 milliseconds.
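For reference, the essence of what undo_redo.py measures can be sketched as
follows. This is a minimal, illustrative snippet rather than the actual
benchmark script: it assumes the modern snake_case PyTables API (open_file,
enable_undo, create_array, undo, redo; the 2005-era API used camelCase
spellings), and the file name and mark names are made up for the example:

    import time
    import numpy as np
    import tables

    nobjects, size = 10, 10      # roughly what the -n and -s flags control

    with tables.open_file("undo_redo_sketch.h5", mode="w") as f:
        f.enable_undo()          # start logging actions in the file
        f.mark("empty")          # named mark to come back to
        for i in range(nobjects):
            f.create_array("/", "array%d" % i, np.arange(size))
        f.mark("full")           # named mark to redo up to

        t0 = time.time()
        f.undo("empty")          # remove the nobjects arrays again
        t_undo = time.time() - t0

        t0 = time.time()
        f.redo("full")           # bring the nobjects arrays back
        t_redo = time.time() - t0

        print("Time for Undo, Redo (create_array): %s s, %s s"
              % (t_undo, t_redo))

The same timing pattern applies to the other scenarios (copying children,
setting attributes); only the logged actions differ.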
The fact that undo operations take about three times longer than redo
operations is probably due to how the action log is implemented. The action
log is implemented as a Table object, and PyTables is optimized to read
table rows in *forward* direction (the one needed for redo operations).
When reading in *backward* direction (the one needed for undo operations),
however, the internal cache of PyTables is counterproductive and makes
lookups quite slow compared with forward access (the sketch below
illustrates this asymmetry). The Undo/Redo code has been optimized quite a
bit to smooth this kind of access as much as possible, but only with
relative success. A more definitive optimization would require much better
performance when reading tables in backward direction. That would be a
major task, and may eventually be addressed in the future.

Francesc Alted
2005-03-10
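To make the forward/backward asymmetry mentioned above more concrete, here
is a minimal, illustrative timing sketch. It is not taken from the PyTables
sources: the file and table names are made up, modern API names are
assumed, and the backward loop simply reads single rows in descending
order, as an undo walk over the action log does:

    import time
    import tables

    N = 100000

    with tables.open_file("scan.h5", mode="w") as f:
        table = f.create_table("/", "log", {"value": tables.Int32Col()})
        row = table.row
        for i in range(N):
            row["value"] = i
            row.append()
        table.flush()

        t0 = time.time()
        for r in table.iterrows():      # forward scan: plays well with the cache
            r["value"]
        t_forward = time.time() - t0

        t0 = time.time()
        for i in range(table.nrows - 1, -1, -1):
            table[i]                    # backward, one row at a time
        t_backward = time.time() - t0

        print("forward: %.3f s, backward: %.3f s" % (t_forward, t_backward))

If the backward loop turns out to be noticeably slower on your setup, that
matches the undo/redo ratio reported above.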