Author: Francesc Alted i Abad
Contact: faltet@pytables.org
The Row accessor implements a new __contains__ special method that allows doing things like:
for row in table:
    if item in row:
        print "Value found in row", row.nrow
        break
Closes #309.
PyTables is friendlier to easy_install and pip now, as all the Python dependencies should be installed automatically. Closes #298.
Updated Blosc to 1.0 (final).
The filter ID of Blosc changed from the wrong 32010 to the reserved 32001. This will prevent PyTables 2.2 (final) from reading files created with Blosc and PyTables 2.2 pre-final. ptrepack can be used to recover those files, if necessary. More info in ticket #281.
Recent benchmarks suggest that a new parametrization is better in most scenarios.
Some plots have been added to the User’s Manual (chapter 5) showing how the new parametrization works.
Numexpr is no longer included in PyTables and has become a prerequisite instead. This is because Numexpr already has decent enough installers and is also available in the PyPI repository, so it should be easy for users to fulfill this dependency.
When using a Numexpr package that is turbo-loaded with Intel's VML/MKL, the MAX_THREADS parameter will control the number of threads that VML can use during computations. For finer control, the numexpr.set_vml_num_threads() function can always be used.
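A minimal sketch (it assumes your Numexpr build was compiled against VML/MKL; the thread count is just an example):

import numexpr

# Cap the number of threads that VML may use during computations
# (this call only has an effect on VML-enabled Numexpr builds)
numexpr.set_vml_num_threads(2)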
Cython is now used instead of Pyrex for generating the compiled extensions.
Updated to version 0.9 of the Blosc compressor. This version can make use of threads so as to accelerate the compression/decompression process. To change the maximum number of threads that Blosc can use (2 by default), you can modify the MAX_THREADS variable in tables/parameters.py or use the new setBloscMaxThreads() global function.
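For instance, a quick sketch of the new function (the thread count here is just an example):

import tables

# Allow Blosc to use up to 4 threads instead of the default maximum of 2
tables.setBloscMaxThreads(4)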
Reopening already opened files is now supported, provided that there is no incompatibility between the intended usages (for example, you cannot reopen in append mode a file that is already open in read-only mode).
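A minimal sketch of what is now allowed (the file name is hypothetical):

import tables

f1 = tables.openFile('data.h5', mode='r')
f2 = tables.openFile('data.h5', mode='r')  # a second read-only handle is fine
# tables.openFile('data.h5', mode='a')     # not allowed: incompatible with 'r'
f2.close()
f1.close()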
The --print-versions option for the test_all.py script is now preferred over the deprecated --show-versions. This is more consistent with the existing print_versions() function.
Fixed a bug that, under some circumstances, prevented the use of table iterators with itertools.groupby(). Now, you can safely do things like:
import itertools
from numpy import average   # assumed source of average()
sel_rows = table.where('(row_id >= 3)')
# f_group is a user-supplied key function mapping each row to its group
for group_id, grouped_rows in itertools.groupby(sel_rows, f_group):
    group_mean = average([row['row_id'] for row in grouped_rows])
Fixes #264.
Copies of Array objects with multidimensional atoms (coming from native HDF5 files) work correctly now (i.e. the copy holds the atom dimensionality). Fixes #275.
The tables.openFile() function no longer tries to open/close the file in order to guess whether it is an HDF5 or PyTables one before opening it definitively. This allows the fcntl.flock() and fcntl.lockf() Python functions to work correctly now (useful for arbitrating access to the file by different processes). Thanks to Dag Sverre Seljebotn and Ivan Vilata for their suggestions on hunting this one! Fixes #185.
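For illustration, a locking sketch under assumed conditions (the file name and the advisory-lock policy are up to your application):

import fcntl
import tables

# Block until we hold an exclusive advisory lock on the file, so that
# cooperating processes do not modify it concurrently
lockfd = open('data.h5', 'rb')
fcntl.flock(lockfd, fcntl.LOCK_EX)
h5file = tables.openFile('data.h5', mode='a')
# ... read and write the file ...
h5file.close()
fcntl.flock(lockfd, fcntl.LOCK_UN)
lockfd.close()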
The estimation of the chunksize when using multidimensional atoms in EArray/CArray was wrong because it did not take into account the shape of the atom. Thanks to Ralf Juengling for reporting. Fixes #273.
Non-contiguous arrays can now safely be saved as attributes. Before, if arrays were not contiguous, incorrect data was saved in the attribute. Fixes #270.
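For example, a minimal sketch (`table` stands for any open node, and the attribute name is made up):

import numpy as np

arr = np.arange(12).reshape(3, 4)[:, ::2]  # a non-contiguous view
table.attrs.noncontig = arr                # now stored correctly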
The EXTDIM attribute for CArray/EArray now saves the correct extendable dimension, instead of rubbish. This did not affect functionality, because the extendable dimension was retrieved directly from shape information, but it was providing misleading information to the user. Fixes #268.
Row.__contains__() has been disabled because it makes little sense to query for a key in Row; the correct way is to query for it in Table.colnames or Table.colpathnames instead. Closes #241.
[Semantic change] To avoid a common pitfall when asking for the string representation of a Row class, Row.__str__() has been redefined. Now, it prints something like:
>>> for row in table:
...     print row
...
/newgroup/table.row (Row), pointing to row #0
/newgroup/table.row (Row), pointing to row #1
/newgroup/table.row (Row), pointing to row #2
instead of:
>>> for row in table:
...     print row
...
('Particle: 0', 0, 10, 0.0, 0.0)
('Particle: 1', 1, 9, 1.0, 1.0)
('Particle: 2', 2, 8, 4.0, 4.0)
Use the print row[:] idiom if you want to reproduce the old behaviour. Closes #252.
Added Expr, a class for evaluating expressions containing array-like objects. It can evaluate expressions (like '3*a+4*b') that operate on arbitrarily large arrays while optimizing the resources (basically main memory and CPU cache memory) required to perform them. It is similar to the Numexpr package, but in addition to NumPy objects, it also accepts disk-based homogeneous arrays, like the Array, CArray, EArray and Column PyTables objects.
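A short usage sketch (the file and array names are hypothetical):

import tables

f = tables.openFile('data.h5', mode='r')
a = f.root.a                     # a large disk-based array
b = f.root.b                     # another array-like operand
expr = tables.Expr('3*a + 4*b')  # operands are picked up from the local scope
result = expr.eval()             # evaluated blockwise; returns a NumPy array
f.close()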
Added support for NumPy's extended slicing in all Leaf objects. With that, you can do the following kinds of selections:
array1 = array[4] # simple selection
array2 = array[4:1000:2] # slice selection
array3 = array[1, ..., ::2, 1:4, 4:] # general slice selection
array4 = array[1, [1,5,10], ..., -1] # fancy selection
array5 = array[np.where(array[:] > 4)] # point selection
array6 = array[array[:] > 4] # boolean selection
Thanks to Andrew Collette for implementing this for h5py, from which it has been backported. Closes #198 and #209.
Numexpr updated to 1.3.1. This can lead to up to a 25% improvement in the time for both in-kernel and indexed queries on unaligned tables.
HDF5 1.8.3 supported.
After the introduction of the shape attribute for Column objects, the shape information for multidimensional columns has been removed from the dtype attribute (it is now set to the base type of the column). Closes #232.
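That is, something like the following (the column name is hypothetical):

col = table.cols.multicol  # a multidimensional column
print col.shape            # includes the column dimensionality, e.g. (100, 2, 3)
print col.dtype            # just the base type now, e.g. float64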
Enjoy data!
– The PyTables Team