This class represents homogeneous datasets in an HDF5 file.
This class provides methods to write or read data to or from array objects in the file. This class does not allow you to enlarge or compress the datasets on disk. Use the EArray class (see The EArray class) if you want enlargeable dataset support or compression features, or CArray (see The CArray class) if you just want compression.
An interesting property of the Array class is that it remembers the flavor of the object that has been saved so that if you saved, for example, a list, you will get a list during readings afterwards; if you saved a NumPy array, you will get a NumPy object, and so forth.
Note that this class inherits all the public attributes and methods that Leaf (see The Leaf class) already provides. However, as Array instances have no internal I/O buffers, it is not necessary to use the flush() method they inherit from Leaf in order to save their internal state to disk. When a writing method call returns, all the data is already on disk.
Parameters:
parentnode
name : str
obj
title
byteorder
An Atom (see The Atom class and its descendants) instance representing the type and shape of the atomic objects to be saved.
The size of the rows in bytes in dimensions orthogonal to maindim.
On iterators, this is the index of the current row.
The number of rows in the array.
Get the enumerated type associated with this array.
If this array is of an enumerated type, the corresponding Enum instance (see The Enum class) is returned. If it is not of an enumerated type, a TypeError is raised.
Iterate over the rows of the array.
This method returns an iterator yielding an object of the current flavor for each selected row in the array. The returned rows are taken from the main dimension.
If a range is not supplied, all the rows in the array are iterated upon; you can also use the Array.__iter__() special method for that purpose. If you only want to iterate over a given range of rows in the array, you may use the start, stop and step parameters.
Examples
result = [row for row in arrayInstance.iterrows(step=4)]
Changed in version 3.0: If the start parameter is provided and stop is None then the array is iterated from start to the last line. In PyTables < 3.0 only one element was returned.
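The range rules for iterrows() can be modelled in a few lines of plain Python. This is a hedged sketch, not PyTables code. iterrows_indices is a hypothetical helper that only computes which row indices an iterrows() call would visit. It assumes the range is normalised the way Python's own slice.indices() does it, and that non-positive steps are rejected as they are for read().

```python
def iterrows_indices(nrows, start=None, stop=None, step=None):
    """Row indices a hypothetical iterrows(start, stop, step) would visit.

    No range means all rows; since PyTables 3.0, giving start alone
    iterates from start to the last row (stop=None means "to the end").
    """
    if step is not None and step <= 0:
        # Assumption: mirror read(), where negative steps are not allowed.
        raise ValueError("step must be positive")
    # slice.indices() resolves negative bounds and clips out-of-range ones.
    return list(range(*slice(start, stop, step).indices(nrows)))


print(iterrows_indices(10, step=4))   # [0, 4, 8]
print(iterrows_indices(10, start=7))  # [7, 8, 9]
```

Giving only start here yields every row from start onwards, which is the behaviour introduced in version 3.0.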
Get the next element of the array during an iteration.
The element is returned as an object of the current flavor.
Get data in the array as an object of the current flavor.
The start, stop and step parameters can be used to select only a range of rows in the array. Their meanings are the same as in the built-in range() Python function, except that negative values of step are not allowed yet. Moreover, if only start is specified, then stop will be set to start + 1. If you specify neither start nor stop, then all the rows in the array are selected.
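The range rules for read() differ from iterrows() in one respect: start alone selects a single row. The hypothetical helper read_row_indices below is a sketch of those rules, not PyTables code. Normalising the final range with Python's slice.indices() is an assumption, and so is treating start=-1 as "the last row" (otherwise start + 1 would give an empty range).

```python
def read_row_indices(nrows, start=None, stop=None, step=None):
    """Rows selected by a hypothetical read(start, stop, step) call."""
    if step is not None and step < 0:
        raise ValueError("negative values of step are not allowed")
    if start is not None and stop is None and step is None:
        # Only start given: select exactly one row.
        # Assumption: start=-1 means the last row, not an empty range.
        stop = nrows if start == -1 else start + 1
    # slice.indices() resolves negative bounds and clips to nrows.
    return list(range(*slice(start, stop, step).indices(nrows)))


print(read_row_indices(10, start=3))  # [3]
print(read_row_indices(10, 2, 8, 3))  # [2, 5]
print(len(read_row_indices(10)))      # 10
```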
The out parameter may be used to specify a NumPy array to receive the output data. The array must have the same size as the data selected with the other parameters. Its datatype is not checked and no type casting is performed, so if it does not match the datatype on disk, the output will not be correct. Also, this parameter is only valid when the array’s flavor is set to ‘numpy’; otherwise, a TypeError will be raised.
When data is read from disk in NumPy format, the output will be in the current system’s byteorder, regardless of how it is stored on disk. The exception is when an output buffer is supplied, in which case the output will be in the byteorder of that output buffer.
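To illustrate the byte-order normalisation described above without PyTables, the stdlib sketch below decodes the same three 32-bit integers from big-endian and little-endian bytes and returns them in the system's native order. decode_int32 is a hypothetical helper, and the sketch assumes a 4-byte C int, which it checks at run time.

```python
import array
import struct
import sys


def decode_int32(raw: bytes, disk_byteorder: str) -> array.array:
    """Decode 32-bit ints stored as disk_byteorder ('little' or 'big')."""
    values = array.array("i")
    if values.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte C int")
    values.frombytes(raw)  # bytes are interpreted in native order here
    if disk_byteorder != sys.byteorder:
        values.byteswap()  # swap so the result is in the system's order
    return values


big = struct.pack(">3i", 1, -2, 300)
little = struct.pack("<3i", 1, -2, 300)
print(decode_int32(big, "big").tolist())        # [1, -2, 300]
print(decode_int32(little, "little").tolist())  # [1, -2, 300]
```

Whatever the on-disk order, the caller sees the same native values, which is the behaviour the paragraph above describes for reads without an output buffer.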
Changed in version 3.0: Added the out parameter.
The following methods automatically trigger actions when an Array instance is accessed in a special way (e.g. array[2:3,...,::2] will be equivalent to a call to array.__getitem__((slice(2, 3, None), Ellipsis, slice(None, None, 2)))).
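The mapping from bracket syntax to the key passed to __getitem__ is plain Python behaviour, so it can be checked with any object. KeyRecorder below is a hypothetical stand-in that simply returns the key it receives.

```python
class KeyRecorder:
    """Hypothetical stand-in that returns whatever key Python passes in."""

    def __getitem__(self, key):
        return key


r = KeyRecorder()
print(r[2:3, ..., ::2])
# (slice(2, 3, None), Ellipsis, slice(None, None, 2))
```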
Get a row, a range of rows or a slice from the array.
The set of tokens allowed for the key is the same as that for extended slicing in Python (including the Ellipsis or ... token). The result is an object of the current flavor; its shape depends on the kind of slice used as key and the shape of the array itself.
Furthermore, NumPy-style fancy indexing, where a list of indices in a certain axis is specified, is also supported. Note that only one list per selection is supported right now. Finally, NumPy-style point and boolean selections are supported as well.
Examples
array1 = array[4] # simple selection
array2 = array[4:1000:2] # slice selection
array3 = array[1, ..., ::2, 1:4, 4:] # general slice selection
array4 = array[1, [1,5,10], ..., -1] # fancy selection
array5 = array[np.where(array[:] > 4)] # point selection
array6 = array[array[:] > 4] # boolean selection
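For the simple and slice selections above, the shape of the result follows from the key and the array's shape. The sketch below computes it for basic keys (integers, slices and at most one Ellipsis) using NumPy's rules, which we assume PyTables follows for these keys: an integer removes its dimension, a slice keeps it with the sliced length, and Ellipsis expands to full slices. selection_shape is a hypothetical helper; fancy, point and boolean selections are not modelled.

```python
def selection_shape(shape, key):
    """Shape produced by a basic key (ints, slices and at most one ...)."""
    if not isinstance(key, tuple):
        key = (key,)
    if key.count(Ellipsis) > 1:
        raise IndexError("only one Ellipsis is allowed")
    n_explicit = len(key) - key.count(Ellipsis)
    if n_explicit > len(shape):
        raise IndexError("too many indices")
    if Ellipsis in key:
        # Ellipsis expands to as many full slices as needed.
        i = key.index(Ellipsis)
        key = key[:i] + (slice(None),) * (len(shape) - n_explicit) + key[i + 1:]
    # Trailing dimensions not mentioned in the key are selected whole.
    key = key + (slice(None),) * (len(shape) - len(key))
    result = []
    for dim, k in zip(shape, key):
        if isinstance(k, int):
            if not -dim <= k < dim:
                raise IndexError(f"index {k} out of range for size {dim}")
            continue  # an integer index removes that dimension
        result.append(len(range(*k.indices(dim))))
    return tuple(result)


key = (1, ..., slice(None, None, 2), slice(1, 4), slice(4, None))
print(selection_shape((4, 3, 2, 8, 5, 10), key))  # (3, 2, 4, 3, 6)
```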
Iterate over the rows of the array.
This is equivalent to calling Array.iterrows() with default arguments, i.e. it iterates over all the rows in the array.
Examples
result = [row[2] for row in array]
Which is equivalent to:
result = [row[2] for row in array.iterrows()]
Set a row, a range of rows or a slice in the array.
It takes different actions depending on the type of the key parameter: if it is an integer, the corresponding array row is set to value (the value is broadcast when needed). If key is a slice, the row slice determined by it is set to value (as usual, if the slice to be updated exceeds the actual shape of the array, only the values in the existing range are updated).
If value is a multidimensional object, then its shape must be compatible with the shape determined by key, otherwise, a ValueError will be raised.
Furthermore, NumPy-style fancy indexing, where a list of indices in a certain axis is specified, is also supported. Note that only one list per selection is supported right now. Finally, NumPy-style point and boolean selections are supported as well.
Examples
a1[0] = 333 # assign an integer to an integer Array row
a2[0] = 'b' # assign a string to a string Array row
a3[1:4] = 5 # broadcast 5 to slice 1:4
a4[1:4:2] = 'xXx' # broadcast 'xXx' to slice 1:4:2
# General slice update (a5.shape = (4,3,2,8,5,10)); the selection has shape (3,2,4,3,6).
a5[1, ..., ::2, 1:4, 4:] = numpy.arange(432).reshape((3,2,4,3,6))
a6[1, [1,5,10], ..., -1] = arr # fancy selection
a7[np.where(a6[:] > 4)] = 4 # point selection + broadcast
a8[arr > 4] = arr2 # boolean selection
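The shape-compatibility rule for __setitem__ (a ValueError when value cannot fit the selection) can be sketched as a NumPy-style broadcasting check. can_assign is a hypothetical helper, and the assumption that PyTables applies exactly NumPy's assignment broadcasting (trailing axes must match or be 1; extra leading axes of length 1 are dropped) is ours.

```python
def can_assign(value_shape, target_shape):
    """True if a value of value_shape can be broadcast into target_shape."""
    v, t = tuple(value_shape), tuple(target_shape)
    # Extra leading axes of length 1 in the value can be dropped.
    while len(v) > len(t) and v[0] == 1:
        v = v[1:]
    if len(v) > len(t):
        return False
    # Compare trailing axes: each value axis must match or be 1.
    return all(a == b or a == 1 for a, b in zip(reversed(v), reversed(t)))


sel = (3, 2, 4, 3, 6)  # shape of a5[1, ..., ::2, 1:4, 4:]
print(can_assign((), sel))                  # True: a scalar broadcasts
print(can_assign((3, 2, 4, 3, 6), sel))     # True: exact match
print(can_assign((4, 3, 2, 4, 3, 6), sel))  # False: one axis too many
```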
This class represents homogeneous datasets in an HDF5 file.
The difference between a CArray and a normal Array (see The Array class), from which it inherits, is that a CArray has a chunked layout and, as a consequence, it supports compression. You can use datasets of this class to easily save or load arrays to or from disk, with compression support included.
CArray includes all the instance variables and methods of Array. Only those with different behavior are mentioned here.
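The reason a chunked layout enables compression is that each chunk can be compressed and decompressed independently. The stdlib sketch below is a toy model of that idea, not how HDF5 stores data: it splits a 2-D grid into fixed-size chunks, compresses each one with zlib at level 5 (echoing Filters(complevel=5, complib='zlib')), and reads one cell by decompressing only its chunk. The chunk shape (50, 50) and the helpers compress_chunks and read_cell are hypothetical, and real HDF5 filters such as shuffle are not modelled.

```python
import zlib


def compress_chunks(grid, chunk_rows, chunk_cols, level=5):
    """Compress a 2-D grid of byte values (0..255) chunk by chunk.

    Returns {(chunk_row, chunk_col): compressed bytes}. For simplicity
    the grid shape must be a multiple of the chunk shape.
    """
    nrows, ncols = len(grid), len(grid[0])
    if nrows % chunk_rows or ncols % chunk_cols:
        raise ValueError("grid shape must be a multiple of the chunk shape")
    chunks = {}
    for r0 in range(0, nrows, chunk_rows):
        for c0 in range(0, ncols, chunk_cols):
            raw = bytes(grid[r][c]
                        for r in range(r0, r0 + chunk_rows)
                        for c in range(c0, c0 + chunk_cols))
            chunks[(r0 // chunk_rows, c0 // chunk_cols)] = zlib.compress(raw, level)
    return chunks


def read_cell(chunks, r, c, chunk_rows, chunk_cols):
    """Read one cell by decompressing only the chunk that contains it."""
    raw = zlib.decompress(chunks[(r // chunk_rows, c // chunk_cols)])
    return raw[(r % chunk_rows) * chunk_cols + (c % chunk_cols)]


# A 200x300 grid of zeros with a 50x50 block of ones at [10:60, 20:70].
grid = [[0] * 300 for _ in range(200)]
for r in range(10, 60):
    for c in range(20, 70):
        grid[r][c] = 1

chunks = compress_chunks(grid, 50, 50)
print(len(chunks))  # 24
print([read_cell(chunks, 10, c, 50, 50) for c in range(18, 22)])  # [0, 0, 1, 1]
```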
Parameters:
parentnode
name : str
atom
shape
title
filters
chunkshape
byteorder
Examples
See below a small example of the use of the CArray class. The code is available in examples/carray1.py:
import numpy
import tables
fileName = 'carray1.h5'
shape = (200, 300)
atom = tables.UInt8Atom()
filters = tables.Filters(complevel=5, complib='zlib')
h5f = tables.open_file(fileName, 'w')
ca = h5f.create_carray(h5f.root, 'carray', atom, shape,
                       filters=filters)
# Fill a hyperslab in ``ca``.
ca[10:60, 20:70] = numpy.ones((50, 50))
h5f.close()
# Re-open and read another hyperslab
h5f = tables.open_file(fileName)
print(h5f)
print(h5f.root.carray[8:12, 18:22])
h5f.close()
The output for the previous script is something like:
carray1.h5 (File) ''
Last modif.: 'Thu Apr 12 10:15:38 2007'
Object Tree:
/ (RootGroup) ''
/carray (CArray(200, 300), shuffle, zlib(5)) ''
[[0 0 0 0]
[0 0 0 0]
[0 0 1 1]
[0 0 1 1]]
This class represents extendable, homogeneous datasets in an HDF5 file.
The main difference between an EArray and a CArray (see The CArray class), from which it inherits, is that the former can be enlarged along one of its dimensions, the enlargeable dimension. That means that the Leaf.extdim attribute (see Leaf) of any EArray instance will always be non-negative. Multiple enlargeable dimensions might be supported in the future.
New rows can be added to the end of an enlargeable array by using the EArray.append() method.
Parameters:
parentnode
name : str
atom
shape
title
filters
expectedrows
chunkshape
byteorder
Examples
See below a small example of the use of the EArray class. The code is available in examples/earray1.py:
import tables
import numpy
fileh = tables.open_file('earray1.h5', mode='w')
a = tables.StringAtom(itemsize=8)
# Use ``a`` as the object type for the enlargeable array.
array_c = fileh.create_earray(fileh.root, 'array_c', a, (0,),
                              "Chars")
array_c.append(numpy.array(['a'*2, 'b'*4], dtype='S8'))
array_c.append(numpy.array(['a'*6, 'b'*8, 'c'*10], dtype='S8'))
# Read the string ``EArray`` we have created on disk.
for s in array_c:
    # Rows are bytes under Python 3; decode them for display.
    print('array_c[%s] => %r' % (array_c.nrow, s.decode()))
# Close the file.
fileh.close()
The output for the previous script is something like:
array_c[0] => 'aa'
array_c[1] => 'bbbb'
array_c[2] => 'aaaaaa'
array_c[3] => 'bbbbbbbb'
array_c[4] => 'cccccccc'
Add a sequence of data to the end of the dataset.
The sequence must have the same type as the array; otherwise a TypeError is raised. In the same way, the dimensions of the sequence must conform to the shape of the array, that is, all dimensions must match, with the exception of the enlargeable dimension, which can be of any length (even 0!). If the shape of the sequence is invalid, a ValueError is raised.
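The shape rule for append() can be sketched as a small check: every dimension must match except the enlargeable one, which simply grows by the appended length (possibly 0). appended_shape is a hypothetical helper that models only shapes, not types; the rank check is our reading of "all dimensions must match".

```python
def appended_shape(shape, extdim, seq_shape):
    """Shape after appending data of seq_shape along dimension extdim."""
    shape, seq_shape = tuple(shape), tuple(seq_shape)
    if len(seq_shape) != len(shape):
        raise ValueError("the ranks of the sequence and the array differ")
    for dim, (have, new) in enumerate(zip(shape, seq_shape)):
        if dim != extdim and have != new:
            raise ValueError(f"dimension {dim} must be {have}, got {new}")
    grown = list(shape)
    grown[extdim] += seq_shape[extdim]  # only the enlargeable dimension grows
    return tuple(grown)


# As in earray1.py: a (0,) array, then 2 and 3 rows appended.
s = appended_shape((0,), 0, (2,))
s = appended_shape(s, 0, (3,))
print(s)  # (5,)
```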
This class represents variable length (ragged) arrays in an HDF5 file.
Instances of this class represent array objects in the object tree with the property that their rows can have a variable number of homogeneous elements, called atoms. Like Table datasets (see The Table class), variable length arrays can have only one dimension, and the elements (atoms) of their rows can be fully multidimensional. VLArray objects also support compression.
When reading a range of rows from a VLArray, you will always get a Python list of objects of the current flavor (each of them for a row), which may have different lengths.
This class provides methods to write or read data to or from variable length array objects in the file. Note that it also inherits all the public attributes and methods that Leaf (see The Leaf class) already provides.
Parameters:
parentnode
name : str
atom
title
filters
expectedrows
chunkshape
byteorder
Changed in version 3.0.
Examples
See below a small example of the use of the VLArray class. The code is available in examples/vlarray1.py:
import numpy as np
import tables
# Create a VLArray:
fileh = tables.open_file('vlarray1.h5', mode='w')
vlarray = fileh.create_vlarray(fileh.root, 'vlarray1',
                               tables.Int32Atom(shape=()),
                               "ragged array of ints",
                               filters=tables.Filters(1))
# Append some (variable length) rows:
vlarray.append(np.array([5, 6]))
vlarray.append(np.array([5, 6, 7]))
vlarray.append([5, 6, 9, 8])
# Now, read it through an iterator:
print('-->', vlarray.title)
for x in vlarray:
    print('%s[%d]--> %s' % (vlarray.name, vlarray.nrow, x))
# Now, do the same with native Python strings.
vlarray2 = fileh.create_vlarray(fileh.root, 'vlarray2',
                                tables.StringAtom(itemsize=2),
                                "ragged array of strings",
                                filters=tables.Filters(1))
vlarray2.flavor = 'python'
# Append some (variable length) rows:
print('-->', vlarray2.title)
vlarray2.append(['5', '66'])
vlarray2.append(['5', '6', '77'])
vlarray2.append(['5', '6', '9', '88'])
# Now, read it through an iterator:
for x in vlarray2:
    # Under Python 3 each element is bytes; decode them for display.
    print('%s[%d]--> %s' % (vlarray2.name, vlarray2.nrow,
                           [s.decode() for s in x]))
# Close the file.
fileh.close()
The output for the previous script is something like:
--> ragged array of ints
vlarray1[0]--> [5 6]
vlarray1[1]--> [5 6 7]
vlarray1[2]--> [5 6 9 8]
--> ragged array of strings
vlarray2[0]--> ['5', '66']
vlarray2[1]--> ['5', '6', '77']
vlarray2[2]--> ['5', '6', '9', '88']
VLArray attributes
The instance variables below are provided in addition to those in Leaf (see The Leaf class).
An Atom (see The Atom class and its descendants) instance representing the type and shape of the atomic objects to be saved. You may use a pseudo-atom for storing a serialized object or variable length string per row.
The type of data object read from this leaf.
Please note that when reading several rows of VLArray data, the flavor only applies to the components of the returned Python list, not to the list itself.
On iterators, this is the index of the current row.
The current number of rows in the array.
The index of the enlargeable dimension (always 0 for vlarrays).
The HDF5 library does not include a function to determine size_on_disk for variable-length arrays. Accessing this attribute will raise a NotImplementedError.
The size of this array’s data in bytes when it is fully loaded into memory.
Note
When data is stored in a VLArray using the ObjectAtom type, it is first serialized using pickle, and then converted to a NumPy array suitable for storage in an HDF5 file. This attribute will return the size of that NumPy representation. If you wish to know the size of the Python objects after they are loaded from disk, you can use this ActiveState recipe.
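The size reported for ObjectAtom data is the size of the serialized bytes, not of the live Python objects. A minimal stdlib sketch of that accounting follows. object_rows_nbytes is a hypothetical helper, and it assumes each row is pickled with pickle.HIGHEST_PROTOCOL and stored one byte per element; the protocol PyTables actually uses is an assumption here.

```python
import pickle


def object_rows_nbytes(rows):
    """Sum of the pickled sizes of the row objects.

    Assumption: one pickled byte string per row, produced with
    pickle.HIGHEST_PROTOCOL and stored one byte per element.
    """
    return sum(len(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)) for obj in rows)


rows = [{"a": 1}, [1, 2, 3], "text"]
print(object_rows_nbytes(rows))
```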
Add a sequence of data to the end of the dataset.
This method appends the objects in the sequence to a single row in this array. The type and shape of individual objects must be compliant with the atoms in the array. In the case of serialized objects and variable length strings, the object or string to append is itself the sequence.
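The "one append() call, one row" behaviour, together with read() always returning a list of rows, can be modelled with an in-memory toy. ToyVLArray is a hypothetical class for illustration only; it does not model atoms, flavors, pseudo-atoms or storage.

```python
class ToyVLArray:
    """Hypothetical in-memory model of VLArray row semantics."""

    def __init__(self):
        self._rows = []

    @property
    def nrows(self):
        return len(self._rows)

    def append(self, sequence):
        # The whole sequence becomes ONE row, whatever its length.
        self._rows.append(list(sequence))

    def read(self, start=None, stop=None, step=None):
        # Always returns a list of rows, even when one row is selected.
        if start is not None and stop is None and step is None:
            stop = None if start == -1 else start + 1  # single-row rule
        return self._rows[slice(start, stop, step)]


v = ToyVLArray()
v.append([5, 6])
v.append([5, 6, 7])
v.append([5, 6, 9, 8])
print(v.nrows)    # 3
print(v.read(1))  # [[5, 6, 7]]
```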
Get the enumerated type associated with this array.
If this array is of an enumerated type, the corresponding Enum instance (see The Enum class) is returned. If it is not of an enumerated type, a TypeError is raised.
Iterate over the rows of the array.
This method returns an iterator yielding an object of the current flavor for each selected row in the array.
If a range is not supplied, all the rows in the array are iterated upon. You can also use the VLArray.__iter__() special method for that purpose. If you only want to iterate over a given range of rows in the array, you may use the start, stop and step parameters.
Examples
for row in vlarray.iterrows(step=4):
    print('%s[%d]--> %s' % (vlarray.name, vlarray.nrow, row))
Changed in version 3.0: If the start parameter is provided and stop is None then the array is iterated from start to the last line. In PyTables < 3.0 only one element was returned.
Get the next element of the array during an iteration.
The element is returned as a list of objects of the current flavor.
Get data in the array as a list of objects of the current flavor.
Please note that, as the lengths of the different rows are variable, the returned value is a Python list (not an array of the current flavor), with as many entries as specified rows in the range parameters.
The start, stop and step parameters can be used to select only a range of rows in the array. Their meanings are the same as in the built-in range() Python function, except that negative values of step are not allowed yet. Moreover, if only start is specified, then stop will be set to start + 1. If you specify neither start nor stop, then all the rows in the array are selected.
The following methods automatically trigger actions when a VLArray instance is accessed in a special way (e.g., vlarray[2:5] will be equivalent to a call to vlarray.__getitem__(slice(2, 5, None))).
Get a row or a range of rows from the array.
If key argument is an integer, the corresponding array row is returned as an object of the current flavor. If key is a slice, the range of rows determined by it is returned as a list of objects of the current flavor.
In addition, NumPy-style point selections are supported. In particular, if key is a list of row coordinates, the set of rows determined by it is returned. Furthermore, if key is an array of boolean values, only the rows whose coordinates are True in key are returned. Note that for the latter to work, key must contain exactly as many elements as the array has rows.
Examples
a_row = vlarray[4]
a_list = vlarray[4:1000:2]
a_list2 = vlarray[[0,2]] # get list of coords
a_list3 = vlarray[[0,-2]] # negative values accepted
a_list4 = vlarray[numpy.array([True,...,False])] # array of bools
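How a point or boolean key turns into row coordinates can be sketched in plain Python. point_coords is a hypothetical helper working on plain lists only (NumPy boolean arrays hold numpy.bool_ values, not bool, and are not handled); the exception type raised for a wrongly sized mask is an assumption.

```python
def point_coords(nrows, key):
    """Row coordinates selected by a list of ints or a list of bools."""
    key = list(key)
    if key and all(isinstance(k, bool) for k in key):
        if len(key) != nrows:
            # Assumption: the exception PyTables raises may differ.
            raise IndexError("boolean key must have one entry per row")
        return [i for i, keep in enumerate(key) if keep]
    coords = []
    for k in key:
        if not -nrows <= k < nrows:
            raise IndexError(f"row {k} out of range")
        coords.append(k + nrows if k < 0 else k)  # negatives count from the end
    return coords


print(point_coords(5, [0, -2]))                            # [0, 3]
print(point_coords(5, [True, False, True, False, False]))  # [0, 2]
```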
Iterate over the rows of the array.
This is equivalent to calling VLArray.iterrows() with default arguments, i.e. it iterates over all the rows in the array.
Examples
result = [row for row in vlarray]
Which is equivalent to:
result = [row for row in vlarray.iterrows()]
Set a row, or set of rows, in the array.
It takes different actions depending on the type of the key parameter: if it is an integer, the corresponding array row is set to value (a sequence capable of being converted to the array’s atom). If key is a slice, the row slice determined by it is set to value (a sequence of rows capable of being converted to the array’s atom).
In addition, NumPy-style point selections are supported. In particular, if key is a list of row coordinates, the set of rows determined by it is set to value. Furthermore, if key is an array of boolean values, only the coordinates where key is True are set to values from value. Note that for the latter to work, key must contain exactly as many elements as the array has rows.
Note
When updating the rows of a VLArray object that uses a pseudo-atom, there is a problem: you can only update values with exactly the same size in bytes as the original row. This is very difficult to achieve with object pseudo-atoms, because pickle does not guarantee that different Python objects serialize to the same number of bytes, even if they are of the same class. This effectively limits the kinds of objects that can be updated in variable-length arrays.
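The size restriction is easy to observe with the stdlib pickle module: two lists of the same class and length can pickle to the same number of bytes, while one extra element changes the size. same_pickled_size is a hypothetical helper; pickling with pickle.HIGHEST_PROTOCOL is an assumption about what the object pseudo-atom does.

```python
import pickle


def same_pickled_size(old, new):
    """True if new pickles to exactly as many bytes as old."""
    p = pickle.HIGHEST_PROTOCOL
    return len(pickle.dumps(old, p)) == len(pickle.dumps(new, p))


print(same_pickled_size([1, 2], [3, 4]))     # True
print(same_pickled_size([1, 2], [1, 2, 3]))  # False
```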
Examples
vlarray[0] = vlarray[0] * 2 + 3
vlarray[99] = numpy.arange(96) * 2 + 3
# Negative values for the index are supported.
vlarray[-99] = vlarray[5] * 2 + 3
vlarray[1:30:2] = list_of_rows
vlarray[[1,3]] = new_1_and_3_rows