Dear R community, The next release of package ff is available on CRAN. With kind help of Brian Ripley it now supports the Win64 and Sun versions of R. It has three major functional enhancements: a) new fast in-memory sorting and ordering functions (single-threaded) b) ff now supports on-disk sorting and ordering of ff vectors and ffdf dataframes c) ff integer vectors now can be used as subscripts of ff vectors and ffdf dataframes a) is achieved by careful implementation of NA-handling and exploiting context information b) although permanently stored, sorting and ordering of ff objects can be faster than the standard routines in R c) applying an order to ff vectors and ffdf dataframes is substantially slower than in pure R because it involves disk-access AND sorting index positions (to avoid random access). There is still room for improvement, however, the current status should already be useful. I run some comparisons with SAS: - both could sort German census size (81e6 rows) on a 3GB notebook - ff sorts and orders faster on single columns - sorting big multicolumn-tables is faster in SAS Win64 binaries and version 2.2.1 supporting Sun should appear during the next days on CRAN. For the impatient: checkout from r-forge with revision 67 or higher. Non-Windows users: please note that you need to set appropriate values for options 'ffbatchbytes' and 'ffmaxbytes' yourself. Note that virtual window support is deprecated now because it leads to too complex code. Let us know if you urgently need this and why. Feedback, ideas and contributions appreciated. To those who offered code during the last months: please forgive us that integrating and documenting was not possible with this release. Jens & Daniel P.S. Below are some timings in seconds at 3e6, 9e6, 27e6 and 81e6 elements from a Lenovo 410s notebook (3GB RAM, i5 m520, 2 real cores, 4 hyperthreaded cores, SSD drive, Windows7 32bit) Legend for software ram: new in-ram inplace operations receiving enough RAM to optimize for speed, not for memory ff: new on-disk operations limiting RAM for this operation at ~500GB R: timings from standard sort() and order() SAS: timings from SAS 9.2 allowing for multithreaded sorting Legend for type of random data rboolean: bi-boolean with 50% FALSE and TRUE rlogical: tri-boolean with 33% NA, FALSE and TRUE rubyte: integers from 0..255 rbyte: 33% NA and 67% -127..127 rushort: integers from 0..65535 rshort: 33% NA and 67% -32767..32767 ruinteger: 50% NA and 50% integers rinteger: random integers rusingle: 50% NA and 50% singles rsingle: random singles rudouble: 50% NA and 50% doubles rdouble: doubles rfactor: factor with 64 levels of length 66 (being different at bytes 65 and 66) rchar: 64 strings of length 66 (being different at bytes 65 and 66) Legend for abbreviations OOM: out of memory OOD: out of disk NT: not timed because too slow NA: not available Results for sorting a single column ===================================== , , 3e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.02 0.03 0.02 0.04 0.02 0.02 0.17 0.11 0.66 0.36 0.66 0.36 0.03 NA ff 0.25 0.33 0.22 0.25 0.28 0.26 0.38 0.30 1.02 0.65 0.92 0.67 0.39 NA R NA 0.35 NA NA NA NA 0.83 0.54 NA NA 1.28 0.90 64.83 51.20 SAS NA NA NA NA NA NA 1.61 1.32 NA NA 1.57 1.29 NA 17.01 , , 9e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.04 0.07 0.03 0.08 0.03 0.07 0.50 0.31 1.88 0.97 1.87 0.97 0.04 NA ff 0.72 0.93 0.61 0.73 0.84 0.75 1.08 0.86 2.68 1.62 2.57 1.67 0.78 NA R NA 0.90 NA NA NA NA 2.84 1.78 NA NA 3.51 2.12 NA NT SAS NA NA NA NA NA NA 4.99 3.90 NA NA 4.91 4.48 NA 62.76 , , 27e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.10 0.24 0.09 0.23 0.11 0.23 1.58 1.00 6.06 3.15 6.00 3.23 0.16 NA ff 2.19 2.98 1.92 2.21 2.56 2.31 3.22 2.68 8.49 5.18 8.10 5.35 2.58 NA R NA 2.72 NA NA NA NA 9.69 5.80 NA NA 12.34 6.97 NA NT SAS NA NA NA NA NA NA 17.02 12.67 NA NA 17.05 14.07 NA 176.63 , , 81e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.27 0.67 0.28 0.67 0.33 0.72 5.58 3.23 NA NA NA NA 0.49 NA ff 6.56 9.06 5.93 6.88 8.52 7.15 10.70 8.54 51.35 28.98 70.20 44.13 7.91 NA R OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM SAS NA NA NA NA NA NA 61.45 44.94 NA NA 63.14 46.56 NA OOD Results for calculating the order on a single column ==================================================== , , 3e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.05 0.07 0.04 0.07 0.09 0.11 0.92 0.53 1.46 0.81 1.31 0.64 0.06 NA ff 0.14 0.19 0.77 0.58 0.87 0.67 1.04 0.60 1.66 0.81 1.43 0.85 0.74 NA R NA 3.23 NA NA NA NA 4.57 4.07 NA NA 5.27 4.61 4.59 193.75 SAS NA NA NA NA NA NA 1.86 1.48 NA NA 1.63 1.39 NA 16.83 , , 9e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.16 0.21 0.17 0.20 0.30 0.28 3.07 1.61 4.24 2.16 4.22 2.19 0.19 NA ff 0.48 0.51 2.45 1.84 2.91 2.15 3.38 1.92 4.72 2.48 4.54 2.45 1.91 NA R NA 12.31 NA NA NA NA 17.02 15.56 NA NA 16.96 15.47 NT NT SAS NA NA NA NA NA NA 6.71 5.97 NA NA 6.25 5.41 NA 59.27 , , 27e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 0.51 0.67 0.5 0.69 0.92 0.94 9.89 5.31 15.13 7.69 15.15 7.70 0.58 NA ff 1.33 1.51 7.6 5.77 9.25 6.79 10.72 6.12 15.98 8.53 15.96 8.92 5.80 NA R NA 46.37 NA NA NA NA 65.57 59.17 NA NA 63.74 58.37 NT NT SAS NA NA NA NA NA NA 21.41 18.77 NA NA 20.22 18.84 NA 182.74 , , 81e6 rboolean rlogical rubyte rbyte rushort rshort ruinteger rinteger rusingle rsingle rudouble rdouble rfactor rchar ram 1.49 2.03 1.5 2.06 3.15 2.98 34.33 17.89 NA NA NA NA 1.90 NT ff 3.98 4.65 22.9 17.42 30.33 21.82 36.68 20.36 77.16 49.55 125.01 59.27 17.39 NT R OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM OOM SAS NA NA NA NA NA NA 86.24 70.32 NA NA 84.40 68.66 NA NA Results for sorting all columns of a table with m columns of random double data (without NAs) ============================================================================================= , , 3e6 ncol 1 2 5 10 20 SAS 1.65 1.83 3.71 6.90 14.06 ff 1.97 2.37 3.75 6.21 10.86 R 4.70 5.67 5.65 6.46 8.06 , , 9e6 ncol 1 2 5 10 20 SAS 5.18 6.70 14.02 19.25 41.65 ff 6.38 7.96 12.12 19.58 45.43 R 18.86 19.20 20.58 OOM OOM , , 27e6 ncol 1 2 5 10 20 SAS 17.79 19.52 35.03 83.30 142.09 ff 22.68 25.79 46.25 87.55 157.62 R 65.56 OOM OOM OOM OOM , , 81e6 ncol 1 2 5 10 20 SAS 64.78 83.39 143.59 242.23 408.72 ff 167.52 220.03 324.03 502.42 884.03 R OOM OOM OOM OOM OOM