# README to parallel-string-sorting This source code archive contains our test framework for parallel and sequential string sorting algorithms, as described in our technical report and paper "Parallel String Sample Sort". See the project web page <http://panthema.net/2013/parallel-string-sorting/> for more information. ## Source Code Overview The main algorithm implementations are in the directory "`src/*/`", with only "`src/tools/`" containing auxiliary headers. `src/sequential/` is a collection of old and new sequential string sorting algorithms, including the original multikey-quicksort from Bentley and Sedgewick, various older radix sorts from McIlroy, Bostic, and McIlroy, as well as Stefan Nilsson. The CRadix and LCPMergesort algorithms by Waihong Ng and Ranjan Sinha's original array and list burstsorts are also included. In the `src/sequential/` directory we also included sequential versions of super scalar string sample sort (S^5), which were developed to test the algorithms before parallelizing them. Tommi Rantala's huge library of original string sorting implementation is located in `src/rantala/`, including extra tools. Ranjan Sinha's copy-burstsort variants are located in `src/sinha-copy-burstsort`. These are heavily modified and connected to the framework via the glue.cc source file. All parallel string sorting implementations, including our fastest one, **pS^5**, is located in `src/parallel/`. The new algorithm with highest speedups "pS^5-Unroll" is located in `src/parallel/bingmann-parallel_sample_sortBTC.cc` Only one binary program `psstest` is built when compiling with default options, which contains all implementations in the collection. The main source code file `psstest.cc` includes most simple algorithms via header files. Further implementations are separated either via C++ namespaces or individual compilation units (`.o`s). The framework combines all algorithms into a "Contest" using the classes in `src/tools/contest.h`. An implementation registers a sorting function via macros. All registered implementation can be selected using the "`-a`" command line switch of psstest. Psstest is used to run speed tests on the various input instances and automatically checks the calculated result. ## Using psstest Psstest is the main program used to run, check and measure performance of the string sorting algorithms. Running the program without arguments will print a command line help. The algorithms implementation are selected using the "`-a`" switch. Run `-a list` for a (long) list of string sorting algorithms. For example `-a mkqs` will select all algorithms *containing* "mkqs". You can use `-a` multiple times to select different sets (multiple `-a` are logically inclusive). The input is selected by giving a file name or artificial random source name. Available random inputs are "`randomASCII`", "`random10`", "`random4`" and "`random255`", where the number specifies the alphabet size and ASCII is described in our paper. The program will automatically decompress files ending in "`.gz`", "`.bz2`", "`.xz`" and "`.lzo`" by spawning the appropriate decompressors as a child program. However, the *decompressed file size* must be added to the file name for this to be efficient: you must rename a compressed input file like `test.gz` to `test.12345.gz`, where 12345 is the decompressed file size! One can *limit* the input size (number of bytes) using `-s <size>`, where size can be expressed with suffixes like "512mb". The input strings can be saved using `--input` and the sorted output is saved with `--output` (otherwise it is discarded). For parallel string sorters, the number of threads is specified by one of the arguments: `--threads`, `--some-threads`, `--all-threads` or `--thread-list <#>`. Check the command line help for details. To isolate multiple algorithms runs, use the `--fork` argument, which will load data only once, or the `--datafork` parameter to reload the input in each forked child program. These can be combined with `--timeout` to abort a child program after the specified time. ## Building psstest To build the program a recent gcc C++ compiler, "cmake" version 2.8 or higher, and the GNU GMP library are required. Having "parallel-string-sorting-X.Y.Z.tar" unpacked in the current directory, the following commands will compile and run the psstest program: ``` $ mkdir build $ cd build $ cmake -DCMAKE_BUILD_TYPE=Release .. $ make $ cd src/ $ ./psstest ``` If the cmake configuration routine does not finish successfully, you may need to install some development packages via your Linux distribution's package manager. Here is a simple example to sort random number strings (10 MiB total chars) using pS^5-Unroll, and write them unsorted and sorted to files: ``` $ ./psstest -a parallel_sample_sortBTCU2 -s 10mb -i unsorted.txt -o sorted.txt random10 ``` ## Further Information We have collected source code, papers, some test instances, and further information at <http://panthema.net/2013/parallel-string-sorting/> ## Exits Written 2013-05-02 by Timo Bingmann ## Appendix: List of String Sorting Implementations ``` Running parallel-string-sorting test on i10host Called as ./psstest -a list Available string sorting algorithms: akiba/parallel_radix_sort Parallel MSD radix sort by Takuya Akiba bingmann/msd_CE bingmann/msd_CE (rantala CE original) bingmann/msd_CE2 bingmann/msd_CE2 (CE with reused prefix sum) bingmann/msd_CE3 bingmann/msd_CE3 (CE2 with iterators) bingmann/msd_CE_nr bingmann/msd_CE_nr (CE non-recursive) bingmann/msd_CE_nr2 bingmann/msd_CE_nr2 (CE non-recursive) bingmann/msd_CI bingmann/msd_CI (rantala CI original with oracle) bingmann/msd_CI2 bingmann/msd_CI2 (CI without oracle) bingmann/msd_CI3 bingmann/msd_CI3 (CI2 with swap operations) bingmann/msd_CI4 bingmann/msd_CI4 (CI3 with swap cache) bingmann/msd_CI5 bingmann/msd_CI5 (CI4 with charcache) bingmann/msd_CI5_16bit bingmann/msd_CI5_16bit (CI5 with 16-bit radix) bingmann/msd_CI_nr bingmann/msd_CI_nr (CI non-recursive) bingmann/msd_CI_nr2 bingmann/msd_CI_nr2 (CI non-recursive) bingmann/msd_CI_nr3 bingmann/msd_CI_nr3 (CI non-recursive, charcache) bingmann/parallel_mkqs Parallel MKQS with blocks and cache8 bingmann/parallel_radix_sort_16bit Parallel MSD Radix sort with load balancing, 16-bit BigSorts bingmann/parallel_radix_sort_8bit Parallel MSD Radix sort with load balancing, 8-bit BigSorts bingmann/parallel_sample_sortBSC bingmann/parallel_sample_sortBSC: binary search, bktcache bingmann/parallel_sample_sortBTC bingmann/parallel_sample_sortBTC: binary tree, bktcache bingmann/parallel_sample_sortBTCE bingmann/parallel_sample_sortBTCE: binary tree, equality, bktcache bingmann/parallel_sample_sortBTCEU1 bingmann/parallel_sample_sortBTCEU1: binary tree, equality, bktcache, unroll tree bingmann/parallel_sample_sortBTCU1 bingmann/parallel_sample_sortBTCU1: binary tree, bktcache, unroll tree bingmann/parallel_sample_sortBTCU2 bingmann/parallel_sample_sortBTCU2: binary tree, bktcache, unroll tree and strings bingmann/sample_sortBS bingmann/sample_sortBS (binary search, no cache) bingmann/sample_sortBSC bingmann/sample_sortBSC (binary search, bkt cache) bingmann/sample_sortBSCA bingmann/sample_sortBSCA (binary search, assembler CMOV, bkt cache) bingmann/sample_sortBSCE bingmann/sample_sortBSCE (binary search equal, bkt cache) bingmann/sample_sortBSCEA bingmann/sample_sortBSCEA (binary search equal, assembler CMOV, bkt cache) bingmann/sample_sortBT bingmann/sample_sortBT (binary tree, no cache) bingmann/sample_sortBTC bingmann/sample_sortBTC (binary tree, bkt cache) bingmann/sample_sortBTCA bingmann/sample_sortBTCA (binary tree, asm CMOV, bkt cache) bingmann/sample_sortBTCE2 bingmann/sample_sortBTCE2 (binary tree equal, bkt cache) bingmann/sample_sortBTCE2A bingmann/sample_sortBTCE2A (binary tree equal, asm CMOV, bkt cache) bingmann/sample_sortBTCE2U bingmann/sample_sortBTCE2U (binary tree equal unroll, asm CMOV, bkt cache) bingmann/sample_sortBTCE3 bingmann/sample_sortBTCE3 (adapt binary tree equal, bkt cache) bingmann/sample_sortBTCE3A bingmann/sample_sortBTCE3A (adapt binary tree equal, asm CMOV, bkt cache) bingmann/sample_sortBTCE3U bingmann/sample_sortBTCE3U (adapt binary tree equal unroll, asm CMOV, bkt cache) bingmann/sample_sortBTCU bingmann/sample_sortBTCU (binary tree, unrolled, bkt cache) bingmann/sample_sortRBTCE bingmann/sample_sortRBTCE (adapt binary tree equal, bkt cache) bingmann/sample_sortRBTCEA bingmann/sample_sortRBTCEA (adapt binary tree equal, asm CMOV, bkt cache) bingmann/sequential_mkqs_cache8 multikey_cache with 8byte cache (non-recursive) bs/mkqsort bs_mkqs Original Multikey-Quicksort mbm/radixsort MSD Radix Sort by P. M. McIlroy, K. Bostic, and M. D. McIlroy ng/cradix CRadix Original by Waihong Ng and Katsuhiko Kakehi ng/lcpmergesort LCP-Mergesort Original by Waihong Ng and Katsuhiko Kakehi ng/rantala_cradix CRadix by Waihong Ng and Katsuhiko Kakehi modified by Rantala nilsson/adaptive_msd Adaptive MSD Radix Sort by Stefan Nilsson nilsson/forward16 Forward Radix Sort 16-bit by Stefan Nilsson nilsson/forward8 Forward Radix Sort 8-bit by Stefan Nilsson rantala/burstsort2_bagwell burstsort2 with vector_bagwell bucket type rantala/burstsort2_brodnik burstsort2 with vector_brodnik bucket type rantala/burstsort2_sampling_bagwell burstsort2 sampling with vector_bagwell bucket type rantala/burstsort2_sampling_brodnik burstsort2 sampling with vector_brodnik bucket type rantala/burstsort2_sampling_superalphabet_bagwell burstsort2 sampling superalphabet with vector_bagwell bucket type rantala/burstsort2_sampling_superalphabet_brodnik burstsort2 sampling superalphabet with vector_brodnik bucket type rantala/burstsort2_sampling_superalphabet_vector burstsort2 sampling superalphabet with std::vector bucket type rantala/burstsort2_sampling_superalphabet_vector_block burstsort2 sampling superalphabet with vector_block bucket type rantala/burstsort2_sampling_vector burstsort2 sampling with std::vector bucket type rantala/burstsort2_sampling_vector_block burstsort2 sampling with vector_block bucket type rantala/burstsort2_superalphabet_bagwell burstsort2 superalphabet with vector_bagwell bucket type rantala/burstsort2_superalphabet_brodnik burstsort2 superalphabet with vector_brodnik bucket type rantala/burstsort2_superalphabet_vector burstsort2 superalphabet with std::vector bucket type rantala/burstsort2_superalphabet_vector_block burstsort2 superalphabet with vector_block bucket type rantala/burstsort2_vector burstsort2 with std::vector bucket type rantala/burstsort2_vector_block burstsort2 with vector_block bucket type rantala/burstsort_bagwell burstsort with vector_bagwell bucket type rantala/burstsort_brodnik burstsort with vector_brodnik bucket type rantala/burstsort_mkq_recursiveburst_1 burstsort_mkq 1byte alphabet with recursiveburst rantala/burstsort_mkq_recursiveburst_2 burstsort_mkq 2byte alphabet with recursiveburst rantala/burstsort_mkq_recursiveburst_4 burstsort_mkq 4byte alphabet with recursiveburst rantala/burstsort_mkq_simpleburst_1 burstsort_mkq 1byte alphabet with simpleburst rantala/burstsort_mkq_simpleburst_2 burstsort_mkq 2byte alphabet with simpleburst rantala/burstsort_mkq_simpleburst_4 burstsort_mkq 4byte alphabet with simpleburst rantala/burstsort_sampling_bagwell burstsort sampling with vector_bagwell bucket type rantala/burstsort_sampling_brodnik burstsort sampling with vector_brodnik bucket type rantala/burstsort_sampling_superalphabet_bagwell burstsort sampling superalphabet with vector_bagwell bucket type rantala/burstsort_sampling_superalphabet_brodnik burstsort sampling superalphabet with vector_brodnik bucket type rantala/burstsort_sampling_superalphabet_vector burstsort sampling superalphabet with std::vector bucket type rantala/burstsort_sampling_superalphabet_vector_block burstsort sampling superalphabet with vector_block bucket type rantala/burstsort_sampling_vector burstsort sampling with std::vector bucket type rantala/burstsort_sampling_vector_block burstsort sampling with vector_block bucket type rantala/burstsort_superalphabet_bagwell burstsort superalphabet with vector_bagwell bucket type rantala/burstsort_superalphabet_brodnik burstsort superalphabet with vector_brodnik bucket type rantala/burstsort_superalphabet_vector burstsort superalphabet with std::vector bucket type rantala/burstsort_superalphabet_vector_block burstsort superalphabet with vector_block bucket type rantala/burstsort_vector burstsort with std::vector bucket type rantala/burstsort_vector_block burstsort with vector_block bucket type rantala/funnelsort_128way_bfs funnelsort 128way bfs rantala/funnelsort_128way_dfs funnelsort 128way dfs rantala/funnelsort_16way_bfs funnelsort 16way bfs rantala/funnelsort_16way_dfs funnelsort 16way dfs rantala/funnelsort_32way_bfs funnelsort 32way bfs rantala/funnelsort_32way_dfs funnelsort 32way dfs rantala/funnelsort_64way_bfs funnelsort 64way bfs rantala/funnelsort_64way_dfs funnelsort 64way dfs rantala/funnelsort_8way_bfs funnelsort 8way bfs rantala/funnelsort_8way_dfs funnelsort 8way dfs rantala/mergesort_2way mergesort with 2way merger rantala/mergesort_2way_parallel mergesort_parallel with 2way merger rantala/mergesort_2way_unstable mergesort 2way unstable rantala/mergesort_3way mergesort with 3way merger rantala/mergesort_3way_parallel mergesort_parallel with 3way merger rantala/mergesort_3way_unstable mergesort 3way unstable rantala/mergesort_4way mergesort with 4way merger rantala/mergesort_4way_parallel mergesort_parallel with 4way merger rantala/mergesort_4way_unstable mergesort 4way unstable rantala/mergesort_cache1_lcp_2way mergesort LCP with 2way merger and 1byte cache rantala/mergesort_cache1_lcp_2way_parallel mergesort_parallel LCP with 2way merger and 1byte cache rantala/mergesort_cache2_lcp_2way mergesort LCP with 2way merger and 2byte cache rantala/mergesort_cache2_lcp_2way_parallel mergesort_parallel LCP with 2way merger and 2byte cache rantala/mergesort_cache4_lcp_2way mergesort LCP with 2way merger and 4byte cache rantala/mergesort_cache4_lcp_2way_parallel mergesort_parallel LCP with 2way merger and 4byte cache rantala/mergesort_lcp_2way mergesort LCP with 2way merger rantala/mergesort_lcp_2way_parallel mergesort_parallel LCP with 2way merger rantala/mergesort_lcp_2way_unstable mergesort Unstable LCP with 2way merger rantala/mergesort_lcp_2way_unstable_parallel mergesort_parallel unstable LCP with 2way merger rantala/mergesort_losertree_1024way mergesort 1024way loser tree based rantala/mergesort_losertree_1024way_parallel mergesort parallel 1024way loser tree based rantala/mergesort_losertree_128way mergesort 128way loser tree based rantala/mergesort_losertree_128way_parallel mergesort parallel 128way loser tree based rantala/mergesort_losertree_256way mergesort 256way loser tree based rantala/mergesort_losertree_256way_parallel mergesort parallel 256way loser tree based rantala/mergesort_losertree_512way mergesort 512way loser tree based rantala/mergesort_losertree_512way_parallel mergesort parallel 512way loser tree based rantala/mergesort_losertree_64way mergesort 64way loser tree based rantala/mergesort_losertree_64way_parallel mergesort parallel 64way loser tree based rantala/msd_A msd_A rantala/msd_A2 msd_A2 rantala/msd_A2_adaptive msd_A2_adaptive rantala/msd_A_adaptive msd_A_adaptive rantala/msd_CE0 msd_CE0: baseline rantala/msd_CE1 msd_CE1: oracle rantala/msd_CE2 msd_CE2: oracle+loop fission rantala/msd_CE3 msd_CE3: oracle+loop fission+adaptive rantala/msd_CE4 msd_CE4: oracle+loop fission+adaptive+16bit counter rantala/msd_CE5 msd_CE5: oracle+loop fission+adaptive+16bit counter+prealloc rantala/msd_CE6 msd_CE6: oracle+loop fission+adaptive+16bit counter+prealloc+unroll rantala/msd_CE7 msd_CE7: oracle+loop fission+adaptive+16bit counter+prealloc+unroll+sortedness rantala/msd_CI msd_CI rantala/msd_CI_adaptive msd_CI: adaptive rantala/msd_DB msd_DB rantala/msd_D_std_deque msd_D_std_deque rantala/msd_D_std_deque_adaptive msd_D_std_deque_adaptive rantala/msd_D_std_list msd_D_std_list rantala/msd_D_std_list_adaptive msd_D_std_list_adaptive rantala/msd_D_std_vector msd_D_std_vector rantala/msd_D_std_vector_adaptive msd_D_std_vector_adaptive rantala/msd_D_vector_bagwell msd_D_vector_bagwell rantala/msd_D_vector_bagwell_adaptive msd_D_vector_bagwell_adaptive rantala/msd_D_vector_block msd_D_vector_block rantala/msd_D_vector_block_adaptive msd_D_vector_block_adaptive rantala/msd_D_vector_brodnik msd_D_vector_brodnik rantala/msd_D_vector_brodnik_adaptive msd_D_vector_brodnik_adaptive rantala/msd_D_vector_malloc msd_D_vector_malloc rantala/msd_D_vector_malloc_adaptive msd_D_vector_malloc_adaptive rantala/msd_D_vector_malloc_counter_clear msd_D_vector_malloc_counter_clear rantala/msd_D_vector_malloc_counter_clear_adaptive msd_D_vector_malloc_counter_clear_adaptive rantala/msd_D_vector_realloc msd_D_vector_realloc rantala/msd_D_vector_realloc_adaptive msd_D_vector_realloc_adaptive rantala/msd_D_vector_realloc_counter_clear msd_D_vector_realloc_counter_clear rantala/msd_D_vector_realloc_counter_clear_adaptive msd_D_vector_realloc_counter_clear_adaptive rantala/msd_D_vector_realloc_shrink_clear msd_D_vector_realloc_shrink_clear rantala/msd_D_vector_realloc_shrink_clear_adaptive msd_D_vector_realloc_shrink_clear_adaptive rantala/multikey_block1 multikey_block with 1byte alphabet rantala/multikey_block2 multikey_block with 2byte alphabet rantala/multikey_block4 multikey_block with 4byte alphabet rantala/multikey_cache4 multikey_cache with 4byte cache rantala/multikey_cache8 multikey_cache with 8byte cache rantala/multikey_dynamic_bagwell1 multikey_dynamic with vector_bagwell bucket type and 1byte alphabet rantala/multikey_dynamic_bagwell2 multikey_dynamic with vector_bagwell bucket type and 2byte alphabet rantala/multikey_dynamic_bagwell4 multikey_dynamic with vector_bagwell bucket type and 4byte alphabet rantala/multikey_dynamic_brodnik1 multikey_dynamic with vector_brodnik bucket type and 1byte alphabet rantala/multikey_dynamic_brodnik2 multikey_dynamic with vector_brodnik bucket type and 2byte alphabet rantala/multikey_dynamic_brodnik4 multikey_dynamic with vector_brodnik bucket type and 4byte alphabet rantala/multikey_dynamic_vector1 multikey_dynamic with std::vector bucket type and 1byte alphabet rantala/multikey_dynamic_vector2 multikey_dynamic with std::vector bucket type and 2byte alphabet rantala/multikey_dynamic_vector4 multikey_dynamic with std::vector bucket type and 4byte alphabet rantala/multikey_dynamic_vector_block1 multikey_dynamic with vector_block bucket type and 1byte alphabet rantala/multikey_dynamic_vector_block2 multikey_dynamic with vector_block bucket type and 2byte alphabet rantala/multikey_dynamic_vector_block4 multikey_dynamic with vector_block bucket type and 4byte alphabet rantala/multikey_multipivot_brute_simd1 multikey_multipivot brute_simd with 1byte alphabet rantala/multikey_multipivot_brute_simd2 multikey_multipivot brute_simd with 2byte alphabet rantala/multikey_multipivot_brute_simd4 multikey_multipivot brute_simd with 4byte alphabet rantala/multikey_simd1 multikey_simd with 1byte alphabet rantala/multikey_simd2 multikey_simd with 2byte alphabet rantala/multikey_simd4 multikey_simd with 4byte alphabet rantala/multikey_simd_parallel1 multikey_simd_parallel with 1byte alphabet rantala/multikey_simd_parallel2 multikey_simd_parallel with 2byte alphabet rantala/multikey_simd_parallel4 multikey_simd_parallel with 4byte alphabet shamsundar/lcp-merge-string-sort Parallelized LCP Merge sort by N. Shamsundar sinha/CPL_burstsort Original CPL-burstsort sinha/CP_burstsort Original CP-burstsort sinha/C_burstsort Original C-burstsort sinha/burstsortA burstsortA Original Burstsort with arrays sinha/burstsortL burstsortL Original Burstsort with linked-lists sinha/fbCPL_burstsort Original fbCPL-burstsort sinha/fbCP_burstsort Original fbCP-burstsort sinha/fbC_burstsort Original fbC-burstsort sinha/sCPL_burstsort Original sCPL-burstsort sinha/sCP_burstsort Original sCP-burstsort sinha/sC_burstsort Original sC-burstsort ```