path: root/src
Commit message (Author, Age, Files, Lines)
* rename output.h to trace_macros.h (Thorsten Töpper, 2025-08-31, 3 files, -3/+3)
|
* split_for_sort: stdin mode flush output at end (Thorsten Töpper, 2025-08-30, 1 file, -0/+1)
|
* split_for_sort: key collision with 63, switch to 128 lists (Thorsten Töpper, 2025-08-30, 1 file, -3/+5)
|
* split_for_sort: performance improvement (Thorsten Töpper, 2025-08-30, 1 file, -36/+62)
|   In the background, the metadata is distributed across 63 lists instead of a single one. The possible key bytes are a-z, A-Z, 0-9 and _, and the list is chosen from that byte via modulo. TODO: check whether this makes sense, or whether spending the memory on 256 lists is more effective.
* split_for_sort: handle filename - as stdin (Thorsten Töpper, 2025-08-30, 1 file, -7/+125)
|
* split_for_sort: Append mode implemented (Thorsten Töpper, 2025-08-29, 1 file, -16/+54)
|   Not every input file may be available at once, in which case the inputs cannot be handled in a single session. Append mode opens the output files without overwriting their previous content, which makes the tool more flexible to use in scripts.
* mem_internal_check: dump and map filter array onto/from FS (Thorsten Töpper, 2025-08-24, 1 file, -24/+221)
|
* split_for_sort: set RLIMIT_NOFILE to max (Thorsten Töpper, 2025-08-23, 1 file, -0/+25)
|
* split_for_sort: make use of FS cache (Thorsten Töpper, 2025-08-23, 1 file, -3/+0)
|
* mem_internal_check: simplified towards qsort and bsearch (Thorsten Töpper, 2025-08-21, 1 file, -475/+156)
|
* hex_conversion: introduce ishex_string (Thorsten Töpper, 2025-08-20, 1 file, -2/+2)
|
* mem_internal_check: switch reason for search cancel (Thorsten Töpper, 2025-08-20, 1 file, -1/+1)
|
* mem_internal_check: alternative to tree_based_check (Thorsten Töpper, 2025-08-19, 1 file, -0/+675)
|
* tree_based_check: SPDX Flag (Thorsten Töpper, 2025-08-19, 1 file, -1/+4)
|
* tree_based_check: fixes for debug build. (Thorsten Töpper, 2025-08-14, 1 file, -11/+14)
|
* tree_based_check: switch to unsigned values (Thorsten Töpper, 2025-08-14, 1 file, -11/+13)
|
* tree_based_check: filter utility for hash lists (Thorsten Töpper, 2025-08-12, 1 file, -0/+684)
|
* split_for_sort: add filename to warning (Thorsten Töpper, 2025-08-11, 1 file, -1/+1)
|
* split_for_sort: switch from strncpy to memcpy (Thorsten Töpper, 2025-08-10, 1 file, -4/+4)
|
* split_for_sort: Split a given file into buckets (Thorsten Töpper, 2025-08-10, 1 file, -0/+367)
    The target bucket is chosen based on the first X characters of a line. The bucket files are named with a prefix given as an argument, and each of them can be sorted faster on weak hardware.

    Note: This is just an alternative to split.

    Real-world usage in a shell script, for a file in which the first 10 characters are equal in every line, so that the following 2 bytes are evaluated for splitting:

        split_for_sort TMPSFS 12 raw_data.txt
        for f in TMPSFS* ; do
            sort -o "${f}_sorted" -u "${f}"
        done
        # Rely on the argument resolution to go with lexical order
        cat TMPSFS*_sorted > sorted_data.txt
        rm TMPSFS*
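
The bucket selection described in the "Split a given file into buckets" commit can be illustrated with a small C sketch. It is not taken from split_for_sort itself: the function name bucket_name and the "<prefix><key>" naming scheme are assumptions made for illustration, chosen only because such names would sort lexically in key order, which is what the shell example relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build a bucket file name from a prefix and the first key_len bytes of a
 * line.
 * ASSUMPTION: the "<prefix><key>" naming scheme is hypothetical; the real
 * split_for_sort may name its buckets differently.
 * Returns 0 on success, -1 if the line is shorter than key_len or the
 * result does not fit into out.
 */
int bucket_name(const char *prefix, const char *line, size_t key_len,
                char *out, size_t out_size)
{
    assert(prefix != NULL && line != NULL && out != NULL);

    size_t prefix_len = strlen(prefix);
    size_t line_len = strcspn(line, "\n"); /* length without the newline */

    if (line_len < key_len || prefix_len + key_len + 1 > out_size)
        return -1;

    memcpy(out, prefix, prefix_len);
    memcpy(out + prefix_len, line, key_len);
    out[prefix_len + key_len] = '\0';
    return 0;
}
```

Called as bucket_name("TMPSFS", line, 12, name, sizeof name), this would map every line that shares the same first 12 bytes to the same file name. A real tool would also have to deal with key bytes that are not valid in file names, which the sketch ignores.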
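
The "performance improvement" and "key collision with 63" commits mention choosing a list from a key byte via modulo. The following C sketch shows why 63 lists can collide for the 63 possible key bytes while 128 lists cannot. It assumes a plain (byte % list_count) index, which is a guess; the log does not show the actual computation, and count_collisions is a hypothetical helper.

```c
#include <stddef.h>

/* Key bytes named in the commit message: a-z, A-Z, 0-9 and '_' (63 bytes). */
static const char KEY_BYTES[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_";

/*
 * Count how many key bytes land in a list that an earlier key byte already
 * occupies when the list index is (byte % list_count).
 * ASSUMPTION: plain modulo on the byte value; the real index computation
 * in split_for_sort may differ.
 * Returns (size_t)-1 if list_count is outside 1..256.
 */
size_t count_collisions(size_t list_count)
{
    unsigned char used[256] = {0};
    size_t collisions = 0;

    if (list_count == 0 || list_count > 256)
        return (size_t)-1;

    for (size_t i = 0; KEY_BYTES[i] != '\0'; i++) {
        size_t idx = (unsigned char)KEY_BYTES[i] % list_count;
        if (used[idx])
            collisions++;   /* another key byte already maps to this list */
        else
            used[idx] = 1;
    }
    return collisions;
}
```

Under this assumption and with ASCII encoding, count_collisions(63) reports 10 collisions, because the digits 0-9 share their indices with the letters o-x, while count_collisions(128) reports none, since every ASCII key byte is below 128.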