From 9e2f3d59cf249403859916df9756c179753ea7e0 Mon Sep 17 00:00:00 2001
From: Thorsten Töpper
Date: Sun, 10 Aug 2025 18:16:07 +0200
Subject: split_for_sort: Split a given file into buckets

The target bucket is decided based on the first X characters of a line.
Each bucket name gets a prefix defined as an argument, and the resulting
buckets can be sorted faster on weak hardware.

Note: This is just a split alternative.

Real-world usage in a shell script, with a file in which the first 10
characters are equal in each line, so the following 2 bytes are evaluated
for splitting:

    split_for_sort TMPSFS 12 raw_data.txt
    for f in TMPSFS* ; do
        sort -o "${f}_sorted" -u "${f}"
    done
    # Rely on the argument resolution to go with lexical order
    cat TMPSFS*_sorted > sorted_data.txt
    rm TMPSFS*
---
 .gitignore | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to '.gitignore')

diff --git a/.gitignore b/.gitignore
index c82077c..0047e56 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,4 @@
-bin/*
+out/*
 man/*.1
 man/*.gz
 .gdbinit
-- 
cgit v1.2.3-70-g09d2
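
The bucketing idea from the commit message can be illustrated with a minimal Python sketch. This is not the split_for_sort implementation, which is not shown in this patch. The function names, the hex-encoded bucket naming, and the sample data are assumptions made for illustration only. The sketch also assumes every line is at least `width` bytes long, so that the order of the bucket names matches the order of the lines.

```python
import os
import tempfile


def split_into_buckets(prefix, width, lines, out_dir):
    """Write each line to a bucket file chosen by its first `width` bytes.

    Hypothetical naming scheme: prefix + hex of the key, so that bucket names
    are filesystem-safe and sort in the same order as the raw key bytes.
    """
    handles = {}
    try:
        for line in lines:
            key = line[:width].rstrip(b"\n")
            name = prefix + key.hex()
            if name not in handles:
                handles[name] = open(os.path.join(out_dir, name), "wb")
            handles[name].write(line)
    finally:
        for fh in handles.values():
            fh.close()
    return sorted(handles)


def sort_and_merge(names, out_dir):
    """Sort every bucket on its own, dropping duplicates (like `sort -u`),
    then concatenate the buckets in lexical order of their names."""
    merged = []
    for name in sorted(names):
        with open(os.path.join(out_dir, name), "rb") as fh:
            merged.extend(sorted(set(fh.readlines())))
    return merged


if __name__ == "__main__":
    # Made-up sample data: 10 identical leading bytes, then 2 deciding bytes.
    data = [
        b"AAAAAAAAAAzz one\n",
        b"AAAAAAAAAAab two\n",
        b"AAAAAAAAAAzz one\n",
        b"AAAAAAAAAAaa three\n",
    ]
    with tempfile.TemporaryDirectory() as tmp:
        names = split_into_buckets("TMPSFS", 12, data, tmp)
        for line in sort_and_merge(names, tmp):
            print(line.decode().rstrip())
```

Only one bucket has to be held in memory while sorting, which is the point of splitting first. A real tool would also need to cap the number of simultaneously open bucket files, which this sketch does not do.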