| Commit message (Collapse) | Author | Age | Files | Lines |
|
Internally, distribute the metadata across 63 lists instead of a single
one. The list is chosen via modulo on a byte; the possible bytes are
a-z, A-Z, 0-9 and _.
TODO: check whether this makes sense, or whether spending the memory on
256 lists is more effective.
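
As a rough illustration of the modulo decision described above, the following shell sketch maps the first byte of a key to one of 63 list indices. The helper name `list_index` and the choice of the first byte are assumptions made for illustration; the commit does not show the tool's actual selection code.

```shell
# Hypothetical helper, not taken from the tool's source.
# Assumption: the numeric value of the key's first byte, modulo 63,
# selects the list.
NUM_LISTS=63

list_index() {
    # Keep only the first character of the argument (POSIX expansion).
    first=${1%"${1#?}"}
    # A leading single quote makes printf emit the character's numeric value.
    byte=$(printf '%d' "'$first")
    echo $(( byte % NUM_LISTS ))
}

list_index "apple"   # 'a' is 97, and 97 % 63 is 34
list_index "_tmp"    # '_' is 95, and 95 % 63 is 32
```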
|
There may be situations in which not every input file is available at
once, so the files can't all be handled in a single session. The append
mode opens the output files without overwriting their previous content,
which makes the tool more flexible to use in scripts.
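
The difference between overwriting and appending can be sketched with plain shell redirection. This only illustrates the semantics; the file name `bucket_a` is hypothetical and the tool's own option for enabling append mode is not shown here.

```shell
# Illustration of overwrite vs. append semantics, not the tool's code.
workdir=$(mktemp -d)
bucket="$workdir/bucket_a"   # hypothetical output file

# Session 1: the first input is available; '>' truncates, then writes.
printf 'apple\n' > "$bucket"

# Session 2, later: '>>' keeps the earlier content and adds to it.
printf 'avocado\n' >> "$bucket"

content=$(cat "$bucket")
rm -rf "$workdir"
printf '%s\n' "$content"
```

Had the second session used `>` as well, only the line from that session would remain.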
|
The target bucket is chosen based on the first X characters of a line.
The bucket name gets a prefix that is passed as an argument. The
resulting buckets can be sorted faster on weak hardware. Note: this is
just a split alternative.
Real-world usage in a shell script, for a file in which the first 10
characters are equal in every line, so the following 2 bytes are
evaluated for splitting:
    split_for_sort TMPSFS 12 raw_data.txt
    for f in TMPSFS* ; do
        sort -o "${f}_sorted" -u "${f}"
    done
    # Rely on the argument resolution to go with lexical order
    cat TMPSFS*_sorted > sorted_data.txt
    rm TMPSFS*
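
The idea behind the workflow above can be sketched with standard tools. The function `split_by_prefix` below is a hypothetical stand-in written in awk, not the tool's implementation. It assumes that a bucket file is named after the prefix followed by the first X characters of the line, and that those characters are safe to use in file names.

```shell
# Hypothetical stand-in for the splitting step, not the tool's real code.
# Assumption: bucket file name = <prefix><first X characters of the line>.
split_by_prefix() {
    prefix=$1; width=$2; input=$3
    awk -v p="$prefix" -v w="$width" '{
        f = p substr($0, 1, w)   # bucket chosen by the leading characters
        print >> f               # append the whole line to that bucket
        close(f)                 # avoid running out of open files
    }' "$input"
}

demo=$(mktemp -d)
printf 'ab2\naa1\nab1\naa1\n' > "$demo/raw.txt"
split_by_prefix "$demo/B" 2 "$demo/raw.txt"

# The glob is expanded once, before the loop creates the *_sorted files.
for f in "$demo"/B* ; do
    sort -o "${f}_sorted" -u "${f}"
done

# Buckets are keyed by a fixed-width prefix, so concatenating them in
# glob order gives one sorted, de-duplicated result.
result=$(cat "$demo"/B*_sorted)
rm -rf "$demo"
printf '%s\n' "$result"
```

Each bucket is much smaller than the full input, which is what keeps the individual `sort` runs cheap.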
|