Readit News
Posted by u/prirun 4 years ago
Use LC_ALL=C for big ASCII sorts
Comparison of sorting 100 MB of email, first in the default locale and then with LC_ALL=C:

  [jim@mbp ~]$ dd if=gmail.mbox of=gmail.sort bs=1m count=100
  100+0 records in
  100+0 records out
  104857600 bytes transferred in 0.084819 secs (1236253643 bytes/sec)

  [jim@mbp ~]$ /usr/bin/time -l sort <gmail.sort >/dev/null
         28.58 real        28.21 user         0.26 sys
   480497664  maximum resident set size
      118887  page reclaims
       11071  involuntary context switches

  [jim@mbp ~]$ LC_ALL=C /usr/bin/time -l sort <gmail.sort >/dev/null
          1.49 real         1.37 user         0.11 sys
   188440576  maximum resident set size
       54690  page reclaims
        1029  involuntary context switches
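
The gap comes from how lines are compared: with LC_ALL=C, sort compares raw bytes instead of applying the locale's collation rules. The ordering changes too, so this only suits data where plain byte order is acceptable. A minimal sketch of that difference (the wrapper name c_sort is made up for illustration and is not part of any tool):

```shell
#!/bin/sh
# c_sort: hypothetical helper that runs sort in the C locale,
# so lines are ordered by byte value rather than by locale collation.
c_sort() {
  LC_ALL=C sort "$@"
}

# In byte order every uppercase ASCII letter sorts before any lowercase one.
printf 'b\nA\na\nB\n' | c_sort
# prints:
# A
# B
# a
# b
```

Many UTF-8 locales would typically interleave the cases instead (for example a, A, b, B), though the exact order depends on the locale data installed on the system.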

rathel · 4 years ago
rlucas · 4 years ago
++ COLLATE "C" is your friend, particularly if you find yourself using a GUID or other text-y PK field.
thorin1 · 4 years ago
Same with grep.
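
The same prefix works for grep. A minimal sketch, assuming a throwaway file made with mktemp (the sample addresses are invented, and no timing is claimed here):

```shell
#!/bin/sh
# Build a tiny mbox-like sample; the contents are made up for illustration.
tmp=$(mktemp)
printf 'From: alice@example.com\nSubject: hi\nFrom: bob@example.com\n' > "$tmp"

# LC_ALL=C makes grep treat the input as plain bytes instead of
# decoding it according to the current locale's character encoding.
count=$(LC_ALL=C grep -c '^From: ' "$tmp")
echo "$count"   # prints 2

rm -f "$tmp"
```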