Your Daily Geekery

random lines from large files

ops

When working with big data, taking samples is often the only road to quick answers. Unfortunately even that poses a bigger hurdle than it should. When you ask people how to get a random sample of lines from a file, you will most likely get this as an answer:

cat file.txt | sort --random-sort | head -n 10

As you can imagine, 'sort' and big data do not mix that well: it has to read and shuffle the entire file just to hand you ten lines. I found a couple of scripts out there, but none of them worked well enough. So I wrote my own script that picks random byte positions in the large file, seeks there and then scans forward to the next line marker.

lines-sample 10 file.txt

Simple and fast - even on big files.
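
The core idea is compact enough to sketch out. Here is a minimal Python version of the same seek-and-scan approach - a sketch of the technique, not the actual lines-sample script; the function name and argument handling are my own. One caveat of this kind of sampling: longer lines are slightly more likely to be picked, since a random byte offset is more likely to land inside them.

import os
import random
import sys

def sample_lines(path, count):
    # Pick `count` lines by seeking to random byte offsets
    # and scanning forward to the next full line.
    size = os.path.getsize(path)
    lines = []
    with open(path, "rb") as f:
        for _ in range(count):
            f.seek(random.randrange(size))
            f.readline()          # skip the (likely partial) line we landed in
            line = f.readline()   # take the next complete line
            if not line:          # landed in the last line: wrap to the start
                f.seek(0)
                line = f.readline()
            lines.append(line.rstrip(b"\n").decode("utf-8", "replace"))
    return lines

if __name__ == "__main__":
    for line in sample_lines(sys.argv[2], int(sys.argv[1])):
        print(line)

Each pick is a seek plus two short reads, so the cost depends on the sample size, not the file size - which is exactly why this stays fast where the sort pipeline grinds to a halt.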