random lines from large files
When working with big data, taking samples is often the only road to quick answers. Unfortunately, even that poses a bigger hurdle than it should. Ask people how to get a random sample of lines from a file and you will most likely get this answer:
cat file.txt | sort --random-sort | head -n 10
As you can imagine, 'sort' and big data do not mix well: --random-sort has to read and shuffle the entire file before 'head' can take its ten lines. I found a couple of scripts out there, but none of them worked well enough. So I wrote my own script: it picks random byte positions in the large file, seeks there, and then moves forward to the next line marker (a sketch of the idea follows at the end).
lines-sample 10 file.txt
Simple and fast, even on big files.
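
The post doesn't include the script's source, but the idea is simple enough to sketch. Here is a minimal Python version of the seek-and-scan approach (sample_lines and everything else below is my reconstruction, not the author's actual code):

    import os
    import random
    import sys

    def sample_lines(path, count):
        # Pick `count` lines by seeking to random byte offsets.
        # Assumes the file has many lines. Note the trade-off: a line's
        # chance of being picked grows with the length of the line that
        # precedes it, so the sample is fast but not perfectly uniform,
        # and the very first line can never be chosen.
        size = os.path.getsize(path)
        lines = []
        with open(path, "rb") as f:
            while len(lines) < count:
                f.seek(random.randrange(size))  # jump somewhere in the file
                f.readline()                    # skip the partial line we landed in
                line = f.readline()             # take the next complete line
                if line:                        # empty means we hit EOF; retry
                    lines.append(line)
        return lines

    if __name__ == "__main__":
        for line in sample_lines(sys.argv[2], int(sys.argv[1])):
            sys.stdout.buffer.write(line)

The key property is that the runtime depends on the sample size, not the file size: ten seeks cost roughly the same on a gigabyte file as on a terabyte one.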