Reservoir-Based Sampling

I ran into a programming problem where a very large stream of data was coming through and I needed a decent random sample of values from it. I didn’t want to load the whole stream into memory, so I opted for reservoir-based sampling. As gregable.com puts it, “Reservoir Sampling is an algorithm for sampling elements from a stream of data.” The algorithm only ever holds a fixed-size reservoir of elements, so memory use stays constant no matter how long the stream is, and it let me return a set of random values efficiently. Implementing it with Java 8 Streams gave me a reusable generic sampler.
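To make the idea concrete, here is a minimal sketch of what a generic sampler built on Java 8 Streams could look like. It is not the implementation linked in the references. The class name ReservoirSampler and the collector(k, random) method are names I made up for illustration, and the sketch assumes a sequential stream and the classic "Algorithm R" variant of reservoir sampling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collector;

// Hypothetical helper (illustrative name, not from the original post).
// Keeps a uniform random sample of at most k elements using Algorithm R.
final class ReservoirSampler<T> {
    private final int k;
    private final Random random;
    private final List<T> reservoir;
    private long seen = 0; // number of elements consumed so far

    private ReservoirSampler(int k, Random random) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        this.k = k;
        this.random = random;
        this.reservoir = new ArrayList<>(k);
    }

    private void accept(T item) {
        seen++;
        if (reservoir.size() < k) {
            // The first k elements fill the reservoir directly.
            reservoir.add(item);
        } else {
            // Pick a slot in [0, seen). The new item replaces an existing one
            // only if the slot lands inside the reservoir, which happens with
            // probability k / seen. nextDouble() is used because Java 8's
            // Random has no bounded nextLong.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < k) {
                reservoir.set((int) slot, item);
            }
        }
    }

    // Collector for use with Stream.collect. Sequential streams only:
    // merging two partial reservoirs correctly needs extra bookkeeping,
    // so the combiner refuses instead of returning a biased sample.
    static <T> Collector<T, ?, List<T>> collector(int k, Random random) {
        return Collector.<T, ReservoirSampler<T>, List<T>>of(
                () -> new ReservoirSampler<T>(k, random),
                (sampler, item) -> sampler.accept(item),
                (a, b) -> {
                    throw new UnsupportedOperationException("sequential streams only");
                },
                sampler -> sampler.reservoir);
    }
}
```

With this sketch, something like stream.collect(ReservoirSampler.<String>collector(100, new Random())) would return a list of up to 100 elements, each element of the stream having the same chance of ending up in it. If the stream has fewer than k elements, the whole stream comes back in its original order.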

The results are below.

References

implementation