[−][src]Crate shardio
Serialize large streams of Serialize
-able structs to disk from multiple threads, with a customizable on-disk sort order.
Data is written to sorted chunks. When reading shardio will merge the data on the fly into a single sorted view. You can
also procss disjoint subsets of sorted data.
#[macro_use] extern crate serde_derive; extern crate shardio; extern crate failure; use shardio::*; use std::fs::File; use failure::Error; #[derive(Clone, Eq, PartialEq, Serialize, Deserialize, PartialOrd, Ord, Debug)] struct DataStruct { a: u64, b: u32, } fn main() -> Result<(), Error> { let filename = "test.shardio"; { // Open a shardio output file // Parameters here control buffering, and the size of disk chunks // which affect how many disk chunks need to be read to // satisfy a range query when reading. // In this example the 'built-in' sort order given by #[derive(Ord)] // is used. let mut writer: ShardWriter<DataStruct> = ShardWriter::new(filename, 64, 256, 1<<16)?; // Get a handle to send data to the file let mut sender = writer.get_sender(); // Generate some test data for i in 0..(2 << 16) { sender.send(DataStruct { a: (i%25) as u64, b: (i%100) as u32 }); } // Write errors are accessible by calling the finish() method writer.finish()?; } // Open finished file & test chunked reads let reader = ShardReader::<DataStruct>::open(filename)?; let mut all_items = Vec::new(); // Shardio will divide the key space into 5 roughly equally sized chunks. // These chunks can be processed serially, in parallel in different threads, // or on different machines. let chunks = reader.make_chunks(5, &Range::all()); for c in chunks { // Iterate over all the data in chunk c. let mut range_iter = reader.iter_range(&c)?; for i in range_iter { all_items.push(i?); } } // Data will be return in sorted order let mut all_items_sorted = all_items.clone(); all_items.sort(); assert_eq!(all_items, all_items_sorted); Ok(()) }
Re-exports
pub use range::Range; |
Modules
helper | |
pmap | |
range |
Structs
DefaultSort | Marker struct for sorting types that implement |
MergeIterator | Iterator over merged shardio files |
RangeIter | Iterator of items from a single shardio reader |
ShardReader | Read from a collection of shardio files. The input data is merged to give
a single sorted view of the combined dataset. The input files must
be created with the same sort order |
ShardSender | A handle that is used to send data to a |
ShardWriter | Write a stream data items of type |
Traits
SortKey | Specify a key function from data items of type |