What is a "data processing pipeline"?
I couldn't find a satisfying description on the web.
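For context, the closest concrete picture I have is a chain of shell commands in which each stage consumes the previous stage's output as a stream; a made-up example:

# Hypothetical example: count the most-requested URLs in a compressed log.
# Each stage streams into the next; nothing is materialized on disk.
zcat access.log.gz | awk '{print $7}' | sort | uniq -c | sort -rn

Is that roughly what the term means, or something more specific?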
I have data stored on disk in files that are far too large to fit in main memory.
I want to stream this data from disk through iconv into a data processing pipeline, like this:
zcat myfile | iconv -f L1 -t UTF-8 | # rest of the pipeline goes here
Unfortunately, iconv appears to buffer the entire file in memory, producing no output until its input is exhausted.
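Two workarounds I am considering (both guesses on my part). The first assumes the delay comes from stdio output buffering (block buffering when stdout is a pipe) rather than from iconv reading all of its input first; stdbuf is part of GNU coreutils. The second bounds memory by converting in fixed-size chunks via GNU split's --filter option, which is safe here only because Latin-1 is a single-byte encoding, so a chunk boundary can never split a character:

# Option 1: disable stdio output buffering, assuming that is the culprit.
zcat myfile | stdbuf -o0 iconv -f L1 -t UTF-8 | # rest of the pipeline goes here

# Option 2: convert in 1 MiB chunks so at most one chunk is in memory at a time.
zcat myfile | split -b 1M --filter='iconv -f L1 -t UTF-8' - | # rest of the pipeline goes here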
I have a MATLAB processing script in the middle of a long processing pipeline running on Linux.
The MATLAB script applies the same operation to N datasets D_i (i = 1, 2, ..., N) in parallel on 8 cores via parfor.
Processing the whole dataset usually takes about 2 hours on 8 cores.
Unfortunately, from time to time it looks like one of the MATLAB worker processes crashes at random.
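As a stopgap I am considering a retry wrapper around the MATLAB step so the surrounding pipeline survives a crash. A rough sketch, assuming the script is launched non-interactively via matlab -batch (available since R2019a) and using process_datasets as a placeholder for my real entry point:

#!/bin/sh
# Hypothetical wrapper: process_datasets stands in for the actual script.
# Re-run the MATLAB step up to 3 times if it exits with a nonzero status.
for attempt in 1 2 3; do
    matlab -batch "process_datasets" && break
    echo "MATLAB run failed (attempt $attempt); retrying..." >&2
done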