00:26:18  * dap_ quit (Quit: Leaving.)
01:20:01  * ed209 quit (Remote host closed the connection)
01:20:08  * ed209 joined
07:23:43  * bahamat joined
08:37:55  * bahamat quit (Quit: Leaving.)
08:42:33  * bahamat joined
09:19:00  * bahamat quit (Quit: Leaving.)
09:24:57  * bahamat joined
09:34:22  * manytrees quit (Read error: Connection reset by peer)
10:09:56  * manytrees joined
10:20:02  * ed209 quit (Remote host closed the connection)
10:20:09  * ed209 joined
11:39:33  * bahamat quit (Quit: Leaving.)
11:59:48  * bahamat joined
12:23:16  * marsell joined
13:03:10  * bahamat quit (Quit: Leaving.)
13:12:08  * pmonson joined
13:23:29  * bahamat joined
13:54:03  * chorrell joined
14:10:47  * chorrell quit (Quit: Textual IRC Client: www.textualapp.com)
14:43:51  * bahamat quit (Quit: Leaving.)
14:55:57  * bahamat joined
15:28:15  * chorrell joined
17:16:15  * _Tenchi_ quit (Excess Flood)
17:18:10  * _Tenchi_ joined
17:30:04  * bahamat quit (Quit: Leaving.)
18:21:20  * pmooney quit (Quit: WeeChat 1.1.1)
18:35:55  * bahamat joined
18:36:23  * pmooney joined
18:40:46  * bahamat quit (Client Quit)
19:44:15  * dap_ joined
20:20:01  * ed209 quit (Remote host closed the connection)
20:20:08  * ed209 joined
20:21:46  * chorrell quit (Quit: Textual IRC Client: www.textualapp.com)
20:22:02  * testing123 joined
20:22:34  * testing123 quit (Client Quit)
20:29:57  * ChrisR joined
20:32:02  <ChrisR> Hello, I've got a question about writing a Manta compute job that I'm hoping someone can give me some feedback on.
20:33:06  <ChrisR> We are trying to implement chunked uploads for a site we are developing, and would like a job to "stitch" the file chunks back together
20:33:34  <ChrisR> I was able to get it working with this, but I'm not sure if it's the best/most efficient way:
20:34:05  <ChrisR> echo <path to chunk manifest file> | mjob create -m 'xargs mget | cat | mpipe <output object path>'
20:34:30  <ChrisR> (The manifest file is just a newline-delimited list of the chunk Manta objects.)
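For concreteness, a manifest like that might look something like the listing below; the map phase receives the manifest's contents on stdin, which xargs turns into arguments for mget. The paths here are hypothetical, not taken from the discussion.

    $ mget /user/stor/uploads/abc123/manifest
    /user/stor/uploads/abc123/chunk.000
    /user/stor/uploads/abc123/chunk.001
    /user/stor/uploads/abc123/chunk.002

    $ echo /user/stor/uploads/abc123/manifest | \
          mjob create -m 'xargs mget | cat | mpipe /user/stor/uploads/abc123/stitched.bin'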
20:35:22  <ChrisR> Mostly I am concerned about the use of mget. I tried a million variations on using mcat but couldn't get it to work, because the stitched file would always have the chunks in a non-deterministic order.
20:35:36  <ChrisR> (due to map phases being shuffled, presumably)
20:37:13  <ChrisR> I want to take advantage of stitching the chunks back together 'in situ', and I'm worried that mget may be unnecessarily copying the chunks around
20:37:37  <ChrisR> (our uploads will often be very large files)
20:38:28  <pmooney> Given that the multiple chunks will almost never all be on one physical machine, there's going to be network travel regardless.
20:39:16  <ChrisR> I see.
20:39:50  <pmooney> The amount of optimization possible for the job is limited, given that the final order of the concatenation matters. (AFAIK)
20:40:16  <ChrisR> Right, so I can't fully take advantage of the map-reduce architecture.
20:41:17  <pmooney> The constraints on your reduce phase make it difficult, so to speak.
20:41:37  <pmooney> All the data needs to be in one place at a certain point.
20:42:36  <ChrisR> Does this solution seem reasonable?
20:51:51  <pmooney> At first glance, sure
20:52:15  <pmooney> The 'cat' in the middle seems redundant
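In other words, mget already streams the concatenated chunks to stdout, so the same job should work with the extra 'cat' dropped (hypothetical paths again):

    echo /user/stor/uploads/abc123/manifest | \
        mjob create -m 'xargs mget | mpipe /user/stor/uploads/abc123/stitched.bin'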
20:52:55  <ChrisR> I'm a unix noob :)
20:53:10  <ChrisR> Mostly just wanted confirmation that I wasn't "doing it wrong"
20:54:06  <pmooney> It would also probably be valuable to ensure that any failure in the pipeline is properly communicated up. (Say that one of the 'mget's encounters a problem)
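One simple way to get that, assuming the phase's exec string is run by bash, is to turn on pipefail so a failed mget makes the whole phase exit nonzero instead of being masked by a successful mpipe (paths still hypothetical):

    echo /user/stor/uploads/abc123/manifest | \
        mjob create -m 'set -o pipefail; xargs mget | mpipe /user/stor/uploads/abc123/stitched.bin'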
20:54:32  <ChrisR> How could I achieve that?
20:55:21  <nahamu> Uh... what's in the manifest file?
20:55:32  <pmooney> I'm not a great reference when it comes to rigor on the shell. You might want to inquire w/ bahamas10.
20:55:35  <nahamu> is it just a list of objects?
20:56:48  <ChrisR> Yes
20:57:14  <ChrisR> I eventually plan on rewriting all this with the REST API, so hopefully I can just check the HTTP status
20:57:21  <nahamu> sure.
20:57:41  <nahamu> is there an upper limit on how big these things will be?
20:57:47  <nahamu> (the final objects)
20:58:00  <ChrisR> In theory, no. In practice, we could limit it.
20:58:11  <ChrisR> Does that have bearing?
20:58:27  <nahamu> in theory, yes, in practice, maybe not. ;)
20:58:55  <ChrisR> Right now we limit ourselves to 2GB uploads but we hope to eliminate that cap
20:59:18  <ChrisR> Part of our goal with Manta is 'infinite' scale
21:00:07  <nahamu> because mget can fail (remember: consistent, not available) you have to be careful that you actually retrieve each chunk before continuing...
21:01:04  <ChrisR> That might be tricky with xargs...
21:01:10  <nahamu> yup.
21:01:31  <nahamu> this is why size might some day matter.
21:02:29  <nahamu> a background job that is patiently and carefully pulling down chunk after chunk, stitching them together, then uploading the result into the final object might be annoying to write.
21:02:41  <ChrisR> It might be "OK" to fail the entire upload if one chunk's mget were to fail, though obviously not ideal
21:03:22  <nahamu> is there a checksum being calculated on the sending side that you could use for verification?
21:03:31  <ChrisR> Yeah
21:03:39  <nahamu> oh, so you're in pretty good shape.
21:04:09  <nahamu> as long as you can verify that you got it right at the end, you have lots of options.
21:05:01  <nahamu> is the manifest separated with newlines?
21:05:08  <ChrisR> Yeah
21:05:57  <nahamu> gah... it's after 5pm in my timezone... I only have a few more minutes.
21:06:16  <ChrisR> No worries, I can pop back in next week if needed
21:07:21  <nahamu> let's be simplistic: echo /user/stor/manifest | mjob create -m 'while read object; do <MAGIC HERE>; done < $MANTA_INPUT_FILE'
21:07:37  <ChrisR> Yep
21:08:18  <nahamu> the MAGIC HERE part is where you "mget $object" into a local file, verify the return code, then append it to the final file.
21:08:44  <nahamu> if the mget fails you try again, perhaps with a small bit of exponential backoff if you're feeling fancy.
21:08:57  <nahamu> and if you exceed a reasonable number of tries you fail the job.
21:09:16  <nahamu> and at the very end you verify the checksum.
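Pulled together, that recipe might look roughly like the sketch below. It is only illustrative, not something from the discussion: the object paths are hypothetical, fetch_with_retries is a made-up helper, and it assumes bash and md5sum are available in the compute zone.

    #!/bin/bash
    # stitch.sh -- reassemble uploaded chunks listed in a manifest (sketch only).
    # Runs as a Manta map phase, so $MANTA_INPUT_FILE is the local copy of the
    # manifest object; the other object paths below are hypothetical.
    set -o errexit -o pipefail

    OUTPUT_OBJECT=/user/stor/uploads/abc123/stitched.bin        # hypothetical
    EXPECTED_MD5_OBJECT=/user/stor/uploads/abc123/manifest.md5  # hypothetical
    SCRATCH=/var/tmp/stitched.$$

    fetch_with_retries() {
        # mget one chunk into a temp file, retrying with exponential backoff.
        local object=$1 tmp=$2 attempt=0 max_attempts=5 delay=1
        while (( attempt < max_attempts )); do
            if mget -q "$object" > "$tmp"; then
                return 0
            fi
            attempt=$(( attempt + 1 ))
            sleep "$delay"
            delay=$(( delay * 2 ))
        done
        echo "giving up on $object after $max_attempts attempts" >&2
        return 1
    }

    > "$SCRATCH"
    while read -r object; do
        [[ -n "$object" ]] || continue
        chunk=/var/tmp/chunk.$$
        fetch_with_retries "$object" "$chunk"   # errexit fails the job if this gives up
        cat "$chunk" >> "$SCRATCH"
        rm -f "$chunk"
    done < "$MANTA_INPUT_FILE"

    # Verify the checksum the client computed before uploading, then store the result.
    expected=$(mget -q "$EXPECTED_MD5_OBJECT")
    actual=$(md5sum "$SCRATCH" | awk '{print $1}')
    if [[ "$expected" != "$actual" ]]; then
        echo "checksum mismatch: expected $expected, got $actual" >&2
        exit 1
    fi

    mpipe "$OUTPUT_OBJECT" < "$SCRATCH"
    rm -f "$SCRATCH"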
21:09:49  <ChrisR> Sounds good. The trick for me is knowing how to put all this logic in a single job. I guess I can try using a shell script or something?
21:09:58  <nahamu> yes.
21:10:08  <nahamu> create a shell script and use it as an asset.
21:10:31  <ChrisR> OK, I'll start down this path. Thanks!
21:11:00  <nahamu> so mjob create would get whatever it is for an asset, I forget off the top of my head, and the map job turns into "/assets/path/to/it" or whatever.
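For what it's worth, the asset flag appears to be -s on mjob create. A rough sketch of how the pieces could fit together, with hypothetical paths, invoking the script through /bin/bash so the execute bit on the asset doesn't matter:

    # Upload the script, then reference it as an asset; inside the job it shows
    # up under /assets/<its full Manta path>.
    mput -f stitch.sh /user/stor/assets/stitch.sh
    echo /user/stor/uploads/abc123/manifest | \
        mjob create -o \
            -s /user/stor/assets/stitch.sh \
            -m '/bin/bash /assets/user/stor/assets/stitch.sh'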
21:11:04  <nahamu> good luck!
21:11:19  <nahamu> let us know if you get it working or run into something really hard.
21:11:29  <ChrisR> Will do.
21:16:37  * jayschmidt1 joined
21:32:08  * ChrisR part
21:42:16  * jayschmidt1 quit (Quit: Leaving.)
23:06:47  * pmooney quit (Quit: WeeChat 1.1.1)
23:18:48  * pmooney joined