00:05:35  * isaacstopic: Manta: Big Data Unix | http://apidocs.joyent.com/manta/ | http://logs.libuv.org/manta/latest
00:20:53  * lloydde_quit (Remote host closed the connection)
00:50:08  * AvianFlujoined
00:54:11  * therealkoopaquit (Remote host closed the connection)
00:55:27  * therealkoopajoined
00:56:56  * abraxasjoined
01:00:13  * therealkoopaquit (Ping timeout: 265 seconds)
01:01:36  * abraxasquit (Ping timeout: 252 seconds)
01:04:21  * irajoined
01:09:45  * nfitchquit (Quit: Leaving.)
01:10:01  * paulfryzelquit (Remote host closed the connection)
01:10:20  * lloyddejoined
01:12:48  * fredkquit (Quit: Leaving.)
01:13:53  * fredkjoined
01:18:39  * fredkquit (Ping timeout: 252 seconds)
01:20:00  * ed209quit (Remote host closed the connection)
01:20:07  * ed209joined
01:22:40  * chorrelljoined
01:24:21  * chorrellquit (Client Quit)
01:24:33  * chorrelljoined
01:24:56  * therealkoopajoined
01:25:34  * chorrellquit (Client Quit)
01:25:48  * chorrelljoined
01:30:03  * therealkoopaquit (Ping timeout: 272 seconds)
01:39:09  * lloyddequit (Remote host closed the connection)
01:41:05  * paulfryzeljoined
01:41:32  * abraxasjoined
01:45:20  * __rockbot__quit (Quit: __rockbot__)
01:45:39  * paulfryzelquit (Ping timeout: 265 seconds)
01:49:46  * chorrellchanged nick to chorrell-away
01:50:37  * notmattquit (Remote host closed the connection)
01:50:58  * abraxasquit (Ping timeout: 265 seconds)
01:51:13  * notmattjoined
01:54:12  * chorrelljoined
01:55:30  * notmattquit (Ping timeout: 252 seconds)
01:56:27  * chorrell-awayquit (Quit: Textual IRC Client: www.textualapp.com)
02:02:34  * dap_quit (Quit: Leaving.)
02:25:45  * therealkoopajoined
02:29:57  * therealkoopaquit (Ping timeout: 248 seconds)
02:41:45  * paulfryzeljoined
02:46:06  * paulfryzelquit (Ping timeout: 252 seconds)
02:53:31  * chorrellquit (Quit: Textual IRC Client: www.textualapp.com)
03:04:12  * notmattjoined
03:08:38  * abraxasjoined
03:09:16  * notmattquit (Ping timeout: 265 seconds)
03:12:26  * notmattjoined
03:37:44  * paulfryzeljoined
03:37:50  * iraquit (Quit: Connection terminated.)
03:40:39  * therealkoopajoined
03:42:12  * paulfryzelquit (Ping timeout: 252 seconds)
03:45:03  * therealkoopaquit (Ping timeout: 265 seconds)
03:47:10  * papajuansjoined
03:50:06  * papajuansquit (Remote host closed the connection)
04:17:33  * abraxasquit (Remote host closed the connection)
04:32:32  * notmattquit (Remote host closed the connection)
04:33:05  * notmattjoined
04:37:45  * notmattquit (Ping timeout: 252 seconds)
04:38:28  * paulfryzeljoined
04:42:42  * paulfryzelquit (Ping timeout: 252 seconds)
05:12:03  * notmattjoined
05:17:13  * notmattquit (Remote host closed the connection)
05:17:48  * notmattjoined
05:22:40  * notmattquit (Ping timeout: 265 seconds)
05:30:23  * notmattjoined
05:39:19  * paulfryzeljoined
05:43:45  * paulfryzelquit (Ping timeout: 252 seconds)
05:48:48  * notmattquit (Remote host closed the connection)
05:53:34  * abraxasjoined
05:58:15  * therealkoopajoined
06:02:47  * therealkoopaquit (Ping timeout: 265 seconds)
06:30:26  * abraxas_joined
06:35:31  * abraxas_quit (Remote host closed the connection)
06:36:06  * abraxas_joined
06:38:44  * abraxasquit (*.net *.split)
06:40:11  * paulfryzeljoined
06:41:01  * abraxas_quit (Ping timeout: 272 seconds)
06:44:48  * paulfryzelquit (Ping timeout: 252 seconds)
06:48:57  * bsdguruquit (Remote host closed the connection)
06:51:44  * therealkoopajoined
06:58:51  * therealkoopaquit (Ping timeout: 265 seconds)
06:58:54  * abraxasjoined
07:00:10  * notmattjoined
07:05:05  * notmattquit (Ping timeout: 272 seconds)
07:32:06  * jperkinjoined
07:40:47  * paulfryzeljoined
07:45:04  * paulfryzelquit (Ping timeout: 250 seconds)
08:14:01  * AvianFluquit (Remote host closed the connection)
08:31:47  * mamashjoined
08:34:56  * abraxasquit (Remote host closed the connection)
08:40:34  * daviddiasjoined
08:42:15  * daviddiasquit (Read error: Connection reset by peer)
08:42:28  * daviddiasjoined
08:53:32  * therealkoopajoined
08:57:26  * mamashpart
08:57:50  * therealkoopaquit (Ping timeout: 246 seconds)
09:00:16  * mamashjoined
09:00:20  * abraxasjoined
09:42:21  * paulfryzeljoined
09:46:51  * paulfryzelquit (Ping timeout: 245 seconds)
10:28:38  * daviddiasquit (Read error: Connection reset by peer)
10:43:07  * paulfryzeljoined
10:47:29  * paulfryzelquit (Ping timeout: 250 seconds)
10:49:48  * irajoined
10:55:09  * therealkoopajoined
10:59:21  * therealkoopaquit (Ping timeout: 245 seconds)
11:12:57  * abraxasquit (Remote host closed the connection)
11:15:00  * abraxasjoined
11:20:24  * marselljoined
11:21:34  * abraxasquit (Remote host closed the connection)
11:28:57  * daviddiasjoined
11:34:12  * daviddiasquit (Remote host closed the connection)
11:34:22  * therealkoopajoined
11:34:41  * daviddiasjoined
11:40:29  * therealkoopaquit (Ping timeout: 272 seconds)
11:41:29  * therealkoopajoined
11:43:57  * paulfryzeljoined
11:45:50  * therealkoopaquit (Ping timeout: 246 seconds)
11:48:09  * paulfryzelquit (Ping timeout: 250 seconds)
11:49:38  * therealkoopajoined
11:54:25  * therealkoopaquit (Ping timeout: 272 seconds)
12:14:54  * mamashpart
12:44:34  * paulfryzeljoined
12:48:56  * paulfryzelquit (Ping timeout: 245 seconds)
12:59:30  * therealkoopajoined
13:03:53  * therealkoopaquit (Ping timeout: 246 seconds)
13:22:47  * therealkoopajoined
13:47:49  * AvianFlujoined
13:48:15  * mamashjoined
14:18:16  * mamashpart
15:39:38  * bsdgurujoined
15:47:03  * paulfryzeljoined
15:51:16  * paulfryzelquit (Ping timeout: 250 seconds)
15:53:05  * notmattjoined
15:59:35  * notmattquit (Remote host closed the connection)
16:00:09  * notmattjoined
16:05:13  * notmattquit (Ping timeout: 272 seconds)
16:10:33  * paulfryzeljoined
16:28:01  * bsdguruquit (Quit: bsdguru)
16:30:25  * bsdgurujoined
16:37:52  * paulfryzelquit (Read error: Connection reset by peer)
16:38:06  * paulfryzeljoined
16:44:39  * daviddia_joined
16:44:42  * daviddiasquit (Read error: Connection reset by peer)
16:50:10  * nfitchjoined
16:57:58  * __rockbot__joined
16:59:37  * lloyddejoined
17:04:45  * __rockbot__quit (Quit: __rockbot__)
17:09:29  * fredkjoined
17:11:49  * daviddia_quit (Remote host closed the connection)
17:12:18  * daviddiasjoined
17:13:10  * therealkoopaquit (Remote host closed the connection)
17:13:15  * __rockbot__joined
17:13:41  * bsdguruquit (Quit: bsdguru)
17:17:25  * daviddiasquit (Ping timeout: 272 seconds)
17:21:23  * dap_joined
17:24:45  * abraxasjoined
17:29:27  * abraxasquit (Ping timeout: 272 seconds)
17:36:35  * notmattjoined
17:37:24  * bsdgurujoined
17:41:01  * daviddiasjoined
17:45:36  * daviddiasquit (Ping timeout: 245 seconds)
17:54:46  * AvianFluquit (Remote host closed the connection)
18:02:44  * therealkoopajoined
18:22:35  * daviddiasjoined
18:22:40  * daviddiasquit (Remote host closed the connection)
18:23:20  * daviddiasjoined
18:58:49  * lloyddequit (Write error: Broken pipe)
18:58:56  * lloyddejoined
18:59:33  * dap_quit (Quit: Leaving.)
19:03:49  * lloyddequit (Ping timeout: 272 seconds)
19:11:18  * AvianFlujoined
19:11:18  * daviddiasquit (Remote host closed the connection)
19:11:47  * daviddiasjoined
19:15:19  * wanelojoined
19:16:01  * daviddiasquit (Ping timeout: 245 seconds)
19:25:32  * abraxasjoined
19:32:27  * abraxasquit (Ping timeout: 363 seconds)
19:33:26  * dap_joined
19:42:34  <wanelo>Is there a way to have 1 reduce phase produce multiple objects for use in N map phases?
19:42:53  * __rockbot__quit (Quit: __rockbot__)
19:43:14  <wanelo>We want to take a file with 10 lines, say, and split it into 10 jobs with one line per job. The goal is to download objects into manta from Amazon S3.
19:44:22  <mcavage_>wanelo: see msplit and mcat
19:44:39  <wanelo>https://gist.github.com/hjhart/8810869
19:44:57  <mcavage_>that won't be parallelized.
19:44:58  <mcavage_>1m
19:46:11  <dap_>wanelo: You can definitely do it. Sounds like Mark's cooking something up. Basically you just need to call "mpipe" multiple times — once for each output. That's what "msplit" does.
19:47:52  <mcavage_>wanelo: yah what dap said - I was just going to cook a thing that did (standard) "split(1)" and then just run mpipe on each of the resulting chunks.
19:47:53  <nfitch>wanelo: Are you trying to have the number of tasks in the second phase be reliant on the number of lines from the previous phase?
19:48:08  <mcavage_>if you want to do it as a stream msplit should "just work"
19:48:42  * daviddiasjoined
19:50:05  <wanelo>nfitch: Correct. Each line in $MANTA_INPUT_FILE in the reduce phase should equal one map phase, if that makes sense.
19:51:05  <dap_>wanelo: You're downloading URLs, you said? So is each line a URL?
19:51:22  <wanelo>Yes.
19:52:48  <mcavage_>wanelo i think you can just do the equivalent of this:
19:52:48  <mcavage_>for i in {1..100} ; do echo "item $i" ; done | while read line ; do echo $line | mpipe ; done
19:52:57  <mcavage_>where replace my for loop with cat $MANTA_INPUT_FILE
19:53:23  * daviddiasquit (Ping timeout: 246 seconds)
19:53:57  <mcavage_>wanelo: yeah i just tried that, i think that will do what you want.
19:54:02  <mcavage_>so:
19:54:12  <dap_>Can you say what's the rough total number? You may be way better off running that in a fixed set of reducers than a separate mapper for each one. There's a (small) overhead for each mapper, so if it's like a million, you may be way better off having 100 reducers and giving each 1% of the total
19:54:44  <wanelo>dap: ~26000 lines
19:54:58  <mcavage_>wanelo: then yes what dap said - you're better off feeding like 1k to each.
19:56:59  <mcavage_>i'd basically just do split $MANTA_INPUT_FILE ; for f in `ls` ; do mpipe < $f ; done // or something
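
A minimal local sketch of the chunking idea mcavage_ describes above, assuming chunks of 1000 lines as suggested in the discussion. The `emit_chunk` function is a hypothetical stand-in: inside a Manta job it would presumably be replaced by `mpipe < "$f"`, but here it only reports the chunk's line count so the sketch runs anywhere. The `chunk.` prefix, the temporary directory, and the 2500-line sample input are illustrative choices, not from the conversation.

```shell
# Stand-in for "$MANTA_INPUT_FILE": 2500 example lines.
workdir=$(mktemp -d)
input="$workdir/input.txt"
seq 1 2500 | sed 's/^/item /' > "$input"

# Hypothetical stand-in for `mpipe < "$1"`; reports the chunk's line count.
emit_chunk() {
    wc -l < "$1" | tr -d ' '
}

# split(1) writes chunk.aa, chunk.ab, ... with at most 1000 lines each.
split -l 1000 "$input" "$workdir/chunk."

# Loop over the chunks with a glob rather than parsing `ls` output.
chunk_sizes=""
for f in "$workdir"/chunk.*; do
    chunk_sizes="$chunk_sizes $(emit_chunk "$f")"
done
echo "chunk sizes:$chunk_sizes"
```

Using a glob instead of `ls` keeps the loop safe for file names with unusual characters, and an explicit prefix keeps the chunks apart from the input file.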
19:57:15  <dap_>Something like: "msplit -n 25" should take its stdin and partition the lines randomly into 25 separate reducers.
19:57:51  * lloyddejoined
19:58:32  <dap_>Here's an example:
19:58:33  <dap_>https://us-east.manta.joyent.com/dap/public/jobshares/63ce8b04-c460-ec74-f4b9-9dd4bf600007/index.html
19:58:51  <dap_>That was:
19:58:52  <dap_>mjob create -w -r 'seq 1 100 | msplit -n 25' --count=25 -r wc
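
To see the shape of that job locally, the sketch below deals the output of `seq 1 100` into 25 files and runs a line count on each, mirroring the two reduce phases. It is only an approximation: the round-robin `awk` step and the `reducer.NN` file names are assumptions made for illustration, and they are not how `msplit` itself assigns lines to reducers.

```shell
workdir=$(mktemp -d)
buckets=25

# Phase 1 stand-in: deal lines round-robin into $buckets files.
seq 1 100 | awk -v n="$buckets" -v dir="$workdir" '{
    out = sprintf("%s/reducer.%02d", dir, (NR - 1) % n)
    print > out
}'

# Phase 2 stand-in: each "reducer" counts the lines it received.
total=0
reducers=0
for f in "$workdir"/reducer.*; do
    count=$(wc -l < "$f" | tr -d ' ')
    total=$((total + count))
    reducers=$((reducers + 1))
done
echo "$reducers reducers saw $total lines in total"
```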
20:01:52  <wanelo>alright then msplit -n is what we're using. Thanks so much for your help guys. :)
20:04:08  * __rockbot__joined
20:05:29  <dap_>wanelo: If you have a script "import.sh" which takes as an argument your S3 URL, fetches it, and stores it into Manta with an appropriate name, then you should be able to parallelize the reducer, too: "… -r 'xargs -n1 -P 10 import.sh'" (for example).
20:06:40  <dap_>no problem. glad to help!
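
A rough sketch of the `xargs -n1 -P 10 import.sh` pattern dap_ suggests, runnable outside Manta. The `import.sh` body here is hypothetical: it only derives an object name from the URL and prints it. The commented-out fetch-and-store line, the destination path, and the example URLs are all assumptions, not something confirmed in the conversation.

```shell
workdir=$(mktemp -d)

# Hypothetical import.sh: takes one URL and derives an object name from it.
cat > "$workdir/import.sh" <<'EOF'
#!/bin/sh
url="$1"
name=$(basename "$url")
# A real version would fetch and store here, for example (assumed, untested):
#   curl -sf "$url" | mput "/$MANTA_USER/stor/imports/$name"
echo "imported $name"
EOF

# Example URLs standing in for the lines a reducer receives on stdin.
printf '%s\n' \
    "https://example.com/bucket/a.jpg" \
    "https://example.com/bucket/b.jpg" \
    "https://example.com/bucket/c.jpg" > "$workdir/urls.txt"

# -n1 passes one URL per invocation; -P 10 runs up to 10 invocations at once.
xargs -n1 -P 10 sh "$workdir/import.sh" < "$workdir/urls.txt" \
    | sort > "$workdir/result.txt"
cat "$workdir/result.txt"
```

The output is sorted because parallel invocations can finish in any order.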
20:43:12  * daviddiasjoined
20:47:41  * daviddiasquit (Ping timeout: 245 seconds)
21:08:19  * therealkoopaquit (Write error: Broken pipe)
21:13:59  * lloyddequit (Remote host closed the connection)
21:14:08  * lloyddejoined
21:14:44  * therealkoopajoined
21:31:47  * lloyddequit (Remote host closed the connection)
21:37:27  * daviddiasjoined
21:37:54  * lloyddejoined
21:42:09  * daviddiasquit (Ping timeout: 272 seconds)
21:57:43  <mjn>this might be trivial but: one thing i'm trying to port to manta involves running N map tasks on resampled versions of the same input (machine-learning thing that does 'bagging', i.e. training models on resamplings of the input and then averaging them)
21:58:07  <mjn>the simplest way to do this seems to be to just feed the same input file to every map task, but will this get me some kind of degenerate contention for jobs to run on the machine(s) hosting the file?
21:58:38  <mjn>alternately i could resample ahead of time and temporarily store the resampled data to the object store; or i could resample in another phase and mpipe
21:59:21  <mcavage_>mjn: at the limit yes, but I wouldn't really worry about it until it's a problem (there's not much difference between a "hot" node with other tenants and "hot" node b/c of your own demands)
22:01:13  <mjn>makes sense, i guess likely -p values won't be super-high anyway
22:01:51  <mjn>though in an ideal world it would be neat (at least as a theoretical possibility) to do -p 1000 and build a huge model instantly
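
For the resampling step mjn mentions, a bootstrap sample (drawing as many lines as the input has, with replacement) can be sketched in plain `awk`. This is an illustrative sketch only: the `resample` function, the seed argument, and the `train.txt` and `sample.N` file names are hypothetical, and each map task could equally do this on its own copy of the input.

```shell
# resample FILE SEED: print as many lines as FILE has, drawn with replacement.
resample() {
    awk -v seed="$2" '
        BEGIN { srand(seed) }
        { lines[NR] = $0 }
        END {
            for (i = 1; i <= NR; i++)
                print lines[int(rand() * NR) + 1]
        }
    ' "$1"
}

workdir=$(mktemp -d)
seq 1 20 > "$workdir/train.txt"

# Three resamplings, one per hypothetical map task.
for task in 1 2 3; do
    resample "$workdir/train.txt" "$task" > "$workdir/sample.$task"
done
wc -l "$workdir"/sample.*
```

Passing a distinct seed per task keeps the resamplings different from each other while staying reproducible on a given `awk` implementation.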
22:05:33  <wanelo>any reason why mpipe doesn't clean up /var/tmp after it's done?
22:06:11  <nfitch>wanelo: https://github.com/joyent/manta-compute-bin/issues/24
22:10:22  <wanelo>*cry* thanks, we'll clean up manually for now.
22:29:21  <mjn>kudos to whoever thought of mlogin, makes a surprisingly big aesthetic difference even beyond the actual usefulness (which is also there of course)
22:29:57  <mjn>kind of transforms 'submitting JSON over a REST API' to 'just a regular unix box'
22:29:59  <mcavage_>thank josh clulow - although i don't know which IRC handle is his.
22:30:42  <nahamu>He's LeftWing
22:30:48  <mcavage_>there you go ;)
22:30:50  <nahamu>(though he's not in this channel at the moment...)
22:31:24  <nahamu>but he can be stalked^Wfound in #joyent or #smartos :)
22:31:38  * daviddiasjoined
22:32:03  <mcavage_>mjn: anyway, yeah we agree - it's the REPL for manta.
22:32:12  <mjn>yeah i guess that's a good way to put it
22:32:31  <mjn>REPLs aren't magic either, but feel qualitatively different from "compile this bit of code and execute it and then inspect the result..."
22:36:26  * daviddiasquit (Ping timeout: 250 seconds)
22:40:51  * daviddiasjoined
23:00:51  * mjnquit (Quit: leaving)
23:04:13  * AvianFluquit (Remote host closed the connection)
23:04:58  * lloyddequit (Remote host closed the connection)
23:05:54  * lloyddejoined
23:08:03  * mjnjoined
23:10:29  * lloyddequit (Ping timeout: 245 seconds)
23:27:08  * __rockbot__quit (Quit: __rockbot__)
23:27:39  * paulfryzelquit (Remote host closed the connection)
23:30:26  * paulfryzeljoined
23:34:44  * __rockbot__joined
23:35:12  * paulfryzelquit (Remote host closed the connection)
23:52:53  * therealkoopaquit (Remote host closed the connection)