01:03:53  * trentm quit (Quit: Leaving.)
01:04:20  * chorrell joined
01:09:12  * pmooney quit (Ping timeout: 252 seconds)
01:14:23  * dap_ quit (Quit: Leaving.)
01:18:38  * ryancnelson quit (Quit: Leaving.)
01:20:01  * ed209 quit (Remote host closed the connection)
01:20:08  * ed209 joined
01:26:29  * nfitch quit (Ping timeout: 245 seconds)
01:26:47  * pgale quit (Quit: Leaving.)
01:26:55  * fredk quit (Quit: Leaving.)
01:39:17  * nfitch joined
01:42:13  * chorrell quit (Read error: Connection reset by peer)
01:43:58  * chorrell joined
01:45:51  * chorrell quit (Client Quit)
02:11:10  * nfitch quit (Remote host closed the connection)
03:10:46  * pmooney joined
03:15:22  * pmooney quit (Ping timeout: 240 seconds)
03:48:49  * marsell quit (Quit: marsell)
06:21:29  * bahamas10 quit (Ping timeout: 245 seconds)
06:53:06  * _Tenchi_ joined
08:46:22  * bixu_ joined
10:11:11  * bixu_ quit (Remote host closed the connection)
10:11:37  * bixu_ joined
10:12:56  * bixu_ changed nick to bixu
10:14:51  * |woody| quit (Quit: ZNC - http://znc.in)
10:15:45  * |woody| joined
10:17:41  * bixu_ joined
10:18:58  * bixu_ quit (Client Quit)
10:19:16  * bixu_ joined
10:20:00  * ed209 quit (Remote host closed the connection)
10:20:07  * ed209 joined
10:21:07  * bixu quit (Ping timeout: 245 seconds)
10:41:53  * bixu_ changed nick to bixu
10:59:53  * xmerlin joined
11:20:07  * bixu quit (Remote host closed the connection)
11:20:33  * bixu joined
11:23:52  * pmooney joined
11:24:51  * bixu_ joined
11:28:01  * bixu quit (Ping timeout: 264 seconds)
11:28:12  * pmooney quit (Ping timeout: 246 seconds)
12:18:57  * bixu_ changed nick to bixu
12:32:49  * marsell joined
13:15:04  * happy-dude joined
14:20:41  * xmerlin changed nick to Guest19071
15:04:41  * pmooney joined
16:13:56  * pgale joined
17:31:43  * trentm joined
17:33:54  * ryancnelson joined
17:36:44  * dap_ joined
17:37:53  * ryancnelson part
18:01:31  * fredk joined
18:01:35  * dap_1 joined
18:03:49  * dap_2 joined
18:04:52  * dap_ quit (Ping timeout: 240 seconds)
18:06:26  * dap_1 quit (Ping timeout: 265 seconds)
18:31:02  * nfitch_ joined
18:47:05  * nfitch_ quit (Quit: Leaving.)
19:02:56  * dap_2 quit (Quit: Leaving.)
19:03:27  * dap_ joined
19:18:05  * codecaver joined
19:43:54  * pmooney quit (Quit: WeeChat 1.1.1)
19:53:01  * bixu quit (Remote host closed the connection)
19:53:28  * bixu joined
19:54:07  * bixu quit (Client Quit)
19:56:32  * pmooney joined
20:03:38  * pgale quit (Quit: Leaving.)
20:07:14  * pgale joined
20:10:14  * nfitch_ joined
20:12:25  <nfitch_>Probably off topic here, but the right audience is here, so sorry :/
20:12:41  <nfitch_>Anyone tried this lately? https://www.joyent.com/blog/mdb-and-linux
20:13:22  <rmustacc>nfitch_: In theory you shouldn't need any of the LD_PRELOAD bits in manta or on recent SmartOS bits.
20:13:25  <rmustacc>But what's up?
20:13:37  <nfitch_>I have 2 vms running on my mac, one smartos and the other ubuntu, and mdb says the core is "corrupt or missing required data".
20:13:50  <nfitch_>Ah, I'll try without that...
20:14:02  <rmustacc>Depends how recent they are.
20:14:25  <rmustacc>But you're supplying both the node binary and the core file?
20:14:29  <rmustacc>How'd you generate the core file?
20:15:14  <nfitch_>Here are all the commands (as copied when I was trying to repro):
20:15:15  <nfitch_>https://gist.github.com/nfitch/cd3e77be6d1851b6780d
20:15:21  <nfitch_>Ga, I hate emojis.
20:15:40  <nfitch_>:)
20:15:44  <nfitch_>Ok, much better.
20:16:37  <rmustacc>It looks like you didn't copy over the node binary from Ubuntu?
20:17:03  <rmustacc>When you specify the node binary, it's not the SmartOS one, you have to specify the one from Linux.
20:17:22  <nfitch_>No… ok… that makes more sense, now doesn't it...
20:17:27  <nfitch_>Trying...
20:18:58  <nfitch_>rm: Yup that worked. You rock, as always!
20:19:36  <rmustacc>No problem, glad it's working.
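
A rough sketch of the workflow being described above (debugging a Linux node core with mdb on SmartOS), assuming a recent SmartOS image whose mdb ships the v8 dmod; the PID, host name, and paths are illustrative:

    # On the Ubuntu guest: grab a core from the running node process
    # (gcore comes with gdb; the PID here is illustrative)
    gcore -o core 12345                  # writes core.12345

    # Copy BOTH the core and the Linux node binary to the SmartOS guest
    scp core.12345 $(which node) smartos:/var/tmp/

    # On SmartOS: hand mdb the *Linux* node binary along with the core,
    # not the SmartOS node binary
    mdb /var/tmp/node /var/tmp/core.12345
    > ::load v8
    > ::jsstack
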
20:20:00  * ed209 quit (Remote host closed the connection)
20:20:07  * ed209 joined
20:25:33  <nfitch_>dap_: I was about to ask when the 0.12 debugging blog post was going to come out… but then I saw https://www.joyent.com/blog/debugging-enhancements-in-node-0-12
20:25:37  <nfitch_>Much thanks!
20:35:16  <dap_>nfitch_: glad it's helpful!
20:57:17  * pmooney quit (Read error: Connection reset by peer)
20:58:12  * pmooney joined
21:07:08  * nfitch_ quit (Quit: Leaving.)
21:22:28  * axisys_away quit (Quit: leaving)
21:22:53  * axisys joined
21:22:58  * axisys quit (Changing host)
21:22:58  * axisys joined
21:46:25  * nfitch_ joined
21:59:21  <dap_>nfitch_: Someone has ported the guts of manatee v2 to Go: https://github.com/flynn/flynn/pull/986
22:01:28  * pmooney quit (Ping timeout: 250 seconds)
22:06:38  <nfitch_>dap_: Wow. I mean… just… wow.
22:08:02  <nfitch_>dap_: Speaking of which… any major manatee issues lately?
22:08:19  <nfitch_>(S'ok if you don't want to discuss :)
22:08:38  <dap_>We've had a few issues...
22:09:15  * pmooney joined
22:09:17  <dap_>There was this, although it didn't result in any actual problem before we found it:
22:09:17  <dap_>https://smartos.org/bugview/MANATEE-252
22:09:50  <dap_>We had this one: https://github.com/joyent/manatee/commit/2951be10d8a70b6aebeeb9bb93bc0eb6835a1612
22:12:20  <nfitch_>> The system is down at this point, and it's not exactly clear how to recover or finish the upgrade.
22:12:25  <nfitch_>You don't. You roll back.
22:12:29  <dap_>We've had some trouble trying to upgrade some ancient manatees, but the problems were only on the ancient bits
22:13:38  <dap_>nfitch_: We basically decided that it was better to just leave the cluster frozen. It can't *actually* recover from a primary failure at that point, so there's no point in allowing it to try, given that it can only result in you having to rollback.
22:13:38  <nfitch_>The upgrade procedure assumes you always have the sync to recover from. Maybe I'm missing something, though…
22:14:02  <dap_>Basically, when you reprovision the primary, there's no interlock in the procedure to ensure that the sync was ever actually caught up.
22:14:17  <dap_>The interlock in v2 is the initWal value, but that was made up by cluster-backfill.
22:15:03  <nfitch_>Right, which is why you have to roll back Z, and Y should naturally take over as primary (which is correct). I thought that's why we had to add cluster state freezing.
22:15:27  <dap_>The problem is that the original procedure unfroze the cluster before there's any guarantee that the sync is actually up to date.
22:15:56  <nfitch_>(so that Z wouldn't actually ever take over as primary since we're fibbing that Z could take over).
22:16:12  <dap_>Remember that the original async (now on v2) treats itself as the sync, but the *primary* (still on v1) was never actually configured to replicate synchronously to the original async.
22:17:25  <nfitch_>Until Y is reprovisioned. Then Z is promoted in X's eyes to be the sync. It should catch up. So the problem was that there was no guarantee that Z actually caught up to X?
22:18:02  <dap_>Correct
22:18:02  <nfitch_>I can see that… but I thought in the upgrade procedure an operator was supposed to check cluster state before reprovisioning X...
22:18:28  <dap_>Well, the cluster *state* was fine
22:18:59  <nfitch_>No, I meant looking at the xlog positions.
22:19:33  <nfitch_>And no, I didn't explicitly state to make sure the sync (Z) was making forward progress as a sync.
22:20:12  <dap_>If that was there, I think we missed it.
22:20:24  <nfitch_>But yeah, I see the problem. I think the solution on failure is to roll back rather than unfreeze.
22:20:40  <nfitch_>No, I just checked. I didn't say it, only that it should "look normal":
22:20:52  <nfitch_>https://github.com/joyent/manatee/blob/master/docs/migrate-1-to-2.md#reprovision-sync
22:20:54  <dap_>The problem is it's not just a failure… you can get into trouble right up to the very last step when you reprovision X
22:21:14  <nfitch_>Right if it's not caught up.
22:21:22  <dap_>Yeah. Anyway, this didn't really cause a problem, and the fix was simple.
22:21:53  <dap_>LeftWing and I also decided it was safer to disable morays if possible, but I don't remember which cases he had identified that led to that.
22:22:03  <nfitch_>Ok, phew.
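
For illustration, the sort of interlock being discussed: keep the cluster frozen until the sync's replay position has caught up to the primary's current WAL position, and only then unfreeze. The manatee-adm and psql invocations below are a sketch from memory, not the migration doc's exact steps; PRIMARY and SYNC stand in for the actual peers:

    # Keep the cluster frozen while the new sync catches up
    manatee-adm freeze -r 'v1-to-v2 migration in progress'

    # Compare WAL positions: the primary's current location vs. the sync's replay location
    psql -h PRIMARY -U postgres -c 'SELECT pg_current_xlog_location();'
    psql -h SYNC    -U postgres -c 'SELECT pg_last_xlog_replay_location();'

    # Only once the positions match (and the peer shows sync_state = 'sync'
    # in pg_stat_replication on the primary) is it safe to unfreeze
    manatee-adm unfreeze
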
22:22:56  <dap_>nfitch_: The scariest problem we've run into was not really either of our faults, but it did lead to a considerable outage :( It was basically https://github.com/brianc/node-postgres/issues/718
22:23:08  <dap_>We had a health-check query black-hole
22:23:12  <nfitch_>Oh? That would be interesting since they were meant to be backward compatible…
22:24:03  <nfitch_>dap_: Oh, that postgres issue is scary… and unfortunate :(
22:24:11  <dap_>Yeah… it was totally a client issue.
22:25:23  <nfitch_>That's too bad about problems on upgrade… has regular manatee operation been fine otherwise?
22:28:30  <dap_>Basically, Manatee issued a health check query to the postgres client when postgres was down (I don't remember why), and then before that query completed, Manatee started a takeover. (That's all fine so far.) Eventually the first query failed with 'connect ETIMEDOUT' (again, all fine so far), but the postgres client just ate the second query. (Basically, the client queues queries and wasn't smart enough to abort the queued query, nor retry it.) That query was black-holed.
22:28:58  <nfitch_>https://github.com/flynn/flynn/pull/986 <— is crazy! I don't know whether to be impressed or… what...
22:29:28  <dap_>Anyways, the summary is that the first week or so after upgrading was a little rocky, but Manatee itself has been largely fine.
22:30:44  <nfitch_>Eating the query is scary… good to hear it's been fine since :)
22:31:56  <dap_>Yeah :) I think the Go thing is pretty cool.
22:32:41  <nfitch_>It's certainly cool to see people pick up the theory, even if they wanted another implementation.
22:34:40  <dap_>Yeah.
22:48:51  * pgale quit (Quit: Leaving.)
22:55:14  * marsell quit (Ping timeout: 245 seconds)
23:27:12  * pmooney quit (Quit: WeeChat 1.1.1)
23:41:56  * pmooney joined
23:57:11  <nfitch_>Hrmmmm… I must be missing something fundamental on how dmods are delivered and loaded.
23:57:44  <nfitch_>I downloaded and ran a node program from the official node binaries: http://nodejs.org/download/
23:58:03  <nfitch_>For 0.12, that is.
23:58:21  <nfitch_>When I dumped a core and ::load v8, I got the "old" dmods.
23:58:45  <nfitch_>I had to load the bundled .so to get the "new" dmods:
23:58:52  <rmustacc>By default, when you don't specify an absolute path to the module, we search the built-in path.
23:58:57  <nfitch_>[root@dev-1 ~]# mdb core.node.8104
23:58:57  <nfitch_>Loading modules: [ libumem.so.1 libc.so.1 ld.so.1 ]
23:58:57  <nfitch_>> ::load ./node-0.12/node-v0.12.0/out/Release/mdb_v8.so
23:59:06  <rmustacc>Which includes a copy of mdb_v8; however, it's not quite the latest one yet.
23:59:28  <nfitch_>Is there a way to tell what .so the dmods are loaded from?
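
A sketch of the difference being described: "::load v8" with a bare name searches mdb's built-in module path (on SmartOS that includes the copy shipped under /usr/lib/mdb), while a path argument loads that exact .so. "::dmods" lists the modules currently loaded (whether it prints full paths may depend on the mdb version), so loading by explicit path is the surest way to know which build you are running:

    mdb core.node.8104
    > ::load v8
    > ::dmods
    > ::unload v8
    > ::load ./node-0.12/node-v0.12.0/out/Release/mdb_v8.so
    > ::dmods
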