00:31:28  * evanlucas quit (Read error: Connection reset by peer)
00:32:04  * evanlucas joined
01:16:47  <jbergstroem>just restarted
01:31:13  * node-gh joined
01:31:14  * node-gh part
01:35:41  * node-gh joined
01:35:41  * node-gh part
02:07:22  <thealphanerd>jbergstroem did you change the backups? CI seems snappier
02:07:36  <jbergstroem>thealphanerd: restart. no java memory pressure
02:07:48  <jbergstroem>thealphanerd: currently doing a full rsync
02:07:56  <jbergstroem>after that i'll enable the job history stuff
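A minimal sketch of the kind of full rsync mentioned above, assuming the Jenkins home lives in /var/lib/jenkins; the backup host and destination path are placeholders, not the project's actual layout:

    # -a preserves permissions and timestamps, -z compresses over the wire,
    # --delete keeps the backup an exact mirror of the source tree
    rsync -az --delete /var/lib/jenkins/ backup@backup-host.example.com:/backups/jenkins/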
02:45:21  * jgi quit (Quit: jgi)
02:45:58  * evanluca_ joined
02:46:06  * evanlucas quit (Read error: Connection reset by peer)
03:01:31  * jenkins-monitor quit (Remote host closed the connection)
03:01:38  * jenkins-monitor joined
03:23:49  <jbergstroem>joaocgreis: not sure if we improved by lowering it. now i see freezes like these: https://ci.nodejs.org/job/node-test-commit-plinux/1057/nodes=ppcbe-fedora20/consoleFull
03:26:42  <Trott>ppcbe-fedora20 is failing to build...Jenkins client down or something?
03:27:33  <Trott>(I guess I really should learn about Jenkins so I can check these things myself rather than type plaintive cries for help all the time in IRC.)
03:33:42  <jbergstroem>not sure what's going on, because the vm is online and responsive
03:34:00  <jbergstroem>i suspect lowering the ping interval could be the culprit
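For context, one place such an interval can live is in the remoting system properties passed when a JNLP slave is launched; a sketch, where the slave URL and secret are placeholders:

    # 300s corresponds to the 5-minute interval later described as a known good value;
    # pingTimeoutSec bounds how long a ping may go unanswered before the channel is dropped
    java -Dhudson.remoting.Launcher.pingIntervalSec=300 \
         -Dhudson.remoting.Launcher.pingTimeoutSec=240 \
         -jar slave.jar \
         -jnlpUrl https://ci.nodejs.org/computer/ppcbe-fedora20/slave-agent.jnlp \
         -secret <placeholder>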
04:05:59  * ofrobots joined
04:40:51  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
04:49:21  * node-gh joined
04:49:21  * node-gh part
05:00:34  * node-gh joined
05:00:34  * node-gh part
05:07:17  * node-gh joined
05:07:18  * node-gh part
05:21:30  * rmg quit (Remote host closed the connection)
05:22:03  * rmg joined
05:22:06  * node-gh joined
05:22:06  * node-gh part
05:26:41  * rmg quit (Ping timeout: 265 seconds)
06:06:01  <Trott>jbergstroem: FWIW, *both* ppc hosts (ppcbe-fedora20 and ppcle-ubuntu1404) appear to be failing to build.
06:08:06  <jbergstroem>Trott: it's likely a result of changing the ping interval this morning.
06:08:25  <jbergstroem>Trott: the slaves are up and jenkins says they're connected
06:08:28  <jbergstroem>brb
07:07:01  * ofrobots joined
07:18:41  <joaocgreis>jbergstroem: the contrast is absolutely clear https://ci.nodejs.org/job/node-test-commit-plinux/ vs https://ci.nodejs.org/job/node-test-binary-windows/ (everything after 773 are real test failures!)
07:21:36  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
07:24:45  <jbergstroem>joaocgreis: the test failures for ppc are related to connection issues
07:25:43  <joaocgreis>exactly, and connection issues for azure disappear completely at about the same time
07:27:59  <joaocgreis>jbergstroem: actually, didn't ppc start failing before you restarted with the new ping interval?
07:28:19  <jbergstroem>joaocgreis: not really; there were some network/vm hosting issues this morning but that's not related
07:28:28  <jbergstroem>joaocgreis: if you check the fails you will see "channel already closed" issues
07:28:40  <jbergstroem>so the slave is connected but the master/slave protocol seems broken
07:33:07  <joaocgreis>jbergstroem: first failure (https://ci.nodejs.org/job/node-test-commit-plinux/1053/) was at 8:09pm utc, from the log here you've done the change at 8:54 and restarted somewhere after 11:37
07:34:32  <jbergstroem>i think that's a different error though
07:34:35  <jbergstroem> hudson.remoting.ChannelClosedException: channel is already closed
07:36:02  <joaocgreis>and after the restart jobs just hang, right?
07:37:21  <jbergstroem>https://ci.nodejs.org/job/libuv+any-pr+multi/nodes=ppcbe-fedora20/244/console
07:38:29  <joaocgreis>I plan to revert the ping interval to 5 min when I catch jenkins idle, then try everything I can on the windows slaves side during the day
07:42:14  <jbergstroem>log in and go to the bottom of settings; you'll find the "prepare for shutdown"
07:42:19  <jbergstroem>that way you won't get any new jobs
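The same "prepare for shutdown" toggle is also reachable over Jenkins' HTTP API; a minimal sketch, with USER and TOKEN standing in for admin credentials:

    # stop accepting new builds (equivalent to "prepare for shutdown" in the UI)
    curl -X POST -u "$USER:$TOKEN" https://ci.nodejs.org/quietDown
    # resume normal operation without restarting
    curl -X POST -u "$USER:$TOKEN" https://ci.nodejs.org/cancelQuietDown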
07:42:35  <jbergstroem>there's nothing going on right now but a frozen job on the ppc stuff
07:42:44  <jbergstroem>want me to restart?
07:43:40  <jbergstroem>doing it now.
07:45:45  <jbergstroem>joaocgreis: you took test-rackspace-win2008r2-x64-2 offline, right?
07:58:17  <joaocgreis>jbergstroem: thanks
07:58:52  <joaocgreis>jbergstroem: yes, 6 new servers coming up today
08:00:24  <jbergstroem>joaocgreis: awesome.
08:00:35  <jbergstroem>joaocgreis: will we even need the fanned stuff any longer?
08:03:06  <joaocgreis>Yes, until I change the w10 servers in azure
08:23:39  * rmg joined
08:28:35  * rmg quit (Ping timeout: 260 seconds)
10:21:14  <jbergstroem>this is how the ppcbe-2 slave died: Exception in thread "Ping thread for channel hudson.remoting.Channel@11b9298f:channel" hudson.remoting.RequestAbortedException: hudson.remoting.Channel$OrderlyShutdown
10:21:43  <jbergstroem>...and since the master thinks it's still around we get "INFO: Trying protocol: JNLP2-connect"
10:24:26  * rmg joined
10:28:56  * rmg quit (Ping timeout: 240 seconds)
10:50:13  * imjacobclark joined
11:49:59  <joaocgreis>jbergstroem: do you keep the iptables rules listed somewhere or is it just running the commands on the server and they'll get picked up by backup?
11:51:02  <jbergstroem>joaocgreis: iptables init script has a serialisation event on stop/start. if you're unsure as to what commands to write, just use iptables-save > foo, edit and iptables-restore foo
11:51:22  <jbergstroem>(this is all automated in next iteration of our playbooks btw)
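Spelled out, the save/edit/restore round-trip suggested above looks roughly like this (the file name is arbitrary):

    iptables-save > /tmp/rules.v4      # dump the live ruleset to a file
    vi /tmp/rules.v4                   # add or adjust rules by hand
    iptables-restore < /tmp/rules.v4   # load the edited ruleset back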
11:51:41  * evanluca_ changed nick to evanlucas
11:51:48  <joaocgreis>thanks!
12:17:59  * imjacobclark quit (Ping timeout: 276 seconds)
13:15:58  * ofrobots joined
13:24:07  * imjacobclark joined
13:52:26  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
13:57:56  * ofrobots joined
14:05:01  * imjacobc_ joined
14:05:35  * imjacobclark quit (Remote host closed the connection)
14:39:25  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
14:40:33  * ofrobots joined
14:44:37  * ofrobots quit (Client Quit)
14:49:13  * ofrobots joined
14:51:18  * ofrobots quit (Client Quit)
15:49:27  * Fishrock123 joined
16:02:46  * rmg joined
16:24:24  * saper quit (Read error: Connection reset by peer)
16:27:47  * saper joined
16:58:24  * jgi joined
17:19:07  * ofrobots joined
17:41:55  * jgi quit (Quit: jgi)
17:48:42  <Trott>Still persistent problems with ppcbe-fedora20 but everything else is more-or-less OK?
17:48:57  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
17:51:00  * imjacobc_ quit (Remote host closed the connection)
17:52:15  <joaocgreis>Trott: I believe the changes I made to the windows slaves had some effect; there are still failures but overall it seems much better. About ppc I don't know what is happening: we were blaming the ping interval, jbergstroem changed it to a known good value (5m), but the failures didn't go away
17:58:11  * jgi joined
18:04:37  * ofrobots joined
18:23:17  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
18:25:38  * ofrobots joined
18:31:46  <Trott>GREEN! IT'S GREEN AGAIN! https://ci.nodejs.org/job/node-test-pull-request/1395/
18:31:54  <Trott>Sorry. I get pretty excited about that.
18:31:58  <Trott>Never gets old.
18:40:24  * jgi quit (Quit: jgi)
18:56:13  <Trott>This one is a little interesting (to me, anyway, because I've not seen this come up; maybe it happens a lot and I just don't ever see it): https://ci.nodejs.org/job/node-test-commit-plinux/1066/nodes=ppcle-ubuntu1404/console
18:57:07  <Trott>https://www.irccloud.com/pastebin/vU63o6uY/
18:57:19  <Trott>The Java/Jenkins/Hudson build splat came right in the middle of the test runs.
19:09:06  * jgi joined
19:18:00  <Trott>Ouch, those Azure hosts... https://ci.nodejs.org/job/node-test-binary-windows/788/
19:20:50  <jbergstroem>joaocgreis: i'll have a look at ppc today; will restart clients
19:21:21  <jbergstroem>Trott: that's an early exit, no?
19:26:11  * jgi quit (Quit: jgi)
19:27:44  <joaocgreis>Trott: the build before was aborted, that never helps in jenkins
19:28:49  <joaocgreis>jbergstroem: master is way too slow, if you can get it faster by dropping the build history I suspect that may help
19:29:15  <jbergstroem>joaocgreis: yep; waiting for full rsync to complete
19:30:43  <Trott>joaocgreis: Oh, I'm a serial build-abort button presser. If it helps with reliability, I can just let tests run to their completion. I thought I was doing people a favor by freeing up resources once I see that a test failed and I need to make a change or re-run.
19:32:26  <jbergstroem>Trott: cancelling should be fine
19:34:50  <joaocgreis>Trott: should be fine! freeing resources is good! Sorry I was not clear, I'm just complaining about jenkins, it's not the first time I've seen failures right after aborts
19:35:03  * jgi joined
19:49:41  * saper quit (Changing host)
19:49:41  * saper joined
19:56:12  <jbergstroem>rsync just finished. i'll run it a few more times and will look at the plugin next. just need coffee :)
19:59:42  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
20:11:03  * ofrobots joined
20:20:57  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
20:21:14  * ofrobots joined
20:28:51  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
21:03:27  * jgi quit (Quit: jgi)
21:23:05  * node-gh joined
21:23:05  * node-gh part
21:26:01  * ofrobots joined
21:30:48  * ofrobots quit (Ping timeout: 264 seconds)
21:33:48  * ofrobots joined
21:38:25  * node-gh joined
21:38:26  * node-gh part
21:41:46  * jgi joined
21:42:35  <jbergstroem>So; I've done a full rsync of the current jenkins folder to our backup machine at joyent. I'll proceed with the job history plugin now.
21:44:04  * node-gh joined
21:44:04  * node-gh part
21:47:28  <jbergstroem>fyi, there's a bug with the "prepare for shutdown" -- post scripts will be queued on top and just sit there.
21:48:19  <jbergstroem>joaocgreis: good job with the windows slaves. i guess it'll be fair to say we have enough horsepower
22:02:24  * chorrell joined
22:06:38  * node-gh joined
22:06:38  * node-gh part
22:11:05  <jbergstroem>just enabled it and now jenkins is "thinking"
22:15:56  * Fishrock123 quit (Remote host closed the connection)
22:19:00  <jbergstroem>just implemented the rule on node-test-pr; will see what happens during post-build phase
22:19:32  <jbergstroem>I'm thinking it won't cascade.
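One way to sanity-check what the rule ended up writing, assuming it maps to the standard "discard old builds" (logRotator) entry in the job's config.xml and that USER/TOKEN are placeholder credentials with read access:

    # fetch the job config and show the build-retention settings, if any
    curl -s -u "$USER:$TOKEN" \
        "https://ci.nodejs.org/job/node-test-pull-request/config.xml" | grep -A4 logRotator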
22:23:56  * chorrell quit (Quit: My Mac has gone to sleep. ZZZzzz…)
22:36:37  * Fishrock123 joined
22:42:44  <jbergstroem>i'm pretty sure that didn't work as intended. The node-test-pr now has 4 jobs in the history.
22:44:39  <Trott>The stress test I just submitted is just sitting there in Pending and there's a banner warning me that Jenkins will be shutting down.
22:44:51  <Trott>Which is cool if, you know, it's going to shut down. :-D
22:44:52  <jbergstroem>yeah i don't want too many things going on while i do this
22:44:58  <Trott>Gotcha.
22:45:25  <jbergstroem>just cancelled it; give me a bit
22:49:40  * node-gh joined
22:49:40  * node-gh part
23:07:29  * node-gh joined
23:07:30  * node-gh part
23:08:00  <jbergstroem>Trott: feel free to get going; need to write some scriptz
23:09:00  * node-gh joined
23:09:00  * node-gh part
23:12:21  * ofrobots quit (Quit: My Mac has gone to sleep. ZZZzzz…)
23:21:58  * ofrobots joined
23:29:52  * node-gh joined
23:29:53  * node-gh part
23:30:19  <jbergstroem>^ thoughts anyone?
23:45:49  <jbergstroem>good old 200+ line import: https://github.com/jenkinsci/jenkins/blob/0a5b03688ba6884429bb3d2a06be8b7955694420/core/src/main/java/jenkins/model/Jenkins.java#L29
23:49:31  <jbergstroem>ok; running out of time. i'll have to continue some other day.
23:50:59  * node-gh joined
23:50:59  * node-gh part