00:24:02  * jasnell quit
00:43:21  * sgimeno quit (Ping timeout: 240 seconds)
01:28:22  * Fishrock123 quit (Remote host closed the connection)
01:29:00  * Fishrock123 joined
01:33:21  * Fishrock123 quit (Ping timeout: 258 seconds)
02:03:03  * Fishrock123 joined
02:04:28  * Fishrock123 quit (Client Quit)
02:59:41  <Trott>rvagg: ^^^^^^^
03:00:42  <Trott>Looks like it's spread to all the Pi builds?
03:01:05  <Trott>https://www.irccloud.com/pastebin/2vnd9sWs/
03:01:15  <Trott>(https://ci.nodejs.org/job/node-test-binary-arm/10886/RUN_SUBSET=addons,label=pi1-raspbian-wheezy/console)
03:01:26  <Trott>Looks like NFS or other file system failure?
03:23:12  <rvagg>Looking
03:32:12  <rvagg>ahh, "No space left on device"
03:32:37  <rvagg>jasnell's first failure had only the Pi1's failing because the 2's and 3's finished fast enough and the 1's were still running when it started to fill up too much
03:32:52  <rvagg>need to figure out why it's taking up so much space though
03:38:27  <rvagg>joaocgreis: node-test-binary-arm/.git is getting really big on a couple of these, in the order of 12G, can you check what's going on there, also could you drop some notes in here about where/what you're looking at? I really feel clueless on this and am not sure how I'd go about fixing it if you weren't around
03:42:32  <rvagg>ok, I've cleaned out the workspaces on all of the test pi's, this might create some warmup challenges on the next few runs so I'll kick off a master test job now to help with that
05:04:05  * joyee joined
05:17:57  <Trott>Thanks, rvagg!
05:20:41  <Trott>Getting build failures on Pi with `error: bad index file sha1 signature` and `fatal: index file corrupt`. See https://ci.nodejs.org/job/node-test-binary-arm/10891/RUN_SUBSET=6,label=pi3-raspbian-jessie/console for an example.
05:21:23  <Trott>rvagg: Regarding ^^^^ would it make sense for me to take the failing ones offline or is it unlikely to be host-specific? (I guess I can wait a while and see if there's a correlation to hosts or not...)
05:24:40  <Trott>Seems to definitely correlate to test-requireio_williamkapke-debian8-arm_pi3-1 at least. Taking that one offline. rvagg
05:32:56  <rvagg>Trott: ok, yah take them offline when you see any kind of pattern and let me know so I can pull them out, do a destructive test on the SD card and reprovision with or without a replacement.
05:33:47  <rvagg>Could still be related to the disk full. The delete took quite a while for all ~100G so maybe we got stuck midway and it corrupted.
06:29:35  * joyee quit (Remote host closed the connection)
06:39:49  * joyee joined
06:44:15  * joyee quit (Ping timeout: 248 seconds)
06:49:37  <Trott>rvagg: I just took test-requireio_securogroup-debian7-arm_pi1p-1 offline too. Example failure there is https://ci.nodejs.org/job/node-test-binary-arm/10891/RUN_SUBSET=3,label=pi1-raspbian-wheezy/console:
06:49:56  <Trott>https://www.irccloud.com/pastebin/R7VuzNQl/
08:38:42  * joyee joined
08:40:04  * joyee quit (Read error: Connection reset by peer)
08:40:22  * joyee joined
08:43:25  * joyee_ joined
08:43:26  * joyee quit (Read error: Connection reset by peer)
08:54:29  * joyee_ quit (Remote host closed the connection)
08:56:02  * joyee joined
08:57:20  * joyee quit (Remote host closed the connection)
09:03:23  * seishun joined
09:08:40  * joyee joined
09:30:56  * joyee quit (Remote host closed the connection)
09:31:33  * joyee joined
09:35:35  * joyee quit (Ping timeout: 240 seconds)
10:25:11  * mylesborins quit (Quit: farewell for now)
10:25:41  * mylesborins joined
13:36:08  * joyee joined
13:50:30  * joyee quit (Remote host closed the connection)
13:51:05  * joyee joined
14:38:21  * joyee quit (Remote host closed the connection)
14:38:47  * joyee joined
14:54:38  * joyee_ joined
14:56:05  * joyee quit (Ping timeout: 240 seconds)
15:03:03  * node-gh joined
15:03:03  * node-gh part
15:20:35  * node-gh joined
15:20:35  * node-gh part
15:22:04  * joyee_ quit (Remote host closed the connection)
15:22:31  * joyee joined
15:32:12  * joyee_ joined
15:34:02  * joyee quit (Ping timeout: 260 seconds)
15:39:48  <joaocgreis>rvagg some notes then: re the pis: there is git-rpi-clean to clean the workspaces weekly. I had it running daily but could not keep up with pis going offline, the jenkins backlog was getting huge, so I changed it to weekly. If someone else in the wg notices a `git-rpi-clean` stuck in the work queue, please abort it and update the job to not run on that worker. When someone brings a pi back (usually rvagg or me I guess), please update the job to run on it again.
15:40:03  <joaocgreis>I've updated the job to run on everything and am running it now, but it might take a while because there are node-test-commit running
15:40:27  <joaocgreis>I mean, to run on every pi
15:41:56  <joaocgreis>Also note it does a move then a delete, as something like an atomic delete, so it does not leave corrupted git repos if it aborts or fails (sometimes deleting files fails, possibly because of nfs)
15:42:36  <joaocgreis>not sure if we should document this somewhere, the process might still evolve.. perhaps I should add a note about this to the onboarding?
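The "move then delete" idea mentioned above can be sketched roughly as follows. This is only an illustration, not the actual `git-rpi-clean` job: the function name `move_then_delete` and the `.trash-` prefix are hypothetical, and it assumes the renamed directory stays on the same file system so the rename is a single step (behaviour over NFS may differ).

```python
import os
import shutil
import tempfile
import uuid


def move_then_delete(workspace: str) -> str:
    """Rename `workspace` out of the way, then delete the renamed copy.

    Returns the path the workspace was renamed to (hypothetical ".trash-" name).
    """
    parent = os.path.dirname(os.path.abspath(workspace))
    trash = os.path.join(parent, f".trash-{uuid.uuid4().hex}")
    # Step 1: one rename inside the same directory. After this the original
    # path no longer exists, so nothing can pick up a half-deleted git repo.
    os.rename(workspace, trash)
    # Step 2: the slow recursive delete. If it is aborted or a file refuses
    # to go away, only the renamed directory is left behind for a later run.
    shutil.rmtree(trash, ignore_errors=True)
    return trash


if __name__ == "__main__":
    # Demo on a throwaway directory, not a real CI workspace.
    root = tempfile.mkdtemp()
    ws = os.path.join(root, "node-test-binary-arm")
    os.makedirs(os.path.join(ws, ".git"))
    leftover = move_then_delete(ws)
    print("workspace still present:", os.path.exists(ws))
    print("leftover still present:", os.path.exists(leftover))
```

The point of the ordering is that a failure during the slow delete leaves a stray renamed directory rather than a workspace with a partly removed `.git`.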
15:47:42  <joaocgreis>also one note about checking jobs: those of us with infra access can grep all the jenkins jobs configs, so anyone: feel free to ping us if you want to make sure something is not being used anywhere or configured the same for all jobs or something like that
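For anyone curious what "grep all the jenkins jobs configs" might look like, here is a minimal sketch. It assumes the usual Jenkins layout where each top-level job has a `config.xml` under a `jobs/<job-name>/` directory; the function name `jobs_mentioning` and the demo job contents are made up for illustration, and jobs nested inside folders would need a deeper search.

```python
import tempfile
from pathlib import Path
from typing import List


def jobs_mentioning(jobs_dir: str, needle: str) -> List[str]:
    """Return the sorted names of jobs whose config.xml contains `needle`."""
    hits = []
    # Assumed layout: <jobs_dir>/<job-name>/config.xml (top-level jobs only).
    for config in Path(jobs_dir).glob("*/config.xml"):
        text = config.read_text(encoding="utf-8", errors="replace")
        if needle in text:
            hits.append(config.parent.name)
    return sorted(hits)


if __name__ == "__main__":
    # Demo against a fake jobs directory with made-up config contents.
    jobs = Path(tempfile.mkdtemp())
    for name, body in [
        ("git-rpi-clean", "<label>pi1-raspbian-wheezy</label>"),
        ("node-test-binary-arm", "<label>pi3-raspbian-jessie</label>"),
    ]:
        (jobs / name).mkdir()
        (jobs / name / "config.xml").write_text(body, encoding="utf-8")
    print(jobs_mentioning(str(jobs), "pi1-raspbian-wheezy"))
```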
15:48:14  <joaocgreis>Trott: when you see that, don't hesitate to mark as offline right away and figure out if it's host related later (if the failures stop, it probably is). We have many pis, so even with many offline the jobs will still run (even if much slower) and not fail, which is better. Thanks for noticing these things quickly and acting!
15:53:58  * node-gh joined
15:53:58  * node-gh part
15:54:35  * node-gh joined
15:54:35  * node-gh part
16:32:33  * seishun quit (Ping timeout: 248 seconds)
17:00:34  * seishun joined
17:02:38  * joyee_ quit (Remote host closed the connection)
17:03:13  * joyee joined
17:07:45  * joyee quit (Ping timeout: 248 seconds)
17:54:30  * joyee joined
17:59:17  * joyee quit (Ping timeout: 260 seconds)
18:07:23  * seishun quit (Quit: ChatZilla 0.9.93 [Firefox 56.0.1/20171002220106])
18:12:36  * seishun joined
18:37:49  * joyee joined
18:42:28  * joyee quit (Ping timeout: 240 seconds)
19:29:29  * joyee joined
19:34:05  * joyee quit (Ping timeout: 240 seconds)
20:09:13  * node-gh joined
20:09:13  * node-gh part
20:18:43  * node-gh joined
20:18:43  * node-gh part
20:20:15  * node-gh joined
20:20:15  * node-gh part
20:31:59  * seishun quit (Read error: Connection reset by peer)
20:38:08  * node-gh joined
20:38:08  * node-gh part
20:38:50  * node-gh joined
20:38:50  * node-gh part
20:40:30  * node-gh joined
20:40:30  * node-gh part
20:42:03  * node-gh joined
20:42:03  * node-gh part
20:43:33  * node-gh joined
20:43:33  * node-gh part
20:55:42  * node-gh joined
20:55:42  * node-gh part
21:02:42  * node-gh joined
21:02:42  * node-gh part
21:06:42  * node-gh joined
21:06:42  * node-gh part
21:10:55  * node-gh joined
21:10:55  * node-gh part
21:11:38  * joyee joined
21:12:40  * node-gh joined
21:12:40  * node-gh part
21:15:56  * joyee quit (Ping timeout: 246 seconds)
21:18:25  * node-gh joined
21:18:25  * node-gh part
21:18:40  * node-gh joined
21:18:40  * node-gh part
21:27:21  * node-gh joined
21:27:21  * node-gh part
22:12:43  * joyee joined
22:17:37  * joyee quit (Ping timeout: 248 seconds)
23:01:03  * joyee joined
23:05:27  * joyee quit (Ping timeout: 240 seconds)
23:40:09  * node-gh joined
23:40:09  * node-gh part
23:49:35  * node-gh joined
23:49:35  * node-gh part