00:03:29  * pgale61 joined
01:00:49  * bahamat quit (Quit: Leaving.)
01:03:53  * bahamat joined
01:06:42  * trentm quit (Quit: Leaving.)
01:20:02  * ed209 quit (Remote host closed the connection)
01:20:09  * ed209 joined
01:21:45  * chorrell joined
01:26:41  * jhendricks quit (Quit: Leaving.)
01:44:06  * dap_1 quit (Quit: Leaving.)
02:18:08  * chorrell quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
02:18:29  * chorrell joined
02:28:58  * chorrell quit (Quit: My MacBook has gone to sleep. ZZZzzz…)
03:03:57  * melloc quit (Quit: Leaving.)
03:58:00  * melloc joined
03:59:36  * pgale61 quit (Quit: Leaving.)
04:01:37  * pgale61 joined
04:35:23  * jhendricks joined
04:39:00  * jhendricks quit (Client Quit)
04:39:43  * jhendricks joined
04:41:02  * jhendricks quit (Client Quit)
05:21:45  * pgale61 quit (Quit: Leaving.)
05:26:10  * pgale61 joined
05:56:02  * pgale61 quit (Quit: Leaving.)
05:59:41  * pgale61 joined
06:03:57  * jhendricks joined
06:05:24  * jhendricks quit (Client Quit)
06:53:40  * pgale61 quit (Quit: Leaving.)
16:09:51  * rmustacc topic: Manta: Big Data Unix | Now Open Source! -- https://github.com/joyent/manta | http://apidocs.joyent.com/manta/ | http://logs.libuv.org/manta/latest
16:45:06  * melloc joined
16:52:51  * jhendricks joined
16:54:02  * melloc quit (Quit: Leaving.)
16:55:47  * bsmithx10 joined
17:19:40  * trentm joined
17:38:52  * melloc joined
18:05:11  * elijahZ24 joined
18:14:17  * glasspelican quit (Ping timeout: 246 seconds)
18:18:04  * pmooney quit (Quit: WeeChat 1.7)
18:20:55  * pmooney joined
18:23:13  * glasspelican joined
19:43:05  <bsmithx10>my 3rd Storage Node is being ignored, and is reporting the wrong amount of storage? Any ideas where to start figuring out what's wrong here?
19:51:14  * Techno quit (Ping timeout: 246 seconds)
19:54:07  <bahamat>Does it have a storage zone running on it?
19:58:22  <bsmithx10>bahamat: yea
19:58:31  <bsmithx10>it shows up in the marlin-dashboard etc
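A quick way to confirm which storage zones are actually deployed, and where, is `manta-adm` from the headnode global zone; a minimal sketch (column layout varies a bit by version):

    manta-adm show storage     # one row per deployed storage zone, with shard and zonename
    manta-adm cn               # compute nodes Manta knows about, to match zones to servers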
20:01:26  <bsmithx10>https://paste.ec/paste/TFk2obGD#DppH07OcTAzOe6MEpwxRqNk-1Smk9aP52YRYWVa+2mv
20:02:33  <bsmithx10>hmmmm what is Disk Slop Used?
20:04:39  <bsmithx10>reading https://github.com/joyent/rfd/blob/master/rfd/0020/README.md
20:07:37  <bahamat>What about madtom?
20:07:53  <bsmithx10>my goal is to have copies of an mput
20:08:13  <bsmithx10>so that I can speed up map reduce jobs by having more zones run against the data
20:08:19  <bahamat>I think slop is the available spare capacity that marlin zones can expand into.
20:09:10  <bahamat>i.e., doing mjob create --disk 32 will create a job having 32GB of disk space instead of the default.
20:09:59  <bsmithx10>bahamat: I assume that's good for ETL
20:11:14  <bahamat>It depends.
20:11:35  <bahamat>For jobs that use temp files, being able to take extra disk space is necessary.
20:12:11  <bahamat>Other jobs do everything in memory, and with large objects being able to allocate more RAM is necessary.
20:12:26  <bsmithx10>hmmmm supervisor is red
20:12:28  <bahamat>But there's no need to allocate 128GB and 1.5T to every marlin task.
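A minimal sketch of requesting extra per-task resources at job creation with the node-manta CLI; the sizes and the input object are illustrative:

    # ask for 32 GB of scratch disk and 4 GB of DRAM for each map task
    echo /me/stor/logs/2017/05/01/host1.log |
        mjob create --disk 32 --memory 4096 -m 'wc -l'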
20:12:40  <bahamat>How many supervisors do you have?
20:12:40  <bsmithx10>1 of the manatee instances and postgres is red....
20:12:43  <bsmithx10>just 1
20:13:00  <bahamat>The manatee is probably deposed.
20:13:23  <bahamat>You'll need to figure out what's going on with that supervisor as well then.
20:13:40  <bahamat>That's not going to have anything to do with whether or not your other storage instance gets used.
20:13:59  <bahamat>So, let's dig deeper.
20:14:02  <bsmithx10>yea 1 instance is deposed
20:14:08  <bsmithx10>is there a command to undepose it?
20:14:10  <bahamat>1. Why do you think the amount of storage isn't being reported correctly?
20:14:25  <bsmithx10>well I was stupid and thought SLOP
20:14:29  <bsmithx10>was how it was being measured :(
20:14:36  <bsmithx10>input /frownie face
20:14:45  <bsmithx10>so it's probably fine
20:14:52  <bahamat>Log into the manatee zone, then run `svcadm disable manatee-sitter; sleep 60; manatee-adm rebuild`
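A sketch of that rebuild flow end to end; locating the zone via `manta-adm show` is an assumption about how you find the deposed peer (in Manta the manatee service is deployed as "postgres"):

    manta-adm show postgres            # from the headnode GZ: list the manatee zones
    zlogin <zonename_of_deposed_peer>
    manatee-adm show                   # confirm this peer is the deposed one
    svcadm disable manatee-sitter
    sleep 60
    manatee-adm rebuild                # resync this peer from the current primary
    manatee-adm show                   # it should rejoin as sync or async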
20:18:10  <bsmithx10>hmm supervisor is Unhealthy CheckTimeout
20:18:23  <bahamat>Check SMF inside the zone
20:18:28  <bsmithx10>all up
20:19:17  <bsmithx10>cool, that pg instance is rebuilt clean
20:19:27  <bahamat>Did you verify that there are no required services that are disabled?
20:20:01  * ed209 quit (Remote host closed the connection)
20:20:09  * ed209 joined
20:20:14  <bahamat>It could also be that madtom is out of date. It doesn't automatically get updated when zones change.
20:20:22  <bsmithx10>I was able to mlogin
20:20:33  <bahamat>You may need to reprovision madtom if the jobsupervisor zone you have is not the one that madtom thinks you have.
20:21:31  <bsmithx10>what's the best way to reprov?
20:21:33  <bsmithx10>sdcadm?
20:21:40  <bsmithx10>or manta-adm?
20:21:43  <bahamat>manta-adm
20:21:48  <bahamat>So do this:
20:21:59  <bahamat>1. manta-adm show -js > current.json
20:22:11  <bahamat>2. edit current.json and set the madtom count to 0
20:22:19  <bahamat>3. manta-adm update current.json
20:22:32  <bahamat>4. Edit current.json and set the madtom count to 1
20:22:39  <bahamat>5. manta-adm update current.json
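Those five steps as one sketch, run from the headnode global zone (assuming a single-datacenter deployment):

    manta-adm show -js > current.json
    vi current.json                    # set the madtom count to 0
    manta-adm update current.json      # removes the stale madtom zone
    vi current.json                    # set the madtom count back to 1
    manta-adm update current.json      # provisions a fresh madtom zone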
20:24:56  <bsmithx10>bahamat: lol...you must love engineering for that gem :P
20:25:34  <bahamat>Well, it has its advantages. But there are edge cases where it makes things more difficult.
20:25:58  <bahamat>It's something we're aware of, and would like to improve.
20:30:18  <bsmithx10>bahamat: do I gain any perf from having more jobsupervisors?
20:32:47  <bsmithx10>bahamat: that cleared up the supervisor alert
20:32:52  <bahamat>bsmithx10: No, I don't think so
20:33:50  <bahamat>But it does give you redundancy.
20:33:58  <bahamat>We have two per datacenter
20:51:37  <bsmithx10>so, how can I force / make sure that an mput is put on all nodes in the cluster?
20:51:49  <bsmithx10>I did copies=3 or whatever
20:52:10  <bsmithx10>but when I did a map task... the dashboard only lit up 2 of the 3 nodes
21:01:47  <bahamat>That doesn't necessarily mean that your objects are not on the 3rd storage node.
21:05:17  <bahamat>You can log into the storage zone on your storage node and look in /manta for file objects.
21:57:50  * pgale61 joined
21:59:02  <bsmithx10>bahamat: I saw that.... so is there a procedure to increase the speed of my jobs by spreading the dataset across more nodes?
21:59:43  <bahamat>It'll parallelize across inputs as it deems necessary.
21:59:58  <bahamat>The best way to spread the load is to make sure that your objects are spread out.
22:00:12  <bahamat>Object index keys are sharded on the directory that they're in.
22:00:47  <bahamat>So if you put all of your stuff into /me/my/logs/*.log, that's one directory. They'll all be sharded to the same location.
22:01:28  <bahamat>So for example, when we do log files it's /me/stor/logs/service/datacenter/year/mo/dy/hr/node.log
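As an illustration of that layout (the path and file are invented), deep per-service/per-date directories keep objects from all piling into one directory, so their index entries need not all land in the same place:

    # -c 3 asks for three copies of the object
    mput -c 3 -f host1.log /me/stor/logs/muskie/us-east-1/2017/05/01/00/host1.log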
22:20:52  <bsmithx10>ahhhh
22:21:05  <bsmithx10>so.... because I have /me/stor/subtitles/files
22:21:34  <bsmithx10>only 2 SNs can access those files?
22:21:45  <bsmithx10>I only have 1 shard
22:26:07  <bahamat>It's not that only two can access the files, it's that the files are physically stored only on two.
22:26:26  <bahamat>marlin tasks are routed to the CN where the files are physically stored.
22:30:03  * pgale61 quit (Quit: Leaving.)
22:40:25  <bsmithx10>bahamat: even if I mput with copies 3?
22:43:28  <bahamat>bsmithx10: That I'm not sure of.
23:18:11  <bsmithx10>any way to see where my copies are stored?
23:19:31  <bahamat>Well, like I said you can log into the storage zones and look in /manta for object files.
23:19:58  <bahamat>Those are the actual raw files, so if it's ascii you can cat it, etc.
23:20:12  <bahamat>You can dig around through there to see if you find it.
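A sketch of poking around a storage zone; the zonename is a placeholder, and /manta/<owner_uuid>/<object_uuid> is how the storage (mako) zones lay objects out on disk:

    zlogin <storage_zonename>
    ls /manta                          # one directory per owner (account) uuid
    ls /manta/<owner_uuid> | head      # raw object files, named by object uuid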
23:43:20  * jayschmidt quit (Quit: Leaving.)
23:45:26  * pgale61 joined
23:59:06  <bsmithx10>bahamat: there isn't a way to check in the metadata tier?
23:59:30  * pgale61 quit (Ping timeout: 240 seconds)
23:59:35  <bahamat>bsmithx10: I'm not sure how. It's certainly in moray somewhere.
23:59:52  <bahamat>Not that it's unknowable, I just never went looking.
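A hedged sketch of what that lookup might look like, assuming the node-moray client tools inside a moray zone and the standard "manta" metadata bucket; the key is the object's full path with the account's uuid rather than its login:

    zlogin <moray_zonename>            # a moray zone for the shard holding the path
    # add -h/-p if the tools don't point at the local moray by default
    getobject manta /<account_uuid>/stor/subtitles/files/<object_name> | json value.sharks

The "sharks" array in the result lists the manta_storage_id (and datacenter) of each stored copy.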