00:10:07  * mikeal quit (Quit: Leaving.)
00:10:39  * werle joined
00:12:16  * mikeal joined
00:15:54  * timoxley quit (Remote host closed the connection)
00:16:10  * timoxley joined
00:33:48  * kenansulayman quit (Quit: ≈ and thus my mac took a subtle yet profound nap ≈)
00:38:30  * werle quit (Ping timeout: 264 seconds)
01:21:21  <levelbot>[npm] level-sleep@0.3.0 <http://npm.im/level-sleep>: Database for storing, cloning and replicating SLEEP compatible data stores. (@mikeal)
01:46:43  <mikeal>haha
01:46:49  <mikeal>someone doesn't like typos in this channel :)
01:48:38  * st_luke joined
01:53:12  * thlorenz joined
02:02:15  * st_luke quit (Read error: Connection reset by peer)
02:02:43  * st_luke joined
03:01:30  * davidstrauss quit (Quit: No Ping reply in 180 seconds.)
03:01:53  * davidstrauss joined
03:08:30  * thlorenz quit (Remote host closed the connection)
04:06:24  <mikeal>is there a number of keys at which leveldb just falls over?
04:29:58  <rvagg>not that I'm aware of
04:32:06  <mbalho>mikeal: whatever the max number of files in a folder is would be the limit
04:32:17  <mbalho>mikeal: 2mb per file
04:32:39  <mbalho>http://stackoverflow.com/questions/7722130/what-is-the-max-number-of-files-that-can-be-kept-in-a-single-folder-on-win7-mac
04:38:50  <mikeal>after a repair it works much better
04:38:55  <mikeal>was seeing some slowness
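A minimal sketch of that repair step from node, using leveldown's static repair(); './db' is a placeholder path:

```js
// the store must not be open anywhere else while repairing
var leveldown = require('leveldown')

leveldown.repair('./db', function (err) {
  if (err) throw err
  console.log('repair complete')
})
```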
05:01:23  * st_luke quit (Remote host closed the connection)
05:42:18  * st_luke joined
06:05:48  * mcollina quit (Remote host closed the connection)
06:13:22  * mcollina joined
06:28:50  <levelbot>[npm] level-assoc@0.8.0 <http://npm.im/level-assoc>: relational foreign key associations (hasMany, belongsTo) for leveldb (@substack)
06:32:52  <levelbot>[npm] level-batcher@0.0.1 <http://npm.im/level-batcher>: stream designed for leveldb that you write objects to, and it emits batches of objects that are under a byte size limit (@maxogden)
06:33:11  <mbalho>yayay
06:33:13  <mbalho>o/
06:41:07  <substack>nice
06:52:51  * wolfeidau quit (Remote host closed the connection)
06:53:06  * wolfeidau joined
06:53:15  * wolfeidau quit (Remote host closed the connection)
06:58:56  * mcollina quit (Remote host closed the connection)
07:06:13  * Acconut joined
07:08:53  * Acconut quit (Remote host closed the connection)
07:09:36  * st_luke quit (Remote host closed the connection)
07:26:07  * wolfeidau joined
07:28:43  * dominictarr joined
07:38:08  * substack quit (Remote host closed the connection)
07:41:48  * jcrugzz quit (Ping timeout: 245 seconds)
07:48:11  * mcollina joined
07:49:24  * wolfeidau quit (Remote host closed the connection)
07:51:48  * wolfeidau joined
07:54:19  * mcollina quit (Read error: Connection reset by peer)
07:54:25  * mcollina_ joined
07:56:54  * mcollina_ quit (Read error: Connection reset by peer)
07:57:39  * dominictarr quit (Quit: dominictarr)
08:00:54  * mcollina joined
08:08:04  * mcollina quit (Read error: No route to host)
08:08:20  * mcollina joined
08:22:24  * dominictarr joined
08:31:19  * jcrugzz joined
08:38:34  * mcollina_ joined
08:42:14  * mcollina quit (Ping timeout: 264 seconds)
08:45:45  * mcollina_ quit (Remote host closed the connection)
09:01:45  * mcollina joined
09:17:32  * kenansulayman joined
09:33:26  * mcollina quit (Remote host closed the connection)
09:50:26  * jcrugzz quit (Ping timeout: 245 seconds)
10:03:15  * wolfeidau quit (Remote host closed the connection)
10:27:19  * mcollina joined
10:52:31  * mcollina quit (Ping timeout: 245 seconds)
10:59:06  * mcollina joined
11:02:14  * rud quit (Quit: rud)
11:41:25  * wolfeidau joined
11:56:45  * werle joined
11:57:46  * mcollina quit (Remote host closed the connection)
12:20:25  * mcollina joined
12:27:31  * mcollina quit (Ping timeout: 245 seconds)
12:34:28  * thlorenz joined
12:52:34  * rud joined
13:06:15  * mcollina joined
13:12:11  * mcollina quit (Read error: Connection reset by peer)
13:12:34  * mcollina joined
13:12:43  * mcollina quit (Remote host closed the connection)
13:33:03  * kenansulayman quit (Quit: ≈ and thus my mac took a subtle yet profound nap ≈)
13:35:16  * kenansulayman joined
13:35:30  * kenansulayman quit (Client Quit)
13:49:42  * tmcw joined
13:50:00  * tmcw quit (Remote host closed the connection)
13:50:15  * tmcw joined
13:52:35  * julianduque quit (Quit: leaving)
13:56:18  * fallsemo joined
14:36:15  * rud quit (Quit: rud)
15:01:59  * jerrysv joined
15:08:44  * ramitos joined
15:12:44  * mikeal1 joined
15:13:35  * mikeal quit (Ping timeout: 256 seconds)
15:16:06  * rickbergfalk joined
15:19:11  * rud joined
15:19:11  * rud quit (Changing host)
15:19:11  * rud joined
15:21:03  * mikeal1 quit (Quit: Leaving.)
15:21:51  * mikeal joined
15:25:19  * jcrugzz joined
15:26:35  * rud quit (Quit: rud)
15:34:23  * jerrysv quit (Read error: Connection reset by peer)
15:34:26  * jerrysv_ joined
15:50:03  * dominictarr quit (Quit: dominictarr)
15:58:19  <levelbot>[npm] tik@0.0.5 <http://npm.im/tik>: command line key/value store (@jarofghosts)
15:59:23  * ryan_ramage joined
16:03:10  * esundahl joined
16:06:32  * rud joined
16:43:03  * jerrysv_ changed nick to jerrysv
16:46:57  * ednapiranha joined
16:47:05  * ednapiranha quit (Remote host closed the connection)
16:47:18  * ednapiranha joined
16:56:18  * dominictarr joined
16:57:56  * dguttman joined
16:58:03  * substack joined
17:05:17  * mikeal quit (Quit: Leaving.)
17:05:18  * jerrysv quit (Read error: Connection reset by peer)
17:10:54  * jxson joined
17:11:57  * jxson quit (Remote host closed the connection)
17:12:04  * jxson joined
17:43:36  * dguttman quit (Quit: dguttman)
17:54:41  * dguttman joined
18:01:01  * mikeal joined
18:01:26  * jxson quit (Remote host closed the connection)
18:06:28  * jxson joined
18:06:40  * jxson quit (Remote host closed the connection)
18:07:46  * jxson joined
18:11:36  * mikeal quit (Quit: Leaving.)
18:12:24  * ryan_ramage quit (Quit: ryan_ramage)
18:14:39  * mikeal joined
18:19:43  * dguttman quit (Ping timeout: 260 seconds)
18:24:53  * rud quit (Quit: rud)
18:59:17  * rud joined
18:59:18  * rud quit (Changing host)
18:59:18  * rud joined
19:10:29  * ryan_ramage joined
19:39:06  <mbalho>muahaha https://github.com/maxogden/level-batcher/blob/master/index.js#L15
19:46:55  * julianduque joined
19:47:13  <rescrv>mbalho: does level-batcher block until the batch is full?
19:47:49  <mbalho>rescrv: it buffers until the batch is full
19:48:24  <mbalho>rescrv: also, and i am not sure this is the best behavior, it wont emit another batch until youve called batcher.next()
19:48:53  <mbalho>rescrv: the goal was to make sure backpressure events propagate, so when it emits a batch it then pauses itself so writes from the source stream will receive backpressure signals
19:49:28  <mbalho>rescrv: then when you call .next() it calls .resume() internally and will emit another batch when it fills up (which may be immediately if there was enough buffered data while it was paused)
19:50:03  <mbalho>rescrv: while it is paused anything that writes to it will receive a return value of false which is an advisory 'stop sending me data' signal
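A sketch of the consume loop described above. The 'batch' event name and the { key, value } row shape are assumptions here; only write(), next(), and the pause-on-emit behavior come from the conversation:

```js
var level = require('level')
var batcher = require('level-batcher')

var db = level('./db')
var writer = batcher()

// assumed event name and row shape: each batch is an array of { key, value }
writer.on('batch', function (batch) {
  // while this batch is being written the batcher stays paused, so
  // upstream write() calls return false (the backpressure signal)
  db.batch(batch.map(function (row) {
    return { type: 'put', key: row.key, value: row.value }
  }), function (err) {
    if (err) throw err
    writer.next() // resume; another batch is emitted once the limit fills again
  })
})

var ok = writer.write({ key: 'a', value: '1' }) // false while paused
```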
19:52:37  <mbalho>mikeal: do you have a recommendation on how to do batches with read on write semantics?
19:53:05  <mikeal>level-mutex shows pretty much how to do read on write locks properly
19:53:16  <mikeal>it batches *everything* pending together
19:53:22  <mikeal>if you want to break it up by the ideal length
19:53:35  <mikeal>or wait some static number of ms before writing, waiting for more to buffer
19:53:44  <mikeal>i would add those semantics to the write in level-mutex
19:53:55  <mikeal>there are a lot of "gotchas" in the mutex
19:54:02  <mbalho>mikeal: if i have 1000 documents to write and i need to check if they exist first and compare revs do i need to do 1000 gets before i create my batch?
19:54:06  <mikeal>which is why i say "just look at level-mutex"
19:54:15  <mikeal>not saying you can't write your own, but you should steal most of the code there first
19:54:44  <mikeal>so, level-mutex doesn't care if the reads are *required*, some higher order library is reading before writing and making that judgement
19:54:47  <mbalho>mikeal: i have a thing that breaks up an object into arrays of objects at optimal batch sizes https://github.com/maxogden/level-batcher
19:55:02  <rescrv>from my experience with HyperLevelDB, you should write what you have when you have it. batch only what comes in while you're writing. Based on the internals, it'll not be a loss of performance.
19:55:04  <mikeal>the trick is, all the reads need to be done at once, then all the writes, to ensure that no write happens between my read and my write
19:56:41  <mbalho>mikeal: so do you get one at a time until youve got them all and then write all puts in one batch before doing anything else?
19:57:00  <mikeal>no no no
19:57:16  <mikeal>all the reads happen in a block, then return them in the order the mutex got them, then *all* the writes
19:57:48  <mikeal>so if you want atomicity at the document level you write that in your library, the mutex just ensures that writes don't happen between your last read and the write you're scheduling
19:58:05  <mikeal>keep in mind that you'll still need to proactively decline any more writes to the same document after you've scheduled one
19:58:20  <mikeal>as an example you can look at couchup
19:58:48  <mikeal>but you'll have to sort through all of couch's revision checking/setting and newWrites:false logic
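The shape of that read-block-then-write-block idea, as a hedged sketch rather than level-mutex's actual API (readThenWrite and toBatch are illustrative names; it assumes every writer funnels through the same mutex):

```js
// do every read in one block, then emit every dependent write as a
// single batch, so no write can land between a read and the write it
// justifies (assuming all writers funnel through this same code)
function readThenWrite (db, keys, toBatch, cb) {
  var results = new Array(keys.length)
  var pending = keys.length
  if (!pending) return db.batch(toBatch(results), cb)
  keys.forEach(function (key, i) {
    db.get(key, function (err, value) {
      results[i] = err ? undefined : value // treat notFound as "no doc yet"
      if (--pending === 0) {
        // toBatch compares revs etc. and returns the batch operations
        db.batch(toBatch(results), cb)
      }
    })
  })
}
```

Per-document atomicity (declining a second write to a doc that already has one scheduled, as in couchup) would live a layer above this.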
19:59:30  <mbalho>my .put is here https://github.com/maxogden/dat/blob/master/lib/storage.js#L82
20:00:27  <mikeal>that is really similar to mine
20:00:37  <mikeal>yeah, you need to protect against concurrent writes in the same mutex lock
20:01:49  <mikeal>mbalho: https://github.com/mikeal/couchup/blob/master/index.js#L128
20:02:01  <mikeal>https://github.com/mikeal/couchup/blob/master/index.js#L311
20:02:01  <mbalho>mikeal: i think what i will do is have 2 code paths, one for data without _ids (where i know they dont exist yet) that just inserts as batches as fast as possible
20:02:07  <mbalho>and another for ones that need read on write
20:02:11  <mikeal>https://github.com/mikeal/couchup/blob/master/index.js#L328
20:02:19  <mbalho>mikeal: oh right
20:03:55  <mbalho>mikeal: i wonder if ids can be optimized for read on write insert speed
20:03:55  <ednapiranha>mbalho: !
20:04:00  <mbalho>ednapiranha: YO
20:04:03  <ednapiranha>mbalho: I AM IN PDX
20:04:06  <mbalho>ednapiranha: wassup pdx hipster
20:04:08  <ednapiranha>in the moz office
20:04:11  <ednapiranha>lol
20:04:19  <mbalho>ednapiranha: high five dietrich for me
20:04:21  <mbalho>ednapiranha: and mikeal
20:04:25  <ednapiranha>i dont think he's in today
20:04:27  <mbalho>(for me and mikeal)
20:04:28  <mikeal>mbalho: what do you mean?
20:04:28  <mbalho>ah
20:04:37  <ednapiranha>haha
20:04:45  <mikeal>how would you optimize the id?
20:05:08  <mbalho>mikeal: so you can use a read stream to get the ids more efficiently than a buncha random reads
20:05:09  <mikeal>i store the sequence in the key, so a get() is really a peekLast
20:05:22  <mbalho>mikeal: but i guess you dont know how sparse the bulk insert data is
20:05:40  * jcrugzz quit (Ping timeout: 260 seconds)
20:05:57  <mikeal>oh i see, so like, rather than do a bunch of gets for level keys in the mutex block you want to get a readstream?
20:06:13  * Acconut joined
20:06:19  <mikeal>i talked with rvagg about optimizing reads at one point
20:06:32  <mbalho>yea but full table scans are slow on large datasets so i dont think that would make sense for batches where you are inserting two rows like ['a', 'z']
20:06:36  <mikeal>and i think i remember him saying that it's best to just do them concurrently
20:07:05  <mikeal>i wonder how well that is optimized
20:07:08  <mbalho>mikeal: like if i sort the batch and then use a series of readstreams...
20:07:37  <mbalho>hmm but theres still no way to know how sparse the data is
20:08:01  <mikeal>i'm willing to bet that 1) it is still faster to do them concurrently 2) it is probably faster to sort them before asking for them
20:08:14  <mikeal>so long as you return them in order out of the mutex
20:08:32  <mbalho>like if i sort and then do all the individual .gets in sorted order?
20:08:34  <mikeal>if you don't return them in order out of the mutex you aren't being "fair" about who wins in a concurrent update
20:08:52  <mikeal>mbalho: yeah, i'm willing to bet it's faster
20:09:44  <mbalho>rescrv: do you think that random reads executed in order sorted by key are faster than random reads executed in random order?
20:10:43  <rescrv>mikeal: I was not talking about a r/w case. mbalho has made a batcher based on my feedback, and I was suggesting that he not wait for a full batch size unless he has to
20:11:11  <rescrv>mbalho: I don't think it'll matter unless doing so keeps things fresh in cache (e.g., adjacent keys land in an SST).
20:11:20  <rescrv>I certainly don't think it'll be like it was for writes
20:11:24  <rescrv>that's just an extreme case
20:12:00  <mbalho>gotcha
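That question is easy to benchmark directly; a rough sketch (illustrative only: right after a bulk load most of the data still sits in the memtable, so real numbers depend heavily on cache state and dataset size):

```js
var db = require('level')('./bench-db')

var keys = []
for (var i = 0; i < 10000; i++) keys.push('key-' + Math.random())

db.batch(keys.map(function (k) {
  return { type: 'put', key: k, value: 'x' }
}), function (err) {
  if (err) throw err
  time(keys.slice().sort(), function (sortedMs) {     // key-sorted issue order
    time(shuffle(keys.slice()), function (randomMs) { // random issue order
      console.log('sorted: %dms  random: %dms', sortedMs, randomMs)
    })
  })
})

// issue all gets concurrently, report total wall time
function time (ks, done) {
  var pending = ks.length
  var start = Date.now()
  ks.forEach(function (k) {
    db.get(k, function () {
      if (--pending === 0) done(Date.now() - start)
    })
  })
}

// fisher-yates shuffle
function shuffle (a) {
  for (var i = a.length - 1; i > 0; i--) {
    var j = Math.floor(Math.random() * (i + 1))
    var t = a[i]; a[i] = a[j]; a[j] = t
  }
  return a
}
```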
20:12:11  <mbalho>rescrv: also good point on the batcher semantics
20:14:56  <mikeal>right
20:15:35  <thlorenz>so my level writes are getting backed up and causing memory to grow somehow
20:15:36  <mikeal>rescrv: what i'm interested in this for is level-mutex, which already holds on to a batch until the reads are completed, and it would be best to restrict the writes to a certain size if it is faster to wait
20:15:54  <mikeal>thlorenz: yeah, same issue, we need streams2 pull style streams
20:15:59  <thlorenz>switching to leveldown-hyper brought only small relief
20:16:09  <mbalho>thlorenz: how are you writing
20:16:16  <thlorenz>mbalho: batches
20:16:25  <mbalho>thlorenz: one at a time or multiple, also how big are the batches
20:16:31  <thlorenz>but pretty big ones and lots - downloading half of github ;)
20:16:55  <mbalho>thlorenz: read through https://github.com/maxogden/level-bulk-load/issues/1
20:16:58  <thlorenz>haven't measured, but they'd contain data for all the repos of a user
20:17:00  <mikeal>thlorenz: i'm downloading *so* much of GitHub :)
20:17:13  <mikeal>i'm 300K through 3M commit details i need to get
20:17:20  <mikeal>i don't think it'll complete before my talk actually
20:17:23  * Acconut quit (Quit: Acconut)
20:17:27  <thlorenz>mikeal: :) it's for valuepack I need all that data to make decisions about how good a package is
20:17:59  <thlorenz>mikeal: funny thing is I blamed request since in the heapdump all those ClientRequests were retained
20:18:17  <mikeal>i've identified a little over 3M commits in the larger node community
20:18:26  <thlorenz>so I switched to hyperquest, but then realized that they were retained due to level being backed up ;)
20:18:38  <mikeal>getting the basic commit data you can get 30 in a response
20:18:45  <mikeal>but commit details are a request per commit
20:18:47  <thlorenz>so not request's fault as far as I can tell
20:18:49  * ednapiranha quit (Read error: Connection reset by peer)
20:18:49  <mikeal>and i don't think it'll finish in time
20:18:56  <mikeal>thlorenz: you should use requestdb
20:19:00  * ednapiranha joined
20:19:04  <mbalho>thlorenz: biggest problem is probably your batch size
20:19:08  <mikeal>it'll just make sure you never ask github for the same thing twice
20:19:15  <mbalho>thlorenz: but if you read that thread you'll understand all the nuance
20:19:16  <thlorenz>mbalho: you mean too big?
20:19:23  <mbalho>thlorenz: yes
20:19:36  <thlorenz>ok will read it and look into requestdb as well
20:19:55  <thlorenz>mbalho: I thought the fewer batches you do, the better? therefore the bigger the better?
20:20:48  <mbalho>thlorenz: nope
20:21:26  * Acconut joined
20:21:54  <thlorenz>mbalho: so level-batcher will chunk them properly?
20:22:12  <mbalho>its a work in progress, but thats the idea
20:22:49  <thlorenz>ah, ok, well I'm getting most of my data down over the course of a few hours, but there is no way I could run this as a job on some server
20:23:22  <mbalho>what?
20:23:24  <thlorenz>so I'm gonna move on to analyze the data for now and maybe we can sit together at nodeconf and look at some heapdumps, etc. in order to find the best solution
20:23:39  <mbalho>oh youre talking to mikeal
20:23:50  * Acconut quit (Client Quit)
20:23:57  <thlorenz>to whoever is interested in helping to fix these issues ;)
20:24:10  <mbalho>thlorenz: the memory usage + crashing issues?
20:24:18  <mbalho>thlorenz: the fix is to not do huge batches
20:24:47  <mbalho>thlorenz: also upgrade to the newest level release, there were memory leak fixes a couple of days ago
20:24:56  <thlorenz>mbalho: so really just split one up into say ten and batch them one right after another?
20:25:18  <mbalho>thlorenz: dont just pick random values, do it based on the write buffer size
20:26:12  <thlorenz>mbalho: of course, i.e. try to get as close as possible to some ideal buffer size -- which is what btw?
20:27:09  <mbalho>the exact size is less important than making sure your batches arent bigger than the write buffer
20:27:19  <thlorenz>running leveldown 0.8.0 should I upgrade to 0.8.2? (actually I was using leveldown-hyper 0.8.2)
20:27:32  <mbalho>whatever newest level uses
20:27:35  <thlorenz>ok
20:28:24  <thlorenz>mbalho: yeah, I'm on those versions
20:28:56  <thlorenz>mbalho: still not clear which size my batches should not exceed
20:29:13  <mbalho>the write buffer size
20:30:22  <mbalho>https://github.com/rvagg/node-leveldown/#leveldownopenoptions-callback
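A sketch of that advice: chunk the operations by approximate encoded byte size so no batch exceeds the write buffer (4MB is leveldown's documented default), and run the chunks in series so only one batch is in flight. chunkOps and writeChunks are illustrative names:

```js
var WRITE_BUFFER = 4 * 1024 * 1024 // leveldown's documented default writeBufferSize

function chunkOps (ops, limit) {
  var chunks = [[]]
  var size = 0
  ops.forEach(function (op) {
    // rough estimate of the encoded size of one operation
    var len = Buffer.byteLength(String(op.key)) +
              Buffer.byteLength(JSON.stringify(op.value))
    if (size + len > limit && chunks[chunks.length - 1].length) {
      chunks.push([])
      size = 0
    }
    chunks[chunks.length - 1].push(op)
    size += len
  })
  return chunks
}

// run the chunks in series so only one batch is ever in flight
function writeChunks (db, chunks, cb) {
  if (!chunks.length) return cb()
  db.batch(chunks.shift(), function (err) {
    if (err) return cb(err)
    writeChunks(db, chunks, cb)
  })
}

// usage: writeChunks(db, chunkOps(allMyOps, WRITE_BUFFER), done)
```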
20:30:46  * kenansulayman joined
20:30:56  <thlorenz>mbalho: thanks got it
20:34:37  * Acconut joined
20:36:37  <thlorenz>mikeal: btw the lastModified github things only work moderately well
20:37:23  <thlorenz>I feel like I'm getting "modified" back when I know it wasn't modified, so I end up pulling down lots of data I already pulled during a previous run :(
20:38:05  <thlorenz>works good enough to pass it along though - still saves some requests at least ;)
20:43:36  <thlorenz>mikeal: what are you using to follow the paged data? I wrote https://github.com/thlorenz/request-all-pages for that
20:43:57  <thlorenz>really annoying you can get max 100 only and each time it costs you a request
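For reference, the core of what request-all-pages automates looks roughly like this (a hedged sketch with plain request; GitHub caps per_page at 100 and advertises the next page in the Link header):

```js
var request = require('request')

function allPages (url, results, cb) {
  request({
    url: url,
    json: true,
    qs: { per_page: 100 }, // github's maximum page size
    headers: { 'user-agent': 'example' } // github's api requires a user-agent
  }, function (err, res, body) {
    if (err) return cb(err)
    results = results.concat(body)
    // the next page, if any, is advertised in the Link header
    var match = /<([^>]+)>;\s*rel="next"/.exec(res.headers.link || '')
    if (match) return allPages(match[1], results, cb)
    cb(null, results)
  })
}

// usage: allPages('https://api.github.com/users/maxogden/repos', [], cb)
```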
20:44:18  * wolfeidau quit (Remote host closed the connection)
20:44:26  * wolfeidau joined
20:51:37  * kenansulayman quit (Quit: ≈ and thus my mac took a subtle yet profound nap ≈)
20:55:18  * Acconut quit (Quit: Acconut)
21:03:21  <mikeal>thlorenz: i'm not doing lastModified, i'm literally holding the whole response and returning it if i have it.
21:04:39  <thlorenz>mikeal: you mean you use ETAG?
21:05:15  <thlorenz>cause I tried that before I think with even worse results and it doesn't work on paged data (unless you do it per page which gets messy)
21:05:41  <mikeal>thlorenz: https://gist.github.com/mikeal/6429576
21:06:05  <mikeal>thlorenz: no no no, requestdb stores the response, i never make the same request twice, if i got a response, ever, i just return that.
21:06:21  * ryan_ramage quit (Quit: ryan_ramage)
21:06:32  <thlorenz>mikeal: well I have to make the same request twice to update my data, i.e. if someone starred another repo
21:06:40  <thlorenz>like a day later or so
21:06:45  <mikeal>right, i'm not doing any of those yet
21:07:05  <mikeal>i'm going to time bound all my data to exclude the last month anyway
21:07:37  <mikeal>requestdb needs an option to re-fetch using cache headers
21:07:42  <mikeal>but some things you shouldn't use it on
21:07:47  <thlorenz>ah, that makes sense, but I wanna stay more up to date, otherwise people will not get the results they expect
21:07:47  <mikeal>like urls to git commits
21:08:32  <thlorenz>mikeal: I'm munging data before I store it (i.e. remove things I don't need), so I don't think requestdb as it is would work for me
21:08:50  <thlorenz>I also index stuff while I store it
21:09:57  <mikeal>i do both
21:10:10  <mikeal>i use requestdb to get data, but then i store what i want in a secondary store
21:10:26  <mikeal>that way, if i want to build a new dataset from requests i've already made i can do it very quickly
21:10:50  <mikeal>i don't use requestdb as my primary store, i just use it as a drop-in replacement for request to get the data
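A sketch of that split, with a hypothetical cachedGet standing in for requestdb's real API: raw responses are keyed by url so a request is never repeated, and the munged/indexed form lives in a separate store that can be rebuilt from the cache at any time:

```js
var level = require('level')
var request = require('request')

var cache = level('./responses', { valueEncoding: 'json' }) // raw responses, keyed by url
var derived = level('./dataset', { valueEncoding: 'json' }) // munged/indexed, rebuildable

function cachedGet (url, cb) {
  cache.get(url, function (err, body) {
    if (!err) return cb(null, body) // fetched once already: never ask again
    request({ url: url, json: true, headers: { 'user-agent': 'example' } },
      function (err, res, body) {
        if (err) return cb(err)
        cache.put(url, body, function (err) { cb(err, body) })
      })
  })
}

// the derived store can be rebuilt from ./responses without any network i/o
cachedGet('https://api.github.com/repos/rvagg/node-levelup', function (err, repo) {
  if (err) throw err
  derived.put('repo!' + repo.full_name, { stars: repo.stargazers_count })
})
```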
21:23:54  * jcrugzz joined
21:26:12  <thlorenz>mikeal: that makes a lot of sense actually
21:27:00  <thlorenz>I had figured that my bottleneck would be github and the requests, so munging data in the meantime made sense
21:27:13  <thlorenz>now level turns out to be my bottleneck :)
21:29:10  <mikeal>it all depends
21:29:23  <mikeal>i hit write limitations when 1 github request equalled 30 level writes
21:29:34  <mikeal>that was enough, over time, to overload it
21:29:52  <mikeal>but now, i'm averaging like 2 requests a second
21:30:04  <mikeal>and each one is only a single write, so github is my bottleneck :)
21:30:22  <mikeal>i hit some kinda special rate limiting
21:30:36  <mikeal>they are actually restricting the number of requests they let in from me
21:34:33  * ryan_ramage joined
21:55:56  * dguttman joined
22:03:22  * ryan_ramage quit (Quit: ryan_ramage)
22:08:28  * esundahl quit (Remote host closed the connection)
22:08:54  * esundahl joined
22:10:05  * esundahl_ joined
22:13:33  * esundahl quit (Ping timeout: 276 seconds)
22:19:08  * jcrugzz quit (Ping timeout: 260 seconds)
22:27:52  * thlorenz quit (Remote host closed the connection)
22:30:51  * disordinary joined
22:44:19  <levelbot>[npm] tik@0.0.6 <http://npm.im/tik>: command line key/value store (@jarofghosts)
22:45:36  * thlorenz joined
22:55:59  <mikeal>if you write an identical key/value is level smart enough to disregard it?
22:56:34  <mbalho>it will overwrite
22:56:36  <thlorenz>mikeal: I'd say it would overwrite it
22:56:39  <thlorenz>:)
22:56:48  <mikeal>overwrite it with the exact same data?
22:57:05  <mbalho>yea
22:57:19  <mikeal>ok
22:57:32  <mikeal>so if there's a high probability of writing same data twice
22:57:39  <brycebaril>that's a good question, given the compaction stuff it might be an optimization they considered
22:57:40  <mikeal>it's faster to read-before-write
22:58:08  <mikeal>easy enough to write a benchmark for
22:58:23  <brycebaril>I bet rescrv might know
22:58:51  <rescrv>mikeal: it depends how much data is in the "hot" set. If you're likely to write the same key twice within a write buffer, don't bother doing the read-before-write.
22:59:07  <mikeal>hrm....
22:59:21  <mikeal>what is the scope of the write buffer?
22:59:27  <mikeal>like, the last hundred keys? thousand keys?
22:59:41  <rescrv>I'm sure there's a sweet spot where read-before-write will win, but for extremely large datasets (10x ram), and extremely skewed datasets, you should just write and not try to read before write.
22:59:59  <rescrv>mikeal: the last write_buffer/avg-key-size keys
23:00:16  <mikeal>my keys tend to be large since they are binary
23:00:33  <mikeal>with this i could get away with strings tho
23:00:46  <mikeal>i'm writing a set implementation
23:00:48  <mikeal>kind of
23:00:52  <levelbot>[npm] valuepack-core@0.3.14 <http://npm.im/valuepack-core>: Core utils and configurations for valuepack, not at all useful by itself. (@thlorenz)
23:01:07  <thlorenz>mbalho: btw I was wondering if I use json encoding for my values, how would I know what size my batch will have?
23:01:16  * ednapiranha quit (Remote host closed the connection)
23:01:54  <rescrv>mikeal: how often are you overwriting things?
23:02:00  <mbalho>thlorenz: https://github.com/maxogden/level-batcher/blob/master/index.js#L51
23:02:19  <thlorenz>thanks :)
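The underlying idea, as a tiny sketch (encodedSize is a hypothetical helper): measure the encoded form, since that is what actually hits leveldb under json value encoding:

```js
// approximate contribution of one op to a batch under json value encoding
function encodedSize (key, value) {
  return Buffer.byteLength(String(key)) +
         Buffer.byteLength(JSON.stringify(value))
}
```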
23:02:32  <mikeal>well, with a set you can't put the same thing in twice
23:02:49  <mikeal>so the way i was thinking of working it is to encode the bucket and value in to the key
23:03:04  <mikeal>so ["bucket", "key1"]
23:03:08  <mikeal>no value
23:03:25  <thlorenz>mbalho: this https://github.com/maxogden/level-batcher/blob/master/index.js#L15 LOL :)
23:03:40  <mikeal>the easiest thing to do would be to just call write() no matter what
23:03:51  <mbalho>thlorenz: ;)
23:04:01  <brycebaril>mikeal: for a set it might be handy to do the read first anyway, e.g. with Redis sets, the return value is bool/false as to whether the member was added or not.
23:04:02  <rescrv>so you want to take a 2-tuple and insert it into a set iff it doesn't already exist; else do nothing
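A sketch of that set layout using bytewise-encoded keys; add/has/members are illustrative helpers, not an existing module:

```js
var level = require('level')
var bytewise = require('bytewise')

var db = level('./sets', { keyEncoding: 'binary' })

function add (bucket, member, cb) {
  // membership lives entirely in the key; adding twice is a harmless overwrite
  db.put(bytewise.encode([bucket, member]), '', cb)
}

function has (bucket, member, cb) {
  db.get(bytewise.encode([bucket, member]), function (err) {
    cb(null, !err) // a notFound error means "not a member"
  })
}

function members (bucket, cb) {
  // null sorts first and undefined last in bytewise, bounding the bucket
  var out = []
  db.createKeyStream({
    gte: bytewise.encode([bucket, null]),
    lte: bytewise.encode([bucket, undefined])
  }).on('data', function (key) {
    out.push(bytewise.decode(key)[1])
  }).on('end', function () { cb(null, out) })
}
```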
23:04:24  <mikeal>good call
23:04:47  <rescrv>brycebaril: unless he's doing overwrites often within a short time frame, or his set is so small he could keep it in memory, or he wants the true/false semantics, he'd be best to just "write"
23:05:34  <mikeal>i can optimize the reads in a short time frame just as easily tho
23:05:36  <brycebaril>right, the true/false thing would be a feature that might be useful enough to pay the cost of the read
23:05:42  <mikeal>in fact, i could just keep an LRU around
23:05:55  <rescrv>mikeal: level does that by default (sort-of)
23:05:56  <mikeal>that would be faster than relying on level to optimize the noop writes
23:06:32  <rescrv>under some situations, yes
23:06:36  <rescrv>under others, no
23:06:47  <rescrv>most workloads I use leveldb for, the answer would be no
23:07:12  <rescrv>unless you have a very skewed distribution
23:07:26  <mikeal>the key distribution is basically random
23:07:37  <rescrv>then an LRU won't take much off the top
23:07:38  <mikeal>it's not predictable
23:07:58  <mikeal>umn, no, that's not accurate
23:08:17  <mikeal>some keys will show up more often than others, but the sorting of those keys is going to be wide
23:08:31  <mikeal>that's what i meant by the distribution being random
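The LRU guard mikeal floats would look roughly like this (assuming the lru-cache module; per rescrv's point it only pays off when the key distribution is skewed):

```js
var LRU = require('lru-cache')
var level = require('level')

var db = level('./db')
var recent = LRU({ max: 10000 }) // remember the last N key/values written

function putIfChanged (key, value, cb) {
  if (recent.get(key) === value) return cb() // identical recent write: skip the noop
  recent.set(key, value)
  db.put(key, value, cb)
}
```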
23:08:34  <brycebaril>how would a LRU interact with multilevel? that could be problematic
23:08:49  <rescrv>mikeal: I read "uniformly random". Ugh, too much staring at a monitor
23:09:28  <mikeal>my assumption is always that within the scope of my object i'm the only person manipulating whatever i need a consistent state on
23:09:41  <mikeal>everyone must go through me
23:09:52  <rescrv>that's a good assumption
23:09:55  <mikeal>that's the only way that level-mutex can actually guarantee anything
23:10:30  * dominictarr quit (Quit: dominictarr)
23:10:33  <mikeal>you can stick multiple mutexes around a level instance, the assumption is that they are managing individual and separate consistency guarantees.
23:10:34  <brycebaril>ahh, ok, I had the impression you were making a more generic level-set implementation that people might want to use in a broader sense
23:10:36  <thlorenz>mbalho: do I also have to ensure I only got one batch going at a time, i.e do them in series?
23:10:49  <mikeal>you could use it to store data in sets :)
23:11:00  <mikeal>i'm not entirely sold on multilevel TBH
23:11:44  <mikeal>it only really works if you don't have a data model above the level layer
23:11:47  <mbalho>thlorenz: not 100% sure but i think so
23:12:02  <mikeal>cause almost any data model will need a consistency scheme
23:12:08  * fallsemo quit (Quit: Leaving.)
23:12:30  <mikeal>that's why all of my modules expose data and transfer it over SLEEP
23:13:24  <thlorenz>mbalho: ah, you'd think leveldown would do this for you if that is the case - maybe it does? rvagg do you know anything about that?
23:13:34  * fallsemo joined
23:14:04  <mbalho>thlorenz: leveldown isnt optimized for this stuff yet
23:14:37  <thlorenz>ok, thanks - I'll keep that in mind
23:15:00  <rescrv>what consistency guarantees are you guys looking for?
23:15:17  <mikeal>i think i'm going to start using sublevel for my internal slicing instead of bytewise slices
23:17:49  * tmcw quit (Remote host closed the connection)
23:18:22  * tmcw joined
23:18:52  <mikeal>if i don't care about the value i'm setting, what is the most efficient one to use?
23:22:35  * disordinary quit (Quit: Konversation terminated!)
23:22:45  * tmcw quit (Ping timeout: 248 seconds)
23:24:54  * jcrugzz joined
23:27:10  * soldair joined
23:28:15  * esundahl_ quit (Remote host closed the connection)
23:28:42  * esundahl joined
23:29:36  * jcrugzz quit (Ping timeout: 276 seconds)
23:29:49  <levelbot>[npm] level-batcher@0.0.2 <http://npm.im/level-batcher>: stream designed for leveldb that you write objects to, and it emits batches of objects that are under a byte size limit (@maxogden)
23:30:27  <mbalho>just updated it to always write batches when any buffered data exists
23:32:53  * esundahl quit (Ping timeout: 248 seconds)
23:34:15  * rickbergfalk quit (Quit: http://www.kiwiirc.com/ - A hand crafted IRC client)
23:35:02  <mbalho>rvagg: throw this on your todo list after lmdb https://github.com/spotify/sparkey/blob/master/README.md#performance
23:35:54  * mikeal quit (Quit: Leaving.)
23:36:09  <mbalho>rvagg: also https://github.com/bureau14/leveldb
23:37:37  * mikeal joined
23:40:19  <rescrv>mbalho: does bureau14's fork do anything more than windows support?
23:41:28  <mbalho>rescrv: not sure
23:42:17  <mbalho>lol "Is not 100% compliant with ugly Google coding style;"
23:42:19  * fallsemo quit (Quit: Leaving.)
23:47:09  <rescrv>it sounds to me like they added windows support and decided to change the code to C++11 for the cargo-cult of it. The description doesn't say why they changed things, except for things like adding c++11 support (which doesn't actually change the functionality and likely doesn't make it faster).
23:59:20  <levelbot>[npm] dat@0.2.1 <http://npm.im/dat>: data sharing and replication tool (@maxogden)