00:01:42  * brianloveswords joined
00:17:13  * brianloveswords quit (Quit: Computer has gone to sleep.)
00:19:46  * brianloveswords joined
00:30:55  * brianloveswords quit (Quit: Computer has gone to sleep.)
00:34:38  * pbw__ joined
00:44:22  * thlorenz joined
00:47:36  * mikeal quit (Quit: Leaving.)
01:16:05  * ramitos quit (Remote host closed the connection)
01:17:03  * ramitos joined
01:29:09  * jerrysv quit (Remote host closed the connection)
02:06:33  * ednapiranha quit (Quit: Leaving...)
02:06:59  * thlorenz quit (Remote host closed the connection)
02:07:31  * thlorenz joined
02:08:54  * thlorenz quit (Read error: Connection reset by peer)
02:09:15  * thlorenz joined
02:13:47  * thlorenz quit (Remote host closed the connection)
02:14:21  * ednapiranha joined
02:15:21  * thlorenz joined
02:18:50  * binocarlos quit (Read error: Connection reset by peer)
02:28:43  * thlorenz quit (Ping timeout: 240 seconds)
02:37:42  * daviddias joined
02:42:03  * daviddias quit (Ping timeout: 252 seconds)
02:42:33  * pbw__ quit (Quit: Connection closed for inactivity)
02:53:08  * contrahax joined
02:53:22  * contrahax changed nick to _contrahax
02:53:40  * tec27 changed nick to _tec27
03:15:28  * saibotvisad quit (Quit: Leaving.)
03:19:21  * saibotvisad joined
03:20:43  * saibotvisad quit (Client Quit)
03:31:56  * daviddias joined
03:32:15  * _contrahax changed nick to contrahax
03:32:23  * _tec27 changed nick to tec27
03:36:17  * daviddias quit (Ping timeout: 258 seconds)
03:51:27  * ednapiranha quit (Quit: Leaving...)
04:01:52  * contrahax quit (Quit: Sleeping)
04:26:04  * daviddias joined
04:30:29  * daviddias quit (Ping timeout: 245 seconds)
04:34:34  * Sorella quit (Quit: It is tiem!)
04:37:14  * ramitos quit (Ping timeout: 240 seconds)
04:38:01  * ramitos joined
04:42:36  * ramitos quit (Ping timeout: 258 seconds)
04:44:57  * ramitos joined
04:46:13  <JasonSmith>How do I stream a value? e.g. I want the value for "video.mp4" and I know it's 50TB but I just want to stream it over http?
04:46:41  <JasonSmith> /cc jcrugzz Any thoughts? I don't want to stream the db keyvals, just the binary blob of one specific value
04:49:41  <jcrugzz>JasonSmith: if you are saying the video.mp4 is in the leveldb database there is no way to stream from a db.get per se
04:49:45  <jcrugzz>id use something like https://github.com/dominictarr/content-addressable-store
04:49:52  <jcrugzz>and just store the hash and metadata in the leveldb
04:50:09  <jcrugzz>and just use the filesystem for the video or binary content
04:50:24  <jcrugzz>so you can stream it straight over http
04:53:05  <JasonSmith>Oh bummer
04:53:31  <JasonSmith>I guess I'm spoiled by couch attachments.
04:53:46  <JasonSmith>Thanks jcrugzz
04:54:10  <jcrugzz>JasonSmith: yea totally. i think using that type of store you could build a nice wrapper using leveldb and the filesystem
04:54:27  <jcrugzz>JasonSmith: to make the same type of experience
04:54:33  <jcrugzz>and np
04:57:14  <JasonSmith>Surely it's not a limitation of level down
04:57:45  <JasonSmith>I should think the c++ API lets you store values which are larger than system memory
04:58:48  <JasonSmith>Maybe I could build a stream that just stores 1MB chunks in a sublevel
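A minimal sketch of that chunking idea. Everything here is hypothetical (names, the key scheme, a plain Map standing in for the sublevel), and the chunk size is shrunk from 1 MB for illustration:

```javascript
// Chunk keys are zero-padded so lexicographic order matches numeric
// order, which is what would let a real range scan stream them back
// in order.
const CHUNK = 4; // bytes per chunk (would be 1024 * 1024 in practice)

function chunkKey(key, i) {
  return key + '\x00' + String(i).padStart(10, '0');
}

function putChunked(db, key, value) {
  const total = Math.ceil(value.length / CHUNK);
  for (let i = 0; i < total; i++) {
    db.set(chunkKey(key, i), value.slice(i * CHUNK, (i + 1) * CHUNK));
  }
  db.set(key + '\x00meta', { chunks: total });
}

// In real code this would be a Readable stream over a key range;
// a generator shows the same shape.
function* streamChunked(db, key) {
  const meta = db.get(key + '\x00meta');
  for (let i = 0; i < meta.chunks; i++) {
    yield db.get(chunkKey(key, i));
  }
}

const db = new Map();
putChunked(db, 'video.mp4', 'abcdefghij');
const out = [...streamChunked(db, 'video.mp4')].join('');
```

The design choice is the same one rescrv describes below: keep each stored value small, and reassemble the large logical value at a higher layer.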
05:13:57  * saibotvisad joined
05:17:31  * blessYahu quit (Ping timeout: 252 seconds)
05:20:14  * daviddias joined
05:22:07  <jcrugzz>JasonSmith: yea i think there's just no API for it. But the filesystem is still better than leveldb at storing files iirc cc juliangruber. Due to how the GC works
05:24:14  * daviddias quit (Ping timeout: 245 seconds)
05:52:25  * fritzy joined
06:04:20  * saibotvisad1 joined
06:07:55  * saibotvisad quit (Ping timeout: 276 seconds)
06:16:45  * mikeal joined
06:19:36  * daviddias joined
06:23:53  * daviddias quit (Ping timeout: 252 seconds)
06:51:03  * saibotvisad1 quit (Quit: Leaving.)
07:36:10  * fritzy quit (Remote host closed the connection)
07:39:11  * fritzy joined
08:07:44  * fritzy quit (Remote host closed the connection)
08:14:23  * daviddias joined
08:18:49  * daviddias quit (Ping timeout: 245 seconds)
08:37:19  * sygi joined
08:38:31  * contrahax joined
09:06:03  * contrahax quit (Quit: Sleeping)
10:02:21  * daviddias joined
10:06:49  * daviddias quit (Ping timeout: 252 seconds)
12:36:58  <rescrv>JasonSmith: you'd be best to put anything over 2MB onto the filesystem and store a pointer to it.
12:37:19  <JasonSmith>rescrv: why?
12:37:28  <rescrv>JasonSmith jcrugzz: my recommendation is to not store anything over 4K in LevelDB values
12:37:35  <JasonSmith>why not?
12:37:39  <rescrv>internally, the SSTs are 2MB
12:37:56  <rescrv>if you have a file that is bigger than 2MB, you're going to end up with one kv pair per SST
12:38:08  <rescrv>with all the overhead of the SST, and none of the flexibility of a file (like streaming)
12:38:08  <JasonSmith>ok, I thought leveldb was some kind of spiritual replacement for bigtable, and so I thought it would be comfortable with very large values
12:38:41  <rescrv>the reason I recommend 4K is because you get a couple hundred values into each SST
12:39:06  <JasonSmith>yeah
12:39:11  <JasonSmith>4k is really tiny, that's just one disk block
12:39:18  <JasonSmith>Not disagreeing with you
12:39:26  <rescrv>JasonSmith: it's built using many ideas from BigTable. One idea they have in BigTable is to put large files in GFS (now Colossus), and refer to them from the BigTable value.
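The sizes rescrv cites can be sanity-checked with some back-of-the-envelope arithmetic (round numbers assumed; real SSTs carry per-entry overhead and compression):

```javascript
const SST_BYTES = 2 * 1024 * 1024;    // ~2 MB per SST, as stated above
const VALUE_BYTES = 4 * 1024;         // the recommended 4 KB value size

// A few hundred small values fit in each SST...
const valuesPerSst = Math.floor(SST_BYTES / VALUE_BYTES); // 512

// ...while a 50 MB blob would be smeared across ~25 SSTs,
// one kv pair per table.
const sstsPerBlob = Math.ceil(50 * 1024 * 1024 / SST_BYTES); // 25
```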
12:39:36  <JasonSmith>I would imagine people are storing small binary files like images in level though, no problem, right?
12:39:41  <rescrv>yes
12:40:29  <JasonSmith>Oh, this channel is general leveldb? I thought it was all Node.js people :)
12:40:38  <JasonSmith>rescrv: Ok, thanks for the perspective
12:41:13  <rescrv>there's a lot of confusion on that. I'm one of the non-Node.js devs (I've used it; not my primary platform), and the message in the main channel blindly directs people here, so I'm here to help out when they end up here.
12:42:06  <JasonSmith>That's good. I don't know the code, but based on how abstract LevelUp is (it now has LDAP and MySQL back-ends) I imagine you lose some low-level features with that abstraction
12:42:58  <rescrv>I don't think you lose many. rvagg did a pretty good job, and levelup came from the leveldb API if I'm not mistaken. The only thing I know is missing is explicit snapshot support.
12:43:19  * thlorenz joined
12:43:23  <rescrv>you also lose the ability to manage your own memory, but if you're using JS, you want to lose that anyway
12:43:25  <JasonSmith>I think for the Node.js world, leveldb was "good enough" (don't get me wrong, it's quite good) but basically it was right time right place when the community needed to build momentum around a very simple get/put/del API
12:44:15  <JasonSmith>rescrv: Do you know of any replication projects out there that have gained much community momentum?
12:44:42  <rescrv>JasonSmith: others can say that better. I'm partial to HyperDex, but I wrote it, and it's not embeddable. It does have Node bindings though.
12:44:50  <JasonSmith>That is what I'm working on. I'm using Node because of the convenient interception points of updates
12:45:29  <JasonSmith>My current theory is a "sublevel" (sorry to get levelup-specific) with a change log
12:45:42  <JasonSmith>I intercept writes and add a change entry to the batch
12:46:41  <JasonSmith>I am basing it on the CouchDB model (I'm a CouchDB developer) so I may have a hammer-nail problem. But I think it will be simple; however I haven't figured out how to replicate batch updates yet (transactions)
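The interception scheme described above can be sketched roughly like this. It is purely illustrative: the `changes!` prefix plays the role of a sublevel, a Map stands in for the store, and none of this is levelup API:

```javascript
// Every put/del gets a companion change-log entry appended to the same
// batch, so data and log land together when the batch is applied.
let seq = 0;

function withChangeLog(ops) {
  const out = [];
  for (const op of ops) {
    out.push(op);
    seq += 1;
    out.push({
      type: 'put',
      key: 'changes!' + String(seq).padStart(10, '0'),
      value: op.key,
    });
  }
  return out;
}

function applyBatch(db, ops) {
  for (const op of withChangeLog(ops)) {
    if (op.type === 'put') db.set(op.key, op.value);
    else db.delete(op.key);
  }
}

const db = new Map();
applyBatch(db, [
  { type: 'put', key: 'foo', value: 'I am foo' },
  { type: 'put', key: 'bar', value: 'I am bar' },
]);
```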
12:47:39  <rescrv>You'll have difficulty there because LevelDB doesn't keep it as a batch internally
12:47:59  <JasonSmith>rescrv: Oh really, I thought I could get an all-or-nothing semantic from a batch call?
12:48:39  <JasonSmith>I'm looking at https://github.com/rvagg/node-levelup#batch
12:48:53  <JasonSmith>In Node, there is a way to intercept and amend batch operations before they are sent to leveldown
12:49:37  <rescrv>You do. But unless you take a snapshot immediately after the batch completes, with no intervening writes, you won't be able to read those items as they were in the batch, because someone else could have overwritten them.
12:49:40  <JasonSmith>I would be comfortable with no batch guarantees though. That's how Couch does it because it's "impossible" to guarantee correct replication in general
12:49:52  <JasonSmith>rescrv: Right, I see
12:50:18  <rescrv>I would disagree on that "impossible", unless we're going to the absurdity that it's impossible to write applications in general.
12:50:24  <JasonSmith>rescrv: Yes that is quite similar to my batch replication problem
12:50:44  <JasonSmith>Yes I quoted "impossible," it is doable but in the Couch model it's considered not worth it. It sacrifices too much architectural simplicity
12:51:03  <rescrv>hyperleveldb provides a replay iterator that will give you most of what you want
12:51:12  <JasonSmith>ok, thanks
12:51:16  <rescrv>whatever consumes the replay iterator will move forward in time
12:51:30  <rescrv>this is not wrapped by levelup, so you'll have to add that
12:51:36  <JasonSmith>right
12:52:06  <JasonSmith>in couch, the change log is very simple. You have an index of changes (first change gets id 1, second gets id 2, etc.)
12:52:32  <JasonSmith>The trick is, every key appears in the change log exactly once, guaranteed
12:52:48  <JasonSmith>so to get the latest state, you just pull down changes from your last checkpoint (e.g. "show me changes since id 1234")
12:53:19  <rescrv>how do you guarantee a key won't appear twice, e.g., if you update its value?
12:54:15  <rescrv>the replay iterator works much the same way
12:54:27  <JasonSmith>Your index is (conceptually) change ids (integers) to keys, so suppose you do put("foo", "I am foo"); put("bar", "I am bar"); put("foo", "I am the 2nd foo")
12:55:03  <JasonSmith>the index would be [1="foo"] then [1="foo", 2="bar"] then finally [2="bar", 3="foo"]
12:55:23  <rescrv>got it
12:55:26  <JasonSmith>So if I scan that index starting from 0, I will get "bar", "foo"
12:55:31  <rescrv>so you have a secondary structure looking into the log
12:55:32  <JasonSmith>I will look up those values and I've got my replica
12:55:37  <JasonSmith>yes
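The put/put/put example above can be modeled in a few lines, assuming in-memory Maps in place of the real trees:

```javascript
// Toy model of the invariant: each key occupies exactly one slot in the
// by-sequence index, so a scan from any checkpoint sees each key at
// most once, at its latest position.
let seq = 0;
const bySeq = new Map();    // change id -> key
const keyToSeq = new Map(); // key -> its current change id
const data = new Map();

function put(key, value) {
  const old = keyToSeq.get(key);
  if (old !== undefined) bySeq.delete(old); // key moves to the newest slot
  seq += 1;
  bySeq.set(seq, key);
  keyToSeq.set(key, seq);
  data.set(key, value);
}

function changesSince(checkpoint) {
  return [...bySeq.entries()]
    .filter(([id]) => id > checkpoint)
    .sort((a, b) => a[0] - b[0])
    .map(([, key]) => key);
}

put('foo', 'I am foo');
put('bar', 'I am bar');
put('foo', 'I am the 2nd foo');
const changes = changesSince(0); // ['bar', 'foo'], as in the example above
```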
12:55:57  <rescrv>leveldb uses sequence numbers internally to order writes
12:55:58  <JasonSmith>Obviously though, integrity between values is not supported. That's the price. Hey, it's NoSQL. Big whoop.
12:56:08  <rescrv>the replay iterator works on those sequence numbers
12:56:13  <JasonSmith>Right, it is actually called the by_sequence tree internally
12:56:15  <rescrv>there's no way to control them on the other end though
12:56:44  <JasonSmith>in CouchDB there is a REST API to read it. In fact, confidentially, this is the only nice thing about couch. Ha, ha I kid!
12:56:47  * thlorenz quit (Ping timeout: 255 seconds)
12:56:56  <rescrv>we use the replay iterator to give us integrity between values in HyperDex (with a lot of logic on top)
12:57:17  <JasonSmith>There is actually no replication "protocol." You just query that index and then fetch your values by key (or fetch "documents" by "id" in couch terminology)
12:57:31  <rescrv>yeah. That's probably a lot simpler to implement.
12:57:40  <JasonSmith>Hyperdex is an alternative to leveldb?
12:57:42  <JasonSmith>or a fork?
12:57:57  <JasonSmith>Forgive me, I'm expanding my database horizons :)
12:58:03  <JasonSmith>Been steeped in CouchDB since forever
12:58:05  <rescrv>HyperDex is a distributed key-value store that uses LevelDB internally as its backend (really, HyperLevelDB).
12:58:23  <JasonSmith>HyperLevelDB is a fork then?
12:58:26  <rescrv>yes
12:58:27  <JasonSmith>I've heard that there are a few forks out there
12:58:35  <JasonSmith>rescrv: And you're a HyperDex developer?
12:58:37  <rescrv>it's the fork of LevelDB that we use in HyperDex
12:58:38  <rescrv>yes
12:58:56  <JasonSmith>Cool! nice to meet you.
12:59:08  <JasonSmith>Does HyperDex sort of compete with Redis then?
12:59:16  <ogd>JasonSmith: check out dat, it's my NIH version of couchdb :P
12:59:29  <ogd>JasonSmith: im actually working on putting npm into dat now
12:59:40  <JasonSmith>ogd: Hey Max. Long time.
13:00:07  <JasonSmith>ogd: Yeah I am trying to make a SLEEPY feature in Levelup
13:00:27  <ogd>JasonSmith: nice, theres https://www.npmjs.org/package/sleep-ref which is okay
13:00:42  <ogd>just handles the http/tcp part
13:01:06  * thlorenz joined
13:01:52  <rescrv>Nice to meet you as well! We're not directly competing with any one system. We have many of Redis's data structures, but we persist to disk, while they are in main memory. We offer document support like MongoDB, and with better fault tolerance too. It's a mix of things we think people would need.
13:01:57  <JasonSmith>ogd: "Syncasble" - is that a typo or some term of art?
13:02:04  <ogd>lol
13:02:07  <ogd>typo!
13:02:19  <ogd>or danish for 'it synchronizes'
13:02:39  <JasonSmith>Right it sounds like some jargon invented in Oakland
13:02:41  * thlorenz quit (Remote host closed the connection)
13:02:42  <JasonSmith>:p
13:03:16  <JasonSmith>rescrv: Cool! Yes I have heard of HyperDex. I've used Redis but I fear it has some scaling issues
13:05:03  <rescrv>most people I know that use Redis for more than a local cache (like you'd use memcached) have written their own network stack for it.
13:07:14  <rescrv>I should clarify that it's most people I've had long serious discussions with. Not casual blog posts or the like
13:09:05  <JasonSmith>Yes. It's true that every site that scales up needs to address its own specific problems in its own specific ways
13:09:20  <JasonSmith>but making a custom network stack for Redis sounds like a major major yak shave to me
13:10:28  <JasonSmith>The tragedy of scaling is that it's not even "one size doesn't fit all." It's that "every size fits at most one"
13:12:21  <rescrv>that's not quite true. People very much prefer to do things the way they are comfortable, and would much rather incrementally improve the design they have than wholesale switch to a better one. It's why you see people layering memcache on top of cassandra on top of postgres, rather than adopting a backend with better throughput and latency.
13:12:51  <JasonSmith>Yes that's true.
13:12:55  <JasonSmith>I used to work at Grindr.
13:13:02  <JasonSmith>which is a pretty big social network
13:13:18  <JasonSmith>They were on Google App Engine and they were paying easily 10x "too much" for hosting
13:13:27  <JasonSmith>it was topping over $50k per month by the time I left
13:14:33  <rescrv>could they have done better with 4 in house engineers to write a replacement?
13:14:50  <JasonSmith>App Engine is a very tempting platform to prototype but wow it gets expensive if you grow, and you are very much locked-in. I must admit though, as far as the technology, it did scale as promised
13:15:15  <JasonSmith>rescrv: In my opinion yes. I was just a consultant and I wanted to get more involved but the owner did not agree
13:15:28  <rescrv>and that 50K per month would have gotten you at most 4 additional engineers and no hosting.
13:15:52  <JasonSmith>They were only seeing about 20-50 hits a second
13:16:11  <rescrv>Average? Median? Peak?
13:16:15  <JasonSmith>that's not very expensive hosting, even if you multiply for high-availability
13:16:17  <JasonSmith>peak
13:16:21  <JasonSmith>median was about 20
13:16:32  <JasonSmith>this was 3 years ago though
13:16:39  <rescrv>then that is a little pricey.
13:16:47  <JasonSmith>Yes, I ran npm after that
13:16:50  <JasonSmith>the node.js npm registry
13:17:06  <JasonSmith>It's apples-oranges perhaps. But we were processing 900 HTTP hits per second
13:17:09  <JasonSmith>peak
13:17:24  <JasonSmith>on just a mid-low-end SoftLayer box
13:17:45  <JasonSmith>The only reason we had replication and load balancing is we needed high-availability anyway, so we may as well balance the load while we were at it
13:18:38  <rescrv>it's all about what's being done. I don't know Grindr's internals, but Facebook can issue 1000s of hits against memcache per wall view. Imagine if that were not cached, or were charged per request.
13:19:18  <JasonSmith>Right, I think they added a memcache API after I left. When I was there they basically only had the basic compute service and data store API
13:20:02  <JasonSmith>Grindr was very simple, there were login/logout and stuff. But 99% of the cost was processing proximity searches
13:20:06  <JasonSmith>(the idea is finding other users nearby)
13:20:14  <JasonSmith>On a NoSQL database that can get tricky
13:21:01  <JasonSmith>There's a really brilliant video and I think writeup out there about "geoboxes." It's IMO a very clever way to do proximity search in a key-value database
13:21:35  <JasonSmith>Basically you break the earth into lat-long squares, and every square has a unique ID
13:21:46  <JasonSmith>The trick though, is you do that at several different resolutions
13:21:59  <JasonSmith>A small square might be a few meters, a large one might be tens of degrees of latitude
13:22:36  <JasonSmith>To find people nearby, you find the ID of your squares ("geo boxes") at every resolution. Say there are 5 resolution levels
13:22:59  <JasonSmith>Then you do 5 queries, basically "show all users registered for geobox with ID <foo>"
13:23:02  <JasonSmith>Then you sort in memory
13:23:05  <JasonSmith>quite clever
13:23:17  <JasonSmith>using pythagorean or whatever
13:24:12  <JasonSmith>If you prefer fewer lookups but possibly longer latency, you can query for the smallest boxes first, and only follow up with larger ones if you haven't found enough people
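A rough sketch of the geobox scheme as described. The cell sizes are invented, and a Map of Sets stands in for the key-value queries; in a real store, each cell id would be a key prefix you range-scan:

```javascript
// Each user is indexed under one cell id per resolution; a proximity
// query looks up the caller's cell at each resolution.
const RESOLUTIONS = [0.01, 0.1, 1, 10, 45]; // degrees per cell (assumed)

function cellId(lat, lon, res) {
  return res + ':' + Math.floor(lat / res) + ',' + Math.floor(lon / res);
}

const index = new Map(); // cell id -> set of user ids

function register(user, lat, lon) {
  // One entry per resolution, so a query can pick its granularity.
  for (const res of RESOLUTIONS) {
    const id = cellId(lat, lon, res);
    if (!index.has(id)) index.set(id, new Set());
    index.get(id).add(user);
  }
}

function nearby(lat, lon) {
  const found = new Set();
  for (const res of RESOLUTIONS) { // smallest boxes first
    for (const user of index.get(cellId(lat, lon, res)) || []) {
      found.add(user);
    }
  }
  return [...found]; // caller then sorts by true distance in memory
}

register('alice', 37.80, -122.27); // Oakland-ish
register('bob', 37.81, -122.26);   // nearby
register('carol', 40.71, -74.00);  // New York
const result = nearby(37.80, -122.27);
```

Note this sketch ignores the edge case where two nearby users straddle a cell boundary; real implementations also query neighboring cells or, as mentioned below, stop early once the small boxes return enough people.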
13:40:03  * Sorella joined
13:43:22  * thlorenz joined
14:08:17  <rescrv>JasonSmith: sounds like an rtree. And HyperDex's hyperspace hashing could be applied to that problem quite well to make it efficient.
14:08:52  <JasonSmith>Oh, nice. I used to work with a guy who built an r-tree index for CouchDB
14:23:49  * thlorenz quit (Remote host closed the connection)
14:33:59  * daviddias joined
14:38:18  * daviddias quit (Ping timeout: 240 seconds)
15:19:30  * mikeal quit (Quit: Leaving.)
15:22:08  * mikeal joined
15:28:00  * daviddias joined
15:30:13  * brianloveswords joined
15:32:09  * daviddias quit (Ping timeout: 245 seconds)
16:11:43  * brianloveswords quit (Quit: Computer has gone to sleep.)
16:16:55  * brianloveswords joined
16:19:30  * ednapiranha joined
16:20:22  * ednapiranha quit (Client Quit)
16:56:36  * thefoxis joined
17:08:49  * mikeal quit (Quit: Leaving.)
17:15:00  * ednapiranha joined
17:16:15  * daviddias joined
17:18:55  * mikeal joined
17:20:35  * daviddias quit (Ping timeout: 252 seconds)
17:21:41  * ednapiranha quit (Quit: Leaving...)
17:37:27  * blessYahu joined
17:59:56  * mafintosh quit (Ping timeout: 258 seconds)
18:00:19  * daleharvey quit (Ping timeout: 276 seconds)
18:02:10  * mafintosh joined
18:02:36  <ralphtheninja>level-* should only be for pure level modules right?
18:02:57  <ralphtheninja>and not for applications that is ..
18:05:18  * daleharvey joined
18:20:08  * brianloveswords quit (Quit: Computer has gone to sleep.)
18:51:40  * fritzy joined
18:52:55  * fritzy_ joined
18:53:08  * fritzy quit (Remote host closed the connection)
18:53:22  * thlorenz joined
19:03:17  * thefoxis quit (Quit: Connection closed for inactivity)
19:04:32  * daviddias joined
19:08:56  * daviddias quit (Ping timeout: 258 seconds)
19:30:48  * fritzy_ quit (Remote host closed the connection)
19:38:07  * mikeal quit (Quit: Leaving.)
19:58:36  * daviddias joined
20:02:56  * daviddias quit (Ping timeout: 255 seconds)
20:04:03  * daviddias joined
20:11:19  * daviddias quit (Ping timeout: 245 seconds)
20:27:24  * fritzy joined
20:55:05  * fritzy quit (Remote host closed the connection)
21:44:57  * fritzy joined
21:51:07  * thlorenz quit (Remote host closed the connection)
22:13:32  * mafintosh quit (*.net *.split)
22:13:32  * wolfeidau quit (*.net *.split)
22:13:32  * dstokes quit (*.net *.split)
22:13:32  * ggreer quit (*.net *.split)
22:13:32  * jez0990 quit (*.net *.split)
22:13:32  * gyaresu quit (*.net *.split)
22:13:32  * chilts quit (*.net *.split)
22:13:33  * ramitos quit (*.net *.split)
22:13:33  * chapel quit (*.net *.split)
22:13:33  * book` quit (*.net *.split)
22:13:35  * jayne quit (*.net *.split)
22:19:36  * mafintosh joined
22:19:36  * ramitos joined
22:19:36  * wolfeidau joined
22:19:36  * chilts joined
22:19:36  * gyaresu joined
22:19:36  * jez0990 joined
22:19:36  * ggreer joined
22:19:36  * dstokes joined
22:19:36  * book` joined
22:19:36  * chapel joined
22:19:36  * jayne joined
22:52:28  * sygi quit (Quit: Connection closed for inactivity)
23:05:07  * fritzy quit (Remote host closed the connection)
23:24:20  * dguttman joined