00:00:01  * ircretary quit (Remote host closed the connection)
00:00:08  * ircretary joined
00:13:32  * domanic joined
00:27:22  * tixz quit (Remote host closed the connection)
00:32:45  * anvaka joined
00:45:40  * Bsony joined
00:50:30  * Bsony quit (Ping timeout: 272 seconds)
00:59:32  * phated quit (Remote host closed the connection)
01:00:34  * phated joined
01:04:59  * phated quit (Ping timeout: 246 seconds)
02:14:38  * anandthakker quit (Ping timeout: 246 seconds)
02:20:04  * domanic quit (Ping timeout: 255 seconds)
02:20:53  <mikolalysenko>I wonder which of the $1b+ startups are most vulnerable to peer-to-peer/free replacements
02:21:15  <substack>dropbox
02:21:57  <mikolalysenko>for sure
02:22:00  <mikolalysenko>slack
02:22:02  <mikolalysenko>github
02:22:18  <mikolalysenko>pinterest
02:22:32  <mikolalysenko>spotify
02:24:08  <mikolalysenko>the harder ones to replace are the ones which require strict moderation/human management
02:24:09  * anandthakker joined
02:24:41  <mikolalysenko>uber and airbnb would be tough to turn into something distributed
02:24:57  <mikolalysenko>also things that require physical infrastructure for shipping and storage of goods
02:25:19  * DamonOehlman joined
02:25:24  <mikolalysenko>like amazon or whatever online retailers
02:26:24  <mikolalysenko>biotech/research also don't really scale with p2p infrastructure
02:27:02  <mikolalysenko>but pretty much any social media or storage company would get hosed
02:29:01  <substack>and things with more regulation, like airbnb and uber
02:29:14  <substack>youtube is going to be so dead
02:29:42  <substack>I don't think uber would be so hard to make distributed
02:30:01  <substack>it's just a reputation/trust network that connects demand with supply
02:30:35  <substack>it's just the regulatory side of things that they're better at solving
02:31:03  <substack>because they can buy influence in city and regional governments
02:31:32  <mikolalysenko>also I wonder if in the limit public transportation/robot cars will just kill them off
02:31:52  <substack>hopefully
02:32:18  <substack>mikolalysenko: have you ever heard of relational databases using something like multidimensional segment trees?
02:32:27  <mikolalysenko>yes
02:32:34  <mikolalysenko>segment trees are kind of like range trees
02:32:39  <mikolalysenko>but they store intervals, not points
02:32:54  <mikolalysenko>basically you take a segment tree and then do the range-tree trick to make it multidimensional
02:32:58  <mikolalysenko>but storage costs are expensive
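To make the range tree / segment tree distinction concrete, here is a minimal one-dimensional segment tree in Python. It is an illustrative sketch, not code from any project mentioned in this log: intervals are assumed to be half-open integer ranges `[lo, hi)` over coordinates `0..size-1`, and the names `SegmentTree`, `insert`, and `stab` are hypothetical. Each interval is attached to O(log n) canonical nodes, which is where the extra storage comes from; the multidimensional version would hang a secondary structure off every canonical node (the range-tree trick), adding roughly another log factor of storage per dimension.

```python
from typing import List


class SegmentTree:
    """Stores half-open integer intervals [lo, hi) over coordinates 0..size-1.

    Each interval is attached to O(log size) canonical nodes, so storage is
    O(n log size); a stabbing query walks a single root-to-leaf path.
    """

    def __init__(self, size: int) -> None:
        self.size = size
        # Heap-indexed node lists: root is 1, children of i are 2i and 2i+1.
        self.nodes: List[List[int]] = [[] for _ in range(4 * size)]

    def insert(self, lo: int, hi: int, label: int) -> None:
        self._insert(1, 0, self.size, lo, hi, label)

    def _insert(self, node: int, nlo: int, nhi: int, lo: int, hi: int, label: int) -> None:
        if hi <= nlo or nhi <= lo:
            return                          # interval misses this node's range
        if lo <= nlo and nhi <= hi:
            self.nodes[node].append(label)  # canonical node: fully covered
            return
        mid = (nlo + nhi) // 2
        self._insert(2 * node, nlo, mid, lo, hi, label)
        self._insert(2 * node + 1, mid, nhi, lo, hi, label)

    def stab(self, x: int) -> List[int]:
        """Labels of all stored intervals that contain point x (0 <= x < size)."""
        out: List[int] = []
        node, nlo, nhi = 1, 0, self.size
        while True:
            out.extend(self.nodes[node])
            if nhi - nlo == 1:
                return out
            mid = (nlo + nhi) // 2
            if x < mid:
                node, nhi = 2 * node, mid
            else:
                node, nlo = 2 * node + 1, mid
```

For example, after inserting `[0, 5)`, `[3, 8)` and `[7, 10)` into a tree of size 10, stabbing at 4 reports the first two intervals and stabbing at 7 reports the last two.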
02:33:31  <substack>yes, it seems somewhat ideal for the relational use case of select * from users where age < 25 and salary > 2000
02:33:36  <mikolalysenko>actually this algorithm uses segment trees (implicitly) https://github.com/mikolalysenko/box-intersect
02:33:47  <substack>oh neat
02:33:52  <mikolalysenko>for that special case you can even use range trees
02:34:08  <mikolalysenko>basically range trees ~ points as segment trees ~ intervals
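A sketch of the range-tree approach for the `age < 25 and salary > 2000` example, assuming integer columns so the strict inequalities become the inclusive ranges `age <= 24` and `salary >= 2001`. This is a static 2-D range tree without fractional cascading: the primary structure is a segment tree over rows sorted by age, and each node keeps its rows sorted by salary. Storage is O(n log n) and a query costs O(log^2 n + k) for k results. `RangeTree2D` and the sample rows are made up for illustration.

```python
import heapq
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Row = Tuple[int, int, str]  # (age, salary, name): hypothetical example columns


class RangeTree2D:
    """Static 2-D range tree: a segment tree over rows sorted by x, where each node
    keeps its rows sorted by y. O(n log n) storage, O(log^2 n + k) per query."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = sorted(rows)                 # primary order: by x (age)
        self.xs = [r[0] for r in self.rows]
        self.size = 1
        while self.size < len(self.rows):
            self.size *= 2
        self.nodes: List[List[Row]] = [[] for _ in range(2 * self.size)]
        for i, r in enumerate(self.rows):
            self.nodes[self.size + i] = [r]
        for i in range(self.size - 1, 0, -1):  # merge children, keeping y order
            self.nodes[i] = list(heapq.merge(self.nodes[2 * i], self.nodes[2 * i + 1],
                                             key=lambda r: r[1]))
        self.ys = [[r[1] for r in node] for node in self.nodes]

    def _report(self, node: int, y_lo: float, y_hi: float) -> List[Row]:
        ys = self.ys[node]
        return self.nodes[node][bisect_left(ys, y_lo):bisect_right(ys, y_hi)]

    def query(self, x_lo: float, x_hi: float, y_lo: float, y_hi: float) -> List[Row]:
        """Rows with x_lo <= x <= x_hi and y_lo <= y <= y_hi."""
        lo = bisect_left(self.xs, x_lo) + self.size
        hi = bisect_right(self.xs, x_hi) + self.size
        out: List[Row] = []
        while lo < hi:  # walk up, collecting the O(log n) canonical nodes
            if lo & 1:
                out.extend(self._report(lo, y_lo, y_hi))
                lo += 1
            if hi & 1:
                hi -= 1
                out.extend(self._report(hi, y_lo, y_hi))
            lo //= 2
            hi //= 2
        return out
```

The SQL predicate then maps to `query(float("-inf"), 24, 2001, float("inf"))`.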
02:34:24  <mikolalysenko>but the trick with that algorithm is that it doesn't build the segment tree explicitly
02:34:38  <mikolalysenko>it basically recursively sorts the boxes in place to detect all the intersections
02:35:02  <mikolalysenko>this saves the extra memory involved in building a full segment tree, which is O(n log^d(n))
02:35:10  <mikolalysenko>and instead just uses O(n) memory
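The algorithm in box-intersect builds its segment tree only implicitly, by recursively sorting boxes in place, and is more involved than fits here. As a simpler illustration of finding all intersecting boxes by sorting instead of building an index, this Python sketch uses plain sweep-and-prune along the first axis: it sorts the box endpoints, keeps a set of active boxes, and checks the remaining axes only for boxes that already overlap on the sweep axis. It uses O(n) extra memory, but unlike the algorithm in that repository its worst-case time degrades to quadratic when many boxes overlap along the sweep axis. Boxes are assumed to be closed `(lo corner, hi corner)` tuples.

```python
from typing import List, Tuple

Box = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lo corner, hi corner), closed


def overlaps(a: Box, b: Box) -> bool:
    """Closed boxes intersect iff their extents overlap on every axis."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in range(len(a[0])))


def sweep_intersections(boxes: List[Box]) -> List[Tuple[int, int]]:
    """All intersecting pairs (i, j) with i < j, via sweep-and-prune on axis 0."""
    events = []
    for i, (lo, hi) in enumerate(boxes):
        events.append((lo[0], 0, i))   # kind 0 = box starts
        events.append((hi[0], 1, i))   # kind 1 = box ends
    # Starts sort before ends at equal coordinates, so touching boxes are reported.
    events.sort()
    active = set()
    pairs = []
    for _, kind, i in events:
        if kind == 0:
            for j in active:               # every active box overlaps i on axis 0
                if overlaps(boxes[i], boxes[j]):
                    pairs.append((min(i, j), max(i, j)))
            active.add(i)
        else:
            active.discard(i)
    return sorted(pairs)
```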
02:35:23  <substack>neat
02:35:49  <mikolalysenko>the one drawback though is that it won't work for an index
02:35:55  <substack>I'm just thinking of how this approach would serialize on disk
02:35:59  <substack>ah yeah :(
02:36:05  <mikolalysenko>it might just be r-trees which are the best
02:36:27  <mikolalysenko>r-trees seem to be the best in breed of the linear space data structures
02:36:41  <substack>is that only for 2d or 3d though?
02:36:49  <mikolalysenko>you can use them in whatever dimension you like
02:37:01  <mikolalysenko>but the drawback is that they don't have strong theoretical guarantees
02:37:09  <mikolalysenko>and they can fail pretty catastrophically in the worst case
02:37:40  <mikolalysenko>also r-tree construction algorithms are a lot of heuristics and bullshit, again because there isn't a good theory for them
02:38:05  <mikolalysenko>so it can be hard to sort out what is going on with them
02:38:24  <mikolalysenko>I would look at r-bush though, whatever it does seems to work pretty well
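For a feel of how an R-tree answers a box query, here is a small static R-tree in Python, bulk-loaded with Sort-Tile-Recursive (STR) packing, which is one of the many construction heuristics alluded to above. It is not r-bush's algorithm or API; the function names and the default fanout are assumptions. Each node stores the bounding box of its subtree, and a search descends only into children whose box intersects the query, which is also why a poor packing (large, heavily overlapping boxes) can push queries toward a full scan.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def union(rects: List[Rect]) -> Rect:
    """Smallest rectangle enclosing all the given rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, rect: Rect, children: list, leaf: bool) -> None:
        self.rect = rect          # bounding box of everything below this node
        self.children = children  # leaf: [(rect, id)], internal: [Node]
        self.leaf = leaf


def str_groups(items: list, fanout: int) -> list:
    """Sort-Tile-Recursive packing of (rect, payload) items into groups of <= fanout."""
    n = len(items)
    slices = math.ceil(math.sqrt(math.ceil(n / fanout)))
    per_slice = slices * fanout
    by_x = sorted(items, key=lambda it: it[0][0] + it[0][2])  # sort by x center
    groups = []
    for s in range(0, n, per_slice):
        strip = sorted(by_x[s:s + per_slice], key=lambda it: it[0][1] + it[0][3])
        groups.extend(strip[g:g + fanout] for g in range(0, len(strip), fanout))
    return groups


def build(entries: List[Tuple[Rect, int]], fanout: int = 8) -> Node:
    """Bulk-load a static R-tree; assumes entries is non-empty and fanout >= 2."""
    level = [Node(union([r for r, _ in g]), g, True) for g in str_groups(entries, fanout)]
    while len(level) > 1:
        groups = str_groups([(nd.rect, nd) for nd in level], fanout)
        level = [Node(union([r for r, _ in g]), [nd for _, nd in g], False) for g in groups]
    return level[0]


def search(node: Node, query: Rect) -> List[int]:
    """Ids of all entries whose rectangle intersects the query rectangle."""
    if not intersects(node.rect, query):
        return []
    if node.leaf:
        return [i for r, i in node.children if intersects(r, query)]
    out: List[int] = []
    for child in node.children:
        out.extend(search(child, query))
    return out
```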
02:38:51  <substack>thanks, will do
02:39:36  <substack>it seems like there should be more overlap with how query planners in relational databases generate indexes and spatial indexes
02:39:52  <mikolalysenko>they are closely related problems
02:40:09  <mikolalysenko>and there has been a lot of work in studying range searching in the context of query planning
02:41:41  <mikolalysenko>I think the best results for range searching use something like O(n log^{d-2} n) space
02:41:47  <mikolalysenko>which is still not great for higher dimensions
02:42:27  <mikolalysenko>there is also this lower bound: https://www.cs.princeton.edu/~chazelle/pubs/LBOrthoRangeSearchReporting.pdf
02:42:36  <mikolalysenko>but there are ways to get around it
02:48:42  <substack>I think a big problem for these new distributed systems will be how to build indexes
02:50:14  <mikolalysenko>in principle you might be able to just reuse current ideas using bittorrent/webtorrent + seeking
02:50:29  <mikolalysenko>but there are probably ways that things could be even better
02:50:44  <mikolalysenko>since you do have the ability to launch multiple queries in parallel
02:51:03  <substack>I think that would be a good place to start
02:51:13  * anandthakker quit (Quit: anandthakker)
02:52:35  <substack>I haven't seen anything about good ways to serialize these structures that minimize seeks and partial delivery though
02:53:12  * phated joined
02:54:07  <mikolalysenko>the correct answer is always: use a b-tree
02:54:15  <mikolalysenko>but the question is then what value of b do you pick?
02:54:25  <substack>hah
02:54:32  <mikolalysenko>there are adaptive methods like van emde boas layout that work for any b, but are complicated
02:54:48  <mikolalysenko>in practice though you can often do well by just guessing B ~ one page or so
02:54:57  <substack>even if the data is higher dimensional?
02:55:01  <mikolalysenko>yes
02:55:10  <mikolalysenko>you just use b-kdtree or b-r-tree
02:55:14  <mikolalysenko>or b-range tree even!
02:55:17  <mikolalysenko>it is the same thing
02:55:19  <substack>hmm
02:55:27  <mikolalysenko>all you need to do to make things disk aware is store them in a b-tree, that is it
02:55:42  <mikolalysenko>and if you just pick a reasonable size of b, everything should just work
02:56:29  <mikolalysenko>the model for disk and network access is blocked transfers
02:56:38  * anandthakker joined
02:56:45  <mikolalysenko>so you just want to make sure that whenever you read a block you get as much out of it as you possibly can
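A back-of-the-envelope version of "pick b so that one node fills one block", assuming 4 KiB blocks (the "4k or so" block size mentioned in this discussion), 8-byte keys, and 8-byte child pointers; node headers and variable-length keys would change the numbers. The interesting part is the height: with a fanout in the hundreds, a lookup over a billion entries touches about four blocks, versus about thirty for a binary tree stored one node per block. The helper names are hypothetical.

```python
def fanout(block_bytes: int, key_bytes: int, pointer_bytes: int) -> int:
    """Max children per node if a node holds b child pointers and b - 1 keys in one block."""
    # b * pointer + (b - 1) * key <= block  =>  b <= (block + key) / (key + pointer)
    return (block_bytes + key_bytes) // (key_bytes + pointer_bytes)


def height(n: int, b: int) -> int:
    """Levels (so block reads per lookup) needed to reach n entries with fanout b,
    i.e. ceil(log_b n) with a minimum of 1, computed with integers to avoid float error."""
    levels, reach = 1, b
    while reach < n:
        reach *= b
        levels += 1
    return levels
```

With these assumptions, `fanout(4096, 8, 8)` is 256, `height(10**9, 256)` is 4, and `height(10**9, 2)` is 30.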
02:57:04  <mikolalysenko>van emde boas is a kind of neat trick since you can think of it as something like an adaptive b tree
02:57:44  <substack>reading
02:57:47  <mikolalysenko>where you don't need to know b in advance, but it in some sense does a search on b when you read in a block
02:58:01  <mikolalysenko>actually, there is a good video on this
02:58:21  <mikolalysenko>https://courses.csail.mit.edu/6.851/spring14/lectures/L07.html
02:58:44  <mikolalysenko>but the terrible secret is that van emde boas is usually way slower and more complicated than just picking a large enough b
02:58:53  <mikolalysenko>even though it is in some sense the right thing to do in the limit
02:59:10  <mikolalysenko>it is just that in practice computers today have pretty standard block sizes, like 4k or so
02:59:46  <mikolalysenko>and you can make some reasonable assumptions and just use reasonably large b and everything should just work
03:01:36  <mikolalysenko>still the ideas are good and it is worth understanding them
03:02:24  <mikolalysenko>and I could still imagine maybe some settings where the adaptivity of van emde boas is useful, like when your block size is highly variable or you have some sort of variable/streaming block size
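To see what the van Emde Boas layout does to node order, this Python sketch lists the heap indices (root `1`, children `2i` and `2i+1`) of a complete binary tree in vEB order: lay out the top half of the tree recursively, then each bottom subtree recursively, one after another. Splitting the top off at `h // 2` levels is one common convention and an assumption here; the sketch only computes the storage order, not a full cache-oblivious search structure.

```python
from typing import List


def veb_order(height: int) -> List[int]:
    """Heap indices of a complete binary tree of the given height, in vEB layout order."""
    out: List[int] = []

    def lay(root: int, h: int) -> None:
        if h == 1:
            out.append(root)
            return
        top = h // 2          # height of the top recursive subtree
        bottom = h - top      # height of each bottom recursive subtree
        lay(root, top)
        # Roots of the bottom subtrees are the 2**top descendants sitting
        # `top` levels below `root`, i.e. heap indices root * 2**top onward.
        first = root << top
        for sub_root in range(first, first + (1 << top)):
            lay(sub_root, bottom)

    lay(1, height)
    return out
```

For a height-4 tree this gives `[1, 2, 3, 4, 8, 9, 5, 10, 11, 6, 12, 13, 7, 14, 15]`: the top three nodes first, then each three-node bottom subtree stored contiguously.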
03:03:14  <mikolalysenko>ah another thing I should mention is that for laying out the nodes of a b-tree you want to use level order
03:03:22  <mikolalysenko>so that the higher nodes of the tree will all stay in cache
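A minimal sketch of level-order (BFS, sometimes called Eytzinger) layout for a binary search tree stored in an array, where node `i` has children `2i+1` and `2i+2`. Every search starts at index 0 and the first few levels occupy the first few array slots, so those stay hot in cache; a B-tree stored in level order gets the same effect one block at a time. The function names are hypothetical.

```python
from typing import List, Optional


def eytzinger(sorted_keys: List[int]) -> List[int]:
    """Store sorted keys in level (BFS) order: node i has children 2i+1 and 2i+2."""
    out = [0] * len(sorted_keys)
    keys = iter(sorted_keys)

    def fill(i: int) -> None:
        # An in-order walk of the implicit tree hands out keys in sorted order.
        if i < len(out):
            fill(2 * i + 1)
            out[i] = next(keys)
            fill(2 * i + 2)

    fill(0)
    return out


def lower_bound(tree: List[int], x: int) -> Optional[int]:
    """Smallest key >= x in an Eytzinger-ordered array, or None if there is none."""
    i, best = 0, None
    while i < len(tree):
        if tree[i] >= x:
            best = tree[i]      # candidate; a smaller one may be in the left subtree
            i = 2 * i + 1
        else:
            i = 2 * i + 2       # this key and its left subtree are all too small
    return best
```

For example, `eytzinger([1, 2, 3, 4, 5, 6, 7])` produces `[4, 2, 6, 1, 3, 5, 7]`.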
03:06:44  * thlorenz quit (Remote host closed the connection)
03:08:36  <substack>ok I think I have a good place to start poking around, thanks!
03:16:43  * DamonOehlman quit (Ping timeout: 256 seconds)
03:45:25  * pfraze_ quit (Remote host closed the connection)
03:47:56  * pfraze joined
03:52:24  * rannmann joined
04:01:39  * fotoverite quit (Quit: fotoverite)
04:07:34  * thlorenz joined
04:12:06  * thlorenz quit (Ping timeout: 256 seconds)
04:21:41  * anandthakker quit (Quit: anandthakker)
04:22:12  * anandthakker joined
04:24:17  <substack>mikolalysenko: would the b tree take care of the first dimension, then the kd-tree picks up from there?
04:38:45  <mikolalysenko>substack: you would probably turn the kdtree into a btree by collapsing say log(b) levels of the tree into one node
04:38:59  <mikolalysenko>so instead of having each node be binary, just squish them all into one b-ary node
04:39:13  <mikolalysenko>you can use the same trick with range trees/r-trees too
04:39:55  <substack>ah that makes sense
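The collapsing trick can be sketched with index arithmetic, assuming the kd-tree is a complete binary tree stored in heap order (root `1`, children `2i` and `2i+1`) and that `k = log2(b)` consecutive levels are merged into each b-ary node. `block_of` maps a binary node to the root of the merged node containing it, and `blocks_on_path` counts how many merged nodes (so, roughly, block reads) a root-to-leaf search touches: about `h / k` instead of `h`. Both helpers are hypothetical.

```python
from typing import Set, Tuple


def block_of(i: int, k: int) -> Tuple[int, int]:
    """For binary heap index i (root = 1), return (root of its merged node, level inside it)
    when every k consecutive levels are collapsed into one b-ary node with b = 2**k."""
    depth = i.bit_length() - 1   # the root has depth 0
    up = depth % k               # how many levels i sits below its merged node's root
    return i >> up, up           # shifting right by `up` walks up `up` parents


def blocks_on_path(leaf: int, k: int) -> int:
    """Number of distinct merged nodes visited on the path from the root to `leaf`."""
    seen: Set[int] = set()
    i = leaf
    while i >= 1:
        seen.add(block_of(i, k)[0])
        i //= 2
    return len(seen)
```

For a leaf at depth 9 (a ten-level tree), merging three levels at a time (b = 8) cuts the path from 10 binary nodes to 4 merged nodes.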
05:00:00  <jjjohnny_>uber is perfect target
05:00:19  <jjjohnny_>we don't need companies to allow people to give rides for money
05:00:33  <jjjohnny_>the regulations are the target, not uber
05:01:29  * phated quit (Remote host closed the connection)
05:02:05  * phated joined
05:02:28  <jjjohnny_>so, a secure network, with perhaps trust elements, is all we need to get around regulations and just allow people to give rides for money
05:06:11  * phated quit (Ping timeout: 246 seconds)
05:11:20  * DamonOehlman joined
05:26:07  * harrow` joined
05:27:22  * harrow quit (Ping timeout: 245 seconds)
05:31:44  * domanic joined
05:40:33  * phated joined
05:41:55  * pfraze quit (Remote host closed the connection)
05:49:54  * contrahax joined
05:56:21  * thlorenz joined
06:00:53  * thlorenz quit (Ping timeout: 252 seconds)
06:29:45  * DamonOehlman quit (Ping timeout: 264 seconds)
06:54:30  * pili quit (Remote host closed the connection)
06:54:58  * freeall quit (Remote host closed the connection)
07:33:29  * thealphanerd quit (Quit: thealphanerd)
07:45:06  * thlorenz joined
07:49:28  * thlorenz quit (Ping timeout: 255 seconds)
07:50:16  * DamonOehlman joined
08:23:32  * domanic quit (Ping timeout: 246 seconds)
08:29:10  * contrahax quit (Quit: Sleeping)
08:34:18  * freeall joined
08:42:48  * pfraze joined
08:44:18  * freeall quit (Remote host closed the connection)
08:45:52  * thlorenz joined
08:47:30  * pfraze quit (Ping timeout: 256 seconds)
08:50:34  * thlorenz quit (Ping timeout: 272 seconds)
08:53:53  * phated quit (Remote host closed the connection)
08:54:28  * phated joined
08:58:50  * phated quit (Ping timeout: 256 seconds)
09:19:52  * tixz joined
09:46:26  * DamonOehlman quit (Ping timeout: 265 seconds)
10:16:37  * thlorenz joined
10:21:09  * thlorenz quit (Ping timeout: 256 seconds)
10:36:50  * Bsony joined
11:04:52  * ins0mnia joined
11:07:52  * ins0mnia quit (Remote host closed the connection)
11:21:33  * yoshuawuyts joined
11:34:15  * tixz quit
11:40:54  * yoshuawuyts quit (Ping timeout: 256 seconds)
11:45:38  * yoshuawuyts joined
12:02:21  * thlorenz joined
12:05:43  * yoshuawuyts quit (Ping timeout: 252 seconds)
12:06:52  * thlorenz quit (Ping timeout: 255 seconds)
12:46:14  * yoshuawuyts joined
12:54:41  * phated joined
12:59:09  * phated quit (Ping timeout: 250 seconds)
13:00:28  * fotoverite joined
13:03:08  * thlorenz joined
13:07:37  * thlorenz quit (Ping timeout: 255 seconds)
13:35:32  * peutetre joined
13:40:24  * pose joined
13:48:47  * peutetre quit (Quit: ...)
13:49:56  * sz0 quit (Quit: My computer has gone to sleep. ZZZzzz…)
13:51:40  * anandthakker quit (Quit: anandthakker)
13:55:47  * pfallenop joined
13:58:46  * pose quit (Remote host closed the connection)
13:59:24  * pose joined
14:02:49  * oncenull joined
14:03:51  * pose quit (Ping timeout: 256 seconds)
14:15:01  * anandthakker joined
14:20:58  * collypops quit (Ping timeout: 255 seconds)
14:23:46  * reqshark_ joined
14:25:11  * reqshark quit (Read error: Connection reset by peer)
14:29:18  * oncenull quit (Remote host closed the connection)
14:32:15  * joshhartigan joined
14:33:54  * thlorenz joined
14:38:15  * thlorenz quit (Ping timeout: 252 seconds)
14:38:46  * pfraze joined
14:41:59  * collypop_ quit (Quit: My MacBook Pro has gone to sleep. ZZZzzz…)
14:44:31  * collypops joined
14:49:33  * yoshuawuyts quit (Ping timeout: 264 seconds)
14:51:53  * pfraze quit (Remote host closed the connection)
14:56:00  * joshhartigan quit (Remote host closed the connection)
14:56:32  * joshhartigan joined
14:57:28  * joshhartigan quit (Remote host closed the connection)
14:58:45  * joshhartigan joined
15:00:07  * pfraze joined
15:04:09  * thlorenz joined
15:22:42  * therealkoopa joined
15:31:57  * therealkoopa quit (Remote host closed the connection)
15:39:44  * joshhartigan quit
16:05:56  * anandthakker quit (Quit: anandthakker)
16:11:59  <isaacs>substack: how do you feel about this? https://github.com/TestAnything/Specification/issues/16
16:12:36  <isaacs>substack: i really like how mocha/lab output the test timing info, because when you have like 1000 tests, you might want to only highlight the slow ones so you can make the tests take less time overall.
16:13:29  <isaacs>substack: but the existing tap-spec just kinda lies about that. you need the timing data in the actual TAP, or else, you can't analyze it later, you know? Like, it should be the time it takes to actually generate the tap, but if you're reading it from a file or TCP stream later, you're just measuring network and parsing time.
16:14:17  <isaacs>substack: eg, i wanna dump all the tests from a dozen projects to a single TAP file, and then store those, and keep track of what tests got slower or faster over time.
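A sketch of the analysis isaacs describes, assuming each test point carries its duration in a trailing `# time=<ms>ms` suffix (for example `ok 3 - parses headers # time=12.5ms`). That suffix is an assumed convention for illustration, not something the linked issue or the TAP specification is claimed to define here. Because the timings live in the TAP text itself, the same parser works on a saved file or on a stream read much later, which is exactly the point about not measuring network and parsing time.

```python
import re
from typing import Dict, List, Tuple

# Assumed convention: "ok 3 - parses headers # time=12.5ms" (also matches "not ok").
TEST_POINT = re.compile(r"^(not )?ok\s+\d+\s*(?:-\s*)?(.*?)\s*#\s*time=([\d.]+)ms\s*$")


def timings(tap_text: str) -> Dict[str, float]:
    """Map test description -> reported duration in milliseconds.
    Lines without a time suffix (plans, comments, untimed tests) are skipped."""
    out: Dict[str, float] = {}
    for line in tap_text.splitlines():
        m = TEST_POINT.match(line.strip())  # strip() also handles indented subtests
        if m:
            out[m.group(2)] = float(m.group(3))
    return out


def slowest(tap_text: str, n: int = 3) -> List[Tuple[str, float]]:
    """The n slowest timed tests, slowest first."""
    return sorted(timings(tap_text).items(), key=lambda kv: kv[1], reverse=True)[:n]
```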
16:17:20  * anandthakker joined
16:17:35  <isaacs>substack: i'm asking you because frankly, tape and tap are the only tap producers/parsers that I care about, and the world will follow JavaScript's example.
16:18:08  * pose joined
16:22:55  * pose quit (Ping timeout: 255 seconds)
16:58:46  * joepie91___ changed nick to joepie91
17:17:41  * therealkoopa joined
17:19:46  * anandthakker quit (Ping timeout: 272 seconds)
17:31:42  * thlorenz quit (Remote host closed the connection)
17:32:03  * yoshuawuyts joined
17:34:04  * therealkoopa quit (Remote host closed the connection)
17:43:18  <substack>isaacs: test output over time would be fantastic!
17:43:47  <substack>and you could do something like git-bisect to find performance regressions
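The bisect idea can be sketched as a binary search over a recorded history of timings, assuming commits are ordered oldest first, the first is known fast, the last is known slow, and the slowdown persists once introduced. With git itself, `git bisect run <script>` performs this search, treating exit code 0 as good, 125 as "cannot test, skip", and any other code from 1 to 127 as bad; a script could run the suite and fail when a chosen test exceeds a threshold. `first_bad` and the sample history are hypothetical.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_slow: Callable[[str], bool]) -> str:
    """git-bisect-style search for the first commit where is_slow() holds.

    Assumes commits are oldest first, commits[0] is good, commits[-1] is bad,
    and a slowdown persists once introduced. Calls is_slow about log2(n) times.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_slow(commits[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression is after mid
    return commits[bad]
```

For a history where a test takes about 10ms through commit `c3` and about 50ms from `d4` on, `first_bad` with a "more than twice the baseline" predicate returns `d4` after two probes.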
17:45:32  * pose joined
17:46:27  * pfraze quit (Remote host closed the connection)
17:51:14  * freeall joined
18:10:54  * phated joined
18:15:14  <jjjohnny_>ircretary: tell dominic http://burakkanber.com/blog/machine-learning-full-text-search-in-javascript-relevance-scoring/
18:15:14  <ircretary>jjjohnny_: I'll be sure to tell dominic
18:32:29  * thlorenz joined
18:37:24  * thlorenz quit (Ping timeout: 256 seconds)
19:07:07  * pfraze joined
19:14:54  * pose quit (Remote host closed the connection)
19:23:18  * pose joined
19:25:08  * therealkoopa joined
19:25:21  * contrahax joined
19:37:38  * thealphanerd joined
19:42:54  * thlorenz joined
19:48:52  <isaacs>substack: inorite!?
19:49:41  <isaacs>substack: also, it's part of my master plan to disabuse the world of mocha forever.
19:49:56  <isaacs>substack: "but mocha shows slow tests" is a relevant and valid criticism.
19:50:31  <isaacs>substack: but omg mocha and lab both are the paradigm example of non-modularity taken to pathological extremes.
19:50:43  <isaacs>everything is 100% tightly coupled to everything else, to the point where you cannot use it for novel purposes.
19:51:00  <isaacs>without modifying the core to support that new use-case, that is, which is a huge impediment
19:58:57  * pose quit (Remote host closed the connection)
20:12:23  * sz0 joined
20:20:34  * oncenull joined
20:35:55  * therealkoopa quit (Remote host closed the connection)
20:41:34  * oncenull quit (Remote host closed the connection)
20:54:23  * anvaka quit (Remote host closed the connection)
21:07:22  * oncenull joined
21:07:32  * domanic joined
21:14:46  * DamonOehlman joined
21:24:04  * oncenull quit (Remote host closed the connection)
21:25:10  * oncenull joined
21:29:30  * oncenull quit (Ping timeout: 244 seconds)
21:30:00  * thlorenz quit (Remote host closed the connection)
21:55:19  * Bsony quit (Ping timeout: 252 seconds)
21:58:37  * thealphanerd quit (Quit: thealphanerd)
22:01:49  * dguttman joined
22:09:33  * phated quit (Remote host closed the connection)
22:12:57  * yoshuawuyts quit (Ping timeout: 264 seconds)
22:30:50  * thlorenz joined
22:36:07  * thlorenz quit (Ping timeout: 256 seconds)
22:37:57  * saijanai_ joined
22:44:35  * thlorenz joined
22:46:51  * thlorenz quit (Remote host closed the connection)
22:49:50  * domanic quit (Ping timeout: 246 seconds)
22:56:45  * pfraze quit (Remote host closed the connection)
22:58:06  * pfraze joined
23:01:34  * Bsony joined
23:06:21  * Bsony quit (Ping timeout: 265 seconds)
23:08:29  * anandthakker joined
23:20:37  * anandthakker quit (Quit: anandthakker)
23:25:44  * anandthakker joined
23:28:57  * oncenull joined
23:32:34  * phated joined
23:47:41  * thlorenz joined
23:49:14  * anandthakker quit (Quit: anandthakker)
23:52:26  * thlorenz quit (Ping timeout: 272 seconds)