00:20:06  * davidstraussquit (Remote host closed the connection)
00:33:42  * davidstraussjoined
00:40:20  * jjmalinajoined
00:40:34  * jjmalinaquit (Client Quit)
00:43:59  * i_m_cajoined
00:50:01  * kenansulaymanquit (Quit: ≈ and thus my mac took a subtle yet profound nap ≈)
00:50:17  * kenansulaymanjoined
00:50:22  * kenansulaymanquit (Client Quit)
01:06:36  * timoxleyjoined
01:24:54  * eugenewarejoined
01:25:53  * eugenewarequit (Remote host closed the connection)
01:26:20  * eugenewarejoined
01:30:51  * eugenewarequit (Ping timeout: 260 seconds)
01:44:37  * esundahljoined
01:49:41  * ryan_ramagejoined
01:52:40  * ryan_ramagequit (Client Quit)
02:07:11  * eugenewarejoined
02:07:58  * ryan_ramagejoined
02:08:14  <juliangruber>http://queue.acm.org/detail.cfm?id=1961297
02:08:34  <juliangruber>relational algebra can be expressed through monads
02:08:35  * eugenewarequit (Read error: Connection reset by peer)
02:08:45  <juliangruber>as well as nosql queries
02:08:59  <juliangruber>or coSQL as they say
02:09:15  * thlorenzjoined
02:09:49  * ryan_ramagequit (Client Quit)
02:11:46  * eugenewarejoined
02:14:26  * timoxleyquit (Remote host closed the connection)
02:16:41  * eugenewa_joined
02:16:42  * eugenewarequit (Read error: Connection reset by peer)
02:26:09  * jxsonquit (Remote host closed the connection)
02:48:29  * thlorenzquit (Remote host closed the connection)
03:00:27  * eugenewa_quit (Remote host closed the connection)
03:10:12  * fallsemojoined
03:12:53  * eugenewarejoined
03:24:53  * eugenewarequit
03:26:29  * jxsonjoined
03:29:49  * eugenewarejoined
03:31:05  * esundahlquit (Remote host closed the connection)
03:31:18  * jxsonquit (Ping timeout: 264 seconds)
03:31:50  * i_m_caquit (Ping timeout: 240 seconds)
03:34:19  * esundahl_joined
03:48:31  * fallsemoquit (Quit: Leaving.)
03:50:41  * i_m_cajoined
04:04:53  * timoxleyjoined
04:23:49  * dguttmanquit (Quit: dguttman)
04:43:06  * i_m_caquit (Ping timeout: 264 seconds)
04:46:03  * eugenewarequit (Remote host closed the connection)
04:47:25  * eugenewarejoined
05:27:32  * timoxleyquit (Remote host closed the connection)
05:28:50  * esundahl_quit (Remote host closed the connection)
05:29:24  * esundahljoined
05:33:50  * esundahlquit (Ping timeout: 264 seconds)
05:59:57  * esundahljoined
06:08:30  * esundahlquit (Ping timeout: 264 seconds)
06:49:21  * eugenewarequit (Remote host closed the connection)
07:04:51  * esundahljoined
07:09:16  * esundahlquit (Ping timeout: 246 seconds)
07:43:37  * dominictarrjoined
07:49:34  * eugenewarejoined
07:50:23  * eugenewarequit (Remote host closed the connection)
07:50:30  * eugenewarejoined
08:05:29  * esundahljoined
08:09:49  * esundahlquit (Ping timeout: 246 seconds)
08:14:43  * timoxleyjoined
08:22:42  * jcrugzzquit (Ping timeout: 264 seconds)
08:23:02  * ehdquit (Ping timeout: 240 seconds)
08:26:59  * ehdjoined
08:34:56  * eugenewarequit (Remote host closed the connection)
08:46:48  * kenansulaymanjoined
08:53:13  * fergusmcdowalljoined
09:05:22  * eugenewarejoined
09:06:23  * fergusmcdowallquit (Quit: fergusmcdowall)
09:21:45  * fergusmcdowalljoined
09:33:31  * eugenewarequit (Remote host closed the connection)
09:33:38  * eugenewarejoined
09:38:50  * dominictarrquit (Quit: dominictarr)
09:45:09  * fergusmcdowallquit (Quit: fergusmcdowall)
09:50:14  * dominictarrjoined
10:13:39  * kenansulaymanquit (Quit: ≈ and thus my mac took a subtle yet profound nap ≈)
10:40:04  <mbalho>wtf people are getting 500k writes/sec with my gist, and i can only get 100k
10:43:15  <rescrv>mbalho: what filesystem are you using?
10:43:21  <rescrv>on what kind of hard disk?
10:43:52  <mbalho>ssd, mac os extended formatting
10:44:09  <rescrv>are they using OS X too?
10:44:43  <mbalho>rescrv: one of the guys who got 500k was on a mac 15.4"/2.4 Quad-core i7/8GB/256-Flash ssd
10:45:08  <mbalho>same node version, 0.10.17
10:45:20  <rescrv>does that match yours, or do you have a laptop?
10:45:22  <mbalho>I'm on node 0.10.17 w/ a SSD + a Haswell i5 1.3ghz CPU and 4GB RAM
10:45:45  <mbalho>rescrv: i have a 13" screen laptop, the above one is the 15.4" laptop
10:46:08  <rescrv>were they on battery and you were not? If both on battery, were your power saving settings the same?
10:46:17  <mbalho>hmm good question, i was on battery
10:46:33  <rescrv>On my T530 laptop (which I would imagine is a similar build as they are based on the reference boards), I see a 4X improvement on AC over battery
10:46:55  <mbalho>wow that could be it
10:47:15  <rescrv>toss in the 1.3 vs 2.4 and you've got your culprit. LevelDB is CPU-heavy for writes
10:47:29  <rescrv>plus the i5 and i7 likely have different cache sizes
10:47:40  <rescrv>which impacts the speed of the skiplist
10:47:53  <mbalho>ahhh
10:48:13  <rescrv>that's a secondary effect. It's more than likely battery and power saving
10:50:07  * dominictarrquit (Quit: dominictarr)
10:50:38  <mbalho>rescrv: are there any techniques for determining how often compaction runs and how many file descriptors leveldb is opening at once?
10:51:04  <rescrv>file descriptors? not really. compaction? read the LOG file
10:51:24  <rescrv>you'll get one compaction per Moved/Compacted line
10:51:32  <mbalho>ah nice
10:51:34  <rescrv>with stock
10:51:44  <rescrv>HyperLevelDB will do many Moved lines for just one compaction
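A minimal sketch of the LOG-reading tip above, assuming stock LevelDB, where each "Moved" or "Compacted" line stands for one compaction. The function name and the sample lines are made up for illustration; a real LOG file sits inside the database directory and its exact wording may differ between versions.

```javascript
// Hypothetical helper: count lines in LOG text that mark a finished compaction.
// "Compacting ..." lines (the start of a compaction) are deliberately not counted.
function countCompactionLines(logText) {
  return logText.split('\n').filter(function (line) {
    return /\b(Moved|Compacted)\b/.test(line);
  }).length;
}

// Illustrative sample only, not copied from a real run.
var sampleLog = [
  'Compacting 1@0 + 1@1 files',
  'Compacted 1@0 + 1@1 files => 1000 bytes',
  'Moved #12 to level-2 500 bytes'
].join('\n');

console.log('compactions seen:', countCompactionLines(sampleLog));

// Against a real database (path is an assumption):
// var fs = require('fs');
// console.log(countCompactionLines(fs.readFileSync('./mydb/LOG', 'utf8')));
```

With HyperLevelDB the same count would overstate compactions, since one compaction can emit many "Moved" lines.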
10:52:51  <mbalho>im trying to make sure im doing as much optimization as possible on the node side before i jump to hyperleveldb, but i plan to
10:59:17  <mbalho>weird, I can't seem to get above 100k/sec, even while plugged in. though I don't know how mac os handles power management
10:59:58  <rescrv>maybe it's just CPU limited
11:00:05  <rescrv>what's your write buffer
11:00:21  <gildean>i can't get anywhere near 500k/s on my t530, hovering around 130-140k
11:00:53  <mbalho>rescrv: ive tried 4mb, 16mb, 64mb and it doesnt seem to change much
11:00:59  <mbalho>rescrv: could definitely be cpu limited
11:01:45  <mbalho>rescrv: though i ran it on a 3GHZ i7 just now (a friends) while plugged in and also got around 100k/s
11:01:49  <gildean>i have ac connected and it's a i7 quad with ht @ 2.4GHz
11:02:25  <gildean>how are you timing the benchmark?
11:02:33  <gildean>i just added a console.time in there to time it
11:02:33  <mbalho>time node import.js
11:02:58  <mbalho>then just divide 5200000 by the number of seconds
11:03:22  <gildean>yeah, that's what i did except with console.time in the script itself
11:03:29  <mbalho>ah cool
11:03:49  <gildean>so starting node won't affect the numbers
11:04:00  <gildean>that's how i get around 140k/s
11:04:29  <mbalho>good point. the whole thing could definitely be more scientific :)
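A rough sketch of the in-script timing gildean describes, so that node startup is not counted. It uses process.hrtime instead of console.time so the elapsed time can be divided into a record count. The helper names, the placeholder loop and the 37-second figure are assumptions for illustration only.

```javascript
// Hypothetical helpers for timing the import from inside the script.
function hrtimeToSeconds(diff) {
  // process.hrtime() returns a [seconds, nanoseconds] pair.
  return diff[0] + diff[1] / 1e9;
}

function writesPerSecond(count, elapsedSeconds) {
  return count / elapsedSeconds;
}

var start = process.hrtime();
// Placeholder standing in for the real import work.
var n = 0;
for (var i = 0; i < 1e6; i++) n += i;
var elapsed = hrtimeToSeconds(process.hrtime(start));
console.log('elapsed seconds:', elapsed.toFixed(3));

// 5200000 is the record count from the chat; 37 seconds is a made-up duration.
console.log(Math.round(writesPerSecond(5200000, 37)), 'writes/sec');
```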
11:04:30  <rescrv>mbalho: are you using a fresh db each time
11:04:33  <gildean>and this is with an intel ssd
11:04:35  <mbalho>rescrv: yea
11:05:08  <mbalho>hmm im not sure what brand my ssd is, it just says APPLE SSD SM0256F
11:07:59  <mbalho>well at least now i have a white whale
11:08:56  <rescrv>mbalho: what's the total size of the data you're inserting?
11:14:03  <rescrv>and how fast can you write that to a file
11:15:04  <gildean>to me it seems that the disk is accessed only periodically, it's not writing the whole time the script runs
11:17:26  * dominictarrjoined
11:23:31  <rescrv>gildean: that's often the case, but we want to know what the disk can do, so we know when we're almost there
11:25:52  * Acconutjoined
11:27:03  * Acconutquit (Client Quit)
11:30:47  * Acconutjoined
11:37:11  * esundahljoined
11:41:26  * esundahlquit (Ping timeout: 240 seconds)
11:51:59  * eugenewarequit (Remote host closed the connection)
12:10:37  * Acconutquit (Quit: Acconut)
12:22:21  * eugenewarejoined
12:24:36  * kenansulaymanjoined
12:30:14  * fallsemojoined
12:39:32  * fergusmcdowalljoined
12:48:09  <mbalho>rescrv: time cp 1994.csv 1994.csv.copy takes 0m0.650s
12:48:26  <mbalho>rescrv: 5180049 lines, 501558665 bytes
12:48:27  * rudquit (Quit: rud)
12:48:53  <mbalho>rescrv: code im using https://gist.github.com/maxogden/6551333
12:52:12  <mbalho>rescrv: if i turn off writing and just parse the file and split into batches (but never do anything with the batches once they emit) it takes 8 seconds
13:21:09  <mbalho>inserting a 52694400 line file (5181411517 bytes) takes 11m4.321s
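For reference, the figures quoted so far can be turned into rates with a few lines of arithmetic. This is only a sketch: parseTime and rate are hypothetical helper names, and the inputs are the numbers mbalho pasted above. The large import works out to roughly 79,000 lines per second, or a little under 8 MB per second, against roughly 770 MB per second for the plain cp of the smaller file.

```javascript
// Convert a shell `time` reading such as "11m4.321s" into seconds.
function parseTime(text) {
  var match = /^(\d+)m([\d.]+)s$/.exec(text);
  if (!match) throw new Error('unrecognized time format: ' + text);
  return Number(match[1]) * 60 + Number(match[2]);
}

function rate(amount, seconds) {
  return amount / seconds;
}

// Numbers taken from the chat above.
var importSeconds = parseTime('11m4.321s');
var importLines = 52694400;
var importBytes = 5181411517;
var cpSeconds = parseTime('0m0.650s');
var cpBytes = 501558665;

console.log(Math.round(rate(importLines, importSeconds)), 'lines/sec into leveldb');
console.log((rate(importBytes, importSeconds) / 1e6).toFixed(1), 'MB/sec into leveldb');
console.log((rate(cpBytes, cpSeconds) / 1e6).toFixed(0), 'MB/sec for cp');
```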
13:23:01  <rescrv>mbalho: I'm correct in seeing that as a 5GB file? How many keys per line?
13:23:33  <mbalho>each line is a value, the key is just an incrementing counter integer
13:23:47  <mbalho>rescrv: yes its 5.2GB, https://gist.github.com/maxogden/6551333#comment-907375
13:24:05  <rescrv>mbalho: if the key is just lineno, why import to leveldb at all?
13:25:00  <mbalho>rescrv: in my actual app i store a uuid, revision sequence number as well
13:25:15  <mbalho>rescrv: but for the purposes of this benchmark I wanted to simplify
13:26:37  <mbalho>that should read uuid, revision and a sequence number
13:26:45  <rescrv>I suspect that if you profile it you'll find it spends most of its time in "sleep"
13:27:20  <mbalho>rescrv: would that be the notorious sleep(1) that i've heard about in vanilla leveldb?
13:27:43  * fergusmcdowallquit (Quit: fergusmcdowall)
13:29:03  <rescrv>here's a quick hack: Figure out how to alternate calls to the underlying leveldb such that you do Write(opts, your_batch); Write(opts, NULL);
13:29:20  <rescrv>you'll avoid the sleep
13:29:32  <rescrv>then increase your batch size within your script
13:29:53  <mbalho>haha wow
13:31:52  <rescrv>then once you do that, switch to HyperLevelDB with a large batch size, larger than the underlying LevelDB batch size
13:32:14  <rescrv>the quick hack will just demonstrate what the problem is, the second part is the true fix
13:32:22  <mbalho>gotcha
13:33:05  <rescrv>gotta run. if you take my suggestion, please do let me know how it works out
13:33:05  <mbalho>rescrv: am i remembering correctly that write buffer sizes above 64mb shouldnt have any effect?
13:33:10  <mbalho>rescrv: will do
13:33:18  * fergusmcdowalljoined
13:33:56  <rescrv>mbalho: you're correct, but here's the secret: the write buffer size is only checked after it overflows
13:34:09  <mbalho>hah
13:34:22  <rescrv>set a 4MB write buffer, and put a 64MB batch in, and even if it's 3.9999MB, you'll end up with a 67.9999MB memtable that's flushed
13:34:48  <rescrv>so feel free to make your batches larger, so long as you alternate with NULL (forcing it to give you a new WB each time)
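The overflow behaviour rescrv describes can be pictured with a toy model. This is not LevelDB's code, only a sketch under the assumption that the size check runs before a batch is applied and never during it; simulateMemtable and its arguments are hypothetical names. The Write(opts, NULL) trick is not modelled here.

```javascript
// Toy model of a memtable whose size is only checked between batches.
// Returns the size of each memtable at the moment it would be flushed.
function simulateMemtable(writeBufferBytes, batchSizes) {
  var flushed = [];
  var memtable = 0;
  batchSizes.forEach(function (batch) {
    // The check happens before the batch is applied: only a memtable that
    // has already grown past the limit is flushed and replaced.
    if (memtable > writeBufferBytes) {
      flushed.push(memtable);
      memtable = 0;
    }
    // The whole batch goes in, however large it is.
    memtable += batch;
  });
  if (memtable > 0) flushed.push(memtable); // final flush at the end of the run
  return flushed;
}

var MB = 1024 * 1024;
// A 4MB write buffer, a memtable just under 4MB, then one 64MB batch.
var result = simulateMemtable(4 * MB, [4 * MB - 100, 64 * MB]);
console.log(result.map(function (b) { return (b / MB).toFixed(4) + 'MB'; }));
```

In this model the single flushed memtable is just under 68MB, matching the 67.9999MB example in the chat.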
13:34:57  <mbalho>i see, thank you
13:36:20  * rudjoined
13:36:20  * rudquit (Changing host)
13:36:20  * rudjoined
13:38:54  <mbalho>trevnorris: does llprof show which c++ functions are being executed? or does it just show syscalls? i cant remember
13:39:15  <mbalho>trevnorris: im wondering about https://gist.github.com/maxogden/6551333 and am trying to gain insight into what leveldb is actually doing when under bulk load
13:47:55  * fallsemoquit (Ping timeout: 246 seconds)
13:52:57  * fergusmcdowallquit (Quit: fergusmcdowall)
13:56:17  * kenansulaymanquit (Quit: ≈ and thus my mac took a subtle yet profound nap ≈)
13:57:11  * fallsemojoined
13:57:36  * i_m_cajoined
14:01:52  * fallsemoquit (Ping timeout: 256 seconds)
14:08:40  * eugenewarequit (Remote host closed the connection)
14:08:47  * eugenewarejoined
14:30:30  * i_m_caquit (Ping timeout: 264 seconds)
15:07:12  * Acconutjoined
15:07:22  * Acconutquit (Client Quit)
15:09:28  * rudquit (Quit: rud)
15:15:49  * fergusmcdowalljoined
15:35:23  <rescrv>mbalho: https://github.com/rescrv/rprof will periodically attach GDB, dump all stack traces w/ symbols, and then allow you to see stats about the dumped stacks
15:35:49  <rescrv>compile leveldb w/ -ggdb and you'll get some symbols
15:37:54  * fallsemojoined
15:38:08  * fergusmcdowallquit (Quit: fergusmcdowall)
15:44:38  * fallsemoquit (Ping timeout: 264 seconds)
15:57:25  * rudjoined
16:02:17  * dominictarrquit (Quit: dominictarr)
16:07:24  * esundahljoined
16:12:28  * i_m_cajoined
16:15:34  * dguttmanjoined
16:15:41  * esundahlquit (Remote host closed the connection)
16:21:19  * timoxleyquit (Remote host closed the connection)
16:21:42  * timoxleyjoined
16:26:00  * fallsemojoined
16:30:18  * fergusmcdowalljoined
16:30:19  * fallsemoquit (Ping timeout: 246 seconds)
16:44:25  * eugenewarequit (Remote host closed the connection)
16:53:11  * esundahljoined
16:53:11  * esundahlquit (Remote host closed the connection)
16:53:13  * Acconutjoined
16:53:53  * Acconutquit (Client Quit)
16:54:18  * fergusmcdowallquit (Quit: fergusmcdowall)
16:59:15  * jcrugzzjoined
17:05:58  * Acconutjoined
17:15:27  * eugenewarejoined
17:15:51  <levelbot>[npm] level-sec@0.0.1 <http://npm.im/level-sec>: High-level API for creating secondary indexes (@juliangruber)
17:17:51  * thlorenzjoined
17:20:31  * eugenewa_joined
17:20:32  * eugenewarequit (Read error: Connection reset by peer)
17:24:57  * dominictarrjoined
17:25:18  * eugenewa_quit (Ping timeout: 256 seconds)
17:26:07  * tmcwjoined
17:33:22  <levelbot>[npm] level-sec@1.0.0 <http://npm.im/level-sec>: High-level API for creating secondary indexes (@juliangruber)
17:36:36  * tmcwquit (Remote host closed the connection)
17:37:03  * tmcwjoined
17:42:06  * tmcwquit (Ping timeout: 264 seconds)
17:48:20  * i_m_caquit (Quit: Lost terminal)
17:50:36  * Acconutquit (Quit: Acconut)
18:30:10  * thlorenzquit (Remote host closed the connection)
18:32:14  * thlorenzjoined
18:33:43  * thlorenzquit (Remote host closed the connection)
18:35:46  * Acconutjoined
18:35:52  * thlorenzjoined
18:35:59  * Acconutquit (Client Quit)
18:38:27  * thlorenzquit (Remote host closed the connection)
18:47:39  * tmcwjoined
18:52:27  * tmcwquit (Ping timeout: 260 seconds)
19:01:20  * missinglinkjoined
19:03:40  * Acconutjoined
19:04:00  * Acconutquit (Client Quit)
19:16:22  <levelbot>[npm] wiki@0.0.4 <http://npm.im/wiki>: A Federated Wiki Server (@ward, @nrn)
19:20:44  * ganglerijoined
19:35:21  * timoxleyquit (Remote host closed the connection)
19:50:09  * thlorenzjoined
19:55:01  * timoxleyjoined
19:58:16  * timoxleyquit (Remote host closed the connection)
20:10:44  * Acconutjoined
20:12:04  * Acconutquit (Client Quit)
20:29:06  * timoxleyjoined
20:34:01  * timoxleyquit (Ping timeout: 256 seconds)
20:43:45  * chiltsquit (Ping timeout: 245 seconds)
20:45:00  * chiltsjoined
20:59:19  * gangleriquit (Quit: Leaving)
21:25:38  * eugenewarejoined
21:29:52  * timoxleyjoined
21:30:02  * eugenewarequit (Ping timeout: 256 seconds)
21:34:39  * timoxleyquit (Ping timeout: 256 seconds)
22:08:48  * fergusmcdowalljoined
22:12:37  * fergusmcdowallquit (Client Quit)
22:27:13  * eugenewarejoined
22:30:36  * timoxleyjoined
22:31:54  * eugenewarequit (Ping timeout: 264 seconds)
22:35:17  * timoxleyquit (Ping timeout: 256 seconds)
22:51:15  * dominictarrquit (Quit: dominictarr)
22:57:41  * eugenewarejoined
23:02:24  * eugenewarequit (Ping timeout: 254 seconds)
23:12:12  * dominictarrjoined
23:31:23  * timoxleyjoined
23:35:26  * timoxleyquit (Ping timeout: 240 seconds)
23:58:43  * eugenewarejoined