02:04:49  <Domenic>bterlson: this seems pretty terrible to me tbh as an author. But maybe if I never have to do it for my own Ecmarkup specs and don't have to do it before sending PRs to 262 it won't affect me.
02:05:45  <Domenic>I haven't found a need to link to specific steps in a way that remains stable over time (besides intra-document references)
02:06:47  <Domenic>In HTML for intra-document references we just label the step (visibly, i.e. 3. Validate: perform validation on the foo; ... NOTE: the [validate] step is important because...)
03:43:52  * jmdyck joined
03:57:07  * jmdyck quit (Quit: Leaving.)
06:28:51  * not-an-aardvark quit (Quit: Connection closed for inactivity)
06:39:12  * gskachkov_ joined
08:32:40  * gskachkov_ quit (Quit: gskachkov_)
08:34:53  * gskachkov_ joined
09:54:32  * gskachkov_ quit (Quit: gskachkov_)
10:02:00  * gskachkov_ joined
10:25:09  * mylesborins quit (Quit: farewell for now)
10:25:40  * mylesborins joined
10:50:38  * gskachkov_ quit (Quit: gskachkov_)
10:54:17  * gskachkov_ joined
11:06:10  * jmdyck joined
14:29:41  * Fishrock123 joined
14:36:58  * brab joined
14:41:56  * gskachkov_ quit (Quit: gskachkov_)
15:12:34  * Fishrock123 quit (Remote host closed the connection)
15:18:48  * Fishrock123 joined
15:24:58  * brab quit (Ping timeout: 264 seconds)
15:49:43  * TehShrike joined
15:59:08  * brianloveswords quit (Read error: Connection reset by peer)
16:00:34  * brianloveswords joined
16:09:45  * gskachkov_ joined
16:19:00  * gskachkov_ quit (Quit: gskachkov_)
16:19:32  * gskachkov_ joined
17:00:45  * bradleymeck joined
17:02:50  * Fishrock123 quit (Remote host closed the connection)
17:10:32  <bterlson>Domenic: making all steps linkable is more for spec consumers than authors, and more for external links than inner references
17:10:40  <bterlson>I see your point though
17:12:14  <jmdyck>Is there much call for external links to arbitrary steps?
17:16:57  <TabAtkins>Oh yeah, when I'm discussing algos I have to point to specific steps all the time.
17:17:19  <TabAtkins>And usually am stuck with "<link to section>, step 9.5.2, second bullet point"
17:17:39  <TabAtkins>And that's if I'm lucky and there's only one algo in the section.
17:19:16  <jmdyck>and you need those references to be robust to spec changes?
17:21:18  <TabAtkins>Ideally, yes. There've definitely been confusing times where I've been reading old messages to see what was decided on, and gotten to such a reference, and whoops it's been moved around now and I have to hunt for hopefully what they intended.
17:22:33  <jmdyck>ok, thx
17:28:08  * bradleymeck quit (Quit: bradleymeck)
17:28:11  <jmdyck>If a spec-refactoring replaces block-of-steps A and block-of-steps B with block-of-steps C, would you expect/want anchors in A and B to be reproduced in C (as feasible), to support old inbound links?
17:29:42  * Fishrock123 joined
17:30:42  <bterlson>jmdyck: echoing TabAtkins, I want to link to steps all the time when discussing an alg with anyone
17:31:07  <bterlson>and if I'm not alone, then the links will exist on the internet, end up on stack overflow, etc., and therefore should be reasonably stable if we do it at all
17:31:46  <bterlson>jmdyck: I haven't thought through the link preservation policy
17:32:26  <bterlson>could have one list of "dead" step links so users get a message like "Sorry, that step is no longer in the spec", for example
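A minimal sketch of the "dead step links" idea, assuming the generated page knows both the set of live step IDs and a list of retired ones. The names `resolveStepFragment`, `liveIds`, and `deadIds` are hypothetical, not part of ecmarkup. The fragment arrives from the URL possibly percent-encoded, so it is decoded before lookup.

```javascript
// Hypothetical sketch: classify an incoming URL fragment against the current
// spec's live step IDs and a list of retired ("dead") step IDs.
function resolveStepFragment(hash, liveIds, deadIds) {
  // location.hash includes the leading "#" and may be percent-encoded.
  const raw = hash.startsWith("#") ? hash.slice(1) : hash;
  let id;
  try {
    id = decodeURIComponent(raw);
  } catch {
    // Malformed percent-encoding: treat it as an unknown fragment.
    return { status: "unknown", id: raw };
  }
  if (liveIds.has(id)) return { status: "live", id };
  if (deadIds.has(id)) {
    return { status: "dead", id, message: "Sorry, that step is no longer in the spec" };
  }
  return { status: "unknown", id };
}

// Usage: in a browser this would be called with location.hash.
const live = new Set(["s-a1", "s-b2"]);
const dead = new Set(["s-zz"]);
console.log(resolveStepFragment("#s-zz", live, dead).status); // prints: dead
```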
17:33:03  <bterlson>oh Domenic, did you realize that these step links are auto-generated not manually created?
17:33:41  <bterlson>ecmarkup --gen-step-ids <input-doc> or something would generate them for you
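To make "auto-generated" concrete, here is a minimal sketch of what a `--gen-step-ids`-style pass might do. Everything in it is an assumption: the flag is only floated in the message above, and the step syntax (`1. {#id} text`, with the ID written literally in Markdown-Extra-style braces) follows the format discussed later in this log. Steps that already carry an ID are left untouched, so re-running the pass changes nothing and authors only ever move existing IDs around.

```javascript
// Hypothetical sketch of a "gen-step-ids" pass over ecmarkup-like source.
// Assumes each algorithm step is a line like "  1. Let x be ..." and that an
// existing ID is written literally as "{#id}" right after the step number.
const STEP_RE = /^(\s*\d+\.\s+)(\{#([^}\s]+)\}\s*)?(.*)$/;

function genStepIds(source, allocateId) {
  return source
    .split("\n")
    .map((line) => {
      const m = STEP_RE.exec(line);
      if (!m) return line;   // not a numbered step
      if (m[2]) return line; // already has an ID: never touch it
      return `${m[1]}{#${allocateId()}} ${m[4]}`;
    })
    .join("\n");
}

// Usage with a trivial counter standing in for a real ID allocator.
let counter = 0;
const nextId = () => "s-" + (counter++).toString(36);
const src = ["1. Let x be 1.", "2. {#s-q} Return x."].join("\n");
console.log(genStepIds(src, nextId));
// 1. {#s-0} Let x be 1.
// 2. {#s-q} Return x.
```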
17:38:57  <jmdyck>Would you require that of authors before they submit a PR, or do it yourself after merge? (was one of Domenic's concerns)
17:40:23  <bterlson>it would have to be during PR, otherwise a version of 262 would get built without the links there
17:40:42  <bterlson>this is mostly a question :-P
17:40:47  * gskachkov_ quit (Quit: gskachkov_)
17:42:20  <bterlson>TehShrike: is your namesake from the Hyperion Cantos or just the super metal bird?
17:42:38  <TehShrike>The bird :-)
17:42:55  <TehShrike>The Hyperion character is the most popular guess. Second-most popular is the transport from Tribes
17:42:55  <bterlson>jmdyck: actually no reason why travis couldn't run the stepgen thing as part of the build
17:43:18  <bterlson>TehShrike: I played tribes and I don't even remember that
17:43:53  <jmdyck>TabAtkins said "no, do *not* automatically do this on build."
17:44:08  <bterlson>jmdyck: I think that only applies to local builds
17:44:19  <bterlson>once you've committed and pushed, doing it as part of CI seems ok?
17:44:21  <TabAtkins>Yeah
17:45:40  <bterlson>TabAtkins: ReferenceError: Ambiguous opinion, please clarify.
17:46:28  <TabAtkins>Yours
17:46:45  <bterlson>ok to do in CI?
17:47:11  <bterlson>so then Domenic doesn't need to do anything
17:47:27  <bterlson>except, sadly, read the base32768 spew before each step
17:47:52  <jmdyck>yeah
17:50:59  <jmdyck>so then travis would be committing its additions?
17:51:25  <jmdyck>(the additions from ecmarkup --gen-step-ids)
17:52:53  <jmdyck>hm. could the spec-with-step-ids be on a separate branch?
17:54:57  <jmdyck>(that keeps pace with master, of course)
17:55:17  <bterlson>jmdyck: if separate, I'd say my original thinking of a separate metadata file is better
17:55:42  <bterlson>the problem with both approaches is that adding/removing steps becomes more difficult I think
17:55:47  <bterlson>have to make the edit in multiple places I guess
17:56:17  <bterlson>well, straight removal or straight add, no problem. Edits and additions with deletions are not easily understood by tooling.
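The "straight removal or straight add" cases are the ones tooling can handle mechanically: comparing the sets of step IDs before and after an edit shows which IDs vanished (candidates for a dead-links list) and which are new, and it can flag a new ID that collides with one retired earlier. A sketch with hypothetical names; it deliberately cannot decide whether an edited step "should" keep its old ID, which is the human judgment discussed here.

```javascript
// Hypothetical sketch: compare step IDs between two spec versions.
// Returns IDs that disappeared (to be recorded as dead), IDs that are new,
// and any new ID that reuses a previously retired one.
function diffStepIds(oldIds, newIds, deadIds = new Set()) {
  const before = new Set(oldIds);
  const after = new Set(newIds);
  const removed = [...before].filter((id) => !after.has(id));
  const added = [...after].filter((id) => !before.has(id));
  const reusedDead = added.filter((id) => deadIds.has(id));
  return { removed, added, reusedDead };
}

// Usage: "s-2" was deleted, "s-4" was added but had been retired before.
const diff = diffStepIds(["s-1", "s-2", "s-3"], ["s-1", "s-3", "s-4"], new Set(["s-4"]));
console.log(diff);
```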
17:59:30  <jmdyck>if you have a separate metadata file, how does it associate each spec-step with its link-id? it seems like you'd need an identifier for each step, which is the same problem all over again.
17:59:40  * gskachkov_ joined
18:00:41  <jmdyck>ok, so the benefit of having step-ids in the master spec is that when a human edits an alg, they can do something intelligent with the step-ids.
18:02:43  <jmdyck>with the existing step-ids, that is.
18:11:23  <bterlson>jmdyck: with the separate metadata file you'd have to say as part of a build where steps moved, probably, if it can't figure it out
18:11:33  <bterlson>and yeah, the benefit of having it in master is you just move them how you want them
18:29:13  * Fishrock123 quit (Read error: Connection reset by peer)
18:29:37  * Fishrock123 joined
18:38:23  * bradleymeck joined
18:50:11  <TabAtkins>And it's totally transparent about things, you can tell when new ones need to be added, and it's slightly more clear that substantial changes might warrant a new id. (Particularly if the IDs are random-looking base-32768 pairs or triples, rather than things that look interpretable.)
19:01:06  * annevk joined
19:34:09  * Fishrock123 quit (Remote host closed the connection)
19:41:19  * gskachkov_ quit (Quit: gskachkov_)
19:58:48  * Fishrock123 joined
19:58:50  * gskachkov_ joined
20:35:07  <bterlson>the IDs that map to arabic characters are quite nice looking
20:36:37  <bterlson>#step-ڝɟ
20:37:56  <bterlson>look at this cute guy: ꆜ
20:38:09  <bterlson>question: BE or LE?
20:38:29  <TabAtkins>This is effectively a number, so you know the correct answer in your heart.
20:43:11  <TabAtkins>(It's BE, do BE, LE is the sort of thing you do for weird comp-arch reasons only.)
20:43:55  <ljharb>very tiny wars have been fought over this
20:47:46  <bterlson>LE is somewhat appealing in that sequential IDs don't look sequential *shrug*
20:47:59  <bterlson>but BE is what my heart wants
20:48:08  <ljharb>listen to your heart, brian
20:48:13  <tcare>+1
20:51:00  <bterlson>now how to write a 24-bit integer to a buffer
20:51:57  * tobie joined
20:53:01  <bterlson>buf.writeUIntBE(n, 0, 3), apparently
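For reference, Node.js `Buffer#writeUIntBE` writes the most significant byte first and `writeUIntLE` writes it last. That is why consecutive counters share their leading bytes (and hence their leading encoded character) in BE but not in LE, the "sequential IDs don't look sequential" point above. A small sketch:

```javascript
// Write a 24-bit unsigned integer into 3 bytes in both byte orders (Node.js).
// n must be in [0, 0xFFFFFF]; larger values make writeUInt*E throw a RangeError.
function uint24Bytes(n) {
  const be = Buffer.alloc(3);
  const le = Buffer.alloc(3);
  be.writeUIntBE(n, 0, 3); // most significant byte first
  le.writeUIntLE(n, 0, 3); // least significant byte first
  return { be: [...be], le: [...le] };
}

console.log(uint24Bytes(1)); // { be: [ 0, 0, 1 ], le: [ 1, 0, 0 ] }
console.log(uint24Bytes(2)); // { be: [ 0, 0, 2 ], le: [ 2, 0, 0 ] }
```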
20:54:52  <jmdyck>non-ascii fragment identifiers will need to be percent-encoded, won't they?
20:55:34  <jmdyck>(in URIs)
20:55:44  <bterlson>ok, another question: do I write 1. {#step-ڟ㶿} into the source document, or 1. {#ڟ㶿} and prepend 'step-' when I generate the anchor
20:55:56  <bterlson>jmdyck: in URIs definitely, in the document... also yes, as the current encoding is set to ascii
20:56:19  <bterlson>I think I will set the encoding to utf-8
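On the percent-encoding question: in a URI, a non-ASCII fragment is carried as percent-encoded UTF-8, which JavaScript's standard `encodeURIComponent`/`decodeURIComponent` produce and reverse. A sketch using `step-ڝɟ` from above: each of those two characters is 2 bytes in UTF-8, so each becomes 6 ASCII characters once encoded (characters at U+0800 and above, such as ꆜ, take 3 bytes and 9 characters).

```javascript
// Percent-encode a non-ASCII step ID for use in a URI fragment, then decode it back.
const stepId = "step-\u069D\u025F"; // "step-ڝɟ"

const fragment = "#" + encodeURIComponent(stepId);
console.log(fragment); // #step-%DA%9D%C9%9F

// Round trip: decoding restores the original identifier exactly.
console.log(decodeURIComponent(fragment.slice(1)) === stepId); // true

// Length in UTF-16 code units vs UTF-8 bytes vs percent-encoded characters.
const utf8Bytes = new TextEncoder().encode(stepId).length;
console.log(stepId.length, utf8Bytes, fragment.length - 1); // 7 9 17
```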
20:57:58  <jmdyck>base 16 or 36 would be safer
20:58:50  <bterlson>for the source document encoding?
20:58:57  <bterlson>or for URLs
20:59:02  <bterlson>URIs (sorry)
20:59:03  <jmdyck>for the step-ids
20:59:40  <bterlson>but that makes them so long :'( Can you expand on the unsafeness of base32768 in a utf-8 encoded document?
21:00:26  <jmdyck>base 36 would only require 3 bytes, same as a 24-bit integer
21:04:03  <bterlson>hmm
21:04:50  <jmdyck>(and a percent-encoded utf-8 char would be even longer, though you're not necessarily concerned about length in URIs)
21:04:50  <bterlson>Encoding a 3-byte UInt into Base32768: Ҡԟ, Base64: AAAA
21:07:54  <bterlson>base36 I guess needs 5 characters if I do my math right?
21:08:03  <bterlson>with some wasted bits
21:10:36  <jmdyck>hm
21:10:58  <bterlson>wasted bits are no problem, I can scale the size of the step integer depending on the encoding scheme used
21:11:19  <bterlson>I'm drawn toward the shorter IDs personally, assuming there is no danger
21:11:34  <bterlson>and assuming there is no reason to care about URL-encoded length
21:11:35  <jmdyck>currently 9631 steps, so that's only 14 bits.
21:13:10  <jmdyck>base36 can represent up to ~5 times that in only 3 bytes
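The arithmetic in this exchange can be checked directly: how many characters a 24-bit value needs in each base mentioned, how many bits 9631 distinct steps need, and how much room three base-36 characters leave. A small sketch using integer arithmetic only:

```javascript
// Smallest number of digits in `base` that can represent every value in [0, 2^bits).
function digitsNeeded(bits, base) {
  const limit = 2 ** bits;
  let digits = 0;
  let capacity = 1; // distinct values representable with `digits` digits
  while (capacity < limit) {
    capacity *= base;
    digits++;
  }
  return digits;
}

// Bits needed to give each of `count` steps a distinct ID.
function bitsNeeded(count) {
  let bits = 0;
  while (2 ** bits < count) bits++;
  return bits;
}

for (const base of [16, 36, 64, 32768]) {
  console.log(`base ${base}: ${digitsNeeded(24, base)} chars for 24 bits`);
}
// base 16: 6, base 36: 5, base 64: 4, base 32768: 2

console.log(bitsNeeded(9631)); // 14
console.log(36 ** 3, (36 ** 3 / 9631).toFixed(1)); // 46656 4.8
```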
21:13:31  <bterlson>keep in mind that we can't easily "GC" old steps
21:13:38  <bterlson>so it's really a counter of every unique step that gets added over time
21:14:00  <bterlson>this is nearing the issue that finally got us off of word, which uses an 11-bit counter for lists that we exhausted
21:14:17  <jmdyck>is Ecmascript likely to see that much churn? 5 times complete replacement of every step?
21:14:23  <Bakkot>> keep in mind that we can't easily "GC" old steps
21:14:28  <Bakkot>wait, why not?
21:14:45  <bterlson>Bakkot: because if there's a link on the internet to some step, it would be very bad if that link began meaning something else
21:14:57  <Bakkot>Ah, fair enough.
21:14:59  <jmdyck>(that certainly wasn't the only problem with Word!)
21:15:05  <bterlson>certainly not :-P
21:15:19  <bterlson>but once Allen was digging in the XML, he might as well have been editing HTML ;)
21:16:30  <bterlson>jmdyck: not likely to see such churn but better safe than sorry later maybe?
21:17:02  <bterlson>it's not like growing is a problem
21:17:27  <bterlson>except for aesthetically displeasing alignment issues
21:17:52  <bterlson>but if we can pack a 24bit uint into 2 characters, why not?
21:19:10  <jmdyck>re alignment: you could insert a space somewhere (that gets discarded when generating the HTML). [but again, this is in the unlikely event of huge churn]
21:21:26  <jmdyck>I'm just worried that if you venture too far outside ascii, there may be usability problems.
21:22:09  <bterlson>does HTML use anything non-ascii for anything I wonder? TabAtkins/Domenic/etc.
21:22:43  <jmdyck>you mean the html specs?
21:23:08  * gskachkov_ quit (Quit: gskachkov_)
21:23:09  <bterlson>yeah
21:23:33  <bterlson>someone has gone down this rabbit hole before me somewhere
21:24:27  <jmdyck>i've got an oldish version of the whatwg html spec handy,
21:24:43  <jmdyck>looks like the file itself is pure ascii
21:27:04  <bterlson>fwiw I printed everything 0-2^24 base32768-encoded and it renders fine in vim and code (although code seems to use some colored glyphs, eg. "ԉ➿")
21:41:40  <jmdyck>If the HTML spec editors wrestled with the idea of a stable identifier for every alg step, it doesn't look like they won. :)
21:43:03  <bterlson>I think that much is true since TabAtkins wants some sort of alignment here with bikeshed
21:59:41  * Fishrock123 quit (Read error: Connection reset by peer)
22:01:28  * Fishrock123 joined
22:05:50  * dilijev joined
22:06:43  <dilijev>hello
22:07:57  <bterlson>dilijev: welcome
22:18:57  * not-an-aardvark joined
22:33:43  * Fishrock123 quit (Quit: Leaving...)
22:42:45  <TabAtkins>bterlson: The {#foo} syntax is meant to be literally the ID, in Markdown (we're extending https://michelf.ca/projects/php-markdown/extra/#header-id). So no shortcuts there - it should say the full ID when it's in.
22:43:00  <bterlson>TabAtkins: agreed
22:43:14  <TabAtkins>That said, I don't see why it needs the full word "step" - just do "s-XX" to save 3 chars!
22:43:22  <bterlson>ahh, sure
22:43:25  <ljharb>eh
22:43:30  <ljharb>are we optimizing for size, or readability?
22:43:41  <bterlson>I'm not sure those are opposed in this case
22:43:41  <TabAtkins>Readability isn't important.
22:43:50  <ljharb>O.o
22:44:02  <bterlson>the proper read is "some slug I shouldn't touch"
22:44:10  <TabAtkins>Yup, exactly.
22:44:15  <ljharb>ah ok - so will i, as a spec reader, or someone trying to write spec text, ever need to care about these?
22:44:36  <bterlson>anything "inside", no, but you will have to move them around as a unit if you move steps
22:44:37  <TabAtkins>ljharb: Only insofar as you might click the permalink icon for a step, then copy-paste the URL that results.
22:44:38  <TabAtkins>That's all.
22:44:40  <ljharb>why the `s-` then?
22:44:49  <bterlson>namespacing IDs seems like good practice
22:44:50  <ljharb>ie why isn't it just an unintelligible hash
22:44:51  <ljharb>ok
22:44:57  <ljharb>¯\_(ツ)_/¯ sounds fine
22:44:58  <bterlson>eg. sec-, term- which are already in use
22:45:03  <bterlson>figure-
22:45:09  <TabAtkins>Also IDs in CSS need to be idents, so starting with an alpha character ensures that.
22:45:13  <ljharb>true
22:45:22  <TabAtkins>The "-" isn't really necessary, but it marks the "s" as a namespace nicely.
22:45:39  <TabAtkins>And prevents accidental readings from the other two chars merging with it.
22:46:01  <TabAtkins>`id=s-ex` is much better than `id=sex`, for example.
22:46:13  <ljharb>lol agreed
22:46:17  <ljharb>experts exchange agrees too
22:46:37  <TabAtkins>(That said, we won't even get near that identifier - ES won't even increment the second digit.)
22:46:49  <TabAtkins>The first, rather. The, uh, "tens" place.
22:47:03  <bterlson>LE is in your heart isn't it?!?
22:47:28  <TabAtkins>NO
22:49:52  <bterlson>mmhmm
22:50:41  <TabAtkins>(Oh, hm, we actually don't use alphas at all - the first character in base32768 is Russian, it looks like.)
22:51:36  <TabAtkins>Ah, because it only uses aligned blocks of 32 that all satisfy the criteria. The ASCII range doesn't have one of those.
22:51:52  <TabAtkins>(All four aligned blocks in ASCII have control chars, whitespace, or punctuation.)
22:52:20  <TabAtkins>Uh, in that case, you *could* drop the whole thing if you want, and just use the two chars.
22:52:33  <TabAtkins>All >ASCII chars are valid to start an ident in CSS.
23:04:33  <bterlson>TabAtkins: I don't follow those last two sentences
23:04:39  <bterlson>what am I dropping? What am I left with?
23:07:22  <TabAtkins>So if this base was done by *densely* packing the valid chars, then you could end up with an identifier like "32", which isn't a valid ident in CSS. (This probably isn't a real problem, but staying in CSS's easy valueset is a good idea in general.) Thus starting with an alpha helps, as it makes it an ident again. But since this *starts* out in post-ASCII, that's not a concern - #ҠԀ is just fine in CSS.
23:07:36  <TabAtkins>So `id=ҠԀ` is probably fine too, rather than `id=s-ҠԀ`.
23:07:59  <TabAtkins>The namespacing might still be sufficiently valuable as an intent flag to keep the "s-" chars, tho.
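The ident point can be approximated in code. The check below is a simplified reading of the CSS Syntax identifier rules that ignores backslash escapes: a name may start with `--`, or with an optional `-` followed by a letter, `_`, or any non-ASCII code point, and later characters may also be digits or `-`. It shows why `32` fails while `s-32` and a bare non-ASCII pair such as `ҠԀ` pass. In a browser, `CSS.escape()` is the standard way to build a selector from an arbitrary ID.

```javascript
// Simplified CSS identifier check (no backslash escapes).
// Start: "--", or optional "-" then a letter, "_", or any non-ASCII code point.
// Rest: letters, digits, "_", "-", or non-ASCII code points.
const CSS_IDENT_RE = /^(?:--|-?(?:[A-Za-z_]|[^\x00-\x7F]))(?:[-A-Za-z0-9_]|[^\x00-\x7F])*$/u;

function isSimpleCssIdent(name) {
  return CSS_IDENT_RE.test(name);
}

for (const id of ["32", "s-32", "\u04A0\u0500", "s-\u04A0\u0500", "-1"]) {
  console.log(JSON.stringify(id), isSimpleCssIdent(id));
}
```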
23:10:46  <bterlson>I like the intent flag
23:10:56  <bterlson>we also may want to use this scheme for other things
23:33:17  <dilijev>why use base32768? really don't care about readability?
23:35:43  <TabAtkins>dilijev: Again, what's the purpose of "readability" for IDs that go on all 10k algorithm steps in the spec?
23:35:54  <dilijev>oh for algorithm steps, sure
23:36:23  <dilijev>still won't non-ascii entities be url-encoded? so for linking purposes this is almost a nongoal
23:36:40  <dilijev>i mean i guess it doesn't matter because it's just another step more unreadable
23:36:54  <dilijev>but at that point why not just have /s\-\d+/
23:37:17  <TabAtkins>No real need to. Only certain restrictive definitions of "URL" require that; in practice, browser URL bars, and <a href>, accept quite a wider subset.
23:38:07  <dilijev>practical disadvantage: turning numbered steps into base-n is an extra step for the writer or generator script to maintain.
23:38:21  <dilijev>just playing devil's advocate on complexity here
23:38:38  <TabAtkins>The entire point of this is that there's a generator script doing this, yes.
23:38:43  <dilijev>fair enough
23:38:49  <dilijev>but still code maintenance was more my point
23:39:05  <TabAtkins>And giving them a nonsense ID rather than a "sensible" number makes it more likely people will keep the ID unchanged when they move steps around, which is the point.
23:39:23  <TabAtkins>The more people view this as a meaningless atom, the better.
23:39:46  <dilijev>so randomly generate a number and encode to base-whatever to make it appear random and non-sequential so that renumbering is not something they would think to do? sounds fair
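A minimal sketch of that allocation scheme, under stated assumptions: IDs are drawn at random from a 24-bit space, any ID already issued (live or retired) is rejected because old IDs can't be "GC'd", and the result is rendered with an `s-` prefix. Base 36 via `Number.prototype.toString(36)` stands in for base32768 here, since a base32768 encoder is a separate library not shown. The function name and its `prefix`/`bits`/`random` options are hypothetical.

```javascript
// Hypothetical sketch: allocate a random, never-before-issued step ID.
// `taken` should hold every ID this allocator has ever issued (live or dead),
// so IDs are never recycled. Base 36 stands in for base32768.
function allocateStepId(taken, { prefix = "s-", bits = 24, random = Math.random } = {}) {
  const space = 2 ** bits;
  // Fixed width so every ID has the same length (5 base-36 digits for 24 bits).
  const width = (space - 1).toString(36).length;
  if (taken.size >= space) throw new Error("step ID space exhausted");
  for (;;) {
    const n = Math.floor(random() * space);
    const id = prefix + n.toString(36).padStart(width, "0");
    if (!taken.has(id)) {
      taken.add(id); // record immediately so it can never be handed out again
      return id;
    }
  }
}

// Usage: output is random, "s-" followed by five base-36 digits.
const issued = new Set(["s-00000"]);
console.log(allocateStepId(issued));
```

Because the number is random rather than a counter, nothing about the ID invites "renumbering" when steps move, which is the property TabAtkins describes.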
23:39:57  <TabAtkins>More or less, yeah.
23:39:59  <dilijev>that said i like the idea of using base32768 in the spec just for lols
23:40:04  <TabAtkins>That too, yes.
23:41:02  <dilijev>anyway you'd have to ensure that for modern browsers all of the base32768 characters actually can be pasted into a URL bar without getting corrupted
23:41:18  <dilijev>from what i understand base32768 was designed with that in mind since they are all Lo characters
23:41:43  <bterlson>I don't really care about the links; I'll just uriEncode() it and be done most likely
23:41:56  <bterlson>but I do think it is better for editing purposes that the thing be as short as can be
23:42:13  <bterlson>hence my preference for base32768 over base36
23:47:37  <dilijev>+1 for urlEncode() -- it's safer
23:48:25  <dilijev>(I was wrong about all the codepoints being Lo -- that's base65536 -- but they're all "safe" -- anyway this is not relevant if urlEncode is in play)
23:55:46  * gskachkov_ joined