I was updating my home page and wondering if it would be easier using WordPress. So I’m testing, let me know what you think?
Today is my birthday. I woke to find many birthday greetings on Facebook, and more roll in throughout the day. It’s hard to admit how pleasing it is, embarrassing. I haven’t asked for birthday greetings and don’t usually give them. Maybe I’ll change my mind.
Perhaps I’m late to the party but I’m still trying to understand the ‘why’ of social networking — why does Facebook encourage birthday greetings? What human pleasure does getting 11 “happy birthday” notes trigger?
But it fits into the need to have and build community, and the mechanism for community requires periodic acknowledgement. We engage in sharing our humanity (everyone has a birthday) by greeting. Hello, goodbye, I’m here, poke. But not too often, once a year is enough.
I wrote about standards and community yesterday on the IETF list, but people didn’t seem to get it. Explaining that message and its relationship to birthday greetings is hard.
The topic of discussion was “Updating BCP 10 — NomCom ELEGIBILITY”.
- IETF: the group that develops Internet standards
- BCP 10: the unique process for how the IETF recruits and picks leadership
- NOMCOM: the “Nominating Committee”, which picks leadership from among volunteers
- eligibility: the qualifications for getting on the NOMCOM
I think BCP10 is a remarkable piece of social engineering, at the center of the question of governance of the Internet: how to make hard decisions, who has final authority, who gets to choose them, and how to choose the choosers. Most standards groups get their authority from governments and treaties or are consortia. But IETF developed a complex algorithm for trying to create structure without any other final authority to resolve disputes. But it looks like this complex algorithm is buggy, and the IETF is trying to debug the process without being too open about the problem. The idea was to let people volunteer, and choose randomly among qualified volunteers. But what qualifications? There’s been some concern about the latest round of nomcom volunteers, that’s what started this thread.
During the long email thread on the topic, the discussion turned to the tradeoffs between attending a meeting in person vs. using new Internet tools for virtual meetings or more support for remote participation. Various people noted that the advantage of meeting in person is the ability to have conversations in the hallways outside the formal, minuted meetings.
I thought people were too focused on their personal preferences rather than the needs of the community. What are we trying to accomplish, and how do meetings help with that? How would we satisfy the requirements for effective work?
A few more bits: I mention some of the conflicts between IETF and other standards groups over URLs and JSON because W3C, WHATWG, ECMA are different tribes, different communities.
Creating effective standards is a community activity to avoid the Tragedy of the Commons that would result if individuals and organizations all went their own way. The common good is “the Internet works consistently for everyone” which needs to compete against “enough of the Internet works ok for my friends” where everyone has different friends.
For voluntary standards to happen, you need rough consensus — enough people agree to force the remainder to go along.
It’s a community activity, and for that to work there has to be a sense of community. And video links with remote participation aren’t enough to create a sense of community.
There are groups that purport to manage with minimal face-to-face meetings, but I think those mainly have narrow scope and a small number of relevant players, or an already established community, and they rely heavily on 24/7 online chat, social media, open source tools, and wikis, which become requirements for full participation.
The “hallway conversations” are not a nice-to-have, they’re how the IETF preserves community with open participation.
One negative aspect of IETF “culture” (loosely, the way the IETF community interacts) is that it isn’t friendly or easy to coordinate and negotiate with other SDOs, so we see the unnecessary WHATWG / W3C / IETF forking of URL / URI / IRI, encodings, MIME sniffing, and the competing RFC 7159 / JSON specs, based at least partly on cultural misunderstandings.
The main thing nomcom needs to select for is technical leadership (the skill of getting people to follow, in service of the common good). And nomcom members should have enough experience to have witnessed successful leadership. One hopes there might be some chance of that just by attending 3 meetings, although the most effective leadership is often exercised in those private hallway conversations where compromises are made.
(I think this post is pretty academic for the web dev crowd, oh well)
When talking about URLs and URNs or semantic web or linked data, I keep on returning to a topic. Carl Hewitt gave me a paper about inconsistency which this post reacts to.
The traditional AI model of semantics and meaning doesn’t work well for the web. Maybe this is old hat somewhere, but if you know any writings on this topic, send me references.
In the traditional model (from Bobrow’s essay in Representation and Understanding), the real world has objects and people and places and facts; there is a KRL (Knowledge Representation Language) in which statements about the world are written, using terms that refer to the objects in the real world. Experts use their expertise to write additional statements about the world, and an “Inference Engine” processes those statements together to derive new statements of fact.
This is like classic deduction “Socrates is a man, all men are mortal, thus Socrates is mortal” or arithmetic (37+53) by adding 7+3, write 0 carry 1 plus 3 plus 5 write 9, giving 90.
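That carry procedure is pure symbol manipulation, which is the point: the “inference engine” needs no understanding of what the digits mean. A minimal sketch (the function name is mine):

```python
def add_digit_strings(a: str, b: str) -> str:
    """Column-by-column addition with carry, manipulating digit symbols only."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    result, carry = [], 0
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        result.append(str(digit))
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))

print(add_digit_strings("37", "53"))  # "90": 7+3 writes 0 carry 1, then 1+3+5 writes 9
```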
And to a first approximation, the semantic web was based on the idea of using URLs as the terms to refer to real world, and relationships, and RDF as an underlying KRL where statements consisted of triples.
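As a rough sketch of that approach (the example URLs and the single inference rule here are invented for illustration), a handful of triples plus one rule already gets you Socrates’ mortality:

```python
# A toy triple store in the spirit of RDF: (subject, predicate, object),
# with URLs as the terms that refer to things in the world.
triples = {
    ("http://example.org/Socrates", "rdf:type", "http://example.org/Man"),
    ("http://example.org/Man", "rdfs:subClassOf", "http://example.org/Mortal"),
}

def infer(triples):
    """Apply one rule to a fixed point: X type A, A subClassOf B => X type B."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        new = {
            (s, "rdf:type", o2)
            for (s, p, o) in facts if p == "rdf:type"
            for (s2, p2, o2) in facts if p2 == "rdfs:subClassOf" and s2 == o
        }
        if not new <= facts:
            facts |= new
            changed = True
    return facts
```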
Now we get to the great and horrible debate over “what is the range of the http function”, which has so many untenable presumptions that it’s almost impossible to discuss: that the question makes sense; that you can talk about two resources being “the same”; that URLs are ‘unambiguous enough’, and the only task is to deal with some niggly ambiguity problems, say with a proposal for new HTTP result codes.
So does http://larry.masinter.net refer to me or my web page? To my web page now or for all history, to just the HTML of the home page or does it include the images loaded, or maybe the whole site?
“http://larry.masinter.net” “looks” “good”.
So I keep on coming back to the fundamental assumption, the model for the model. Coupled with my concern that we’re struggling with identity (what is a customer, what is a visitor) in every field, and phishing and fraud on another front.
Another influence has been thinking about “speech acts”. It’s one thing to say “Socrates is a man” and a completely different thing to say “Wow!”. “Wow!” isn’t an assertion (by itself), so what is it? It’s a “speech act”, and you distinguish between assertions, questions, and other speech acts.
A different model for models, with some different properties:
Every utterance is a speech act.
There are no separate categories of assertion, question, and speech act. Each message passed is just a message intended to cause a reaction on receipt. And information theory applies: you can’t convey more than the bits sent will carry. “http://larry.masinter.net” doesn’t intrinsically carry any more than the entropy of the string can hold. You can’t tell by any process whether it was intended to refer to me or to my web page.
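The information-theory point can be made concrete: the string’s character-level entropy bounds what the string alone can carry, and nothing in those bits distinguishes “me” from “my web page”. A sketch:

```python
import math
from collections import Counter

def entropy_bits_per_char(s: str) -> float:
    """Shannon entropy of the string's character distribution, in bits per character."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

url = "http://larry.masinter.net"
# An upper bound on the information the string itself carries; intent isn't in there.
capacity = entropy_bits_per_char(url) * len(url)
```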
Truth is too simple, make belief fundamental.
So in this model, individuals do not ‘know’ assertions, they only ‘believe’ to a degree. Some things are believed so strongly that they are treated as if they were known. Some things we don’t believe at all. A speech act accomplishes its mission if the belief of the recipient changes in the way the sender wanted. Trust is a measure of influence: your speech acts that look like statements influence my beliefs about the world insofar as I trust you. The web page telling me my account balance influences my beliefs about how much I owe.
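One toy way to model this (entirely my own sketch, not anyone’s standard formalism): beliefs as numbers in [0, 1], where a speech act moves the recipient’s belief toward the asserted value in proportion to trust in the sender.

```python
def receive(belief: float, asserted: float, trust: float) -> float:
    """A speech act moves the recipient's belief toward the asserted value,
    in proportion to how much the recipient trusts the sender."""
    return belief + trust * (asserted - belief)

# A claim from a highly trusted sender (the bank's page) moves belief a lot.
trusting = receive(0.5, 1.0, 0.9)    # 0.95
# The same claim from an untrusted sender barely moves it.
skeptical = receive(0.5, 1.0, 0.1)   # 0.55
```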
Changing the model helps think about security
Part of the problem with security and authorization is we don’t have a good model for reasoning about it. Usually we divide the world into “Good guys” and “bad guys”: Good guys make true statements (“this web page comes from bank trustme”) while bad guys lie. (Let’s block the bad guys.) By putting trust and ambiguity at the base of the model and not as an after-patch we have a much better way of describing what we’re trying to accomplish.
Inference, induction, intuition are just different kinds of processing
In this model, you would like influence of belief to resemble logic in the cases where there is trust and those communicating have some agreement about what the terms used refer to. But inference is subject to its own flaws (“Which Socrates? What do you mean by ‘mortal’? Or ‘all men’?”).
Every identifier is intrinsically ambiguous
Among all of the meanings the speaker might have meant, there is no in-band right way to disambiguate. Other context, out of band, might give the receiver of a message with a URL more information about what the sender might have meant. But part of the inference, part of the assessment of trust, would have to take into account beliefs about the sender’s model and what the sender might have meant. Precision of terms is not absolute.
URNs are not ‘permanent’ or ‘unambiguous’, they’re just terms with a registrar
I’ve written more on this, which I’ll expand elsewhere. But URNs aren’t exempt from ambiguity; they’re generally just URLs with organizations assigned to disambiguate if called on.
Metadata, linked data, are speech acts too.
When you look in or around an object on the net, you can often find additional data trying to tell you things about the object. This is the metadata. But it isn’t “truth”; metadata is also a communication act, just one where one of the terms used is the object itself.
There’s more but I think I’ll stop here. What do you think?
This is about the IANA protocol parameter registries. Over in firstname.lastname@example.org people are worrying about preserving the IANA function and the relationship between IETF and IANA, because it is working well and shouldn’t be disturbed (by misplaced US political maneuvering claiming that the long-planned transition from NTIA is somehow the administration giving something away).
Meanwhile, over in email@example.com, there’s a discussion of the Encodings document, being copied from WHATWG’s document of that name into W3C recommendation. See the thread (started by me), about the “false statement”.
Living Standards don’t need or want registries for most things the web uses registries for now: encodings, MIME types, URL schemes. A Living Standard has an exhaustive list, and if you want to add a new one or change one, you just change the standard. Who needs IANA with its fussy separate set of rules? Who needs any registry, really?
So that’s the contradiction: why doesn’t the web need registries while other applications do? Or is IANAPLAN deluded?
It seemed natural, if you were sending files, to use MIME’s methods for doing so, in the hopes that the design constraints were similar and that implementors would already be familiar with email MIME implementations. The original file upload spec was done in IETF because at the time, all of the web, including HTML, was being standardized in the IETF. RFC 1867 was “experimental,” which in IETF used to be one way of floating a proposal for new stuff without having to declare it ready.
After some experimentation we wanted to move the spec toward standardization. Part of the process of making the proposal standard was to modularize the specification, so that it wasn’t just about uploading files in web pages. Rather, all the stuff about extending forms and names of form fields and so forth went with HTML. And the container, the holder of “form data”– independent of what kind of form you had or whether it had any files at all — went into the definition of multipart/form-data (in RFC2388). Now, I don’t know if it was “theoretical purity” or just some sense of building things that are general purpose to allow unintended mash-ups, but RFC2388 was pretty general, and HTML 3.2 and HTML 4.0 were being developed by people who were more interested in spec-ing a markup language than a form processing application, so there was a specification gap between RFC 2388 and HTML 4.0 about when and how and what browsers were supposed to do to process a form and produce multipart/form-data.
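For the curious, the container format itself is simple: named parts separated by a boundary line. A minimal sketch of producing a multipart/form-data body (simplified for illustration; real implementations must also escape quotes in names, handle binary data, and ensure the boundary never occurs in the content):

```python
import uuid

def encode_form_data(fields: dict, files: dict) -> tuple[str, str]:
    """Build a multipart/form-data body and its Content-Type header value."""
    boundary = "----formdata-" + uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "",
                  value]
    for name, (filename, ctype, data) in files.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"; filename="{filename}"',
                  f"Content-Type: {ctype}",
                  "",
                  data]
    lines += [f"--{boundary}--", ""]
    return "\r\n".join(lines), f"multipart/form-data; boundary={boundary}"

body, content_type = encode_form_data(
    {"comment": "hello"},
    {"upload": ("notes.txt", "text/plain", "file contents here")},
)
```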
February of last year (2013) I got a request to find someone to update RFC 2388. After many months of trying to find another volunteer (most declined because of lack of time to deal with the politics) I went ahead and started work: update the spec, investigate what browsers did, make some known changes. See GitHub repo for multipart/form-data and the latest Internet Draft spec.
Now, I admit I got distracted trying to build a test framework for a “test the web forward” kind of automated test, and spent way too much time building what wound up being a fairly arcane system. But I’ve updated the document, and recommended its “working group last call”. The only problem is that I just made stuff up based on some unvalidated guesswork reported second hand … there is no working group of people willing to do work. No browser implementor has reviewed the latest drafts that I can tell.
I’m not sure what it takes to get technical reviewers who will actually read the document and compare it to one or more implementations to justify the changes in the draft.
Go to it! Review the spec! Make concrete suggestions for change, comments or even better, send GitHub pull requests!
One of the main inventions of the Web was the URL. And I’ve gotten stuck trying to help fix up the standards so that they actually work.
The standards around URLs, though, have gotten themselves into an organizational political quandary to the point where it’s like many other situations where a polarized power struggle keeps the right thing from happening.
Here’s an update to an earlier description of the situation:
URLs were originally defined as ASCII only. Although it was quickly determined that it was desirable to allow non-ASCII characters, shoehorning UTF-8 into ASCII-only systems was unacceptable; at the time, Unicode was not so widely deployed, and there were other issues. The tack was taken to leave “URI” alone and define a new protocol element, “IRI”; RFC 3987 was published in 2005 (in sync with the RFC 3986 update to the URI definition). (This is a very compressed history of what really happened.)
The IRI-to-URI transformation specified in RFC 3987 had options; it wasn’t a deterministic path. The URI-to-IRI transformation was also heuristic, since there was no guarantee that %xx-encoded bytes in the URI were actually meant to be %xx percent-hex-encoded bytes of a utf8 encoding of a Unicode string.
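Python’s urllib illustrates the UTF-8-based IRI-to-URI direction, and why the reverse is heuristic: the same %xx bytes decode differently under a different charset assumption.

```python
from urllib.parse import quote, unquote

# IRI -> URI: percent-encode the UTF-8 bytes of non-ASCII characters
iri_path = "/café"
uri_path = quote(iri_path)                    # "/caf%C3%A9"

# URI -> IRI: assumes the %xx bytes were UTF-8, which is a guess
roundtrip = unquote(uri_path)                 # "/café"

# The same bytes under a Latin-1 assumption give something else entirely
mojibake = unquote(uri_path, encoding="latin-1")   # "/cafÃ©"
```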
To address issues and to fix URL for HTML5, a new working group was established in IETF in 2009 (The IRI working group). Despite years of development, the group didn’t get the attention of those active in WHATWG, W3C or Unicode consortium, and the IRI group was closed in 2014, with the consolation that the documents that were being developed in the IRI working group could be updated as individual submissions or within the “applications area” working group. In particular, one of the IRI working group items was to update the “scheme guidelines and registration process“, which is currently under development in IETF’s application area.
Independently, the HTML5 specs in WHATWG/W3C defined “Web Address”, in an attempt to match what some of the browsers were doing. This definition (mainly a published parsing algorithm) was moved out into a separate WHATWG document called “URL”.
The world has also moved on. ICANN has approved non-ASCII top-level domains, and IDNA 2003 and 2008 didn’t really address IRI encoding. The Unicode consortium is working on UTS #46.
The big issue is to make the IRI-to-URI transformation non-ambiguous and stable. But I don’t know what to do about non-domain-name non-ASCII ‘authority’ fields. There is some evidence that some processors are %xx-hex-encoding the UTF-8 of domain names in some circumstances.
There are four umbrella organizations (IETF, W3C, WHATWG, Unicode consortium) and multiple documents, and it’s unclear whether there’s a trajectory to make them consistent:
The IRI working group closed, but work can continue in the APPS area working group. Three drafts (iri-3987bis, iri-comparison, iri-bidi-guidelines), originally intended to obsolete RFC 3987, now sit abandoned and needing update.
Other work in IETF that is relevant, but that I’m not as familiar with, is the IDN/IDNA work for internationalizing domain names, since the rules for canonicalization, equivalence, encoding, parsing, and displaying domain names need to be compatible with the rules for doing those things to URLs that contain domain names.
In addition, there’s quite a bit of activity around URNs and library identifiers in the URN working group, work that is ignored by other organizations.
The W3C has many existing recommendations which reference the IETF URI/IRI specs in various ways (for example, XML has its own restricted/expanded allowed syntax for URL-like-things). The HTML5 spec references something, the TAG seems to be involved, as well as the sysapps working group, I believe. I haven’t tracked what’s happened in the last few months.
Early versions of UTS #46 (and, I think, other specs) recommend translating to ASCII and back using punycode, but weren’t specific about which schemes this applies to.
From a user or developer point of view, it makes no sense for there to be a proliferation of definitions of URL, or a large variety of URL syntax categories. Yes, currently there is a proliferation of slightly incompatible implementations. This shouldn’t be a competitive feature. Yet the organizations involved have little incentive to incur the overhead of cooperation, especially since there is an ongoing power struggle for legitimacy and control. The same dynamic applies to the Encoding spec, and, to a lesser degree, handling of MIME types (sniffing) and multipart/form-data.
And my curiosity satisfied, I ‘get’ blogging, tweeting, facebook posting, linking in, although I haven’t tried pinning and instagramming. And I’m not sure what about.me is about, really, and quora sends me annoying spam which tempts me to read.
I’m hardly blogging at all, even though I have lots of topics with something to say. Meanwhile, Carol (my wife) is blogging about a trip; I supply photo captions and Internet support.
So I’m going to follow suit, try to blog daily. Blogspot for technical, Facebook for personal, tweet to announce. LinkedIn notice when there’s more to read. I want to update my site, too; more on that later.
I thought I would post here a pointer to the Adobe Standards Blog on “Forking Standards and Document Licensing” that Dave McAllister and I wrote in reaction to some of the controversy around the document license issue in W3C. Amazingly, this doesn’t seem to be as much of an issue in IETF.
I tried to explain HTTP/2.0 in my previous post. This post notes some nagging worries about HTTP/2.0 going forward. Maybe these are nonsense, but … tell me why I’m wrong ….
Faster is better, but faster for whom?
It should be no surprise that using software is more pleasant when it responds more quickly. But the effect is pronounced, and it makes the difference between “usable” and “just frustrating”. For the web, the critical time is between when the user clicks on a link and when the results are legible and useful. Studies (and others) show that improving page load time has a significant effect on the use of web sites. And a primary component of web speed is the network speed: not just the bandwidth but, for the web, the latency. Much of the world doesn’t have high-speed Internet, and there the web is often close to unusable.
The problem is — faster for whom? In general, when optimizing something, one makes changes that speed up common cases, even if making uncommon cases more expensive. Unfortunately, different communities can disagree about what is “common”, depending on their perspective.
Clearly, connection multiplexing helps sites that host all of their data at a single server more than it helps sites that open connections to multiple systems.
It should be a good thing that the protocol designers are basing optimizations on measurements of real web sites and real data. But the data being used risks bias: so far, little of the data has itself been published, or the results reproduced. Decisions in the working group are being made based on limited data, and often are not reproducible or auditable.
Flow control at multiple layers can interfere
This isn’t the first time there’s been an attempt to revise HTTP/1.1; the HTTP-NG effort also tried. One of the difficulties with HTTP-NG was that there was some interaction between TCP flow control and the framing of messages at the application layer, resulting in latency spikes. And those working with SPDY report that SPDY isn’t effective without server “prioritization”, which I understand to be predictively deciding which resources the client will need first, and returning their content chunks with higher priority for being sent sooner. While some servers have added such facilities for prioritization and prediction, those mechanisms are unreported and proprietary.
While HTTP/2.0 started with SPDY, SPDY development continues independently of HTTP/2.0. While the intention is to roll good ideas from SPDY into HTTP/2.0, there remains the risk that the projects will fork. Whether the possibility of forking is positive or negative is itself controversial, but I think the bar should be higher.
There is a long-running and still unresolved debate around the guidelines for using, mandating, requiring use of, or implementation of encryption, in both HTTP/1.1 and HTTP/2.0. It’s clear that HTTP/2.0 changes the cost of multiple encrypted connections to the same host significantly, thus reducing the overhead of using encryption everywhere: Normally, setting up an encrypted channel is relatively slow, requiring a lot more network round trips to establish. With multiplexing, the setup cost only happens once, so encrypting everything is less of a problem.
But there are a few reasons why that might not actually be ideal. For example, there is also a large market for devices which monitor, adjust, redirect or otherwise interact with unencrypted HTTP traffic; a company might scan and block some kinds of information on its corporate net. Encryption everywhere will have a serious impact for sites that have these interception devices, for better or worse. And adding encryption in a situation where the traffic is already protected is less than ideal, adding unnecessary overhead.
In any case, encryption everywhere might be more feasible with HTTP/2.0 than HTTP/1.1 because of the lower overhead, but it doesn’t promise any significant advantage for privacy per se.
Need realistic measurement data
To ensure that HTTP/2.0 is good enough to completely replace HTTP/1.1, it’s necessary to ensure that HTTP/2.0 is better in all cases. We do not have agreed or reproducible ways of measuring performance and impact across a wide variety of realistic configurations of bandwidth and latency. Measurement is crucial, lest we introduce changes which make things worse in unanticipated situations, or wind up with protocol changes that only help the use cases important to those who attend the meetings regularly, and not the unrepresented.
When setting up for the HTTP meeting in Hamburg, I was asked, reasonably enough, what the group is doing, why it was important, and my prognosis for its success. It was hard to explain, so I thought I’d try to write up my take “why HTTP/2.0?” Corrections, additions welcome.
HTTP Started Simple
- using DNS, the client gets the IP address of the server named in the URL
- the client opens a TCP connection to that server’s address on the port named in the URL
- the client writes “GET” and the path of the URL onto the connection
- the server responds with the HTML for the page
- the client reads the HTML and displays it
- the connection is closed
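The steps above can be sketched in a few lines of Python (HTTP/1.0 for simplicity, since the server then closes the connection when the response is done):

```python
import socket

def build_request(host: str, path: str = "/") -> bytes:
    # HTTP/1.0: one request, one response, then the server closes the connection
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")

def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    with socket.create_connection((host, port)) as sock:  # DNS lookup + TCP connect
        sock.sendall(build_request(host, path))           # write the GET request
        chunks = []
        while chunk := sock.recv(4096):                   # read until the server closes
            chunks.append(chunk)
    return b"".join(chunks)
```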
While each header has its uses and justification, and many are optional, headers add both size and complexity to every HTTP request. When HTTP headers get big, there is more chance of delay (e.g., the request no longer fits in a single packet), and the same header information gets repeated.
Many More Requests per Web Page
HTTP is stateless
Neither client nor server need to allocate memory or remember anything from one request/response to the next. This is an important characteristic of the web that allows highly popular web sites to serve many independent clients simultaneously, because the server need not allocate and manage memory for each client. Headers must be repeatedly sent, to maintain the stateless nature of the protocol.
Congestion and Flow Control
Flow control in TCP, like traffic metering lights, throttles a sender’s output to match the receiver’s capability to read. Using many simultaneous connections does not work well, because the streams use the same routers and bridges, which must manage the streams independently, but the TCP flow control algorithms do not, and cannot, take into account the traffic on the other connections. Also, setting up a new connection potentially involves additional latency, and opening encrypted connections is even slower since it requires more round trips of communication.
While these problems were well-recognized quite a while ago, work on optimizing HTTP labeled “HTTP-NG” (next generation) foundered. But more recent work (and deployment) by Google on a protocol called SPDY shows that, at least in some circumstances, HTTP can be replaced with something which can improve page load time. SPDY is already widely deployed, but there is an advantage in making it a standard, at least to get review by those using HTTP for other applications. The IETF working group finishing the HTTP/1.1 second edition (“HTTPbis”) has been rechartered to develop HTTP/2.0 which addresses performance problems. The group decided to start with (a subset of) SPDY and make changes from there.
HTTP/2.0 builds on HTTP/1.1; for the most part, it is not a reduction of the complexity of HTTP, but rather adds new features primarily for performance.
The obvious thing to do to reduce the size of something is to try to compress it, and HTTP headers compress well. But the goal is not just to speed transmission, it’s also to reduce parse time of the headers. The header compression method is undergoing significant changes.
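To see why compression helps here: header text is highly repetitive across requests on the same connection, so even generic compression shrinks it dramatically. The zlib sketch below just demonstrates the redundancy; the actual HTTP/2.0 scheme is a separate design, partly (as I understand it) because naive compression of headers carrying secrets raised security concerns.

```python
import zlib

# Typical headers, repeated across ten requests on the same connection
headers = (
    "Host: example.com\r\n"
    "User-Agent: ExampleBrowser/1.0 (X11; Linux x86_64)\r\n"
    "Accept: text/html,application/xhtml+xml\r\n"
    "Accept-Language: en-US,en;q=0.5\r\n"
    "Cookie: session=abc123; prefs=dark\r\n"
).encode("ascii")

ten_requests = headers * 10
compressed = zlib.compress(ten_requests)
ratio = len(compressed) / len(ten_requests)  # well under 1: the repetition compresses away
```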
Push vs. Pull
A “push” is when the server sends a response that hadn’t been asked for. HTTP semantics are strictly request followed by response, which is one of the reasons HTTP was considered OK to let out through a firewall that filtered incoming requests. When the server sends content to clients that they didn’t explicitly request, that is “server push”. Push in HTTP/2.0 uses a promise, “A is what you would get if you asked for B”: that is, a promise of the result of a potential pull. The HTTP/2.0 semantics are arranged so that these pushes look like responses to requests not yet made, so it is called a “push promise”. Making use of this capability requires redesigning the web site and server.
With this background, I can now talk about some of the ways HTTP/2.0 can go wrong. Coming up!