Implementing Opportunistic Encryption

Henry Spencer & D. Hugh Redelmeier
Version 4+, 15 Dec 2000


Updates

Major changes since last version: "Negotiation Issues" section discussing some interoperability matters, plus some wording cleanup. Some issues arising from discussions at OLS are not yet resolved, so there will almost certainly be another version soon.

xxx incoming could be opportunistic or RW.
xxx any way of saving unaware implementations???
xxx compression needs mention.


Introduction

A major long-term goal of the FreeS/WAN project is opportunistic encryption: a security gateway intercepts an outgoing packet aimed at a new remote host, and quickly attempts to negotiate an IPsec tunnel to that host's security gateway, so that traffic can be encrypted and authenticated without changes to the host software. (This generalizes trivially to the end-to-end case where host and security gateway are one and the same.) If the attempt fails, the packet (or a retry thereof) passes through in clear or is dropped, depending on local policy. Prearranged tunnels bypass all this, so static VPNs can coexist with opportunistic encryption.

xxx here

Although significant intelligence about all this is necessary at the initiator end, it's highly desirable for little or no special machinery to be needed at the responder end. In particular, if none were needed, then a security gateway which knows nothing about opportunistic encryption could nevertheless participate in some opportunistic connections. IPSEC gives us the low-level mechanisms and the key-exchange machinery, but there are some vague spots (to put it mildly) at higher levels.

One constraint which deserves comment is that the process of tunnel setup should be quick. Moreover, the decision that no tunnel can be created should also be quick, since that will be a common case, at least in the beginning. People will be reluctant to use opportunistic encryption if it causes gross startup delays on every connection, even connections which see no benefit from it. Win or lose, the process must be rapid.

There's nothing much we can do to speed up the key exchange itself. (The one thing which conceivably might be done is to use Aggressive Mode, which involves fewer round trips, but it has limitations and possible security problems, and we're reluctant to touch it.) What we can do is make the other parts of the setup process as quick as possible. This desire will come back to haunt us below. :-)

A further note is that we must consider the processing at the responder end as well as the initiator end.

Several pieces of new machinery are needed to make this work. Here's a brief list, with details considered below.

+ Outgoing Packet Interception. KLIPS needs to intercept packets which likely would benefit from tunnel setup, and bring them to Pluto's attention. There needs to be enough memory of past attempts that the same tunnel doesn't get proposed too often (win or lose).

+ Smart Connection Management. Not only do we need to establish tunnels on request; once a tunnel is set up, it needs to be torn down eventually if it's not in use. It's also highly desirable to detect the fact that it has stopped working, and do something useful. Status changes should be coordinated between the two security gateways unless one has crashed, and even then, they should get back into sync eventually.

+ Security Gateway Discovery. Given a packet destination, we must decide who to attempt to negotiate a tunnel with.
This must be done quickly, win or lose, and reliably even in the presence of diverse network setups.

+ Authentication Without Prearrangement. We need to be sure we're really talking to the intended security gateway, without being able to prearrange any shared information. He needs the same assurance about us.

+ More Flexible Policy. In particular, the responding Pluto needs a way to figure out whether the connection it is being asked to make is okay. This isn't as simple as just searching our existing conn database -- we probably have to specify *classes* of legitimate connections.

Conveniently, we have a three-letter acronym for each of these. :-)

Note on philosophy: we have deliberately avoided providing six different ways to do each step, in favor of specifying one good one. Choices are provided only when they appear to be necessary. (Or when we are not quite sure yet how best to do something...)


OPI, SCM

Smart Connection Management would be quite useful even by itself, with manual triggering. (Right now, we do the manual triggering, but not the other parts of SCM.) Outgoing Packet Interception fits together with SCM quite well, and improves its usefulness further. Going through a connection's life cycle from the start...

OPI itself is relatively straightforward, aside from the nagging question of whether the intercepted packet is put on hold and then released, or dropped. Putting it on hold is preferable; the alternative is to rely on the application or the transport layer re-trying. The downside of packet hold is extra resources; the downside of packet dropping is that recovery then depends on higher-layer retries, even though it is IPSEC, not the higher layers, that knows *when* the packet can finally go out. Either way, life gets a little tricky, because a quickly-retrying application may try more than once before we know for sure whether a tunnel can be set up, and something has to detect and filter out the duplications. Some ARP implementations use the approach of keeping one packet for an as-yet-unresolved address, and throwing away any more that appear; that seems a reasonable choice.

(Is it worth intercepting *incoming* packets, from the outside world, and attempting tunnel setup based on them? Perhaps... if, and only if, we organize AWP so that non-opportunistic SGs can do it somehow. Otherwise, if the other end has not initiated tunnel setup itself, it will not be prepared to do so at our request.)

Once a tunnel is up, packets going into it naturally are not intercepted by OPI. However, we need to do something about the flip side of this too: after deciding that we *cannot* set up a tunnel, either because we don't have enough information or because the other security gateway is uncooperative, we have to remember that for a while, so we don't keep knocking on the same locked door. One plausible way of doing that is to set up a bypass "tunnel" -- the equivalent of our current %passthrough connection -- and have it managed like a real SCM tunnel (finite lifespan etc.). This sounds a bit heavyweight, but in practice, the alternatives all end up doing something very similar when examined closely. Note that we need an extra variant of this, a block rather than a bypass, to cover the case where local policy dictates that packets *not* be passed through; we still have to remember the fact that we can't set up a real tunnel.
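
(Purely for illustration, here is one way the per-destination bookkeeping described above might look in C. The struct, names, and return conventions are our own assumptions, not existing KLIPS or Pluto code; the point is just the shape of the state: at most one held packet per pending destination, ARP-style, plus a remembered verdict -- tunnel, bypass, or block -- with an expiry time.)

    #include <stddef.h>
    #include <time.h>
    #include <netinet/in.h>

    enum opp_verdict { OPP_NEGOTIATING, OPP_TUNNEL, OPP_BYPASS, OPP_BLOCK };

    struct opp_entry {
        struct in_addr    dst;       /* remote address we intercepted a packet for */
        enum opp_verdict  verdict;   /* current disposition of traffic to dst */
        time_t            expires;   /* when this disposition is next reconsidered */
        void             *held_pkt;  /* at most one held packet, ARP-style */
        struct opp_entry *next;
    };

    /* OPI: hold the first packet for a still-undecided destination, discard duplicates. */
    static int opp_intercept(struct opp_entry *e, void *pkt)
    {
        if (e->verdict != OPP_NEGOTIATING)
            return 0;                /* disposition already known; nothing to hold */
        if (e->held_pkt != NULL)
            return -1;               /* keep one packet, throw away any more that appear */
        e->held_pkt = pkt;           /* released or dropped once Pluto reaches a verdict */
        return 1;
    }
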
When to tear tunnels down is a bit problematic, but if we're setting up a potentially unbounded number of them, we have to tear them down *somehow* *sometime*. It seems fairly obvious that we set a tentative lifespan, probably fairly short (say 1min), and when it expires, we look to see if the tunnel is still in use (say, has had traffic in the last half of the lifespan). If so, we assign it a somewhat longer lifespan (say 10min), after which we look again. If not, we close it down. (This lifespan is independent of key lifetime; it is just the time when the tunnel's future is next considered. This should happen reasonably frequently, unlike rekeying, which is costly and shouldn't be too frequent.) Multi-step backoff algorithms probably are not worth the trouble; looking every 10min doesn't seem onerous.

For the tunnel-expiry decision, we need to know how long it has been since the last traffic went through. A more detailed history of the traffic does not seem very useful; a simple idle timer (or last-traffic timestamp) is both necessary and sufficient. And KLIPS already has this.

As noted, default initial lifespan should be short. However, Pluto should keep a history of recently-closed tunnels, to detect cases where a tunnel is being repeatedly re-established and should be given a longer lifespan. (Not only is tunnel setup costly, but it adds user-visible delay, so keeping a tunnel alive is preferable if we have reason to suspect more traffic soon.) Any tunnel re-established within 10min of dying should have 10min added to its initial lifespan. (Just leaving all tunnels open longer is unappealing -- adaptive lifetimes which are sensitive to the behavior of a particular tunnel are wanted. Tunnels are relatively cheap entities for us, but that is not necessarily true of all implementations, and there may also be administrative problems in sorting through large accumulations of idle tunnels.)

It might be desirable to have detailed information about the initial packet when determining lifespans. HTTP connections in particular are notoriously bursty and repetitive. Arguably it would be nice to monitor TCP connection status. A still-open TCP connection is almost a guarantee that more traffic is coming, while the closing of the only TCP connection through a tunnel is a good hint that none is. But the monitoring is complex, and it doesn't seem worth the trouble.

IKE connections likewise should be torn down when it appears the need has passed. They should linger longer than the last tunnel they administer, just in case they are needed again; the cost of retaining them is low. An SG with only a modest number of them open might want to simply retain each until rekeying time, with more aggressive management cutting in only when the number gets large. (They should be torn down eventually, if only to minimize the length of a status report, but rekeying is the only expensive event for them.)
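
(Purely as a sketch, the lifespan policy above might reduce to something like the following. The constants and function names are our own, chosen to match the 1min/10min numbers suggested in the text; they are not Pluto interfaces.)

    #include <time.h>

    #define LIFESPAN_INITIAL   (1*60)   /* 1min tentative initial lifespan */
    #define LIFESPAN_BUSY     (10*60)   /* 10min once the tunnel has proven useful */
    #define REUSE_WINDOW      (10*60)   /* "re-established within 10min of dying" */

    /* Called when a tunnel's lifespan expires: returns the next lifespan,
     * or 0 meaning the tunnel is idle and should be closed down. */
    static time_t lifespan_expired(time_t lifespan, time_t last_traffic, time_t now)
    {
        /* "still in use" = traffic during the last half of the lifespan */
        if (now - last_traffic <= lifespan / 2)
            return LIFESPAN_BUSY;
        return 0;
    }

    /* Called when a tunnel is (re-)established: returns its initial lifespan.
     * died_at is when a previous incarnation closed, or 0 if there was none. */
    static time_t initial_lifespan(time_t died_at, time_t now)
    {
        time_t life = LIFESPAN_INITIAL;

        if (died_at != 0 && now - died_at <= REUSE_WINDOW)
            life += REUSE_WINDOW;       /* repeat customer: add 10min */
        return life;
    }
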
It's worth remembering that tunnels sometimes go down because the other end crashes, or disconnects, or has a network link break, and we don't get any notice of this in the general case. (Even in the event of a crash and successful reboot, we won't hear about it unless the other end has specific reason to talk IKE to us immediately.) Of course, we have to guard against being too quick to respond to temporary network outages, but it's not quite the same issue for us as for TCP, because we can tear down and then re-establish a tunnel without any user-visible effect except a pause in traffic. And if the other end does go down and come back up, we and it can't communicate *at all* (except via IKE) until we tear down our tunnel. So... we need some kind of heartbeat mechanism. Currently there is none in IKE, but there is discussion of changing that, and this seems like the best approach. Doing a heartbeat at the IP level will not tell us about a crash/reboot event, and sending heartbeat packets through tunnels has various complications (they should stop at the far mouth of the tunnel instead of going on to a subnet; they should not count against idle timers; etc.).

Heartbeat exchanges obviously should be done only when there are tunnels established *and* there has been no recent incoming traffic through them. It seems reasonable to do them at lifespan ends, subject to appropriate rate limiting when more than one tunnel goes to the same other SG. When all traffic between the two ends is supposed to go via the tunnel, it might be reasonable to do a heartbeat -- subject to a rate limiter to avoid DOS attacks -- if the kernel sees a non-tunnel non-IKE packet from the other end.

If a heartbeat gets no response, try a few (say 3) pings to check IP connectivity; if one comes back, try another heartbeat; if it gets no response, the other end has rebooted, or otherwise been re-initialized, and its tunnels should be torn down. If there's no response to the pings, note the fact and try the sequence again at the next lifespan end; if there's nothing then either, declare the tunnels dead.

Finally... except in cases where we've decided that the other end is dead or has rebooted, tunnel teardown should always be coordinated with the other end. This means interpreting and sending Delete notifications, and also Initial-Contacts. Receiving a Delete for the other party's tunnel SAs should lead us to tear down our end too -- SAs (SA bundles, really) need to be considered as paired bidirectional entities, even though the low-level protocols don't think of them that way.


SGD, AWP

Given a packet destination, how do we decide who to (attempt to) negotiate a tunnel with? And as a related issue, how do the negotiating parties authenticate each other? DNSSEC obviously provides the tools for the latter, but how exactly do we use them?

Having intercepted a packet, what we know is basically the IP addresses of source and destination (plus, in principle, some information about the desired communication, like protocol and port). We might be able to map the source address to more information about the source, depending on how well we control our local networks, but we know nothing further about the destination. The obvious first thing to do is a DNS reverse lookup on the destination address; that's about all we can do with available data.

Ideally, we'd like to get all necessary information with this one DNS lookup, because DNS lookups are time-consuming -- all the more so if they involve a DNSSEC signature-checking treewalk by the name server -- and we've got to hurry. While it is unusual for a reverse lookup to yield records other than PTR records (or possibly CNAME records, for RFC 2317 classless delegation), there's no reason why it can't. (For purposes like logging, a reverse lookup is usually followed by a forward lookup, to verify that the reverse lookup wasn't lying about the host name. For our purposes, this is not vital, since we use stronger authentication methods anyway.)

While we want to get as much data as possible (ideally all of it) from one lookup, it is useful to first consider how the necessary information would be obtained if DNS lookups were instantaneous.
Two pieces of information are absolutely vital at this point: the IP address of the other end's security gateway, and the SG's public key*. (* Actually, knowledge of the key can be postponed slightly -- it's not needed until the second exchange of the negotiations, while we can't even start negotiations without knowing the IP address. The SG is not necessarily on the plain-IP route to the destination, especially when multiple SGs are present.)

Given instantaneous DNS lookups, we would:

+ Start with a reverse lookup to turn the address into a name.

+ Look for something like RFC-2782 SRV records using the name, to find out who provides this particular service. If none comes back, we can abandon the whole process.

+ Select one SRV record, which gives us the name of a target host (plus possibly one or more addresses, if the name server has supplied address records as Additional Data for the SRV records -- this is recommended behavior but is not required).

+ Use the target name to look up a suitable KEY record, and also address record(s) if they are still needed.

This gives us the desired address(es) and key. However, it requires three lookups, and we don't even find out whether there's any point in trying until after the second. With real DNS lookups, which are far from instantaneous, some optimization is needed. At the very least, typical cases should need fewer lookups.

So when we do the reverse lookup on the IP address, instead of asking for PTR, we ask for TXT. If we get none, we abandon opportunistic negotiation, and set up a bypass/block with a relatively long life (say 6hr), because it's not worth trying again soon. (Note, there needs to be a way to manually force an early retry -- say, by just clearing out all memory of a particular address -- to cover cases where a configuration error is discovered and fixed.)

xxx need to discuss multi-string TXTs

In the results, we look for at least one TXT record with content "X-IPsec-Server(nnn)=a.b.c.d kkk", following RFC 1464 attribute/value notation. (The "X-" indicates that this is tentative and experimental; this design will probably need modification after initial experiments.) Again, if there is no such record, we abandon opportunistic negotiation.

"nnn" and the parentheses surrounding it are optional. If present, it specifies a priority (a low number means high priority), as for MX records, to control the order in which multiple servers are tried. If there are no priorities, or there are ties, pick one randomly.

"a.b.c.d" is the dotted-decimal IP address of the SG. (Suitable extensions for IPv6, when the time comes, are straightforward.)

"kkk" is either an RSA-MD5 public key in base-64 notation, as in the text form of an RFC 2535 KEY record, or "@hhh". In the latter case, hhh is a DNS name, under which one Host/Authentication/IPSEC/RSA-MD5 KEY record is present, giving the server's authentication key. (The delay of the extra lookup is undesirable, but practical issues of key management may make it advisable not to duplicate the key itself in DNS entries for many clients.)

It unfortunately does appear that the authentication key has to be associated with the server, not the client behind it. At the time when the responder has to authenticate our SG, it does not know which of its clients we are interested in (i.e., which key to use), and there is no good way to tell it. (There are some bad ways; this decision may merit re-examination after experimental use.)
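
(For concreteness, a minimal parser for that TXT payload might look like the following C sketch. The struct, buffer sizes, and function name are our own illustration; error handling is minimal, and the multi-string-TXT question flagged above is ignored here.)

    #include <stdio.h>
    #include <arpa/inet.h>

    struct sg_hint {
        int            priority;   /* MX-style: a low number means high priority */
        struct in_addr gateway;    /* "a.b.c.d": the SG to negotiate with */
        char           key[2048];  /* "kkk": base-64 RSA key, or "@hhh" indirection */
    };

    /* Parse one TXT string of the form "X-IPsec-Server(nnn)=a.b.c.d kkk".
     * Returns 0 on success, -1 if this TXT record is not one of ours. */
    static int parse_ipsec_server_txt(const char *txt, struct sg_hint *out)
    {
        char addr[64];

        out->priority = 0;         /* "nnn" and its parentheses are optional */
        if (sscanf(txt, "X-IPsec-Server(%d)=%63s %2047s",
                   &out->priority, addr, out->key) != 3 &&
            sscanf(txt, "X-IPsec-Server=%63s %2047s", addr, out->key) != 2)
            return -1;
        if (inet_aton(addr, &out->gateway) == 0)
            return -1;             /* gateway must be dotted-decimal (IPv4, for now) */
        return 0;                  /* a key starting with '@' names a KEY record to fetch */
    }
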
The responder authenticates our SG by doing a reverse lookup on its IP address to get a Host/Authentication/IPSEC/RSA-MD5 KEY record. He can attempt this in parallel with the early parts of the negotiation (since he knows our SG IP address from the first negotiation packet), at the risk of having to abandon the attempt and do a different lookup if we use something different as our ID (see below). Unfortunately, he doesn't yet know what client we will claim to represent, so he'll need to do another lookup as part of phase 2 negotiation (unless the client *is* our SG), to confirm that the client has a TXT X-IPsec-Server record pointing to our SG. (Checking that the record specifies the same key is not important, since the responder already has a trustworthy key for our SG.)

Also unfortunately, opportunistic tunnels can only have degenerate subnets (/32 subnets, containing one host) at their ends. It's superficially attractive to negotiate broader connections... but without prearrangement, you don't know whether you can trust the other end's claim to have a specific subnet behind it. Fixing this would require a way to do a reverse lookup on the *subnet* (you cannot trust information in DNS records for a name or a single address, which may be controlled by people who do not control the whole subnet) with both the address and the mask included in the name. Except in the special case of a subnet masked on a byte boundary (in which case RFC 1035's convention of an incomplete in-addr.arpa name could be used), this would need extensions to the reverse-map name space, which is awkward, especially in the presence of RFC 2317 delegation. (IPv6 delegation is more flexible and it might be easier there.)

There is a question of what ID should be used in later steps of negotiation. However, the desire not to put more DNS lookups in the critical path suggests avoiding the extra complication of varied IDs, except in the Road Warrior case (where an extra lookup is inevitable). Also, figuring out what such IDs *mean* gets messy. To keep things simple, except in the RW case, all IDs should be IP addresses identical to those used in the packet headers.

For Road Warrior, the RW must be the initiator, since the home-base SG has no idea what address the RW will appear at. Moreover, in general the RW does not control the DNS entries for his address. This inherently denies the home base any authentication of the RW's IP address; the most it can do is to verify an identity he provides, and perhaps decide whether it wishes to talk to someone with that identity, but this does not verify his right to use that IP address -- nothing can, really. (That may sound like it would permit some man-in-the-middle attacks, but the RW can still do full authentication of the home base, so a man in the middle cannot successfully impersonate home base. Furthermore, a man in the middle must impersonate both sides for the DH exchange to work. So either way, the IKE negotiation falls apart.)

A Road Warrior provides an FQDN ID, used for a forward lookup to obtain a Host/Authentication/IPSEC/RSA-MD5 KEY record. (Note, an FQDN need not actually correspond to a host -- e.g., the DNS data for it need not include an A record.) This suffices, since the RW is the initiator and the responder knows his address from his first packet.

Certain situations where a host has a more-or-less permanent IP address, but does not control its DNS entries, must be treated essentially like Road Warrior.
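
(As an illustrative sketch of the ID-to-key-lookup mapping just described -- the function and names are our own, not Pluto's: an IP-address ID is looked up in the reverse map, while an FQDN ID, as a Road Warrior supplies, is looked up directly.)

    #include <stdio.h>
    #include <arpa/inet.h>

    /* Build the DNS name under which we expect the peer's
     * Host/Authentication/IPSEC/RSA-MD5 KEY record. */
    static void key_lookup_name(const char *id, char *buf, size_t len)
    {
        struct in_addr a;

        if (inet_aton(id, &a)) {
            /* Normal case: the ID is the SG's IP address; use the reverse map. */
            unsigned long ip = ntohl(a.s_addr);

            snprintf(buf, len, "%lu.%lu.%lu.%lu.in-addr.arpa.",
                     ip & 0xff, (ip >> 8) & 0xff, (ip >> 16) & 0xff, (ip >> 24) & 0xff);
        } else {
            /* Road Warrior (or similar): an FQDN ID is used as-is for a forward lookup. */
            snprintf(buf, len, "%s.", id);
        }
    }

    /* For phase 2, the responder additionally re-checks the *client's* reverse map
     * for a TXT X-IPsec-Server record naming the peer SG (unless client == SG). */
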
It is unfortunate that DNS's old inverse-query feature cannot be used (nonrecursively) to ask the initiator's local DNS server whether it has a name for the address, because the address will almost always have been obtained from a DNS name lookup, and it might be a lookup of a name whose DNS entries the host *does* control. (Real examples of this exist: the host has a preferred name whose host-controlled entry includes an A record, but a reverse lookup on the address sends you to an ISP-controlled name whose entry has an A record but not much else.) Alas, inverse query is long obsolete and is not widely implemented now.

There are some questions in failure cases. If we cannot acquire the info needed to set up a tunnel, this is the no-tunnel-possible case. If we reach an SG but negotiation fails, this too is the no-tunnel-possible case, with a relatively long bypass/block lifespan (say 1hr), since fruitless negotiations are expensive. (In the multiple-SG case, it seems unlikely to be worthwhile to try other SGs just in case one of them might have a configuration permitting successful negotiation.)

Finally, there is a sticky problem with timeouts. If the other SG is down or otherwise inaccessible, in the worst case we won't hear about this except by not getting responses. Some other, more pathological or even evil, failure cases can have the same result. The problem is that in the case where a bypass is permitted, we want to decide whether a tunnel is possible quickly. It gets even worse if there are multiple SGs, in which case conceivably we might want to try them all (since some SGs being up when others are down is much more likely than SGs differing in policy). The patience setting needs to be configurable policy, with a reasonable default (to be determined by experiment). If it expires, we simply have to declare the attempt a failure, and set up a bypass/block. (Setting up a tentative bypass/block, and replacing it with a real tunnel if remaining attempts do produce one, looks attractive at first glance... but exposing the first few seconds of a connection is often almost as bad as exposing the whole thing!) Such a bypass/block should have a short lifespan, say 10min, because the SG(s) might be only temporarily unavailable.

The flip side of IKE waiting for a timeout is that all other forms of feedback, e.g. "host not reachable", should be *ignored*, because you cannot trust them! This may need kernel changes.

Can AWP be done by non-opportunistic SGs? Probably not; existing SG implementations generally aren't prepared to do anything suitable, except perhaps via the messy business of certificates. There is one borderline exception: some implementations rely on LDAP for at least some of their information fetching, and it might be possible to substitute a custom LDAP server which does the right things for them. Feasibility of this depends on details, which we don't know well enough.

[This could do with a full example, a complete packet by packet walkthrough including all DNS and IKE traffic.]
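
(To pull the failure handling above together in one illustrative fragment: the enum and the default lifespans below are simply the numbers suggested in the text, and would presumably be configurable policy rather than hard-wired.)

    /* Why opportunistic setup failed, and how long to remember it. */
    enum opp_failure {
        FAIL_NO_TXT,        /* no X-IPsec-Server TXT record for the destination */
        FAIL_NEGOTIATION,   /* reached an SG, but negotiation failed */
        FAIL_TIMEOUT        /* SG down/unreachable; patience ran out */
    };

    static long bypass_block_lifespan(enum opp_failure why)
    {
        switch (why) {
        case FAIL_NO_TXT:       return 6*60*60; /* 6hr: not worth trying again soon */
        case FAIL_NEGOTIATION:  return 1*60*60; /* 1hr: fruitless negotiations are expensive */
        case FAIL_TIMEOUT:      return 10*60;   /* 10min: the SG(s) may be back shortly */
        }
        return 10*60;
    }
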
MFP

Our current conn database simply isn't flexible enough to cover all this properly. In particular, the responding Pluto needs a way to figure out whether the connection it is being asked to make is legitimate. This is more subtle than it sounds, given the problem noted earlier, that there's no clear way to authenticate claims to represent a non-degenerate subnet.

Our database has to be able to say "a connection to any host in this subnet is okay" or "a connection to any subnet within this subnet is okay", rather than "a connection to exactly this subnet is okay". (There is some analogy to the Road Warrior case here, which may be relevant.) This will require at least a re-interpretation of ipsec.conf.

Interim stages of implementation of this will require a bit of thought. Notably, we need some way of dealing with the lack of fully signed DNSSEC records. Without user interaction, probably the best we can do is to remember the results of old fetches, compare them to the results of new fetches, and complain and disbelieve all of it if there's a mismatch. This does mean that somebody who gets fake data into our very first fetch will fool us, at least for a while, but that seems an acceptable tradeoff.


Negotiation Issues

There are various options which are nominally open to negotiation as part of setup, but which have to be nailed down at least well enough that opportunistic SGs can reliably interoperate. Somewhat arbitrarily and tentatively, opportunistic SGs must support Main Mode, Oakley group 5 for D-H, 3DES encryption and MD5 authentication for both ISAKMP and IPsec SAs, RSA digital-signature authentication with keys between 2048 and 8192 bits, and ESP doing both encryption and authentication. They must do key PFS in Quick Mode, but not identity PFS.


What we need from DNS

Fortunately, we don't need any new record types or suchlike to make this all work. We do, however, need attention to a couple of areas in DNS implementation.

First, size limits. Although the information we directly need from a lookup is not enormous -- the only potentially-big item is the KEY record, and there should be only one of those -- there is still a problem with DNSSEC authentication signatures. With a 2048-bit key and assorted supporting information, we will fill most of a 512-byte DNS UDP packet... and if the data is to have DNSSEC authentication, at least one quite large SIG record will come too. Plus maybe a TSIG signature on the whole response, to authenticate it to our resolver. So: DNSSEC-capable name servers must fix the 512-byte UDP limit. We're told there are provisions for this; implementation of them is mandatory.

Second, interface. It is unclear how the resolver interface will let us ask for DNSSEC authentication. We would prefer to ask for "authentication where possible", and get back the data with each item flagged by whether authentication was available (and successful!) or not available. Having to ask separately for authenticated and non-authenticated data would probably be acceptable, *provided* both will be cached on the first request, so the two requests incur only one set of (non-local) network traffic. Either way, we want to see the name server and resolver do this for us; that makes sense in any case, since it's important that verification be done somewhere where it can be cached, the more centrally the better.

Finally, a wistful note: the ability to do a limited form of inverse queries (an almost forgotten feature), to ask the local name server which hostname it recently mapped to a particular address, would be quite helpful. Note, this is *NOT* the same as a reverse lookup, and crude fakes like putting a dotted-decimal address in brackets do not suffice.
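
(To make that interface wish concrete, here is one hypothetical shape such a resolver call could take. This is purely illustrative -- nothing like it exists in the standard resolver today -- but it shows the property we care about: one request, with each returned item individually flagged by its authentication status.)

    #include <stddef.h>

    enum dnssec_status {
        DNSSEC_VERIFIED,     /* signature present and checked successfully */
        DNSSEC_UNAVAILABLE,  /* zone unsigned, or no signature supplied */
        DNSSEC_FAILED        /* signature present but did not verify */
    };

    struct rr_result {
        int                  rr_type;  /* TXT, KEY, ... */
        enum dnssec_status   auth;     /* per-item "authentication where possible" flag */
        const unsigned char *rdata;
        size_t               rdlen;
    };

    /* One call, one set of (cacheable) network traffic; verification is done by
     * the name server / resolver, where the result can be cached centrally. */
    int lookup_auth_where_possible(const char *name, int rr_type,
                                   struct rr_result *results, size_t max_results);
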