Running ancient 1990 BIND 4 on the modern Internet
The following are observations and notes from running named 4.8.3 (1990), from 4.3BSD-Reno ported to 386BSD, on NetBSD as both a caching resolver and an authoritative server. We often see out-of-date nameservers in our research; they are usually first noticed because they fail EDNS0 queries, act as open resolvers, or serve unrelated (out-of-bailiwick) names.
Curious how a really old nameserver would behave in today's Internet? It should work. (The experimenter had spent the last 15+ years exhaustively testing various DNS servers including BIND 8, writing DNS code, and authoring DNS books and documentation.) Several RFCs that extend DNS specifically discuss backward compatibility with existing resolver and server implementations. The goal was not to troubleshoot bugs or write much code, but to see how it would behave in today's real-world DNS. This also did not test the known, documented problems with old BIND 4, such as those described in RFC 1912.
This used BIND named 4.8.3 from 4.3BSD-Reno (June 1990) that was ported to the 32-bit i386 386BSD 0.0.0. The code diffs between them were quickly reviewed and appeared to be only portability fixes. See the 4.3BSD-Reno named code at TUHS. Over 120 DNS RFCs have been published since then. For perspective, only around 15 DNS-related RFCs had been published before then. But at least the standard specification, RFC 1035, had been out for a few years.
Instead of further porting to a modern compiler and Unix system, used an existing binary (a.out little-endian 32-bit demand paged pure executable).
named 4.8.3 Wed Feb 26 02:54:26 GMT 1992 root@odysseus.TeleMuse.COM:/usr/src/usr.sbin/named
And instead of running 386BSD in an emulator, ran on NetBSD/amd64 8.2_STABLE using a custom kernel with these options:
options EXEC_AOUT # required by binaries from before 1.5
options COMPAT_NOMID # NetBSD 0.8, 386BSD, and BSDI
After rebooting, the kernel had a built-in module for the "ancient" a.out(5) executable binary format, with support for detecting different binary loading conventions (like ZMAGIC) for executables without a machine ID.
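To make "executables without a machine ID" concrete, here is a minimal sketch of classifying the first word of an a.out header. It assumes the 386BSD-era layout where `a_magic` is a little-endian 32-bit word holding the magic number in the low 16 bits and a machine ID of 0 above it; it does not handle NetBSD's own network-byte-order `a_midmag` encoding. The function and dictionary names are hypothetical, not NetBSD kernel code.

```python
import struct
from typing import Tuple

# Classic a.out magic numbers (octal), as in <a.out.h>.
OMAGIC = 0o407  # old impure format
NMAGIC = 0o410  # read-only (pure) text
ZMAGIC = 0o413  # demand-paged format
QMAGIC = 0o314  # compact demand-paged format (later BSDs)

MAGIC_NAMES = {OMAGIC: "OMAGIC", NMAGIC: "NMAGIC", ZMAGIC: "ZMAGIC", QMAGIC: "QMAGIC"}


def classify_aout(header: bytes) -> Tuple[str, int]:
    """Return (magic name, machine ID) for the first word of an a.out header.

    Assumption: a 386BSD binary stores the word little-endian with the
    magic in the low 16 bits and no machine ID (mid == 0), the case that
    COMPAT_NOMID lets the NetBSD kernel accept.
    """
    if len(header) < 4:
        raise ValueError("header too short")
    (word,) = struct.unpack_from("<I", header, 0)
    magic = word & 0xFFFF
    mid = (word >> 16) & 0x3FF  # 10-bit machine ID field
    return MAGIC_NAMES.get(magic, "unknown"), mid
```

A "demand paged" binary like the named above would be expected to classify as `("ZMAGIC", 0)`.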
Then had to disable NetBSD's security restriction on mapping virtual address 0 by running:
$ sudo sysctl -w vm.user_va0_disable=0
Then could run 386BSD 0.0.0 binaries. named logged to the syslog daemon facility, and its logs appeared in /var/log/messages. Debugging was enabled (and its level increased) by sending USR1 signals to it, with the output logged to /var/tmp/named.run.
A copy of the 386BSD /etc/namedb/named.boot configuration file was made, with the sortlist option commented out and some extra primary zone files enabled for testing. Ran it as root; it was originally run as a less privileged user on an alternate port, but that port was also used for outgoing queries. It logged one error at startup, which was ignored since the system appeared to work anyway, and it got periodic recvfrom errors (even after doubling net.inet.udp.recvspace):
Jul 17 11:03:01 t1 named: Return from getdtablesize() > FD_SETSIZE
Jul 17 13:19:33 t1 named: recvfrom: No buffer space available
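The startup message compares the descriptor table size against FD_SETSIZE, because select(2) can only watch descriptors numbered below FD_SETSIZE. On BSD, getdtablesize() reports the soft RLIMIT_NOFILE. The sketch below mirrors that condition from Python; the value 256 is an assumption (the 4.4BSD-derived default), not something read from the 386BSD headers, and the helper name is hypothetical.

```python
import resource

# Assumption: the named binary was compiled with FD_SETSIZE == 256;
# check the actual <sys/types.h> used to build it.
ASSUMED_FD_SETSIZE = 256


def fd_limit_exceeds_select(soft_limit: int, fd_setsize: int = ASSUMED_FD_SETSIZE) -> bool:
    """Mirror the check behind "Return from getdtablesize() > FD_SETSIZE"."""
    return soft_limit > fd_setsize


# getdtablesize() on BSD returns the soft limit of RLIMIT_NOFILE.
soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft RLIMIT_NOFILE={soft}, exceeds FD_SETSIZE: {fd_limit_exceeds_select(soft)}")
```

If the check fires, descriptors at or above FD_SETSIZE could not be used with select(), which is harmless as long as named never opens that many sockets.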
It was listening on both UDP and TCP port 53, as expected. It only knows about IPv4 (of course).
$ sockstat -n | grep 53
root     named      4857    4 tcp    *.53                  *.*
root     named      4857    5 udp    192.168.1.2.53        *.*
root     named      4857    6 udp    127.0.0.1.53          *.*
root     named      4857    7 udp    *.53                  *.*
Running the 1990 Resolver
The existing June 1990 initial cache data (hints) for the root domain servers still works, because one 30+-year-old entry is still valid. Originally at Rensselaer Polytechnic Institute, it was acquired and run by PSINet (the first commercial ISP) by 1990, and it is now owned by Cogent and known as C.ROOT-SERVERS.NET (renamed in 1995).
. IN NS C.NYSER.NET.
C.NYSER.NET. IN A 192.33.4.12
Standard gethostbyname(3) style lookups worked, such as with the ping and host(1) commands or with 386BSD's BIND nsquery tool. A dig query against it immediately failed with a status of FORMERR (see RFC 6891) and a friendly message:
;; WARNING: EDNS query returned status FORMERR - retry with '+noedns'
A variety of dig queries worked using the +noedns command-line option. Viewing named's cache is done by sending a SIGINT to the process and looking at /var/tmp/named_dump.db. The debugging output and tcpdump were also used to watch its behavior and the packet traffic.
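The difference between a query that named 4.8.3 answers and one it rejects with FORMERR is small on the wire: an EDNS query carries one extra OPT pseudo-RR in the Additional Section (ARCOUNT 1), while a +noedns query is a plain RFC 1035 message (ARCOUNT 0). This minimal sketch builds both; the helper names are hypothetical and the payload size passed in is just an example value.

```python
import struct


def encode_name(name: str) -> bytes:
    """Encode a domain name as uncompressed DNS labels (RFC 1035, 3.1)."""
    out = b""
    for label in name.rstrip(".").split("."):
        if label:
            out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(qid, name, qtype, edns_payload=None):
    """Build a recursive query; add an OPT RR only if edns_payload is given."""
    flags = 0x0100  # RD (recursion desired)
    arcount = 1 if edns_payload is not None else 0
    header = struct.pack("!HHHHHH", qid, flags, 1, 0, 0, arcount)
    question = encode_name(name) + struct.pack("!HH", qtype, 1)  # class IN
    msg = header + question
    if edns_payload is not None:
        # OPT pseudo-RR (RFC 6891): root owner, TYPE 41, CLASS = UDP payload
        # size, TTL = extended RCODE/version/flags (all zero), empty RDATA.
        msg += b"\x00" + struct.pack("!HHIH", 41, edns_payload, 0, 0)
    return msg


plain = build_query(0x1234, "example.net", 1)        # like dig +noedns
edns = build_query(0x1234, "example.net", 1, 4096)   # like a default dig query
```

The old named sees the non-zero ARCOUNT with a record type it does not know and answers FORMERR, which is the cue dig uses to suggest +noedns.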
Then made sure the NetBSD system used it (/etc/resolv.conf), having already stopped unbound:
$ sudo /etc/rc.d/unbound stop
And configured a busy Ubuntu Linux workstation to use it too:
$ sudo resolvectl dns enp1s0 192.168.1.2
Some queries for A, AAAA, and MX records for the top 20 domains were done, but twitter.com, linkedin.com, and amazon.com returned SERVFAIL. Reviewing packet traces and the named debugging output showed that it sent back the response (SERVFAIL) before it had finished getting addresses for the nameservers. This happened when following delegations (Authority Section NS records) three or more levels deep, especially when there were no Additional Section glue address records. But shortly afterward, failing queries like these started working (since the addresses for the nameservers had been received by then).
Also did not see it retry over TCP when it got a truncated (TC) response (also noticed via packet capture). Never saw any packets with a UDP payload over 512 bytes. It did handle incoming TCP, though.
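A resolver that follows RFC 1035 is expected to notice the TC bit in a UDP reply (which, without EDNS, is capped at 512 bytes) and repeat the query over TCP, where each message is prefixed with a two-byte length. The sketch below shows just those two pieces; the function names are hypothetical and no sockets are opened.

```python
import struct

TC_BIT = 0x0200  # truncation flag in the header flags word (RFC 1035, 4.1.1)


def is_truncated(response: bytes) -> bool:
    """Return True if a DNS response received over UDP has TC set."""
    if len(response) < 12:
        raise ValueError("response shorter than a DNS header")
    (flags,) = struct.unpack_from("!H", response, 2)
    return bool(flags & TC_BIT)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix a DNS message with the two-byte length used over TCP (RFC 1035, 4.2.2)."""
    return struct.pack("!H", len(message)) + message
```

The old named received truncated replies but never took the second step of reframing the query for TCP.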
Unknown record types, like AAAA and DNSKEY (which didn't exist back then), could be looked up, but they did not appear in the cache dump (/var/tmp/named_dump.db). Their TTLs also did not decrement in later results, indicating they were not cached. The debugging output appeared to show it attempting to cache them (and code review showed it didn't seem to fail, even when outputting them as "unknown").
doupdate: dname ns.netbsd.org type 28 class 1 ttl 86400
unknown type 28
...
QUESTIONS:
	ftp.NetBSD.org, type = 28, class = IN
An example of a lookup problem was:
$ dig +tries=1 incoming.telemetry.mozilla.org AAAA +noedns
;; Question section mismatch: got pipeline-incoming-prod-elb-149169523.us-west-2.elb.amazonaws.com/AAAA/IN

; <<>> DiG 9.10.5-P1 <<>> +tries=1 incoming.telemetry.mozilla.org AAAA +noedns
;; global options: +cmd
;; connection timed out; no servers could be reached
The debugging showed the sent Question Section was:
QUESTIONS: pipeline-incoming-prod-elb-149169523.us-west-2.elb.amazonaws.com, type = 28, class = IN
This bug was caused by two CNAME lookups that replaced the original query name. Also saw this for wiki.mozilla.org, www.usenix.org, and others.
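dig rejects a reply whose Question Section differs from the question it sent, so the rewritten name makes the answer look like it belongs to some other query, and dig eventually times out. A minimal sketch of that comparison, assuming the question is the first name in the message (offset 12, so it cannot use a compression pointer); the function names are hypothetical and this is not dig's source.

```python
import struct


def read_question(msg: bytes):
    """Return (qname, qtype, qclass) of the first question entry."""
    labels = []
    pos = 12  # the question immediately follows the 12-byte header
    while True:
        length = msg[pos]
        pos += 1
        if length == 0:
            break
        if length & 0xC0:
            raise ValueError("unexpected compression pointer in question")
        labels.append(msg[pos:pos + length].decode("ascii"))
        pos += length
    qtype, qclass = struct.unpack_from("!HH", msg, pos)
    return ".".join(labels) + ".", qtype, qclass


def question_matches(query: bytes, response: bytes) -> bool:
    """Compare question sections; DNS names compare case-insensitively."""
    qn, qt, qc = read_question(query)
    rn, rt, rc = read_question(response)
    return (qn.lower(), qt, qc) == (rn.lower(), rt, rc)
```

With the old named, the response question is the last CNAME target instead of incoming.telemetry.mozilla.org, so this check fails.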
Some outside nameservers were sending back EDNS0 OPT pseudo-RRs (in the Additional Section) even though this ancient named cannot use them and did not request them:
type = 41, class = 4096, ttl = 0 secs, dlen = 0
See RFC 6891 from 2013. In the OPT RR, the type is 41, the class holds the requestor's UDP payload size, the TTL holds the extended RCODE, version, and flags, and dlen is the total length of the RDATA. These servers should not return this OPT pseudo-RR: per the specification, when a request has no OPT record, "the responder MUST NOT include an OPT record in its response", and OPT RRs "MUST NOT be cached, forwarded, or stored". While none were seen in the /var/tmp/named_dump.db cache dump, debugging messages showed attempts to update the internal cache with them.
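The field reuse described above can be decoded mechanically. This sketch unpacks the fixed part of an OPT RR (assuming it starts with the root owner name, a single zero byte) and expresses the RFC 6891 rule that a response must not carry OPT when the request did not. The function names are hypothetical.

```python
import struct

OPT_TYPE = 41


def decode_opt(rr: bytes) -> dict:
    """Decode the fixed fields of an OPT pseudo-RR (RFC 6891, 6.1)."""
    if rr[0] != 0:
        raise ValueError("OPT owner name must be the root")
    rtype, rclass, ttl, rdlen = struct.unpack_from("!HHIH", rr, 1)
    if rtype != OPT_TYPE:
        raise ValueError("not an OPT record")
    return {
        "udp_payload_size": rclass,              # CLASS field
        "extended_rcode": (ttl >> 24) & 0xFF,    # upper 8 bits of the RCODE
        "version": (ttl >> 16) & 0xFF,           # EDNS version
        "do_bit": bool(ttl & 0x8000),            # DNSSEC OK flag
        "rdlen": rdlen,                          # length of all options
    }


def response_violates_rfc6891(query_had_opt: bool, response_has_opt: bool) -> bool:
    """An OPT RR in a response is only allowed if the request had one."""
    return response_has_opt and not query_had_opt
```

The record logged above (type 41, class 4096, ttl 0, dlen 0) decodes as a 4096-byte payload size, version 0, DO clear, and no options.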
After a couple of days, the named process crashed a few times. (No attempt was made to get the NetBSD or 386BSD gdb to read the a.out executable and the core dump generated on the NetBSD system.) The problems were likely due to receiving unknown OPT records, and maybe other unexpected responses, which caused named to overflow or take wrong code paths. Its debugging output started showing garbage such as:
NAME SERVERS:
	<84>
	type = 49218, class = 2, ttl = 18 hours 12 mins 18 secs, dlen = 33174
	???
	<94>^A
	type = 163, class = 62694, ttl = 194 days 13 hours 42 mins 28 secs, dlen = 2181
	???
	...
	.
	type = 512, class = 256, ttl = 512 days, dlen = 4356
	???
	...
	.
	type = 256, class = IN, ttl = 20194 days 21 hours 3 mins 4 secs, dlen = 28982
	???
	.
	type = 36229, class = 248, ttl = 49709 days 18 hours 1 min 12 secs, dlen = 64476
	???
ADDITIONAL RECORDS:
	.
	type = 33732, class = 3263, ttl = 8712 days 8 hours 19 mins 12 secs, dlen = 33596
	???
The crashes could not be reproduced by repeating some of the last queries seen in the debugging output. Perhaps doing the same queries in the same order would reproduce the failures, assuming the delegated nameservers found along the way also behaved the same and answered in the same order.
Using the 1990 Authoritative Server
A single named process, then (and today), runs both the caching recursive service and authoritative publishing. The named.boot file has few configuration options and is simple to set up. The zone files use basically the same syntax as today.
The provided localhost.rev zone file has a dot in the SOA serial number. Serial 1.4 turned into 10004. That is documented in RFC 1912 (2.2).
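The serial conversion can be illustrated with a small, hypothetical re-creation. It assumes a dotted serial "A.B" is read as A × 10000 + B, which matches the observed 1.4 becoming 10004; it is not BIND 4's parser and does not try to handle more than one dot or fractional parts longer than four digits.

```python
def bind4_dotted_serial(text: str) -> int:
    """Hypothetical model of how BIND 4 read a dotted SOA serial.

    Assumption: "A.B" is interpreted as A * 10000 + B, consistent with
    the observed 1.4 -> 10004. Plain integers pass through unchanged.
    """
    whole, _, frac = text.partition(".")
    if not frac:
        return int(whole)
    return int(whole) * 10000 + int(frac)
```

This is why a dotted serial can jump unexpectedly when later rewritten as a plain integer, and why RFC 1912 recommends avoiding dots in serials.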
This wasn't fully troubleshot, but the entire SOA record could not be written on a single line unless the numeric fields were wrapped in one set of ( ) parentheses.
Trying to load a zone with newer record types didn't fail; it just skipped over the problem records. So even if some entries don't load, the rest of the zone file still loads (the other, working records could be changed, and their new data was served).
Line 5: Unknown type: AAAA.
odt.example.net: line 5: database format error ('AAAA', 19)
Line 7: Unknown type: DNSKEY.
odt.example.net: line 7: database format error ('DNSKEY', 19)
d='www.odt.example.net', c=1, t=1, ttl=0, data='192.168.1.2'
db_update(www.odt.example.net, 0xdf600, 0xdf600, 01, 0x146600)
db_update: adding df600
d='www2.odt.example.net', c=1, t=5, ttl=0, data='t1.m.example.net.'
db_update(www2.odt.example.net, 0xdf540, 0xdf540, 01, 0x146600)
db_update: adding df540
zone type 1: 'odt.example.net' z_time 0, z_refresh 0
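The skip-and-continue behavior can be modeled with a toy loader that knows only the record types defined in RFC 1035 and reports anything else, roughly imitating the "Unknown type" messages above. It is a hypothetical sketch: each line is assumed to be a simple "owner type rdata" triple with no TTL, class, or parenthesized continuation, unlike real zone files.

```python
# Record types defined in RFC 1035 (section 3.2.2); anything newer is
# "unknown" to a 1990 server.
RFC1035_TYPES = {"A", "NS", "MD", "MF", "CNAME", "SOA", "MB", "MG", "MR",
                 "NULL", "WKS", "PTR", "HINFO", "MINFO", "MX", "TXT"}


def load_records(lines):
    """Hypothetical loader: keep known records, report and skip the rest."""
    records, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue  # blank or partial line; ignored in this sketch
        owner, rtype, rdata = fields
        if rtype.upper() not in RFC1035_TYPES:
            errors.append(f"Line {lineno}: Unknown type: {rtype}.")
            continue  # skip only this record; keep loading the zone
        records.append((owner, rtype.upper(), rdata))
    return records, errors
```

The design choice mirrors what was observed: a bad record costs only that record, so the rest of the zone keeps being served.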
Running this named as a secondary (which uses the named-xfer process for incoming zone transfers) was not tried, but manual AXFR requests over TCP against the server were done. It sent back the starting and ending SOAs with the records in between, as expected. But dig reported two warnings:
;; WARNING: ID mismatch: expected ID 49336, got 0
...
;; Warning: query response not set
It appears the initial query transaction ID and the QR flag were not sent back in the AXFR response.
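Both warnings come from the first four bytes of the response header: the 16-bit ID must echo the query's ID, and the QR bit must be set in any response. This sketch reproduces the two checks with dig-like wording; it is an illustration, not dig's implementation, and the function name is hypothetical.

```python
import struct

QR_BIT = 0x8000  # set in responses (RFC 1035, 4.1.1)


def header_warnings(query_id: int, response: bytes) -> list:
    """Return dig-style warnings for a response header."""
    resp_id, flags = struct.unpack_from("!HH", response, 0)
    warnings = []
    if resp_id != query_id:
        warnings.append(f"ID mismatch: expected ID {query_id}, got {resp_id}")
    if not flags & QR_BIT:
        warnings.append("query response not set")
    return warnings
```

An AXFR response with ID 0 and QR clear, like the one observed, triggers both messages.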
Our test suite has around 100 checks for DNS, DNSSEC, and domains over IPv4, IPv6, TCP, and UDP, based on best practices, recommendations, and requirements from registries, IETF RFCs, and government mandates. A domain hosted on this named 4.8.3 server was tested. (The IPv6 tests were not performed; it doesn't know IPv6.)
- D102610 Authoritative test query should not return Recursion Available flag (RA): FAIL. Well, let's ignore this, since this old named has no configuration to separate the resolver from the primary/secondary server in the same process. But this is an example of a result that often indicates a misconfiguration (including actually running an open resolver) or very out-of-date and likely insecure software.
- D106060 Checking Disabled (CD) bit should not be set in response from authoritative server: WARNING. This is ignored too, especially since recent BIND versions still do this (though some other authoritative servers don't copy it back), and since it is DNSSEC related. Normally, the CD bit is copied back in the response by a security-aware validating nameserver.
- D104560 Request using EDNS0 Extension Mechanisms for DNS OPT pseudo-RR does not fail: WARNING. As discussed above, this named doesn't know how to handle EDNS queries, and it likely starts to fail when it wrongly receives EDNS records in responses to its own queries. Again, this most likely indicates an out-of-date and insecure server.
- D101300 SOA record should have corresponding DNSSEC signature: FAIL. Also ignore this. It doesn't even have EDNS! It can look up some DNSSEC-related records, but has no idea about handling or sending combination records or flags.
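The first two checks in the list above come down to single bits in the header flags word: RA (0x0080) and CD (0x0010, one of the bits RFC 1035 left reserved). This hypothetical sketch reports them in the same FAIL/WARNING spirit; only the bit positions come from the RFCs, and it is not our test suite's code.

```python
import struct

RA_BIT = 0x0080  # Recursion Available
CD_BIT = 0x0010  # Checking Disabled (a reserved Z bit in RFC 1035)


def flag_findings(response: bytes) -> list:
    """Report RA and CD bits seen in an authoritative response."""
    (flags,) = struct.unpack_from("!H", response, 2)
    findings = []
    if flags & RA_BIT:
        findings.append("FAIL: RA set in authoritative response")
    if flags & CD_BIT:
        findings.append("WARNING: CD set in authoritative response")
    return findings
```

A combined resolver and authoritative server like named 4.8.3 always sets RA, so the first finding is unavoidable without a separate process.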
This old named was used for over six days; it handled well over 93,000 local queries and performed over 250,000 outgoing queries (that is how many were logged). Overall, the ancient named 4.8.3 mostly worked in the modern DNS world. Periodically some names wouldn't resolve but would work shortly afterward. It crashed a few times. It was missing some features that are often not required. We don't plan to continue using it.
What is the oldest nameserver that still somewhat works in today's DNS? Anyone try the server from 4.3BSD (VAX) in the last 20 years?
CNAME Synthesis for DNAME
It was suggested to test CNAME synthesis for DNAMEs. DNAME records can be used to map child labels to other domains, such as ftp.foo.example.net to ftp.example.org. The external authoritative server answers with the original DNAME and a synthesized CNAME that maps the original query name to a new target name. (This is documented in RFC 6672, and was originally introduced in RFC 2672 in 1999.)
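The synthesis itself is a suffix replacement: the part of the query name below the DNAME owner is kept, and the owner is swapped for the DNAME target. A minimal sketch, assuming absolute names with a trailing dot; the function name is hypothetical, and the owner name itself is deliberately not redirected, as RFC 6672 specifies.

```python
def dname_substitute(qname: str, dname_owner: str, dname_target: str):
    """Return the synthesized CNAME target for qname, or None.

    Only names strictly below the DNAME owner are rewritten. Comparison is
    case-insensitive; the original spelling of the prefix is kept.
    """
    if not qname.lower().endswith("." + dname_owner.lower()):
        return None
    prefix = qname[: len(qname) - len(dname_owner)]  # e.g. "www."
    return prefix + dname_target
```

For example, `dname_substitute("ftp.foo.example.net.", "foo.example.net.", "example.org.")` yields `"ftp.example.org."`.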
named 4.8.3 does not know about DNAME, and as an authoritative server it cannot even be configured to serve type 39 records, let alone process them further.
named 4.8.3's recursion works with the CNAME synthesized for a DNAME. For example, with a DNAME at abc.example.net targeting example.org, an A query for www.abc.example.net sent by the old named got this response (from the debugging output):
HEADER:
	opcode = QUERY, id = 21036, rcode = NOERROR
	header flags: qr aa rd ra
	qdcount = 1, ancount = 2, nscount = 0, arcount = 0

QUESTIONS:
	www.abc.example.net, type = A, class = IN

ANSWERS:
	abc.example.net
	type = 39, class = IN, ttl = 13 hours 53 mins 20 secs, dlen = 12
	???
	www.abc.example.net
	type = CNAME, class = IN, ttl = 13 hours 53 mins 20 secs, dlen = 6
	domain name = www.example.org
The above type 39 record is unhandled, and its target isn't even understood (???). The old named follows the CNAME, though, and returns to its client the CNAME and the A record for the CNAME target, but no DNAME. (It may also return Authority Section and Additional Section records related to the CNAME target.)
HEADER:
	opcode = QUERY, id = 49581, rcode = NOERROR
	header flags: qr rd ra
	qdcount = 1, ancount = 2, nscount = 5, arcount = 3

QUESTIONS:
	www.abc.example.net, type = A, class = IN

ANSWERS:
	www.abc.example.net
	type = CNAME, class = IN, ttl = 13 hours 53 mins 18 secs, dlen = 13
	domain name = www.example.org
	www.example.org
	type = A, class = IN, ttl = 10 mins, dlen = 4
	internet address = 22.214.171.124
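The result seen by the client can be approximated with a toy model: drop answer records of types a 1990 resolver does not know (anything outside RFC 1035's types 1 to 16, so DNAME's type 39 disappears), then follow the CNAME chain from the query name. This is a hypothetical model of the observed behavior, not named's code; records are simple (owner, type, rdata) tuples.

```python
# Numeric types a 1990 resolver understands: RFC 1035 types 1-16.
KNOWN_TYPES = set(range(1, 17))
CNAME = 5


def answer_for_client(qname, qtype, upstream_answers):
    """Drop unknown types, follow the CNAME chain, return the answer records."""
    usable = [rr for rr in upstream_answers if rr[1] in KNOWN_TYPES]
    result, name, seen = [], qname.lower(), set()
    while name not in seen:  # the seen set stops CNAME loops
        seen.add(name)
        matches = [rr for rr in usable if rr[0].lower() == name]
        cnames = [rr for rr in matches if rr[1] == CNAME]
        if cnames and qtype != CNAME:
            result.append(cnames[0])
            name = cnames[0][2].lower()
            continue
        result.extend(rr for rr in matches if rr[1] == qtype)
        break
    return result
```

Fed the DNAME, the synthesized CNAME, and an A record for www.example.org, it returns just the CNAME and the A record, matching the shape of the answer above.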