It may still be possible to audit other STHs, and to scan new entries
up to the latest verified STH. This allows Cert Spotter to continue
to make forward progress even if a log is persistently skewed (as the
DigiCert has been lately).
Also, rework some code to be simpler and less redundant.
Include the tree size in plain decimal, since it's more user-friendly.
Don't include tree size in hash (redundant now that we're storing it
outside of hash) or version (implied by signature).
1. Instead of storing a single STH per log, we now store one verified
STH and any number of unverified STHs. When we process a log, we verify
each unverified STH using a consistency proof with the verified STH,
and only delete it if it successfully verifies. We set the verified
STH to the largest STH which we've successfully verified.
This has two important benefits. First, we never ever delete an STH
unless we can successfully verify it (previously, we would forget about
an STH under certain error conditions). Second, it lays the groundwork
for STH pollination. Upon reception of an STH, we can simply drop it in
the log's unverified_sths directory (assuming the signature is valid),
and Cert Spotter will audit it.
There is no more "evidence" directory; if a consistency proof fails,
the STHs will already be present elsewhere in the state directory.
2. We now persist a MerkleTreeBuilder between each run of Cert Spotter,
instead of rebuilding it every time from the consistency proof. This is
not intrinsically better, but it makes the code simpler considering we
can now fetch numerous consistency proofs per run.
3. To accommodate the above changes, the state directory has a brand
new layout. The state directory is now versioned, and Cert Spotter
will automatically migrate old state directories to the new layout.
This migration logic will be removed in a future Cert Spotter release.
As a bonus, the code is generally cleaner now :-)
If -all_time is specified, scan the entirety of all logs, even
existing logs. This matches user expectation better. Previously,
-all_time had no impact on existing logs.
The first time Cert Spotter is run, do not scan any logs, unless
-all_time is specified. This avoids a several hour wait the first
time Cert Spotter is run. If the user is interested in knowing
about existing certificates, they can use the certspotter.com API
or crt.sh. This is the same as existing behavior.
When a new log is added, scan it in its entirety even if -all_time is
not specified, so users are alerted to interesting certificates in the
new log. Hopefully new logs will be small and this won't take too long!
Previously, new logs were not scanned in their entirety unless -all_time
was specified.
Closes: #5
Watchlist is now read from ~/.certspotter/watchlist by default, or from
the file specified by -watchlist (- for stdin).
By default, only exact DNS names are matched. To match both the domain
itself and all sub-domains, prefix with a dot (e.g. .example.com).
Comments are now allowed in watchlist files.
e.g. contains control characters, Punycode conversion fails
There are quite simply too many certs with bogus DNS labels out in the wild,
and it just doesn't make sense to bother every .com domain holder because
GoDaddy signed a cert with a DNS name like "www. just4funpartyrentals.com"
It is highly unlikely any validator will ever match that DNS name.