STS (Secure Time Seeding) uses server time from SSL handshakes, which is fine when talking to other Microsoft servers, but other implementations put random data in that field to prevent fingerprinting.
This bug has wreaked havoc for me. We had a “last synchronized” time stamp persisted to a DB so that the system could deal robustly with server restarts and bootstrapping in new environments.
The synchronization was used to continuously fetch critical incidents and visualize them on a map. The data came through a third-party API that broke down if we asked for too much data at a time, so we had to keep track of when we last fetched data and only ask for updates since then.
Each time the synchronization ran, it would persist an updated time stamp to the DB.
Of course this routine ran just as the server jumped several months into the future for a few minutes. After this, the last-run time stamp was some time next year. Subsequent runs of the synchronization routine never found any updates, as the date range they asked for didn’t really make sense.
It just ran successfully without finding any new issues. We were quite happy about it. It took months before we figured out we actually had a major discrepancy in our visualization map.
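To make the silent failure concrete, here is a minimal Python sketch. It is an illustration only, not the original system: fetch_updates is a hypothetical stand-in for the third-party API, and the dates and the 120-day jump are made up.

```python
from datetime import datetime, timedelta, timezone


def fetch_updates(events, date_from, date_to):
    """Hypothetical stand-in for the third-party API: events within the range."""
    return [e for e in events if date_from <= e <= date_to]


now = datetime(2023, 3, 1, 12, 0, tzinfo=timezone.utc)
# Two incidents that really happened within the last few hours.
events = [now - timedelta(hours=2), now - timedelta(hours=1)]

healthy_watermark = now - timedelta(hours=3)
# A "last synchronized" value persisted while the clock was in the future.
poisoned_watermark = now + timedelta(days=120)

found_healthy = fetch_updates(events, healthy_watermark, now)
found_poisoned = fetch_updates(events, poisoned_watermark, now)

print(len(found_healthy))   # 2
print(len(found_poisoned))  # 0, and no error is raised
```

An inverted range is simply an empty one, so nothing fails and nothing gets logged.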
We had plenty of unit tests, integration tests, and system tests. We just didn’t think of having one that checked whether the server had time traveled to the future or not.
That’ll be one weird regression test. Imagine the comment you’ll have to write to explain “why” this test exists.
While the root issue was still unknown, we actually wrote one. It sort of made sense. Check that the date from isn’t later than date to in the generated range used for the synchronization request. Obviously. You never know what some idiot future coder (usually yourself some weeks from now) would do, am I right?
However, it was far worse to write the code that fulfilled the test. In the very same few lines of code, we fetched the current date from time.now() plus some time span as date.to, fetched the last synchronization timestamp from the DB as date.from, and then validated that date.from wasn’t greater than date.to, logging an error if it was. The validation code made no logical sense when looking at it.
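The check described above might look roughly like this. It is a minimal Python sketch, not the original code (the original language isn't stated): build_sync_range, the lookahead parameter, and the logger name are all hypothetical.

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("sync")


def build_sync_range(last_sync, now=None, lookahead=timedelta(minutes=5)):
    """Return (date_from, date_to), or None if the range is inverted.

    last_sync is the timestamp read back from the DB; now can be injected
    so that a test can simulate a clock that has jumped.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    date_to = now + lookahead   # current time plus some time span
    date_from = last_sync       # last persisted synchronization timestamp
    # Looks pointless in isolation: it only fires if the persisted
    # timestamp is ahead of the clock, e.g. after a temporary time jump.
    if date_from > date_to:
        logger.error(
            "sync range inverted: from=%s to=%s",
            date_from.isoformat(),
            date_to.isoformat(),
        )
        return None
    return date_from, date_to
```

Passing now in explicitly is what makes the "time traveled" case testable at all; without it, the test would have to change the system clock.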
I’ve read the stuff on STS and my first thought was: How can anyone be so stupid as to try such a loony concept and still be able to create a working piece of code?