Ring IR mode breaks face-distance thresholds
I have a small face-identification pipeline running against snapshots from my doorbell. It uses a fixed distance threshold to decide whether a face matches a known person. Daylight events work fine; night-time IR events get classified as "unknown" even when the person is clearly someone the system should recognize.
What was happening
A specific evening event: front door, around 7:40 PM, infrared mode (camera had already switched). The face in the snapshot was unambiguously me — same hairline, same nose, same head posture as the daylight reference photos. The encoder reported a distance of 0.59 against my reference set. My threshold was 0.55. Flagged as unidentified.
Daylight encodings from the same camera, at the same angle, of the same person were averaging around 0.42, well under the threshold. Same person, same camera, different lighting: the distance jumped by 0.17.
What I found
The distance metric (face_distance in the face_recognition library, which is built on dlib) computes the Euclidean (L2) distance between 128-dimensional face encodings. Those encodings are not lighting-invariant. IR mode produces a fundamentally different image: monochrome, different effective contrast, IR illuminator hot-spots near the eyes and nose. The encoder places that image measurably farther from the daylight references than another daylight photo of the same person would land.
A reference set built entirely from daylight photos will systematically inflate distances against IR captures. The encoder isn't broken — it's correctly noticing that IR and daylight look different. The threshold just wasn't designed for the IR case.
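To make the metric concrete, here is a minimal NumPy-only sketch of what the distance check amounts to. The helper name l2_distances and the random vectors are illustrative stand-ins, not real face encodings; the queries are simply constructed at offsets of 0.42 and 0.59 from a reference to mirror the numbers above.

```python
import numpy as np

def l2_distances(reference_encodings, query_encoding):
    """Euclidean distance from one query vector to each reference vector."""
    refs = np.asarray(reference_encodings, dtype=float)
    if refs.size == 0:
        return np.empty(0)
    query = np.asarray(query_encoding, dtype=float)
    return np.linalg.norm(refs - query, axis=1)

rng = np.random.default_rng(0)
reference = rng.normal(size=128)  # stand-in for one daylight reference encoding

# Unit-length direction, so each query sits at a known distance from the reference.
direction = rng.normal(size=128)
direction /= np.linalg.norm(direction)
daylight_query = reference + 0.42 * direction
ir_query = reference + 0.59 * direction

THRESHOLD = 0.55
for name, query in [("daylight", daylight_query), ("ir", ir_query)]:
    d = l2_distances([reference], query)[0]
    print(f"{name}: distance={d:.2f} match={d < THRESHOLD}")
```

The threshold is just a cutoff on that norm, so anything that shifts the encoding, including a lighting change, moves a genuine match toward or past it.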
The fix
Two changes. First, build a separate IR reference set. For each known person, capture a few snapshots at night (or just stand in front of the camera in IR mode and let the pipeline grab them) and add those encodings under the same identity. Now the match step compares an IR query against IR-aware references.
import face_recognition

def build_references(person_id, photos):
    # photos: iterable of (path, lighting) pairs; each photo is assumed
    # to show only this person, since every detected face gets their id
    encodings = []
    for photo, lighting in photos:  # lighting is "visible" or "ir"
        img = face_recognition.load_image_file(photo)
        encs = face_recognition.face_encodings(img)  # one 128-d vector per face
        for enc in encs:
            encodings.append({
                "person_id": person_id,
                "encoding": enc,
                "lighting": lighting,
            })
    return encodings

def match(query_encoding, query_lighting, references, threshold=0.55):
    # only compare against references with the same lighting class
    candidates = [r for r in references if r["lighting"] == query_lighting]
    if not candidates:
        candidates = references  # fall back if no IR refs yet
    if not candidates:
        return None, None  # nothing enrolled at all, so nothing to compare
    distances = face_recognition.face_distance(
        [c["encoding"] for c in candidates],
        query_encoding,
    )
    # closest reference wins; accept it only if it is under the threshold
    best = min(zip(distances, candidates), key=lambda x: x[0])
    if best[0] < threshold:
        return best[1]["person_id"], best[0]
    return None, best[0]
Second, tag each snapshot with its lighting mode so the pipeline knows which reference subset to compare against. Ring's metadata usually includes a night_vision boolean; if not, you can detect it from the image (IR captures are monochrome with characteristic hot spots).
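If the metadata flag isn't available, a rough image-based fallback might look like the sketch below. It only checks the "monochrome" half of the description (it does not look for hot spots), and it assumes the snapshot is an H x W x 3 RGB array like the one load_image_file returns. The function names and the channel_tolerance value are hypothetical and would need tuning against real captures, since JPEG compression can leave a little colour noise in IR frames.

```python
import numpy as np

def looks_like_ir(image, channel_tolerance=8.0):
    """Guess whether an image array is effectively monochrome.

    channel_tolerance is an assumed cutoff on the mean per-pixel gap
    between the brightest and darkest colour channel (0-255 scale).
    """
    pixels = np.asarray(image, dtype=np.int16)  # widen to avoid uint8 wraparound
    if pixels.ndim == 2:
        return True  # single-channel input is monochrome by definition
    spread = pixels.max(axis=2) - pixels.min(axis=2)
    return float(spread.mean()) < channel_tolerance

def lighting_label(image):
    """Map an image to the lighting labels used by the reference set."""
    return "ir" if looks_like_ir(image) else "visible"

# Synthetic check: a grey image replicated across channels vs. random colour noise.
rng = np.random.default_rng(0)
gray = np.repeat(rng.integers(0, 256, size=(32, 32, 1), dtype=np.uint8), 3, axis=2)
color = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(lighting_label(gray), lighting_label(color))
```

A genuinely grey daylight scene (overcast, concrete porch) could fool this check, so the metadata flag is the better signal whenever it exists.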
What I'd do differently
Distance-threshold classifiers are sensitive to anything that shifts the embedding distribution: lighting, angle, occlusion, even camera firmware updates that change post-processing. A single global threshold against a single reference set is brittle. The cleaner pattern is per-condition thresholds or, better, a learned classifier on top of the embeddings, trained on a small per-person dataset that includes the lighting conditions you actually care about. The per-lighting reference sets above are the cheap version; the trained classifier is the version I'd build if accuracy mattered more than weekend-project time.
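For the per-condition threshold idea, one possible calibration is sketched below: for each lighting class, collect distances for known same-person pairs and known different-person pairs, then pick the cutoff that misclassifies the fewest of them. This is an assumption about how one might do it, not something I ran on the doorbell data; the function names are hypothetical and the distances in samples are made up purely to show the mechanics.

```python
import numpy as np

def calibrate_threshold(genuine, impostor):
    """Pick the cutoff that misclassifies the fewest labelled distances.

    genuine: distances between encodings of the same person.
    impostor: distances between encodings of different people.
    A distance below the cutoff counts as a match.
    """
    genuine = np.asarray(genuine, dtype=float)
    impostor = np.asarray(impostor, dtype=float)
    values = np.sort(np.concatenate([genuine, impostor]))
    # Candidate cutoffs: midpoints between neighbouring observed distances.
    candidates = (values[:-1] + values[1:]) / 2.0
    errors = [
        int(np.sum(genuine >= c) + np.sum(impostor < c))  # misses + false accepts
        for c in candidates
    ]
    return float(candidates[int(np.argmin(errors))])

# Made-up distances for illustration only; measure your own per lighting class.
samples = {
    "visible": {"genuine": [0.38, 0.42, 0.45], "impostor": [0.62, 0.70, 0.75]},
    "ir": {"genuine": [0.50, 0.55, 0.59], "impostor": [0.68, 0.72, 0.80]},
}
thresholds = {
    lighting: calibrate_threshold(d["genuine"], d["impostor"])
    for lighting, d in samples.items()
}

def is_match(distance, lighting, default=0.55):
    """Apply the threshold for this lighting class, or the default if uncalibrated."""
    return distance < thresholds.get(lighting, default)

print(thresholds)
```

With only a handful of samples per class the cutoff is noisy, so in practice this wants as many labelled night and day captures as you can collect.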