Build a Mini Threat Hunter — Using AI to Detect Anomalous Login Behavior
An actionable walk-through for cybersecurity practitioners and enthusiasts


📘Interesting Tech Fact:
Long before YouTube and online learning platforms existed, one of the earliest digital tutorials appeared in 1960 on the PLATO system (Programmed Logic for Automatic Teaching Operations), a groundbreaking computer-based education network developed at the University of Illinois. What makes this especially fascinating is that PLATO not only introduced interactive step-by-step tutorials, but also pioneered technologies we associate with modern tech — including real-time chat, forums, e-mail, screen sharing, multiplayer games, and even an early version of emojis (called PLATO emoticons) created using special character combinations. These innovations helped shape the foundation of today’s digital learning culture, proving that the spirit of guided instruction and interactive tutorials was transforming technology decades before the internet became mainstream.
Introduction
In an era where adversaries frequently exploit valid credentials to slip past defenses, building systems to detect subtle signs of anomalous login behavior has become a must-have for any threat-hunting capability. This tutorial walks you through setting up a mini threat-hunting pipeline using AI, from data extraction and feature engineering to model training, alerting and compliance alignment. You’ll gain hands-on Python code, visual diagrams, real-world mapping to frameworks such as MITRE ATT&CK and National Institute of Standards and Technology (NIST) SP 800-207, and a strong rationale for why this matters.
By the end, you’ll have a working blueprint you can refine and extend in your own environment—and demonstrate value to leadership, incident response teams or hiring managers. Let’s get started.

Why anomalous login behavior matters
The login event is the gateway to your environment. Whether from a remote session, VPN access, web application or other portal, each authenticated session carries risk: credential misuse, account takeover, lateral movement and data exfiltration.
From a threat-framework standpoint:
In MITRE ATT&CK, the technique Valid Accounts (T1078) describes how adversaries use legitimate credentials to blend in and avoid detection; it spans tactics including Initial Access, Persistence, Privilege Escalation and Defense Evasion.
Detections for anomalous login behavior map to data sources such as Logon Session (DS0028), which covers unusual sequences of logons that hint at compromise.
From a design posture, NIST SP 800-207 emphasizes Zero Trust: treat every access request, every session as untrusted until proven otherwise.
Therefore, by building a detection pipeline focused on login anomalies you address one of the core early-attack vectors. Attackers often move quickly; catching a deviation in login behavior gives your SOC or threat-hunting team a jump on them.
Threat Framework Alignment
| Framework | Security Objective | How This Tutorial Supports It |
|---|---|---|
| MITRE ATT&CK (T1078) | Identify misuse of valid credentials | Detect anomalous login signatures |
| MITRE Data Source DS0028 | Logon session tracking | Feature-based login anomaly scoring |
| NIST SP 800-207 | Continuous verification | Behavior-based authentication intelligence |
| SOC Maturity | Shift-left detection | Early-stage compromise alerts |
This system adds a behavioral defense layer that SIEM rules alone miss.
Architecture Overview

Key components:
Data ingestion – Collect login events, user metadata, device context and geolocation data.
Feature engineering – Create derived features like “time since last login”, “number of unique login locations in last X days”, “travel impossible flag”, “login volume deviation”.
Model training / baseline establishment – Using unsupervised or semi-supervised ML approaches to learn “normal” login patterns per user or cohort.
Scoring & alerting – Assign anomaly scores to new login events; set thresholds to trigger alerts.
Investigation & response – Map anomalous alerts to known attack techniques, coordinate with SOC or containment workflows.
Governance & compliance alignment – Ensure logging, audit trails, policy enforcement align with frameworks like NIST SP 800-207.
The beauty of this architecture is that it doesn’t require an enterprise-scale data warehouse to begin with—you can spin up a baseline proof-of-concept with moderate volumes and refine over time.
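To make the flow concrete, here is a minimal sketch wiring the stages above together as plain Python functions. The function names (ingest, engineer_features, score) and the night-hours placeholder rule are illustrative assumptions, not a prescribed API:

```python
import pandas as pd

# Hypothetical stage functions mirroring the architecture above.
def ingest(records):
    """Data ingestion: normalize raw login events into a DataFrame."""
    return pd.DataFrame(records)

def engineer_features(df):
    """Feature engineering: derive simple per-event signals."""
    df = df.copy()
    df["login_time"] = pd.to_datetime(df["login_time"])
    df["login_hour"] = df["login_time"].dt.hour
    df["is_night"] = df["login_hour"].between(0, 4).astype(int)
    return df

def score(df):
    """Scoring & alerting: a stand-in rule until a trained model is plugged in."""
    df = df.copy()
    df["alert"] = df["is_night"] == 1
    return df

events = [
    {"user_id": "alice", "login_time": "2025-11-08T02:30:28Z"},
    {"user_id": "bob", "login_time": "2025-11-07T08:15:34Z"},
]
scored = score(engineer_features(ingest(events)))
print(scored[["user_id", "alert"]])
```

Each stage can later be swapped for a real implementation (a log collector, richer features, the Isolation Forest model below) without changing the overall shape.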

Sample Data and Feature Set
Below is a simple sample login dataset (CSV style) to get started:
CSV (Sample)
```csv
user_id,login_time,source_ip,geo_country,device_id,login_success,auth_method
alice,2025-11-07T08:12:00Z,203.0.113.45,US,DEV123,True,Password+MFA
bob,2025-11-07T08:15:34Z,198.51.100.76,GB,DEV234,True,Password
alice,2025-11-07T23:05:12Z,203.0.113.45,US,DEV123,False,Password
charlie,2025-11-07T04:45:00Z,192.0.2.89,US,DEV345,True,Password+MFA
alice,2025-11-08T02:30:28Z,198.51.100.120,FR,DEV678,True,Password+MFA
```
From such raw logs you can derive features such as:
login_hour (extract hour of day)
is_night_time (e.g., between 00:00–05:00)
previous_login_diff_minutes (time delta from prior login)
geo_change_flag (country differs from last login)
device_new_flag (device_id not seen before)
failed_login_count_last_24h (count of failed attempts)
success_ratio_last_7d (successful vs total login attempts)
Such engineered features allow the model to pick up deviations like “Alice’s account had a successful login from France at 2:30 am, using a device never seen before”—which might indicate compromise or travel-imposter behavior.
Build the AI Model (Python): Unsupervised Anomaly Detection with Isolation Forest
Here is a Python example using scikit-learn to build an Isolation Forest model for anomaly detection. (You can adapt to your data store, streaming pipeline or cloud environment.)
Python (Sample)
```python
import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest

# Step 1: load the data
df = pd.read_csv("login_sample.csv", parse_dates=["login_time"])

# Step 2: sort and compute features
df = df.sort_values(["user_id", "login_time"])
df["prev_login_time"] = df.groupby("user_id")["login_time"].shift(1)
df["login_diff_minutes"] = (df["login_time"] - df["prev_login_time"]).dt.total_seconds() / 60
df["login_hour"] = df["login_time"].dt.hour
df["is_night"] = df["login_hour"].isin(range(0, 5)).astype(int)

# Simple geolocation change flag (a user's first login counts as a change)
df["prev_geo"] = df.groupby("user_id")["geo_country"].shift(1)
df["geo_changed"] = (df["geo_country"] != df["prev_geo"]).astype(int)

# Fill NA for each user's first-seen login
df.fillna({"login_diff_minutes": 9999, "prev_geo": ""}, inplace=True)

# Select features
features = df[["login_diff_minutes", "login_hour", "is_night", "geo_changed"]].copy()

# Step 3: train the model on the historical baseline (successful logins only)
baseline = features[df["login_success"] == True]
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
model.fit(baseline)

# Step 4: score new events
df["anomaly_score"] = model.decision_function(features)
df["anomaly_flag"] = model.predict(features) == -1

# Step 5: display flagged logins
alerts = df[df["anomaly_flag"]]
print("⚠️ Anomalous logins detected:")
print(alerts[["user_id", "login_time", "geo_country", "device_id", "anomaly_score"]])
```
Explanation of key points
Compute simple temporal and geolocation change features.
Train only on successful logins to build a baseline of “normal” user behavior.
Use IsolationForest, an unsupervised anomaly detection algorithm well suited to identifying rare deviations.
Scoring returns an anomaly score (higher means more normal), and we flag the –1 predictions as anomalous.
In practice you’d feed this into your alerting pipeline (e.g., send to SIEM, ticketing system, dashboard).
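One possible shape for that handoff, sketched below, is to format each flagged row as a generic JSON alert before shipping it to the SIEM or ticketing system. The field names are illustrative assumptions, not any vendor's schema:

```python
import json
from datetime import datetime, timezone

def to_siem_alert(row):
    """Format a flagged login as a generic JSON alert.
    Field names here are illustrative, not a specific SIEM schema."""
    return json.dumps({
        "alert_type": "anomalous_login",
        "user_id": row["user_id"],
        "login_time": row["login_time"],
        "anomaly_score": row["anomaly_score"],
        "attack_technique": "T1078",  # Valid Accounts (MITRE ATT&CK)
        "generated_at": datetime.now(timezone.utc).isoformat(),
    })

alert = to_siem_alert({
    "user_id": "alice",
    "login_time": "2025-11-08T02:30:28Z",
    "anomaly_score": -0.45,
})
print(alert)
```

Tagging the technique ID at alert time saves the analyst a lookup during triage.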
You can extend this code by:
Adding more contextual features (device risk score, known VPN vs non-VPN, login method, number of distinct devices).
Using a rolling window to continuously retrain or update baseline.
Employing supervised or semi-supervised learning if you have labelled compromise events.
Integrating real-time streaming ingestion from sources such as syslog, cloud identity logs or Active Directory logs.
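For the rolling-window idea above, a minimal sketch: keep only the most recent baseline rows when refitting, so the notion of "normal" tracks evolving behavior. The window size and the synthetic two-feature data are assumptions for illustration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

def retrain_on_window(feature_history, window_size=500):
    """Refit the model on only the most recent `window_size` baseline rows,
    so drift in user behavior is absorbed into the baseline over time."""
    window = feature_history[-window_size:]
    model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
    model.fit(window)
    return model

# Synthetic two-feature baseline: (login_diff_minutes, login_hour).
history = rng.normal(loc=[480, 9], scale=[60, 2], size=(1000, 2))
model = retrain_on_window(history, window_size=500)
scores = model.decision_function(history[-5:])
print(scores)
```

In production the retrain would run on a schedule (nightly, weekly) rather than per event.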

Evaluation and Tuning
When building anomaly detection systems you must account for key trade-offs:
False positives: Legit users traveling or using new devices may trigger alerts if your baseline is too tight.
False negatives: Attackers who mimic normal behavior may slip through if features are too broad.
Baseline drift: User behavior evolves (new device, new location, remote work patterns), so the model must adapt.
Interpretability: Analysts need meaningful context to act on alerts, not just a score.
Consider tracking metrics like:
% of flagged logins that lead to confirmed compromise (true positives)
% of flagged logins dismissed (false positives)
Mean time to investigation and response
Tune parameters (for instance the contamination rate in IsolationForest) to balance alert volume and risk. Use stratification by user segment (privileged accounts vs standard users) since risk profiles differ.
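A quick way to feel this trade-off is to sweep the contamination parameter on a synthetic baseline and watch the alert rate move with it; on training data the flagged fraction roughly tracks contamination by construction:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)
# Synthetic baseline: (login_diff_minutes, login_hour).
X = rng.normal(loc=[480, 9], scale=[60, 2], size=(2000, 2))

# Higher contamination => more logins flagged => more analyst workload.
for contamination in [0.01, 0.02, 0.05]:
    model = IsolationForest(n_estimators=100,
                            contamination=contamination,
                            random_state=42).fit(X)
    flagged = (model.predict(X) == -1).mean()
    print(f"contamination={contamination:.2f} -> {flagged:.1%} flagged")
```

Run the same sweep per user segment to confirm that privileged accounts get a tighter threshold than standard users.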
Mapping to Compliance and Frameworks
This tutorial doesn’t just deliver code—it ties into organizational governance and frameworks.
Governance Checklist:
| Control Area | Requirement | Covered? |
|---|---|---|
| Continuous verification | No “trusting” prior auth | ✅ |
| Audit logs & retention | Identity events traceable | ✅ |
| Adaptive security | Risk-based access enforcement | ✅ |
| Threat mapping competency | ATT&CK alignment | ✅ |
| Incident readiness | Alert routing to SOC | ✅ |
Zero Trust alignment – NIST SP 800-207
The Zero Trust paradigm states that every access request must be continuously verified, and that logging/monitoring of activity is essential.
Our mini threat hunter contributes by:
Monitoring access by user, device and location rather than relying purely on network perimeter.
Using behavioral context to enforce dynamic policy decisions (e.g., new device + unfamiliar country = higher risk).
Producing audit-worthy logs and alerts that support continuous monitoring and measurement of security posture.
MITRE ATT&CK mapping
Detecting anomalous login behavior addresses several attacker techniques:
T1078 (Valid Accounts) — an adversary using valid credentials to get access.
DS0028 (Logon Session) — monitoring login events to detect lateral movement or misuse.
By tagging alerts with the relevant ATT&CK technique(s), your SOC can prioritize and respond faster, and align with threat intelligence.
Governance and audit
Maintain the login event dataset with proper retention policies (e.g., 90 days active, 1 year archive) and ensure logs are tamper-protected.
Document your anomaly detection model versioning, audit thresholds, and review results periodically.
Tie alerts to incident response workflow: for example, when anomaly_flag==True and user is in high-risk group, escalate to privileged access review.
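That escalation rule can be sketched as a small triage function; the group names and routing labels here are hypothetical, not part of any standard workflow:

```python
HIGH_RISK_GROUPS = {"domain_admins", "finance_privileged"}  # illustrative names

def triage(alert):
    """Route an anomaly alert: escalate high-risk users, queue the rest.
    Group names and return labels are illustrative, not a standard."""
    if alert["anomaly_flag"] and alert["user_group"] in HIGH_RISK_GROUPS:
        return "escalate_privileged_access_review"
    if alert["anomaly_flag"]:
        return "queue_for_analyst"
    return "no_action"

print(triage({"anomaly_flag": True, "user_group": "domain_admins"}))
print(triage({"anomaly_flag": True, "user_group": "standard_users"}))
print(triage({"anomaly_flag": False, "user_group": "domain_admins"}))
```

Keeping routing logic this explicit also makes it easy to audit against your documented escalation policy.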
Operationalizing in a SOC / Threat-Hunting Workflow
To maximize value, integrate the mini threat-hunter into your operational processes:
Dashboard visualization
Create a view showing recent flagged logins, by user, location, device, anomaly score, time. Color-code high-risk accounts.
Example of an AI-Based Anomaly Detection Application
Investigation playbook
When an alert triggers:
Verify user: is travel or new device expected?
Check device: is device recognized in asset inventory?
Check login time & location: is it impossible travel?
Map to ATT&CK: e.g., T1078 usage of valid accounts.
Escalate: If confirmed anomalous, disable session/token, force password reset, review previous login history.
Feedback loop
Label confirmed events as true or false positives → improve your features, thresholds or model.
Periodically retrain model on updated baseline.
Review thresholds quarterly and adjust for seasonal patterns (holiday travel, remote work surges).
Reporting & compliance
Generate monthly metrics: number of anomalies flagged, investigations opened, incidents prevented.
Align with Zero Trust KPIs: “% of login attempts flagged for additional review”, “time to mitigation after flag”.
Present high-impact findings to executives: e.g., “This month we detected 12 anomalous logins from new devices in high-risk locations, 2 of which led to account compromise”.
Example Use Case: Travel + New Device Flag
Imagine this scenario: user alice normally logs in from the US between 07:00–18:00 via company-issued devices. One morning she logs in at 02:35 UTC from France on a device not previously seen. Our model assigns a strongly negative anomaly score (e.g., –0.45), marking the event as an outlier, and flags it.
Why this matters:
Unexpected time of login (night hours)
New geolocation (France)
New device ID
Same user as baseline
The alert triggers an investigation: Alice confirms she is on holiday and using a personal laptop. SOC forces password reset, revokes session token, flags as “benign travel” but adds her device to allowed list. Over time you may adjust the model to account for travel-patterns (e.g., device still unknown but travel is authorized) but meanwhile you avoided risk of account misuse.
Suggested Next Steps & Extensions
To elevate your mini threat hunter from proof-of-concept to production-grade:
Expand feature set: add MFA success/fail counts, application accessed after login, session duration, time-since-credential-reset.
Include supervised models: label known breach vs benign login events and train classification models.
Integrate with SIEM or UEBA solution: pipe anomaly scores to platforms such as Splunk or Microsoft Sentinel.
Build feedback loop: automatic model retraining, self-tuning thresholds based on alert outcomes.
Deploy in real-time streaming: e.g., ingest login events via Kafka or AWS Kinesis, score in near-real-time, trigger automated workflows.
Correlate with lateral movement / endpoint logs: escalate flagged login events with device logs, network logs, to detect follow-on attacker activity.
Map to executive-level metrics: show how your hunting reduces “dwell time”, “time to detection”, “privileged account misuse”.
Why this tutorial is high-value
Bridges practical code with cybersecurity frameworks (MITRE ATT&CK, NIST SP 800-207) making it relevant both for implementation and governance.
Emphasizes behavioral anomaly detection, not just signature checking—a critical shift given adversaries are using valid accounts.
Provides repeatable components: dataset, feature engineering, modelling, alerting, investigation workflow.
Supports operational readiness: connects to SOC dashboards, investigation playbooks and executive reporting.
Adheres to professional tone while remaining accessible to practitioners and enthusiasts alike.

Summary
You now have a roadmap and working example to build a mini threat-hunting system focused on anomalous login behavior. You’ve seen how to ingest login events, engineer meaningful features, train a baseline anomaly model, build alerts, and integrate them into investigation workflows—while aligning with Zero Trust principles (NIST SP 800-207) and adversary-technique frameworks such as MITRE ATT&CK.
In today’s threat landscape, where attackers increasingly mimic legitimate behavior, building and refining such capabilities becomes a competitive advantage for security teams. I encourage you to adapt the code, scale it in your environment, and refine it over time.

Subscribe to CyberLens
Cybersecurity isn’t just about firewalls and patches anymore — it’s about understanding the invisible attack surfaces hiding inside the tools we trust.
CyberLens brings you deep-dive analysis on cutting-edge cyber threats like model inversion, AI poisoning, and post-quantum vulnerabilities — written for professionals who can’t afford to be a step behind.
📩 Subscribe to The CyberLens Newsletter today and Stay Ahead of the Attacks you can’t yet see.




