Healthcare ML · Ongoing

Project dossier

NeuroAssess

Clinical Parkinson's screening portal with transformer-based text inference and decision-support reports.

What it solves

Overview

NeuroAssess is a clinical machine learning screening platform that analyzes patient text and structured clinical signals to estimate Parkinson's disease risk. Its output is decision support, not a diagnosis. Interview focus: the PPMI cohort target classes, leak-free patient-level splits, the traditional and transformer model suite, multimodal stacking, focal loss, model artifact loading, RAG report generation, medical document indexing, dual reports, digital-twin forecasting, and why outputs remain clinical decision support.

Target audience

  • Healthcare professionals evaluating screening workflows.
  • Clinical researchers studying early Parkinson's detection.
  • Medical AI developers discussing responsible model deployment.

System design

Architecture

The platform separates model inference, clinical preprocessing, report generation, and the browser portal. The Flask layer loads the trained model and exposes prediction routes while the frontend collects input and explains results. The source repo includes a Flask web app, PPMI feature mapping, LightGBM/XGBoost/SVM training, PubMedBERT/BioGPT/Clinical-T5 training, a multimodal ensemble, TF-IDF document indexing, dual report generation, optional digital-twin progression support, and runtime flags that defer heavy model/PDF work for faster startup.

Architecture diagram


Clinical input layer

Collects patient text, symptom notes, and structured fields through a cautious healthcare-focused UI.

React · TypeScript · Tailwind

Inference API layer

Loads the trained model, validates incoming fields, and returns prediction output with caveats.

Flask · Python

ML pipeline layer

Cleans clinical text, tokenizes input, and runs transformer inference for binary screening.

PyTorch · Transformers · scikit-learn

Report layer

Turns model output into explainable screening language for clinicians and project interviews.

Report generation · Clinical reference docs

Training orchestration layer

Scripts coordinate traditional model trials, transformer trials, focal-loss training, checkpoint selection, resume support, and RTX A4000 preflight checks.

train_model_suite.py · CUDA PyTorch · focal loss

Knowledge retrieval layer

Medical PDFs and text references are indexed so generated reports can include guideline-aware context instead of only raw class predictions.

TF-IDF · medical_docs · DocumentManager

Digital twin layer

A forecasting view can estimate progression and treatment scenarios with a fast heuristic path and an optional PPMI-backed bridge.

DigitalTwinEngine · PD_TWIN_BRIDGE_ENABLED

Implementation surface

Tech stack

Python · Backend

Training scripts, preprocessing, inference, and report generation.

PyTorch · ML

Transformer model definition, training, and inference.

Transformers · ML

Tokenization and transformer architecture support.

Flask · Backend

Prediction API for the clinical screening workflow.

pandas · Library

Clinical record cleaning and feature preparation.

scikit-learn · ML

Evaluation metrics, train-test splits, and preprocessing utilities.

LightGBM / XGBoost / SVM · ML

Traditional ML baselines for structured PPMI clinical features and comparison against transformer models.

PubMedBERT / BioGPT / Clinical-T5 · ML

Medical language model family used for clinical text-oriented transformer experiments.

joblib · Library

Serialization and loading of traditional model, preprocessor, and ensemble artifacts.

TF-IDF document index · ML

Lightweight retrieval over medical reference documents for report generation.

Operational flow

How it works

The portal accepts clinical text, normalizes it, tokenizes it for a transformer model, returns a Parkinson's screening score, and frames the output as decision support.

1

Collect clinical context

The user enters clinical observations, symptom descriptions, and optional structured fields.

2

Clean and normalize

The backend standardizes text, removes unusable fields, handles missing values, and prepares model features.

3

Tokenize input

Clinical text is converted into token IDs and attention masks that the transformer can process.
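To make "token IDs and attention masks" concrete, here is a minimal standard-library sketch of padding and masking. The toy IDs, `PAD_ID = 0`, and the naive truncation are assumptions for illustration; in practice a Hugging Face tokenizer called with `padding="max_length"`, `truncation=True`, and a `max_length` produces both arrays and keeps special tokens intact when truncating.

```python
# Assumed padding ID; real tokenizers expose theirs as tokenizer.pad_token_id.
PAD_ID = 0


def pad_and_mask(sequences, max_length):
    """Truncate/pad each token-ID list to max_length; return (input_ids, attention_mask)."""
    input_ids, attention_mask = [], []
    for seq in sequences:
        seq = seq[:max_length]                       # naive truncation of long notes
        pad = max_length - len(seq)
        input_ids.append(seq + [PAD_ID] * pad)       # right-pad with the pad ID
        attention_mask.append([1] * len(seq) + [0] * pad)  # 1 = real token, 0 = padding
    return input_ids, attention_mask


ids, mask = pad_and_mask([[101, 7592, 102], [101, 102]], max_length=4)
# ids  -> [[101, 7592, 102, 0], [101, 102, 0, 0]]
# mask -> [[1, 1, 1, 0], [1, 1, 0, 0]]
```

The mask is what lets the model ignore padding positions, so short and long notes can share one batch tensor.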

4

Run inference

The model produces a binary screening prediction and confidence score from the processed input.

5

Explain the result

The portal presents risk, confidence, limitations, and suggested follow-up language without claiming diagnosis.

6

Map questionnaire fields to PPMI features

The web layer normalizes user inputs such as age, sex, BMI, tremor, rigidity, bradykinesia, postural instability, sleep, mood, and cognitive scores into model feature names.

This is an interview-critical boundary because invalid or missing clinical fields can silently distort model predictions.

7

Load model artifacts lazily

Startup can skip heavy initialization and instead load models and document indexes on the first prediction request that needs them.

Lazy loading makes smoke tests and static frontend hosting faster while preserving the full local ML workflow.
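A minimal sketch of the lazy-loading pattern, assuming a hypothetical `PD_SKIP_INIT` environment flag (the project's actual skip-init flag name is not given in this dossier). `LazyArtifact` is illustrative, not the project's code: each heavy resource is wrapped so the first request pays the load cost exactly once.

```python
import os
import threading

# Hypothetical flag: the project's real skip-init switch is not named in this dossier.
SKIP_INIT = os.environ.get("PD_SKIP_INIT", "0") == "1"


class LazyArtifact:
    """Load an expensive artifact on first use, exactly once, even under concurrent requests."""

    def __init__(self, loader):
        self._loader = loader      # zero-argument callable that performs the heavy load
        self._value = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self):
        if not self._loaded:                 # fast path after the first load
            with self._lock:
                if not self._loaded:         # re-check: another thread may have loaded it
                    self._value = self._loader()
                    self._loaded = True
        return self._value


# Stand-in loader; a real one might be lambda: joblib.load("path/to/model.joblib").
model = LazyArtifact(lambda: {"name": "placeholder-model"})
if not SKIP_INIT:
    model.get()  # full local mode: warm the model at startup
```

Double-checked locking keeps two concurrent first requests from loading the same model twice, while smoke tests that never call `get()` skip the cost entirely.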

8

Retrieve medical context for reports

The report workflow retrieves relevant disease information, guideline text, and feature interpretations before writing clinician-readable output.

The prediction is only one part of the system; the report must explain why a class matters and what follow-up language is safe.
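The retrieval step can be sketched as a tiny in-memory TF-IDF index with cosine ranking. This is a standard-library illustration under stated assumptions, not the project's `DocumentManager`; the smoothed IDF mirrors scikit-learn's `TfidfVectorizer` default, and the tokenizer is deliberately simple.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a real index would also handle stop words and stemming."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Tiny in-memory TF-IDF index over reference texts (illustrative only)."""

    def __init__(self, docs):
        self.doc_ids = list(docs)                      # docs: {doc_id: text}
        tokenized = [tokenize(docs[d]) for d in self.doc_ids]
        n = len(tokenized)
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF, matching scikit-learn's default: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, tokens):
        counts = Counter(t for t in tokens if t in self.idf)  # ignore unseen terms
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}          # L2-normalised

    def search(self, query, k=3):
        """Return up to k (doc_id, cosine score) pairs with a positive score."""
        q = self._vectorize(tokenize(query))
        scored = [
            (sum(w * vec.get(t, 0.0) for t, w in q.items()), doc_id)
            for doc_id, vec in zip(self.doc_ids, self.vectors)
        ]
        scored.sort(reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k] if score > 0]
```

Because every vector is L2-normalised, the dot product in `search` is the cosine similarity; the top hits would then be passed as context into the report prompt.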

9

Generate optional digital twin scenarios

The twin dashboard can produce progression or treatment scenario views using fast heuristics by default and a PPMI-backed bridge when enabled.

This separates demo responsiveness from heavier research workflows.
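A minimal sketch of how the `PD_TWIN_BRIDGE_ENABLED` flag could route between the two paths. The accepted truthy spellings are an assumption, and `heuristic` and `bridge` are injected stand-ins: this does not reproduce `DigitalTwinEngine` or its forecasting logic.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # accepted spellings are an assumption


def twin_bridge_enabled(env=None):
    """True when PD_TWIN_BRIDGE_ENABLED is set to a truthy value."""
    env = os.environ if env is None else env
    return env.get("PD_TWIN_BRIDGE_ENABLED", "").strip().lower() in TRUTHY


def project_scenario(profile, horizon_months, heuristic, bridge, env=None):
    """Use the PPMI-backed bridge only when the flag is on; otherwise the fast heuristic.

    `heuristic` and `bridge` are injected stand-ins for the project's real engines.
    """
    engine = bridge if twin_bridge_enabled(env) else heuristic
    return engine(profile, horizon_months)
```

Passing `env` explicitly keeps the routing testable without mutating the process environment.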

Sequence diagram


Concept depth

Key concepts

Transformers process all positions in parallel and learn which tokens should attend to each other. This makes them effective for long text where important clues may be far apart.

In NeuroAssess: transformer inference captures clinical wording patterns that simpler bag-of-words features can miss.
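For reference, the standard scaled dot-product attention used inside transformer models such as those in the PubMedBERT / BioGPT / Clinical-T5 family, where $Q$, $K$, $V$ are the query, key, and value projections of the token embeddings and $d_k$ is the key dimension:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V
```

Each row of the softmax weights mixes value vectors from every position, so a cue early in a note can influence a token much later in a single step; padded positions are excluded by setting their scores to $-\infty$ via the attention mask.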


Implementation evidence

Code highlights

Inference route

The API validates text input, builds tensors, and returns a cautious screening result.


The route rejects empty clinical text before the model path.

The response includes a medical disclaimer because screening output is not diagnosis.
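A minimal sketch of the validation and response logic behind `POST /predict`, kept framework-free so it runs on its own. `build_predict_response`, `score_fn`, `REVIEW_THRESHOLD = 0.5`, and the `"routine"` label are hypothetical: the documented response only shows `"review"`, and the real threshold is not stated.

```python
# Hypothetical names and threshold; the project's actual values are not documented here.
REVIEW_THRESHOLD = 0.5
DISCLAIMER = "Decision support only; not a diagnosis."


def build_predict_response(payload, score_fn):
    """Validate a /predict body and return (json_body, http_status)."""
    text = payload.get("clinicalText") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        # Reject empty clinical text before the model path is reached.
        return {"error": "clinicalText is required.", "disclaimer": DISCLAIMER}, 400
    # score_fn stands in for tokenization plus the model forward pass.
    risk = float(score_fn(text.strip()))
    # "routine" is a hypothetical label for scores below the threshold.
    screening = "review" if risk >= REVIEW_THRESHOLD else "routine"
    return {"riskScore": round(risk, 4), "screening": screening, "disclaimer": DISCLAIMER}, 200


# Flask (2.0+) wiring, left as a comment so the sketch runs without Flask installed:
# @app.post("/predict")
# def predict():
#     body, status = build_predict_response(request.get_json(silent=True), score_fn)
#     return jsonify(body), status
```

Every response path, including the 400 error, carries the disclaimer, so no client can render a score without it.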

Clinical metric framing

Model evaluation reports sensitivity and specificity so interview answers stay healthcare-aware.


Medical ML should not be defended with accuracy alone.

Sensitivity and specificity expose the false-negative and false-positive trade-off.
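A dependency-free sketch of the two metrics for one binary screening decision. With scikit-learn, `confusion_matrix(y_true, y_pred).ravel()` returns `tn, fp, fn, tp` for binary labels and gives the same numbers; the function name here is illustrative.

```python
def sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (sensitivity, specificity) for one binary screening decision."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1      # case correctly flagged
            else:
                fn += 1      # missed case: the clinically costly error
        elif pred == positive:
            fp += 1          # non-case flagged for review
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    specificity = tn / (tn + fp) if tn + fp else float("nan")
    return sensitivity, specificity
```

For the multi-class PPMI cohorts, the same function can be applied one-vs-rest by passing each class label (for example `"PD"`) as `positive`.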

Patient-level split guard

The training pipeline should split by patient identifier before expanding records into model rows.


The split happens at patient level, not row level.

The assertion makes leakage visible during development.
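A standard-library sketch of a patient-level split with an explicit leakage guard, assuming visit rows are dicts keyed by a `patient_id` field. scikit-learn's `GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)` with `groups=` set to patient IDs provides the same guarantee; this version only makes the mechanics visible.

```python
import random


def patient_level_split(rows, patient_key="patient_id", test_fraction=0.2, seed=42):
    """Split visit-level rows so every patient lands on exactly one side."""
    patients = sorted({row[patient_key] for row in rows})   # deterministic order before shuffling
    rng = random.Random(seed)
    rng.shuffle(patients)
    n_test = max(1, round(len(patients) * test_fraction))
    test_ids = set(patients[:n_test])
    train = [r for r in rows if r[patient_key] not in test_ids]
    test = [r for r in rows if r[patient_key] in test_ids]
    # Leakage guard: no patient may contribute visits to both splits.
    train_ids = {r[patient_key] for r in train}
    assert train_ids.isdisjoint(test_ids), "patient leakage across splits"
    return train, test
```

The fraction applies to patients, not rows, so the test set's row count varies with how many visits each held-out patient has.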

Safe clinical field normalization

Clinical form input is coerced into the feature schema while preserving missing-value behavior.


Missing clinical fields are surfaced instead of silently converted to zeros.

The model boundary is the normalized PPMI feature schema.
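A minimal normalization sketch. The form-field names and the `male`/`female` encoding are hypothetical; the target feature names follow the `/api/predict` example (`age`, `SEX`, `sym_tremor`, `sym_rigid`, `moca`). Unparseable or absent values become NaN and are listed, never silently set to 0.

```python
import math

# Hypothetical form-field -> PPMI feature mapping; only the feature names come from the API example.
FIELD_MAP = {"age": "age", "sex": "SEX", "tremor": "sym_tremor", "rigidity": "sym_rigid", "moca": "moca"}
SEX_CODES = {"male": 1.0, "female": 0.0}  # assumed encoding


def normalize_fields(form):
    """Return (features, missing): absent or unparseable values become NaN, never 0."""
    features, missing = {}, []
    for field, feature in FIELD_MAP.items():
        raw = form.get(field, form.get(feature))   # accept either the form name or the feature name
        value = math.nan
        if feature == "SEX" and isinstance(raw, str):
            value = SEX_CODES.get(raw.strip().lower(), math.nan)
        elif raw is not None and raw != "":
            try:
                value = float(raw)
            except (TypeError, ValueError):
                value = math.nan
        if math.isnan(value):
            missing.append(feature)               # surface the gap to the caller/UI
        features[feature] = value
    return features, missing
```

NaN lets LightGBM and XGBoost apply their native missing-value handling, while an SVM would still need an explicit imputation step; the `missing` list is what the UI can show back to the clinician.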

Contracts

API design

Base URL: http://localhost:5000

POST /predict

Runs Parkinson's screening inference for clinical text.

{ "clinicalText": "Patient reports tremor and gait instability." }
{ "riskScore": 0.7134, "screening": "review", "disclaimer": "Decision support only; not a diagnosis." }
POST /report

Generates a clinician-readable report for a prediction result.

POST /api/predict

Normalizes patient data, loads model artifacts if needed, and returns cohort probabilities plus decision-support text.

{ "age": 63, "SEX": "male", "sym_tremor": 2, "sym_rigid": 1, "moca": 24 }
{ "predictedClass": "PRODROMAL", "confidence": 0.67, "disclaimer": "Decision support only." }
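A standard-library client sketch for `POST /api/predict` against the local base URL above. `build_predict_request` is a hypothetical helper; it only builds the request, and the commented lines show how it would be sent once the Flask backend is running.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"


def build_predict_request(patient, base_url=BASE_URL):
    """Build (but do not send) a POST /api/predict request with a JSON body."""
    return urllib.request.Request(
        url=f"{base_url}/api/predict",
        data=json.dumps(patient).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request({"age": 63, "SEX": "male", "sym_tremor": 2, "sym_rigid": 1, "moca": 24})
# With the Flask backend running locally:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     print(json.load(resp))  # predictedClass, confidence, disclaimer
```

A generous timeout matters here because the first call may trigger lazy model and index loading.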
POST /api/reports/dual

Generates patient-facing and clinician-facing report variants from the same prediction and retrieved references.

POST /api/documents/upload

Accepts PDF or text medical references and indexes them for report retrieval experiments.

POST /api/twin/project

Returns digital-twin progression or treatment scenario output for a normalized patient profile.

State model

Database design

Model artifact and local medical document store

Data relationship diagram


model_artifact

Serialized model, tokenizer settings, and preprocessing configuration.

model_path · tokenizer_name · max_length · trained_at

prediction_log

Optional local record of screening requests for development and audit experiments.

prediction_id · risk_score · screening · created_at

medical_docs

Reference documents used by the portal's explanatory report workflow.

doc_id · title · source · summary

ppmi_patient_features

Curated patient-level clinical features mapped from PPMI records before training and inference.

patient_id · cohort · age · sex · bmi · motor_scores · non_motor_scores · cognitive_scores

model_registry

Saved model artifacts, preprocessing artifacts, checkpoint metadata, and validation metrics.

model_name · artifact_path · preprocessor_path · validation_f1 · trained_at

document_index

Medical reference documents indexed for TF-IDF retrieval and report context.

doc_id · filename · document_type · extracted_text · indexed_at

twin_scenario

Optional progression or treatment scenario outputs generated for digital-twin views.

scenario_id · patient_profile_hash · horizon_months · risk_curve · created_at

Architecture decisions

Trade-offs

Model family

Transformer classifier over LSTM or bag-of-words model

Clinical notes can contain long-range context. Attention relates distant cues directly, whereas recurrent models must carry them through many sequential steps and bag-of-words features discard word order entirely.

API framework

Flask over FastAPI

The inference service is request-response oriented and small. Flask is sufficient and keeps the clinical ML path straightforward.

Product framing

Decision support over diagnostic claim

A model prediction should support review, not replace clinical judgment or overstate medical validity.

Validation split

Patient-level split over random row split

Repeated PPMI visits can leak patient identity across train and test. Patient-level splitting gives a more honest estimate of generalization.

Training objective

Class-weighted focal loss over plain cross-entropy

The cohort labels are imbalanced and clinically important minority classes should not be ignored by a model that optimizes only easy examples.
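A minimal sketch of the standard class-weighted focal loss, $\mathrm{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$, written over already-softmaxed probabilities in plain Python. The default `gamma=2.0` and the idea of inverse-frequency `alpha` weights are common choices, not values confirmed for this project; a training version would compute the same quantity on PyTorch tensors.

```python
import math


def focal_loss(probs, targets, alpha=None, gamma=2.0, eps=1e-12):
    """Mean class-weighted focal loss.

    probs:   per-sample lists of class probabilities (already softmaxed)
    targets: integer class indices
    alpha:   optional per-class weights, e.g. inverse class frequency
    """
    total = 0.0
    for p, y in zip(probs, targets):
        p_t = min(max(p[y], eps), 1.0)         # probability of the true class, clipped for log
        a_t = 1.0 if alpha is None else alpha[y]
        # (1 - p_t) ** gamma shrinks the loss on easy, confidently correct examples.
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(targets)
```

With `gamma=0` and no `alpha` this reduces to ordinary cross-entropy; raising `gamma` shifts gradient toward hard examples, and `alpha` up-weights minority cohorts such as SWEDD or PRODROMAL.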

Report generation

RAG-enhanced explanatory reports over returning only class probabilities

A clinical-support tool must explain risk factors, caveats, and follow-up considerations in language a clinician can review.

Frontend deployment

Static Vite frontend on Vercel with external Flask API over bundling local ML inference into Vercel

Model loading and PDF indexing are too heavy for a static frontend deployment, so the hosted UI should call a separate backend.

Lessons learned

Challenges and solutions

Problem

Class imbalance can make a model look accurate while missing positive cases.

Solution: Evaluate with sensitivity, specificity, confusion matrices, and threshold discussion.

Lesson: Healthcare ML needs metrics aligned to clinical risk, not just a single score.

Problem

Clinical text can include missing, inconsistent, or noisy fields.

Solution: Normalize text, validate required inputs, and make missing data behavior explicit in preprocessing.

Lesson: Data quality handling is part of the model, not a side concern.

Problem

Clinical models can look strong if patient visits leak across train and test splits.

Solution: Split by patient ID, assert disjoint patients, and report validation metrics from held-out patients only.

Lesson: For medical ML, evaluation design is part of the product's credibility.

Problem

Transformer training can be interrupted on long GPU runs.

Solution: Add A4000 preflight checks, resumable training scripts, checkpoint selection by validation F1, and resume commands.

Lesson: ML systems need operational training workflows, not just model code.
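A minimal sketch of the two different checkpoint choices over `model_registry`-style records (`validation_f1`, `trained_at`). The function names are hypothetical; the point is that the checkpoint to deploy and the checkpoint to resume from are not necessarily the same one.

```python
def select_best_checkpoint(records):
    """Checkpoint to deploy: highest validation F1, ties broken by the newest trained_at.

    trained_at must be comparable (datetime objects or same-format ISO-8601 strings).
    """
    if not records:
        raise ValueError("no checkpoints recorded")
    return max(records, key=lambda r: (r["validation_f1"], r["trained_at"]))


def select_resume_checkpoint(records):
    """Checkpoint to resume an interrupted run from: simply the most recent one."""
    if not records:
        raise ValueError("no checkpoints recorded")
    return max(records, key=lambda r: r["trained_at"])
```

Resuming from the latest checkpoint preserves optimizer progress, while deployment keeps the best-validated weights even if later epochs overfit.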

Problem

PDF extraction and model initialization slow down basic web smoke tests.

Solution: Defer PDF full-text extraction and allow skip-init startup while keeping full local initialization available through flags.

Lesson: Heavy ML systems benefit from runtime modes that separate UI checks from full inference readiness.

Runbook

Requirements and future work

Requirements

  • Python 3.x runtime with Flask.
  • PyTorch and Transformers packages for model inference.
  • Trained model and tokenizer artifacts available locally.
  • Clinical dataset used for training contains approximately 42,000 patient records according to the PRD.
  • PPMI curated CSV files must be present before training or evaluation.
  • sacremoses is required for BioGPT tokenization.
  • CUDA-enabled PyTorch is recommended for transformer training, with A4000 preflight scripts available.
  • PD_EXTRACT_PDF_TEXT enables full PDF extraction when RAG experiments need it.
  • PD_TWIN_BRIDGE_ENABLED enables the optional PPMI-backed digital-twin bridge.

Future improvements

  • Add probability calibration, calibrated confidence intervals, and threshold sliders for exploring the sensitivity/specificity trade-off.
  • Add an explicit model card page describing dataset version, cohort distribution, leakage controls, and known limitations.
  • Add clinician feedback loops for post-review outcome capture.
  • Persist anonymized prediction audit records with consent-aware retention controls.
  • Add external validation on a dataset outside PPMI before making stronger clinical claims.

Active recall

Interview Q&A

Behavioral · Easy

Why call this decision support instead of diagnosis?

Concepts · Medium

Why are sensitivity and specificity important here?

Tradeoffs · Hard

What would you harden before clinical deployment?

Concepts · Hard

Why is patient-level splitting mandatory for PPMI data?

Concepts · Medium

What are HC, PD, SWEDD, and PRODROMAL in this project?

Tradeoffs · Medium

Why include traditional ML if transformer models exist?

Architecture · Medium

What is the role of RAG in NeuroAssess?

Concepts · Hard

How would you explain focal loss in this clinical setting?

Behavioral · Hard

What should be checked before deploying this as a clinical tool?

Architecture · Medium

Why does the frontend deploy separately from the Flask ML backend?

Concepts · Hard

What does model calibration add beyond accuracy?