Data Scientist - Scenario Questions

Data Scientist, 10 questions, 06.04.2026
These questions and answers are intended for general information and interview preparation; they do not reflect actual interview questions.
1

Scenario: The marketing team wants a customer churn prediction model, but only demographic data is available. What do you do?

1) Data quality assessment: missing values, class imbalance, noise. 2) Feature engineering: behavioral proxies (website activity, email engagement), external data enrichment. 3) Baseline model: logistic regression. 4) Advanced models: Random Forest, XGBoost. 5) Interpretability: SHAP analysis. 6) Business validation: measure the effectiveness of a churn-reduction campaign with an A/B test.
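The baseline step above can be sketched with plain numpy; the synthetic features (a demographic field plus two hypothetical behavioral proxies) and their effect sizes are illustrative assumptions, not a real churn dataset:

```python
import numpy as np

# Hypothetical features: e.g. age (scaled), tenure, email_engagement.
# Synthetic labels drawn from a known logistic model, for illustration only.
rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))
true_w = np.array([0.5, -1.0, 2.0])
y = (1 / (1 + np.exp(-(X @ true_w))) > rng.uniform(size=n)).astype(float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Plain gradient-descent logistic regression: an interpretable baseline
# to beat before reaching for Random Forest or XGBoost.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    p = sigmoid(X @ w)
    w -= lr * X.T @ (p - y) / n

pred = sigmoid(X @ w) > 0.5
accuracy = (pred == y).mean()
```

The learned weights `w` are directly readable per feature, which is exactly why a linear baseline is worth keeping around for stakeholder conversations.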
2

Scenario: The CEO says "build a product recommendation engine, like Amazon's." Only three months of transaction data are available. How do you approach it?

Phased approach: 1) Start simple: popularity-based recommendations, item-item collaborative filtering. 2) For the cold-start problem: content-based filtering (product attributes). 3) Advanced: matrix factorization (SVD), deep learning (Neural Collaborative Filtering). 4) Evaluation: offline (precision@k, recall@k) and online (A/B tests, CTR). 5) For scalability: Spark MLlib.
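The item-item collaborative filtering step can be sketched with plain numpy; the toy user-item interaction matrix below is hypothetical:

```python
import numpy as np

# Toy user-item matrix (rows: users, cols: items); values are implicit counts.
R = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 2, 0, 3],
    [0, 3, 1, 2],
], dtype=float)

# Item-item cosine similarity: normalize item column vectors, then dot products.
norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0                 # guard against all-zero items
item_vecs = R / norms
sim = item_vecs.T @ item_vecs           # shape (n_items, n_items)

def recommend(user_idx, k=2):
    """Score items by similarity-weighted sum of the user's interactions."""
    scores = R[user_idx] @ sim
    scores[R[user_idx] > 0] = -np.inf   # mask already-seen items
    return np.argsort(scores)[::-1][:k]

top = recommend(0)
```

This scales poorly past a few hundred thousand items, which is where the approximate-nearest-neighbor and Spark MLlib options in the answer come in.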
3

Scenario: Your model is losing performance in production (accuracy dropped from 85% to 70%). How do you run a root-cause analysis?

Investigation plan: 1) data drift detection (compare feature distributions), 2) model degradation (concept drift), 3) infrastructure issues (latency, memory), 4) data pipeline issues (quality, freshness). Solutions: a retraining pipeline, feature updates, model monitoring, and alerting. I would also implement an automated retraining trigger.
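The drift-detection step is often screened with the Population Stability Index (PSI); a minimal numpy sketch, using the commonly assumed 0.1/0.25 rule-of-thumb thresholds:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline and a production sample.
    Rule of thumb (assumed convention): <0.1 stable, 0.1-0.25 moderate, >0.25 drift."""
    # Quantile-based bin edges from the baseline sample.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e_frac = np.bincount(np.searchsorted(edges, expected), minlength=bins) / len(expected)
    a_frac = np.bincount(np.searchsorted(edges, actual), minlength=bins) / len(actual)
    e_frac = np.clip(e_frac, 1e-6, None)   # avoid log(0)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

rng = np.random.default_rng(1)
baseline = rng.normal(0, 1, 5000)
stable = rng.normal(0, 1, 5000)
drifted = rng.normal(0.8, 1, 5000)      # shifted mean simulates feature drift
```

Computing PSI per feature on a schedule, and alerting when it crosses the moderate threshold, is one concrete form of the monitoring and retraining trigger mentioned above.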
4

Scenario: The finance team says "we need a fraud detection model with a false positive rate below 0.1%." What is the challenge?

Extreme class imbalance (fraud rates are typically around 0.1% or lower). Solutions: 1) resampling (SMOTE, ADASYN), 2) cost-sensitive learning, 3) anomaly detection (Isolation Forest, autoencoders), 4) ensemble methods, 5) human-in-the-loop verification. Metric optimization: tune the precision-recall tradeoff against a business cost analysis.
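The precision-recall tradeoff under an FP-rate constraint comes down to threshold tuning on model scores; the synthetic labels and score distribution below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
y = (rng.uniform(size=n) < 0.001).astype(int)        # ~0.1% fraud rate
# Hypothetical model scores: frauds score higher on average.
scores = rng.beta(2, 8, size=n) + 0.5 * y

def fp_rate(threshold):
    pred = scores >= threshold
    neg = y == 0
    return (pred & neg).sum() / neg.sum()

# Sweep thresholds; fp_rate is non-increasing in the threshold, so the
# smallest threshold meeting the constraint maximizes recall.
candidates = [t for t in np.linspace(0, 1.5, 301) if fp_rate(t) < 0.001]
threshold = min(candidates)
recall = (scores[y == 1] >= threshold).mean()
```

With so few positives, the recall estimate itself is noisy, which is one more reason the human-in-the-loop verification step matters.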
5

Scenario: The product manager says "let's beat the competitors with an AI feature," but there is no data. What do you do?

Data-first approach: 1) develop a data strategy (internal data, external sources, synthetic data), 2) define an MVP scope (minimum viable model), 3) transfer learning (pretrained models), 4) set up human labeling, 5) iterative improvement. If the data acquisition cost exceeds the expected benefit, I propose an alternative (non-ML) solution.
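The cost-vs-benefit gate at the end can be a simple break-even calculation; every figure below is an illustrative assumption, not a benchmark:

```python
# Back-of-the-envelope: is a human-labeling effort worth it?
# All numbers are hypothetical inputs to the decision, not real estimates.
labels_needed = 50_000
cost_per_label = 0.08                                   # USD per annotation
labeling_cost = labels_needed * cost_per_label          # 4,000 USD

users_reached = 100_000
expected_conversion_lift = 0.004                        # 0.4 pp from the feature
value_per_converted_user = 120.0                        # USD, annual margin

expected_benefit = users_reached * expected_conversion_lift * value_per_converted_user
go = expected_benefit > labeling_cost
```

The point is less the arithmetic than forcing stakeholders to write down their assumed lift and value before any model is built.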
6

Scenario: Stakeholders want transparent model decisions, but you used a complex model (a neural network). What is the solution?

Explainability approach: 1) Global interpretability: feature importance, partial dependence plots. 2) Local interpretability: LIME or SHAP values for individual predictions. 3) Rule extraction: a decision-tree surrogate model. 4) Model comparison: weigh the simple-vs-complex performance tradeoff. I would also build an interactive explanation dashboard.
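Global interpretability can be done model-agnostically with permutation importance; a minimal numpy sketch, where `black_box` is a hypothetical stand-in for any opaque model:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(2000, 3))
y = 3 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(scale=0.1, size=2000)  # feature 2 is noise

def black_box(X):
    """Stand-in for any opaque model (e.g. a trained neural network)."""
    return 3 * X[:, 0] + 0.1 * X[:, 1]

def permutation_importance(model, X, y, n_repeats=5):
    """MSE increase when each feature is shuffled: higher = more important."""
    base = np.mean((model(X) - y) ** 2)
    imps = []
    for j in range(X.shape[1]):
        errs = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            errs.append(np.mean((model(Xp) - y) ** 2))
        imps.append(np.mean(errs) - base)
    return np.array(imps)

imp = permutation_importance(black_box, X, y)
```

Because it only needs predictions, the same function works unchanged whether the model behind it is a linear baseline or a deep network.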
7

Scenario: An e-commerce site needs real-time recommendations with a 50 ms latency requirement. What architecture do you propose?

Real-time architecture: 1) Offline: batch model training and feature engineering. 2) Online: precomputed recommendations, caching (Redis), model serving (TensorFlow Serving, TorchServe). 3) Fallback: popular items. 4) Monitoring: latency, cache hit rate. 5) Optimization: model compression (quantization, pruning) and approximate nearest neighbors.
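The caching-plus-fallback idea can be sketched in-process (as a stand-in for Redis); the item IDs and TTL below are hypothetical:

```python
import time

POPULAR_FALLBACK = ["item_42", "item_7", "item_13"]   # hypothetical popular items

class RecCache:
    """In-process stand-in for a Redis-style cache of precomputed recommendations."""
    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self.store = {}

    def put(self, user_id, recs):
        self.store[user_id] = (recs, time.monotonic() + self.ttl)

    def get(self, user_id):
        entry = self.store.get(user_id)
        if entry is None or time.monotonic() > entry[1]:
            return POPULAR_FALLBACK      # fallback keeps latency bounded
        return entry[0]

cache = RecCache(ttl_seconds=300)
cache.put("u1", ["item_3", "item_9"])
```

The key property for a 50 ms budget is that a cache miss never triggers a model call on the request path; it degrades to popular items and lets the offline job refill the cache.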
8

Scenario: A healthcare client asked for a "medical diagnosis AI." How do you handle the ethics and liability concerns?

Healthcare AI considerations: 1) regulatory compliance (HIPAA, FDA), 2) model validation (external datasets, clinical trials), 3) explainability (decision support for doctors, not a replacement), 4) bias detection (demographic parity), 5) human oversight (clinician-in-the-loop). Legal review and ethics-board approval are required.
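The demographic-parity check can be sketched in pure Python; the predictions and group labels below are hypothetical, and the four-fifths screening rule is a common convention, not a legal standard:

```python
def demographic_parity_ratio(preds, groups):
    """Ratio of positive-prediction rates between groups; near 1.0 means parity.
    A common (assumed) screening convention is the four-fifths rule: ratio >= 0.8."""
    rates = {}
    for g in set(groups):
        idx = [i for i, grp in enumerate(groups) if grp == g]
        rates[g] = sum(preds[i] for i in idx) / len(idx)
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical model outputs over two demographic groups.
preds  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
ratio, rates = demographic_parity_ratio(preds, groups)
```

In a clinical setting this is only a first screen; a low ratio is a signal to investigate, not a verdict, and the clinician-in-the-loop remains the final safeguard.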
9

Scenario: The marketing team asked for a campaign attribution model and says simple last-click is enough. What is your recommendation?

Attribution modeling: 1) Single-touch: last-click, first-click (simple but naive). 2) Multi-touch: linear, time-decay, position-based. 3) Data-driven: Markov chains, Shapley values. 4) Incrementality testing: randomized controlled trials. I would educate the marketing team on the options and propose a pilot implementation.
10

Scenario: Budget constraint: GPU resources are limited, but you need to train a large-scale model. What are your strategies?

Resource optimization: 1) transfer learning (fine-tuning pretrained models), 2) distributed training (data parallelism), 3) mixed-precision training (FP16), 4) model compression (pruning, quantization), 5) cloud spot instances. Also: Colab, Kaggle Kernels, academic partnerships. I would run a cost-benefit analysis first.
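The memory side of the FP16 and quantization points can be illustrated with numpy's float16; the layer size is hypothetical, and real post-training quantization involves more than a cast:

```python
import numpy as np

# Hypothetical weight matrix of one large layer.
# float16 halves memory versus float32 at the cost of precision.
weights_fp32 = np.random.default_rng(4).normal(size=(4096, 4096)).astype(np.float32)
weights_fp16 = weights_fp32.astype(np.float16)

def mib(a):
    return a.nbytes / 2**20

# Worst-case rounding error introduced by the fp32 -> fp16 cast.
max_abs_err = float(np.max(np.abs(weights_fp32 - weights_fp16.astype(np.float32))))
```

A 4096x4096 layer drops from 64 MiB to 32 MiB, and the same halving applies to activations and optimizer state in mixed-precision training, which is where the real GPU-memory savings come from.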