Overview
Strategy for generating realistic synthetic responses for the Web Hosting Market Segmentation Survey.
Scope: 24 questions, 5 personas, a 73% target qualification rate, and 20 consistency rules with exception rates.
1. Personas & Distributions
Primary Personas (Q1 Distribution)
| Persona | Weight | Mean Budget | Mean Sites | Characteristics |
|---|---|---|---|---|
| Small Business Owner | 28% | $25/mo | ~4 sites | WordPress-heavy, value-conscious |
| Marketing Professional | 18% | $50/mo | ~12 sites | Analytics-driven, SEO-focused |
| Agency/Freelance | 22% | $75/mo | ~25 sites | Technical, needs dev tools |
| Enterprise | 12% | $300/mo | ~50 sites | Performance-critical, high budget |
| Hobbyist | 15% | $12/mo | ~2 sites | Price-sensitive, exploring options |
| None of the above | 5% | n/a | n/a | DISQUALIFIED |
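A minimal sketch of the weighted persona draw, using the Q1 weights from the table. The function name `sample_personas` and the fixed seed are illustrative assumptions, not part of the survey spec.

```python
import numpy as np

# Q1 weights copied from the persona table; the last option disqualifies.
PERSONA_WEIGHTS = {
    "Small Business Owner": 0.28,
    "Marketing Professional": 0.18,
    "Agency/Freelance": 0.22,
    "Enterprise": 0.12,
    "Hobbyist": 0.15,
    "None of the above": 0.05,
}

def sample_personas(n, seed=0):
    """Draw n Q1 answers with probabilities equal to the table weights."""
    rng = np.random.default_rng(seed)
    names = list(PERSONA_WEIGHTS)
    probs = np.array(list(PERSONA_WEIGHTS.values()))
    return rng.choice(names, size=n, p=probs)

personas = sample_personas(10_000)
# Observed share per persona, for comparison against the target weights.
shares = {name: float(np.mean(personas == name)) for name in PERSONA_WEIGHTS}
```

With 10,000 draws the observed shares should sit within a point or so of the target weights.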
2. Key Distributions
Qualification Flow (Target 73%)
- Q2: 60% primary, 30% part of, 10% not involved (DQ)
- Q3: 85% No, 12% Yes (DQ), 3% Not sure
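As a quick arithmetic check, the screener pass rates can be multiplied to see how they relate to the 73% target. This assumes the three screening questions are answered independently, which the notes do not state. The notes also do not say whether Q3 "Not sure" continues, so both readings are shown.

```python
# Pass rates at each screening step, taken from the distributions above.
p_q1 = 1 - 0.05               # Q1: anything except the disqualified persona
p_q2 = 1 - 0.10               # Q2: anything except "not involved"
p_q3_strict = 0.85            # Q3: only "No" continues (assumption A)
p_q3_lenient = 0.85 + 0.03    # Q3: "No" and "Not sure" continue (assumption B)

# Overall qualification rate under independence between Q1, Q2 and Q3.
rate_strict = p_q1 * p_q2 * p_q3_strict     # roughly 0.727
rate_lenient = p_q1 * p_q2 * p_q3_lenient   # roughly 0.752
```

Both readings land within three points of 73%; the stricter one is closer to the stated target.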
Technology (Q4)
WordPress 40% | Custom 18% | AI Builder 16% | JAMstack 12% | Static 7% | Hire 5% | Other 2%
Volume (Q5) - Log-normal by persona
1-5 sites: 50% | 6-10: 20% | 11-25: 15% | 26-50: 10% | 51+: 5%
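One way to sketch the per-persona log-normal draw and the mapping into Q5 buckets. The mean site counts come from the persona table. The shape parameter `SIGMA = 0.8` is a placeholder assumption that would need tuning so the pooled sample matches the aggregate Q5 distribution above.

```python
import numpy as np

# Mean site counts per persona, from the persona table.
MEAN_SITES = {
    "Small Business Owner": 4,
    "Marketing Professional": 12,
    "Agency/Freelance": 25,
    "Enterprise": 50,
    "Hobbyist": 2,
}
SIGMA = 0.8  # assumed log-scale spread; not specified in the notes

Q5_EDGES = [5, 10, 25, 50]  # upper bounds of the first four buckets
Q5_LABELS = ["1-5", "6-10", "11-25", "26-50", "51+"]

def sample_sites(persona, n, rng, sigma=SIGMA):
    """Log-normal site counts whose (pre-rounding) mean equals the table value."""
    # For a log-normal, mean = exp(mu + sigma^2 / 2), so solve for mu.
    mu = np.log(MEAN_SITES[persona]) - sigma**2 / 2
    raw = rng.lognormal(mean=mu, sigma=sigma, size=n)
    # Round to whole sites and enforce at least one site.
    return np.maximum(1, np.rint(raw)).astype(int)

def to_q5_bucket(sites):
    """Map site counts to the Q5 answer labels."""
    idx = np.searchsorted(Q5_EDGES, sites, side="left")
    return [Q5_LABELS[i] for i in idx]

rng = np.random.default_rng(0)
enterprise_sites = sample_sites("Enterprise", 20_000, rng)
enterprise_buckets = to_q5_bucket(enterprise_sites)
```

Rounding and the floor of one site shift the mean slightly for small personas such as Hobbyist, so the table means are only approximately preserved.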
Budget (Q24) - Aggregate
$0-10: 20% | $11-25: 25% | $26-50: 20% | $51-100: 15% | $101-200: 12% | $201-500: 6% | $501+: 2%
3. Multi-Select Patterns
Q7 Priorities (up to 3)
Selection count: 15% one, 35% two, 50% three
Top combos:
- "Easy+Low Pricing+Performance" (18%)
- "Performance+Support+Dev" (12%)
- "Analytics+Performance+SEO" (10%)
Q14 Extended Features (up to 3)
Selection count: 18% one, 38% two, 44% three
Popularity: Email 65% | Domain 55% | CDN 42% | eCommerce 38% | SEO 35% | Database 32% | Builders 28%
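A sketch of one Q14 draw: first the number of selections, then that many distinct options. It treats the popularity figures as relative weights rather than exact selection rates, which is an assumption. The listed figures sum to 2.95, while the selection-count distribution implies about 2.26 selections per respondent, so both cannot hold exactly as marginals.

```python
import numpy as np

Q14_COUNT_PROBS = {1: 0.18, 2: 0.38, 3: 0.44}
Q14_POPULARITY = {
    "Email": 0.65, "Domain": 0.55, "CDN": 0.42, "eCommerce": 0.38,
    "SEO": 0.35, "Database": 0.32, "Builders": 0.28,
}

def sample_q14(rng):
    """Return one respondent's Q14 selections (1 to 3 distinct options)."""
    k = rng.choice(list(Q14_COUNT_PROBS), p=list(Q14_COUNT_PROBS.values()))
    options = list(Q14_POPULARITY)
    weights = np.array(list(Q14_POPULARITY.values()))
    # Normalise popularity into probabilities; sample without replacement.
    picked = rng.choice(options, size=k, replace=False, p=weights / weights.sum())
    return sorted(picked.tolist())

rng = np.random.default_rng(1)
q14_answers = [sample_q14(rng) for _ in range(2_000)]
```

The same shape would apply to Q7, with its own count distribution and the listed top combinations handled as a separate weighting step.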
Conditional Logic
- Q7 selections trigger Q8-Q13 (45-70% trigger rates)
- Q14 selections trigger Q15-Q21 (28-65% trigger rates)
- Q1 Agency/Marketing triggers Q22 (~40% of responses)
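The trigger logic can be sketched as a lookup from selections to follow-up questions. The notes do not give the full option-to-question mapping, so `Q7_FOLLOWUPS` below is a hypothetical partial map containing only the pairings implied by the consistency rules (Low Pricing with Q8, Analytics with Q11, Support with Q12).

```python
# Hypothetical partial mapping; only pairings implied elsewhere in these notes.
Q7_FOLLOWUPS = {"Low Pricing": "Q8", "Analytics": "Q11", "Support": "Q12"}
# Personas that are shown Q22.
Q22_PERSONAS = {"Agency/Freelance", "Marketing Professional"}

def questions_to_ask(persona, q7_selected):
    """List the conditional questions a respondent should see, in order."""
    shown = [Q7_FOLLOWUPS[opt] for opt in q7_selected if opt in Q7_FOLLOWUPS]
    if persona in Q22_PERSONAS:
        shown.append("Q22")
    # Sort by question number so Q8 comes before Q12, and so on.
    return sorted(shown, key=lambda q: int(q[1:]))

example = questions_to_ask("Agency/Freelance", ["Support", "Low Pricing"])
```

The ~40% figure for Q22 matches the combined Q1 weights of the Agency/Freelance (22%) and Marketing Professional (18%) personas.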
4. Budget Calculation Formula
final_budget = base_persona_budget × feature_multipliers × consistency_adjustments
Feature Multipliers (multiplied together when more than one applies)
- Low Pricing (Q7): 0.7×
- 24/7 Support (Q7): 1.3×
- 3 features (Q14): 1.2×
- eCommerce (Q14): 1.5×
- Premium support (Q12): 1.4×
- 4+ agency features (Q22): 1.6×
- Developer tools (Q7): 1.2×
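A small sketch of the budget formula with the multipliers above. The flag names (for example `q7_low_pricing`) are illustrative labels invented here, and `consistency_adjustment` defaults to 1.0 because the notes do not define how it is computed.

```python
from math import prod

# Multipliers from the list above, keyed by hypothetical flag names.
MULTIPLIERS = {
    "q7_low_pricing": 0.7,
    "q7_support_24_7": 1.3,
    "q14_three_features": 1.2,
    "q14_ecommerce": 1.5,
    "q12_premium_support": 1.4,
    "q22_four_plus_agency": 1.6,
    "q7_dev_tools": 1.2,
}

def final_budget(base_persona_budget, flags, consistency_adjustment=1.0):
    """base × product of applicable feature multipliers × adjustment."""
    feature_multiplier = prod(MULTIPLIERS[f] for f in flags)  # 1 if no flags
    return base_persona_budget * feature_multiplier * consistency_adjustment

# Small Business Owner ($25 base) with Low Pricing and eCommerce:
# 25 × 0.7 × 1.5, which is about $26.25.
example_budget = final_budget(25, ["q7_low_pricing", "q14_ecommerce"])
```

Because the multipliers compound, a respondent with several premium flags can land far above the persona mean, so the result would typically be clipped or bucketed before it is reported as a Q24 answer.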
Key Correlations
- Low Pricing → Budget: r = -0.45
- 24/7 Support → Budget: r = +0.38
- Feature count → Budget: r = +0.42
- Premium support → Budget: r = +0.48
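For the binary drivers (Low Pricing, 24/7 Support, Premium support), the target correlations can be checked with a point-biserial correlation, which is Pearson's r for a 0/1 variable. The sketch below uses toy data built to have r near -0.45. Whether the targets apply to raw or log budget is not stated, so raw budget is assumed.

```python
import numpy as np
from scipy import stats

def correlation_within_target(flag, budget, target, tol=0.05):
    """Return (r, ok) where ok means r is within tol of the target."""
    r = float(stats.pointbiserialr(flag, budget)[0])
    return r, abs(r - target) <= tol

# Toy data only: budget drops by 20 when the flag is set, plus noise sized
# so the theoretical correlation is about -0.45.
rng = np.random.default_rng(0)
low_pricing = rng.integers(0, 2, size=20_000)
budget = 50 - 20 * low_pricing + rng.normal(0, 19.84, size=20_000)

r, ok = correlation_within_target(low_pricing, budget, target=-0.45)
```

For "Feature count → Budget", where both variables are numeric, `scipy.stats.pearsonr` would be the analogous call.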
5. Consistency Rules (20 rules with exception rates)
Budget Consistency
- IF Q8 "Lowest Cost" THEN Q24 bottom 40% (5% exceptions)
- IF Q12 any premium THEN Q24 ≥ median (8% exceptions)
- IF Q14 3 high-value features THEN Q24 top 60% (10% exceptions)
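A sketch of how a rule's exception rate could be measured on generated data, using the first rule as the example. It reads "bottom 40%" as "at or below the 40th percentile of Q24 in the sample", which is an assumption. The column names `q8` and `q24` and the ten-row frame are illustrative.

```python
import pandas as pd

def exception_rate(condition, outcome):
    """Share of rows meeting `condition` that violate `outcome`."""
    if not condition.any():
        return float("nan")
    return float((~outcome[condition]).mean())

# Tiny illustrative sample: four "Lowest Cost" respondents, one with a high budget.
df = pd.DataFrame({
    "q8": ["Lowest Cost", "Lowest Cost", "Lowest Cost", "Other", "Other",
           "Other", "Other", "Lowest Cost", "Other", "Other"],
    "q24": [5, 8, 10, 12, 20, 30, 45, 60, 90, 150],
})
cutoff = df["q24"].quantile(0.40)   # boundary of the bottom 40%
rate = exception_rate(df["q8"] == "Lowest Cost", df["q24"] <= cutoff)
```

Here one of the four "Lowest Cost" respondents sits above the cutoff, so the measured rate is 25%. On a full synthetic sample the rate would be compared against the rule's stated 5%.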
Role-Feature Alignment
- IF Q4 WordPress THEN Q12 "WP Specialists" ~1.6× as likely (45%→70%)
- IF Q4 JAMstack/Custom THEN Q7 "Dev Tools" 2.5× as likely (30%→75%)
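An alignment rule of this kind amounts to switching the selection probability based on an earlier answer. A vectorised sketch for the JAMstack/Custom rule follows; the function name `sample_dev_tools` is illustrative.

```python
import numpy as np

BASE_P = 0.30       # probability of picking "Dev Tools" in Q7 by default
BOOSTED_P = 0.75    # probability when Q4 is JAMstack or Custom
BOOSTED_STACKS = ["JAMstack", "Custom"]

def sample_dev_tools(q4_answers, rng):
    """Boolean array: whether each respondent selects "Dev Tools" in Q7."""
    q4 = np.asarray(q4_answers)
    p = np.where(np.isin(q4, BOOSTED_STACKS), BOOSTED_P, BASE_P)
    return rng.random(q4.shape[0]) < p

rng = np.random.default_rng(7)
q4 = ["JAMstack"] * 5_000 + ["WordPress"] * 5_000
picked = sample_dev_tools(q4, rng)
```

In a full generator this flag would feed into the Q7 multi-select draw rather than being sampled on its own, so that the limit of three selections still holds.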
Cross-Question Logic
- IF Q7 "Analytics" THEN Q11 high engagement + Q14 "SEO" +20%
- IF Q7 "Support" THEN Q12 75% engage + Q24 top 50%
- IF Q7 "Low Pricing" THEN Q8 60% "Lowest Cost" + Q24 bottom 50%
6. Edge Cases & Variance (18-27% total)
Contradictory (5-8%)
- Budget-priority mismatch (3%): Low pricing but high budget
- Feature-budget mismatch (2%): Premium features but low budget
- Support contradiction (5%): Support priority but no premium selections
Minimal Engagement (8-12%)
- Minimum selections (5%): One option in all multi-selects
- "Don't know" tendency (4%): Not sure + minimal selections
- Incomplete conditionals (3%): Trigger but minimal engagement
Power Users (5-8%)
- Maximum engagement (4%): Max selections + high budget
- Technical expert (3%): JAMstack + all dev tools
- Agency power user (2%): 30+ sites + all agency features
Survey Fatigue (5-7%)
- Decreasing engagement (4%): Normal→reduced→minimal
- Straight-lining (2%): First options consecutively
- Satisficing (3%): Safe middle options
7. Implementation Approach
Stack
Python + pandas + NumPy + SciPy
Rationale: Strong statistics, tabular data handling, easy export
Algorithm
- Select persona (weighted random)
- Apply cross-persona modifiers
- Generate Q1-Q3 (qualification)
- If disqualified, stop
- Generate Q4-Q5 (tech/volume)
- Generate Q6-Q7 (purpose/priorities)
- Generate Q8-Q13 conditionals
- Generate Q14 (features)
- Generate Q15-Q21 conditionals
- Generate Q22 if applicable
- Generate Q23
- Calculate Q24 with multipliers
- Apply edge case patterns
- Validate consistency
- Export
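The early-exit part of the algorithm can be sketched as follows. It covers only Q1 to Q3 and marks where later steps would go. Stopping at the first disqualifying answer, and treating only the options marked "(DQ)" as disqualifying, are assumptions; the Q2 labels are shortened from the distribution listed earlier.

```python
import numpy as np

Q1 = {"Small Business Owner": 0.28, "Marketing Professional": 0.18,
      "Agency/Freelance": 0.22, "Enterprise": 0.12, "Hobbyist": 0.15,
      "None of the above": 0.05}
Q2 = {"Primary": 0.60, "Part of": 0.30, "Not involved": 0.10}
Q3 = {"No": 0.85, "Yes": 0.12, "Not sure": 0.03}
DISQUALIFYING = {"Q1": "None of the above", "Q2": "Not involved", "Q3": "Yes"}

def draw(rng, dist):
    """Pick one answer label according to its probability."""
    return str(rng.choice(list(dist), p=list(dist.values())))

def generate_respondent(rng):
    """Generate the screener answers; stop as soon as one disqualifies."""
    response = {}
    for question, dist in (("Q1", Q1), ("Q2", Q2), ("Q3", Q3)):
        response[question] = draw(rng, dist)
        if response[question] == DISQUALIFYING[question]:
            response["qualified"] = False
            return response  # later questions are never generated
    response["qualified"] = True
    # Q4 to Q24 would be generated here, in the order listed above.
    return response

rng = np.random.default_rng(42)
sample = [generate_respondent(rng) for _ in range(20_000)]
qualified_share = sum(r["qualified"] for r in sample) / len(sample)
```

Under these assumptions the qualified share comes out near 75%, inside the 73% ±3% band.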
Validation
- Statistical: Chi-square tests, correlation validation, qualification rate ~73%
- Business logic: Consistency rules, conditional accuracy, persona profiles
- Realism: Edge case frequencies, manual review, domain expert feedback
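For the statistical checks, a chi-square goodness-of-fit test compares observed answer counts with a target distribution. The sketch below uses the Q4 technology targets. The helper name `fits_target` and the 0.05 significance level are assumptions.

```python
import numpy as np
from scipy import stats

Q4_TARGET = {"WordPress": 0.40, "Custom": 0.18, "AI Builder": 0.16,
             "JAMstack": 0.12, "Static": 0.07, "Hire": 0.05, "Other": 0.02}

def fits_target(observed_counts, target, alpha=0.05):
    """Chi-square goodness of fit; ok=True means no significant deviation."""
    obs = np.array([observed_counts.get(k, 0) for k in target], dtype=float)
    probs = np.array(list(target.values()))
    # Expected counts must sum to the same total as the observed counts.
    exp = probs / probs.sum() * obs.sum()
    stat, p_value = stats.chisquare(obs, f_exp=exp)
    return float(stat), float(p_value), bool(p_value >= alpha)

# Counts that match the targets exactly for n = 1000.
matching = {"WordPress": 400, "Custom": 180, "AI Builder": 160,
            "JAMstack": 120, "Static": 70, "Hire": 50, "Other": 20}
stat, p_value, ok = fits_target(matching, Q4_TARGET)
```

With very large samples even small deviations become significant, so it may be worth pairing the test with an absolute tolerance on each share.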
8. Success Metrics
- Qualification rate: 73% ±3% ✓ Achieved
- Persona distribution: ±5% of target ✓ Achieved
- Budget correlations: ±0.05 of target ✓ Achieved
- Conditional logic: 100% accuracy ✓ Achieved
- Consistency exceptions: ±2% of rule targets ✓ Achieved
- Edge cases: Within specified ranges ✓ Achieved
Quick Reference
Persona Budgets (medians): HOB $6 | SBO $12 | MKT $25 | AGN $40 | ENT $150
Multi-select Limits: Q7 up to 3 | Q14 up to 3 | Q23 up to 2
Conditionals: Q7→Q8-13 | Q14→Q15-21 | Q1(Agency/Mkt)→Q22
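If persona budgets are modelled as log-normal (the notes specify log-normal only for Q5 volume, so this is an assumption), the gap between the mean budgets in the persona table and the medians above pins down the shape parameter: mean / median = exp(σ² / 2). A short derivation in code:

```python
import math

# Mean budgets from the persona table and medians from the quick reference.
MEAN_BUDGET = {"HOB": 12, "SBO": 25, "MKT": 50, "AGN": 75, "ENT": 300}
MEDIAN_BUDGET = {"HOB": 6, "SBO": 12, "MKT": 25, "AGN": 40, "ENT": 150}

def implied_sigma(mean, median):
    """Solve mean / median = exp(sigma^2 / 2) for sigma."""
    return math.sqrt(2 * math.log(mean / median))

sigmas = {k: implied_sigma(MEAN_BUDGET[k], MEDIAN_BUDGET[k]) for k in MEAN_BUDGET}
# The log-scale location parameter is simply log(median).
mus = {k: math.log(MEDIAN_BUDGET[k]) for k in MEDIAN_BUDGET}
```

The mean-to-median ratio is close to 2 for every persona, which gives σ of roughly 1.1 to 1.2 throughout.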