Skip to main content

C++ Engine Architecture

The C++ engine provides high-performance calculations for computationally intensive genetics operations.

Why C++?

MetricPythonC++ (Eigen)Speedup
10K SNPs GWAS60s3s20x
Matrix inversion (1000x1000)2s0.1s20x
DNA generation (1M bp)5s0.5s10x

Key Technologies

  • Eigen - Fast linear algebra (matrix operations, regression)
  • OpenMP - Parallel processing across multiple CPU cores
  • JSON11 - Lightweight JSON parsing

Architecture

┌─────────────────────────────────────────────────────────────────┐
│ Python Backend (FastAPI) │
│ │
│ gwas_engine.py ──────────────────────────────────────────────┐ │
│ │ │ │
│ │ subprocess.run([cli_path], input=json, stdout=PIPE) │ │
│ │ │ │
│ ▼ │ │
│ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ C++ CLI Process │ │ │
│ │ │ │ │
│ │ stdin (JSON) ─────► [Computation] ─────► stdout (JSON) │ │ │
│ │ │ │ │
│ └─────────────────────────────────────────────────────────┘ │ │
│ │ │
│ ◄────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ Return result to API │
└──────────────────────────────────────────────────────────────────┘

Available CLIs

CLIPurposeInputOutput
zyg_cross_cliPunnett squares, genetic crossesParent genotypesOffspring ratios
zyg_gwas_cliGWAS statistical analysisSNPs + phenotypesp-values, beta
zyg_protein_cliProtein translationRNA sequenceAmino acid chain
zyg_parallel_dna_cliLarge DNA generationLength, paramsDNA sequence

Directory Structure

zygotrix_engine_cpp/
├── src/
│ ├── Engine.cpp # Main engine class
│ ├── MendelianCalculator.cpp # Punnett square logic
│ ├── GwasCli.cpp # GWAS CLI entry point
│ ├── gwas/
│ │ ├── GwasAnalyzer.cpp # GWAS analysis engine
│ │ └── LinearRegression.cpp # Statistical tests
│ └── ...
├── include/
│ ├── Engine.hpp
│ ├── gwas/
│ │ ├── GwasTypes.hpp # GWAS data structures
│ │ └── GwasAnalyzer.hpp
│ └── ...
├── third_party/
│ ├── eigen/ # Eigen linear algebra library
│ └── json11/ # JSON parsing
├── build/ # Compiled binaries
│ ├── zyg_cross_cli(.exe)
│ ├── zyg_gwas_cli(.exe)
│ └── ...
└── CMakeLists.txt # Build configuration

GWAS Engine Details

Input Format

{
"snps": [
{
"rsid": "rs1234567",
"chromosome": 1,
"position": 123456,
"ref_allele": "A",
"alt_allele": "G"
}
],
"samples": [
{
"sample_id": "s1",
"phenotype": 10.5,
"genotypes": [0],
"covariates": []
}
],
"test_type": "linear",
"maf_threshold": 0.01,
"num_threads": 4
}

Output Format

{
"success": true,
"results": [
{
"rsid": "rs1234567",
"chromosome": 1,
"position": 123456,
"beta": 2.65,
"se": 0.35,
"p_value": 0.001,
"maf": 0.25
}
],
"snps_tested": 1000,
"snps_filtered": 50,
"execution_time_ms": 1523.4
}

Statistical Tests

TestUse CaseFormula
LinearContinuous phenotypesY = β₀ + β₁X + ε
LogisticBinary phenotypes (case/control)log(p/(1-p)) = β₀ + β₁X
Chi-SquareCategorical analysisχ² = Σ(O-E)²/E

Building the Engine

Prerequisites

# Ubuntu/Debian
sudo apt install cmake build-essential libeigen3-dev libomp-dev

# macOS
brew install cmake eigen libomp

# Windows (MSYS2/MinGW)
pacman -S mingw-w64-x86_64-cmake mingw-w64-x86_64-eigen3

Build Commands

cd zygotrix_engine_cpp

# Configure
cmake -B build -S . -DCMAKE_BUILD_TYPE=Release

# Build
cmake --build build

# Verify
./build/zyg_gwas_cli --version

Build Options

OptionDefaultDescription
CMAKE_BUILD_TYPEDebugRelease for production
WITH_OPENMPONEnable parallel processing
BUILD_TESTSOFFBuild unit tests

Python Integration

gwas_engine.py

def _load_gwas_cli_path() -> Path:
"""Auto-discover GWAS CLI path."""
settings = get_settings()
path = Path(settings.cpp_gwas_cli_path).resolve()

if path.exists():
return path

raise HTTPException(
status_code=500,
detail="C++ GWAS CLI not found. Build the engine first."
)

def run_gwas_analysis(snps, samples, analysis_type, ...):
cli_path = _load_gwas_cli_path()

# Prepare JSON input
request_data = {
"snps": snps,
"samples": samples,
"test_type": analysis_type.value,
"num_threads": num_threads
}

# Run subprocess
completed = subprocess.run(
[str(cli_path)],
input=json.dumps(request_data).encode(),
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
timeout=600
)

return json.loads(completed.stdout)

Performance Optimization

OpenMP Parallelization

#pragma omp parallel for schedule(dynamic)
for (size_t i = 0; i < snps.size(); ++i) {
results[i] = runLinearRegression(snps[i], phenotypes);
}

Eigen Optimizations

// Use Eigen's optimized matrix operations
Eigen::MatrixXd X = genotypes.cast<double>();
Eigen::VectorXd Y = phenotypes;

// Solve linear regression: β = (X'X)^(-1) X'Y
Eigen::VectorXd beta = (X.transpose() * X)
.ldlt() // Cholesky decomposition
.solve(X.transpose() * Y);

Error Handling

C++ Side

try {
auto result = analyzer.run(input);
std::cout << result.toJson().dump() << std::endl;
} catch (const std::exception& e) {
json11::Json error = json11::Json::object {
{"success", false},
{"error", e.what()}
};
std::cerr << error.dump() << std::endl;
return 1;
}

Python Side

if completed.returncode != 0:
detail = completed.stderr.decode().strip()
raise HTTPException(status_code=500, detail=detail)

Troubleshooting

"Eigen/Dense not found"

sudo apt install libeigen3-dev
# Or
cd third_party && git clone https://gitlab.com/libeigen/eigen.git

"OpenMP not found"

Build will succeed without OpenMP (single-threaded). To enable:

sudo apt install libomp-dev

Slow performance

  • Ensure CMAKE_BUILD_TYPE=Release
  • Check OpenMP is enabled: look for "Found OpenMP" in CMake output
  • Increase num_threads parameter

Next Steps