Platform Documentation

NexBI Architecture & Guide

A comprehensive deep-dive into NexBI's 8 core modules, Semantic Layer knowledge graph, integration ecosystem, and military-grade security infrastructure.

Platform Architecture & Deployment Mechanics

Data Integration & Automated Semantic Modeling

NexBI connects directly to your existing infrastructure, ensuring data sovereignty without requiring a massive warehouse migration. Under the hood, we leverage a high-throughput proprietary ingestion pipeline, maintaining strict tracking of extraction lineage (_nexbi_extracted_at, _nexbi_meta) to guarantee data provenance.

Within the Data Studio, engineers can visualize raw tables, automatically infer Primary/Foreign Key relationships across disparate schemas (e.g., purchases.user_id -> users.id), and define computed columns or row-level filters prior to ingestion. This guarantees the neuro-symbolic engine interacts with a strongly-typed, deterministic semantic layer rather than raw, chaotic database architectures.
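To make the relationship inference concrete, here is a minimal Python sketch of one naming-convention heuristic that would recover a link such as purchases.user_id -> users.id. The function name infer_foreign_keys and the pluralization rule are assumptions made for illustration; the source does not describe how NexBI infers relationships internally.

```python
from typing import Dict, List, Tuple


def infer_foreign_keys(schema: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Guess FK -> PK links from column names such as `user_id` -> `users.id`.

    Hypothetical heuristic: a column named `<stem>_id` is linked to a table
    named `<stem>s` or `<stem>` when that table has an `id` column.
    """
    links = []
    for table, columns in schema.items():
        for column in columns:
            if not column.endswith("_id"):
                continue
            stem = column[: -len("_id")]
            # Try the plural table name first, then the singular one.
            for candidate in (stem + "s", stem):
                if candidate != table and "id" in schema.get(candidate, []):
                    links.append((f"{table}.{column}", f"{candidate}.id"))
                    break
    return links


schema = {
    "users": ["id", "email"],
    "purchases": ["id", "user_id", "amount"],
}
print(infer_foreign_keys(schema))
```

A production implementation would likely also compare column types and sampled values, since naming conventions alone produce false positives.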

Pipeline & Integration Specifications

Extraction: Standardized metadata integration for complete traceability.
Transformation (ELT): Visual interface for pre-ingestion row filtering and computed column definitions.
Context Management: Modular Knowledge Collections with explicit # reference invocation.
Execution: Isolated environments for the Code Interpreter and ML Notebook to ensure secure, deterministic analytical outputs.
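As an illustration of extraction lineage, the sketch below stamps each extracted row with the _nexbi_extracted_at and _nexbi_meta columns named earlier. The helper stamp_lineage and the JSON layout stored in _nexbi_meta are assumptions; the source does not specify what these columns contain.

```python
import json
from datetime import datetime, timezone


def stamp_lineage(rows, source, extracted_at=None):
    """Return copies of `rows` with lineage columns attached.

    `source` is a free-form identifier of the upstream table (assumed format).
    """
    extracted_at = extracted_at or datetime.now(timezone.utc)
    meta = json.dumps({"source": source}, sort_keys=True)
    return [
        {**row, "_nexbi_extracted_at": extracted_at.isoformat(), "_nexbi_meta": meta}
        for row in rows
    ]


rows = [{"id": 1, "amount": 42.0}]
when = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(stamp_lineage(rows, "public.fact_sales_events", extracted_at=when))
```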

The 8 Core Modules

Data Studio

Explore your data as a living knowledge base, queryable in natural language by anyone in your organization.

Architecture & Role Specifications

Transformation Engine: Visual pipeline mapping directly to underlying DAG execution nodes.
Entity Resolution: Automated schema inference mapping abstract schemas to business entities.
Type-safe Operations: Pre-computation of scalar values and type enforcement at ingestion.

Secure ML Studio

A sandboxed MLOps workspace to build, train, and deploy predictive models in a secure, isolated environment.

Architecture & Role Specifications

Sandboxed Execution: Ephemeral runtime environments isolated from internet-facing networks.
Zero-Data Movement: Models are trained and evaluated directly against local clusters.
Model Lineage: Strict versioning of serialized weights and algorithmic hyper-parameters.

Anomaly Detection

Continuous monitoring of your parameters with smart alerts — detect deviations before they become incidents.

Architecture & Role Specifications

Statistical Baselines: Continuous deviation tracking using bounded confidence intervals.
Threshold Heuristics: Programmable logical triggers acting on high-velocity data streams.
Alert Routing: Role-based dispatching of incidents through defined webhook channels.
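A statistical baseline of this kind can be sketched in a few lines of Python. The example flags values that fall outside the mean plus or minus z standard deviations of a historical window. The function name find_deviations and the choice of z = 3 are illustrative assumptions, not NexBI's documented detection method.

```python
from statistics import mean, stdev


def find_deviations(history, new_values, z=3.0):
    """Flag values outside mean +/- z * stdev of the historical baseline."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    lower, upper = mu - z * sigma, mu + z * sigma
    return [v for v in new_values if v < lower or v > upper]


history = [10.0, 10.2, 9.8, 10.1, 9.9]
print(find_deviations(history, [10.0, 12.5, 9.7]))
```

With this history the band is roughly 9.53 to 10.47, so only 12.5 is reported. A streaming deployment would typically recompute the baseline over a rolling window.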

Regulatory Watch

Automated tracking of sector-specific regulations, standards, and compliance updates integrated into your knowledge base.

Architecture & Role Specifications

Semantic Indexing: Continuous processing of unstructured compliance literature.
Policy Cross-referencing: Automated flagging of database schemas against indexed policy constraints.
Audit Immutability: Tamper-evident logs of all system configurations and data access.
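One common way to make a log tamper-evident is to chain entries by hash, so that altering any earlier record invalidates every later one. The sketch below shows that technique with Python's standard library. It is an illustration of the general idea, and the entry layout and function names are assumptions rather than NexBI's actual log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _digest(event, prev_hash):
    payload = json.dumps({"event": event, "prev": prev_hash}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log, event):
    """Append `event`, linking it to the hash of the previous entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(event, prev_hash)})


def verify(log):
    """Recompute the chain and report whether every link is intact."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(entry["event"], prev_hash):
            return False
        prev_hash = entry["hash"]
    return True


audit_log = []
append_entry(audit_log, "config: retention set to 90 days")
append_entry(audit_log, "read: purchases by session s-01")
print(verify(audit_log))
```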

Conversational Search

Query your internal documents — standards, audit reports, technical sheets — by simple conversation in Arabic, French, or English.

Architecture & Role Specifications

Intent Parsing: Neuro-symbolic resolution of ambiguous queries against enterprise taxonomy.
Hybrid Search: Dense vector embeddings combined with sparse keyword indexing.
Contextual Sandboxing: Search scope bounded explicitly by user's permission parameters.
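The sketch below illustrates how dense and sparse result lists could be merged while respecting a user's permissions. It uses reciprocal rank fusion, a widely used fusion method chosen here for illustration; the source does not state which fusion strategy NexBI applies. The document IDs and the allowed set are hypothetical.

```python
def fuse_rankings(rankings, allowed, k=60):
    """Merge ranked lists with reciprocal rank fusion, restricted to `allowed` IDs."""
    scores = {}
    for ranking in rankings:
        # Drop documents the user may not see before ranks are assigned.
        visible = [doc_id for doc_id in ranking if doc_id in allowed]
        for rank, doc_id in enumerate(visible, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a vector index
sparse_hits = ["doc_b", "doc_d", "doc_c"]  # e.g. from a keyword index
allowed = {"doc_a", "doc_b", "doc_c"}      # doc_d is outside the user's perimeter
print(fuse_rankings([dense_hits, sparse_hits], allowed))
```

Filtering before scoring means a hidden document can never influence the ordering of visible ones.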

Auto Reports

Generate executive summaries and PowerPoint presentations directly from your data, eliminating manual formatting.

Architecture & Role Specifications

Data-Driven Compilation: Dynamic injection of aggregated metrics into structured template files.
Scheduled Aggregation: Time-based orchestration of intensive aggregations without locking tables.
Multi-Format Export: Native serialization to application-specific buffers (PDF, PPTX).

Access Control

Granular role-based access — each department sees only their data perimeter, with no compromise on security.

Architecture & Role Specifications

Row-Level Sandboxing: Hard isolation of records at the query compiler level.
Identity Synchronization: Automated parsing of corporate directories for permission propagation.
Comprehensive Auditing: Persistent logging of all read/write actions mapped to session IDs.
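Row-level isolation at compile time can be pictured as a compiler that always appends a mandatory, parameterized filter that the caller cannot omit. The following sketch uses Python's built-in sqlite3 module. The measurements table, the department column, and compile_scoped_select are hypothetical names introduced only for this example.

```python
import sqlite3

# Hypothetical catalog of tables and the columns a query may reference.
ALLOWED_COLUMNS = {"measurements": {"id", "department", "value"}}


def compile_scoped_select(table, columns, department):
    """Build a SELECT that is always restricted to the caller's department."""
    if table not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown table: {table}")
    unknown = set(columns) - ALLOWED_COLUMNS[table]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    # Identifiers come from the allowlist; the filter value is a bound parameter.
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE department = ?"
    return sql, (department,)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (id INTEGER, department TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [(1, "lab", 0.5), (2, "production", 7.0), (3, "lab", 0.9)],
)
sql, params = compile_scoped_select("measurements", ["id", "value"], "lab")
rows = conn.execute(sql, params).fetchall()
print(sql, rows)
```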

Native Integrations

Connect to SAP, Oracle, Sage, IBM DB2, Odoo and more — without disruption to your existing systems.

Architecture & Role Specifications

Connection Abstraction: Unified interface masking the complexities of vendor-specific APIs.
Schema Drift Resilience: Graceful handling of upstream table modifications via semantic mapping.
High-Throughput Sync: Optimized parallel chunking for bulk data transfers.
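Parallel chunking can be illustrated with a thread pool that reads fixed-size row ranges concurrently and reassembles them in order. This is a generic sketch: fetch_chunk is a stand-in for a real source read, and the chunk size and worker count are arbitrary assumptions rather than NexBI settings.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(total_rows, chunk_size):
    """Split [0, total_rows) into half-open (start, end) ranges."""
    return [
        (start, min(start + chunk_size, total_rows))
        for start in range(0, total_rows, chunk_size)
    ]


def fetch_chunk(bounds):
    """Stand-in for reading rows `start..end-1` from a source system."""
    start, end = bounds
    return list(range(start, end))


def parallel_sync(total_rows, chunk_size, workers=4):
    """Fetch all chunks concurrently; `map` yields results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = pool.map(fetch_chunk, chunk_ranges(total_rows, chunk_size))
        return [row for chunk in chunks for row in chunk]


print(chunk_ranges(10, 4))
print(parallel_sync(10, 4))
```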

Interactive Capabilities

The conversational interface serves as a unified gateway to the entire semantic layer. Instead of requiring users to navigate complex BI dashboards or write custom SQL, NexBI dynamically generates the exact interface required to answer the query. Whether it's rendering a financial forecast, extracting data from a compliance PDF, or building a new machine learning model, the system provisions the right analytical environment in real-time.

Stateful NLP to SQL/API Routing

The neuro-symbolic engine parses complex linguistic intents, resolving ambiguities against the enterprise knowledge graph before compiling deterministic SQL queries or internal API calls. Because every query is validated against the live schema before execution, it avoids the class of text-to-SQL errors that come from referencing tables or columns that do not exist.

Abstract Syntax Tree (AST) query validation
Role-Based Access Control (RBAC) enforced at compilation
Multi-turn context retention
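The idea of validating a query tree before compiling it can be sketched as follows. The example models a query as a tiny abstract syntax tree, checks every referenced table and column against a schema, and only then emits SQL. The Select node, SCHEMA contents, and function names are assumptions for illustration; NexBI's internal representation is not described in the source.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical schema: table name -> set of column names.
SCHEMA = {
    "purchases": {"id", "user_id", "amount"},
    "users": {"id", "email"},
}


@dataclass(frozen=True)
class Select:
    """Minimal AST node: SELECT <columns> FROM <table>."""
    table: str
    columns: Tuple[str, ...]


def validate(node, schema):
    """Return a list of schema errors; an empty list means the node is valid."""
    if node.table not in schema:
        return [f"unknown table: {node.table}"]
    return [
        f"unknown column: {node.table}.{column}"
        for column in node.columns
        if column not in schema[node.table]
    ]


def compile_sql(node, schema):
    """Compile the node to SQL, refusing anything that fails validation."""
    errors = validate(node, schema)
    if errors:
        raise ValueError("; ".join(errors))
    return f"SELECT {', '.join(node.columns)} FROM {node.table}"


print(compile_sql(Select("purchases", ("user_id", "amount")), SCHEMA))
print(validate(Select("purchases", ("total",)), SCHEMA))
```

Because the same tree always compiles to the same SQL string, the output is deterministic once the intent has been resolved.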

Deterministic Routing Engine

AST Execution Trace: Lexical Parsing → Explicit Routing → Access Governance → DAG Compilation → Serialization

Dynamic Client-Side Visualization Engine

Instead of rendering static images or relying on iframe embedding, the system dynamically compiles query outputs into native interactive components. The layout engine recalculates dimensions and streams data points directly to the browser for instant exploration.

Zero-latency client-side rendering
Native data-to-component serialization
Interactive drill-downs via event callbacks

NEURAL_SYMBOLIC_ENGINE

[TARGET] Sovereign_Data_Mesh

Mode: Autonomous_Reasoning

> Context_Mapping_Node [ID: ctx_01]

Type: Semantic_Embedding

AccessLevel: Encrypted_L3

> Synthesis_View_Node [ID: insight_v2]

Dimensionality: [7, 2]

Compute: Edge_Accel

Latency: 1.2ms

Synthesis: 24ms

Delivery: 12ms

Total Revenue

$2.4M

+12.5% this month

Active Users

45.2K

+5.2% this month

LIVE SOCKET

Growth Trajectory

Visual DAG Editor & Transformation Pipeline

A low-code orchestration interface mapping directly to underlying DAG execution. Users visually define Directed Acyclic Graphs that the engine translates into scheduled ELT jobs, handling schema drift and dependency resolution natively.

Auto-inferred entity resolution (PK/FK mapping)
Type-safe computed columns
Live data sampling and schema validation
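Dependency resolution in a DAG amounts to a topological sort: each step runs only after everything it depends on has finished. The sketch below uses graphlib from the Python standard library (Python 3.9+). The step names are hypothetical and only echo the PostgreSQL-to-Knowledge-DB flow described in this section.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dag):
    """Return a valid run order for `dag`, a mapping of step -> prerequisite steps."""
    try:
        return list(TopologicalSorter(dag).static_order())
    except CycleError as exc:
        # A cycle means the graph is not a DAG and cannot be scheduled.
        raise ValueError(f"pipeline contains a cycle: {exc.args[1]}") from exc


pipeline = {
    "extract_purchases": set(),
    "extract_users": set(),
    "join_purchases_users": {"extract_purchases", "extract_users"},
    "index_knowledge_db": {"join_purchases_users"},
}
print(execution_order(pipeline))
```

The two extract steps have no mutual dependency, so their relative order may vary; a scheduler could run them in parallel.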

ETL_Pipeline_V2 ● Active Stream

PostgreSQL

dbt Transform

Knowledge DB

Execution Log

[14:02:01] Extracting: public.fact_sales_events

[14:02:02] Schema Map: purchases.user_id -> users.id

[14:02:03] Validation: Enforcing strict types (int8, varchar(255))

[14:02:04] Warning: 12 rows dropped (null constraints)

[14:02:05] Indexing: HNSW Vectorization started...

Batch 42 Sync: 4.2M / 5.1M records

Vectorized Multi-Modal Ingestion Core

Unstructured documents are parsed, chunked, and embedded into a high-dimensional vector space using proprietary embedding models. The indexing pipeline preserves metadata, enabling hybrid search (semantic + keyword) with strict document-level permissions.

Automated OCR and table extraction
Hierarchical chunking strategies
Dense vector storage with HNSW indexing
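Overlapping chunks keep sentences that straddle a boundary retrievable from both sides. The sketch below implements a plain sliding-window chunker, which is a simplification of the recursive, hierarchical strategies named in this section. The defaults of 512 and 64 mirror the chunkSize and chunkOverlap values in this section's indexing configuration; measuring them in characters rather than tokens is an assumption.

```python
def chunk_text(text, chunk_size=512, chunk_overlap=64):
    """Split `text` into windows of `chunk_size` that overlap by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # Stop once the remaining tail is already covered by the previous window.
    last_start = max(len(text) - chunk_overlap, 1)
    return [text[i : i + chunk_size] for i in range(0, last_start, step)]


sample = "".join(chr(97 + i % 26) for i in range(1000))
chunks = chunk_text(sample)
print([len(chunk) for chunk in chunks])
```

For a 1,000-character input this yields windows starting at offsets 0, 448, and 896, so each consecutive pair shares 64 characters.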

VECTOR_INDEXING_ENGINE

// Chunking Strategy: RecursiveCharacter

chunkSize: 512, chunkOverlap: 64

// Embedding Pipeline

model: "nexbi-multilingual-v3"

dimensions: 1536 (Dense Vector)

// Storage Backend

indexType: "HNSW"

metric: "Cosine Similarity"

doc_count: 47, total_vectors: 124,050

Q3 Financial Report.pdf

PDF|45K Tokens|3.2K Vectors

Employee Handbook.docx

Word|12K Tokens|840 Vectors

Product Requirements

Wiki|8K Tokens|512 Vectors

Semantic search ready

HNSW INDEXED

Sandboxed MLOps Execution Environment

A secure, isolated Python runtime environment for executing data science workloads. The engine spins up ephemeral containers for model training, using the connected data layer without exposing raw datasets to the public internet.

Ephemeral container execution
Pre-configured analytical libraries
One-click model serialization and API deployment

forecast_model.ipynb

IDLE

[1]:

import nexbi
from nexbi.models import AutoForecaster

dataset = nexbi.query("Q3_Revenue_Data")
model = AutoForecaster(target="revenue").fit(dataset)

Out:

Model trained successfully in 2.4s

Sandbox Telemetry

Container State

Status: Isolated (No-Egress)

Memory: 1.24 GB / 8.00 GB

vCPU: 4 Cores allocated

Model Artifacts

Type: GradientBoosting

Weights: 12.4 MB

Pickled: SUCCESS

Data Provenance

Source: Q3_Revenue_Data

Row Count: 420,192

PII Filter: Active

Deterministic Knowledge Routing

LLM hallucinations are often aggravated by unconstrained context windows. NexBI mitigates this by compiling your modeled data into discrete, isolated "Knowledge Collections" (e.g., segregating "Purchases" from "Products").

During query execution, context is explicitly scoped. Users define exact data boundaries by invoking specific collections via # tags. This explicit routing drastically reduces token overhead, minimizes compute latency, and enforces strict access governance by sandboxing the model's search space.
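Explicit scoping via # tags can be sketched as a small routing step that extracts the tags, rejects unknown or unauthorized collections, and passes only the remaining question to the model. The function route_query and the error behavior are assumptions for illustration; the collection names come from the "Purchases" and "Products" example above.

```python
import re

TAG_PATTERN = re.compile(r"#(\w+)")


def route_query(query, known_collections, allowed_collections):
    """Return (collections, question) for a query that names collections via # tags."""
    tags = TAG_PATTERN.findall(query)
    if not tags:
        raise ValueError("query must reference at least one collection with a # tag")
    for tag in tags:
        if tag not in known_collections:
            raise KeyError(f"unknown collection: {tag}")
        if tag not in allowed_collections:
            raise PermissionError(f"access denied to collection: {tag}")
    # Strip the tags so only the natural-language question remains.
    question = re.sub(r"\s*#\w+", "", query).strip()
    return tags, question


known = {"Purchases", "Products"}
allowed = {"Purchases"}
print(route_query("Total revenue last month #Purchases", known, allowed))
```

Failing closed on a missing or unauthorized tag keeps the search space bounded before any retrieval happens.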

Universal Translation

Maps rigid database schemas to intuitive business metrics.

Access Governance

Applies row-level and column-level security globally.

Continuous Learning

Refines its understanding based on user query corrections.

Semantic Translation

Technical Engine

Graph Nodes: 14.2M

Vector Dist (L2): 0.014

Index Type: HNSW

Business Value

Entity Resolution: Auto-Mapped

Data Silos: Unified

Time to Insight: Real-time

Live Traversal:

SAP.Invoice_ID → SF.Account_ID

Confidence Score: 99.8%

SAP ERP: Finance & HR

PostgreSQL: Custom Apps

SharePoint: Documents

Salesforce: CRM Data

Vector Core

Deg_In: 4 (Sync)

Deg_Out: 3 (Active)

Centrality: 0.89

SEMANTIC LAYER

NLP & Semantics

Decisions: Automated Actions

Alerts: Real-time Triggers

Reports: Live Dashboards

Translating natural-language queries into human-understandable business metrics.

Ecosystem & Integrations

NexBI is designed to sit on top of your existing architecture. With more than 150 native connectors, you can ingest metadata and index documents without creating duplicate data silos.

Available Connectors in Category

SAP

Oracle

Sage

Odoo

Microsoft Dynamics

Sync Telemetry

Pipeline State

Protocol: gRPC / TLS 1.3

Method: Incremental CDC

Latency: 12ms (Active)

Schema Enforcement

Validation: Strict Typing

Anomalies: 0 Detected

Mapping: Auto-resolved

Resource Allocation

Workers: 4 (Auto-scaled)

Memory Limit: 16GB per node

Throughput: 2.4M rows/s

Sovereign Data Pipeline

Syncing Active

SAP: CDC Stream Active

Oracle: CDC Stream Active

Sage: CDC Stream Active

Odoo: CDC Stream Active

NexBI Engine

Computing DAG...

Vectorizing

[14:35:32] [CDC_SYNC_SUCCESS] Extracted delta from SAP. Normalized and mapped to central ontology.

[14:35:32] [CDC_SYNC_SUCCESS] Extracted delta from Oracle. Normalized and mapped to central ontology.

[14:35:32] [CDC_SYNC_SUCCESS] Extracted delta from Sage. Normalized and mapped to central ontology.

Security Architecture

Guardrail Telemetry

VPC Isolation State

Network: Air-Gapped

Egress: Blocked

Ingress: Strict IP Whitelist

Cryptographic Layer

Data-at-rest: AES-256-GCM

Data-in-transit: TLS 1.3

Key Mgmt: HSM Rotation

RBAC Enforcer

Policy Engine: Deterministic

Query Rewrite: Active

Row-level Sec: Enforced

NexBI Security Console

Air-Gapped

Local Node (192.168.1.10)

0 bytes external egress • E2E Encrypted

Sovereign

STATE: ISOLATED

Every request passes through a multi-layered security mesh before processing. Security is strictly deterministic, ensuring AI hallucinations can never bypass data governance.

On-Premise Deployment

Deployed within your own infrastructure — private cloud or air-gapped environments.

Data Sovereignty

Your data never leaves your environment. Full ownership of all models and outputs.

Granular Access Control

Department-level RBAC — production, lab, commercial, and executive each see their perimeter.

Full Audit Trail

Every query, every decision, every access — fully traceable and auditable.

Engagement Models

NexBI offers flexible adoption pathways, from rapid proof-of-concepts to full enterprise sovereign deployments.

Rapid Proof of Value (PoV): 3-4 weeks (most popular)
Departmental Rollout: 90-day cycles
Sovereign Enterprise License: Annual commitment
Strategic Co-Innovation: Multi-year contract

Rapid Proof of Value (PoV) Roadmap

Deploy on a critical use case with your real data. Immediate measurable ROI, zero public cloud risk, and no forced commitment beyond the pilot.

Week 1: Deployment & Integration. Connect databases securely without moving data.
Week 2: Semantic Modeling. Define business logic, metrics, and RBAC rules.
Week 3: Insight Generation. Interactive dashboards and NLP query testing.
Week 4: Value Delivery. Executive presentation and go/no-go decision.

Ready to build your sovereign Semantic Layer?

Contact our enterprise team to design a secure, localized deployment architecture for your organization.

Request Architecture Review