fpgaImagePipeline#

Executive Summary#

This module simulates the FPGA image processing pipeline of a star-tracker/MAC camera instrument. It accepts a camera image (either from a message or from a file path) and runs it through five sequential pipeline stages, publishing every intermediate data product as an output message so that verification and test code can observe the full computation.

Pipeline stages:

Calibration pre-processing — per-pixel operations driven by a 16-bit calibration image whose upper nibble encodes an op-code and lower 12 bits encode a literal or register reference.
Separable box blur — a pipelined separable box blur with a configurable kernel size of 5, 7, or 9. kernelSize rows are processed simultaneously; the kernel is never placed partially outside the image, so the border strip of (kernelSize-1)/2 pixels on every side is set to zero (no zero-padding or partial sums).
Binary threshold — pixels above threshold are set to 1, others to 0, packed MSB-first into bytes (ceil(width*height/8) bytes total).
Row/column sums — counts of above-threshold pixels accumulated per row and per column.
ROI ranking — the image is divided into square regions of roiRegionSize pixels per side; the top 8 regions by above-threshold pixel count are reported.
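
The calibration word layout and stages 3 to 5 listed above can be illustrated with a small pure-Python sketch. It is not the module's implementation: every function name is hypothetical, the tie-breaking rule in the ROI ranking is an assumption, and regions that are only partially inside the image are simply counted as they are.

```python
import math


def split_calib_word(word):
    """Split a 16-bit calibration word into (op_code, operand)."""
    return (word >> 12) & 0xF, word & 0x0FFF


def threshold_pack(pixels, threshold):
    """Pack (pixel > threshold) flags MSB-first, eight pixels per byte."""
    packed = bytearray(math.ceil(len(pixels) / 8))
    for i, value in enumerate(pixels):
        if value > threshold:
            packed[i // 8] |= 0x80 >> (i % 8)
    return bytes(packed)


def get_bit(packed, index):
    """Read back the flag of pixel `index` from the packed buffer."""
    return (packed[index // 8] >> (7 - index % 8)) & 1


def row_col_sums(packed, width, height):
    """Count above-threshold pixels per row and per column."""
    rows, cols = [0] * height, [0] * width
    for r in range(height):
        for c in range(width):
            bit = get_bit(packed, r * width + c)
            rows[r] += bit
            cols[c] += bit
    return rows, cols


def top_rois(packed, width, height, region_size, count=8):
    """Rank square regions by above-threshold pixel count, descending."""
    n_rows = math.ceil(height / region_size)
    n_cols = math.ceil(width / region_size)
    totals = [[0] * n_cols for _ in range(n_rows)]
    for r in range(height):
        for c in range(width):
            bit = get_bit(packed, r * width + c)
            totals[r // region_size][c // region_size] += bit
    ranked = [(totals[i][j], i, j) for i in range(n_rows) for j in range(n_cols)]
    # Assumption: ties are broken by region index (row first, then column).
    ranked.sort(key=lambda t: (-t[0], t[1], t[2]))
    return ranked[:count]


# Hypothetical 4x4 image, threshold 5, 2x2 regions.
pixels = [0, 9, 9, 0,
          0, 0, 9, 0,
          0, 0, 0, 0,
          9, 0, 0, 0]
packed = threshold_pack(pixels, 5)
print(packed.hex())                  # 6208
print(row_col_sums(packed, 4, 4))    # ([2, 1, 0, 1], [1, 1, 2, 0])
print(top_rois(packed, 4, 4, 2))     # entries are (count, region_row, region_col)
print(split_calib_word(0x6123))      # (6, 291)
```

The 16 pixels pack into ceil(16/8) = 2 bytes, and the first pixel of each group of eight lands in the most significant bit.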

Overall data flow:

┌──────────┐     ┌─────────────┐     ┌──────────┐     ┌───────────┐     ┌─────────┐
│  Camera  │     │ Calibration │     │   Box    │     │ Threshold │     │   Row/  │
│  Image   │────▶│    Pre-     │────▶│   Blur   │────▶│  (1-bit   │────▶│  Col    │────▶ ROI
│ (12-bit) │     │ processing  │     │ (2-pass) │     │  packed)  │     │  Sums   │
└──────────┘     └─────────────┘     └──────────┘     └───────────┘     └─────────┘
                       │                   │                 │                │
                 rawImageOutMsg    blurredImageOutMsg  threshImageOutMsg  rowColSumOutMsg

Stage 2: Separable Box Blur#

The blur is a separable 2-D box filter that mimics the FPGA streaming pipeline. kernelSize rows are buffered and processed simultaneously. The kernel window never extends outside the image boundary, so only pixels where a full k×k footprint fits entirely within the image receive a blurred value. The border strip of half = (kernelSize-1)/2 pixels on every side is set to zero.

Why separable? A k×k box filter applied naively requires k² additions per pixel. Decomposing it into k independent 1-D horizontal sums (one per buffered row) followed by a single vertical reduction over those k sums reduces the work to 2k operations per pixel, matching what the FPGA hardware implements.

Valid output region#

For an image of width W and height H with kernel size k (half = (k-1)/2):

┌──────────────────────────────────────────────────────┐
│         border (half rows, all zero)                 │
├─────┬────────────────────────────────────────┬───────┤
│  b  │                                        │   b   │
│  o  │   valid blurred pixels                 │   o   │
│  r  │   rows  half .. H-1-half               │   r   │
│  d  │   cols  half .. W-1-half               │   d   │
│  e  │                                        │   e   │
│  r  │   (W - k + 1) cols × (H - k + 1) rows │   r   │
├─────┴────────────────────────────────────────┴───────┤
│         border (half rows, all zero)                 │
└──────────────────────────────────────────────────────┘
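
The bounds in the diagram above can be checked with a few lines of Python. The helper name `valid_region` is hypothetical and exists only for this illustration.

```python
def valid_region(width, height, kernel_size):
    """Return the row and column ranges that receive a blurred value."""
    half = (kernel_size - 1) // 2
    rows = range(half, height - half)   # half .. H-1-half
    cols = range(half, width - half)    # half .. W-1-half
    return rows, cols


rows, cols = valid_region(10, 10, 5)
print(list(rows))            # [2, 3, 4, 5, 6, 7]
print(len(rows), len(cols))  # 6 6, i.e. H-k+1 and W-k+1
```

For the 4096 x 3000 image used in the User Guide with kernelSize = 9, the same helper gives 4088 valid columns and 2992 valid rows.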

Pipeline data flow#

For each row window rStart = 0 .. H-k, k rows are held in the pipeline simultaneously. A rowSums[k] array holds one running 1-D horizontal window sum per row:

Image rows (rStart = 0, k = 5):

    row 0: [p00 p01 p02 p03 p04 p05 ...]
    row 1: [p10 p11 p12 p13 p14 p15 ...]
    row 2: [p20 p21 p22 p23 p24 p25 ...]   ← processed in parallel
    row 3: [p30 p31 p32 p33 p34 p35 ...]
    row 4: [p40 p41 p42 p43 p44 p45 ...]

rowSums initialised over columns [0..k-1]:

    rowSums[0] = p00+p01+p02+p03+p04
    rowSums[1] = p10+p11+p12+p13+p14
    rowSums[2] = p20+p21+p22+p23+p24
    rowSums[3] = p30+p31+p32+p33+p34
    rowSums[4] = p40+p41+p42+p43+p44

Column reduction → blurBuf_ at centre pixel (rStart+half, c+half):

    colSum = rowSums[0]+rowSums[1]+rowSums[2]+rowSums[3]+rowSums[4]
    blurBuf_[(rStart+2)*W + (0+2)] = colSum >> shift

Sliding window (column advance)#

After each column position, the horizontal window in every row slides right by one pixel. The pixel entering the right edge of the k-wide window is added and the pixel falling off the left edge is subtracted:

Column c=0  →  c=1:

    rowSums[i] += rawBuf_[(rStart+i)*W + (c+k)]   ← add column k
    rowSums[i] -= rawBuf_[(rStart+i)*W + c]        ← remove column 0

Column position across valid range (W=10, k=5):

    c=0: window covers cols [0..4], output at col 2
    c=1: window covers cols [1..5], output at col 3
    c=2: window covers cols [2..6], output at col 4
    ...
    c=5: window covers cols [5..9], output at col 7  (last valid position)
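
As a minimal sketch of the column advance for a single row, the add-one/subtract-one update can be written as follows. The function name and the sample row (values 1 to 10) are assumptions made for illustration, not part of the module.

```python
def sliding_row_sums(row, k):
    """Return the k-wide window sum at every valid column position c."""
    window = sum(row[:k])            # initialise over columns 0..k-1
    sums = [window]
    for c in range(len(row) - k):    # advance from position c to c+1
        window += row[c + k]         # pixel entering on the right
        window -= row[c]             # pixel leaving on the left
        sums.append(window)
    return sums


row = list(range(1, 11))             # W = 10
print(sliding_row_sums(row, 5))      # [15, 20, 25, 30, 35, 40]
```

Six sums come out for W = 10 and k = 5, one for each position c = 0..5, and each update costs one addition and one subtraction regardless of k.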

Row window advance#

After all column positions are processed for a given rStart, the row window slides down by one row. Row sums are re-initialised from scratch for the new window:

rStart=0: rows 0-4 → outputs in row 2  (centre of rows 0-4)
rStart=1: rows 1-5 → outputs in row 3
rStart=2: rows 2-6 → outputs in row 4
...
rStart=H-k: rows H-k..H-1 → outputs in row H-1-half

┌─────────────────────────────────────────────────────────┐
│  rStart=0   [ row0 | row1 | row2 | row3 | row4 ]        │
│                                  ↑ output row 2         │
├─────────────────────────────────────────────────────────┤
│  rStart=1   [ row1 | row2 | row3 | row4 | row5 ]        │
│                                  ↑ output row 3         │
├─────────────────────────────────────────────────────────┤
│  rStart=2   [ row2 | row3 | row4 | row5 | row6 ]        │
│                                  ↑ output row 4         │
└─────────────────────────────────────────────────────────┘

Normalisation#

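The 2-D box sum (sum of k² pixel values) is scaled by a right shift of blurShift(kernelSize) bits, i.e. an integer division by 2^blurShift that rounds down. The divisors in the table below (2, 4, 8) are much smaller than k² (25, 49, 81), so the stored value is a scaled box sum rather than the window mean: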

blurBuf_[r][c] = colSum >> blurShift(kernelSize)

kernelSize  blurShift  divisor
----------  ---------  -------
     5          1          2
     7          2          4
     9          3          8
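
The table can be expressed as a lookup, shown here as a sketch. The names `BLUR_SHIFT` and `normalise` are hypothetical; only the shift values come from the table above.

```python
# Shift per kernel size, copied from the table above.
BLUR_SHIFT = {5: 1, 7: 2, 9: 3}


def normalise(box_sum, kernel_size):
    """Scale a k x k box sum by the right shift for that kernel size."""
    return box_sum >> BLUR_SHIFT[kernel_size]


for k, shift in BLUR_SHIFT.items():
    print(k, shift, 1 << shift, k * k)   # kernel, shift, divisor, k squared

print(normalise(125, 5))                 # 62
```

Because the shift discards the low bits, the result always rounds down: a box sum of 125 with kernelSize = 5 becomes 62, not 62.5.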

Worked example#

The following traces the algorithm on a 10×10 image with kernelSize = 5 (half = 2, blurShift = 1). Input pixel values are pixel[r][c] = r + c + 1.

Input image:

col:   0   1   2   3   4   5   6   7   8   9
row 0: 1   2   3   4   5   6   7   8   9  10
row 1: 2   3   4   5   6   7   8   9  10  11
row 2: 3   4   5   6   7   8   9  10  11  12
row 3: 4   5   6   7   8   9  10  11  12  13
row 4: 5   6   7   8   9  10  11  12  13  14
row 5: 6   7   8   9  10  11  12  13  14  15
row 6: 7   8   9  10  11  12  13  14  15  16
row 7: 8   9  10  11  12  13  14  15  16  17
row 8: 9  10  11  12  13  14  15  16  17  18
row 9: 10  11  12  13  14  15  16  17  18  19

numOutRows = numOutCols = 10 - 5 + 1 = 6; valid outputs written to rows 2-7, cols 2-7.

rStart = 0 (pipeline window: rows 0-4)

Initialise rowSums[5] over columns 0-4:

rowSums[0] = 1+2+3+4+5  = 15    (row 0, cols 0-4)
rowSums[1] = 2+3+4+5+6  = 20    (row 1, cols 0-4)
rowSums[2] = 3+4+5+6+7  = 25    (row 2, cols 0-4)
rowSums[3] = 4+5+6+7+8  = 30    (row 3, cols 0-4)
rowSums[4] = 5+6+7+8+9  = 35    (row 4, cols 0-4)

Column c = 0 — kernel footprint rows 0-4, cols 0-4 (brackets mark the active window):

col:   0   1   2   3   4   5  ...
row 0: [ 1   2   3   4   5 ] 6  ...
row 1: [ 2   3   4   5   6 ] 7  ...
row 2: [ 3   4   5   6   7 ] 8  ...   (centre row, rStart+half = 2)
row 3: [ 4   5   6   7   8 ] 9  ...
row 4: [ 5   6   7   8   9 ]10  ...

colSum = 15+20+25+30+35 = 125. blurBuf_ at centre pixel (2, 2): 125 >> 1 = 62.

Advance window — add column 5, subtract column 0:

rowSums[0] += pixel[0][5] - pixel[0][0] = 6 - 1 = +5  --> 20
rowSums[1] += pixel[1][5] - pixel[1][0] = 7 - 2 = +5  --> 25
rowSums[2] += pixel[2][5] - pixel[2][0] = 8 - 3 = +5  --> 30
rowSums[3] += pixel[3][5] - pixel[3][0] = 9 - 4 = +5  --> 35
rowSums[4] += pixel[4][5] - pixel[4][0] = 10- 5 = +5  --> 40

Column c = 1 — kernel footprint rows 0-4, cols 1-5:

col:   0  [1   2   3   4   5 ] 6  ...
row 0: 1  [2   3   4   5   6 ] 7  ...
row 1: 2  [3   4   5   6   7 ] 8  ...
row 2: 3  [4   5   6   7   8 ] 9  ...   (centre row)
row 3: 4  [5   6   7   8   9 ]10  ...
row 4: 5  [6   7   8   9  10 ]11  ...

colSum = 20+25+30+35+40 = 150. blurBuf_ at centre pixel (2, 3): 150 >> 1 = 75.

Columns c = 2..5 follow the same pattern; each advance adds +5 to every rowSums entry (and therefore +25 to colSum) because the input gradient is uniform (+1 per pixel).

rStart = 1 (pipeline window: rows 1-5)

rowSums is re-initialised from rows 1-5 over columns 0-4. Output pixels are written to row rStart + half = 3. The same column sliding proceeds across c = 0..5.

Row pipeline progression (all `rStart` steps, at `c = 0`)#

The table below shows the rowSums contribution from every row that is active in the k-row window for each rStart step. Values shown are the 1-D horizontal sums over cols 0–4. Rows outside the current window are blank. The window slides one row downward each step; the colSum and blurred output follow at the bottom:

rStart:         0      1      2      3      4      5
           ┌──────────────────────────────────────────
row  0:    │  [15]
row  1:    │  [20]  [20]
row  2:    │  [25]  [25]  [25]
row  3:    │  [30]  [30]  [30]  [30]
row  4:    │  [35]  [35]  [35]  [35]  [35]
row  5:    │        [40]  [40]  [40]  [40]  [40]
row  6:    │              [45]  [45]  [45]  [45]
row  7:    │                    [50]  [50]  [50]
row  8:    │                          [55]  [55]
row  9:    │                                [60]
           └──────────────────────────────────────────
colSum:        125    150    175    200    225    250
out row:         2      3      4      5      6      7
blurBuf:        62     75     87    100    112    125

The staircase shape traces the five-row window sweeping from the top of the image to the bottom. Each column of the table is one rStart iteration; each row of the table is one image row, whose horizontal sum occupies rowSums[row - rStart] while that row is inside the window. The output row advances by one for every step because the centre of the window is always at rStart + half.

Complete blurBuf_ output (0 = border pixel, never written):

col:   0    1    2    3    4    5    6    7    8    9
row 0: 0    0    0    0    0    0    0    0    0    0
row 1: 0    0    0    0    0    0    0    0    0    0
row 2: 0    0   62   75   87  100  112  125    0    0
row 3: 0    0   75   87  100  112  125  137    0    0
row 4: 0    0   87  100  112  125  137  150    0    0
row 5: 0    0  100  112  125  137  150  162    0    0
row 6: 0    0  112  125  137  150  162  175    0    0
row 7: 0    0  125  137  150  162  175  187    0    0
row 8: 0    0    0    0    0    0    0    0    0    0
row 9: 0    0    0    0    0    0    0    0    0    0

Manual spot-check — blurBuf_ at (3, 7), kernel at rows 1-5, cols 5-9:

sum = (7+8+9+10+11) + (8+9+10+11+12) + (9+10+11+12+13)
    + (10+11+12+13+14) + (11+12+13+14+15)
    =  45 + 50 + 55 + 60 + 65
    = 275
275 >> 1 = 137  (matches blurBuf_[3][7] above)
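
The whole worked example can be reproduced with a short reference sketch in Python. It follows the steps described in this section (re-initialise rowSums per rStart, slide across columns, reduce vertically, shift), but it is an illustration written for this page, not the module's C++ code; `box_blur` and `BLUR_SHIFT` are hypothetical names.

```python
BLUR_SHIFT = {5: 1, 7: 2, 9: 3}   # values from the Normalisation table


def box_blur(raw, width, height, k):
    """Blur a row-major image; border pixels are left at zero."""
    half = (k - 1) // 2
    shift = BLUR_SHIFT[k]
    blur = [0] * (width * height)
    for r_start in range(height - k + 1):
        # Re-initialise the k horizontal sums over columns 0..k-1.
        row_sums = [sum(raw[(r_start + i) * width + j] for j in range(k))
                    for i in range(k)]
        for c in range(width - k + 1):
            col_sum = sum(row_sums)   # vertical reduction
            blur[(r_start + half) * width + (c + half)] = col_sum >> shift
            if c + k < width:         # slide unless at the last position
                for i in range(k):
                    base = (r_start + i) * width
                    row_sums[i] += raw[base + c + k] - raw[base + c]
    return blur


W = H = 10
raw = [r + c + 1 for r in range(H) for c in range(W)]
blur = box_blur(raw, W, H, 5)
print(blur[2 * W + 2: 2 * W + 8])   # [62, 75, 87, 100, 112, 125]
print(blur[3 * W + 7])              # 137
```

The printed row matches row 2 of the complete blurBuf_ output, and the value at (3, 7) matches the manual spot-check.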

Message Connection Descriptions#

Module I/O Messages#
Msg Variable Name	Msg Type	Description
imageInMsg	CameraImageMsgPayload	Optional camera image input message; the `imagePointer` is cast to `uint16_t*` (12-bit values in lower bits, one element per pixel).
rawImageOutMsg	FpgaRawImageMsgPayload	Calibration-preprocessed image; `imagePointer` points to the module’s internal buffer.
blurredImageOutMsg	FpgaRawImageMsgPayload	Box-blurred image; `imagePointer` points to the module’s internal buffer.
threshImageOutMsg	FpgaThreshImageMsgPayload	1-bit packed binary threshold result; `imagePointer` points to the module’s internal buffer.
rowColSumOutMsg	FpgaRowColSumMsgPayload	Per-row and per-column above-threshold pixel counts.
roiOutMsg	FpgaBinsMsgPayload	Top-8 ROI regions sorted descending by above-threshold pixel count.
configOutMsg	FpgaPipelineConfigMsgPayload	Snapshot of the active pipeline configuration.

User Guide#

Import the module:

from xmera.fswAlgorithms import fpgaImagePipeline

Instantiate and configure:

pipe = fpgaImagePipeline.FpgaImagePipeline()
pipe.ModelTag = "fpgaPipeline"
pipe.setImageWidth(4096)
pipe.setImageHeight(3000)
pipe.setKernelSize(5)          # 5, 7, or 9
pipe.setThreshold(200)
pipe.setRoiRegionSize(64)      # 64, 128, or 256

Connect an image source (choose one):

# Option A — disk file (useful for unit testing)
pipe.setImageFileName("/path/to/image.tiff")

# Option B — live message from camera emulator
pipe.imageInMsg.subscribeTo(cameraModule.imageOutMsg)

Optionally enable calibration:

pipe.setCalibEnabled(True)
pipe.setCalibImageFile("/path/to/calib.tiff")
pipe.setCalibRegA(100)   # register values for op-codes 0x1/0x6/0xb

Optionally enable intermediate image saving:

pipe.setSaveImages(True)
pipe.setSaveDir("/tmp/fpga_out")

Add to simulation task:
```
sim.AddModelToTask(taskName, pipe)
```

Access output messages or internal buffers from Python:

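# Read output message fields
rawMsg = pipe.rawImageOutMsg.read()
print(rawMsg.width, rawMsg.height)
# Access internal pixel values directly (for testing).
# Buffers are row-major: pixel (row, col) sits at index row * width + col.
width = 4096             # must match the value passed to setImageWidth()
row, col = 100, 200      # example pixel coordinates
pixel = pipe.getRawPixel(row * width + col)
above = pipe.getThreshBit(row * width + col)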

Note: internal buffer accessors are available for testing only and expose raw pointers that are valid only for the lifetime of the module.

Class FpgaImagePipeline#

class FpgaImagePipeline : public SysModel#

FPGA image processing pipeline simulation module.

Simulates the FPGA image processing pipeline of a star-tracker/MAC camera instrument. Every intermediate data product is published as an output message for verification.