.. _fpgaimagepipeline:

fpgaImagePipeline
=================


Executive Summary
-----------------

This module simulates the FPGA image-processing pipeline of a star-tracker/MAC camera instrument.
It accepts a camera image (either from a message or from a file path) and runs it through
five sequential pipeline stages, publishing every intermediate data product as an output
message so that verification and test code can observe the full computation.

Pipeline stages:

1. **Calibration pre-processing** — per-pixel operations driven by a 16-bit calibration image;
   each calibration word carries an op-code in its upper nibble and a literal or register
   reference in its lower 12 bits.
2. **Separable box blur** — a pipelined separable box blur with a configurable kernel size of 5, 7,
   or 9.  ``kernelSize`` rows are processed simultaneously; the kernel is never placed partially
   outside the image, so the border strip of ``(kernelSize-1)/2`` pixels on every side is set to
   zero (no zero-padding or partial sums).
3. **Binary threshold** — pixels above ``threshold`` are set to 1, others to 0, packed MSB-first
   into bytes (``ceil(width*height/8)`` bytes total).
4. **Row/column sums** — counts of above-threshold pixels accumulated per row and per column.
5. **ROI ranking** — the image is divided into square regions of ``roiRegionSize`` pixels per side;
   the top 8 regions by above-threshold pixel count are reported.

Overall data flow::

    ┌──────────┐     ┌─────────────┐     ┌──────────┐     ┌───────────┐     ┌─────────┐
    │  Camera  │     │ Calibration │     │   Box    │     │ Threshold │     │   Row/  │
    │  Image   │────▶│    Pre-     │────▶│   Blur   │────▶│  (1-bit   │────▶│  Col    │────▶ ROI
    │ (12-bit) │     │ processing  │     │ (2-pass) │     │  packed)  │     │  Sums   │
    └──────────┘     └─────────────┘     └──────────┘     └───────────┘     └─────────┘
                           │                   │                 │                │
                     rawImageOutMsg    blurredImageOutMsg  threshImageOutMsg  rowColSumOutMsg

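The following pure-Python sketch can serve as a reference for the stage descriptions above. It models the calibration word split and stages 3 to 5 on a flat, row-major pixel list. It is not the module's C++ implementation: the function names are hypothetical and the op-code semantics are not modelled. Two behaviours are assumptions because this page does not specify them: partial ROI regions at the right and bottom edges are counted (clipped to the image), and ROI ties keep raster order.

```python
import math


def split_calib_word(word):
    """Return (op_code, operand) from a 16-bit calibration word.

    Upper nibble = op-code; lower 12 bits = literal or register reference.
    """
    return (word >> 12) & 0xF, word & 0x0FFF


def threshold_pack(pixels, width, height, threshold):
    """Stage 3: pixel > threshold -> 1, else 0, packed MSB-first into bytes."""
    packed = bytearray(math.ceil(width * height / 8))
    for idx in range(width * height):
        if pixels[idx] > threshold:
            packed[idx // 8] |= 0x80 >> (idx % 8)  # first pixel of a byte -> bit 7
    return packed


def row_col_sums(pixels, width, height, threshold):
    """Stage 4: above-threshold pixel counts per row and per column."""
    row_sums, col_sums = [0] * height, [0] * width
    for r in range(height):
        for c in range(width):
            if pixels[r * width + c] > threshold:
                row_sums[r] += 1
                col_sums[c] += 1
    return row_sums, col_sums


def top_rois(pixels, width, height, threshold, region_size, top_n=8):
    """Stage 5: (count, row0, col0) of the top_n square regions by count.

    ASSUMPTIONS: edge regions are clipped to the image; ties keep raster order.
    """
    regions = []
    for r0 in range(0, height, region_size):
        for c0 in range(0, width, region_size):
            count = sum(
                1
                for r in range(r0, min(r0 + region_size, height))
                for c in range(c0, min(c0 + region_size, width))
                if pixels[r * width + c] > threshold
            )
            regions.append((count, r0, c0))
    regions.sort(key=lambda reg: reg[0], reverse=True)  # stable: ties keep order
    return regions[:top_n]


w, h, thr = 4, 4, 5
img = [0, 9, 0, 0,
       0, 9, 9, 0,
       0, 0, 0, 0,
       9, 0, 0, 0]
print(split_calib_word(0x1ABC))              # (1, 2748)
print(threshold_pack(img, w, h, thr).hex())  # 4608
print(row_col_sums(img, w, h, thr))          # ([1, 2, 0, 1], [1, 2, 1, 0])
print(top_rois(img, w, h, thr, 2))           # [(2, 0, 0), (1, 0, 2), (1, 2, 0), (0, 2, 2)]
```
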

Stage 2: Separable Box Blur
---------------------------

The blur is a separable 2-D box filter that mimics the FPGA streaming pipeline.
``kernelSize`` rows are buffered and processed simultaneously.  The kernel window
never extends outside the image boundary, so only pixels where a full k×k footprint
fits entirely within the image receive a blurred value.  The border strip of
``half = (kernelSize-1)/2`` pixels on every side is set to zero.

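**Why separable?**  A k×k box filter applied naively requires about k² additions per pixel.
Decomposing it into k independent 1-D horizontal sums (one per buffered row) followed
by a single vertical reduction over those k sums makes the cost linear in k: sliding the
k running row sums by one column costs 2k operations (one addition and one subtraction
each), and the vertical reduction adds k - 1 more, so each output pixel needs roughly 3k
operations instead of k².  This matches what the FPGA hardware implements.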

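The decomposition can be checked numerically with a plain-Python sketch that uses hypothetical helper names. One k×k window is summed directly and again as a vertical reduction of k horizontal row sums. The sketch also tallies per-pixel operation counts for the naive and sliding (streaming) schemes. The streaming count assumes one addition and one subtraction per running row sum per column step, as described in the sections below.

```python
def window_sum_direct(img, r0, c0, k):
    """k x k window sum with top-left corner (r0, c0), summed element by element."""
    return sum(img[r][c] for r in range(r0, r0 + k) for c in range(c0, c0 + k))


def window_sum_separable(img, r0, c0, k):
    """Same window: k horizontal 1-D sums, then one vertical reduction."""
    row_sums = [sum(img[r][c0:c0 + k]) for r in range(r0, r0 + k)]
    return sum(row_sums)


def ops_per_output(k):
    """(naive, streaming) additions/subtractions per output pixel."""
    naive = k * k - 1
    streaming = 2 * k + (k - 1)  # slide k running sums, then reduce them
    return naive, streaming


img = [[(7 * r + 3 * c) % 4096 for c in range(12)] for r in range(12)]
for k in (5, 7, 9):
    assert window_sum_direct(img, 1, 2, k) == window_sum_separable(img, 1, 2, k)
    print(k, ops_per_output(k))
```
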
Valid output region
~~~~~~~~~~~~~~~~~~~

For an image of width W and height H with kernel size k (``half = (k-1)/2``)::

    ┌──────────────────────────────────────────────────────┐
    │         border (half rows, all zero)                 │
    ├─────┬────────────────────────────────────────┬───────┤
    │  b  │                                        │   b   │
    │  o  │   valid blurred pixels                 │   o   │
    │  r  │   rows  half .. H-1-half               │   r   │
    │  d  │   cols  half .. W-1-half               │   d   │
    │  e  │                                        │   e   │
    │  r  │   (W - k + 1) cols × (H - k + 1) rows  │   r   │
    ├─────┴────────────────────────────────────────┴───────┤
    │         border (half rows, all zero)                 │
    └──────────────────────────────────────────────────────┘

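The bounds in the diagram can be computed as shown below. The helper name is hypothetical, and the kernel-size check simply mirrors the 5/7/9 restriction stated above.

```python
def valid_region(width, height, k):
    """Rows/cols whose full k x k footprint fits inside a width x height image."""
    if k not in (5, 7, 9):
        raise ValueError("kernelSize must be 5, 7, or 9")
    half = (k - 1) // 2
    rows = range(half, height - half)  # half .. H-1-half inclusive
    cols = range(half, width - half)   # half .. W-1-half inclusive
    return rows, cols


rows, cols = valid_region(4096, 3000, 5)
print(rows, cols, len(cols), len(rows))  # W - k + 1 columns, H - k + 1 rows
```
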
Pipeline data flow
~~~~~~~~~~~~~~~~~~

For each row window ``rStart = 0 .. H-k``, ``k`` rows are held in the pipeline simultaneously.
A ``rowSums[k]`` array holds one running 1-D horizontal window sum per row::

    Image rows (rStart = 0, k = 5):

        row 0: [p00 p01 p02 p03 p04 p05 ...]
        row 1: [p10 p11 p12 p13 p14 p15 ...]
        row 2: [p20 p21 p22 p23 p24 p25 ...]   ← processed in parallel
        row 3: [p30 p31 p32 p33 p34 p35 ...]
        row 4: [p40 p41 p42 p43 p44 p45 ...]

    rowSums initialised over columns [0..k-1]:

        rowSums[0] = p00+p01+p02+p03+p04
        rowSums[1] = p10+p11+p12+p13+p14
        rowSums[2] = p20+p21+p22+p23+p24
        rowSums[3] = p30+p31+p32+p33+p34
        rowSums[4] = p40+p41+p42+p43+p44

    Column reduction → blurBuf_ at centre pixel (rStart+half, c+half):

        colSum = rowSums[0]+rowSums[1]+rowSums[2]+rowSums[3]+rowSums[4]
        blurBuf_[(rStart+2)*W + (0+2)] = colSum >> blurShift(kernelSize)   ← c = 0, half = 2

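The initialisation and column reduction for one row window can be sketched in Python. The flat list ``raw`` stands in for ``rawBuf_`` (row-major, width ``W``), and ``blur_shift`` stands in for ``blurShift(kernelSize)``. The function name is hypothetical.

```python
def first_output(raw, width, r_start, k, blur_shift):
    """Initialise rowSums over columns 0..k-1 and reduce them for c = 0.

    Returns (flat index of the centre pixel, blurred value).
    """
    half = (k - 1) // 2
    row_sums = [
        sum(raw[(r_start + i) * width + c] for c in range(k))  # one per buffered row
        for i in range(k)
    ]
    col_sum = sum(row_sums)                                    # vertical reduction
    centre = (r_start + half) * width + (0 + half)
    return centre, col_sum >> blur_shift


W = H = 10
raw = [r + c + 1 for r in range(H) for c in range(W)]  # row-major test image
print(first_output(raw, W, 0, 5, 1))                   # (22, 62)
```
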
Sliding window (column advance)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

After each column position except the last, the horizontal window in every row slides right
by one pixel: the pixel entering at the right edge of the k-wide window is added, and the pixel
falling off the left edge is subtracted::

    Column c=0  →  c=1:

        rowSums[i] += rawBuf_[(rStart+i)*W + (c+k)]   ← add column k
        rowSums[i] -= rawBuf_[(rStart+i)*W + c]        ← remove column 0

    Column position across valid range (W=10, k=5):

        c=0: window covers cols [0..4], output at col 2
        c=1: window covers cols [1..5], output at col 3
        c=2: window covers cols [2..6], output at col 4
        ...
        c=5: window covers cols [5..9], output at col 7  (last valid position)

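The column sweep for a single row window can be sketched in Python under the same assumptions: a flat row-major ``raw`` stands in for ``rawBuf_``, and the function name is hypothetical. The guard on the last position reflects the fact that no column ``c + k`` exists after the final output.

```python
def sweep_row_window(raw, out, width, r_start, k, blur_shift):
    """Slide the k running row sums across every valid column of one row window."""
    half = (k - 1) // 2
    row_sums = [sum(raw[(r_start + i) * width + c] for c in range(k)) for i in range(k)]
    for c in range(width - k + 1):
        out[(r_start + half) * width + (c + half)] = sum(row_sums) >> blur_shift
        if c + k < width:                          # advance, except after the last column
            for i in range(k):
                base = (r_start + i) * width
                row_sums[i] += raw[base + c + k]   # pixel entering on the right
                row_sums[i] -= raw[base + c]       # pixel leaving on the left


W = H = 10
raw = [r + c + 1 for r in range(H) for c in range(W)]
out = [0] * (W * H)
sweep_row_window(raw, out, W, 0, 5, 1)
print(out[2 * W:3 * W])   # [0, 0, 62, 75, 87, 100, 112, 125, 0, 0]
```
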
Row window advance
~~~~~~~~~~~~~~~~~~

After all column positions are processed for a given ``rStart``, the row window slides
down by one row.  Row sums are re-initialised from scratch for the new window::

    rStart=0: rows 0-4 → outputs in row 2  (centre of rows 0-4)
    rStart=1: rows 1-5 → outputs in row 3
    rStart=2: rows 2-6 → outputs in row 4
    ...
    rStart=H-k: rows H-k..H-1 → outputs in row H-1-half

    ┌─────────────────────────────────────────────────────────┐
    │  rStart=0   [ row0 | row1 | row2 | row3 | row4 ]        │
    │                              ↑ output row 2             │
    ├─────────────────────────────────────────────────────────┤
    │  rStart=1   [ row1 | row2 | row3 | row4 | row5 ]        │
    │                              ↑ output row 3             │
    ├─────────────────────────────────────────────────────────┤
    │  rStart=2   [ row2 | row3 | row4 | row5 | row6 ]        │
    │                              ↑ output row 4             │
    └─────────────────────────────────────────────────────────┘

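Wrapping the row-window loop around the column sweep gives a complete, if unoptimised, Python model of the blur stage. This is a sketch intended for cross-checking outputs, not the module's C++ code. The names are hypothetical, and the model is verified here against a brute-force k×k sum over the valid region.

```python
import random


def box_blur_streaming(raw, width, height, k, blur_shift):
    """Valid-only separable box blur; the half-kernel border stays zero."""
    half = (k - 1) // 2
    out = [0] * (width * height)
    for r_start in range(height - k + 1):
        # rowSums are re-initialised from scratch for every row window
        row_sums = [sum(raw[(r_start + i) * width + c] for c in range(k)) for i in range(k)]
        for c in range(width - k + 1):
            out[(r_start + half) * width + (c + half)] = sum(row_sums) >> blur_shift
            if c + k < width:
                for i in range(k):
                    base = (r_start + i) * width
                    row_sums[i] += raw[base + c + k] - raw[base + c]
    return out


def box_blur_bruteforce(raw, width, height, k, blur_shift):
    """Reference: full k x k sum at every valid centre pixel."""
    half = (k - 1) // 2
    out = [0] * (width * height)
    for r in range(half, height - half):
        for c in range(half, width - half):
            s = sum(raw[rr * width + cc]
                    for rr in range(r - half, r + half + 1)
                    for cc in range(c - half, c + half + 1))
            out[r * width + c] = s >> blur_shift
    return out


rng = random.Random(0)
W, H = 17, 13
raw = [rng.randrange(4096) for _ in range(W * H)]  # 12-bit pixel values
for k, shift in ((5, 1), (7, 2), (9, 3)):
    assert box_blur_streaming(raw, W, H, k, shift) == box_blur_bruteforce(raw, W, H, k, shift)
print("streaming blur matches brute force")
```
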
Normalisation
~~~~~~~~~~~~~

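The 2-D box sum (the sum of k² pixel values) is right-shifted by ``blurShift(kernelSize)``
bits.  Note that this divides by only 2, 4, or 8, not by k² (25, 49, or 81), so the result
is a scaled box sum, roughly 10 to 12.5 times the local mean, rather than a true average::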

    blurBuf_[r][c] = colSum >> blurShift(kernelSize)

    kernelSize  blurShift  divisor
    ----------  ---------  -------
         5          1          2
         7          2          4
         9          3          8

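The table can be expressed as a lookup followed by a shift. The short Python sketch below uses hypothetical names. It also computes the ratio between the shifted box sum and the true local mean, which shows how far the output scale is from an average.

```python
BLUR_SHIFT = {5: 1, 7: 2, 9: 3}  # kernelSize -> blurShift, from the table above


def normalise(col_sum, kernel_size):
    """Scale a k x k box sum the way the pipeline does: a right shift."""
    return col_sum >> BLUR_SHIFT[kernel_size]


for k, shift in BLUR_SHIFT.items():
    gain = (k * k) / (1 << shift)  # blurred output divided by the local mean
    print(f"k={k}: colSum >> {shift}, output = {gain} x local mean")

# A flat patch of value 100 has mean 100, but its blurred value for k = 5 is 1250.
print(normalise(25 * 100, 5))
```
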

Worked example
~~~~~~~~~~~~~~

The following traces the algorithm on a 10×10 image with ``kernelSize = 5``
(``half = 2``, ``blurShift = 1``).  Input pixel values are ``pixel[r][c] = r + c + 1``.

Input image::

    col:     0   1   2   3   4   5   6   7   8   9
    row 0:   1   2   3   4   5   6   7   8   9  10
    row 1:   2   3   4   5   6   7   8   9  10  11
    row 2:   3   4   5   6   7   8   9  10  11  12
    row 3:   4   5   6   7   8   9  10  11  12  13
    row 4:   5   6   7   8   9  10  11  12  13  14
    row 5:   6   7   8   9  10  11  12  13  14  15
    row 6:   7   8   9  10  11  12  13  14  15  16
    row 7:   8   9  10  11  12  13  14  15  16  17
    row 8:   9  10  11  12  13  14  15  16  17  18
    row 9:  10  11  12  13  14  15  16  17  18  19

``numOutRows = numOutCols = 10 - 5 + 1 = 6``; valid outputs written to rows 2-7, cols 2-7.

**rStart = 0 (pipeline window: rows 0-4)**

Initialise ``rowSums[5]`` over columns 0-4::

    rowSums[0] = 1+2+3+4+5  = 15    (row 0, cols 0-4)
    rowSums[1] = 2+3+4+5+6  = 20    (row 1, cols 0-4)
    rowSums[2] = 3+4+5+6+7  = 25    (row 2, cols 0-4)
    rowSums[3] = 4+5+6+7+8  = 30    (row 3, cols 0-4)
    rowSums[4] = 5+6+7+8+9  = 35    (row 4, cols 0-4)

Column ``c = 0`` — kernel footprint rows 0-4, cols 0-4 (brackets mark the active window)::

    col:     0   1   2   3   4   5  ...
    row 0: [ 1   2   3   4   5 ] 6  ...
    row 1: [ 2   3   4   5   6 ] 7  ...
    row 2: [ 3   4   5   6   7 ] 8  ...   (centre row, rStart+half = 2)
    row 3: [ 4   5   6   7   8 ] 9  ...
    row 4: [ 5   6   7   8   9 ]10  ...

colSum = 15+20+25+30+35 = 125.
``blurBuf_`` at centre pixel (2, 2): 125 >> 1 = **62**.

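The same numbers can be reproduced in a few lines of Python. Here ``pixel`` is built from the formula above and indexed as ``pixel[r][c]``.

```python
K, SHIFT = 5, 1
pixel = [[r + c + 1 for c in range(10)] for r in range(10)]

r_start = 0
row_sums = [sum(pixel[r_start + i][0:K]) for i in range(K)]  # cols 0..4
col_sum = sum(row_sums)
print(row_sums)                    # [15, 20, 25, 30, 35]
print(col_sum, col_sum >> SHIFT)   # 125 62
```
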
Advance window — add column 5, subtract column 0::

    rowSums[0] += pixel[0][5] - pixel[0][0] =  6 - 1 = +5  --> 20
    rowSums[1] += pixel[1][5] - pixel[1][0] =  7 - 2 = +5  --> 25
    rowSums[2] += pixel[2][5] - pixel[2][0] =  8 - 3 = +5  --> 30
    rowSums[3] += pixel[3][5] - pixel[3][0] =  9 - 4 = +5  --> 35
    rowSums[4] += pixel[4][5] - pixel[4][0] = 10 - 5 = +5  --> 40

Column ``c = 1`` — kernel footprint rows 0-4, cols 1-5::

    col:     0   1   2   3   4   5   6  ...
    row 0:   1 [ 2   3   4   5   6 ] 7  ...
    row 1:   2 [ 3   4   5   6   7 ] 8  ...
    row 2:   3 [ 4   5   6   7   8 ] 9  ...   (centre row)
    row 3:   4 [ 5   6   7   8   9 ]10  ...
    row 4:   5 [ 6   7   8   9  10 ]11  ...

colSum = 20+25+30+35+40 = 150.
``blurBuf_`` at centre pixel (2, 3): 150 >> 1 = **75**.

Columns ``c = 2..5`` follow the same pattern; each advance adds +5 to every ``rowSum``
because the input gradient is uniform (+1 per pixel).

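Continuing the Python trace for ``rStart = 0`` across every column position confirms both the +5 step applied to each ``rowSum`` and the row-2 outputs.

```python
K, SHIFT, W = 5, 1, 10
pixel = [[r + c + 1 for c in range(W)] for r in range(W)]

row_sums = [sum(pixel[i][0:K]) for i in range(K)]  # rStart = 0
outputs, deltas = [], set()
for c in range(W - K + 1):                         # c = 0 .. 5
    outputs.append(sum(row_sums) >> SHIFT)
    if c + K < W:                                  # no advance after the last column
        for i in range(K):
            delta = pixel[i][c + K] - pixel[i][c]  # entering minus leaving pixel
            deltas.add(delta)
            row_sums[i] += delta
print(outputs)   # [62, 75, 87, 100, 112, 125]
print(deltas)    # {5}
```
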
**rStart = 1 (pipeline window: rows 1-5)**

``rowSums`` is re-initialised from rows 1-5 over columns 0-4.  Output pixels are
written to row ``rStart + half = 3``.  The same column sliding proceeds across ``c = 0..5``.

Row pipeline progression (all ``rStart`` steps, at ``c = 0``)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The table below shows the ``rowSums`` contribution from every row that is active in
the k-row window for each ``rStart`` step.  Values shown are the 1-D horizontal sums
over cols 0–4.  Rows outside the current window are blank.  The window slides one
row downward each step; the colSum and blurred output follow at the bottom::

    rStart:          0      1      2      3      4      5
               ┌──────────────────────────────────────────
    row  0:    │  [15]
    row  1:    │  [20]   [20]
    row  2:    │  [25]   [25]   [25]
    row  3:    │  [30]   [30]   [30]   [30]
    row  4:    │  [35]   [35]   [35]   [35]   [35]
    row  5:    │         [40]   [40]   [40]   [40]   [40]
    row  6:    │                [45]   [45]   [45]   [45]
    row  7:    │                       [50]   [50]   [50]
    row  8:    │                              [55]   [55]
    row  9:    │                                     [60]
               └──────────────────────────────────────────
    colSum:        125    150    175    200    225    250
    out row:         2      3      4      5      6      7
    blurBuf:        62     75     87    100    112    125

The staircase shape traces the five-row window sweeping from the top of the image to
the bottom.  Each column of the table is one ``rStart`` iteration.  Each row of the
table is one image row ``r``, which occupies ``rowSums[r - rStart]`` while it is inside
the window.  The output row advances by one for every step because the centre of the
window is always at ``rStart + half``.

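The table can be regenerated with a short Python loop over ``rStart`` at ``c = 0``. For brevity, the 1-D sums are computed once per image row and then sliced. The module instead re-initialises ``rowSums`` for each window, which yields the same values.

```python
K, HALF, SHIFT, N = 5, 2, 1, 10
pixel = [[r + c + 1 for c in range(N)] for r in range(N)]
h_sums = [sum(pixel[r][0:K]) for r in range(N)]  # 1-D sums over cols 0..4

col_sums, out_rows, blurred = [], [], []
for r_start in range(N - K + 1):
    window = h_sums[r_start:r_start + K]         # rowSums[i] == h_sums[r_start + i]
    col_sums.append(sum(window))
    out_rows.append(r_start + HALF)
    blurred.append(sum(window) >> SHIFT)
print(col_sums)   # [125, 150, 175, 200, 225, 250]
print(out_rows)   # [2, 3, 4, 5, 6, 7]
print(blurred)    # [62, 75, 87, 100, 112, 125]
```
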
Complete ``blurBuf_`` output (0 = border pixel, never written)::

    col:   0    1    2    3    4    5    6    7    8    9
    row 0: 0    0    0    0    0    0    0    0    0    0
    row 1: 0    0    0    0    0    0    0    0    0    0
    row 2: 0    0   62   75   87  100  112  125    0    0
    row 3: 0    0   75   87  100  112  125  137    0    0
    row 4: 0    0   87  100  112  125  137  150    0    0
    row 5: 0    0  100  112  125  137  150  162    0    0
    row 6: 0    0  112  125  137  150  162  175    0    0
    row 7: 0    0  125  137  150  162  175  187    0    0
    row 8: 0    0    0    0    0    0    0    0    0    0
    row 9: 0    0    0    0    0    0    0    0    0    0

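Because the test input is linear in ``r`` and ``c``, every 5×5 window sums to 25 times its centre value. Each valid output therefore equals ``(25 * (r + c + 1)) >> 1``. The Python sketch below checks that closed form against a direct window sum and prints the resulting table; the names are illustrative.

```python
N, K, HALF, SHIFT = 10, 5, 2, 1
pixel = [[r + c + 1 for c in range(N)] for r in range(N)]

blur = [[0] * N for _ in range(N)]  # border stays zero
for r in range(HALF, N - HALF):
    for c in range(HALF, N - HALF):
        s = sum(pixel[rr][cc]
                for rr in range(r - HALF, r + HALF + 1)
                for cc in range(c - HALF, c + HALF + 1))
        assert s == 25 * (r + c + 1)  # linear input: window sum = k^2 * centre value
        blur[r][c] = s >> SHIFT

print("col:  " + "".join(f"{c:5d}" for c in range(N)))
for r in range(N):
    print(f"row {r}:" + "".join(f"{v:5d}" for v in blur[r]))
```
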
Manual spot-check — ``blurBuf_`` at (3, 7), kernel at rows 1-5, cols 5-9::

    sum = (7+8+9+10+11) + (8+9+10+11+12) + (9+10+11+12+13)
        + (10+11+12+13+14) + (11+12+13+14+15)
        =  45 + 50 + 55 + 60 + 65
        = 275
    275 >> 1 = 137  (matches blurBuf_[3][7] above)


Message Connection Descriptions
--------------------------------

.. list-table:: Module I/O Messages
    :widths: 25 25 50
    :header-rows: 1

    * - Msg Variable Name
      - Msg Type
      - Description
    * - imageInMsg
      - :ref:`CameraImageMsgPayload`
      - Optional camera image input message; the ``imagePointer`` is cast to ``uint16_t*``
        (12-bit values in lower bits, one element per pixel).
    * - rawImageOutMsg
      - :ref:`FpgaRawImageMsgPayload`
      - Calibration-preprocessed image; ``imagePointer`` points to the module's internal buffer.
    * - blurredImageOutMsg
      - :ref:`FpgaRawImageMsgPayload`
      - Box-blurred image; ``imagePointer`` points to the module's internal buffer.
    * - threshImageOutMsg
      - :ref:`FpgaThreshImageMsgPayload`
      - 1-bit packed binary threshold result; ``imagePointer`` points to the module's internal buffer.
    * - rowColSumOutMsg
      - :ref:`FpgaRowColSumMsgPayload`
      - Per-row and per-column above-threshold pixel counts.
    * - roiOutMsg
      - :ref:`FpgaBinsMsgPayload`
      - Top-8 ROI regions sorted descending by above-threshold pixel count.
    * - configOutMsg
      - :ref:`FpgaPipelineConfigMsgPayload`
      - Snapshot of the active pipeline configuration.

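Consumers of ``threshImageOutMsg`` need to unpack the 1-bit data themselves. The sketch below assumes the packed buffer has already been copied into a Python ``bytes`` object; how to do that through ``imagePointer`` is not covered here. It shows a minimal MSB-first unpacking, and the helper names are hypothetical.

```python
def thresh_bit(packed, idx):
    """Bit for pixel idx (row * width + col); pixel 0 is the MSB of byte 0."""
    return (packed[idx >> 3] >> (7 - (idx & 7))) & 1


def unpack_rows(packed, width, height):
    """Expand a packed buffer into a list of rows of 0/1 values."""
    return [[thresh_bit(packed, r * width + c) for c in range(width)]
            for r in range(height)]


packed = bytes([0b01000110, 0b00001000])  # a 4 x 4 image needs ceil(16 / 8) = 2 bytes
for row in unpack_rows(packed, 4, 4):
    print(row)
```
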

User Guide
----------

#. Import the module::

    from xmera.fswAlgorithms import fpgaImagePipeline

#. Instantiate and configure::

    pipe = fpgaImagePipeline.FpgaImagePipeline()
    pipe.ModelTag = "fpgaPipeline"
    pipe.setImageWidth(4096)
    pipe.setImageHeight(3000)
    pipe.setKernelSize(5)          # 5, 7, or 9
    pipe.setThreshold(200)
    pipe.setRoiRegionSize(64)      # 64, 128, or 256

#. Connect an image source (choose one)::

    # Option A — disk file (useful for unit testing)
    pipe.setImageFileName("/path/to/image.tiff")

    # Option B — live message from camera emulator
    pipe.imageInMsg.subscribeTo(cameraModule.imageOutMsg)

#. Optionally enable calibration::

    pipe.setCalibEnabled(True)
    pipe.setCalibImageFile("/path/to/calib.tiff")
    pipe.setCalibRegA(100)   # register values for op-codes 0x1/0x6/0xb

#. Optionally enable intermediate image saving::

    pipe.setSaveImages(True)
    pipe.setSaveDir("/tmp/fpga_out")

#. Add to simulation task::

    sim.AddModelToTask(taskName, pipe)

#. Access output messages or internal buffers from Python::

    # Read output message fields
    rawMsg = pipe.rawImageOutMsg.read()
    print(rawMsg.width, rawMsg.height)

    # Access internal pixel values directly (for testing)
    width = rawMsg.width
    row, col = 100, 200            # example pixel coordinates
    pixel = pipe.getRawPixel(row * width + col)
    above = pipe.getThreshBit(row * width + col)

Note: internal buffer accessors are available for testing only and expose raw pointers
that are valid only for the lifetime of the module.


Class FpgaImagePipeline
-----------------------

.. doxygenclass:: FpgaImagePipeline
   :project: xmera
   :members: