图像处理算法:白平衡、除法器、乘法器~笔记

2024-01-24 00:12

本文主要是介绍图像处理算法:白平衡、除法器、乘法器~笔记,希望对大家解决编程问题提供一定的参考价值,需要的开发者们随着小编来一起学习吧!

参考:

基于FPGA的自动白平衡算法的实现        

白平衡初探 (qq.com)         

FPGA自动白平衡实现步骤详解-CSDN博客

xilinx 除法ip核(divider) 不同模式结果和资源对比(VHDL&ISE)_ise除法器ip核-CSDN博客 

 数字信号处理-04- FPGA常用运算模块-除法器(二)-阿里云开发者社区 (aliyun.com)

 【FPGA】:ip核--Divider(除法器)_除法器ip核-CSDN博客

 数字信号处理-04- FPGA常用运算模块-除法器_tlast-CSDN博客

目的:还原出真实的白色

色温的概念和示例:

涉及的资源:除法器、乘法器

除法器基本介绍

LUTMult
LUTMult . A simple lookup estimate of the reciprocal of the divisor followed by a
multiplier. Only remainder output type is supported because of the bias required in the
reciprocal estimate. This bias would introduce an offset (error) if used to create a
fractional output. This is recommended for operand widths less than or equal to 12 bits.
This implementation uses DSP slices, block RAM and a small amount of FPGA logic
primitives (registers and LUTs). For operand widths where either Radix2 or the LUTMult
options are possible, the LUTMult solution offers a solution using fewer FPGA logic
resources because of the use of DSP and block RAM primitives.
除数的倒数和乘数的简单查找估计。由于倒数估计中需要偏差,因此仅支持余数输出类型。如果用于创建小数输出,此偏差会引入偏移(误差)。对于小于或等于 12 位的操作数宽度,建议这样做。该实现使用 DSP Slice、Block RAM 和少量 FPGA 逻辑原语(寄存器和 LUT)。对于可以使用 Radix2 或 LUTMult 选项的操作数宽度,LUTMult 解决方案提供了一种使用更少 FPGA 逻辑资源的解决方案,因为使用了 DSP 和 Block RAM 原语。
Radix-2.
Radix-2. Radix-2 non-restoring integer division using integer operands, allowing either
a fractional or an integer remainder to be generated. This is recommended for operand
widths less than around 16 bits or for applications requiring high throughput. The
Send Feedback  implementation uses FPGA logic primitives (registers and LUTs). The Radix2 solution does not use DSP or block RAM primitives, so this implementation is recommended
when these primitives are needed elsewhere.
基数-2。使用整数操作数的 Radix-2 非恢复整数除法,允许生成小数或整数余数。对于小于 16 位左右的操作数宽度或需要高吞吐量的应用,建议这样做。发送反馈实现使用 FPGA 逻辑原语(寄存器和 LUT)。 Radix2 解决方案不使用 DSP 或块 RAM 原语,因此当其他地方需要这些原语时,建议使用此实现。
High Radix.
High Radix . High Radix division with prescaling. This is recommended for operand
widths greater than around 16 bits. This implementation uses DSP slices and block
RAMs.
带预缩放的高基数除法。对于大于 16 位左右的操作数宽度,建议这样做。该实现使用 DSP Slice 和 Block RAM。

延迟测算

The latency of the divider core is a function of the AXI4-Stream configuration parameters
and the latency of the algorithm selected. Latency is only a constant when the AXI4-Stream
mode is set to Non-Blocking and when the core algorithm and throughput are set such that
one sample is input per clock cycle. If the core is set to accept data only one in N cycles,
then data is only accepted on cycles N, 2N, 3N, …. It is not the case that data is accepted
immediately as long as N cycles have passed since the previous input. Hence, latency
appears to be increased if data is presented before the core is able to accept it. Another
effect which can cause latency to vary and increase is if full AXI4-Stream behavior is
selected. This is because a FIFO is used to manage data for this mode and the depth of the
FIFO adds to the latency. However, it should be noted that the intention of selecting
AXI4-Stream is to replace the need to balance latency with a handshake which manages
data flow at runtime, so latency should be less of a consideration. Because latency can vary
due to these effects, only minimum latency can be determined as a constant for any given
configuration of the core. In the following sections the latency of the algorithm alone is
discussed.
分频器核心的延迟是 AXI4-Stream 配置参数和所选算法延迟的函数。仅当 AXI4-Stream 模式设置为非阻塞且核心算法和吞吐量设置为每个时钟周期输入一个样本时,延迟才为常数。如果内核设置为仅在 N 个周期中接受一个数据,则仅在第 N、2N、3N、... 个周期上接受数据。只要自上一次输入以来经过≥N个周期,数据就不会被立即接受。因此,如果数据在核心能够接受之前呈现,则延迟似乎会增加。另一个可能导致延迟变化和增加的影响是选择完整的 AXI4-Stream 行为。这是因为 FIFO 用于管理此模式的数据,而 FIFO 的深度会增加延迟。然而,应该注意的是,选择 AXI4-Stream 的目的是用在运行时管理数据流的握手来取代平衡延迟的需要,因此延迟应该不被考虑。由于延迟可能会因这些影响而变化,因此对于任何给定的内核配置,只能将最小延迟确定为常数。在以下部分中,仅讨论算法的延迟。

LUTMult
The latency of the fully pipelined LUTMult is 8.
Radix-2
The latency (number of enabled clock cycles required before the core generates the first
valid output) for a fully pipelined divider is a function of the bit width of the dividend. If
fractional output is required, the fully pipelined latency is also a function of the fractional
bit width. In general:

Radix-2 全流水线除法器的延迟(内核生成第一个有效输出之前所需的启用时钟周期数)是被除数位宽的函数。如果需要小数输出,则完全流水线延迟也是小数位宽度的函数。一般来说:

• Fully pipelined latency is of the order M for integer remainder dividers, where M is the
width of the Quotient
• Fully pipelined latency is of the order M + F for fractional remainder dividers where F is
the width of the Fractional output
• 对于整数余数除法器,完全流水线延迟的数量级为 M,其中 M 是商的宽度
• 对于小数余数除法器,完全流水线延迟的数量级为 M + F,其中 F 是小数输出的宽度
Table 2-1 provides a list of the fully pipelined latency formula for divider selections. With
full pipelining, maximum possible performance is achieved. When clocks per division is 1,
latency can be set manually to a figure between 0 and the value shown in Table 2-1 . This
allows the latency of the core to be reduced at the expense of reducing the maximum clock
frequency at which the core can be clocked. Reducing the latency reduces the number of
registers used, but the LUT count remains approximately the same
表 2-1 提供了用于分频器选择的完全流水线延迟公式的列表。通过完整的流水线,可以实现最大可能的性能。当每分频时钟数为1时,延迟可以手动设置为0到表2-1所示值之间的数字。这允许减少内核的延迟,但代价是降低内核可以计时的最大时钟频率。减少延迟会减少使用的寄存器数量,但 LUT 数量保持大致相同
1. M = Dividend and Quotient Width, F = Fractional Width, A = total Latency of AXI interfaces.
1. M = 被除数和商宽度,F = 小数宽度,A = AXI 接口的总延迟。
High Radix Solution
Tables 2-2 and 2-3 show latency for the High Radix solution. To this, add 0 for NonBlocking
mode, 1 for Blocking mode with no output tready and 3 for Blocking mode with output
tready .
表 2-2 和 2-3 显示了高基数解决方案的延迟。为此,添加 0 表示非阻塞模式,添加 1 表示不带输出的阻塞模式,添加 3 表示带输出的阻塞模式。

除法器配置

Common Options

Describes parameters common to both implementations and allows the selection of the divider implementation.

描述两种实现方式共有的参数,并允许选择分频器实现方式。

• Algorithm Type: This selects between Radix-2, LUTMult and High Radix division solutions.

Radix-2 Options计算频次

Clocks Per Division: Determines the throughput of the Radix-2 solution (interval in clocks between inputs (or outputs)). A low value for this parameter results in high throughput, but also in greater resource use.

每分频时钟数:确定 Radix-2 解决方案的吞吐量(输入(或输出)之间的时钟间隔)。此参数的值较低会导致吞吐量较高,但也会导致资源使用量增加。

High Radix and LUTMult Options

Number of iterations (High Radix only): Read-only text field that reports the number of iterations performed by the High Radix engine for each divide. This sets the maximum throughput of the divider. To achieve this throughput, the operands must be supplied as soon as requested by the core s_axis_dividend_tready and s_axis_divisor_tready outputs.

迭代次数(仅限高基数):只读文本字段,报告高基数引擎为每次划分执行的迭代次数。这设置了分频器的最大吞吐量。为了实现此吞吐量,必须在核心 s_axis_dividend_tready 和 s_axis_divisor_tready 输出请求时立即提供操作数。

Throughput (High Radix only): Read-only text field that reports the maximum throughput that can be sustained by the divider when operands are supplied at a constant rate. In AXI blocking modes, throughput might be slightly higher due to Send Feedback Divider Generator v5.1 26 PG151 February 4, 2021 www.xilinx.com Chapter 4: Design Flow Steps buffering. This rate applies when FlowControl is set to NonBlocking and the output channel DOUT has no tready.

吞吐量(仅限高基数):只读文本字段,报告以恒定速率提供操作数时分频器可以维持的最大吞吐量。在 AXI 阻塞模式下,由于发送反馈分频器生成器设计流程步骤缓冲,吞吐量可能会稍高。当FlowControl 设置为NonBlocking 并且输出通道DOUT 没有tready 时,适用此速率。

Common Options

Detect Divide-by-Zero: Check box. Determines if the core has a DIVIDE_BY_ZERO field in the output tuser port (m_axis_dout_tuser) to signal when a division by zero has been performed

检测除零:复选框。确定内核的输出 tuser 端口 (m_axis_dout_tuser) 中是否有 DIVIDE_BY_ZERO 字段,以在执行除以零时发出信号

AXI4-Stream Options

Flow Control: Blocking or NonBlocking. This is more fully explained in AXI4-Stream Considerations in Chapter 3. NonBlocking mode provides an easier migration path from the previous version of the Divider Generator core. Blocking mode eases data flow management to/from other AXI4-Stream blocking mode cores at the expense of some additional resource and latency

流量控制:阻塞或非阻塞。第 3 章中的 AXI4-Stream 注意事项对此进行了更全面的解释。NonBlocking 模式提供了从先前版本的 Divider Generator 核心的更简单的迁移路径。阻塞模式简化了进出其他 AXI4-Stream 阻塞模式内核的数据流管理,但会带来一些额外的资源和延迟

Optimize Goal: This applies only to blocking mode. When ACLKEN is selected and Optimize Goal is set to Resources, performance might be reduced. See Resource Utilization in Chapter 2.

优化目标:这仅适用于阻塞模式。当选择 ACLKEN 并将优化目标设置为资源时,性能可能会降低。请参阅第 2 章中的资源利用。

• Output has TREADY: Selects whether the output channel has a tready signal. This is required to allow back pressure from downstream, for example, if connected to another AXI4-Stream Blocking core. Without tready, downstream circuitry cannot halt dataflow from the divider, but some resource is saved

• 输出有TREADY:选择输出通道是否有TREADY 信号。这是允许来自下游的背压所必需的,例如,如果连接到另一个 AXI4-Stream Blocking 核心。如果没有tready,下游电路无法停止来自分频器的数据流,但可以节省一些资源

Output TLAST Behavior: Selects the source of the output channel tlast signal. When neither or only one input channel has a tlast then the output tlast is not present or derives from the input tlast appropriately. When both input channels have tlast, the output channel tlast can derive from either alone, the logical OR of both inputs, or the logical AND of both inputs.

输出 TLAST 行为:选择输出通道 tlast 信号的源。当没有或只有一个输入通道具有 tlast 时,则输出 tlast 不存在或从输入 tlast 适当导出。当两个输入通道都有 tlast 时,输出通道 tlast 可以单独源自两个输入的逻辑“或”或两个输入的逻辑“与”。

Latency Options配置流水

Latency Configuration: Automatic (fully pipelined) or manual (determined by following field). Latency configuration for Radix2 solution is configurable only when clocks per division is set to 1. This is due to iterative feedback and hence non-optional registers when clocks per division is greater than 1.

延迟配置:自动(完全流水线)或手动(由以下字段确定)。仅当每分频时钟设置为 1 时,Radix2 解决方案的延迟配置才可配置。这是由于迭代反馈,因此当每分频时钟大于 1 时,寄存器是非可选的。

Latency: When Latency Configuration is set to Automatic, this field provides the latency from input to output in terms of clock enabled clock cycles. When Manual, this field is used to specify the latency required. When high performance (clock frequency) is not required, a lower value in this field can save resources

延迟:当延迟配置设置为自动时,此字段以时钟启用的时钟周期提供从输入到输出的延迟。当手动时,该字段用于指定所需的延迟。当不需要高性能(时钟频率)时,该字段的值较低可以节省资源

Control Signals

ACLKEN: Determines if the core has a clock enable input (ACLKEN).

ARESETn: Determines if the core has an active-Low synchronous clear input (ARESETn).

Note:

a. The signal ARESETn always takes priority over ACLKEN, that is, ARESETn takes effect regardless of the state of ACLKEN.

b. The signal ARESETn is active-Low.

c. The signal ARESETn should be held active for at least two clock cycles. This is because, for performance, ARESETn is internally registered before being fed to the reset port of primitives

除法器输出:

Output Channel

• Remainder Type: This selects between remainder types Fractional and Remainder presented on the FRACTIONAL field of the output tdata port (m_axis_dout_tdata). Fractional remainder type is the only option for High Radix.

• 余数类型:在输出tdata 端口(m_axis_dout_tdata) 的FRACTIONAL 字段上显示的余数类型Fractional 和Remainder 之间进行选择。小数余数类型是高基数的唯一选择。

• Fractional Width: If Fractional remainder type is selected, this determines the number of bits provided on the FRACTIONAL field of the output channel (m_axis_dout_tdata). When High Radix is selected, the total output width (quotient part plus fractional part) is limited to 82. The width of the quotient is equal to the width of the dividend and is set in the Dividend channel section. The width of the tuser port is the sum of the present input channel tuser fields plus one if divide_by_zero detect is active. See AXI4-Stream Considerations in Chapter 3 for the internal structure of the tuser port. This channel also has a tlast port if either of the input channels has a tlast port

• 小数类型:

如果选择小数类型,则这个选择将确定输出通道的小数字段上提供的位数(m_axis_dout_tdata)。

当选择 High Radix 时,总输出宽度(商部分加小数部分)限制为 82。商的宽度等于被除数的宽度,并在 Dividend Channel 部分中设置。如果divide_by_zero 检测处于活动状态,tuser 端口的宽度是当前输入通道tuser 字段的总和加一。有关 tuser 端口的内部结构,请参阅第 3 章中的 AXI4-Stream 注意事项。

如果任一输入通道有 tlast 端口,则该通道也有 tlast 端口

TDATA Structure for Output (DOUT) Channel The structure of m_axis_dout_tdata is more complex. This port contains both quotient and, if present, remainder or fractional outputs. When the remainder type is set to remainder, the two outputs are considered separate and so are byte-oriented before being concatenated to make the m_axis_dout_tdata signal. When remainder type is fractional, the fractional part is considered an extension of the quotient so these two fields are concatenated before being padded to the next byte boundary.

输出 (DOUT) 通道的 TDATA 结构 m_axis_dout_tdata 的结构更为复杂。该端口包含商以及余数或小数输出(如果存在)。

当余数类型设置为余数时,两个输出被视为独立的,因此在连接形成 m_axis_dout_tdata 信号之前是面向字节的。

当余数类型为小数时,小数部分被视为商的扩展,因此这两个字段在填充到下一个字节边界之前会被连接起来。

WITH REMAINDER:PAD+QUOTIENT+REMAINDER+PAD

WITH FRACTIONAL PART:PAD+QUOTIENT+FRACTIONAL

这篇关于图像处理算法:白平衡、除法器、乘法器~笔记的文章就介绍到这儿,希望我们推荐的文章对编程师们有所帮助!



http://www.chinasem.cn/article/637971

相关文章

Python中的随机森林算法与实战

《Python中的随机森林算法与实战》本文详细介绍了随机森林算法,包括其原理、实现步骤、分类和回归案例,并讨论了其优点和缺点,通过面向对象编程实现了一个简单的随机森林模型,并应用于鸢尾花分类和波士顿房... 目录1、随机森林算法概述2、随机森林的原理3、实现步骤4、分类案例:使用随机森林预测鸢尾花品种4.1

不懂推荐算法也能设计推荐系统

本文以商业化应用推荐为例,告诉我们不懂推荐算法的产品,也能从产品侧出发, 设计出一款不错的推荐系统。 相信很多新手产品,看到算法二字,多是懵圈的。 什么排序算法、最短路径等都是相对传统的算法(注:传统是指科班出身的产品都会接触过)。但对于推荐算法,多数产品对着网上搜到的资源,都会无从下手。特别当某些推荐算法 和 “AI”扯上关系后,更是加大了理解的难度。 但,不了解推荐算法,就无法做推荐系

康拓展开(hash算法中会用到)

康拓展开是一个全排列到一个自然数的双射(也就是某个全排列与某个自然数一一对应) 公式: X=a[n]*(n-1)!+a[n-1]*(n-2)!+...+a[i]*(i-1)!+...+a[1]*0! 其中,a[i]为整数,并且0<=a[i]<i,1<=i<=n。(a[i]在不同应用中的含义不同); 典型应用: 计算当前排列在所有由小到大全排列中的顺序,也就是说求当前排列是第

csu 1446 Problem J Modified LCS (扩展欧几里得算法的简单应用)

这是一道扩展欧几里得算法的简单应用题,这题是在湖南多校训练赛中队友ac的一道题,在比赛之后请教了队友,然后自己把它a掉 这也是自己独自做扩展欧几里得算法的题目 题意:把题意转变下就变成了:求d1*x - d2*y = f2 - f1的解,很明显用exgcd来解 下面介绍一下exgcd的一些知识点:求ax + by = c的解 一、首先求ax + by = gcd(a,b)的解 这个

综合安防管理平台LntonAIServer视频监控汇聚抖动检测算法优势

LntonAIServer视频质量诊断功能中的抖动检测是一个专门针对视频稳定性进行分析的功能。抖动通常是指视频帧之间的不必要运动,这种运动可能是由于摄像机的移动、传输中的错误或编解码问题导致的。抖动检测对于确保视频内容的平滑性和观看体验至关重要。 优势 1. 提高图像质量 - 清晰度提升:减少抖动,提高图像的清晰度和细节表现力,使得监控画面更加真实可信。 - 细节增强:在低光条件下,抖

【数据结构】——原来排序算法搞懂这些就行,轻松拿捏

前言:快速排序的实现最重要的是找基准值,下面让我们来了解如何实现找基准值 基准值的注释:在快排的过程中,每一次我们要取一个元素作为枢纽值,以这个数字来将序列划分为两部分。 在此我们采用三数取中法,也就是取左端、中间、右端三个数,然后进行排序,将中间数作为枢纽值。 快速排序实现主框架: //快速排序 void QuickSort(int* arr, int left, int rig

poj 3974 and hdu 3068 最长回文串的O(n)解法(Manacher算法)

求一段字符串中的最长回文串。 因为数据量比较大,用原来的O(n^2)会爆。 小白上的O(n^2)解法代码:TLE啦~ #include<stdio.h>#include<string.h>const int Maxn = 1000000;char s[Maxn];int main(){char e[] = {"END"};while(scanf("%s", s) != EO

秋招最新大模型算法面试,熬夜都要肝完它

💥大家在面试大模型LLM这个板块的时候,不知道面试完会不会复盘、总结,做笔记的习惯,这份大模型算法岗面试八股笔记也帮助不少人拿到过offer ✨对于面试大模型算法工程师会有一定的帮助,都附有完整答案,熬夜也要看完,祝大家一臂之力 这份《大模型算法工程师面试题》已经上传CSDN,还有完整版的大模型 AI 学习资料,朋友们如果需要可以微信扫描下方CSDN官方认证二维码免费领取【保证100%免费

【学习笔记】 陈强-机器学习-Python-Ch15 人工神经网络(1)sklearn

系列文章目录 监督学习:参数方法 【学习笔记】 陈强-机器学习-Python-Ch4 线性回归 【学习笔记】 陈强-机器学习-Python-Ch5 逻辑回归 【课后题练习】 陈强-机器学习-Python-Ch5 逻辑回归(SAheart.csv) 【学习笔记】 陈强-机器学习-Python-Ch6 多项逻辑回归 【学习笔记 及 课后题练习】 陈强-机器学习-Python-Ch7 判别分析 【学

系统架构师考试学习笔记第三篇——架构设计高级知识(20)通信系统架构设计理论与实践

本章知识考点:         第20课时主要学习通信系统架构设计的理论和工作中的实践。根据新版考试大纲,本课时知识点会涉及案例分析题(25分),而在历年考试中,案例题对该部分内容的考查并不多,虽在综合知识选择题目中经常考查,但分值也不高。本课时内容侧重于对知识点的记忆和理解,按照以往的出题规律,通信系统架构设计基础知识点多来源于教材内的基础网络设备、网络架构和教材外最新时事热点技术。本课时知识