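Redshift: How the Calculations Behind Faster, Optimized Rendering Work

Translation: 七色基围蝦 (Finmeteor Yiu), BKS

(There may be some inaccuracies; please bear with us.)

*(The values computed with the formulas are accurate; feel free to work them out yourself.)*

*ray bundle -- here this refers to the slice size of RS adaptive sampling. The base value is 32 (roughly: after every 32 samples it checks whether the area is clean; if it is, sampling stops, and if not, it takes another 32 samples). For now the adaptive mechanism cannot be changed manually.*

*ray bundle packets -- the number of ray bundle groups that were computed. When (maximum samples - minimum samples) / ray bundle = ray bundle packets < 1, the ray bundle value is lowered automatically.*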

Redshift uses an adaptive unified sampling engine to fire rays intelligently. This lets it fire more samples where they are needed and fewer where they are not. Optimizing it properly can lead to faster renders with less noise. It is controlled in several ways.

You can also turn adaptivity "off" and use a pure brute-force method that fires rays equally across every pixel in the image. This is done by setting all local samples to 1 and making the min and max samples the same. That method is inefficient and slow because you lose all the advantages of Redshift's intelligent adaptivity. We will cover how the unified sampling engine works behind the scenes so you can best optimize your settings and take advantage of the speed and quality benefits it provides.

 

To take advantage of the intelligent adaptive sampler in Redshift, just remember that you need more local samples than maximum samples.

This lets the unified sampler focus on anti-aliasing, depth of field, and motion blur samples instead of lights, GI, specular, etc.

 

Minimum samples - the minimum number of rays fired

Maximum samples - the maximum number of rays fired

Local samples - the local samples fired (specular, refractions, lights, AO, brute-force GI)

Adaptive error threshold - controls the "intelligence" sensitivity of the adaptive engine

 

To work out how Redshift samples and how many "primary rays per pixel" it shoots, you use a simple equation:

Local samples / max samples = primary rays per pixel

This is important because Redshift cannot fire fewer than 1 primary ray per pixel. So if your local samples are lower than your max samples, Redshift is forced to fire 1 primary ray. For example, 256 local samples / 512 max samples = 0.5, which Redshift rounds up to 1.
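The equation and the "never fewer than 1" rule can be written as a small helper. This is only a sketch of the arithmetic described here: the function name `primary_rays_per_pixel` is made up for illustration and is not part of any Redshift API, and how Redshift treats fractional ratios above 1 is not stated in the text, so the value is simply left as is.

```python
def primary_rays_per_pixel(local_samples: int, max_samples: int) -> float:
    """Local samples / max samples, never lower than 1 primary ray.

    Hypothetical helper for this tutorial, not a Redshift function.
    """
    if max_samples <= 0:
        raise ValueError("max_samples must be positive")
    ratio = local_samples / max_samples
    # The text says Redshift cannot fire fewer than 1 primary ray per pixel.
    return max(1.0, ratio)


print(primary_rays_per_pixel(256, 512))   # 0.5 is raised to 1.0
print(primary_rays_per_pixel(1024, 512))  # 2.0
```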

Using this knowledge, we can then figure out how many samples per pixel are fired with another set of equations:

Primary rays per pixel × minimum samples = minimum samples per pixel

Primary rays per pixel × maximum samples = maximum samples per pixel

 

This tells us the bounding constraints the adaptive sampling engine has to work with. For example, let's say you are using 16 min samples, 512 max samples, and 1024 local samples.

1024 local samples / 512 max samples = 2 primary rays per pixel

16 min samples × 2 primary rays per pixel = 32 minimum samples per pixel

512 max samples × 2 primary rays per pixel = 1024 maximum samples per pixel
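For readers who want to check their own settings, the worked example can be reproduced in a few lines. This is a sketch of the arithmetic only; `per_pixel_bounds` is a hypothetical name chosen for this tutorial, not something Redshift exposes.

```python
def per_pixel_bounds(min_samples: int, max_samples: int, local_samples: int):
    """Return (primary rays per pixel, min samples per pixel, max samples per pixel).

    Hypothetical helper that applies the equations from the text.
    """
    # Primary rays per pixel = local / max, never lower than 1.
    primary = max(1.0, local_samples / max_samples)
    return primary, primary * min_samples, primary * max_samples


primary, lo, hi = per_pixel_bounds(min_samples=16, max_samples=512, local_samples=1024)
print(primary, lo, hi)  # 2.0 32.0 1024.0
```

With fewer local samples than max samples (for example 256 local, 512 max), the same function returns 1 primary ray, so the per-pixel bounds equal the min and max settings themselves.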

This gives us the goal posts the Redshift adaptive sampling engine can use: it will fire anywhere between 32 and 1024 samples per pixel. To increase the sensitivity of the engine, you can now decrease the adaptive error threshold. Lower numbers make it more sensitive to noise, which causes it to fire more rays, up to the maximum if needed. The default is 0.01.
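To get a feel for how a minimum, a maximum, and an error threshold interact, here is a toy adaptive loop. It is not Redshift's algorithm: the noise measure (standard error of the mean), the function name `adaptive_sample`, and the `check_every` parameter are all assumptions made for illustration. The check interval of 32 follows the translator's note at the top of this article.

```python
import random
import statistics


def adaptive_sample(sample_fn, min_samples, max_samples, threshold=0.01, check_every=32):
    """Toy adaptive sampler: returns (pixel estimate, samples used).

    Assumption: noise is judged by the standard error of the mean,
    which is a stand-in and not Redshift's real criterion.
    """
    samples = [sample_fn() for _ in range(min_samples)]
    while len(samples) < max_samples:
        if len(samples) >= 2:
            error = statistics.stdev(samples) / len(samples) ** 0.5
            if error <= threshold:
                break  # pixel looks clean enough, stop firing
        # Fire another group, without going past the maximum.
        extra = min(check_every, max_samples - len(samples))
        samples.extend(sample_fn() for _ in range(extra))
    return statistics.fmean(samples), len(samples)


rng = random.Random(0)
flat = adaptive_sample(lambda: 0.5, min_samples=16, max_samples=512)
noisy = adaptive_sample(rng.random, min_samples=16, max_samples=512)
print(flat[1])   # a noise-free pixel stops at the minimum: 16
print(noisy[1])  # a very noisy pixel runs to the maximum: 512
```

Setting `min_samples` equal to `max_samples` makes the loop body never run, which mirrors the brute-force mode described earlier: every pixel gets the same number of samples.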

 

The "show samples" override lets us better visualize where the rays are being fired. The darker an area, the closer it is to the minimum number of rays fired; the lighter, the closer to the maximum. Notice how all the images share the same areas where fewer or more rays are fired. The adaptive error threshold works the same way at 0.01 across all 4 images, with a minimum of 16 samples. The only difference is that the overall gradient range increases the higher the maximum can go.

 

One way of thinking about this is as 4 gradients, each double the size of the last. The adaptive sampler stops firing rays at the same spot, but because the gradient scale increases, that spot falls on a different value; technically they all cut off at roughly the same ray count. This is why the render times are almost the same even at 1024 max samples / 2048 local samples. The sampler is being intelligent and not firing more rays than needed. The reason render time does increase slightly is that it is still firing extra rays in the few areas where it judges they are still needed, such as strong specular highlights and anti-aliased edges.

 

Eventually, for a static image with no DOF or motion blur, you hit diminishing returns where you are wasting rays. Typically you won't need more than 512 maximum samples as your upper bound. When using DOF and motion blur, you can easily require more than 1024 max samples to clean up that noise.

 

There is also a hidden mechanic in the way Redshift groups the sample workload sent to the GPU. Redshift fires rays in groups of 4 (minimum), 8, 16, and 32 (maximum). This process is called ray sorting. As Redshift renders a pixel, it fires multiple ray bundles until the algorithm says stop. We will call multiple ray bundles "ray bundle packets". The adaptive sampling engine analyzes each bundle and decides whether to fire more samples or stop. So if only a few ray bundle packets are fired, the adaptive engine has less information to work with, leading to a decrease in speed and noise reduction. Redshift fires the minimum samples first and then continues to fire ray bundle packets until it decides that is enough or it hits the maximum. The equation for finding the number of ray bundle packets is as follows.

(Maximum samples - minimum samples) / ray bundle = ray bundle packets. If the result is an integer ≥ 1, Redshift shoots ray bundles of 32 (the maximum); if not, it tries 16, 8, and then 4 (the minimum ray bundle).

This tells us the number of ray bundle packets Redshift sends to the GPU. For example:

(1024 max - 32 min) / 32 ray bundle = 31 ray bundle packets

This means Redshift has 31 opportunities to analyze the pixel and intelligently decide whether to stop firing rays or continue until it reaches the maximum samples per pixel count.
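The packet count is a one-line calculation. The sketch below just applies the equation from the text; `ray_bundle_packets` is a hypothetical helper name, and the default bundle size of 32 is the maximum group size mentioned above.

```python
def ray_bundle_packets(min_samples: int, max_samples: int, bundle_size: int = 32) -> float:
    """(max samples - min samples) / ray bundle size.

    Hypothetical helper; the result is left fractional on purpose
    so that uneven settings are visible.
    """
    if bundle_size <= 0:
        raise ValueError("bundle_size must be positive")
    return (max_samples - min_samples) / bundle_size


print(ray_bundle_packets(32, 1024))  # 31.0
```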

So what if the number of ray bundle packets is not a whole number, like 15.5? For example:

(512 max - 16 min) / 32 ray bundle = 15.5 ray bundle packets

This means Redshift can fit 15 ray bundle packets of size 32 and then uses smaller ray bundle sizes such as 16, 8, or 4 until it reaches the maximum samples per pixel count, if the adaptive sampler needs that many.

32 ray bundle × 15 ray bundle packets = 480, + 16 ray bundle = 496

Meaning it actually took 16 ray bundle packets: 15 packets of 32-sized ray bundles and 1 extra packet of a 16-sized ray bundle. If, for example, the 16 did not fit, it would step down to using 8 and eventually 4 as a minimum.
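The step-down from 32 to 16, 8, and 4 can be sketched as a greedy split of the sample range. This is a guess at the bookkeeping based only on the description above, not Redshift's actual scheduler; `split_into_bundles` and `BUNDLE_SIZES` are names invented for this example.

```python
BUNDLE_SIZES = (32, 16, 8, 4)  # group sizes named in the text, largest first


def split_into_bundles(min_samples: int, max_samples: int):
    """Greedily fill (max - min) with the largest bundles that fit.

    Returns ({bundle size: packet count}, leftover samples that fit no bundle).
    Hypothetical helper for illustration only.
    """
    remaining = max_samples - min_samples
    counts = {}
    for size in BUNDLE_SIZES:
        counts[size], remaining = divmod(remaining, size)
    return counts, remaining


counts, leftover = split_into_bundles(16, 512)
print(counts)                # {32: 15, 16: 1, 8: 0, 4: 0}
print(sum(counts.values()))  # 16 packets in total
```

The output matches the worked example: 15 packets of 32 plus 1 packet of 16 cover the 496 samples between the minimum and the maximum.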

So to maximize the efficiency of each ray group, you want Redshift to always fire 32-sized ray groups. You begin to see diminishing returns if you are firing more ray bundles but using smaller clusters like 16, 8, or 4 instead of the full 32. Using 32 minimum samples per pixel, with a maximum samples per pixel that divides evenly into multiples of 32, optimizes the ray bundle packets to use the maximum cluster size of 32.
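A quick way to test whether a min/max pair keeps every packet at the full size of 32 is to look at the remainder of the division. The helper below is a sketch under that reading of the text; `samples_outside_full_bundles` is a made-up name.

```python
def samples_outside_full_bundles(min_samples: int, max_samples: int, full_size: int = 32) -> int:
    """Samples in (max - min) that cannot be covered by full-size bundles.

    0 means every packet can use the full bundle size.
    Hypothetical helper for illustration only.
    """
    return (max_samples - min_samples) % full_size


print(samples_outside_full_bundles(16, 512))   # 16 -> one smaller bundle is needed
print(samples_outside_full_bundles(32, 512))   # 0  -> all packets are size 32
print(samples_outside_full_bundles(32, 1024))  # 0
```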

This is important because if you don't have enough ray bundle packets, the algorithm wastes rays: without enough packets it cannot tell when to stop firing. It needs enough packets of ray bundles to work intelligently. Even if your bounding constraints per primary ray fired are equal, the number of ray bundle packets will alter how the adaptive sampler decides to fire ray samples. This is of course an ideal scenario; Redshift's adaptive sampler can stop firing ray bundle packets long before it reaches the full maximum samples, and for the final packets it can use smaller ray bundle sizes like 4, 8, or 16.

Below are examples showing how having more or fewer ray bundle packets affects your render times. Both images have equal numbers of primary rays per pixel, and the range the algorithm has to work with, between 64 min and 2024 max, is the same. But the image on the right is using local samples to help the unified sampler. The image on the left has 61 ray bundle packets, which gives smoother gradient transitions.

But the image on the left is working harder and firing more rays across the board, as it does not have as many dark spots as the image on the right, even though the image on the right has only 31 ray bundle packets. This leads to the image on the right having faster render times: it is being intelligent about where it samples, firing fewer samples where they are not needed, because the use of local samples makes the unified sampling engine work efficiently. The image on the left is seeing diminishing returns from using that many ray bundle packets.

 

The examples below also have equal boundaries, of 256 minimum samples per pixel and 1024 maximum samples per pixel. Even though the image on the left is not using local samples to help the unified sampler, it still renders faster because it is using 24 ray bundle packets instead of the much lower 4 ray bundle packets of the image on the right. So even though the image on the right is using local samples to help the unified sampler, because it uses only 4 ray bundle packets the adaptive sampling engine has difficulty with noise detection, which increases render times.

It is important to strike a balance between too many and too few ray bundle packets, while keeping in mind the balance between min, max, and local samples.
