<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Electronic scientific and practical journal «Современные научные исследования и инновации» (Modern Scientific Research and Innovation) &#187; frequency masking</title>
	<atom:link href="http://web.snauka.ru/issues/tag/chastotnoe-maskirovanie/feed" rel="self" type="application/rss+xml" />
	<link>https://web.snauka.ru</link>
	<description></description>
	<lastBuildDate>Sat, 18 Apr 2026 09:41:14 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>On Spectral Resolution in Audio Compression Systems</title>
		<link>https://web.snauka.ru/en/issues/2020/01/91139</link>
		<comments>https://web.snauka.ru/en/issues/2020/01/91139#comments</comments>
		<pubDate>Mon, 13 Jan 2020 04:28:11 +0000</pubDate>
		<dc:creator>Сучилин Владимир Александрович</dc:creator>
				<category><![CDATA[05.00.00 Technical sciences]]></category>
		<category><![CDATA[audio compression]]></category>
		<category><![CDATA[frequency masking]]></category>
		<category><![CDATA[psychoacoustics]]></category>
		<category><![CDATA[spectral resolution]]></category>
		<category><![CDATA[аудиосжатие]]></category>
		<category><![CDATA[психоакустикa]]></category>
		<category><![CDATA[спектральное разрешение]]></category>
		<category><![CDATA[частотное маскирование]]></category>

		<guid isPermaLink="false">https://web.snauka.ru/issues/2020/01/91139</guid>
		<description><![CDATA[Introduction The features of the perception of audio information by the human ear are well known [1]. One of these features is the inability of human hearing to distinguish rather weak sounds in the presence of a more intense tone nearby. In psychoacoustics, this is called frequency masking. In the frequency domain, this appears [...]]]></description>
			<content:encoded><![CDATA[<p><strong><span style=" Arial;  medium;">Introduction</span></strong></p>
<p><span style=" Arial;  medium;">The features of the perception of audio information by the human ear are well known [1]. One of these features is the inability of human hearing to distinguish rather weak sounds in the presence of a more intense tone nearby. In psychoacoustics, this is called frequency masking. In the frequency domain, it appears when two harmonic oscillations are perceived simultaneously within a restricted frequency range called the masking area [2]. The choice of this area should be tied to the resolution limit of auditory perception, that is, the ability to distinguish between two separate but close pitches. Empirically, below 1000 Hz, normal human hearing perceives a frequency deviation of less than 3 Hz (as little as 1.5 Hz). Above 1000 Hz, the resolution can be estimated as follows [2]:</span></p>
<p><span style=" Arial;  medium;">            ŝ ≈ 0.0035 f,           (∀ f &gt; 1 kHz)             (1)</span></p>
<p><span style=" Arial;  medium;">In general, it makes no sense to speak of frequency masking when the tones involved are indistinguishable to human hearing. Therefore, the interval [f - ŝ; f + ŝ] can be considered the genuine masking area.</span></p>
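<p><span style=" Arial;  medium;">As an illustration, equation (1) and the interval above can be sketched in a few lines of Python. This is only a sketch: the function names are hypothetical, and the constant 3 Hz used below 1 kHz is an assumption taken from the upper end of the range quoted in the text.</span></p>

```python
def pitch_resolution(f_hz: float) -> float:
    """Approximate resolution of auditory perception (Hz) at frequency f_hz."""
    if f_hz > 1000.0:
        # Equation (1): the resolution grows linearly with frequency above 1 kHz.
        return 0.0035 * f_hz
    # Assumption: a constant 3 Hz below 1 kHz (the text quotes 1.5 to 3 Hz).
    return 3.0


def masking_area(f_hz: float) -> tuple[float, float]:
    """Interval [f - s, f + s] treated in the text as the genuine masking area."""
    s = pitch_resolution(f_hz)
    return (f_hz - s, f_hz + s)


if __name__ == "__main__":
    for f in (500.0, 2000.0, 8000.0):
        low, high = masking_area(f)
        print(f"f = {f:7.1f} Hz  masking area = [{low:.1f}, {high:.1f}] Hz")
```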
<p>&nbsp;</p>
<p><strong><span style=" Arial;  medium;">Using frequency masking with the DFT</span></strong></p>
<p><span style=" Arial;  medium;">To use frequency masking, the digitized audio signal is transformed into the frequency domain by means of the DFT [3]-[4]. The DFT converts N consecutive samples of the digital signal {x</span><sub><span style=" Arial;  medium;">n</span></sub><span style=" Arial;  medium;">} into N pairs of coefficients of the complex spectrum S(k), which characterize the representation of the signal in the frequency domain:</span></p>
<p><span style=" Arial;  medium;">Re [S(k)] = </span><img src="http://content.snauka.ru/web/91139_files/0.gif" alt="" width="136" height="42" /><span style=" medium;">        </span><span style=" Arial;  medium;">(2)</span></p>
<p><span style=" Arial;  medium;">Im [S(k)] = </span><img src="http://content.snauka.ru/web/91139_files/0(1).gif" alt="" width="134" height="42" /></p>
<p><span style=" Arial;  medium;">Next, the amplitude spectrum of the signal is formed from these coefficients. Because the amplitude spectrum (i.e. the spectrogram) is symmetric about its center, only the first N/2 values are used for the analysis:</span></p>
<p><span style=" Arial;  medium;">|A(k)| = </span><img src="http://content.snauka.ru/web/91139_files/0(2).gif" alt="" width="163" height="26" /><span style=" Arial;  medium;"> </span><span style=" medium;">, </span><span style=" Arial;  medium;">k = 1…Ñ</span><span style=" medium;">           </span><span style=" Arial;  medium;">(3)</span></p>
<p>where Ñ = (N+1)/2 for odd N, and Ñ = N/2 for even N.</p>
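<p><span style=" Arial;  medium;">The computation described by (2) and (3) might look as follows in Python. The exact formulas are given only as images, so this sketch assumes the conventional unnormalized DFT (cosine sum for the real part, negative sine sum for the imaginary part) and takes the amplitude as the magnitude of S(k). It also uses zero-based bins k = 0…Ñ-1, which is a choice made here for convenience; the function name is hypothetical.</span></p>

```python
import math


def amplitude_spectrum(x: list[float]) -> list[float]:
    """Amplitudes of the first N~ DFT bins of the real-valued frame x."""
    n = len(x)
    n_half = (n + 1) // 2  # (N+1)/2 for odd N, N/2 for even N
    amplitudes = []
    for k in range(n_half):
        # Assumed form of (2): cosine sum for Re, negative sine sum for Im.
        re = sum(x[i] * math.cos(2.0 * math.pi * k * i / n) for i in range(n))
        im = -sum(x[i] * math.sin(2.0 * math.pi * k * i / n) for i in range(n))
        # Assumed form of (3): magnitude of the complex coefficient.
        amplitudes.append(math.hypot(re, im))
    return amplitudes


if __name__ == "__main__":
    # Test tone with exactly 2 cycles in an 8-sample frame.
    frame = [math.cos(2.0 * math.pi * 2 * i / 8) for i in range(8)]
    print([round(a, 6) for a in amplitude_spectrum(frame)])
```

<p><span style=" Arial;  medium;">The direct summation costs on the order of N² operations and is shown only for clarity; practical encoders use an FFT, which produces the same coefficients.</span></p>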
<p><span style=" Arial;  medium;">Thus the DFT yields Ñ samples of the signal spectrum in the range from zero to half the sampling rate. Unlike the “continuous” spectrum of the signal, this spectrum is discrete, and it approaches the real spectrum ever more closely as N increases. Since N is finite in practice when the DFT is used in digital audio compression, it is also reasonable to take into account the spectral resolution, which is the sampling rate F<sub>s</sub> divided by N:</span></p>
<p><span style=" Arial;  medium;">ȗ = F</span><sub><span style=" Arial;  medium;">s </span></sub><span style=" Arial;  medium;">/ N       (4)</span></p>
<p><span style=" Arial;  medium;">For an audio spectrogram, the minimum size of the masking area must be chosen in view of the actual spectral resolution, since frequency masking relies on comparing the values at two adjacent frequencies of the spectrogram. This means that frequency masking can be applied reasonably only when the following criterion is met:</span></p>
<p><span style=" Arial;  medium;">ȗ ≤ ŝ       (5)</span></p>
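<p><span style=" Arial;  medium;">A minimal sketch of how criterion (5) could be evaluated at a single frequency, combining (1) and (4). The function names are hypothetical, and the check is restricted to frequencies above 1 kHz because (1) is stated only for that range.</span></p>

```python
def spectral_resolution(fs_hz: float, n: int) -> float:
    """Equation (4): spacing between adjacent DFT bins, in Hz."""
    return fs_hz / n


def meets_criterion(f_hz: float, fs_hz: float, n: int) -> bool:
    """Criterion (5) at frequency f_hz, with equation (1) as the hearing resolution."""
    if not f_hz > 1000.0:
        raise ValueError("equation (1) is stated only for f above 1 kHz")
    hearing_resolution = 0.0035 * f_hz
    # Criterion (5): the bin spacing must not exceed the hearing resolution.
    return hearing_resolution >= spectral_resolution(fs_hz, n)


if __name__ == "__main__":
    fs, n = 44100.0, 1024
    print(f"bin spacing: {spectral_resolution(fs, n):.2f} Hz")
    for f in (2000.0, 8000.0, 16000.0):
        print(f"f = {f:.0f} Hz  criterion (5) met: {meets_criterion(f, fs, n)}")
```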
<p><span style=" Arial;  medium;">Let us define the </span><em><span style=" Arial;  medium;">cut-off frequency</span></em><span style=" Arial;  medium;"> as the spectrogram frequency beyond which criterion (5) is no longer satisfied. In view of (1) and (4), it can be expressed simply as:</span></p>
<p><span style=" Arial;  medium;">f<sub>c</sub> ≈ F<sub>s </sub>(0.0035 N)<sup>-1</sup>          (6)</span></p>
<p><span style=" Arial;  medium;">Thus, the cut-off frequency is directly proportional to the sampling frequency of the audio signal and inversely proportional to the number of DFT samples.</span></p>
<p>&nbsp;</p>
<p><strong><span style=" Arial;  medium;">Example of cut-off frequencies</span></strong></p>
<p><span style=" Arial;  medium;">In audio compression systems (e.g. the MP3 audio encoder [6] or digital broadcasting [4]), the DFT is generally implemented as a 1024-point FFT [7]. For this case, the results of applying (6) to the sampling rates used in audio compression systems are shown in the following table.</span></p>
<p><strong><span style=" Arial;  medium;">Table 1.</span></strong><span style=" Arial;  medium;"> Cut-off frequencies for a 1024-point FFT</span></p>
<table border="1">
<tbody>
<tr valign="top">
<td width="142">
<div align="center"><span style=" Arial;">Sampling rate (kHz)</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">FFT resolution (Hz)</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">Cut-off frequency (Hz)</span></div>
</td>
</tr>
<tr valign="top">
<td width="142">
<div align="center"><span style=" Arial;">32.0</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">31.25</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">9000</span></div>
</td>
</tr>
<tr valign="top">
<td width="142">
<div align="center"><span style=" Arial;">44.1</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">43.07</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">12500</span></div>
</td>
</tr>
<tr valign="top">
<td width="142">
<div align="center"><span style=" Arial;">48.0</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">46.88</span></div>
</td>
<td width="132">
<div align="center"><span style=" Arial;">13500</span></div>
</td>
</tr>
</tbody>
</table>
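<p><span style=" Arial;  medium;">The entries of Table 1 can be recomputed from (4) and (6) with a short Python sketch. The function names are hypothetical. The unrounded values given by (6) are roughly 8929, 12305 and 13393 Hz, so the published cut-off frequencies appear to be rounded; rounding to the nearest 500 Hz is an assumption made here because it reproduces the table.</span></p>

```python
def cutoff_frequency(fs_hz: float, n: int) -> float:
    """Equation (6): f_c = F_s / (0.0035 * N), in Hz."""
    return fs_hz / (0.0035 * n)


def table_rows(n: int = 1024,
               rates_hz: tuple = (32000.0, 44100.0, 48000.0),
               step_hz: float = 500.0) -> list:
    """Rows of (sampling rate, FFT resolution, rounded cut-off), all in Hz."""
    rows = []
    for fs in rates_hz:
        resolution = fs / n  # equation (4)
        # Assumption: the table rounds the cut-off to the nearest step_hz.
        rounded_cutoff = round(cutoff_frequency(fs, n) / step_hz) * step_hz
        rows.append((fs, resolution, rounded_cutoff))
    return rows


if __name__ == "__main__":
    print("rate (kHz)  resolution (Hz)  cut-off (Hz)")
    for fs, resolution, cutoff in table_rows():
        print(f"{fs / 1000.0:10.1f}  {resolution:15.2f}  {cutoff:12.0f}")
```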
<p><span style=" Arial;  medium;">Obviously, if the FFT size is doubled (i.e. a 2048-point FFT is used), the cut-off frequencies in Table 1 will double. They then cover the entire range of frequencies perceived by human hearing at each of the sampling rates in Table 1.</span></p>
<p>&nbsp;</p>
<p><strong><span style=" Arial;  medium;">Discussion</span></strong></p>
<p><span style=" Arial;  medium;">From the results in Table 1, it follows that the range in which frequency masking can reasonably be used in audio compression systems depends on the spectral resolution. As an example, consider the spectrogram in Fig. 1, obtained with a 1024-point FFT.</span></p>
<p style="text-align: center;"><img class="aligncenter size-full wp-image-91141" title="ris1" src="https://web.snauka.ru/wp-content/uploads/2020/01/ris1.png" alt="" width="486" height="327" /></p>
<div align="center"><strong><span style=" Arial;  medium;">Fig. 1</span></strong><span style=" Arial;  medium;"> 1024-point FFT spectrogram of a jazz music fragment</span></div>
<p><span style=" Arial;  medium;">For this spectrogram, at a sampling rate of 32 kHz, frequency masking can be applied only in the range up to the red bar in Fig. 1, i.e. 9.0 kHz. For the other two sampling rates, the limits are 12.5 kHz (green bar) and 13.5 kHz (blue bar), respectively. Fortunately, average human hearing is unable to perceive frequencies above 12 kHz [2]. This largely prevents audible artifacts at the top of the frequency range perceivable by average human hearing, although such artifacts may still be noticeable to sophisticated music listeners. At the same time, criterion (5) would be met at all three sampling rates over the entire frequency range if an FFT size of 2048 or more were used for the audio compression.</span></p>
<p>&nbsp;</p>
<p><strong><span style=" Arial;  medium;">Conclusions</span></strong></p>
<p><span style=" Arial;  medium;">As shown above, when psychoacoustic effects are used in audio compression systems, both the spectral resolution and the resolution of auditory perception must be taken into account. In this regard, a specific frequency can be given for each sampling rate that determines the frequency range within which these effects can reasonably be applied without noticeable distortion of the sound. The crucial point is to establish a correct relation between the masking area, on the one hand, and the spectral resolution, on the other. In audio compression systems, the spectral resolution is determined by both the sampling rate and the FFT size. It is shown that with a 1024-point FFT, frequency masking works reasonably only within the frequency range perceived by average human hearing. However, for the requirements of expert music listeners, the use of e.g. a 2048-point FFT is preferable.</span></p>
]]></content:encoded>
			<wfw:commentRss>https://web.snauka.ru/en/issues/2020/01/91139/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
