SOCR LetterFrequencyData


Revision as of 18:43, 31 May 2010 by IvoDinov


SOCR Data - Latin Letter Frequency Distributions in Different Languages

Data Description

The data table below presents the average relative frequencies (proportions of all letters, so each column sums to about 1 up to rounding) of the 26 letters of the basic Latin alphabet in twelve languages, together with their average across these languages. Letter frequencies in text are studied in cryptography. There is no exact letter frequency distribution underlying a given language: the distribution varies over time and between authors, since all writers write slightly differently and are affected by their culture. Modern International Morse code encodes the most frequent letters with the shortest symbols. If the Morse alphabet is arranged into groups of letters that take equal amounts of time to transmit, and these groups are sorted by increasing transmission time, the resulting order (e; i, t; a, n, s; ...) roughly follows the frequency ranking of English letters. Similar ideas are used in modern data-compression techniques such as Huffman coding.
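To illustrate the Huffman-coding remark, here is a minimal Python sketch that builds a binary Huffman code from the English column of the table, using only the standard library (`heapq`). The helper name `huffman_code` is hypothetical, and because the table values are rounded to two decimals (some letters show 0.00), the exact codewords are illustrative only.

```python
import heapq
from itertools import count

# English column of the data table (rounded proportions).
ENGLISH = {
    "a": 0.08, "b": 0.01, "c": 0.03, "d": 0.04, "e": 0.13, "f": 0.02,
    "g": 0.02, "h": 0.06, "i": 0.07, "j": 0.00, "k": 0.01, "l": 0.04,
    "m": 0.02, "n": 0.07, "o": 0.08, "p": 0.02, "q": 0.00, "r": 0.06,
    "s": 0.06, "t": 0.09, "u": 0.03, "v": 0.01, "w": 0.02, "x": 0.00,
    "y": 0.02, "z": 0.00,
}

def huffman_code(freqs):
    """Build a binary Huffman code: frequent symbols get shorter codewords."""
    tiebreak = count()  # unique counter so ties never compare the dicts
    heap = [(weight, next(tiebreak), {sym: ""}) for sym, weight in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tiebreak), merged))
    return heap[0][2]

codes = huffman_code(ENGLISH)
# Print letters from shortest to longest codeword.
for letter in sorted(codes, key=lambda s: (len(codes[s]), s)):
    print(letter, codes[letter])
```

The most frequent letters (e, t) end up with the shortest codewords and the rarest ones with the longest, which is the same principle the Morse-code ordering reflects.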

Letter frequencies, like word frequencies, tend to vary by writer, subject and language. Accurate average letter frequencies are obtained by analyzing large amounts of representative text.
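A minimal Python sketch of the counting step, assuming that case is ignored and that only the unaccented letters a-z are counted. Characters outside a-z, including accented letters, are simply skipped, so this sketch does not reproduce the table's Others row. The function name `letter_frequencies` is hypothetical.

```python
import string
from collections import Counter

def letter_frequencies(text):
    """Return the relative frequency of each letter a-z in text.

    Case is ignored; any character outside a-z (digits, punctuation,
    spaces, accented letters) is skipped in this simplified sketch.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    return {
        letter: (counts[letter] / total if total else 0.0)
        for letter in string.ascii_lowercase
    }

sample = "The quick brown fox jumps over the lazy dog"
freqs = letter_frequencies(sample)
for letter in "eo":
    print(letter, round(freqs[letter], 3))
```

A short pangram like the sample contains every letter, but its proportions are far from the values in the table; stable estimates need large amounts of representative text.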

Sources

Data Table

Letter English French German Spanish Portuguese Esperanto Italian Turkish Swedish Polish Toki_Pona Dutch Average
a 0.08 0.08 0.07 0.13 0.15 0.12 0.12 0.12 0.09 0.08 0.17 0.07 0.11
b 0.01 0.01 0.02 0.01 0.01 0.01 0.01 0.03 0.01 0.01 0.00 0.02 0.01
c 0.03 0.03 0.03 0.05 0.04 0.01 0.05 0.01 0.01 0.04 0.00 0.01 0.03
d 0.04 0.04 0.05 0.06 0.05 0.03 0.04 0.05 0.05 0.03 0.00 0.06 0.04
e 0.13 0.15 0.17 0.14 0.13 0.09 0.12 0.09 0.10 0.07 0.07 0.19 0.12
f 0.02 0.01 0.02 0.01 0.01 0.01 0.01 0.00 0.02 0.00 0.00 0.01 0.01
g 0.02 0.01 0.03 0.01 0.01 0.01 0.02 0.01 0.03 0.01 0.00 0.03 0.02
h 0.06 0.01 0.05 0.01 0.01 0.00 0.02 0.01 0.02 0.01 0.00 0.02 0.02
i 0.07 0.08 0.08 0.06 0.06 0.10 0.11 0.08 0.05 0.07 0.15 0.07 0.08
j 0.00 0.01 0.00 0.00 0.00 0.04 0.00 0.00 0.01 0.02 0.03 0.01 0.01
k 0.01 0.00 0.01 0.00 0.00 0.04 0.00 0.05 0.03 0.03 0.05 0.02 0.02
l 0.04 0.05 0.03 0.05 0.03 0.06 0.07 0.06 0.05 0.03 0.10 0.04 0.05
m 0.02 0.03 0.03 0.03 0.05 0.03 0.03 0.04 0.04 0.02 0.04 0.02 0.03
n 0.07 0.07 0.10 0.07 0.05 0.08 0.07 0.07 0.09 0.05 0.12 0.10 0.08
o 0.08 0.05 0.03 0.09 0.11 0.09 0.10 0.02 0.04 0.07 0.08 0.06 0.07
p 0.02 0.03 0.01 0.03 0.03 0.03 0.03 0.01 0.02 0.02 0.04 0.02 0.02
q 0.00 0.01 0.00 0.01 0.01 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00
r 0.06 0.07 0.07 0.07 0.07 0.06 0.06 0.07 0.08 0.04 0.00 0.06 0.06
s 0.06 0.08 0.07 0.08 0.08 0.06 0.05 0.03 0.06 0.04 0.04 0.04 0.06
t 0.09 0.07 0.06 0.05 0.05 0.05 0.06 0.03 0.09 0.02 0.05 0.07 0.06
u 0.03 0.06 0.04 0.04 0.05 0.03 0.03 0.03 0.02 0.02 0.03 0.02 0.03
v 0.01 0.02 0.01 0.01 0.02 0.02 0.02 0.01 0.02 0.00 0.00 0.03 0.01
w 0.02 0.00 0.02 0.00 0.00 0.00 0.00 0.00 0.00 0.04 0.03 0.02 0.01
x 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
y 0.02 0.00 0.00 0.01 0.00 0.00 0.00 0.03 0.01 0.03 0.00 0.00 0.01
z 0.00 0.00 0.01 0.01 0.00 0.01 0.00 0.02 0.00 0.05 0.00 0.01 0.01
Others 0.00 0.03 0.00 0.00 0.00 0.02 0.00 0.12 0.06 0.20 0.00 0.00 0.04
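The last column appears to be the unweighted mean of the twelve language columns. A minimal Python sketch to check that assumption: it parses a few rows copied from the table (the remaining rows can be pasted in the same format), treats the last column as the listed average whatever its header says, and recomputes the mean. The helper name `parse_table` is hypothetical.

```python
# Excerpt of the data table (header plus a few rows copied from above).
TABLE = """\
Letter English French German Spanish Portuguese Esperanto Italian Turkish Swedish Polish Toki_Pona Dutch Average
a 0.08 0.08 0.07 0.13 0.15 0.12 0.12 0.12 0.09 0.08 0.17 0.07 0.11
e 0.13 0.15 0.17 0.14 0.13 0.09 0.12 0.09 0.10 0.07 0.07 0.19 0.12
t 0.09 0.07 0.06 0.05 0.05 0.05 0.06 0.03 0.09 0.02 0.05 0.07 0.06
Others 0.00 0.03 0.00 0.00 0.00 0.02 0.00 0.12 0.06 0.20 0.00 0.00 0.04
"""

def parse_table(text):
    """Return (languages, {row label: (per-language values, listed average)})."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    languages = header[1:-1]  # drop "Letter" and the trailing average column
    rows = {}
    for line in lines[1:]:
        label, *values = line.split()
        numbers = [float(v) for v in values]
        rows[label] = (numbers[:-1], numbers[-1])
    return languages, rows

languages, rows = parse_table(TABLE)
for label, (values, listed_avg) in rows.items():
    recomputed = sum(values) / len(values)
    # Inputs are rounded to two decimals, so expect small rounding differences.
    print(f"{label}: recomputed {recomputed:.4f}, listed {listed_avg:.2f}")
```

For the rows shown, the recomputed means (for example 1.28 / 12 ≈ 0.1067 for a) agree with the listed average to within the two-decimal rounding of the table.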

Graphs

  • Histogram (HistogramChartDemo7) of the English letter frequencies
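The SOCR HistogramChartDemo7 chart itself is not reproduced here. As a rough, self-contained stand-in, this Python sketch prints a text bar chart of the English column, with each bar scaled so that the most frequent letter gets `width` characters. The helper name `text_histogram` and the default width of 40 are arbitrary choices for illustration.

```python
# English column of the data table (rounded proportions).
ENGLISH = {
    "a": 0.08, "b": 0.01, "c": 0.03, "d": 0.04, "e": 0.13, "f": 0.02,
    "g": 0.02, "h": 0.06, "i": 0.07, "j": 0.00, "k": 0.01, "l": 0.04,
    "m": 0.02, "n": 0.07, "o": 0.08, "p": 0.02, "q": 0.00, "r": 0.06,
    "s": 0.06, "t": 0.09, "u": 0.03, "v": 0.01, "w": 0.02, "x": 0.00,
    "y": 0.02, "z": 0.00,
}

def text_histogram(freqs, width=40):
    """Return one line per letter (alphabetical) with a proportional bar."""
    top = max(freqs.values())
    lines = []
    for letter in sorted(freqs):
        value = freqs[letter]
        bar = "#" * round(width * value / top)  # longest bar = width chars
        lines.append(f"{letter} {value:.2f} {bar}")
    return lines

lines = text_histogram(ENGLISH)
for line in lines:
    print(line)
```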
