Frequency Distribution Fitting for Electronic Documents
(1) Marquette University & St. Xavier's College, Ahmedabad
(*) Corresponding Author
Abstract
Studies of the frequency distributions of natural language elements have identified several distributions that fit the observed data well. Using electronic documents, we show that some of these distributions cannot be used to model the frequency of bytes in such documents, even when the documents contain natural language text.
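To make the idea of fitting a distribution to byte frequencies concrete, the following is a minimal sketch, not the authors' procedure. It assumes a plain Zipf-type power law, f(r) = c / r^s, fitted by ordinary least squares on the log-log rank-frequency data; the abstract does not say which distributions or fitting method the paper uses. The function names `ranked_byte_frequencies` and `fit_zipf` are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

import math
from collections import Counter


def ranked_byte_frequencies(data: bytes) -> list[float]:
    """Relative frequencies of the byte values present in data, highest first."""
    counts = Counter(data)  # iterating over bytes yields integers 0..255
    total = len(data)
    return sorted((c / total for c in counts.values()), reverse=True)


def fit_zipf(freqs: list[float]) -> tuple[float, float, float]:
    """Fit log f(r) = log c - s * log r by least squares.

    Assumption: a simple Zipf-type power law, used here only as an example.
    Returns (s, c, r_squared), where r_squared is measured in log-log space.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two distinct byte values to fit")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return -slope, math.exp(intercept), r_squared


if __name__ == "__main__":
    # Illustrative input only; any byte string (e.g. a file's contents) works.
    sample = "the quick brown fox jumps over the lazy dog".encode("utf-8")
    s, c, r2 = fit_zipf(ranked_byte_frequencies(sample))
    print(f"exponent s={s:.3f}, scale c={c:.3f}, R^2={r2:.3f}")
```

A low R^2 from such a fit would suggest that the chosen distribution models the byte frequencies poorly. A fuller comparison would fit several candidate distributions and apply a proper goodness-of-fit test rather than relying on R^2 in log-log space.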
DOI: https://doi.org/10.24071/ijasst.v3i1.2854
Publisher: Faculty of Science and Technology
Society/Institution: Sanata Dharma University
This work is licensed under a Creative Commons Attribution 4.0 International License.