Frequency Distribution Fitting for Electronic Documents

Arockia David Roy Kulandai(1*)

(1) Marquette University & St. Xavier's College, Ahmedabad
(*) Corresponding Author

Abstract


Studies of the frequency distributions of natural language elements have identified several distributions that fit the data well. Using electronic documents, we show that some of these distributions cannot be used to model the frequency of bytes in such documents, even when the documents contain natural language text.
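
As a rough illustration of what fitting a distribution to byte frequencies can involve, the sketch below ranks the byte frequencies of a small sample and fits a Zipf-type power law, f(r) ≈ c · r^(-s), by least squares on the log-log values. This is not the procedure used in the paper: the choice of a Zipf-type law, the sample text, and the function names `ranked_byte_frequencies` and `fit_zipf` are all assumptions made for the example.

```python
import math
from collections import Counter


def ranked_byte_frequencies(data: bytes) -> list[float]:
    """Return relative byte frequencies, sorted from most to least common."""
    counts = Counter(data)
    total = len(data)
    return sorted((c / total for c in counts.values()), reverse=True)


def fit_zipf(freqs: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of log f = log c - s * log r.

    Returns (s, c, r_squared). Needs at least two distinct byte values.
    """
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Coefficient of determination as a crude goodness-of-fit measure.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return -slope, math.exp(intercept), r_squared


# Hypothetical sample; a real study would read the bytes of actual documents.
sample = b"the quick brown fox jumps over the lazy dog " * 50
freqs = ranked_byte_frequencies(sample)
s, c, r2 = fit_zipf(freqs)
print(f"distinct bytes: {len(freqs)}, exponent s = {s:.3f}, R^2 = {r2:.3f}")
```

A low R^2 on real document bytes would be one simple sign that a candidate distribution fits poorly; a careful comparison would use a proper goodness-of-fit test rather than this quick regression.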




DOI: https://doi.org/10.24071/ijasst.v3i1.2854

Publisher : Faculty of Science and Technology

Society/Institution : Sanata Dharma University

This work is licensed under a Creative Commons Attribution 4.0 International License.