Backpropagation Neural Network for Book Classification Using the Image Cover

I Putu Budhi Darma Purwanta(1*), Cyprianus Kuntoro Adi(2), Ni Putu Novita Puspa Dewi(3)

(1) Universitas Gadjah Mada
(2) Universitas Sanata Dharma
(3) Universitas Sanata Dharma
(*) Corresponding Author

Abstract


Artificial neural networks are known to provide good models for classification. The goal of this research is to classify books written in Bahasa Indonesia using their covers. The data consist of scanned images, each 300 cm in height and 130 cm in width, with an image resolution of 96 dpi. Feature extraction used an image processing method, MSER (Maximally Stable Extremal Regions), to identify the area of the book title, and Tesseract Optical Character Recognition (OCR) to read the title. The features extracted with MSER and OCR were then converted into a numerical matrix that serves as the input to the backpropagation artificial neural network. The accuracy obtained using one hidden layer with 15 neurons is 63.31%. The evaluation using two hidden layers with a combination of 15 and 35 neurons resulted in an accuracy of 79.89%. The model's ability to classify books was affected by the quality and variation of the images and by the amount of training data.
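The last two stages of the pipeline (title text to numerical matrix, then backpropagation training) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the OCR output is encoded, so a binary bag-of-words encoding is assumed here; the titles, the two class labels, the sigmoid activation, the learning rate, and the epoch count are all hypothetical. Only the hidden layer sizes (15 and 35) come from the abstract. The MSER and OCR steps are not shown; the strings stand in for their output.

```python
import math
import random

def build_vocabulary(titles):
    """Map every distinct lowercase token in the titles to a column index."""
    vocab = {}
    for title in titles:
        for token in title.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(title, vocab):
    """Binary bag-of-words row (an assumed encoding, not taken from the paper)."""
    row = [0.0] * len(vocab)
    for token in title.lower().split():
        if token in vocab:
            row[vocab[token]] = 1.0
    return row

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class BackpropNetwork:
    """Fully connected sigmoid network trained by online gradient descent."""

    def __init__(self, layer_sizes, seed=0):
        rng = random.Random(seed)
        # weights[l][j] holds the incoming weights of neuron j; the last entry is its bias.
        self.weights = [
            [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
        ]

    def forward(self, x):
        """Return the activations of every layer, input included."""
        activations = [x]
        for layer in self.weights:
            prev = activations[-1] + [1.0]  # append constant bias input
            activations.append(
                [sigmoid(sum(w * a for w, a in zip(neuron, prev))) for neuron in layer]
            )
        return activations

    def train_step(self, x, target, lr=0.5):
        """One backpropagation update for a single sample (squared-error loss)."""
        activations = self.forward(x)
        deltas = [(o - t) * o * (1 - o) for o, t in zip(activations[-1], target)]
        for l in reversed(range(len(self.weights))):
            layer = self.weights[l]
            prev = activations[l] + [1.0]
            if l > 0:
                # Propagate the error backwards using the weights before they change.
                prev_deltas = [
                    sum(layer[j][i] * deltas[j] for j in range(len(layer))) * a * (1 - a)
                    for i, a in enumerate(activations[l])
                ]
            for j, neuron in enumerate(layer):
                for i in range(len(neuron)):
                    neuron[i] -= lr * deltas[j] * prev[i]
            if l > 0:
                deltas = prev_deltas

    def predict(self, x):
        out = self.forward(x)[-1]
        return out.index(max(out))

# Hypothetical OCR output and class labels (0 = computing, 1 = cooking).
titles = [
    "belajar pemrograman python",
    "pemrograman web dasar",
    "resep masakan nusantara",
    "resep kue tradisional",
]
labels = [0, 0, 1, 1]

vocab = build_vocabulary(titles)
matrix = [vectorize(t, vocab) for t in titles]          # numerical input matrix
targets = [[1.0 if c == y else 0.0 for c in range(2)] for y in labels]

# Two hidden layers with 15 and 35 neurons, as in the abstract's second experiment.
net = BackpropNetwork([len(vocab), 15, 35, 2])
for _ in range(500):
    for row, target in zip(matrix, targets):
        net.train_step(row, target)

predictions = [net.predict(row) for row in matrix]
```

On a toy set this small the network only memorizes the training titles, so the sketch shows the mechanics of the matrix conversion and the weight updates rather than anything about the reported accuracies.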

DOI: https://doi.org/10.24071/ijasst.v2i2.2653

Publisher: Faculty of Science and Technology

Society/Institution: Sanata Dharma University

This work is licensed under a Creative Commons Attribution 4.0 International License.