Building a Thesaurus for Cross-Lingual Text Using a Constraint Satisfaction Problem Approach

Umy Rizqi, Chastine Fatichah, Diana Purwitasari
Submission Date: 2017-07-23 18:08:15
Accepted Date: 2018-01-09 21:27:31

Abstract


Undergraduate final-project reports and theses are often provided in two languages, Indonesian and English. When searching, each student tends to look for documents using keywords in one particular language. The aim of this study is to build a cross-lingual Indonesian-English thesaurus using a Constraint Satisfaction Problem (CSP) approach. The data used are final-project reports (Tugas Akhir) and theses written by students of Institut Teknologi Sepuluh Nopember.

Document processing consists of several steps: building a parallel corpus, extracting terms, weighting terms, and building co-occurrence information. The Constraint Satisfaction Problem is then solved using backtracking as the search strategy. Terms are weighted using TF-IDF (term frequency–inverse document frequency).
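The term-weighting step can be illustrated with a minimal sketch. The abstract does not say which TF-IDF variant the authors used, so the version below is an assumption: relative term frequency multiplied by a natural-log inverse document frequency. The function name `tf_idf` and the toy corpus are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: how many documents contain each term at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus of tokenized documents (not data from the study).
docs = [
    ["pencarian", "dokumen", "tesis"],
    ["dokumen", "tugas", "akhir"],
    ["tesaurus", "dokumen", "tesis"],
]
weights = tf_idf(docs)
```

In this toy corpus "dokumen" appears in every document, so its weight is zero, while "pencarian" appears in only one document and receives the highest weight in the first document. Terms that discriminate between documents are the ones that carry weight into the later co-occurrence step.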

The thesaurus built with CSP achieved a precision of 91.38%, whereas the thesaurus built without CSP achieved a precision of 45.23%. Document search using the thesaurus yielded a recall of 86.67%, a precision of 100%, and an accuracy of 86.67%.
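For readers less familiar with these retrieval metrics, the sketch below shows how precision, recall, and accuracy are computed from confusion-matrix counts. The abstract does not report the underlying counts, so the numbers used here are hypothetical; they were chosen only because they happen to yield the same percentages as the reported search results.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Compute (precision, recall, accuracy) from confusion-matrix counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Hypothetical counts (NOT from the paper): 13 relevant documents retrieved,
# 2 relevant documents missed, no irrelevant documents retrieved.
precision, recall, accuracy = retrieval_metrics(tp=13, fp=0, fn=2, tn=0)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
```

With no false positives and no true negatives, accuracy reduces to the same ratio as recall, which is one way the reported recall and accuracy could coincide.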


Keywords


Backtracking; Co-occurrence; Constraint Satisfaction Problem; Cross-lingual; Thesaurus


Creative Commons License
Jurnal Teknik ITS by Lembaga Penelitian dan Pengabdian Kepada Masyarakat, LPPM-ITS is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at http://ejurnal.its.ac.id/index.php/teknik.