Dataset and Evaluation
Training Corpus
To develop your multilingual author profiling software, we provide you with a training corpus that consists of multilingual (Roman Urdu and English) SMS-based author profiles (or documents). For gender, a multilingual author profile belongs to either the Male or the Female class. For age, a multilingual author profile falls into one of three categories: 15-19, 20-24, or 25-xx.
Gender and age information is associated with each multilingual author profile (see the truth file, truth.txt, which is provided with the training corpus). All author profiles in the training corpus are stored in .txt format to make the task easier.
Test Corpus
Once you have tuned your multilingual author profiling software to achieve satisfactory performance on the training corpus, run it on the test corpus and submit its output as described below.
Performance Measure
The performance of your multilingual author profiling system will be measured using the Accuracy evaluation measure.
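As an illustration, accuracy is the fraction of author profiles whose predicted label matches the gold label. The sketch below is not the organizers' evaluation script. The function names and the dictionary-based input (profile id mapped to label) are hypothetical, and the joint score is assumed here to count a profile as correct only when both gender and age are predicted correctly.

```python
def accuracy(gold, predicted):
    """Fraction of profile ids in `gold` whose predicted label matches."""
    if not gold:
        raise ValueError("gold labels must not be empty")
    correct = sum(1 for pid, label in gold.items() if predicted.get(pid) == label)
    return correct / len(gold)


def joint_accuracy(gold_gender, pred_gender, gold_age, pred_age):
    """Assumed definition: a profile counts only if gender AND age are both right."""
    if not gold_gender:
        raise ValueError("gold labels must not be empty")
    correct = sum(
        1
        for pid in gold_gender
        if pred_gender.get(pid) == gold_gender[pid]
        and pred_age.get(pid) == gold_age[pid]
    )
    return correct / len(gold_gender)
```

A profile that is missing from the predictions is counted as wrong, so an incomplete output file lowers the score rather than raising an error.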
Submission of Software and Output
You need to submit the following:
- Multilingual Author Profiling Software
- We ask you to prepare your software so that it can be executed generically for both age and gender. To maximize the sustainability of software submissions for this task, we encourage you to prepare your software so it can be re-trained on demand, i.e., by offering two commands, one for training, and one for testing. This way, your software can be reused on future evaluation corpora.
- The training command shall take as input i) an absolute path to a training corpus, and ii) an absolute path to an empty output directory. Based on the training corpus, your software shall train a classification model and save the trained model to the specified output directory (optional).
- The testing command shall take as input i) an absolute path to a test corpus, ii) an absolute path to a previously trained classification model (optional), and iii) an absolute path to an empty output directory. Based on the classification model, the software shall classify each case found in the test corpus and write the output to CSV file(s).
- When your software is submission-ready, please email your software along with the output (CSV) files. Email: maponsms@gmail.com
Note: By submitting your software you retain full copyrights. You agree to grant us usage rights only for the purpose of the FIRE'18-MAPonSMS competition. We agree not to share your software with a third party or use it for other purposes than the FIRE'18-MAPonSMS competition.
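The two-command interface described above could be exposed as in the following sketch. It covers argument handling only. The program name `profiler`, the sub-command names `train` and `test`, and the `--model` flag are illustrative assumptions, not names prescribed by the task.

```python
import argparse
import os


def absolute_path(value):
    """argparse type that accepts only absolute paths, as the task requires."""
    if not os.path.isabs(value):
        raise argparse.ArgumentTypeError(f"expected an absolute path, got {value!r}")
    return value


def build_parser():
    # "profiler" is a hypothetical program name.
    parser = argparse.ArgumentParser(prog="profiler")
    commands = parser.add_subparsers(dest="command", required=True)

    # Training: corpus path and an empty output directory for the model.
    train = commands.add_parser("train", help="train a model on a training corpus")
    train.add_argument("corpus", type=absolute_path)
    train.add_argument("output_dir", type=absolute_path)

    # Testing: corpus path, output directory, and an optional trained model.
    test = commands.add_parser("test", help="classify a test corpus")
    test.add_argument("corpus", type=absolute_path)
    test.add_argument("output_dir", type=absolute_path)
    test.add_argument("--model", type=absolute_path, default=None)
    return parser
```

Calling `build_parser().parse_args()` in the program's entry point yields a namespace whose `command` attribute selects the training or testing routine. The model path is a flag rather than a positional argument because the task marks it as optional.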
- Output Files
Use the approach you developed on the training corpus to obtain predictions on the test corpus. Your submission should consist of two CSV files in the following format:
Gender Prediction |
Test_Author_Profile_Id | Gender |
Test-Document-001 | male |
Test-Document-002 | female |
Test-Document-003 | male |
Test-Document-004 | male |
Test-Document-005 | female |
Age Prediction |
Test_Author_Profile_Id | Age |
Test-Document-001 | 15-19 |
Test-Document-002 | 20-24 |
Test-Document-003 | 25-xx |
Test-Document-004 | 15-19 |
Test-Document-005 | 25-xx |
Sample submission files for Gender and Age are available for download.
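A minimal sketch of writing the two output files with Python's standard `csv` module follows. The column headers are taken from the tables above. The file names `gender.csv` and `age.csv`, the comma delimiter, and the function names are assumptions; check them against the sample submission files before submitting.

```python
import csv
import os


def write_predictions(path, label_column, predictions):
    """Write one row per profile id under the headers used in the tables above."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)  # comma-delimited (assumed)
        writer.writerow(["Test_Author_Profile_Id", label_column])
        for profile_id, label in sorted(predictions.items()):
            writer.writerow([profile_id, label])


def write_submission(output_dir, gender_predictions, age_predictions):
    """Write both CSV files into output_dir (file names are hypothetical)."""
    os.makedirs(output_dir, exist_ok=True)
    write_predictions(os.path.join(output_dir, "gender.csv"), "Gender", gender_predictions)
    write_predictions(os.path.join(output_dir, "age.csv"), "Age", age_predictions)
```

For example, `write_submission(out_dir, {"Test-Document-001": "male"}, {"Test-Document-001": "15-19"})` produces one gender file and one age file, each with a header row followed by a single prediction row.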
- Working Notes
All participants must email a working notes paper describing their approach to multilingual author profiling on SMS messages. Email: maponsms@gmail.com
A total of 19 teams registered for the task, and 9 submitted their systems. The following table lists the performance achieved by the participating teams:
Multilingual Author Profiling on SMS Performance (Accuracy) |
Participants | Gender | Age | Joint |
Sharmila Devi, Subramanian Kannimuthu, G. Safeeq, and Anand Kumar (Karpagam College of Engineering, India) | 0.87 | 0.65 | 0.57 |
D. Thenmozhi and Chandrabose Aravindan (Sri Sivasubramaniya Nadar College of Engineering, India) | 0.85 | 0.63 | 0.52 |
Ali Nemati (The University of Washington Tacoma, USA) | 0.83 | 0.60 | 0.49 |
Deepanshu Gaur (Maharaja Agrasen Institute of Technology (IPU), India) | 0.75 | 0.64 | 0.47 |
Dijana Kosmajac and Vlado Keselj (Dalhousie University, Canada) | 0.74 | 0.59 | 0.43 |
Òscar Garibo (Optical Tech & Support, Spain) | 0.77 | 0.57 | 0.43 |
Ramsha Imran (CUI, Lahore campus, Pakistan) and Muntaha Iqbal (Virtual University Lahore, Pakistan) | 0.73 | 0.53 | 0.38 |
Asmara Safdar, Osama Akhter, Osama Inayat, and Naeem Hassan (CUI, Lahore campus, Pakistan) | 0.69 | 0.53 | 0.35 |
Baseline | 0.60 | 0.51 | 0.32 |
Abdul Sittar and Iqra Ameer (CUI, Lahore campus, Pakistan) | 0.55 | 0.37 | 0.23 |
A more detailed analysis of the results can be found in the overview paper accompanying this task (available on 01-10-2018).