FIRE'18 - MAPonSMS

Dataset and Evaluation

Training Corpus

To develop your multilingual author profiling software, we provide a training corpus of multilingual (Roman Urdu and English) SMS-based author profiles (or documents). For gender, each multilingual author profile belongs to either the Male or the Female class. For age, each multilingual author profile falls into one of three categories: 15-19, 20-24, or 25-xx.

Gender and age information is associated with each multilingual author profile (see the truth file, truth.txt, which is provided with the training corpus). All author profiles in the training corpus are stored in .txt format to make the task easier.

Test Corpus

Once you have finished tuning your multilingual author profiling software to achieve satisfactory performance on the training corpus, run your software on the test corpus and submit its output as described below.

Performance Measure

The performance of your multilingual author profiling system will be measured using accuracy.
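
As an illustration of the measure, the following is a minimal sketch of accuracy computed over author profiles. The function names and the dictionary layout (profile id mapped to label) are assumptions made for this example, not part of the official evaluation script. The joint score shown here assumes that the Joint column in the results table counts a profile as correct only when both gender and age are predicted correctly.

```python
def accuracy(gold, predicted):
    """Fraction of profiles in `gold` whose predicted label matches.

    Both arguments map a profile id to a label (assumed layout).
    A profile missing from `predicted` counts as an error.
    """
    if not gold:
        raise ValueError("gold labels are empty")
    correct = sum(1 for pid, label in gold.items() if predicted.get(pid) == label)
    return correct / len(gold)


def joint_accuracy(gold_gender, pred_gender, gold_age, pred_age):
    """Fraction of profiles with BOTH gender and age correct (assumed definition)."""
    if not gold_gender:
        raise ValueError("gold labels are empty")
    correct = sum(
        1
        for pid in gold_gender
        if pred_gender.get(pid) == gold_gender[pid]
        and pred_age.get(pid) == gold_age[pid]
    )
    return correct / len(gold_gender)
```

Because a joint hit requires two correct predictions, the joint score can never exceed the lower of the gender and age scores.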

Submission of Software and Output

You need to submit the following:

Multilingual Author Profiling Software
1. We ask you to prepare your software so that it can be executed generically for both age and gender. To maximize the sustainability of software submissions for this task, we encourage you to prepare your software so it can be re-trained on demand, i.e., by offering two commands, one for training, and one for testing. This way, your software can be reused on future evaluation corpora.
2. The training command shall take as input (i) an absolute path to a training corpus and (ii) an absolute path to an empty output directory. Based on the training corpus, your software shall train a classification model and save the trained model to the specified output directory (optional).
3. The testing command shall take as input (i) an absolute path to a test corpus, (ii) an absolute path to a previously trained classification model (optional), and (iii) an absolute path to an empty output directory. Based on the classification model, the software shall classify each case found in the test corpus and write its output to CSV file(s).
4. When your software is submission-ready, please email your software along with the output (CSV) files. Email: maponsms@gmail.com
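
The two-command interface described in the steps above could be sketched as follows. This is only one possible layout: the program name `profiler`, the argument names, and the choice to pass the model path as a `--model` option are assumptions for illustration, and the actual training and classification logic is left as placeholder messages.

```python
import argparse
import os


def absolute_path(value):
    """argparse type that accepts only absolute paths, as the task requires."""
    if not os.path.isabs(value):
        raise argparse.ArgumentTypeError(f"not an absolute path: {value}")
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="profiler")  # hypothetical program name
    commands = parser.add_subparsers(dest="command", required=True)

    train = commands.add_parser("train", help="train a model on a corpus")
    train.add_argument("corpus", type=absolute_path)
    train.add_argument("output_dir", type=absolute_path)

    test = commands.add_parser("test", help="classify a test corpus")
    test.add_argument("corpus", type=absolute_path)
    test.add_argument("output_dir", type=absolute_path)
    # The trained model is optional in the task description.
    test.add_argument("--model", type=absolute_path, default=None)
    return parser


def check_empty_dir(path):
    """Create the output directory if needed and refuse to reuse a non-empty one."""
    os.makedirs(path, exist_ok=True)
    if os.listdir(path):
        raise SystemExit(f"output directory is not empty: {path}")


def main(argv=None):
    args = build_parser().parse_args(argv)
    check_empty_dir(args.output_dir)
    if args.command == "train":
        # Placeholder: train a model on args.corpus and save it to args.output_dir.
        print(f"would train on {args.corpus} and save a model to {args.output_dir}")
    else:
        # Placeholder: classify args.corpus and write CSV files to args.output_dir.
        print(f"would classify {args.corpus} using model {args.model}")
    return args
```

Calling `main()` at the bottom of a script would expose commands of the form `python profiler.py train /abs/corpus /abs/out` and `python profiler.py test /abs/corpus /abs/out --model /abs/model`, where the script name and paths are again hypothetical.
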
Output Files
You can use the approach developed on the training corpus to get predictions on the test corpus. Your output submission should consist of two CSV files in the following format:

Gender Prediction
Test_Author_Profile_Id	Gender
Test-Document-001	male
Test-Document-002	female
Test-Document-003	male
Test-Document-004	male
Test-Document-005	female

Age Prediction
Test_Author_Profile_Id	Age
Test-Document-001	15-19
Test-Document-002	20-24
Test-Document-003	25-xx
Test-Document-004	15-19
Test-Document-005	25-xx
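
A minimal sketch of writing the two prediction files in the layout shown above follows. The function name, the comma delimiter, and the file names in the usage note are assumptions: the samples above appear column-aligned, so check which delimiter the organizers expect and pass it through the `delimiter` parameter if it differs.

```python
import csv

# Label sets taken from the task description.
GENDER_LABELS = {"male", "female"}
AGE_LABELS = {"15-19", "20-24", "25-xx"}


def write_predictions(stream, label_column, predictions, allowed, delimiter=","):
    """Write one prediction file: a header row, then one row per profile id.

    `predictions` maps a test profile id to its predicted label.
    Rows are sorted by profile id so the output is reproducible.
    """
    unknown = {label for label in predictions.values() if label not in allowed}
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    writer = csv.writer(stream, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Test_Author_Profile_Id", label_column])
    for profile_id in sorted(predictions):
        writer.writerow([profile_id, predictions[profile_id]])
```

For example, `write_predictions(f, "Gender", gender_predictions, GENDER_LABELS)` with `f` opened via `open("gender.csv", "w", newline="")` would produce the gender file, and the same call with `"Age"` and `AGE_LABELS` would produce the age file.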

Working Notes
All participants must email a working notes paper describing their approach to multilingual author profiling on SMS messages. Email: maponsms@gmail.com

Results

A total of 19 teams registered for the task, and 9 submitted their systems. The following table lists the performance achieved by the participating teams:

Multilingual Author Profiling on SMS Performance (Accuracy)
Participants	Gender	Age	Joint
Sharmila Devi, Subramanian Kannimuthu, G. Safeeq, and Anand Kumar (Karpagam College of Engineering, India)	0.87	0.65	0.57
D. Thenmozhi and Chandrabose Aravindan (Sri Sivasubramaniya Nadar College of Engineering, India)	0.85	0.63	0.52
Ali Nemati (The University of Washington Tacoma, USA)	0.83	0.60	0.49
Deepanshu Gaur (Maharaja Agrasen Institute of Technology (IPU), India)	0.75	0.64	0.47
Dijana Kosmajac and Vlado Keselj (Dalhousie University, Canada)	0.74	0.59	0.43
Òscar Garibo (Optical Tech & Support, Spain)	0.77	0.57	0.43
Ramsha Imran* and Muntaha Iqbal° (*CUI, Lahore campus, Pakistan; °Virtual University Lahore, Pakistan)	0.73	0.53	0.38
Asmara Safdar, Osama Akhter, Osama Inayat, and Naeem Hassan (CUI, Lahore campus, Pakistan)	0.69	0.53	0.35
Baseline	0.60	0.51	0.32
Abdul Sittar and Iqra Ameer (CUI, Lahore campus, Pakistan)	0.55	0.37	0.23

A more detailed analysis of the systems' performance can be found in the overview paper accompanying this task (available on 01-10-2018).