DeepHack.Q&A

Baseline solution

The solution described here was proposed by the 5vision team. It is intended for Ubuntu with Python 2.7 and a bunch of libraries (see the code for details), but you can run it on Windows too.

First, we recommend installing Anaconda: https://www.continuum.io/downloads
Then install nltk and wikipedia:
http://www.nltk.org/install.html
https://pypi.python.org/pypi/wikipedia
Any other missing libraries can be installed in a similar way.
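
To see which libraries are still missing before running the scripts, a small check like the one below may help. This is only a minimal sketch: the helper name `missing_modules` is hypothetical, and the list of modules is an assumption (only nltk and wikipedia are named here; the full set of requirements is in the repository code).

```python
import importlib


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list: only nltk and wikipedia are named in this guide.
    # Extend it with whatever the scripts report as missing.
    required = ["nltk", "wikipedia"]
    for name in missing_modules(required):
        print("Missing library: " + name + " (try: pip install " + name + ")")
```

If nothing is printed, every module in the list could be imported.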

After that, follow these steps:
1. Download the solution from https://github.com/5vision/kaggle_allen
2. Download the validation set from https://www.kaggle.com/c/the-allen-ai-science-chal...
3. Unzip the downloaded files
4. Put validation_set.tsv into kaggle_allen-master/data/
5. Run fetch_glove_data.sh (downloading and unzipping the data takes some time)
$ sh fetch_glove_data.sh
6. Run glove_predict.py
$ python glove_predict.py

This produces prediction.csv. Submitting this file should give a score of around 0.31875.

7. Run ck12_wiki_predict.py (set --get_data 1 only on the first run)
$ python ck12_wiki_predict.py --get_data 1
8. You may need to rerun ck12_wiki_predict.py with --get_data 0

This again produces prediction.csv. Submitting this file should give a score of around 0.35375.
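
Before submitting, it can be worth checking that the prediction file looks sane. The sketch below assumes prediction.csv has a header row with the columns `id` and `correctAnswer` and that every answer is one of A, B, C or D. These column names are an assumption about the submission format, not something stated on this page, so adjust them if your file differs. The function name `check_predictions` is hypothetical.

```python
import csv

# Assumption: questions are multiple choice with answers A-D.
VALID_ANSWERS = {"A", "B", "C", "D"}


def check_predictions(lines, id_col="id", answer_col="correctAnswer"):
    """Return a list of problems found in a predictions CSV.

    `lines` is any iterable of text lines (an open file works).
    The default column names are assumptions; pass others if needed.
    """
    reader = csv.DictReader(lines)
    fields = reader.fieldnames or []
    problems = ["missing column: " + col
                for col in (id_col, answer_col) if col not in fields]
    if problems:
        return problems

    seen_ids = set()
    # The header is line 1, so the first data row is line 2.
    for line_number, row in enumerate(reader, start=2):
        if row[id_col] in seen_ids:
            problems.append("line %d: duplicate id %s"
                            % (line_number, row[id_col]))
        seen_ids.add(row[id_col])
        if row[answer_col] not in VALID_ANSWERS:
            problems.append("line %d: unexpected answer %r"
                            % (line_number, row[answer_col]))
    return problems


if __name__ == "__main__":
    # Toy input with one invalid answer and one duplicate id.
    sample = ["id,correctAnswer", "1,A", "2,E", "2,B"]
    for problem in check_predictions(sample):
        print(problem)
```

To check a real file, pass the open file instead of the toy list, for example `check_predictions(open("prediction.csv"))`. An empty result means none of these particular problems were found; it says nothing about the score.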

Keep in mind that all other participants can easily reproduce these steps, and we will invite only the top 50 registered candidates, based on their results on the original Kaggle leaderboard.
Therefore, it is a good idea to improve these solutions or to build a new one with a better score.

Good luck in the qualification round!