Scientific Papers Keywords Categorization
- Tech Stack:
  - Language: Python
  - Libraries: Selenium, Pandas, FastAI, Blurr, HuggingFace, ONNX, Flask
  - IDE: VS Code
  - Notebook: Google Colab
- Website Integration: Link
- GitHub URL: Project Link
- Video Demonstration: Link
I decided to perform a multi-label classification task. The main goal is to assign a set of optimized keywords to each paper based on its abstract, so that the paper can be shown to relevant users. To build the dataset, I scraped IEEE and collected paper abstracts along with their keywords. I then cleaned and preprocessed the data so it could be fed to transformer models. I chose BERT, DistilBERT, and RoBERTa as the learners. After training, BERT performed better than the others. Because the model was large, I compressed it using ONNX, at the cost of a slight drop in performance. I deployed the model to HuggingFace. Lastly, I created a website from scratch using Flask and integrated the model API.
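To make the multi-label setup concrete, here is a minimal sketch in plain Python of how keyword lists can be turned into multi-hot targets and how model outputs can be decoded back into keywords. It is not the project's actual code: the function names, the example keywords, the logits, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math


def build_label_index(keyword_lists):
    """Map each distinct keyword to a column index (sorted for determinism)."""
    vocab = sorted({kw for kws in keyword_lists for kw in kws})
    return {kw: i for i, kw in enumerate(vocab)}


def multi_hot(keywords, label_index):
    """Encode one paper's keywords as a 0/1 vector over all known labels."""
    vec = [0.0] * len(label_index)
    for kw in keywords:
        if kw in label_index:
            vec[label_index[kw]] = 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, label_index, threshold=0.5):
    """Keep every label whose independent sigmoid probability passes the threshold."""
    labels = sorted(label_index, key=label_index.get)
    return [lab for lab, z in zip(labels, logits) if sigmoid(z) >= threshold]


# Hypothetical keyword lists standing in for the scraped data.
papers = [["deep learning", "nlp"], ["nlp", "transformers"]]
index = build_label_index(papers)
targets = [multi_hot(p, index) for p in papers]

# Hypothetical raw model outputs (one logit per label) for a single abstract.
predicted = decode([2.0, -1.0, 0.3], index)
print(targets)
print(predicted)
```

Unlike single-label classification, each label gets its own sigmoid rather than sharing a softmax, so a paper can receive zero, one, or several keywords. The threshold is a tunable choice and would normally be picked on validation data.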