Scientific Papers Keywords Categorization

  • Tech Stack:
    • Language: Python
    • Libraries: Selenium, Pandas, FastAI, Blurr, HuggingFace, ONNX, Flask
    • IDE: VS Code
    • Notebook: Google Colab
  • Website Integration: Link
  • GitHub URL: Project Link
  • Video Demonstration: Link

I decided to perform a multi-label classification task. The main goal is to predict relevant keywords from a paper's abstract so that the paper can be shown to the users most likely to be interested in it. To build the dataset, I scraped IEEE, fetching each paper's abstract and its keywords. I then cleaned and preprocessed the data so it could be fed to transformers. I chose BERT, DistilBERT, and RoBERTa as the learners. After training, BERT performed better than the others. Because the model was large, I compressed it using ONNX, at the cost of a slight decrease in performance. I deployed the model to HuggingFace. Lastly, I created a website from scratch using Flask and integrated the model API.
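
To make the multi-label setup concrete, here is a minimal sketch in plain Python of how keyword lists can be turned into multi-hot targets and how per-label scores can be turned back into keywords. It is not the project's actual code: the function names, the sample keywords, the example logits, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math


def build_vocab(keyword_lists):
    """Collect every distinct keyword and give it a fixed column index."""
    vocab = sorted({kw for kws in keyword_lists for kw in kws})
    return {kw: i for i, kw in enumerate(vocab)}


def multi_hot(keywords, vocab):
    """Encode one paper's keywords as a 0/1 vector over the vocabulary."""
    vec = [0.0] * len(vocab)
    for kw in keywords:
        if kw in vocab:
            vec[vocab[kw]] = 1.0
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def decode(logits, vocab, threshold=0.5):
    """Keep every keyword whose independent sigmoid score passes the threshold.

    Unlike single-label classification (softmax + argmax), each label is
    decided on its own, so a paper can receive zero, one, or many keywords.
    """
    index_to_kw = {i: kw for kw, i in vocab.items()}
    return [index_to_kw[i] for i, z in enumerate(logits) if sigmoid(z) >= threshold]


# Made-up sample data standing in for scraped keyword lists (assumption).
samples = [
    ["deep learning", "image segmentation"],
    ["deep learning", "natural language processing"],
]

vocab = build_vocab(samples)
targets = [multi_hot(kws, vocab) for kws in samples]

# Hypothetical raw model outputs (one logit per keyword) for a new abstract.
predicted = decode([2.0, -1.5, 0.3], vocab)

print(targets)    # multi-hot training targets
print(predicted)  # keywords whose score is at least 0.5
```

In a setup like this, the threshold is a tunable choice: lowering it returns more keywords per paper at the risk of less relevant ones, while raising it does the opposite.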