🔭 Computer Science MSc candidate researching NLP @HUJI (publications), former Data Scientist @PayPal & Research Assistant @AI2
💬 Feel free to ask me about any of my repos. I love getting messages about my work!
☕ If my code or my notes helped you, you can buy me a coffee if you'd like
I have collected all of the detailed notes I wrote during my studies at HebrewU, as well as for courses I studied independently, and published them as part of my goal to make Data Science & NLP topics more accessible to Hebrew speakers.
This collection contains detailed notes in Hebrew on subjects such as Math (Calculus, Linear Algebra, Probability, Discrete Math), foundations of Computer Science (Data Structures, Algorithms, Complexity), and advanced Data Science (Machine Learning, NLP).
This includes my recent detailed notes (90 pages) for Stanford's CS224N (NLP with DL) course, which gained more than 1K likes across Israeli DS & ML communities.
Recently I decided to share my private Notion hub, where I organize all of my NLP knowledge (mostly in Hebrew). This hub is meant for my personal use, but since many people found it useful I decided to share it. It contains notes by topic (such as NLP tasks, architectures, or uses) that overlap significantly with my CS224N notes, as well as notes I wrote for a few dozen NLP papers I have read in the past year for my studies and my research.
I shared my simple-but-useful system for queueing and reviewing papers I read (or plan to read). This Notion template is free to use, and also contains tips on how to personalize it to work for your needs.
Corpify (2023)
In this project, we introduce the novel NLP task of corpy textual style-transfer, which involves the transformation of casual English text into a style suited for a professional workplace setting. We constructed an original parallel corpus comprising 634 sentences in casual English and their corporate-style paraphrases.
This project includes the dataset itself, the code for fine-tuning the style transfer models, 2 of the best performing fine-tuned models, and code for fine-tuning a style detection model for detecting corpy style in text.
Methods used in this NLP project: Textual style transfer, text classification.
Knesset Topic Classification (2022)
An independent multi-phase NLP project for classifying parliamentary quotes in Hebrew into 8 topics. Also includes the annotated dataset.
In this project, I started with a raw dataset of quotes (in Hebrew) gathered from protocols of the Knesset (the Israeli parliament). In the first phase of the project, I used unsupervised topic modeling methods to cluster quotes by topic. The topic assignments created during the first phase were used to prioritize quotes for manual tagging: quotes with the highest confidence scores were sent to manual tagging. This process produced ~2,700 quotes that were manually tagged into 8 topics (in addition to a "no topic" tag). Then, in the second phase of the project, I trained a supervised classifier to predict the topic of each quote.
Methods used in this NLP project: Topic modeling (unsupervised), Topic classification (supervised).
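The confidence-based prioritization step described above can be sketched roughly as follows. This is only a minimal illustration, not the project's actual code: it assumes the topic model yields a per-quote probability distribution over topics, and the function name `prioritize_for_tagging` and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Tuple


def prioritize_for_tagging(
    topic_probs: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, int, float]]:
    """Return (quote_index, suggested_topic, confidence) for the k quotes
    whose most likely topic has the highest probability.

    Assumption: topic_probs[i][t] is the model's probability that quote i
    belongs to topic t. The real project may define confidence differently.
    """
    scored = []
    for idx, probs in enumerate(topic_probs):
        # The suggested topic is the one with the highest probability.
        best_topic = max(range(len(probs)), key=probs.__getitem__)
        scored.append((idx, best_topic, probs[best_topic]))
    # Most confident quotes first; these go to manual tagging.
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:k]


# Hypothetical toy input: 3 quotes, 3 topics.
example_probs = [
    [0.10, 0.85, 0.05],
    [0.40, 0.35, 0.25],
    [0.05, 0.05, 0.90],
]
top_two = prioritize_for_tagging(example_probs, k=2)
```

Here the second quote, whose distribution is close to uniform, is left out of the tagging queue, while the two quotes with a clear dominant topic are sent first.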
PickUsLunch (2022)
An AI assistant that helps groups of friends or co-workers find a restaurant to order from together that best matches the group members' dining preferences.
In this project, we used restaurant menus gathered via Wolt's API to create a smart system that helps groups of friends or co-workers find a single restaurant that matches everyone's needs and preferences (such as vegetarianism, price limits, preferred cuisines, etc.). We examined several different algorithms (none of them ML-based), all of which provided solutions that were incredibly close to the optimal solution (which could be found by iterating over the entire dataset of 30M combinations) in a fraction of the time (up to 11K times faster)!
Methods used in this AI project: local search, genetic algorithms.
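To give a feel for the local search idea, here is a small steepest-ascent hill-climbing sketch. It is a toy under stated assumptions rather than the project's implementation: the menu, the member constraints, the scoring function, and the "change one member's dish" neighborhood are all hypothetical.

```python
from typing import List, Sequence

# Hypothetical toy menu of one restaurant: (dish name, price, is_vegetarian).
MENU = [
    ("burger", 60, False),
    ("falafel", 35, True),
    ("salad", 45, True),
    ("steak", 120, False),
]
# Hypothetical group members: (max price, requires vegetarian).
MEMBERS = [(50, True), (70, False), (130, False)]

PENALTY = 1000  # cost of each violated constraint (assumed value)


def score(assignment: Sequence[int]) -> int:
    """Higher is better. assignment[i] is the MENU index chosen for member i."""
    total = 0
    for (budget, needs_veg), dish_idx in zip(MEMBERS, assignment):
        _, price, is_veg = MENU[dish_idx]
        if price > budget:
            total -= PENALTY
        if needs_veg and not is_veg:
            total -= PENALTY
        total -= price  # assumed preference: cheaper valid orders score higher
    return total


def hill_climb(start: Sequence[int]) -> List[int]:
    """Repeatedly move to the best neighbor until no neighbor improves the score."""
    current = list(start)
    while True:
        best_neighbor, best_score = None, score(current)
        # A neighbor differs from the current state in exactly one member's dish.
        for member in range(len(current)):
            for dish in range(len(MENU)):
                if dish == current[member]:
                    continue
                candidate = current.copy()
                candidate[member] = dish
                candidate_score = score(candidate)
                if candidate_score > best_score:
                    best_neighbor, best_score = candidate, candidate_score
        if best_neighbor is None:
            return current  # local optimum reached
        current = best_neighbor


result = hill_climb([3, 3, 3])  # start with everyone ordering the steak
```

Each step evaluates only the states one change away instead of every combination, which is where the speedup over exhaustive search comes from. In general, hill climbing can stop at a local optimum that is not the global one.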