Unpopular opinion: NLP as a field is moving at break-neck speed, and many have asked how to get a good handle on it. While GitHub repos, MOOCs, videos, papers, datasets, code, and Medium posts are all good sources, you can't chase millions of them. The trick is to nail down the basics and accelerate your learning speed. It really pays to get your basics right.
I recommend this brilliant book.

I got all my neural-NLP basics and advanced lessons from this book, and I still use it as a reference. (By all means, add other resources if a particular topic is missing.) The book is pricey, but fortunately I have a free PDF for you.
goldberg_2017_book_draft.pdf (6.5 MB)
Today you can learn from the best people in your field at no cost

I especially enjoy books written by practitioners: their books are fun to read, useful, and contain no fluff.

Here are 3 great ones you can get for free:

FastAI deep learning book by Jeremy Howard
https://github.com/fastai/fastbook

Approaching (Almost) Any Machine Learning Problem by Abhishek Thakur
PDF link: https://github.com/abhishekkrthakur/approachingalmost

Bayesian Methods for Hackers by Cameron Davidson-Pilon
PDF link: https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers
All algorithms related to machine learning and deep learning
Feature scaling is one of the most useful and necessary transformations to perform on a training dataset, since, with very few exceptions, ML algorithms do not perform well on datasets whose attributes have very different scales.

Let's talk about it 🧵

There are 2 very effective techniques to transform all the attributes of a dataset to the same scale, which are:
▪️ Normalization
▪️ Standardization

The 2 techniques perform the same task, but in different ways. Moreover, each one has its strengths and weaknesses.

Normalization (min-max scaling) is very simple: values are shifted and rescaled so that they end up in the range 0 to 1.

This is achieved by subtracting the min value from each value and dividing the result by the difference between the max and min values.
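
Here's a minimal sketch of that formula in plain NumPy (the toy numbers are made up for illustration):

import numpy as np

X = np.array([[1.0, 500.0],
              [2.0, 800.0],
              [3.0, 1100.0]])  # two toy features on very different scales

# min-max scaling, per column: x' = (x - min) / (max - min)
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm)  # every column now lies in [0, 1]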

In contrast, standardization first subtracts the mean value (so that the values always have zero mean) and then divides the result by the standard deviation (so that the resulting distribution has unit variance).
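
A matching sketch for standardization, again in plain NumPy with made-up toy data:

import numpy as np

X = np.array([[1.0, 500.0],
              [2.0, 800.0],
              [3.0, 1100.0]])

# standardization, per column: x' = (x - mean) / std
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.mean(axis=0))  # ~[0, 0]
print(X_std.std(axis=0))   # [1, 1]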

More about them:
▪️Standardization does not confine the data to the 0-1 range, which is undesirable for some algorithms.
▪️Standardization is much less affected by outliers.
▪️Normalization is sensitive to outliers: a single very large value can squash all the other values into a narrow slice such as 0.0-0.2 (see the sketch after this list).
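
To see the outlier effect concretely, here is a small sketch (toy numbers made up) comparing the two Scikit-learn scalers on data with one extreme value:

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# four small values plus one extreme outlier
x = np.array([[1.0], [2.0], [3.0], [4.0], [1000.0]])

print(MinMaxScaler().fit_transform(x).ravel())
# -> roughly [0, 0.001, 0.002, 0.003, 1]: the normal values are squashed near 0

print(StandardScaler().fit_transform(x).ravel())
# outputs are not confined to [0, 1], so the outlier distorts the scale less drastically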

Both techniques are implemented in the Scikit-learn Python library and are very easy to use. Check the Google Colab notebook below for a toy example of how each technique works.

https://colab.research.google.com/drive/1DsvTezhnwfS7bPAeHHHHLHzcZTvjBzLc?usp=sharing
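
In case the notebook is unavailable, here is a minimal usage sketch (toy numbers made up). Note that the scaler is fit on the training set only, so the training statistics are reused on the test set and no test information leaks in:

import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[1.0, 500.0],
                    [2.0, 800.0],
                    [3.0, 1100.0]])
X_test = np.array([[1.5, 650.0]])

scaler = StandardScaler().fit(X_train)  # learn mean/std from training data only
print(scaler.transform(X_train))
print(scaler.transform(X_test))         # reuses the training statistics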

Check the spreadsheet below for another step-by-step example of how to normalize and standardize your data.

https://docs.google.com/spreadsheets/d/14GsqJxrulv2CBW_XyNUGoA-f9l-6iKuZLJMcc2_5tZM/edit?usp=drivesdk

Well, the real benefit of feature scaling shows up when you train a model on a dataset with many features (e.g., m > 10) whose scales differ by orders of magnitude. For neural networks this preprocessing is key: it enables gradient descent to converge faster.
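
Here's a rough sketch of that effect (toy data and learning rates are made up): plain batch gradient descent for linear regression, run on raw features vs. standardized features.

import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.c_[rng.uniform(0, 1, n), rng.uniform(0, 1000, n)]  # wildly different scales
y = X @ np.array([3.0, 0.002]) + rng.normal(0, 0.01, n)

def gd_steps(X, y, lr, tol=1e-6, max_iter=100_000):
    """Batch gradient descent on MSE; return iterations until the gradient is tiny."""
    w = np.zeros(X.shape[1])
    for i in range(max_iter):
        grad = 2 / len(y) * X.T @ (X @ w - y)
        if np.linalg.norm(grad) < tol:
            return i
        w -= lr * grad
    return max_iter  # did not converge

X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print("raw:   ", gd_steps(X, y, lr=1e-7))        # tiny rate needed for stability; still hits max_iter
print("scaled:", gd_steps(X_scaled, y, lr=0.1))  # converges in a few dozen steps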
Make sure to follow our Instagram page, where we post each topic as a carousel post.

www.instagram.com/dataspoof
TLA: Twitter Linguistic Analysis. TLA is built using PyTorch, Transformers, and several other state-of-the-art machine learning techniques. It aims to expedite and structure the cumbersome process of collecting, labeling, and analyzing Twitter data for a corpus of languages, while providing detailed labeled datasets for all of them.


$ pip install TLAF


https://github.com/tusharsarkar3/TLA