
Fake Job Posting Detection

  • Writer: Aunsh Arekar
  • Apr 29, 2022
  • 3 min read

Updated: Apr 29, 2022

Overview


Here we build a system that distinguishes real job postings from fake ones. Job seekers often apply to a job they found on a website, only for the posting to turn out to be fake or fraudulent, and they end up getting scammed. The system described here aims to flag such postings and so help job seekers avoid them.


Algorithm Used


To build this system we use the Naive Bayes classifier, which is based on the conditional probabilities of events. The formula it relies on is given in fig(a) below:


fig(a): Naive Bayes formula


As we can see, the terms P(class|data) and P(data|class) are conditional probabilities: the probability of 'class' given 'data', and the probability of 'data' given 'class', respectively.
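
For reference, the relationship that fig(a) describes is Bayes' theorem, written out on the first line below. The second line is the usual "naive" independence assumption over the individual words w_1, ..., w_n of a posting; it is the standard textbook form of the classifier and is an assumption here, not something taken from the original figure.

```latex
P(\text{class} \mid \text{data}) = \frac{P(\text{data} \mid \text{class}) \, P(\text{class})}{P(\text{data})}

P(\text{data} \mid \text{class}) \approx \prod_{i=1}^{n} P(w_i \mid \text{class})
```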

Now that we have a basic understanding of the algorithm's concept and the formula that it is based on, we can move on with the actual implementation of it.


Step-by-step Implementation


Now let's take a look at the dataset first.


fig(b): Dataset


From this dataset we can observe that it has columns holding various details of each job posting, such as the title, location, department and requirements, and so on. It also has a column indicating whether the job is real or fraudulent, encoded as 0 for real and 1 for fraudulent.


Combining all text data in one column

To get started, we first combine all the columns containing text into a single column called 'combo text', and add another column called 'combo text length' holding the length of that combined text, as shown below:


fig(c): combined text column with length

We also drop all the columns that we just combined, since they are no longer needed on their own.
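
The combining step could look roughly like the sketch below. It uses pandas on a tiny made-up table; the values, and the choice of which columns count as text, are assumptions for illustration and not the article's actual dataset or code.

```python
import pandas as pd

# Toy stand-in for the job postings dataset; values are made up.
df = pd.DataFrame({
    "title": ["Data Analyst", "Work from home"],
    "location": ["US, NY, New York", None],
    "requirements": ["SQL and Excel", "No experience needed"],
    "fraudulent": [0, 1],
})

# Hypothetical list of text columns to merge.
text_columns = ["title", "location", "requirements"]

# Fill missing values with empty strings, then join the non-empty parts
# of each row with a single space.
df["combo text"] = (
    df[text_columns]
    .fillna("")
    .apply(lambda row: " ".join(value for value in row if value), axis=1)
)
df["combo text length"] = df["combo text"].str.len()

# The individual text columns are no longer needed once combined.
df = df.drop(columns=text_columns)
print(df)
```

Filling missing values first avoids the literal string "nan" ending up in the combined text.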


Cleaning the data


Next we write a function to clean the combined text column so that it is easier to handle and to vectorize. This function splits each line into single words, converts them to lower case, and removes stopwords, special characters and alphanumeric tokens. The result is shown in fig(d):


fig(d): cleaned data column


As we can see, the combo text column is cleaned using the text cleaning function that we have made.
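
A minimal sketch of such a cleaning function is given below, using only the standard library. The stopword set is a tiny illustrative sample (a real pipeline would more likely use a full list such as NLTK's), and treating "alphanumeric" as "any token containing a digit" is an interpretation of the description above, not the article's confirmed behavior. The name clean_text is hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not the article's list).
STOPWORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "for", "we", "are"}


def clean_text(text: str) -> str:
    """Lowercase the text, strip special characters, and drop stopwords
    and any token that contains a digit."""
    text = text.lower()
    # Replace everything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    tokens = text.split()
    # After the substitution, isalpha() is true only for letters-only tokens.
    kept = [tok for tok in tokens if tok not in STOPWORDS and tok.isalpha()]
    return " ".join(kept)


sample = "We are HIRING!!! Earn $5000/week in the comfort of your home, call 555-0199"
print(clean_text(sample))
```

Applied to a pandas column, this would typically be called through something like `df["combo text"].apply(clean_text)`.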


For data visualization, we also generate word clouds for the real job postings and for the fake ones, shown in the two figures below:


fig(e) : Real Jobs Word Cloud



fig(e) : Fake Jobs Word Cloud


Train Test Split


Now we perform the usual train/test split. We take the combined text column as X and the 'Fraudulent' label column as Y.


fig(f): Train Test Split
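
The split can be sketched with scikit-learn's train_test_split as below. The texts and labels are toy data, and the test_size, random_state and stratify settings are illustrative choices rather than values taken from the article.

```python
from sklearn.model_selection import train_test_split

# Toy stand-ins for the cleaned 'combo text' column (X) and the
# fraudulent label column (y); 0 = real, 1 = fraudulent.
X = [
    "data analyst sql excel", "software engineer python",
    "nurse hospital night shift", "accountant tax audit",
    "teacher school maths", "sales manager retail",
    "earn money fast home", "no experience wire transfer",
    "quick cash click link", "pay fee start job",
]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# stratify=y keeps the real/fake ratio similar in both parts of the split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)
print(len(X_train), len(X_test))
```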


TF-IDF


Next we apply TF-IDF vectorization. TF-IDF stands for term frequency-inverse document frequency. It is a measure of how important or relevant a term is to a document within a larger collection of documents. We do this for both the train and test datasets, as shown below:


fig(g): TF-IDF
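
As a rough sketch of this step, the example below uses scikit-learn's TfidfVectorizer on a few toy sentences. It assumes the common practice of fitting the vectorizer on the training text only and then reusing it to transform the test text; the article does not state how its own code handles this.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy cleaned texts; not the article's data.
train_texts = ["earn money fast home", "data analyst sql excel", "earn cash home"]
test_texts = ["earn money analyst", "unseen words only"]

vectorizer = TfidfVectorizer()
# Learn the vocabulary and IDF weights from the training text only.
X_train_tfidf = vectorizer.fit_transform(train_texts)
# Reuse the same vocabulary and weights for the test text.
X_test_tfidf = vectorizer.transform(test_texts)

print(sorted(vectorizer.vocabulary_))
print(X_train_tfidf.shape, X_test_tfidf.shape)
```

Words that appear only in the test text are not in the learned vocabulary, so they contribute nothing to the test vectors. Within a document, a term that is rare across the training set (here "money") gets a higher weight than one that is common (here "earn").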


Applying Naive Bayes


Finally we apply Naive Bayes to our training dataset. We use the MultinomialNB class from sklearn's naive_bayes module, which fits the Naive Bayes model and produces the predictions. This is shown below:


fig(h): Applying Naive Bayes


From fig(h) we can see that after applying Naive Bayes, we get an accuracy of about 95%.
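
The fit-and-predict step could be sketched as follows. The data is a handful of toy sentences, so the accuracy it prints says nothing about the roughly 95% reported above; the default MultinomialNB settings are also an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

# Toy cleaned texts and labels (0 = real, 1 = fraudulent); made up for illustration.
train_texts = [
    "data analyst sql excel", "software engineer python", "accountant tax audit",
    "earn money fast home", "quick cash wire transfer", "pay fee earn cash",
]
y_train = [0, 0, 0, 1, 1, 1]
test_texts = ["python data engineer", "earn quick cash home"]
y_test = [0, 1]

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Fit Multinomial Naive Bayes on the TF-IDF features and predict the test labels.
model = MultinomialNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

print(predictions)
print(accuracy_score(y_test, predictions))
```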


For further evaluation of the model, we also generate the classification report, which gives us the precision, recall, F1 score and support for each class. This is shown below:


fig(i): Classification report
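
To make the numbers in such a report concrete, the sketch below computes them by hand for one class in plain Python. The labels and predictions are invented, and classification_metrics is a hypothetical helper, not part of sklearn or of the article's code.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and support for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    support = sum(1 for t in y_true if t == positive)
    return {"precision": precision, "recall": recall, "f1": f1, "support": support}


# Invented labels: 0 = real, 1 = fraudulent.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
fraud = classification_metrics(y_true, y_pred, positive=1)
print(accuracy, fraud)
```

In this toy case the accuracy is 0.7, while the recall for the fraudulent class is only 0.5, which is why the per-class figures are worth reading alongside the overall accuracy.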


Contribution


The main contribution was in the data pre-processing stage, particularly the data cleaning, since that is what improved the accuracy. The cleaning was refined by adding further steps, such as stop word removal, which made the data more focused on what we wanted.


References



Source Code



Youtube video link


https://youtu.be/YxwU2zOSmyI





 
 
 
