Introduction to Recommender Systems

There are two main types of recommender: collaborative filtering and content-based filtering.

Collaborative filtering

Analyse information on users' behaviour and predict what a user will like based on their similarity to other users.

  • algo: k-NN approach, Pearson correlation
  • info: explicit and implicit data
  • suffers from cold start, scalability and sparsity
  • matrix factorisation is a common model-based alternative to the neighbourhood approach
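The neighbourhood idea described above can be sketched in a few lines. This is a minimal illustration only, assuming a tiny hypothetical rating matrix (`ratings`) and two hypothetical helpers (`pearson`, `predict`); it is not the method used later in this notebook, which has no explicit ratings.

```python
import numpy as np

# Toy user x item rating matrix; np.nan marks "not rated" (hypothetical data).
ratings = np.array([
    [5.0, 4.0, np.nan, 1.0],
    [4.0, 5.0, 2.0, 1.0],
    [1.0, 2.0, 5.0, np.nan],
])

def pearson(u, v):
    """Pearson correlation over the items both users have rated."""
    both = ~np.isnan(u) & ~np.isnan(v)
    if both.sum() < 2:
        return 0.0
    a = u[both] - u[both].mean()
    b = v[both] - v[both].mean()
    denom = np.sqrt((a ** 2).sum() * (b ** 2).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def predict(ratings, user, item, k=2):
    """Predict a rating from the k most similar users who rated the item."""
    candidates = [
        (pearson(ratings[user], ratings[other]), other)
        for other in range(len(ratings))
        if other != user and not np.isnan(ratings[other, item])
    ]
    # Keep only positively correlated neighbours, most similar first.
    neighbours = sorted((c for c in candidates if c[0] > 0), reverse=True)[:k]
    if not neighbours:
        return float(np.nanmean(ratings[user]))  # fall back to the user's mean
    weights = np.array([s for s, _ in neighbours])
    values = np.array([ratings[o, item] for _, o in neighbours])
    return float(weights @ values / weights.sum())

print(predict(ratings, user=0, item=2))
```

Restricting the correlation to co-rated items is one common convention; with very sparse data most user pairs share few items, which is the sparsity problem listed above.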

Content-based filtering (personality-based approach)

Based on a description of each item and a profile of the user's preferences, recommend items similar to those the user already likes.

  • tf-idf representation (item representation algo)
  • info: user's preferences, user's history
  • recommendations tend to be narrow

Hybrid recommender systems

Combine collaborative filtering and content-based filtering; in some cases this is more effective than either approach alone.
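One simple way to combine the two is a weighted blend of their scores. The sketch below assumes each recommender already produces one score per item; the function name `blend_scores`, the weight `alpha` and the sample scores are all hypothetical.

```python
import numpy as np

def blend_scores(collab, content, alpha=0.5):
    """Weighted hybrid: alpha * collaborative + (1 - alpha) * content-based.

    Both inputs are per-item scores; each is min-max scaled to [0, 1] first
    so that neither source dominates purely because of its scale.
    """
    def scale(x):
        x = np.asarray(x, dtype=float)
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return alpha * scale(collab) + (1 - alpha) * scale(content)

# Hypothetical scores for four items from the two recommenders.
collab = [0.0, 2.0, 4.0, 1.0]
content = [0.9, 0.1, 0.5, 0.4]
hybrid = blend_scores(collab, content, alpha=0.5)
ranking = np.argsort(-hybrid)  # item indices, best first
print(ranking)
```

Other hybrid designs exist (switching, cascading, feature combination); a weighted blend is only the easiest to reason about.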

Job Recommendation

Using the dataset from CareerBuilder.

Import dependencies

In [1]:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np

Load dataset

In [2]:

!ls data/*.tsv
data/apps.tsv  data/test_users.tsv    data/users.tsv
data/jobs.tsv  data/user_history.tsv  data/window_dates.tsv
  • users
  • jobs
  • apps
  • user_history
  • test_users

In [3]:

users = pd.read_csv('data/users.tsv', sep='\t', encoding='utf-8')
jobs = pd.read_csv('data/jobs.tsv', sep='\t', encoding='utf-8', error_bad_lines=False)
apps = pd.read_csv('data/apps.tsv', sep='\t', encoding='utf-8')
user_history = pd.read_csv('data/user_history.tsv', sep='\t', encoding='utf-8')
test_users = pd.read_csv('data/test_users.tsv', sep='\t', encoding='utf-8')

# jobs = pd.read_csv('data/jobs.tsv', sep='\t')
b'Skipping line 122433: expected 11 fields, saw 12\n'
b'Skipping line 602576: expected 11 fields, saw 12\n'
b'Skipping line 990950: expected 11 fields, saw 12\n'
/home/hectoryee/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (8) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
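The output above shows two things: rows with an extra field are skipped, and column 8 (presumably Zip5 in the jobs file) has mixed types. In pandas 1.3 and later, `error_bad_lines` is deprecated in favour of `on_bad_lines`, and it was removed in pandas 2.0. A minimal sketch of the same load on newer pandas, using a small in-memory stand-in for jobs.tsv (the sample rows are made up):

```python
import io
import pandas as pd

# Tiny stand-in for jobs.tsv: the third data row has one field too many.
raw = (
    "JobID\tTitle\tZip5\n"
    "1\tSecurity Engineer\t20531\n"
    "4\tSAP Business Analyst\t28217\n"
    "7\tHR Assistant\t32792\textra\n"
    "8\tRoute Delivery Drivers\t\n"
)

df = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    on_bad_lines="skip",   # pandas >= 1.3; replaces error_bad_lines=False
    dtype={"Zip5": str},   # keep ZIP codes as text (preserves leading zeros)
)
print(df)
```

Reading ZIP codes as strings also avoids losing the leading zero in codes such as 08094.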

In [4]:

users.head()
UserID WindowID Split City State Country ZipCode DegreeType Major GraduationDate WorkHistoryCount TotalYearsExperience CurrentlyEmployed ManagedOthers ManagedHowMany
0 47 1 Train Paramount CA US 90723 High School NaN 1999-06-01 00:00:00 3 10.0 Yes No 0
1 72 1 Train La Mesa CA US 91941 Master's Anthropology 2011-01-01 00:00:00 10 8.0 Yes No 0
2 80 1 Train Williamstown NJ US 08094 High School Not Applicable 1985-06-01 00:00:00 5 11.0 Yes Yes 5
3 98 1 Train Astoria NY US 11105 Master's Journalism 2007-05-01 00:00:00 3 3.0 Yes No 0
4 123 1 Train Baton Rouge LA US 70808 Bachelor's Agricultural Business 2011-05-01 00:00:00 1 9.0 Yes No 0

In [5]:

users.columns
Index(['UserID', 'WindowID', 'Split', 'City', 'State', 'Country', 'ZipCode',
       'DegreeType', 'Major', 'GraduationDate', 'WorkHistoryCount',
       'TotalYearsExperience', 'CurrentlyEmployed', 'ManagedOthers',
       'ManagedHowMany'],
      dtype='object')

In [6]:

users.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 389708 entries, 0 to 389707
Data columns (total 15 columns):
UserID                  389708 non-null int64
WindowID                389708 non-null int64
Split                   389708 non-null object
City                    389708 non-null object
State                   389218 non-null object
Country                 389708 non-null object
ZipCode                 387974 non-null object
DegreeType              389708 non-null object
Major                   292468 non-null object
GraduationDate          269477 non-null object
WorkHistoryCount        389708 non-null int64
TotalYearsExperience    375528 non-null float64
CurrentlyEmployed       347632 non-null object
ManagedOthers           389708 non-null object
ManagedHowMany          389708 non-null int64
dtypes: float64(1), int64(4), object(10)
memory usage: 44.6+ MB

In [7]:

# replace() returns a new frame, so assign the result back
jobs = jobs.replace('NaN', np.nan)
jobs.head()
JobID WindowID Title Description Requirements City State Country Zip5 StartDate EndDate
0 1 1 Security Engineer/Technical Lead <p>Security Clearance Required:&nbsp; Top Secr... <p>SKILL SET</p>\r<p>&nbsp;</p>\r<p>Network Se... Washington DC US 20531 2012-03-07 13:17:01.643 2012-04-06 23:59:59
1 4 1 SAP Business Analyst / WM <strong>NO Corp. to Corp resumes&nbsp;are bein... <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... Charlotte NC US 28217 2012-03-21 02:03:44.137 2012-04-20 23:59:59
2 7 1 P/T HUMAN RESOURCES ASSISTANT <b> <b> P/T HUMAN RESOURCES ASSISTANT</b> <... Please refer to the Job Description to view th... Winter Park FL US 32792 2012-03-02 16:36:55.447 2012-04-01 23:59:59
3 8 1 Route Delivery Drivers CITY BEVERAGES Come to work for the best in th... Please refer to the Job Description to view th... Orlando FL US NaN 2012-03-03 09:01:10.077 2012-04-02 23:59:59
4 9 1 Housekeeping I make sure every part of their day is magica... Please refer to the Job Description to view th... Orlando FL US NaN 2012-03-03 09:01:11.88 2012-04-02 23:59:59

In [8]:

apps.head()
UserID WindowID Split ApplicationDate JobID
0 47 1 Train 2012-04-04 15:56:23.537 169528
1 47 1 Train 2012-04-06 01:03:00.003 284009
2 47 1 Train 2012-04-05 02:40:27.753 2121
3 47 1 Train 2012-04-05 02:37:02.673 848187
4 47 1 Train 2012-04-05 22:44:06.653 733748

In [9]:

user_history.head()
UserID WindowID Split Sequence JobTitle
0 47 1 Train 1 National Space Communication Programs-Special ...
1 47 1 Train 2 Detention Officer
2 47 1 Train 3 Passenger Screener, TSA
3 72 1 Train 1 Lecturer, Department of Anthropology
4 72 1 Train 2 Student Assistant

In [10]:

test_users.head()
UserID WindowID
0 767 1
1 769 1
2 861 1
3 1006 1
4 1192 1

EDA and Preprocessing

Splitting into Training and Testing dataset

Split on the Split attribute (Train / Test), which is present in:

  • users
  • apps
  • user_history

users

In [11]:

users_training = users.loc[users['Split'] == 'Train']

In [12]:

users_testing = users.loc[users['Split'] == 'Test']

apps

In [13]:

apps_training = apps.loc[apps['Split'] == 'Train']

In [14]:

apps_testing = apps.loc[apps['Split'] == 'Test']

user_history

In [15]:

user_history_training = user_history.loc[user_history['Split'] == 'Train']

In [16]:

user_history_testing = user_history.loc[user_history['Split'] == 'Test']

Dataframes

  • users_training
  • users_testing
  • apps_training
  • apps_testing
  • user_history_training
  • user_history_testing

Preprocessing

  • Considering only US
  • Removing data with empty state

In [17]:

jobs_US = jobs.loc[jobs['Country'] == 'US']
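The cell above covers only the first preprocessing step (US only). Removing rows with an empty state could look like the sketch below; `jobs_demo` is a hypothetical miniature of the jobs table, and treating whitespace-only values as empty is an assumption.

```python
import pandas as pd

# Hypothetical miniature of the jobs table.
jobs_demo = pd.DataFrame({
    "City": ["Orlando", "London", "Houston", "Chicago"],
    "State": ["FL", None, "  ", "IL"],
    "Country": ["US", "UK", "US", "US"],
})

us = jobs_demo.loc[jobs_demo["Country"] == "US"]
# Treat missing and whitespace-only states as empty, then drop those rows.
has_state = us["State"].fillna("").str.strip() != ""
us_with_state = us.loc[has_state].copy()
print(us_with_state)
```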

In [18]:

jobs_US[['City','State','Country']]
City State Country
0 Washington DC US
1 Charlotte NC US
2 Winter Park FL US
3 Orlando FL US
4 Orlando FL US
5 Ormond Beach FL US
6 Orlando FL US
7 Orlando FL US
8 Orlando FL US
9 Winter Park FL US
10 Los Angeles CA US
11 Longwood FL US
12 Altamonte Springs FL US
13 Orlando FL US
14 Daytona Beach FL US
15 Oviedo FL US
16 Orlando FL US
17 Orlando FL US
18 Altamonte Springs FL US
19 Windermere FL US
20 Leesburg FL US
21 Orlando FL US
22 Orlando FL US
23 Orlando FL US
24 Orlando FL US
25 Orlando FL US
26 Orlando FL US
27 Longwood FL US
28 Apopka FL US
29 Orlando FL US
... ... ... ...
1091893 Yonkers NY US
1091894 Newark NJ US
1091895 Charlotte NC US
1091896 Columbus MN US
1091897 Las Vegas NV US
1091898 Westminster CO US
1091899 New York NY US
1091900 Columbia SC US
1091901 Suffolk VA US
1091902 Chicago IL US
1091903 Belleville MI US
1091904 Kalamazoo MI US
1091905 Durham NC US
1091906 Waco TX US
1091907 Atlanta GA US
1091908 Columbia SC US
1091909 Darlington SC US
1091910 Reynoldsburg OH US
1091911 Canton MS US
1091912 Saint Louis MO US
1091913 Appleton WI US
1091914 Nashville TN US
1091915 Albuquerque NM US
1091916 Chicago IL US
1091917 Schaumburg IL US
1091918 Amsterdam NY US
1091919 Birmingham AL US
1091920 Carthage MS US
1091921 Warren MI US
1091922 Syracuse NY US

1090462 rows × 3 columns

In [19]:

jobs_US.groupby(['City', 'State', 'Country']).size().reset_index(name = 'Locationwise').sort_values('Locationwise', ascending = False).head()
City State Country Locationwise
6601 Houston TX US 19306
9835 New York NY US 18395
2651 Chicago IL US 17806
3475 Dallas TX US 13139
610 Atlanta GA US 12352

In [20]:

statewise_jobs = jobs_US.groupby(['State']).size().reset_index(name = 'Statewise').sort_values('Statewise', ascending = False)

In [21]:

jobs_US.groupby(['City']).size().reset_index(name='Citywise').sort_values('Citywise', ascending=False)
City Citywise
4564 Houston 19323
6809 New York 18402
1782 Chicago 17806
2351 Dallas 13202
408 Atlanta 12365
7650 Phoenix 12297
1709 Charlotte 10419
2056 Columbus 9323
4684 Indianapolis 9235
5632 Los Angeles 8878
7641 Philadelphia 8527
1846 Cincinnati 7650
10229 Washington 7619
440 Austin 7294
2518 Denver 7121
8574 San Antonio 7107
9551 Tampa 7076
7291 Orlando 7008
6299 Minneapolis 6811
6634 Nashville 6751
1031 Boston 6730
5653 Louisville 6717
4895 Kansas City 6717
8585 San Diego 6699
8589 San Francisco 6665
6182 Miami 6659
517 Baltimore 6472
3513 Fort Worth 6157
8783 Seattle 6112
8524 Saint Louis 5582
... ... ...
2265 Cross Junction 1
2263 Cross 1
7718 Pinon 1
7663 Pierce City 1
7662 Pierce 1
2284 Crystal Bay 1
2307 Cuney 1
7603 Pequot Indian Res 1
7605 Percy 1
7608 Peridot 1
7610 Perkins 1
7611 Perrine 1
2316 Cushman 1
2313 Curtiss 1
7622 Pescadero 1
2310 Curtice 1
2308 Cunningham 1
7635 Peyton 1
7658 Pickwick 1
2304 Cumberland Foreside 1
2300 Culver 1
7642 Philippi 1
7646 Philmont 1
7647 Philo 1
7649 Philomont 1
2293 Cuchillo 1
7652 Picabo 1
2290 Crystola 1
2286 Crystal Hill 1
0 29 Palms 1

10913 rows × 2 columns

In [22]:

citywise_jobs = jobs_US.groupby(['City']).size().reset_index(name='Citywise').sort_values('Citywise', ascending=False)

In [23]:

citywise_jobs_top = citywise_jobs.loc[citywise_jobs['Citywise']>=12]
  • jobs_US
  • statewise_jobs
  • citywise_jobs
  • citywise_jobs_top

User profile based on location

In [24]:

users_training_US = users_training.loc[users_training['Country'] == 'US']

In [25]:

users_training_statewise = users_training_US.groupby('State').size().reset_index(
    name='statewise').sort_values('statewise',ascending=False)
users_training_statewise.head()
State statewise
11 FL 40381
47 TX 33260
6 CA 31141
17 IL 22557
37 NY 19299

In [26]:

users_training_statewise_top = users_training_statewise.loc[users_training_statewise['statewise'] >= 12]

In [27]:

users_training_citywise = users_training_US.groupby(['City']).size().reset_index(
    name='citywise').sort_values('citywise',ascending=False)
users_training_citywise.head()
City citywise
1528 Chicago 6964
4066 Houston 5487
4177 Indianapolis 4450
5604 Miami 4359
6965 Philadelphia 4347

In [28]:

users_training_citywise_top = users_training_citywise.loc[users_training_citywise['citywise'] >= 12]
  • users_training_US
  • users_training_statewise
  • users_training_statewise_top
  • users_training_citywise
  • users_training_citywise_top

In [29]:

import ast 
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
# from nltk.stem.snowball import SnowballStemmer
# from nltk.stem.wordnet import WordNetLemmatizer
# from nltk.corpus import wordnet

Building model

In [30]:

jobs_US.columns
Index(['JobID', 'WindowID', 'Title', 'Description', 'Requirements', 'City',
       'State', 'Country', 'Zip5', 'StartDate', 'EndDate'],
      dtype='object')

In [31]:

jobs_US.head().transpose()
0 1 2 3 4
JobID 1 4 7 8 9
WindowID 1 1 1 1 1
Title Security Engineer/Technical Lead SAP Business Analyst / WM P/T HUMAN RESOURCES ASSISTANT Route Delivery Drivers Housekeeping
Description <p>Security Clearance Required:&nbsp; Top Secr... <strong>NO Corp. to Corp resumes&nbsp;are bein... <b> <b> P/T HUMAN RESOURCES ASSISTANT</b> <... CITY BEVERAGES Come to work for the best in th... I make sure every part of their day is magica...
Requirements <p>SKILL SET</p>\r<p>&nbsp;</p>\r<p>Network Se... <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... Please refer to the Job Description to view th... Please refer to the Job Description to view th... Please refer to the Job Description to view th...
City Washington Charlotte Winter Park Orlando Orlando
State DC NC FL FL FL
Country US US US US US
Zip5 20531 28217 32792 NaN NaN
StartDate 2012-03-07 13:17:01.643 2012-03-21 02:03:44.137 2012-03-02 16:36:55.447 2012-03-03 09:01:10.077 2012-03-03 09:01:11.88
EndDate 2012-04-06 23:59:59 2012-04-20 23:59:59 2012-04-01 23:59:59 2012-04-02 23:59:59 2012-04-02 23:59:59

In [32]:

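# First 10,000 rows and first 8 columns; .copy() avoids SettingWithCopyWarning on later assignments
jobs_US_base_line = jobs_US.iloc[0:10000,0:8].copy()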

In [33]:

jobs_US_base_line.head()
JobID WindowID Title Description Requirements City State Country
0 1 1 Security Engineer/Technical Lead <p>Security Clearance Required:&nbsp; Top Secr... <p>SKILL SET</p>\r<p>&nbsp;</p>\r<p>Network Se... Washington DC US
1 4 1 SAP Business Analyst / WM <strong>NO Corp. to Corp resumes&nbsp;are bein... <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... Charlotte NC US
2 7 1 P/T HUMAN RESOURCES ASSISTANT <b> <b> P/T HUMAN RESOURCES ASSISTANT</b> <... Please refer to the Job Description to view th... Winter Park FL US
3 8 1 Route Delivery Drivers CITY BEVERAGES Come to work for the best in th... Please refer to the Job Description to view th... Orlando FL US
4 9 1 Housekeeping I make sure every part of their day is magica... Please refer to the Job Description to view th... Orlando FL US

In [34]:

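# Replace missing titles and descriptions with empty strings
jobs_US_base_line['Title'] = jobs_US_base_line['Title'].fillna('')
jobs_US_base_line['Description'] = jobs_US_base_line['Description'].fillna('')
#jobs_US_base_line['Requirements'] = jobs_US_base_line['Requirements'].fillna('')

# Prepend the title to the description. No separator is added, so once tags are
# stripped the last title word can fuse with the first description word ('leadsecurity').
jobs_US_base_line['Description'] = jobs_US_base_line['Title'] + jobs_US_base_line['Description']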

Clean html

In [35]:

import re

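def preprocessor(text):
    # Drop literal '\r' sequences, '&nbsp' entities and newlines
    text = text.replace('\\r', '').replace('&nbsp', '').replace('\n', '')
    # Strip HTML tags
    text = re.sub(r'<[^>]*>', '', text)
    # Collect emoticons such as :) or ;-D so they survive punctuation removal
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
    # Lower-case, collapse non-word characters to spaces, then append the emoticons
    text = re.sub(r'[\W]+', ' ', text.lower()) + \
        ' '.join(emoticons).replace('-', '')
    return text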

In [36]:

jobs_US_base_line['Description'] = jobs_US_base_line['Description'].astype(dtype='str').apply(preprocessor)

In [37]:

jobs_US_base_line.loc[0,'Description']
'security engineer technical leadsecurity clearance required top secret job number tmr 447location of job washington dctmr inc is an equal employment opportunity companyfor more job opportunities with tmr visit our website www tmrhq comsend resumes to hr tmrhq2 com job summary leads the customer rsquo s overall cyber security strategy formalizes service offerings consisted with itil best practices and provides design and architecture support provide security design architecture support for ojp rsquo s it security division itsd leads the secops team in the day to day ojp security operations support provides direction when needed in a security incident or technical issues works in concert with network operations on design integration for best security posture supports business development functions including capture management proposal development and responses and other initiatives to include conferences trade shows webinars developing white papers and the like identifies resources and mentors in house talent to ensure tmr remains responsive to growing initiatives and contracts with qualified personnel '
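The cleaned text above contains fused tokens such as 'leadsecurity' and '447location', plus entity residue such as 'rsquo'. These come from joining title and description without a separator, from deleting tags and '&nbsp' outright, and from not decoding HTML entities. A possible alternative, shown only as a sketch (`clean_text` and `combine` are hypothetical names, and the sample string is shortened from the first posting):

```python
import html
import re

def clean_text(text):
    """Strip HTML and punctuation, keeping word boundaries intact."""
    text = html.unescape(text)            # decode entities such as &nbsp; and &rsquo;
    text = re.sub(r"<[^>]*>", " ", text)  # tags become spaces, not nothing
    text = text.replace("\u2019", "'")    # normalise curly apostrophes
    text = re.sub(r"[^\w']+", " ", text.lower())
    return text.strip()

def combine(title, description):
    """Join title and description with a space so their words do not fuse."""
    return clean_text(f"{title} {description}")

sample = (
    "<p>Security Clearance Required:&nbsp; Top Secret</p>\r"
    "<p>Leads the customer&rsquo;s strategy</p>"
)
print(combine("Security Engineer/Technical Lead", sample))
```

Swapping this in would change the vocabulary, so the matrix shape and similarity scores recorded in this notebook would no longer match.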

Dataset

From here onwards we work with the jobs_US_base_line data frame, i.e. the first 10,000 rows and first 8 columns of jobs_US, selected by jobs_US.iloc[0:10000,0:8].

In [38]:

jobs_US_base_line.head()
JobID WindowID Title Description Requirements City State Country
0 1 1 Security Engineer/Technical Lead security engineer technical leadsecurity clear... <p>SKILL SET</p>\r<p>&nbsp;</p>\r<p>Network Se... Washington DC US
1 4 1 SAP Business Analyst / WM sap business analyst wmno corp to corp resumes... <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... Charlotte NC US
2 7 1 P/T HUMAN RESOURCES ASSISTANT p t human resources assistant p t human resour... Please refer to the Job Description to view th... Winter Park FL US
3 8 1 Route Delivery Drivers route delivery driverscity beverages come to w... Please refer to the Job Description to view th... Orlando FL US
4 9 1 Housekeeping housekeepingi make sure every part of their da... Please refer to the Job Description to view th... Orlando FL US

Dataframes

  • users_training
  • users_testing
  • apps_training
  • apps_testing
  • user_history_training
  • user_history_testing

Location

  • jobs_US
  • statewise_jobs
  • citywise_jobs
  • citywise_jobs_top

Content based filtering

job description based recommender

using term frequency-inverse document frequency

In [39]:

# min_df=1 keeps every term (recent scikit-learn versions reject an integer min_df of 0)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(jobs_US_base_line['Description'])

In [40]:

tfidf_matrix.shape
(10000, 535561)
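The 535,561 columns are the distinct unigrams and bigrams (ngram_range=(1, 2)) found in the 10,000 descriptions. To make the weights themselves less opaque, here is a hand-rolled sketch of the smoothed tf-idf formula that scikit-learn documents for its defaults (idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation). The three documents are made up, and the whitespace tokenisation is a simplification of what TfidfVectorizer does.

```python
import math
from collections import Counter
import numpy as np

docs = [
    "sap business analyst",
    "sap basis administrator",
    "security engineer",
]
tokens = [d.split() for d in docs]
vocab = sorted({t for doc in tokens for t in doc})
n_docs = len(docs)

# Document frequency: in how many documents each term occurs.
df = Counter(t for doc in tokens for t in set(doc))
# Smoothed idf: ln((1 + n) / (1 + df)) + 1.
idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

tfidf = np.zeros((n_docs, len(vocab)))
for row, doc in enumerate(tokens):
    for term, count in Counter(doc).items():
        tfidf[row, vocab.index(term)] = count * idf[term]
# L2-normalise each row so that a dot product equals cosine similarity.
tfidf /= np.linalg.norm(tfidf, axis=1, keepdims=True)

print(dict(zip(vocab, tfidf[0].round(3))))
```

In the first document, 'analyst' receives a higher weight than 'sap' because 'sap' also occurs in a second document.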

In [41]:

print(tfidf_matrix)
  (0, 441695)	0.22505879122815065
  (0, 169589)	0.030669382747046916
  (0, 488697)	0.04752972531305034
  (0, 264028)	0.07576675940802603
  (0, 91384)	0.043127331709931944
  (0, 415767)	0.020558424763840996
  (0, 441253)	0.04981586018912077
  (0, 253884)	0.08610417293518174
  (0, 326351)	0.03628758749940522
  (0, 498708)	0.23741486018184604
  (0, 12862)	0.07913828672728201
  (0, 522879)	0.04488762967640074
  (0, 133507)	0.07913828672728201
  (0, 176599)	0.028902410192437344
  (0, 167677)	0.03274166426535237
  (0, 337345)	0.021001362288507987
  (0, 104352)	0.07000309538665801
  (0, 336738)	0.023882963162787138
  (0, 520158)	0.03215125588973581
  (0, 524056)	0.036239661087183676
  (0, 532905)	0.026723078362849224
  (0, 498712)	0.07913828672728201
  (0, 108785)	0.07913828672728201
  (0, 425609)	0.0371210673841218
  (0, 226031)	0.03469578910963611
  :	:
  (9999, 344616)	0.05017478681815903
  (9999, 467781)	0.05017478681815903
  (9999, 74829)	0.05017478681815903
  (9999, 523365)	0.05017478681815903
  (9999, 260759)	0.05017478681815903
  (9999, 74696)	0.05017478681815903
  (9999, 96875)	0.05017478681815903
  (9999, 150349)	0.05017478681815903
  (9999, 373161)	0.05017478681815903
  (9999, 390419)	0.05017478681815903
  (9999, 129078)	0.05017478681815903
  (9999, 203951)	0.05017478681815903
  (9999, 317361)	0.05017478681815903
  (9999, 385716)	0.05017478681815903
  (9999, 79158)	0.05017478681815903
  (9999, 492689)	0.05017478681815903
  (9999, 220490)	0.05017478681815903
  (9999, 414561)	0.05017478681815903
  (9999, 184487)	0.05017478681815903
  (9999, 492732)	0.05017478681815903
  (9999, 94472)	0.05017478681815903
  (9999, 351340)	0.05017478681815903
  (9999, 97193)	0.05017478681815903
  (9999, 447009)	0.05017478681815903
  (9999, 351336)	0.05339311347511375

In [42]:

jobs_US_base_line.loc[0,'Description']
'security engineer technical leadsecurity clearance required top secret job number tmr 447location of job washington dctmr inc is an equal employment opportunity companyfor more job opportunities with tmr visit our website www tmrhq comsend resumes to hr tmrhq2 com job summary leads the customer rsquo s overall cyber security strategy formalizes service offerings consisted with itil best practices and provides design and architecture support provide security design architecture support for ojp rsquo s it security division itsd leads the secops team in the day to day ojp security operations support provides direction when needed in a security incident or technical issues works in concert with network operations on design integration for best security posture supports business development functions including capture management proposal development and responses and other initiatives to include conferences trade shows webinars developing white papers and the like identifies resources and mentors in house talent to ensure tmr remains responsive to growing initiatives and contracts with qualified personnel '

In [43]:

cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)

In [44]:

cosine_sim[0]
array([1.        , 0.03241652, 0.00838853, ..., 0.01491531, 0.01491531,
       0.01491531])
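The variable is called cosine_sim although it is computed with linear_kernel (a plain dot product). That works because TfidfVectorizer L2-normalises each row by default, so the dot product of two rows equals their cosine similarity. The sketch below checks this on random stand-in data and also shows a per-row alternative: a dense 10,000 × 10,000 float64 matrix needs roughly 800 MB, whereas one row at a time needs very little. `similarities_to` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((5, 8))                          # stand-in for a TF-IDF matrix
X /= np.linalg.norm(X, axis=1, keepdims=True)   # L2-normalise the rows

def cosine(a, b):
    """Textbook cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

dot_products = X @ X.T                          # what a linear kernel computes
print(np.isclose(dot_products[0, 3], cosine(X[0], X[3])))

def similarities_to(X, row):
    """Similarity of one row to all rows, without building the full matrix."""
    return X @ X[row]

print(similarities_to(X, 0))
```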

In [45]:

jobs_US_base_line = jobs_US_base_line.reset_index()
titles = jobs_US_base_line['Title']
indices = pd.Series(jobs_US_base_line.index, index=jobs_US_base_line['Title'])

In [46]:

jobs_US_base_line.head()
index JobID WindowID Title Description Requirements City State Country
0 0 1 1 Security Engineer/Technical Lead security engineer technical leadsecurity clear... <p>SKILL SET</p>\r<p>&nbsp;</p>\r<p>Network Se... Washington DC US
1 1 4 1 SAP Business Analyst / WM sap business analyst wmno corp to corp resumes... <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... Charlotte NC US
2 2 7 1 P/T HUMAN RESOURCES ASSISTANT p t human resources assistant p t human resour... Please refer to the Job Description to view th... Winter Park FL US
3 3 8 1 Route Delivery Drivers route delivery driverscity beverages come to w... Please refer to the Job Description to view th... Orlando FL US
4 4 9 1 Housekeeping housekeepingi make sure every part of their da... Please refer to the Job Description to view th... Orlando FL US

In [47]:

print(indices)
Title
Security Engineer/Technical Lead                                 0
SAP Business Analyst / WM                                        1
P/T HUMAN RESOURCES ASSISTANT                                    2
Route Delivery Drivers                                           3
Housekeeping                                                     4
SALON/SPA COORDINATOR                                            5
SUPERINTENDENT                                                   6
ELECTRONIC PRE-PRESS PROFESSIONAL                                7
UTILITY LINE TRUCK OPERATOR/ DIGGER DERRICK                      8
CONSTRUCTION PROJECT MGR & PM TRAINEE                            9
Administrative Assistant                                        10
ACCOUNT EXECUTIVES                                              11
COMMERCIAL ESTIMATOR                                            12
Immediate Opening                                               13
TESL Adjunct                                                    14
Salon Manager/Hairstylists                                      15
VOCATIONAL COUNSELOR                                            16
GALLERY SALES POSITIONS                                         17
SURGICAL SCRUB TECH                                             18
Real Estate Agent                                               19
LPN, RN, CNA, TECHS                                             20
Vacation Sales Representatives                                  21
Top Sales Agent                                                 22
Quick Service Food & Beverage                                   23
CREDIT/COLLECTIONS ASSISTANT                                    24
POOL TECH                                                       25
EXPERIENCED ROOFERS                                             26
ARIZA TALENT & MODELING                                         27
CDL CLASS A DRIVER                                              28
Skilled Tradesman                                               29
                                                              ... 
Sales Representative / Account Manager /  Customer Service    9970
Sales Representative / Account Manager /  Customer Service    9971
Sales Representative / Account Manager /  Customer Service    9972
Sales Representative / Account Manager /  Customer Service    9973
Sales Representative / Account Manager /  Customer Service    9974
Sales Representative / Account Manager /  Customer Service    9975
Sales Representative / Account Manager /  Customer Service    9976
Sales Representative / Account Manager /  Customer Service    9977
Sales Representative / Account Manager /  Customer Service    9978
Sales Representative / Account Manager /  Customer Service    9979
Sales Representative / Account Manager /  Customer Service    9980
Sales Representative / Account Manager /  Customer Service    9981
Sales Representative / Account Manager /  Customer Service    9982
Sales Representative / Account Manager /  Customer Service    9983
Sales Representative / Account Manager /  Customer Service    9984
Sales Representative / Account Manager /  Customer Service    9985
Sales Representative / Account Manager /  Customer Service    9986
Sales Representative / Account Manager /  Customer Service    9987
Sales Representative / Account Manager /  Customer Service    9988
Sales Representative / Account Manager /  Customer Service    9989
Sales Representative / Account Manager /  Customer Service    9990
Sales Representative / Account Manager /  Customer Service    9991
Sales Representative / Account Manager /  Customer Service    9992
Sales Representative / Account Manager /  Customer Service    9993
Sales Representative / Account Manager /  Customer Service    9994
Sales Representative / Account Manager /  Customer Service    9995
Sales Representative / Account Manager /  Customer Service    9996
Sales Representative / Account Manager /  Customer Service    9997
Sales Representative / Account Manager /  Customer Service    9998
Sales Representative / Account Manager /  Customer Service    9999
Length: 10000, dtype: int64

In [48]:

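def get_recommendations(title):
    idx = indices[title]
    # Titles are not unique; when several rows match, use the first one
    if isinstance(idx, pd.Series):
        idx = idx.iloc[0]
    # Pair every job position with its similarity to the query job
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Most similar first; the query job itself normally comes out on top
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    job_indices = [i[0] for i in sim_scores]
    return titles.iloc[job_indices]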

In [49]:

get_recommendations('SAP Business Analyst / WM').head(10)
1                           SAP Business Analyst / WM
6051                    SAP FI/CO Business Consultant
5159                          SAP Basis Administrator
5868                       SAP FI/CO Business Analyst
5351    SAP Sales and Distribution Solution Architect
4796       Senior Specialist - SAP Configuration - SD
5117                       SAP Integration Specialist
4290           SAP FICO Functional -2years experience
4728           SAP ABAP Developer with PRA experience
5244                                 Business Analyst
Name: Title, dtype: object

In [50]:

get_recommendations('Security Engineer/Technical Lead').head(10)
0                        Security Engineer/Technical Lead
5906                             Senior Security Engineer
6380                Security Technology - SIEM Consultant
3248                Senior Lead Systems Security Engineer
1302                       Information Security Architect
5525                   Sr. Information Security Architect
6873              Integrated System Service Engineer - CA
3230                    Computer Systems Security Manager
1568                 Senior Information Security Engineer
4901    Cloud Services Security Application Administrator
Name: Title, dtype: object
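In both result lists the first entry is the query posting itself (row 1 and row 0 respectively), because every job is most similar to itself. A small sketch of a top-n selection that skips the query row; `top_n_similar` and the sample scores are hypothetical.

```python
import numpy as np

def top_n_similar(sim_row, idx, n=10):
    """Indices of the n highest scores in sim_row, skipping position idx."""
    order = np.argsort(-np.asarray(sim_row), kind="stable")
    return [int(i) for i in order if i != idx][:n]

# Hypothetical similarity row for item 0 (its own score is 1.0).
sim_row = [1.0, 0.03, 0.40, 0.25, 0.90]
print(top_n_similar(sim_row, idx=0, n=3))
```

Filtering by position rather than dropping the first result is safer when exact duplicate postings also score 1.0.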

similar user based recommender

  • degree type, majors and total years of experience
  • users_training dataset

In [51]:

users_training.head()
UserID WindowID Split City State Country ZipCode DegreeType Major GraduationDate WorkHistoryCount TotalYearsExperience CurrentlyEmployed ManagedOthers ManagedHowMany
0 47 1 Train Paramount CA US 90723 High School NaN 1999-06-01 00:00:00 3 10.0 Yes No 0
1 72 1 Train La Mesa CA US 91941 Master's Anthropology 2011-01-01 00:00:00 10 8.0 Yes No 0
2 80 1 Train Williamstown NJ US 08094 High School Not Applicable 1985-06-01 00:00:00 5 11.0 Yes Yes 5
3 98 1 Train Astoria NY US 11105 Master's Journalism 2007-05-01 00:00:00 3 3.0 Yes No 0
4 123 1 Train Baton Rouge LA US 70808 Bachelor's Agricultural Business 2011-05-01 00:00:00 1 9.0 Yes No 0

In [52]:

user_based_approach_US = users_training.loc[users_training['Country']=='US']

In [53]:

user_based_approach = user_based_approach_US.iloc[0:10000,:].copy()

In [54]:

user_based_approach.head()
UserID WindowID Split City State Country ZipCode DegreeType Major GraduationDate WorkHistoryCount TotalYearsExperience CurrentlyEmployed ManagedOthers ManagedHowMany
0 47 1 Train Paramount CA US 90723 High School NaN 1999-06-01 00:00:00 3 10.0 Yes No 0
1 72 1 Train La Mesa CA US 91941 Master's Anthropology 2011-01-01 00:00:00 10 8.0 Yes No 0
2 80 1 Train Williamstown NJ US 08094 High School Not Applicable 1985-06-01 00:00:00 5 11.0 Yes Yes 5
3 98 1 Train Astoria NY US 11105 Master's Journalism 2007-05-01 00:00:00 3 3.0 Yes No 0
4 123 1 Train Baton Rouge LA US 70808 Bachelor's Agricultural Business 2011-05-01 00:00:00 1 9.0 Yes No 0

In [55]:

user_based_approach['DegreeType'] = user_based_approach['DegreeType'].fillna('')
user_based_approach['Major'] = user_based_approach['Major'].fillna('')
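# Convert each value to text row by row; str(series) would instead put the
# printed form of the whole column into every row
user_based_approach['TotalYearsExperience'] = user_based_approach['TotalYearsExperience'].map(
    lambda x: '' if pd.isna(x) else str(x))

# Build one text field per user, with spaces so that tokens stay separate
user_based_approach['DegreeType'] = user_based_approach['DegreeType'] + ' ' + user_based_approach['Major'] + \
                                    ' ' + user_based_approach['TotalYearsExperience']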

In [56]:

# min_df=1 keeps every term (recent scikit-learn versions reject an integer min_df of 0)
tf = TfidfVectorizer(analyzer='word', ngram_range=(1, 2), min_df=1, stop_words='english')
tfidf_matrix = tf.fit_transform(user_based_approach['DegreeType'])

In [57]:

tfidf_matrix.shape
(10000, 7337)

In [58]:

cosine_sim = linear_kernel(tfidf_matrix,tfidf_matrix)

In [59]:

cosine_sim[0]
array([1.        , 0.67053882, 0.84759861, ..., 0.43990417, 0.79335895,
       0.69670809])

In [60]:

user_based_approach = user_based_approach.reset_index()
userid = user_based_approach['UserID']
indices = pd.Series(user_based_approach.index, index=user_based_approach['UserID'])
indices.head(2)
UserID
47    0
72    1
dtype: int64

In [61]:

def get_recommendations_userwise(userid):
    # Returns row positions in user_based_approach (not UserIDs) of the
    # 11 most similar users; the first entry is normally the user itself
    idx = indices[userid]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    user_indices = [i[0] for i in sim_scores]
    return user_indices[0:11]

In [62]:

get_recommendations_userwise(123)
[4, 150, 1594, 5560, 2464, 2846, 7945, 8125, 1171, 11, 24]

In [63]:

def get_job_id(usrid_list):
    # JobIDs that the given UserIDs applied to in the training split
    applied = apps_training.loc[apps_training['UserID'].isin(usrid_list), 'JobID']
    # Details of those jobs
    columns = ['JobID', 'Title', 'Description', 'City', 'State']
    return jobs.loc[jobs['JobID'].isin(applied), columns]

In [64]:

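# get_recommendations_userwise returns row positions; map them to UserIDs first
get_job_id(userid.iloc[get_recommendations_userwise(47)].tolist())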
JobID Title Description City State
905894 428902 Aircraft Servicer <b>Job Classification: </b> Direct Hire \r\n\r... Memphis TN
975525 1098447 Automotive Service Advisor <div>\r<div>Briggs Nissan in Lawrence Kansas h... Lawrence KS
980507 37309 Medical Lab Technician - High Volume Lab <span>Position Title:<span>&nbsp;&nbsp;&nbsp;&... Fort Myers FL
986244 83507 Nurse Tech (CNA/STNA) <p align="center"><b>Purpose of Your Job Posit... Englewood FL
987452 93883 Nurse Tech II (CNA/STNA) <B>Nurse Tech II (CNA/STNA)</B> <BR>\r<BR>\rTh... Fort Myers FL
1000910 228284 REGISTERED NURSE – ICU <p><strong><span><font face="">Registered Nurs... Punta Gorda FL
1007140 284840 Certified Nursing Assistant / CNA <hr>\r<p style="text-align: center"><strong>Ce... Saint Petersburg FL
1007141 284841 Home Health Aide / HHA <hr>\r<p style="text-align: center"><strong>Ho... Saint Petersburg FL
1009455 312536 Secretary II <br><br><b>Department: </b>COMM Maryland Cardi... Baltimore MD
1011978 341662 Medical Assistant Certified Medical Assistant for busy Pain Clin... Fort Myers FL
1034578 551375 Phlebotomist <p>Every day All Medical Personnel helps excep... Clearwater FL
1048060 684278 Sales Representative / Customer Service / Acco... <P>Central Payment offers limitless opportunit... Bonita Springs FL
1066952 867194 Hospital Liaison and Pharmaceutical Hospital Liaison with Pharmaceutical exp<br />... Fort Myers FL
1070785 910932 Nursing: CNA (PRN) <p>&nbsp;</p>\r<p>Take advantage of this great... Fort Myers FL
1076051 960285 All college grads apply! Entry level sales and... <div> <span>\r<div>\r<div><strong>All college ... Fort Myers FL
1091311 1108709 Certified Nursing Assistant / CNA / HHA <hr>\r<p style="text-align: center"><strong>Ce... Sarasota FL
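The table above is an unranked set of every job that any similar user applied to. One possible refinement, sketched here under the assumption that jobs chosen by more of the similar users are better candidates, is to count distinct applicants per job. `rank_jobs` and `apps_demo` are hypothetical.

```python
import pandas as pd

# Hypothetical applications of the similar users (UserID, JobID pairs).
apps_demo = pd.DataFrame({
    "UserID": [72, 72, 80, 98, 98, 123],
    "JobID":  [10, 11, 10, 10, 12, 11],
})

def rank_jobs(apps, similar_user_ids, top=3):
    """Count distinct similar users per job and return the most popular jobs."""
    subset = apps.loc[apps["UserID"].isin(similar_user_ids)]
    counts = subset.drop_duplicates(["UserID", "JobID"]).groupby("JobID").size()
    return counts.sort_values(ascending=False, kind="stable").head(top)

ranked = rank_jobs(apps_demo, [72, 80, 98, 123])
print(ranked)
```

The counts could also be weighted by each user's similarity score instead of treating all neighbours equally.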
