Job Recommendation Engine
Introduction to Recommender
Mainly there’s two types of recommender, collaborative and content-based recommender.
Collaborative filtering
analyse information on user’d behaviours and predict what users will like based on their similarity to other users algo: k-NN approach, Pearson Correlation info: explicit and implicit data suffers from cold start, scalability, sparsity matrix factorisation
Content-based filtering (personality-based approach)
based on description of item and profile of user’s preference, recommend similar items tf-idf representation (item representation algo) info: user’s preference, user’s history recommendation is narrow
Hybrid recommender systems
combine collaborative filtering and content-based filtering, might be more effective sometimes
Job Recommendation
with dataset from Careerbuilder
TOC
- Import dependencies
- Load dataset
- EDA and Preprocessing
- split into training and testing dataset
- location
- preprocessing
- Building model
- Clean html
- Content based filtering
- job description based recommender
- similar user based recommender
Import dependencies
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
Load dataset
In [2]:
!ls data/*.tsv
data/apps.tsv data/test_users.tsv data/users.tsv
data/jobs.tsv data/user_history.tsv data/window_dates.tsv
- users
- jobs
- apps
- users_history
- test_users
In [3]:
users = pd.read_csv('data/users.tsv', sep='\t', encoding='utf-8')
jobs = pd.read_csv('data/jobs.tsv', sep='\t', encoding='utf-8', error_bad_lines=False)
apps = pd.read_csv('data/apps.tsv', sep='\t', encoding='utf-8')
user_history = pd.read_csv('data/user_history.tsv', sep='\t', encoding='utf-8')
test_users = pd.read_csv('data/test_users.tsv', sep='\t', encoding='utf-8')
# jobs = pd.read_csv('data/jobs.tsv', sep='\t')
b'Skipping line 122433: expected 11 fields, saw 12\n'
b'Skipping line 602576: expected 11 fields, saw 12\n'
b'Skipping line 990950: expected 11 fields, saw 12\n'
/home/hectoryee/anaconda3/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2785: DtypeWarning: Columns (8) have mixed types. Specify dtype option on import or set low_memory=False.
interactivity=interactivity, compiler=compiler, result=result)
In [4]:
users.head()
UserID | WindowID | Split | City | State | Country | ZipCode | DegreeType | Major | GraduationDate | WorkHistoryCount | TotalYearsExperience | CurrentlyEmployed | ManagedOthers | ManagedHowMany | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 47 | 1 | Train | Paramount | CA | US | 90723 | High School | NaN | 1999-06-01 00:00:00 | 3 | 10.0 | Yes | No | 0 |
1 | 72 | 1 | Train | La Mesa | CA | US | 91941 | Master's | Anthropology | 2011-01-01 00:00:00 | 10 | 8.0 | Yes | No | 0 |
2 | 80 | 1 | Train | Williamstown | NJ | US | 08094 | High School | Not Applicable | 1985-06-01 00:00:00 | 5 | 11.0 | Yes | Yes | 5 |
3 | 98 | 1 | Train | Astoria | NY | US | 11105 | Master's | Journalism | 2007-05-01 00:00:00 | 3 | 3.0 | Yes | No | 0 |
4 | 123 | 1 | Train | Baton Rouge | LA | US | 70808 | Bachelor's | Agricultural Business | 2011-05-01 00:00:00 | 1 | 9.0 | Yes | No | 0 |
In [5]:
users.columns
Index(['UserID', 'WindowID', 'Split', 'City', 'State', 'Country', 'ZipCode',
'DegreeType', 'Major', 'GraduationDate', 'WorkHistoryCount',
'TotalYearsExperience', 'CurrentlyEmployed', 'ManagedOthers',
'ManagedHowMany'],
dtype='object')
In [6]:
users.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 389708 entries, 0 to 389707
Data columns (total 15 columns):
UserID 389708 non-null int64
WindowID 389708 non-null int64
Split 389708 non-null object
City 389708 non-null object
State 389218 non-null object
Country 389708 non-null object
ZipCode 387974 non-null object
DegreeType 389708 non-null object
Major 292468 non-null object
GraduationDate 269477 non-null object
WorkHistoryCount 389708 non-null int64
TotalYearsExperience 375528 non-null float64
CurrentlyEmployed 347632 non-null object
ManagedOthers 389708 non-null object
ManagedHowMany 389708 non-null int64
dtypes: float64(1), int64(4), object(10)
memory usage: 44.6+ MB
In [7]:
jobs.replace('NaN',np.NaN)
jobs.head()
JobID | WindowID | Title | Description | Requirements | City | State | Country | Zip5 | StartDate | EndDate | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Security Engineer/Technical Lead | <p>Security Clearance Required: Top Secr... | <p>SKILL SET</p>\r<p> </p>\r<p>Network Se... | Washington | DC | US | 20531 | 2012-03-07 13:17:01.643 | 2012-04-06 23:59:59 |
1 | 4 | 1 | SAP Business Analyst / WM | <strong>NO Corp. to Corp resumes are bein... | <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... | Charlotte | NC | US | 28217 | 2012-03-21 02:03:44.137 | 2012-04-20 23:59:59 |
2 | 7 | 1 | P/T HUMAN RESOURCES ASSISTANT | <b> <b> P/T HUMAN RESOURCES ASSISTANT</b> <... | Please refer to the Job Description to view th... | Winter Park | FL | US | 32792 | 2012-03-02 16:36:55.447 | 2012-04-01 23:59:59 |
3 | 8 | 1 | Route Delivery Drivers | CITY BEVERAGES Come to work for the best in th... | Please refer to the Job Description to view th... | Orlando | FL | US | NaN | 2012-03-03 09:01:10.077 | 2012-04-02 23:59:59 |
4 | 9 | 1 | Housekeeping | I make sure every part of their day is magica... | Please refer to the Job Description to view th... | Orlando | FL | US | NaN | 2012-03-03 09:01:11.88 | 2012-04-02 23:59:59 |
In [8]:
apps.head()
UserID | WindowID | Split | ApplicationDate | JobID | |
---|---|---|---|---|---|
0 | 47 | 1 | Train | 2012-04-04 15:56:23.537 | 169528 |
1 | 47 | 1 | Train | 2012-04-06 01:03:00.003 | 284009 |
2 | 47 | 1 | Train | 2012-04-05 02:40:27.753 | 2121 |
3 | 47 | 1 | Train | 2012-04-05 02:37:02.673 | 848187 |
4 | 47 | 1 | Train | 2012-04-05 22:44:06.653 | 733748 |
In [9]:
user_history.head()
UserID | WindowID | Split | Sequence | JobTitle | |
---|---|---|---|---|---|
0 | 47 | 1 | Train | 1 | National Space Communication Programs-Special ... |
1 | 47 | 1 | Train | 2 | Detention Officer |
2 | 47 | 1 | Train | 3 | Passenger Screener, TSA |
3 | 72 | 1 | Train | 1 | Lecturer, Department of Anthropology |
4 | 72 | 1 | Train | 2 | Student Assistant |
In [10]:
test_users.head()
UserID | WindowID | |
---|---|---|
0 | 767 | 1 |
1 | 769 | 1 |
2 | 861 | 1 |
3 | 1006 | 1 |
4 | 1192 | 1 |
EDA and Preprocessing
Splitting into Training and Testing dataset
with attribute split:
- users
- apps
- user_history
users
In [11]:
users_training = users.loc[users['Split'] == 'Train']
In [12]:
users_testing = users.loc[users['Split'] == 'Test']
apps
In [13]:
apps_training = apps.loc[apps['Split'] == 'Train']
In [14]:
apps_testing = apps.loc[apps['Split'] == 'Test']
user_history
In [15]:
user_history_training = user_history.loc[user_history['Split'] == 'Train']
In [16]:
user_history_testing = user_history.loc[user_history['Split'] == 'Test']
Dataframes
- users_training
- users_testing
- apps_training
- apps_testing
- user_history_training
- user_history_testing
Preprocessing
- Considering only US
- Removing data with empty state
In [17]:
jobs_US = jobs.loc[jobs['Country'] == 'US']
In [18]:
jobs_US[['City','State','Country']]
City | State | Country | |
---|---|---|---|
0 | Washington | DC | US |
1 | Charlotte | NC | US |
2 | Winter Park | FL | US |
3 | Orlando | FL | US |
4 | Orlando | FL | US |
5 | Ormond Beach | FL | US |
6 | Orlando | FL | US |
7 | Orlando | FL | US |
8 | Orlando | FL | US |
9 | Winter Park | FL | US |
10 | Los Angeles | CA | US |
11 | Longwood | FL | US |
12 | Altamonte Springs | FL | US |
13 | Orlando | FL | US |
14 | Daytona Beach | FL | US |
15 | Oviedo | FL | US |
16 | Orlando | FL | US |
17 | Orlando | FL | US |
18 | Altamonte Springs | FL | US |
19 | Windermere | FL | US |
20 | Leesburg | FL | US |
21 | Orlando | FL | US |
22 | Orlando | FL | US |
23 | Orlando | FL | US |
24 | Orlando | FL | US |
25 | Orlando | FL | US |
26 | Orlando | FL | US |
27 | Longwood | FL | US |
28 | Apopka | FL | US |
29 | Orlando | FL | US |
... | ... | ... | ... |
1091893 | Yonkers | NY | US |
1091894 | Newark | NJ | US |
1091895 | Charlotte | NC | US |
1091896 | Columbus | MN | US |
1091897 | Las Vegas | NV | US |
1091898 | Westminster | CO | US |
1091899 | New York | NY | US |
1091900 | Columbia | SC | US |
1091901 | Suffolk | VA | US |
1091902 | Chicago | IL | US |
1091903 | Belleville | MI | US |
1091904 | Kalamazoo | MI | US |
1091905 | Durham | NC | US |
1091906 | Waco | TX | US |
1091907 | Atlanta | GA | US |
1091908 | Columbia | SC | US |
1091909 | Darlington | SC | US |
1091910 | Reynoldsburg | OH | US |
1091911 | Canton | MS | US |
1091912 | Saint Louis | MO | US |
1091913 | Appleton | WI | US |
1091914 | Nashville | TN | US |
1091915 | Albuquerque | NM | US |
1091916 | Chicago | IL | US |
1091917 | Schaumburg | IL | US |
1091918 | Amsterdam | NY | US |
1091919 | Birmingham | AL | US |
1091920 | Carthage | MS | US |
1091921 | Warren | MI | US |
1091922 | Syracuse | NY | US |
1090462 rows × 3 columns
In [19]:
jobs_US.groupby(['City', 'State', 'Country']).size().reset_index(name = 'Locationwise').sort_values('Locationwise', ascending = False).head()
City | State | Country | Locationwise | |
---|---|---|---|---|
6601 | Houston | TX | US | 19306 |
9835 | New York | NY | US | 18395 |
2651 | Chicago | IL | US | 17806 |
3475 | Dallas | TX | US | 13139 |
610 | Atlanta | GA | US | 12352 |
In [20]:
statewise_jobs = jobs_US.groupby(['State']).size().reset_index(name = 'Statewise').sort_values('Statewise', ascending = False)
In [21]:
jobs_US.groupby(['City']).size().reset_index(name='Citywise').sort_values('Citywise', ascending=False)
City | Citywise | |
---|---|---|
4564 | Houston | 19323 |
6809 | New York | 18402 |
1782 | Chicago | 17806 |
2351 | Dallas | 13202 |
408 | Atlanta | 12365 |
7650 | Phoenix | 12297 |
1709 | Charlotte | 10419 |
2056 | Columbus | 9323 |
4684 | Indianapolis | 9235 |
5632 | Los Angeles | 8878 |
7641 | Philadelphia | 8527 |
1846 | Cincinnati | 7650 |
10229 | Washington | 7619 |
440 | Austin | 7294 |
2518 | Denver | 7121 |
8574 | San Antonio | 7107 |
9551 | Tampa | 7076 |
7291 | Orlando | 7008 |
6299 | Minneapolis | 6811 |
6634 | Nashville | 6751 |
1031 | Boston | 6730 |
5653 | Louisville | 6717 |
4895 | Kansas City | 6717 |
8585 | San Diego | 6699 |
8589 | San Francisco | 6665 |
6182 | Miami | 6659 |
517 | Baltimore | 6472 |
3513 | Fort Worth | 6157 |
8783 | Seattle | 6112 |
8524 | Saint Louis | 5582 |
... | ... | ... |
2265 | Cross Junction | 1 |
2263 | Cross | 1 |
7718 | Pinon | 1 |
7663 | Pierce City | 1 |
7662 | Pierce | 1 |
2284 | Crystal Bay | 1 |
2307 | Cuney | 1 |
7603 | Pequot Indian Res | 1 |
7605 | Percy | 1 |
7608 | Peridot | 1 |
7610 | Perkins | 1 |
7611 | Perrine | 1 |
2316 | Cushman | 1 |
2313 | Curtiss | 1 |
7622 | Pescadero | 1 |
2310 | Curtice | 1 |
2308 | Cunningham | 1 |
7635 | Peyton | 1 |
7658 | Pickwick | 1 |
2304 | Cumberland Foreside | 1 |
2300 | Culver | 1 |
7642 | Philippi | 1 |
7646 | Philmont | 1 |
7647 | Philo | 1 |
7649 | Philomont | 1 |
2293 | Cuchillo | 1 |
7652 | Picabo | 1 |
2290 | Crystola | 1 |
2286 | Crystal Hill | 1 |
0 | 29 Palms | 1 |
10913 rows × 2 columns
In [22]:
citywise_jobs = jobs_US.groupby(['City']).size().reset_index(name='Citywise').sort_values('Citywise', ascending=False)
In [23]:
citywise_jobs_top = citywise_jobs.loc[citywise_jobs['Citywise']>=12]
- jobs_US
- statewise_jobs
- citywise_jobs
- citywise_jobs_top
User profile based on location
In [24]:
users_training_US = users_training.loc[users_training['Country'] == 'US']
In [25]:
users_training_statewise = users_training_US.groupby('State').size().reset_index(
name='statewise').sort_values('statewise',ascending=False)
users_training_statewise.head()
State | statewise | |
---|---|---|
11 | FL | 40381 |
47 | TX | 33260 |
6 | CA | 31141 |
17 | IL | 22557 |
37 | NY | 19299 |
In [26]:
users_training_statewise_top = users_training_statewise.loc[users_training_statewise['statewise'] >= 12]
In [27]:
users_training_citywise = users_training_US.groupby(['City']).size().reset_index(
name='citywise').sort_values('citywise',ascending=False)
users_training_citywise.head()
City | citywise | |
---|---|---|
1528 | Chicago | 6964 |
4066 | Houston | 5487 |
4177 | Indianapolis | 4450 |
5604 | Miami | 4359 |
6965 | Philadelphia | 4347 |
In [28]:
users_training_citywise_top = users_training_citywise.loc[users_training_citywise['citywise'] >= 12]
- users_training_US
- users_training_statewise
- users_training_citywise
- users_training_citywise_top
In [29]:
import ast
from scipy import stats
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import linear_kernel, cosine_similarity
# from nltk.stem.snowball import SnowballStemmer
# from nltk.stem.wordnet import WordNetLemmatizer
# from nltk.corpus import wordnet
Building model
In [30]:
jobs_US.columns
Index(['JobID', 'WindowID', 'Title', 'Description', 'Requirements', 'City',
'State', 'Country', 'Zip5', 'StartDate', 'EndDate'],
dtype='object')
In [31]:
jobs_US.head().transpose()
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
JobID | 1 | 4 | 7 | 8 | 9 |
WindowID | 1 | 1 | 1 | 1 | 1 |
Title | Security Engineer/Technical Lead | SAP Business Analyst / WM | P/T HUMAN RESOURCES ASSISTANT | Route Delivery Drivers | Housekeeping |
Description | <p>Security Clearance Required: Top Secr... | <strong>NO Corp. to Corp resumes are bein... | <b> <b> P/T HUMAN RESOURCES ASSISTANT</b> <... | CITY BEVERAGES Come to work for the best in th... | I make sure every part of their day is magica... |
Requirements | <p>SKILL SET</p>\r<p> </p>\r<p>Network Se... | <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... | Please refer to the Job Description to view th... | Please refer to the Job Description to view th... | Please refer to the Job Description to view th... |
City | Washington | Charlotte | Winter Park | Orlando | Orlando |
State | DC | NC | FL | FL | FL |
Country | US | US | US | US | US |
Zip5 | 20531 | 28217 | 32792 | NaN | NaN |
StartDate | 2012-03-07 13:17:01.643 | 2012-03-21 02:03:44.137 | 2012-03-02 16:36:55.447 | 2012-03-03 09:01:10.077 | 2012-03-03 09:01:11.88 |
EndDate | 2012-04-06 23:59:59 | 2012-04-20 23:59:59 | 2012-04-01 23:59:59 | 2012-04-02 23:59:59 | 2012-04-02 23:59:59 |
In [32]:
jobs_US_base_line = jobs_US.iloc[0:10000,0:8]
In [33]:
jobs_US_base_line.head()
JobID | WindowID | Title | Description | Requirements | City | State | Country | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Security Engineer/Technical Lead | <p>Security Clearance Required: Top Secr... | <p>SKILL SET</p>\r<p> </p>\r<p>Network Se... | Washington | DC | US |
1 | 4 | 1 | SAP Business Analyst / WM | <strong>NO Corp. to Corp resumes are bein... | <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... | Charlotte | NC | US |
2 | 7 | 1 | P/T HUMAN RESOURCES ASSISTANT | <b> <b> P/T HUMAN RESOURCES ASSISTANT</b> <... | Please refer to the Job Description to view th... | Winter Park | FL | US |
3 | 8 | 1 | Route Delivery Drivers | CITY BEVERAGES Come to work for the best in th... | Please refer to the Job Description to view th... | Orlando | FL | US |
4 | 9 | 1 | Housekeeping | I make sure every part of their day is magica... | Please refer to the Job Description to view th... | Orlando | FL | US |
In [34]:
jobs_US_base_line['Title'] = jobs_US_base_line['Title'].fillna('')
jobs_US_base_line['Description'] = jobs_US_base_line['Description'].fillna('')
#jobs_US_base_line['Requirements'] = jobs_US_base_line['Requirements'].fillna('')
jobs_US_base_line['Description'] = jobs_US_base_line['Title'] + jobs_US_base_line['Description']
Clean html
In [35]:
import re
def preprocessor(text):
text = text.replace('\\r', '').replace(' ', '').replace('\n', '')
text = re.sub('<[^>]*>', '', text)
emoticons = re.findall('(?::|;|=)(?:-)?(?:\)|\(|D|P)', text)
text = re.sub('[\W]+', ' ', text.lower()) +\
' '.join(emoticons).replace('-', '')
return text
In [36]:
jobs_US_base_line['Description'] = jobs_US_base_line['Description'].astype(dtype='str').apply(preprocessor)
In [37]:
jobs_US_base_line.loc[0,'Description']
'security engineer technical leadsecurity clearance required top secret job number tmr 447location of job washington dctmr inc is an equal employment opportunity companyfor more job opportunities with tmr visit our website www tmrhq comsend resumes to hr tmrhq2 com job summary leads the customer rsquo s overall cyber security strategy formalizes service offerings consisted with itil best practices and provides design and architecture support provide security design architecture support for ojp rsquo s it security division itsd leads the secops team in the day to day ojp security operations support provides direction when needed in a security incident or technical issues works in concert with network operations on design integration for best security posture supports business development functions including capture management proposal development and responses and other initiatives to include conferences trade shows webinars developing white papers and the like identifies resources and mentors in house talent to ensure tmr remains responsive to growing initiatives and contracts with qualified personnel '
Dataset
From here onwards use jobs_US_base_line
data frame to work on, which is
selected by jobs_US.iloc[0:10000,0:8]
.
In [38]:
jobs_US_base_line.head()
JobID | WindowID | Title | Description | Requirements | City | State | Country | |
---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Security Engineer/Technical Lead | security engineer technical leadsecurity clear... | <p>SKILL SET</p>\r<p> </p>\r<p>Network Se... | Washington | DC | US |
1 | 4 | 1 | SAP Business Analyst / WM | sap business analyst wmno corp to corp resumes... | <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... | Charlotte | NC | US |
2 | 7 | 1 | P/T HUMAN RESOURCES ASSISTANT | p t human resources assistant p t human resour... | Please refer to the Job Description to view th... | Winter Park | FL | US |
3 | 8 | 1 | Route Delivery Drivers | route delivery driverscity beverages come to w... | Please refer to the Job Description to view th... | Orlando | FL | US |
4 | 9 | 1 | Housekeeping | housekeepingi make sure every part of their da... | Please refer to the Job Description to view th... | Orlando | FL | US |
Dataframes
- users_training
- users_testing
- apps_training
- apps_testing
- user_history_training
- user_history_testing
Location
- jobs_US
- statewise_jobs
- citywise_jobs
- citywise_jobs_top
Content based filtering
job description based recommender
using term frequency-inverse document frequency
In [39]:
tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(jobs_US_base_line['Description'])
In [40]:
tfidf_matrix.shape
(10000, 535561)
In [41]:
print(tfidf_matrix)
(0, 441695) 0.22505879122815065
(0, 169589) 0.030669382747046916
(0, 488697) 0.04752972531305034
(0, 264028) 0.07576675940802603
(0, 91384) 0.043127331709931944
(0, 415767) 0.020558424763840996
(0, 441253) 0.04981586018912077
(0, 253884) 0.08610417293518174
(0, 326351) 0.03628758749940522
(0, 498708) 0.23741486018184604
(0, 12862) 0.07913828672728201
(0, 522879) 0.04488762967640074
(0, 133507) 0.07913828672728201
(0, 176599) 0.028902410192437344
(0, 167677) 0.03274166426535237
(0, 337345) 0.021001362288507987
(0, 104352) 0.07000309538665801
(0, 336738) 0.023882963162787138
(0, 520158) 0.03215125588973581
(0, 524056) 0.036239661087183676
(0, 532905) 0.026723078362849224
(0, 498712) 0.07913828672728201
(0, 108785) 0.07913828672728201
(0, 425609) 0.0371210673841218
(0, 226031) 0.03469578910963611
: :
(9999, 344616) 0.05017478681815903
(9999, 467781) 0.05017478681815903
(9999, 74829) 0.05017478681815903
(9999, 523365) 0.05017478681815903
(9999, 260759) 0.05017478681815903
(9999, 74696) 0.05017478681815903
(9999, 96875) 0.05017478681815903
(9999, 150349) 0.05017478681815903
(9999, 373161) 0.05017478681815903
(9999, 390419) 0.05017478681815903
(9999, 129078) 0.05017478681815903
(9999, 203951) 0.05017478681815903
(9999, 317361) 0.05017478681815903
(9999, 385716) 0.05017478681815903
(9999, 79158) 0.05017478681815903
(9999, 492689) 0.05017478681815903
(9999, 220490) 0.05017478681815903
(9999, 414561) 0.05017478681815903
(9999, 184487) 0.05017478681815903
(9999, 492732) 0.05017478681815903
(9999, 94472) 0.05017478681815903
(9999, 351340) 0.05017478681815903
(9999, 97193) 0.05017478681815903
(9999, 447009) 0.05017478681815903
(9999, 351336) 0.05339311347511375
In [42]:
jobs_US_base_line.loc[0,'Description']
'security engineer technical leadsecurity clearance required top secret job number tmr 447location of job washington dctmr inc is an equal employment opportunity companyfor more job opportunities with tmr visit our website www tmrhq comsend resumes to hr tmrhq2 com job summary leads the customer rsquo s overall cyber security strategy formalizes service offerings consisted with itil best practices and provides design and architecture support provide security design architecture support for ojp rsquo s it security division itsd leads the secops team in the day to day ojp security operations support provides direction when needed in a security incident or technical issues works in concert with network operations on design integration for best security posture supports business development functions including capture management proposal development and responses and other initiatives to include conferences trade shows webinars developing white papers and the like identifies resources and mentors in house talent to ensure tmr remains responsive to growing initiatives and contracts with qualified personnel '
In [43]:
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
In [44]:
cosine_sim[0]
array([1. , 0.03241652, 0.00838853, ..., 0.01491531, 0.01491531,
0.01491531])
In [45]:
jobs_US_base_line = jobs_US_base_line.reset_index()
titles = jobs_US_base_line['Title']
indices = pd.Series(jobs_US_base_line.index, index=jobs_US_base_line['Title'])
In [46]:
jobs_US_base_line.head()
index | JobID | WindowID | Title | Description | Requirements | City | State | Country | |
---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 1 | Security Engineer/Technical Lead | security engineer technical leadsecurity clear... | <p>SKILL SET</p>\r<p> </p>\r<p>Network Se... | Washington | DC | US |
1 | 1 | 4 | 1 | SAP Business Analyst / WM | sap business analyst wmno corp to corp resumes... | <p><b>WHAT YOU NEED: </b></p>\r<p>Four year co... | Charlotte | NC | US |
2 | 2 | 7 | 1 | P/T HUMAN RESOURCES ASSISTANT | p t human resources assistant p t human resour... | Please refer to the Job Description to view th... | Winter Park | FL | US |
3 | 3 | 8 | 1 | Route Delivery Drivers | route delivery driverscity beverages come to w... | Please refer to the Job Description to view th... | Orlando | FL | US |
4 | 4 | 9 | 1 | Housekeeping | housekeepingi make sure every part of their da... | Please refer to the Job Description to view th... | Orlando | FL | US |
In [47]:
print(indices)
Title
Security Engineer/Technical Lead 0
SAP Business Analyst / WM 1
P/T HUMAN RESOURCES ASSISTANT 2
Route Delivery Drivers 3
Housekeeping 4
SALON/SPA COORDINATOR 5
SUPERINTENDENT 6
ELECTRONIC PRE-PRESS PROFESSIONAL 7
UTILITY LINE TRUCK OPERATOR/ DIGGER DERRICK 8
CONSTRUCTION PROJECT MGR & PM TRAINEE 9
Administrative Assistant 10
ACCOUNT EXECUTIVES 11
COMMERCIAL ESTIMATOR 12
Immediate Opening 13
TESL Adjunct 14
Salon Manager/Hairstylists 15
VOCATIONAL COUNSELOR 16
GALLERY SALES POSITIONS 17
SURGICAL SCRUB TECH 18
Real Estate Agent 19
LPN, RN, CNA, TECHS 20
Vacation Sales Representatives 21
Top Sales Agent 22
Quick Service Food & Beverage 23
CREDIT/COLLECTIONS ASSISTANT 24
POOL TECH 25
EXPERIENCED ROOFERS 26
ARIZA TALENT & MODELING 27
CDL CLASS A DRIVER 28
Skilled Tradesman 29
...
Sales Representative / Account Manager / Customer Service 9970
Sales Representative / Account Manager / Customer Service 9971
Sales Representative / Account Manager / Customer Service 9972
Sales Representative / Account Manager / Customer Service 9973
Sales Representative / Account Manager / Customer Service 9974
Sales Representative / Account Manager / Customer Service 9975
Sales Representative / Account Manager / Customer Service 9976
Sales Representative / Account Manager / Customer Service 9977
Sales Representative / Account Manager / Customer Service 9978
Sales Representative / Account Manager / Customer Service 9979
Sales Representative / Account Manager / Customer Service 9980
Sales Representative / Account Manager / Customer Service 9981
Sales Representative / Account Manager / Customer Service 9982
Sales Representative / Account Manager / Customer Service 9983
Sales Representative / Account Manager / Customer Service 9984
Sales Representative / Account Manager / Customer Service 9985
Sales Representative / Account Manager / Customer Service 9986
Sales Representative / Account Manager / Customer Service 9987
Sales Representative / Account Manager / Customer Service 9988
Sales Representative / Account Manager / Customer Service 9989
Sales Representative / Account Manager / Customer Service 9990
Sales Representative / Account Manager / Customer Service 9991
Sales Representative / Account Manager / Customer Service 9992
Sales Representative / Account Manager / Customer Service 9993
Sales Representative / Account Manager / Customer Service 9994
Sales Representative / Account Manager / Customer Service 9995
Sales Representative / Account Manager / Customer Service 9996
Sales Representative / Account Manager / Customer Service 9997
Sales Representative / Account Manager / Customer Service 9998
Sales Representative / Account Manager / Customer Service 9999
Length: 10000, dtype: int64
In [48]:
def get_recommendations(title):
idx = indices[title]
#print (idx)
sim_scores = list(enumerate(cosine_sim[idx]))
#print (sim_scores)
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
job_indices = [i[0] for i in sim_scores]
return titles.iloc[job_indices]
In [49]:
get_recommendations('SAP Business Analyst / WM').head(10)
1 SAP Business Analyst / WM
6051 SAP FI/CO Business Consultant
5159 SAP Basis Administrator
5868 SAP FI/CO Business Analyst
5351 SAP Sales and Distribution Solution Architect
4796 Senior Specialist - SAP Configuration - SD
5117 SAP Integration Specialist
4290 SAP FICO Functional -2years experience
4728 SAP ABAP Developer with PRA experience
5244 Business Analyst
Name: Title, dtype: object
In [50]:
get_recommendations('Security Engineer/Technical Lead').head(10)
0 Security Engineer/Technical Lead
5906 Senior Security Engineer
6380 Security Technology - SIEM Consultant
3248 Senior Lead Systems Security Engineer
1302 Information Security Architect
5525 Sr. Information Security Architect
6873 Integrated System Service Engineer - CA
3230 Computer Systems Security Manager
1568 Senior Information Security Engineer
4901 Cloud Services Security Application Administrator
Name: Title, dtype: object
similar user based recommender
- degree type, majors and total years of experience
users_training
dataset
In [51]:
users_training.head()
UserID | WindowID | Split | City | State | Country | ZipCode | DegreeType | Major | GraduationDate | WorkHistoryCount | TotalYearsExperience | CurrentlyEmployed | ManagedOthers | ManagedHowMany | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 47 | 1 | Train | Paramount | CA | US | 90723 | High School | NaN | 1999-06-01 00:00:00 | 3 | 10.0 | Yes | No | 0 |
1 | 72 | 1 | Train | La Mesa | CA | US | 91941 | Master's | Anthropology | 2011-01-01 00:00:00 | 10 | 8.0 | Yes | No | 0 |
2 | 80 | 1 | Train | Williamstown | NJ | US | 08094 | High School | Not Applicable | 1985-06-01 00:00:00 | 5 | 11.0 | Yes | Yes | 5 |
3 | 98 | 1 | Train | Astoria | NY | US | 11105 | Master's | Journalism | 2007-05-01 00:00:00 | 3 | 3.0 | Yes | No | 0 |
4 | 123 | 1 | Train | Baton Rouge | LA | US | 70808 | Bachelor's | Agricultural Business | 2011-05-01 00:00:00 | 1 | 9.0 | Yes | No | 0 |
In [52]:
user_based_approach_US = users_training.loc[users_training['Country']=='US']
In [53]:
user_based_approach = user_based_approach_US.iloc[0:10000,:].copy()
In [54]:
user_based_approach.head()
UserID | WindowID | Split | City | State | Country | ZipCode | DegreeType | Major | GraduationDate | WorkHistoryCount | TotalYearsExperience | CurrentlyEmployed | ManagedOthers | ManagedHowMany | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 47 | 1 | Train | Paramount | CA | US | 90723 | High School | NaN | 1999-06-01 00:00:00 | 3 | 10.0 | Yes | No | 0 |
1 | 72 | 1 | Train | La Mesa | CA | US | 91941 | Master's | Anthropology | 2011-01-01 00:00:00 | 10 | 8.0 | Yes | No | 0 |
2 | 80 | 1 | Train | Williamstown | NJ | US | 08094 | High School | Not Applicable | 1985-06-01 00:00:00 | 5 | 11.0 | Yes | Yes | 5 |
3 | 98 | 1 | Train | Astoria | NY | US | 11105 | Master's | Journalism | 2007-05-01 00:00:00 | 3 | 3.0 | Yes | No | 0 |
4 | 123 | 1 | Train | Baton Rouge | LA | US | 70808 | Bachelor's | Agricultural Business | 2011-05-01 00:00:00 | 1 | 9.0 | Yes | No | 0 |
In [55]:
user_based_approach['DegreeType'] = user_based_approach['DegreeType'].fillna('')
user_based_approach['Major'] = user_based_approach['Major'].fillna('')
user_based_approach['TotalYearsExperience'] = str(user_based_approach['TotalYearsExperience'].fillna(''))
user_based_approach['DegreeType'] = user_based_approach['DegreeType'] + user_based_approach['Major'] + \
user_based_approach['TotalYearsExperience']
In [56]:
tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
tfidf_matrix = tf.fit_transform(user_based_approach['DegreeType'])
In [57]:
tfidf_matrix.shape
(10000, 7337)
In [58]:
cosine_sim = linear_kernel(tfidf_matrix,tfidf_matrix)
In [59]:
cosine_sim[0]
array([1. , 0.67053882, 0.84759861, ..., 0.43990417, 0.79335895,
0.69670809])
In [60]:
user_based_approach = user_based_approach.reset_index()
userid = user_based_approach['UserID']
indices = pd.Series(user_based_approach.index, index=user_based_approach['UserID'])
indices.head(2)
UserID
47 0
72 1
dtype: int64
In [61]:
def get_recommendations_userwise(userid):
idx = indices[userid]
#print (idx)
sim_scores = list(enumerate(cosine_sim[idx]))
#print (sim_scores)
sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
user_indices = [i[0] for i in sim_scores]
#print (user_indices)
return user_indices[0:11]
In [62]:
get_recommendations_userwise(123)
[4, 150, 1594, 5560, 2464, 2846, 7945, 8125, 1171, 11, 24]
In [63]:
def get_job_id(usrid_list):
jobs_userwise = apps_training['UserID'].isin(usrid_list) #
df1 = pd.DataFrame(data = apps_training[jobs_userwise], columns=['JobID'])
joblist = df1['JobID'].tolist()
Job_list = jobs['JobID'].isin(joblist) #[1083186, 516837, 507614, 754917, 686406, 1058896, 335132])
df_temp = pd.DataFrame(data = jobs[Job_list], columns=['JobID','Title','Description','City','State'])
return df_temp
In [64]:
get_job_id(get_recommendations_userwise(47))
JobID | Title | Description | City | State | |
---|---|---|---|---|---|
905894 | 428902 | Aircraft Servicer | <b>Job Classification: </b> Direct Hire \r\n\r... | Memphis | TN |
975525 | 1098447 | Automotive Service Advisor | <div>\r<div>Briggs Nissan in Lawrence Kansas h... | Lawrence | KS |
980507 | 37309 | Medical Lab Technician - High Volume Lab | <span>Position Title:<span> &... | Fort Myers | FL |
986244 | 83507 | Nurse Tech (CNA/STNA) | <p align="center"><b>Purpose of Your Job Posit... | Englewood | FL |
987452 | 93883 | Nurse Tech II (CNA/STNA) | <B>Nurse Tech II (CNA/STNA)</B> <BR>\r<BR>\rTh... | Fort Myers | FL |
1000910 | 228284 | REGISTERED NURSE – ICU | <p><strong><span><font face="">Registered Nurs... | Punta Gorda | FL |
1007140 | 284840 | Certified Nursing Assistant / CNA | <hr>\r<p style="text-align: center"><strong>Ce... | Saint Petersburg | FL |
1007141 | 284841 | Home Health Aide / HHA | <hr>\r<p style="text-align: center"><strong>Ho... | Saint Petersburg | FL |
1009455 | 312536 | Secretary II | <br><br><b>Department: </b>COMM Maryland Cardi... | Baltimore | MD |
1011978 | 341662 | Medical Assistant | Certified Medical Assistant for busy Pain Clin... | Fort Myers | FL |
1034578 | 551375 | Phlebotomist | <p>Every day All Medical Personnel helps excep... | Clearwater | FL |
1048060 | 684278 | Sales Representative / Customer Service / Acco... | <P>Central Payment offers limitless opportunit... | Bonita Springs | FL |
1066952 | 867194 | Hospital Liaison and Pharmaceutical | Hospital Liaison with Pharmaceutical exp<br />... | Fort Myers | FL |
1070785 | 910932 | Nursing: CNA (PRN) | <p> </p>\r<p>Take advantage of this great... | Fort Myers | FL |
1076051 | 960285 | All college grads apply! Entry level sales and... | <div> <span>\r<div>\r<div><strong>All college ... | Fort Myers | FL |
1091311 | 1108709 | Certified Nursing Assistant / CNA / HHA | <hr>\r<p style="text-align: center"><strong>Ce... | Sarasota | FL |