Data Cleanup
Purpose: This portion of the process reviews the results of the web scraping process for errors and missing data.
Tools Needed:
Airtable
Prep Database
In a cleanup view on the Search Results Pages, filter for all items with the word “Please”
This will display logins that were captured instead of titles
In the database tab, select the correct database that is required for login
Delete the title information
Assign the correct batch number in the Batch # column (this can also be done easily by creating a filtered view, which automatically populates this field as long as it is the first filter)
In the SID column of the Search Results Page, link each search group to its Search ID in the URLs tab (NOTE: This will take a substantial amount of time)
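The "Please" filter in the prep steps above can also be run outside Airtable, for example on an exported copy of the table. The following is a minimal sketch, not part of the documented workflow. The field names "Title" and "Database", the sample rows, and the case-insensitive match are all assumptions made for illustration.

```python
def find_login_captures(rows, title_field="Title", marker="Please"):
    """Return rows whose title contains the login-prompt marker word.

    The match is case-insensitive (an assumption); empty or missing
    titles are treated as empty strings and never match.
    """
    needle = marker.lower()
    return [row for row in rows if needle in (row.get(title_field) or "").lower()]


# Hypothetical sample rows; real field names may differ.
sample_rows = [
    {"Title": "Please log in to continue", "Database": ""},
    {"Title": "A History of Utah Railroads", "Database": "Example DB"},
    {"Title": None, "Database": ""},
]

flagged = find_login_captures(sample_rows)
for row in flagged:
    print("Login captured instead of title:", row["Title"])
```

Rows returned by this check are the ones that need the database selected and the title information deleted.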
Item by Item Cleanup
Review all items in the URLs tab in Airtable that meet any of these criteria:
Fewer than 25 results
Marked in Red (indicates missing title)
Marked in Green (indicates missing Item URL)
Marked in Blue (indicates a mismatch between the Results number and the number of linked records from the Search Results Page)
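The four criteria above can be expressed as simple checks. This is a rough sketch only: the function name, the "title" and "item_url" keys, and the idea of passing the linked records as a list are hypothetical, and the color labels simply mirror the list above rather than reading any formatting from Airtable.

```python
def review_reasons(results_count, linked_records, min_results=25):
    """Return the reasons (possibly none) a URL row needs manual review."""
    reasons = []
    if results_count < min_results:
        reasons.append("fewer than %d results" % min_results)
    if any(not rec.get("title") for rec in linked_records):
        reasons.append("missing title (red)")
    if any(not rec.get("item_url") for rec in linked_records):
        reasons.append("missing item URL (green)")
    if results_count != len(linked_records):
        reasons.append("results/linked-record mismatch (blue)")
    return reasons


# Hypothetical example: 3 reported results, but only 2 linked records,
# one of which has no title.
example_records = [
    {"title": "First item", "item_url": "https://example.org/1"},
    {"title": "", "item_url": "https://example.org/2"},
]
example_reasons = review_reasons(3, example_records)
print(example_reasons)
```

A row with an empty list of reasons would not need item-by-item review under these criteria.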
To review a URL
Assign your name to the “Checker” column on the URL page
Find that Group in the Search Result Pages tab in Airtable
Click on the Page URL for the first item
Compare the list of titles on the page with the list of titles in Airtable.
Fill in any missing values (Title or URL)
Ensure that the order is correct
For any oddities found, please tag @LizWoolcott in the comments and include a brief description
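The title comparison in the review steps above is done by eye, but the logic can be sketched as follows. This is an illustrative helper only, assuming both lists have been copied out as plain strings and that titles match exactly; the function name and sample titles are hypothetical.

```python
def compare_titles(page_titles, airtable_titles):
    """Compare titles seen on the results page with those stored in Airtable.

    Returns titles missing from Airtable, titles in Airtable that are not
    on the page, and whether the titles present in both lists appear in
    the same order.
    """
    missing = [t for t in page_titles if t not in airtable_titles]
    extra = [t for t in airtable_titles if t not in page_titles]
    shared_on_page = [t for t in page_titles if t in airtable_titles]
    shared_in_airtable = [t for t in airtable_titles if t in page_titles]
    return {
        "missing": missing,
        "extra": extra,
        "order_ok": shared_on_page == shared_in_airtable,
    }


# Hypothetical example: one title was not captured and two are swapped.
comparison = compare_titles(
    ["Title A", "Title B", "Title C"],
    ["Title B", "Title A"],
)
print(comparison)
```

Anything reported as missing, extra, or out of order corresponds to the values to fill in or reorder during the manual review.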