Data Cleanup

Purpose: This portion of the process reviews the results of the web scraping process for errors and missing data.


Tools Needed:

Airtable


Prep Database

  1. In a cleanup view on the Search Results Pages, filter for all items with the word “Please”

    1. This will display logins that were captured instead of titles

    2. In the database tab, select the correct database that is required for login

    3. Delete the title information

  2. Assign the correct batch number in the Batch # column (this can also be done easily by creating a filtered view; it will automatically populate this field as long as it is the first filter)

  3. In the SID column of the Search Results Pages tab, link each search group to its Search ID in the URLs tab (NOTE: This will take a substantial amount of time)
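
As an illustration of step 1 above, the following is a minimal sketch of the same check run on rows exported from Airtable. It is not part of the documented workflow. The field name `Title`, the sample rows, and both function names are assumptions made for the example, and the match here is case-sensitive, which may differ from how the Airtable filter behaves.

```python
def find_login_captures(rows, title_field="Title", marker="Please"):
    """Return rows whose title contains the marker word (a captured login prompt)."""
    # `row.get(...) or ""` treats a missing or empty title as an empty string.
    return [row for row in rows if marker in (row.get(title_field) or "")]


def clear_login_titles(rows, title_field="Title", marker="Please"):
    """Blank the title of each login capture and return how many rows changed."""
    flagged = find_login_captures(rows, title_field, marker)
    for row in flagged:
        row[title_field] = ""
    return len(flagged)


# Made-up sample rows; the field names are hypothetical.
sample_rows = [
    {"Title": "Please log in to continue", "Database": ""},
    {"Title": "Example item title", "Database": ""},
    {"Title": None, "Database": ""},
]
changed = clear_login_titles(sample_rows)
print(changed)
```

Selecting the correct database for each flagged row still has to be done by hand, since it depends on which login page was captured.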

 

Item by Item Cleanup

  1. Review all items in the URLs tab in Airtable that meet any of these criteria:

    1. Fewer than 25 results

    2. Marked in Red (indicates missing title)

    3. Marked in Green (indicates missing Item URL)

    4. Marked in Blue (indicates a mismatch between the Results number and the number of linked records from the Search Results Pages tab)

  2. To review a URL:

    1. Assign your name to the “Checker” column on the URL page

    2. Find that group in the Search Results Pages tab in Airtable

    3. Click on the Page URL for the first item

    4. Compare the list of titles on the page with the list of titles in Airtable

    5. Fill in any missing values (Title or URL)

    6. Ensure that the order is correct

    7. For any oddities found, please tag @LizWoolcott in the comments and add a brief description
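
The four review criteria in step 1 can be expressed as a short check. This is a sketch only: the field names `results`, `title`, and `item_url`, the function name, and the sample data are all assumptions for illustration, and it does not reproduce the actual color rules configured in Airtable.

```python
MIN_RESULTS = 25  # threshold taken from the first criterion


def review_reasons(url_record, page_rows):
    """Return the reasons a URLs record needs manual review (empty list if none)."""
    reasons = []
    if url_record["results"] < MIN_RESULTS:
        reasons.append("fewer than 25 results")
    if any(not row.get("title") for row in page_rows):
        reasons.append("missing title")  # shown in Red
    if any(not row.get("item_url") for row in page_rows):
        reasons.append("missing item URL")  # shown in Green
    if url_record["results"] != len(page_rows):
        reasons.append("results count mismatch")  # shown in Blue
    return reasons


# Made-up example: a search reporting 3 results with only 2 linked rows,
# one of which has no title.
record = {"sid": "S1", "results": 3}
rows = [
    {"title": "A", "item_url": "u1"},
    {"title": "", "item_url": "u2"},
]
reasons = review_reasons(record, rows)
print(reasons)
```

A record that returns an empty list would not need item-by-item review under these criteria.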
