Automated Data File Validation Process

Introduction

The Automated File Validation Process aims to provide a way for validating the Import Files based on AEFIS Data Integration Specification Sheet. The process validates CSV files for Schema and Data Integrity with a predefined set of rules. A MarkDown file is created with the details of the results of the validation process per each file.

The process is scheduled to be up and running 24×7 and is triggered when a new file is uploaded every 5 minutes. To start the validation, a CSV file should be put in the following path: “sftp.aefis.net/filevalidation/{institution_code}/Validation
After each successful run, an email sent to [email protected] to notify internal users, and files are moved to “sftp.aefis.net/filevalidation/{institution_code}/Validation/Archive”.

This documentation provides the basic information for the Automated File Validation Process and aims to give a quick start. For additional information and use of the Automated File Validation Process for a specific use-case, please contact [email protected].

Implementation and Workflow

Each partner has its own SFTP folder. In the root folder, there is a subfolder called Validation. The import validation process checks if there are any files uploaded into the Validation folder and if so, it triggers the process. Process reviews the files, creates a validation results report, and moves them into the subfolder Validation/Archive. For each file, there will be another file with the same name and different extension (.md) that provides the validation results related to the file’s review.

File Review and Validation Rules

Various validation rules are applied to a single CSV file depending on the type of file. This section aims to provide information on Validation Rules and what they mean. In short, there are two main categories while validating a CSV file.

  1. Structural Validation: Validating a CSV file’s structure. Checking the file’s encoding, checking if there is a header or a correct number of columns in each row are examples of schema validation processes.
  2. Data Validation: Validating the CSV file’s data. Checking the file’s type, required fields, validation of the unique columns are examples of data validation processes.

Structural Validation

Encoding Check

Each CSV file should be UTF-8 encoded. If a file is not UTF-8 encoded, no further validation cases will be applied and the process stops.

Fail Reason: File is not UTF-8 encoded.

Empty File Check

Each CSV file should contain a header and also data. If an empty file is received, no further validation cases will be applied and the process stops.

Fail Reason: File can not be empty.

Separator Check

Each CSV file should contain a comma as a separator. If any other separator is used, no further validation cases will be applied and the process stops.

Fail Reason: File’s separator is expected to be a comma. Separator could not be detected.

Parser Check

Each CSV file should be readable by the Pandas library. If Pandas library can not detect the file as a valid CSV file, no further validation cases will be applied and the process stops.

Fail Reason: File is not parsed successfully by Pandas library.

Column Check

Each row in a CSV file should contain the same number of columns. If column numbers do not match, an error is displayed in the MD file and the validation process continues.

Fail Reason: File does not contain the same number of columns in each row.

Header Check

Each CSV file should contain a header and the header should be in the first line of the file. If a header is missing or could not be detected, an error is displayed in the MD file and the validation process continues.

Fail Reason 1: Header could not be detected.

Fail Reason 2: Comma as a separator could not be detected or the line is empty.

Fail Reason 3: Header could not be found in the first line.

Data Existence Check

Each CSV file should contain a header and also data. If there is only a header but no data is present, an error is displayed in the MD file and the validation process continues.

Fail Reason: File does not contain any data but a header.

Data Validation

File Type Check

File Type Check tries to detect the file type as one of the following by their columns’ name. The valid file types are:
Faculty Information
Student Information
Course Catalog
Course Sections
Registration
Student Degrees
CLO
PLO
Staff

If the file type could not be detected, no further validation cases will be applied and the process stops.

Fail Reason: File type could not be detected. Please check the header names.

Data Integrity Check

Based on AEFIS Data Integration Specification rules, a set of validation rules are applied for data integrity.

Required Fields

Every field that is marked as required should exist and contain data in the file. If any required field has no data, an error is displayed in the MD file and the validation process continues.

Fail Reason 1: {{required_column_name}}: Required fields should exist in the file. Column not found.

Fail Reason 2: Required fields should have a value. Please check the file! Line Numbers: {{array_of_line_numbers}}

Data Type

Every field has a specific data type accepted. Numeric and Date Fields are validated by the process. If the data type does not match, an error is displayed in the MD file and the validation process continues.

Fail Reason: Column name: {{column_name}} Type mismatch! Column Should be {{data_type}}

Data Length

Every string field has a specified length. If the length of the field overflows, an error is displayed in the MD file and the validation process continues.

Fail Reason: Column name: {{column_name}} Values cannot be more than the maximum defined {{data_length_value}}. Line Numbers: {{array_of_line_numbers}}

Uniqueness

Every CSV file contains some fields which are set to be unique on their own among the file. There are also some fields that are set to be unique while combined together. If there are multiple records found, an error is displayed in the MD file and the validation process continues.

Fail Reason 1: Column name: {{column_name}} Column should be unique in the file.

Fail Reason 2: Combined fields are not unique in the file {{combined_column_name_array}}, Line Numbers: {{array_of_line_numbers}}

Feedback

Any unexpected error will also be added to the results file. In case of an unexpected error or if you have any feedback or recommendation, please reach out to [email protected]

How can AEFIS help you? Let's explore!

Become an AEFIS Academy Member Engage with AEFIS Academy in a whole new way. AEFIS Academy Members earn exclusive access to unforgettable events, inspiring conversations with engaging community, and a lot more.
Join
Do Not Show This Again!
It's Your Academy. Help us improve it!
Your feedback is vital to help build the best Academy
— and our community deserves it!

Recent Favorites

View your most recent favorited blogs, resources, events and content hubs.