System Design for data processing
To meet the requirements of large-scale data processing, the total work is organised into a number of functional steps. Because the processing is decentralised and involves many people, each step is planned beforehand and documented in detail, concepts and definitions are explained, training workshops are organised first centrally and then locally, and mid-course discussions are held to sort out unforeseen data problems. It is therefore a formal system of data processing. The stages of data processing are:
i) Checking of identification particulars and monitoring of the receipt position.
ii) Hot scrutiny by officers to identify recurring errors committed by field staff at the early stage of data collection.
iii) Pre-data-entry scrutiny of schedules for manual checking of important fields.
iv) Data entry and 100% verification.
v) Phase-I validation (Content Check): This includes preparing lists of inconsistencies, checking those lists against the hard copy of the schedules, updating the data files and inserting records. The number of such checks for each type of schedule varies from 60 to 150. At this stage:
- fields are checked against the list of admissible values (codes).
- arithmetic consistency checks, including range checks and subtotal checks, are carried out on numeric fields and on groups of related fields, both within the same block and across different blocks.
- records relating to the same item or person are located in two or more blocks, and arithmetic/conditional checks are carried out involving different numeric fields from the selected records.
- person-wise or item-wise information furnished in different blocks is matched as a coverage check.
- unit value checks are carried out for quantity and value fields.
- duplicate records are detected.
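The content checks listed above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the field names (hh_id, person_no, sex, age, item_values, subtotal), the admissible codes and the numeric limits are hypothetical, and each real schedule defines its own set of checks.

```python
from collections import Counter

# Hypothetical code list and limits; each real schedule defines its own.
ADMISSIBLE_SEX_CODES = {1, 2}
AGE_RANGE = (0, 120)


def content_check(records):
    """Return (record_index, message) pairs for every failed check."""
    issues = []
    # Count record keys once so duplicates can be flagged in the main loop.
    key_counts = Counter((r["hh_id"], r["person_no"]) for r in records)
    for i, r in enumerate(records):
        # Admissible-value check against the code list.
        if r["sex"] not in ADMISSIBLE_SEX_CODES:
            issues.append((i, "sex: inadmissible code"))
        # Range check on a numeric field.
        if not AGE_RANGE[0] <= r["age"] <= AGE_RANGE[1]:
            issues.append((i, "age: out of range"))
        # Subtotal check: item entries must add up to the recorded subtotal.
        if sum(r["item_values"]) != r["subtotal"]:
            issues.append((i, "subtotal does not match sum of items"))
        # Duplicate check on the record key.
        if key_counts[(r["hh_id"], r["person_no"])] > 1:
            issues.append((i, "duplicate record"))
    return issues


def unit_value_ok(quantity, value, low, high):
    """Unit value check: value per unit of quantity must lie in [low, high]."""
    if quantity <= 0:
        return False  # a unit value cannot be derived
    return low <= value / quantity <= high
```

The returned list plays the role of the list of inconsistencies that is then checked against the hard copy of the schedule.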
v) Phase-II validation (Coverage Check): At this stage,
- coverage of the data vis-a-vis the directory file is checked for each FSU and SSU.
- the absence of any essential block of data is checked.
- duplication of FSU or SSU data is checked.
- consistency of the ID particulars is checked against the directory file.
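A coverage check of this kind amounts to comparing the set of units found in the data with the set listed in the directory file. The sketch below assumes that each unit is identified by a hypothetical (FSU, SSU) key and that the blocks present for each unit are already known; the function name and the layout of its inputs are illustrative only.

```python
def coverage_check(directory, data_keys, blocks_present, essential_blocks):
    """Compare units found in the data with the directory file.

    directory        -- set of (fsu, ssu) keys expected from the directory file
    data_keys        -- list of (fsu, ssu) keys as they occur in the data
    blocks_present   -- dict mapping each key to the set of blocks it contains
    essential_blocks -- set of blocks every unit must contain
    """
    found = set(data_keys)

    # A key that occurs more than once indicates duplicated FSU/SSU data.
    seen, duplicated = set(), set()
    for key in data_keys:
        if key in seen:
            duplicated.add(key)
        seen.add(key)

    # Essential blocks that are absent, listed per unit.
    missing_blocks = {}
    for key in found:
        absent = essential_blocks - blocks_present.get(key, set())
        if absent:
            missing_blocks[key] = sorted(absent)

    return {
        "missing_units": sorted(directory - found),   # in directory, not in data
        "unlisted_units": sorted(found - directory),  # in data, not in directory
        "duplicated_units": sorted(duplicated),
        "missing_blocks": missing_blocks,
    }
```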
vi) Phase-III validation (Extreme value checking): Here abnormal or suspect values, or derived indices, are identified and referred for checking against the filled-in schedule.
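The text does not say how abnormal values are identified. As one possible illustration only, the sketch below flags values lying outside quartile-based fences (Q1 - k*IQR, Q3 + k*IQR); the choice of this rule, the function name and the default k = 1.5 are all assumptions, not the method actually used.

```python
import statistics


def extreme_values(values, k=1.5):
    """Return the positions of values outside the quartile-based fences."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]
```

The positions returned would form the list of cases to be referred back to the filled-in schedule, not values to be changed automatically.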
vii) Special data checking is done by officers who are involved in tabulation. A list of doubtful cases is generated and checked by the DPCs against the filled-in schedules, and the necessary updates are made in the data files.
viii) Computer editing or auto-correction: At this stage,
- the changes needed to make the data internally consistent are made as per a set of guidelines, without referring to the filled-in schedule.
- all subtotals/totals are computed for each SSU, and additional records are generated if necessary.
- compatibility of the current data is checked against previous years' data in respect of certain important variables.
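Two of the operations above, recomputing totals and comparing current figures with earlier ones, can be sketched as follows. The record layout, the function names and the 50% tolerance are hypothetical; the actual editing guidelines are not given in the text.

```python
def recompute_total(record):
    """Return a corrected copy of the record and whether the total changed.

    The total is rebuilt from the item entries without consulting the
    filled-in schedule; the input record is left untouched.
    """
    corrected = dict(record)
    new_total = sum(record["items"])
    changed = record.get("total") != new_total
    corrected["total"] = new_total
    return corrected, changed


def incompatible_with_previous(current, previous, tolerance=0.5):
    """True if current differs from previous by more than the tolerance.

    tolerance is a fraction of the previous value (0.5 means 50%); the
    default is an assumed threshold, not an official one.
    """
    if previous == 0:
        return current != 0
    return abs(current - previous) / abs(previous) > tolerance
```

Returning a change flag alongside the corrected record makes it possible to keep a log of every automatic correction.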
ix) Preparation of multiplier files, i.e. calculation of weighting factors for each Ultimate Stage Unit as per the sample design.
x) Preparation of work files, which are extracts from the data to facilitate table generation. Related tables are usually grouped together, and all the data fields required for generating those related tables are extracted into a single work file.
xi) Tabulation of data: Typically, the number of tables (as per the approved Tabulation Plan) to be prepared for each schedule varies from 70 to 200. The tables are usually generated by sector x state x sex x other socio-economic category. Based on the tables generated by DPD, the SDRD prepares subject-wise reports, which are published after due approval.
xii) Release of multiplier-posted unit-level data, along with metadata, for dissemination through the Computer Centre immediately after SDRD releases the Key Indicators reports of the survey.
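The multiplier and tabulation stages (ix to xi) can be illustrated with a short sketch. The actual weighting formula depends on the sample design, which is not described here; the version below assumes a simple two-stage design with equal-probability selection at both stages, and the function and field names are hypothetical.

```python
from collections import defaultdict


def multiplier(total_fsus, sampled_fsus, total_ssus, sampled_ssus):
    """Weight of one sampled unit under an assumed equal-probability
    two-stage design: the inverse of its overall selection probability."""
    return (total_fsus / sampled_fsus) * (total_ssus / sampled_ssus)


def weighted_table(rows, by):
    """Sum the multipliers of the rows within each cell of a cross-tabulation.

    rows -- list of dicts, each with a 'weight' and the classification fields
    by   -- tuple of field names defining the table cells, e.g. sector and sex
    """
    table = defaultdict(float)
    for row in rows:
        cell = tuple(row[field] for field in by)
        table[cell] += row["weight"]
    return dict(table)
```

Posting the multiplier on each unit-level record, as in the released data, lets users produce their own weighted tables in the same way.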
Basic steps of data processing