Data Quality Evaluation
Data quality evaluation is used to determine how well the final data products match the initial objective (MacDougall & Crummett, 1980). The available evaluation methods fall into two broad categories: validation, also referred to as certification, and the study of the sources of errors that can be traced in the data. Data should be evaluated for quality to establish whether it is reliable enough to support sound decision making. To determine the quality of data, an evaluation plan should be put in place.
Data Evaluation Plan
To determine the quality of the data that will be used as the master data, certain evaluation techniques need to be carried out. It is a prerequisite that the quality assurance project plan was followed to the letter and that no oddities were reported during collection. To ascertain this, field books and other notebook comments should be reviewed (Holden, Bhagwat & Patterson, 2002). The calibration of the documents containing the data should then be verified. Next, the accuracy of the data should be calculated as the difference between the measured results and their accuracy standards (MacDougall & Crummett, 1980). The accuracy standards that applied between the time the data was collected and the time it was sampled should be used. Comparing the magnitude of this difference against the guidelines in the data quality matrix (DQM) allows the appropriate data quality level to be assigned.
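As a minimal sketch of this accuracy check, the snippet below compares measured results against their reference standards and maps the relative error onto a hypothetical set of DQM thresholds; the level names and cut-offs are assumptions for illustration, not values from the cited sources.

```python
# Sketch: assign an accuracy-based quality level by comparing measured results
# against their reference standards and mapping the difference onto hypothetical
# DQM thresholds. Threshold values and level names are illustrative assumptions.

def accuracy_level(measured: float, standard: float, dqm_thresholds: dict) -> str:
    """Return the quality level whose threshold the relative error falls under."""
    relative_error = abs(measured - standard) / abs(standard)
    # dqm_thresholds maps level name -> maximum allowed relative error,
    # checked from strictest to most lenient.
    for level, max_error in sorted(dqm_thresholds.items(), key=lambda kv: kv[1]):
        if relative_error <= max_error:
            return level
    return "reject"

# Example usage with made-up thresholds (5%, 10%, 20% relative error).
thresholds = {"level_1": 0.05, "level_2": 0.10, "level_3": 0.20}
print(accuracy_level(measured=9.6, standard=10.0, dqm_thresholds=thresholds))  # level_1
```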
Differences between duplicate measurements in the data should also be calculated. Precision can be assessed by comparing the magnitude of these differences against the DQM and assigning precision levels accordingly. The overall data quality level can then be determined by selecting the lower of the precision and accuracy levels.
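Building on the previous sketch, the following illustrates how precision could be estimated from duplicate measurements and how the overall level could be taken as the lower of the accuracy and precision levels; the level ordering and thresholds are again assumed for illustration.

```python
# Sketch: estimate precision from duplicate measurements and take the lower
# (more conservative) of the accuracy and precision levels as the overall
# data quality level. The level ordering is an assumption for illustration.

LEVEL_ORDER = ["level_1", "level_2", "level_3", "reject"]  # best to worst

def precision_level(dup_a: float, dup_b: float, dqm_thresholds: dict) -> str:
    """Relative difference between duplicates, mapped onto the same DQM thresholds."""
    mean = (dup_a + dup_b) / 2
    relative_diff = abs(dup_a - dup_b) / abs(mean)
    for level, max_error in sorted(dqm_thresholds.items(), key=lambda kv: kv[1]):
        if relative_diff <= max_error:
            return level
    return "reject"

def overall_level(accuracy: str, precision: str) -> str:
    """Overall quality is the worse of the two levels."""
    return max(accuracy, precision, key=LEVEL_ORDER.index)

thresholds = {"level_1": 0.05, "level_2": 0.10, "level_3": 0.20}
print(overall_level("level_1", precision_level(10.2, 11.5, thresholds)))  # level_3
```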
Factors That Might Influence the Quality of Data
Initial Conversion
The most basic factor that might influence the quality of data is the initial conversion that the data goes through (Holden, Bhagwat & Patterson, 2002). It is rare for a database to begin its life empty; more often than not, some sort of data conversion takes place at the beginning of the database's lifecycle. If something goes wrong during this stage, there will probably be significant data quality problems in the future.
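One hedged way to catch such problems early is a simple reconciliation pass that compares the legacy data with the freshly converted data; the record shapes and key names below are hypothetical.

```python
# Sketch: basic reconciliation checks after an initial data conversion.
# Record layouts and the key field are hypothetical; the idea is to compare
# simple aggregates between the legacy source and the newly loaded target.

def reconcile(source_rows: list, target_rows: list, key: str) -> list:
    """Return a list of human-readable discrepancies between source and target."""
    issues = []
    if len(source_rows) != len(target_rows):
        issues.append(f"row count mismatch: {len(source_rows)} vs {len(target_rows)}")
    source_keys = {r[key] for r in source_rows}
    target_keys = {r[key] for r in target_rows}
    missing = source_keys - target_keys
    if missing:
        issues.append(f"{len(missing)} records missing after conversion")
    return issues

legacy = [{"id": 1, "name": "A"}, {"id": 2, "name": "B"}]
converted = [{"id": 1, "name": "A"}]
print(reconcile(legacy, converted, key="id"))
```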
Systems Consolidation
Another factor that frequently affects data quality is the consolidation of systems. In the information technology sector, systems consolidations take place regularly, specifically when an organization converts from one system to another (Wang, 1998). When a corporate merger occurs, a conversion usually takes place as the firms merge. Because such conversions are often unplanned, it is very difficult for the firms to manage the affected information and data, and the resulting problems more often than not cause the data to lose its quality.
Real-time Interfaces
Modern business enterprises usually operate on real-time interfaces. However, these systems have a downside: they are known to be major causes of loss of data quality (Yin, Han & Yu, 2007). Data in such systems is propagated in real time and is stored and processed as it arrives. According to Yin, Han and Yu (2007), the problem is that the data is propagated too quickly, leaving almost no time to verify that it is complete and accurate. Usually, only the validity of individual attributes is checked, and when a record is rejected during this validation it can be lost forever. More often than not, quality and speed do not go together: when data is reported and stored at very high speeds, its quality will most probably be compromised, and the reverse is also true.
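One way to soften the trade-off described above is to quarantine rejected records instead of discarding them, so they are not lost forever; the sketch below assumes hypothetical field names and validators.

```python
# Sketch: attribute-level validation on a real-time feed where rejected records
# are quarantined instead of being discarded. Validators and field names are
# illustrative assumptions.

from typing import Callable

VALIDATORS = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "amount": lambda v: isinstance(v, (int, float)) and v >= 0,
}

accepted, quarantined = [], []

def ingest(record: dict) -> None:
    """Validate individual attributes; route failures to a quarantine store."""
    failures = [f for f, check in VALIDATORS.items() if not check(record.get(f))]
    if failures:
        quarantined.append({"record": record, "failed_fields": failures})
    else:
        accepted.append(record)

ingest({"customer_id": 42, "amount": 19.99})
ingest({"customer_id": -1, "amount": "free"})
print(len(accepted), len(quarantined))  # 1 1
```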
Data Cleansing
Now more than ever, companies are working hard to cleanse their data. Notably, data cleansing was done manually in the past, whereas today the process is largely automated. The manual data cleansing processes of the past were actually safer than the current methods (Wang, 1998). Automated data cleansing carries its own complexities and risks: when carrying it out, you can easily end up with worse data quality than you had before, so it is not entirely reliable.
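A cautious pattern, sketched below under assumed rule and field names, is to apply automated cleansing non-destructively so that the original values can be reviewed and restored if a rule turns out to be wrong.

```python
# Sketch: an automated cleansing pass that keeps the original values alongside
# the cleansed ones, so a bad rule can be reviewed and rolled back rather than
# silently degrading the data. The rules here are illustrative assumptions.

def cleanse(records: list, rules: dict) -> list:
    """Apply cleansing rules non-destructively, preserving the raw value."""
    out = []
    for rec in records:
        cleaned = dict(rec)
        audit = {}
        for field, rule in rules.items():
            if field in cleaned:
                new_value = rule(cleaned[field])
                if new_value != cleaned[field]:
                    audit[field] = cleaned[field]   # keep the original for rollback
                    cleaned[field] = new_value
        cleaned["_original"] = audit
        out.append(cleaned)
    return out

rules = {"email": lambda v: v.strip().lower()}
print(cleanse([{"email": "  Alice@Example.COM "}], rules))
```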
Data Purging
Purging refers to removing old data from existing systems to create room for newer and more relevant data. It is carried out occasionally, whenever the system's database becomes clogged (Yin, Han & Yu, 2007). During the purging process, the organization always runs the risk of losing data and information that might still be crucial to it.
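A common safeguard, sketched below with an assumed cut-off and archive path, is to archive records before purging them so that nothing crucial is lost outright.

```python
# Sketch: archive records to a separate store before purging them, so data that
# later turns out to be important is not lost. The cut-off and archive path are
# illustrative assumptions.

import json
from datetime import datetime, timedelta

def purge_with_archive(records: list, cutoff: datetime, archive_path: str) -> list:
    """Write records older than the cut-off to an archive file, return the rest."""
    old = [r for r in records if datetime.fromisoformat(r["created"]) < cutoff]
    kept = [r for r in records if datetime.fromisoformat(r["created"]) >= cutoff]
    with open(archive_path, "a", encoding="utf-8") as fh:
        for r in old:
            fh.write(json.dumps(r) + "\n")
    return kept

cutoff = datetime.now() - timedelta(days=365)
records = [{"id": 1, "created": "2015-01-01T00:00:00"},
           {"id": 2, "created": datetime.now().isoformat()}]
print(purge_with_archive(records, cutoff, "purged_archive.jsonl"))
```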
Resolution of Differences in Data between Systems
There are different methods that can be applied to resolve differences in data between systems. In other words, this can be considered conflict resolution.
Accuracy of Sources
It is best to consider the accuracy of the source from which the data was, or is being, obtained. Some sources are trustworthy while others are not. If one data source is known to be more accurate than another, these accuracy levels can be used to determine the true value when data from the two sources is merged (Halevy, Rajaraman & Ordille, 2006).
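As a rough sketch of this idea, conflicting values can be weighted by the estimated accuracy of their sources; the source names and accuracy scores below are illustrative assumptions, not values from the cited work.

```python
# Sketch: resolve conflicting values for the same attribute by weighting each
# candidate with the estimated accuracy of its source. Accuracy scores are
# illustrative assumptions.

from collections import defaultdict

def resolve_by_accuracy(candidates: list, source_accuracy: dict) -> str:
    """candidates: (source_name, value) pairs; pick the value with the highest
    total accuracy weight across the sources that report it."""
    weights = defaultdict(float)
    for source, value in candidates:
        weights[value] += source_accuracy.get(source, 0.5)  # unknown sources get 0.5
    return max(weights, key=weights.get)

accuracy = {"crm": 0.9, "legacy_erp": 0.6, "web_form": 0.4}
conflict = [("crm", "123 Main St"),
            ("legacy_erp", "123 Main Street"),
            ("web_form", "123 Main Street")]
print(resolve_by_accuracy(conflict, accuracy))  # "123 Main Street" (0.6 + 0.4 > 0.9)
```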
Freshness of Source
The world changes constantly, and these changes mean that the value of data can shift drastically within a short time. Data can be relevant and of high quality one moment and outdated the next. When merging data and information, data from outdated sources should therefore be treated differently from data from current, reliable sources (Halevy, Rajaraman & Ordille, 2006). The probability that the data is still relevant and applicable should also be stated in the merged result.
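One hedged way to express this is to discount each source's contribution by the age of its data, for example with an exponential decay; the half-life used below is an arbitrary assumption.

```python
# Sketch: discount each source's vote by the age of its data, so stale sources
# carry less weight when values conflict. The decay rate is an illustrative
# assumption.

from datetime import datetime

def freshness_weight(observed_at: datetime, now: datetime,
                     half_life_days: float = 180.0) -> float:
    """Exponential decay: the weight halves every `half_life_days`."""
    age_days = (now - observed_at).total_seconds() / 86400.0
    return 0.5 ** (age_days / half_life_days)

now = datetime(2024, 6, 1)
print(freshness_weight(datetime(2024, 5, 1), now))   # recent -> close to 1.0
print(freshness_weight(datetime(2021, 5, 1), now))   # stale  -> close to 0.0
```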
Dependence between Sources
When resolving mergers that involve dissimilar data, it should be considered whether the data was acquired from dependent sources; in other words, whether any of the data is copied from other sources. The true value of such data should be calculated using an algorithm that takes these dependencies into account.
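A minimal sketch of such an algorithm is shown below: a source known to copy from another contributes only a small weight when it merely repeats the original. The dependency map and discount factor are illustrative assumptions.

```python
# Sketch: when one source is known to copy from another, count the copier at a
# reduced weight so duplicated evidence does not outvote independent sources.
# The dependency map and discount factor are illustrative assumptions.

def resolve_with_dependencies(candidates: list, copies_from: dict,
                              discount: float = 0.2) -> str:
    """candidates: (source, value) pairs; copies_from maps copier -> original."""
    weights = {}
    reported = {source: value for source, value in candidates}
    for source, value in candidates:
        original = copies_from.get(source)
        # A copier that merely repeats its original contributes only `discount`.
        is_copy = original is not None and reported.get(original) == value
        weights[value] = weights.get(value, 0.0) + (discount if is_copy else 1.0)
    return max(weights, key=weights.get)

deps = {"mirror_site": "vendor_feed"}
votes = [("vendor_feed", "Acme Ltd"),
         ("mirror_site", "Acme Ltd"),       # copy: adds only 0.2, not a full vote
         ("field_survey", "Acme Limited")]
print(resolve_with_dependencies(votes, deps))
```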
Missing Data
In databases, missing data is usually indicated by null values. There are different types of null values: those whose value is unknown, those whose value is inapplicable, and those whose value is withheld (Bleiholder & Naumann, 2006). Unknown values exist and make sense, although the values themselves are not known. Inapplicable values are those that do not make sense for the record at all and cannot be applied, while withheld values make sense but are inaccessible.
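To keep these distinctions usable downstream, missing values can be tagged with the reason they are missing; the enum and field names in the sketch below are illustrative assumptions.

```python
# Sketch: tag missing values with why they are missing, so later processing can
# distinguish unknown, inapplicable, and withheld values instead of treating
# every NULL the same. The enum and field names are illustrative assumptions.

from enum import Enum

class NullKind(Enum):
    UNKNOWN = "unknown"            # a real value exists but is not known
    INAPPLICABLE = "inapplicable"  # the attribute does not apply to this record
    WITHHELD = "withheld"          # a value exists but access to it is restricted

record = {
    "name": "Jane Doe",
    "maiden_name": NullKind.INAPPLICABLE,   # attribute does not apply
    "date_of_birth": NullKind.WITHHELD,     # value exists but is not shared
    "middle_name": NullKind.UNKNOWN,        # value exists but was never captured
}

missing = {k: v.value for k, v in record.items() if isinstance(v, NullKind)}
print(missing)
```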