Recognizing Handwritten Digits With Scikit-Learn

In this blog post, we carry out a data analysis project: recognizing handwritten digits with scikit-learn.

A.) Recognizing handwritten text is a problem that can be traced back to the first automatic machines that needed to recognize individual characters in handwritten documents. 

B.) Think, for example, of the ZIP codes on letters at the post office and the automation needed to recognize these five digits. Accurate recognition of these codes is necessary to sort mail automatically and efficiently. Another application that comes to mind is OCR (Optical Character Recognition) software, which must read handwritten text or pages of printed books and turn them into electronic documents in which each character is well defined.

C.) The problem of handwriting recognition goes further back in time, however, to the early 20th century (the 1920s), when Emanuel Goldberg (1881–1970) began studying it and suggested that a statistical approach would be an optimal choice.

D.) To address this problem in Python, the scikit-learn library provides a good example for understanding the technique, the issues involved, and the possibility of making predictions.

E.) The scikit-learn library (http://scikit-learn.org/) enables you to approach this type of data analysis in a way that is slightly different from what you used in Project 1. The data to be analyzed usually consists of numerical values or strings, but it can also involve images and sounds. The problem in this internship project involves predicting a numeric value by reading and interpreting an image of a handwritten digit. As before, you will have an estimator that learns through its fit() function and, once it has reached a sufficient degree of predictive capability (a sufficiently valid model), produces a prediction with its predict() function. We will then discuss the training set and the validation set, created this time from a series of images.

F.) The scikit-learn library provides numerous datasets that are useful for testing many problems of data analysis and prediction, and the Digits dataset is one of them. A scientist claims that the model predicts the digit correctly 95% of the time. We perform a data analysis to accept or reject this hypothesis.


Step 1.) 




Cell 1 
Import the Python libraries.
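The original cell is not shown here, so the following is only a guess at what it contained. It assumes NumPy for array handling and matplotlib for plotting, which are the two libraries the later cells rely on.

```python
# Assumed general-purpose libraries for this notebook:
# NumPy for array handling, matplotlib for drawing the digit images.
import numpy as np
import matplotlib.pyplot as plt
```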

Cell 2 

Import the scikit-learn modules.
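A minimal sketch of this cell could look like the code below. The post does not say which estimator was used, so the support vector classifier (svm.SVC) and its gamma and C values are assumptions chosen for illustration.

```python
from sklearn import datasets, svm

# Assumption: a support vector classifier is the estimator.
# The gamma and C values are illustrative, not taken from the post.
svc = svm.SVC(gamma=0.001, C=100.0)
print(svc)
```

Any other scikit-learn classifier with fit() and predict() could be substituted here.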

Cell 3 

Load the digits dataset from scikit-learn and print its description.

The Digits dataset is a dictionary-like object that contains the data, targets, images, feature names, a description of the dataset, target names, and so on.
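Loading the dataset and inspecting it might look like this. The variable name digits is our choice; load_digits() and the DESCR attribute are part of scikit-learn.

```python
from sklearn import datasets

# Load the bundled handwritten digits dataset.
digits = datasets.load_digits()

# List the available fields, then print the dataset description.
print(digits.keys())
print(digits.DESCR)
```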


Step 2.)


 

Cell 4
All the digit images are stored in the digits.images array.
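As a quick look at that array, the sketch below prints its shape and the pixel matrix of the first image. Each image is an 8x8 grid of grayscale intensities.

```python
from sklearn import datasets

digits = datasets.load_digits()

# One 8x8 matrix of pixel intensities per sample.
print(digits.images.shape)

# Pixel values of the first image in the dataset.
print(digits.images[0])
```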




Cell 5

Load the image for the digit 7 for manipulation.
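One way to pick out an image of a 7 is to look up the first sample whose target equals 7. The names index_of_seven and seven_image are ours, and the original cell may have used a fixed index instead.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

# Position of the first sample labelled 7 (hypothetical variable names).
index_of_seven = int(np.flatnonzero(digits.target == 7)[0])
seven_image = digits.images[index_of_seven]

print(index_of_seven, digits.target[index_of_seven])
print(seven_image)
```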


Step 3.)



Cell 6
We focus mainly on the data and the targets, and we extract each into its own variable.

The numerical values represented by the images, i.e., the targets, are contained in the digits.target array.

The dataset consists of 1797 images. We can also see the total number of columns in the dataset (64, one per pixel of each 8x8 image).
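The extraction can be sketched as follows. The names X and y are a common convention rather than something taken from the original cell.

```python
from sklearn import datasets

digits = datasets.load_digits()

# Features: one flattened image per row. Targets: the digit each image shows.
X = digits.data
y = digits.target

print(X.shape)  # (number of images, number of columns)
print(y.shape)
print(y[:10])   # the first few target values
```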


Cell 7
A function to display a digit.

We visualize the image for the digit 7 using matplotlib.

imshow displays data as an image.
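A possible version of such a function is sketched below. The name display_digit, its parameters, and the grayscale colormap are our assumptions; only imshow itself is mentioned in the post.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets


def display_digit(images, targets, index):
    """Draw images[index] in grayscale and return the matplotlib Axes."""
    fig, ax = plt.subplots(figsize=(3, 3))
    ax.imshow(images[index], cmap=plt.cm.gray_r, interpolation="nearest")
    ax.set_title(f"Digit: {targets[index]}")
    ax.axis("off")
    return ax


digits = datasets.load_digits()
index_of_seven = int(np.flatnonzero(digits.target == 7)[0])
ax = display_digit(digits.images, digits.target, index_of_seven)
# In a script, call plt.show() here; a notebook renders the figure itself.
```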








Cell 9
We can also print images from the data array by reshaping each row.
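Each row of digits.data holds the 64 pixel values of one image, so reshaping a row to 8x8 recovers the image. The sample index used below is arbitrary.

```python
import numpy as np
from sklearn import datasets

digits = datasets.load_digits()

sample_index = 0                 # arbitrary sample, chosen for illustration
row = digits.data[sample_index]  # 64 pixel values in a flat row
image = row.reshape(8, 8)        # back to an 8x8 image

print(image)
# The reshaped row matches the corresponding entry of digits.images.
print(np.array_equal(image, digits.images[sample_index]))
```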

Step 4.)





Of these 1797 samples, we use the first 1791 as the training set and the last 6 as the validation set.

Modified training of data

Our accuracy for the modified training is also 100%.
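The 1791/6 split can be sketched as below. The estimator and its hyperparameters are assumptions, since the post does not name the model, so the accuracy this prints may differ from the figure reported above.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()

# Assumption: a support vector classifier with illustrative hyperparameters.
svc = svm.SVC(gamma=0.001, C=100.0)

n_train = 1791  # first 1791 samples for training, last 6 for validation
svc.fit(digits.data[:n_train], digits.target[:n_train])

predicted = svc.predict(digits.data[n_train:])
actual = digits.target[n_train:]

print("predicted:", predicted)
print("actual:   ", actual)
print("accuracy: ", accuracy_score(actual, predicted))
```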




Case 2: 75% values for training (1348), and 25% values for testing (449)

The accuracy for 75:25 division is 96.88 %

Case 3: 50% values for training (899), and 50% values for testing (898)

The accuracy for 50:50 division is 96.99 %



Case 4: 20% values for training (359), and 80% values for testing (1438)

The accuracy for 20:80 division is 90.13 %

Case 5: 5% values for training (90), and 95% values for testing (1707)

The accuracy for 5:95 division is 82.02 %
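The split experiments above can be reproduced in outline with a small helper. This is a sketch under assumptions: the helper name accuracy_for_split, the SVC estimator, its hyperparameters, and the choice to take the first samples for training without shuffling are all ours. The accuracies it prints may therefore differ from the figures reported above.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score


def accuracy_for_split(data, target, n_train):
    """Train on the first n_train samples and score on the remaining ones."""
    model = svm.SVC(gamma=0.001, C=100.0)  # assumed estimator and settings
    model.fit(data[:n_train], target[:n_train])
    predicted = model.predict(data[n_train:])
    return accuracy_score(target[n_train:], predicted)


digits = datasets.load_digits()
n_samples = len(digits.data)

# Training-set sizes taken from the cases listed above.
results = {}
for n_train in (1348, 899, 359, 90):
    results[n_train] = accuracy_for_split(digits.data, digits.target, n_train)
    print(f"train={n_train:4d}  test={n_samples - n_train:4d}  "
          f"accuracy={results[n_train]:.2%}")
```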


Conclusion
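Accuracy is generally better when more data is used for training before the model is tested. With 75% or 50% of the samples used for training, the accuracy (96.88% and 96.99%) is above the claimed 95%, so the hypothesis is supported. With only 20% or 5% used for training, the accuracy drops to 90.13% and 82.02%, below the claim.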
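The comparison against the 95% claim can be written out as a small check on the accuracies reported in this post. The dictionary keys and variable names are ours; the numbers are copied from the cases above.

```python
CLAIMED_ACCURACY = 95.0  # percent, the claim being tested

# Accuracies reported in this post, keyed by the share of data used for training.
reported = {
    "75% training": 96.88,
    "50% training": 96.99,
    "20% training": 90.13,
    "5% training": 82.02,
}

for split, accuracy in reported.items():
    verdict = "meets" if accuracy >= CLAIMED_ACCURACY else "falls short of"
    print(f"{split}: {accuracy:.2f}% {verdict} the {CLAIMED_ACCURACY:.0f}% claim")

# Splits for which the claim holds.
supported = [s for s, a in reported.items() if a >= CLAIMED_ACCURACY]
```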

"I Am Thankful To Mentors At https://internship.suvenconsultants.com For Providing Awesome Problem Statements And Giving Many Of Us A Coding Internship Experience. Thank you www.suvenconsultants.com"