
Machine Learning Fundamentals with Python

by Linda Torries – Tech Writer & Digital Trends Analyst
October 14, 2025

Introduction to Machine Learning Concepts

In this article, we’ll explore how to code five machine learning concepts using Python. We’ll take each problem statement and its starter code from Deep-ML, and add a little theory to each problem to explain the idea behind the code.

5 Machine Learning Concepts

The five concepts we’ll be exploring are:

  1. PCA (Principal Component Analysis)
  2. Feature Scaling
  3. Confusion Matrix for Binary Classification
  4. Overfitting & Underfitting
  5. Random Shuffle of Dataset

PCA (Principal Component Analysis)

Principal Component Analysis is a dimensionality reduction technique. Let’s assume we have a dataset that contains n-1 independent features and 1 dependent feature. This gives us an n-dimensional dataset, and n can be very large. PCA reduces the dimension by building a small number of new features, known as principal components, each of which is a linear combination of the original columns.

The goal is to keep as much of the information (variance) in the data as possible while keeping only the most important components.

Steps in PCA

  1. Data Standardization: This is crucial because PCA picks the components that maximize the variance in the data, so if the data isn’t standardized, PCA will be biased towards features with a large numerical range.
  2. Covariance Matrix: Next, compute the covariance matrix, which shows how each pair of features varies together.
  3. Eigenvalues and Eigenvectors: The eigenvectors of the covariance matrix give the directions of the principal components, while the eigenvalues give the variance of the data along each direction.
  4. Sort Eigenvalues and Eigenvectors: Rank the principal components in descending order of their eigenvalues. The first component explains the most variance, the second explains the next most, and so on.
  5. Return Top-K Components: Finally, return the top K components as the new features (principal components).
import numpy as np

def pca(data: np.ndarray, k: int) -> np.ndarray:
    # Step 1: Standardize
    data_std = (data - np.mean(data, axis=0)) / np.std(data, axis=0)

    # Step 2: Covariance matrix
    cov_matrix = np.cov(data_std, rowvar=False)

    # Step 3: Eigen decomposition
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Step 4: Sort by eigenvalue
    sorted_idx = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, sorted_idx]

    # Step 5: Take top-k eigenvectors
    components = eigenvectors[:, :k]

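    # Eigenvectors are only defined up to sign, so flip each component
    # whose first entry is negative to get a consistent result
    for i in range(components.shape[1]):
        if components[0, i] < 0:
            components[:, i] *= -1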

    return np.round(components, 4)
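
The function above returns the component directions themselves. As a rough sketch of how they might be used, the snippet below projects the standardized data onto the top-k directions and reports the share of variance each one explains. The helper name project_onto_components and the synthetic data are assumptions made for illustration; they are not part of the Deep-ML problem.

```python
import numpy as np

def project_onto_components(data: np.ndarray, k: int):
    """Hypothetical helper: returns (projected data, explained-variance ratios)."""
    # Standardize each column, as in step 1
    data_std = (data - data.mean(axis=0)) / data.std(axis=0)

    # Covariance matrix and its eigen decomposition (steps 2 and 3)
    cov_matrix = np.cov(data_std, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

    # Sort both eigenvalues and eigenvectors in descending order (step 4)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    # Project the data onto the top-k directions (step 5)
    projected = data_std @ eigenvectors[:, :k]
    explained = eigenvalues[:k] / eigenvalues.sum()
    return projected, explained

# Synthetic data: the second column is almost a multiple of the first
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200), rng.normal(size=200)])

projected, explained = project_onto_components(data, k=2)
print(projected.shape)  # (200, 2)
print(explained)        # share of total variance kept by each component
```

Because two of the three columns carry nearly the same information, two components should be enough to keep most of the variance here.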

Feature Scaling

This is where you get to understand data and also play with it. Feature scaling is a pre-processing technique that helps keep the values of each column within a certain range, so they don’t vary significantly from one another.

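Before scaling, the columns of a dataset can span very different ranges. After scaling, all the values fall within a comparable range.

The two most common methods are min-max normalization and standardization:

  normalized = (x - min) / (max - min)
  standardized = (x - mean) / std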

import numpy as np

def feature_scaling(data: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    # Min-max normalization: (data - X_min) / (X_max - X_min)
    # Standardization:       (data - X_mean) / X_std
    # All statistics are computed per column (axis=0)

    X_min = np.min(data, axis=0)
    X_max = np.max(data, axis=0)

    X_mean = np.mean(data, axis=0)
    X_std = np.std(data, axis=0)

    normalized_data = (data - X_min) / (X_max - X_min)
    standardized_data = (data - X_mean) / (X_std)

    return standardized_data, normalized_data
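
One thing to watch for: if a column is constant, X_max - X_min is zero and the division produces NaN values. Here is a minimal sketch of one way to guard against that. The helper name min_max_scale and the choice to map a constant column to 0 are assumptions for illustration, not something the problem requires.

```python
import numpy as np

def min_max_scale(data: np.ndarray) -> np.ndarray:
    """Hypothetical helper: min-max scaling that tolerates constant columns."""
    col_min = data.min(axis=0)
    col_range = data.max(axis=0) - col_min
    # Assumption: a constant column is mapped to 0 instead of NaN,
    # so we divide by 1 wherever the range is zero
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (data - col_min) / safe_range

data = np.array([[1.0, 5.0],
                 [2.0, 5.0],
                 [3.0, 5.0]])
scaled = min_max_scale(data)
print(scaled[:, 0])  # first column now spans 0 to 1
print(scaled[:, 1])  # constant column becomes all zeros
```

The same idea applies to standardization, where a constant column has a standard deviation of zero.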

Confusion Matrix for Binary Classification

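The confusion matrix can be confusing at first, but it makes sense with practice. Let’s decode the concept step by step.

Before examining the confusion matrix, make sure you understand the classification problem in machine learning. In a classification setup, we get y_pred (a list of predicted values) after running our prediction model on X_test. Comparing each prediction with the true label in y_test gives one of four outcomes: true positive, false negative, false positive, or true negative. The confusion matrix simply counts how often each outcome occurs.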

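def confusion_matrix(data):
    # data: iterable of (y_test, y_pred) pairs with labels 0 or 1
    TP = 0  # True Positive: actual 1, predicted 1
    FP = 0  # False Positive: actual 0, predicted 1
    FN = 0  # False Negative: actual 1, predicted 0
    TN = 0  # True Negative: actual 0, predicted 0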

    for y_test, y_pred in data:
        if y_test == 1 and y_pred == 1:
            TP += 1
        elif y_test == 1 and y_pred == 0:
            FN += 1
        elif y_test == 0 and y_pred == 1:
            FP += 1
        elif y_test == 0 and y_pred == 0:
            TN += 1

    return [[TP, FN],[FP, TN]]
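
Once the four counts are available, the usual evaluation metrics follow directly from them. The sketch below assumes the same [[TP, FN], [FP, TN]] layout returned above; the function name metrics_from_confusion is hypothetical, and returning 0.0 when a denominator is zero is just one possible convention.

```python
def metrics_from_confusion(matrix):
    """Hypothetical helper: accuracy, precision, recall from [[TP, FN], [FP, TN]]."""
    (tp, fn), (fp, tn) = matrix
    total = tp + fn + fp + tn

    # Assumption: return 0.0 whenever a denominator would be zero
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # of predicted positives, how many were right
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # of actual positives, how many were found
    return accuracy, precision, recall

print(metrics_from_confusion([[3, 1], [2, 4]]))  # → (0.7, 0.6, 0.75)
```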

Overfitting & Underfitting

These two concepts come up when training and evaluating machine learning models. Overfitting means the model has learned the training data too closely, including its noise, so it fails to generalize to new data. Underfitting, on the other hand, means the model was not able to learn the underlying patterns from the training data in the first place.

Overfitting — High accuracy on training data, but lower accuracy on test data.
Underfitting — Low accuracy on both training and test data.

def model_fit_quality(training_accuracy, test_accuracy):
    """Determine if the model is overfitting, underfitting, or a good fit based on training and test accuracy.
    :param training_accuracy: float, training accuracy of the model (0 <= training_accuracy <= 1)
    :param test_accuracy: float, test accuracy of the model (0 <= test_accuracy <= 1)
    :return: int, 1 for overfitting, -1 for underfitting, 0 for a good fit.
    """
    # Heuristic thresholds: a train/test gap above 0.2 flags overfitting,
    # and both scores below 0.7 flag underfitting
    if training_accuracy - test_accuracy > 0.2:
        return 1
    elif training_accuracy < 0.7 and test_accuracy < 0.7:
        return -1
    else:
        return 0
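
To see where such gaps between training and test scores come from, here is a small sketch that fits polynomials of different degrees to noisy quadratic data. It uses regression error (mean squared error) rather than classification accuracy, which is a swap made purely to keep the example short. The data, the degrees, and the helper names are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_split(n):
    # Quadratic signal plus a little noise
    x = rng.uniform(-1, 1, size=n)
    y = x**2 + rng.normal(scale=0.1, size=n)
    return x, y

x_train, y_train = make_split(15)
x_test, y_test = make_split(200)

def mse(coeffs, x, y):
    # Mean squared error of the fitted polynomial on (x, y)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(degree, results[degree])  # (train error, test error)
```

A straight line (degree 1) is too simple for a curve, so it tends to score poorly on both sets, which is underfitting. A degree-9 polynomial always matches the training points at least as well as the lower degrees, but a noticeably higher test error than training error is the typical sign of overfitting.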

Random Shuffle of Dataset

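Shuffling is often overlooked, but it is an important concept to understand. Shuffling a dataset means randomly reordering its rows, using the same permutation for the features X and the labels y so that every row keeps its own label. This removes any ordering in the data (for example, rows sorted by class or by date) that would otherwise bias a train/test split. For example, we shuffle the data before creating the folds in cross-validation so that each fold is representative of the whole dataset.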

import numpy as np

def shuffle_data(X, y, seed=None):
    np.random.seed(seed)
    indices = np.arange(len(X))
    np.random.shuffle(indices)
    return X[indices], y[indices]
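
The function above seeds NumPy's global random state, which also affects any other code that draws random numbers afterwards. As a hedged alternative, the sketch below uses a local generator from np.random.default_rng and checks that features and labels stay paired. The name shuffle_together and the toy arrays are assumptions for illustration.

```python
import numpy as np

def shuffle_together(X, y, seed=None):
    """Hypothetical variant: shuffle X and y with one shared permutation."""
    # A local Generator leaves NumPy's global random state untouched
    rng = np.random.default_rng(seed)
    permutation = rng.permutation(len(X))
    return X[permutation], y[permutation]

X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([10, 30, 50, 70])  # constructed so that y[i] == 10 * X[i, 0]

X_shuffled, y_shuffled = shuffle_together(X, y, seed=0)
# If rows and labels moved together, the relationship still holds
print(np.array_equal(y_shuffled, 10 * X_shuffled[:, 0]))  # True
```

Passing the same seed reproduces the same order, which is handy when you need repeatable experiments.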

Conclusion

These are only five examples; you can visit Deep-ML to solve more problems like them. I highly encourage you to do so if you’re preparing for an AI/ML role.

FAQs

  1. What is PCA in machine learning?
    PCA stands for Principal Component Analysis, which is a dimensionality reduction technique.
  2. What is feature scaling in machine learning?
    Feature scaling is a pre-processing technique that helps keep the values of each column within a certain range, so they don’t vary significantly from one another.
  3. What is a confusion matrix in machine learning?
    A confusion matrix is a table used to evaluate the performance of a classification model.
  4. What is overfitting in machine learning?
    Overfitting occurs when a model is too complex and learns the noise in the training data, resulting in poor performance on new, unseen data.
  5. What is underfitting in machine learning?
    Underfitting occurs when a model is too simple and fails to capture the underlying patterns in the training data, resulting in poor performance on both training and test data.

