Developer Blog

Tips and tricks for developers and IT enthusiasts

Daily AI: Analyse Web Pages with AI

Introduction

Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP) by providing powerful capabilities for understanding and generating human language. Open-source LLMs have democratized access to these technologies, allowing developers and researchers to innovate and apply these models in various domains. In this blog post, we will explore Ollama, a framework for working with LLMs, and demonstrate how to load webpages, parse them, build embeddings, and query the content using Ollama.

Understanding Large Language Models (LLMs)

LLMs are neural networks trained on vast amounts of text data to understand and generate human language. They can perform tasks such as translation, summarization, question answering, and more. Popular LLMs include GPT-3, BERT, and their open-source counterparts like GPT-Neo and BERT variants. These models have diverse applications, from chatbots to automated content generation.

Introducing Ollama

Ollama is an open-source framework designed to simplify the use of LLMs in various applications. It provides tools for training, fine-tuning, and deploying LLMs, making it easier to integrate these powerful models into your projects. With Ollama, you can leverage the capabilities of LLMs to build intelligent applications that understand and generate human language.
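As a first taste, here is a minimal sketch that talks to a locally running Ollama server through LangChain. It assumes Ollama is listening on its default port 11434 and that the llama3 model has already been pulled:

from langchain_community.llms import Ollama

# Minimal sketch: assumes a local Ollama server on the default port 11434
# and that the "llama3" model was pulled beforehand (ollama pull llama3).
llm = Ollama(base_url="http://localhost:11434", model="llama3")
print(llm.invoke("Explain in one sentence what an LLM is."))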

Example

The following example from the Ollama documentation demonstrates how to use the LangChain framework in conjunction with the Ollama library to load a web page, process its content, create embeddings, and perform a query on the processed data. Below is a detailed explanation of the script's functionality and the technologies used.

Technologies Used

  1. LangChain: A framework for building applications powered by large language models (LLMs). It provides tools for loading documents, splitting text, creating embeddings, and querying data.
  2. Ollama: A library for working with LLMs and embeddings. In this script, it's used both to generate embeddings for the text and to answer the final query.
  3. BeautifulSoup (bs4): A library used for parsing HTML and XML documents. It’s essential for loading and processing web content.
  4. ChromaDB: A vector database used for storing and querying embeddings. It allows efficient similarity searches.
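All four libraries are available from PyPI; assuming a standard Python environment, they can be installed with pip (note that BeautifulSoup is packaged as beautifulsoup4):

pip install langchain langchain-community beautifulsoup4 chromadb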

Code Breakdown

Imports and Setup

The script starts by importing the necessary modules and classes: Ollama, WebBaseLoader, RecursiveCharacterTextSplitter, OllamaEmbeddings, Chroma, and RetrievalQA.

from langchain_community.llms import Ollama

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

Loading the Web Page

The script uses WebBaseLoader to load the content of a webpage. In this case, it loads the text of “The Odyssey” by Homer from Project Gutenberg.

print("- get web page")

loader = WebBaseLoader("https://www.gutenberg.org/files/1727/1727-h/1727-h.htm")
data = loader.load()

Splitting the Document

Due to the large size of the document, it is split into smaller chunks using RecursiveCharacterTextSplitter. This ensures that the text can be processed more efficiently.

print("- split documents")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

Creating Embeddings and Storing Them

The script creates embeddings for the text chunks using the Ollama library and stores them in ChromaDB, a vector database. This step involves instantiating an embedding model (nomic-embed-text) and using it to generate embeddings for each text chunk.

print("- create vectorstore")

oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)

Performing a Similarity Search

A question is formulated, and the script uses the vector database to perform a similarity search. It retrieves chunks of text that are semantically similar to the question.

print("- ask for similarities")

question="Who is Neleus and who is in Neleus' family?"
docs = vectorstore.similarity_search(question)
nrofdocs=len(docs)
print(f"{question}: {nrofdocs}")

Creating an Ollama Instance and Defining a Retrieval Chain

The script initializes an instance of the Ollama model and sets up a retrieval-based question-answering (QA) chain. This chain is used to process the question and retrieve the relevant parts of the document.

print("- create ollama instance")
ollama = Ollama(
    base_url='http://localhost:11434',
    model="llama3"
)

print("- get qachain")
qachain = RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())

Running the Query

Finally, the script invokes the QA chain with the question and prints the result.

print("- run query")
res = qachain.invoke({"query": question})

print(res['result'])

Result

Now let's look at the impressive result:

Try another example: ask a Wikipedia page

In this example, we are going to use LangChain and Ollama to learn about something just a touch more recent. In August 2023, there was a series of wildfires on Maui. An LLM trained before that time cannot know about this, since its training data would not include anything that recent.

So we can find the Wikipedia article about the fires and ask questions about the contents.

url = "https://en.wikipedia.org/wiki/2023_Hawaii_wildfires"
question="When was Hawaii's request for a major disaster declaration approved?"

Daily AI: Analyse Images with AI

General

With open-source tools, it is easy to analyse images.

Just install Ollama, download the llava model, and run this command:

❯ ollama run llava:latest "Describe the image <path to image>"

Try this image: Statue of Liberty

❯ ollama run llava:latest "Describe the image /tmp/statue-liberty-liberty-island-new-york.jpg"
Added image '/tmp/statue-liberty-liberty-island-new-york.jpg'
The image shows the Statue of Liberty, an iconic landmark in New York Harbor. This neoclassical statue is a symbol of freedom and democracy, and it has become a universal symbol of the United States. The statue is situated on Liberty Island, which is accessible via ferries from Manhattan.

In the background, you can see a clear sky with some clouds, indicating good weather. The surrounding area appears to be lush with greenery, suggesting that the photo was taken in spring or summer when vegetation is abundant. There are also people visible at the base of the statue, which gives a sense of scale and demonstrates the size of the monument.
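The same analysis can also be scripted against Ollama's REST API: the image is sent base64-encoded to the /api/generate endpoint. Here is a minimal Python sketch, assuming a local Ollama server on port 11434 with the llava model pulled:

import base64
import json
import urllib.request

def describe_image(path: str, prompt: str = "Describe the image") -> str:
    # Read the image and encode it as base64, as the API expects
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    payload = json.dumps({
        "model": "llava:latest",
        "prompt": prompt,
        "images": [image_b64],
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(describe_image("/tmp/statue-liberty-liberty-island-new-york.jpg"))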

Daily: VS Code Error NSOSStatusErrorDomain

If you get an NSOSStatusErrorDomain error when you start VS Code from the command line:

❯ code
[0309/155203.303710:ERROR:codesign_util.cc(108)] SecCodeCheckValidity: Error Domain=NSOSStatusErrorDomain Code=-67062 "(null)" (-67062)

You should re-sign the application with codesign --force --deep --sign -:

❯ which code
/Users/Shared/VSCode/Default/Visual Studio Code - Insiders.app/Contents/Resources/app/bin/code

❯ codesign --force --deep --sign - "/Users/Shared/VSCode/Default/Visual Studio Code - Insiders.app"
/Users/Shared/VSCode/Default/Visual Studio Code - Insiders.app: replacing existing signature

❯ code -v
1.88.0-insider
19ecb4b8337d0871f0a204853003a609d716b04e
x64

Daily Azure: Migrate a Storage Account

TL;DR

Migration is done via azcopy:

  • download the source container to a local folder
  • upload the local folder to the destination container
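Stripped of all options, the whole migration boils down to two azcopy calls (placeholder URLs, not real account names); the full scripts below add authentication, integrity checks, and logging:

azcopy copy "https://<source account>.blob.core.windows.net/<source container>/" "C:\TMP\Downloads" --recursive
azcopy copy "C:\TMP\Downloads\<source container>" "https://<destination account>.blob.core.windows.net/<destination container>/" --recursive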

Get AzCopy

Here is the script install-azcopy.ps1:

# Download and extract
$URI = "https://aka.ms/downloadazcopy-v10-windows"
$DST = "$env:LOCALAPPDATA\Programs\AZCopy"

Invoke-WebRequest -Uri $URI -OutFile AzCopy.zip -UseBasicParsing
Expand-Archive ./AzCopy.zip ./AzCopy -Force

# Move azcopy.exe to its destination folder
New-Item -ItemType Directory -Force -Path $DST | Out-Null
Get-ChildItem ./AzCopy/*/azcopy.exe | Move-Item -Destination $DST

# Add AzCopy to the user PATH
$userenv = (Get-ItemProperty -Path 'HKCU:\Environment' -Name Path).Path
$newPath = "$userenv;$DST"
New-ItemProperty -Path 'HKCU:\Environment' -Name Path -Value $newPath -Force

# Clean the kitchen
del -Force AzCopy.zip
del -Force -Recurse .\AzCopy\

Copy Folder

Here is the script copy.ps1:

param (
    $FOLDER = "",

    [Parameter(Mandatory=$false)]
    [string]$TYPE   = "latest",

    [Parameter(Mandatory=$false)]
    [switch]$LOGIN
)

if ($TYPE -eq "latest") {
    $SRC_ROOT="<latest-folder>"
    $DST_ROOT="latest"
} else {
    $SRC_ROOT="<history-folder>"
    $DST_ROOT="history"
}


$SRC_ACCOUNT = "<source storage account>"
$DST_ACCOUNT = "<destination storage account>"

$SRC_CONTAINER = "<source container>"
$DST_CONTAINER = "<destination container>"


$SRC_URL      = "https://${SRC_ACCOUNT}.blob.core.windows.net/$SRC_CONTAINER/$SRC_ROOT/$FOLDER/"
$DST_URL      = "https://${DST_ACCOUNT}.blob.core.windows.net/$DST_CONTAINER/$DST_ROOT/"

$TMP_FLDR     = "C:\TMP\Downloads"

Write-Host  "== Copy     '$FOLDER'"
Write-Host "       from  $SRC_URL"
Write-Host  "        to  $DST_URL"

#

if ($LOGIN) {
    $ENV:AZCOPY_CRED_TYPE = "OAuthToken";
    $ENV:AZCOPY_CONCURRENCY_VALUE = "AUTO";

    azcopy login
}

Write-Host  "== Download ======================================================"
Write-Host "       from  $SRC_URL"
Write-Host  "        to  $TMP_FLDR\$CONTAINER"

azcopy copy         $SRC_URL                                                                                      `
                    $TMP_FLDR                                                                                     `
                    --trusted-microsoft-suffixes=${SRC_ACCCOUNT}.blob.core.windows.net                            `
                    --overwrite=true                                                                              `
                    --check-md5=FailIfDifferent                                                                   `
                    --from-to=BlobLocal                                                                           `
                    --recursive                                                                                   `
                    --log-level=ERROR

# Upload
Write-Host  "== Upload   ======================================================"
Write-Host  "      from  $TMP_FLDR\$CONTAINER"
Write-Host  "        to  $DST_URL"

azcopy copy         $TMP_FLDR\$CONTAINER                         `
                    $DST_URL                                     `
                    --overwrite=true                             `
                    --from-to=LocalBlob                          `
                    --blob-type BlockBlob                        `
                    --follow-symlinks                            `
                    --check-length=true                          `
                    --put-md5                                    `
                    --follow-symlinks                            `
                    --disable-auto-decoding=false                `
                    --recursive                                  `
                    --log-level=ERROR

Call the script

The first call should use -login to authenticate:

.\copy.ps1 demo-folder-1 -login

Subsequent calls don't need the login:

.\copy.ps1 demo-folder-2

Daily: Running Microsoft SQL-Server in Docker

Introduction

Using Docker is an effortless way to launch and run an application/server software without annoying installation hassles: Just run the image and you’re done.

Even if that's a rather simplistic way of looking at it, in many cases it works just like that.

So, let's start with using Microsoft SQL Server as a database backend. We will use the official Docker image from Microsoft; see its Docker Hub page to find out more.

docker run --name mssql-server        \
           --memory 4294967296        \
           -e "ACCEPT_EULA=Y"         \
           -e "SA_PASSWORD=secret"    \
           -p 1433:1433               \
           mcr.microsoft.com/mssql/server:2019-latest
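To verify that the server accepts connections, you can run sqlcmd inside the container. This is a sketch; the tools path differs between image versions (e.g. /opt/mssql-tools18/bin/sqlcmd with an extra -C flag on newer images):

❯ docker exec -it mssql-server /opt/mssql-tools/bin/sqlcmd -S localhost -U SA -P secret -Q "SELECT @@VERSION"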

FAQ

Error: program requires a machine with at least 2000 megabytes of memory

Start the Docker container as described on the Docker Hub page ("How to use this Image"):

❯ docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=secret" -p 1433:1433 mcr.microsoft.com/mssql/server:2022-latest

Depending on how your docker environment is configured, this could bring up an error:

SQL Server 2019 will run as non-root by default.
This container is running as user mssql.
To learn more visit https://go.microsoft.com/fwlink/?linkid=2099216.
sqlservr: This program requires a machine with at least 2000 megabytes of memory.
/opt/mssql/bin/sqlservr: This program requires a machine with at least 2000 megabytes of memory.

As the error message states, MS SQL Server needs at least 2 GB of RAM, so you must assign the Docker VM more memory. This is configured in the Docker Dashboard.

Hint: Docker has two ways of running containers:

  • using Windows Containers
  • using WSL (Windows Subsystem for Linux)

You can switch between them via the context menu of the Docker icon in the task bar.

With Linux containers (using WSL as the backend), you configure the VM via the file .wslconfig.

This file is in the folder defined by the environment variable %UserProfile%.

To open the file from the Command Prompt, run:

notepad %UserProfile%\.wslconfig

Edit the content and change the memory setting:

[wsl2]
memory=3GB

Restart WSL with the new settings.

❯ wsl --shutdown

Start the container again; now everything should work:

❯ docker run -e "ACCEPT_EULA=Y" -e "SA_PASSWORD=secret" -p 1433:1433 mcr.microsoft.com/mssql/server:2022-latest

Daily: Build a Development Environment with Docker and VS Code

Introduction

Working with different software (samples, compilers, demos) always requires an adequate environment.

Because I don't want to pollute my normal environment (my PC), I decided to use a virtual environment with Docker.

Luckily, VS Code supports this via remote containers, letting you work fully inside these containers.

The Files

.devcontainer\devcontainer.json

{
	"name": "Prolog Environment",

	"dockerComposeFile": [
		"docker-compose.yml"
	],

	"service": "app",
	"workspaceFolder": "/workspace",

	"settings": {},
	"extensions": []
}

.devcontainer\docker-compose.yml

version: '3.8'
services:
  app:
    
    build:
        context: .
        dockerfile: Dockerfile

    container_name: pws_prolog

    volumes:
        - ../workspace:/workspace:cached

    # Overrides default command so things don't shut down after the process ends.
    command: /bin/sh -c "while sleep 1000; do :; done"
 

.devcontainer\Dockerfile

#------------------------------------------------------------------------------
# STAGE 1:
#------------------------------------------------------------------------------
FROM ubuntu:latest AS base_prolog

# Configure Timezone
ENV TZ=Europe/Berlin

RUN echo $TZ > /etc/timezone 

RUN    apt-get update \
    && apt-get install -y tzdata \
    && rm /etc/localtime \
    && ln -snf /usr/share/zoneinfo/$TZ /etc/localtime \
    && dpkg-reconfigure -f noninteractive tzdata \
    && apt-get clean

#
RUN apt-get install --yes build-essential curl sudo git vim

# Create user
RUN    groupadd work -g 1000 \
    && adduser user --uid 1000 --gid 1000 --home /workspace --disabled-password --gecos User

# Setup sudo (assumed intent: passwordless sudo for the created user)
RUN echo "user ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers

# Install Prolog
RUN  apt-get -y install swi-prolog

#
USER user

VOLUME [ "/workspace" ]
WORKDIR /workspace

CMD ["/bin/bash"]

The Explanation

devcontainer.json tells VS Code to build the environment from docker-compose.yml, attach to the service app, and use /workspace as its workspace folder. The compose file builds the image from the Dockerfile, names the container pws_prolog, mounts ../workspace into the container, and keeps the container alive with an endless sleep loop. The Dockerfile builds an Ubuntu-based image with the timezone configured, basic build tools, a non-root user, and SWI-Prolog installed.
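If you want to try the container without VS Code, here is a sketch using plain Docker Compose, run from the project root (pws_prolog is the container name set in docker-compose.yml):

❯ docker compose -f .devcontainer/docker-compose.yml up -d --build
❯ docker exec -it pws_prolog /bin/bash

Inside the container, swipl starts the SWI-Prolog toplevel.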
