Python – Developer Blog

A Developer’s Guide to Simplifying Environment Management

As developers, managing virtual environments is a crucial part of our workflow. With Python projects constantly shifting between dependencies and Python versions, tools that streamline this process are key. Enter uv: a tool that simplifies creating and managing virtual environments, Python packages, and projects.

In this post, I’ll introduce you to uv, walk you through its installation, and provide some tips to help you get started.

What is `uv`?

uv is an extremely fast Python package and project manager, written in Rust. It creates Python virtual environments in a standardized way and manages the packages and Python versions inside them.

By using uv, you can ensure that your virtual environments are created consistently across different projects. Commands such as uv run and uv sync find and use the project's environment on their own, so you rarely need to activate it by hand.

Why Use `uv`?

Managing Python projects often involves juggling various dependencies, versions, and configurations. Without proper tooling, this can become a headache. uv helps by:

Standardizing virtual environments across projects, ensuring consistency.
Simplifying project setup, requiring fewer manual steps to get your environment ready.
Minimizing errors by automatically finding and using the project's virtual environment.

Hint

In our examples, before each command you will see our shell prompt:

❯

Don’t type the ❯ when you enter the command. So, when seeing

❯  uv init

just type

uv init

In addition, when we activate the virtual environment, you will see a changed prompt:

✦ ❯

Installation and Setup

Getting started with uv is easy. Below are the steps for installing and setting up uv for your Python projects.

1. Install `uv`

On macOS or Linux, you can install uv with the official installer script:

❯ curl -LsSf https://astral.sh/uv/install.sh | sh

Alternatively, you can install uv using pip. You’ll need to have Python 3.8+ installed on your system.

❯ pip install uv

2. Create a New Virtual Environment

Once installed, you can use uv to create a virtual environment for your project. Simply navigate to your project directory and run:

❯ uv venv

This command will create a new virtual environment inside the .venv folder within your project.
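To see what such an environment consists of, here is a minimal sketch that builds one with Python's standard-library venv module instead of uv, inside a temporary directory. The folder name demo-venv is an assumption for illustration only. An environment created by uv has the same basic layout: a pyvenv.cfg marker file plus a bin (or Scripts on Windows) folder.

```python
import sys
import tempfile
import venv
from pathlib import Path

# Build a throwaway environment; with_pip=False keeps it fast and offline.
workdir = Path(tempfile.mkdtemp())
env_dir = workdir / "demo-venv"  # hypothetical name; uv uses .venv by default
venv.create(env_dir, with_pip=False)

# Every virtual environment carries a pyvenv.cfg marker file ...
has_marker = (env_dir / "pyvenv.cfg").is_file()

# ... and a folder holding the interpreter and the activation scripts.
scripts_dir = env_dir / ("Scripts" if sys.platform == "win32" else "bin")
has_scripts = scripts_dir.is_dir()

print(has_marker, has_scripts)
```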

3. Activate the Virtual Environment

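uv has no activate subcommand. An environment created by uv is activated like any other virtual environment. On macOS or Linux:

❯ source .venv/bin/activate

On Windows, run .venv\Scripts\activate instead. Often you can skip activation altogether: commands such as uv run and uv pip install find the .venv folder of the project on their own.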
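Activation itself only changes a few environment variables. The following minimal sketch models that effect in Python. The function name activated_env and the example paths are assumptions for illustration, and real activation scripts also adjust the shell prompt.

```python
import os
from pathlib import Path


def activated_env(venv_dir, base_env):
    """Return a copy of base_env that mimics an activated virtual environment."""
    venv_path = Path(venv_dir)
    bin_dir = venv_path / ("Scripts" if os.name == "nt" else "bin")
    env = dict(base_env)
    env["VIRTUAL_ENV"] = str(venv_path)
    # Prepending the bin folder makes its python and tools win the PATH lookup.
    env["PATH"] = str(bin_dir) + os.pathsep + base_env.get("PATH", "")
    # Activation scripts also unset PYTHONHOME so it cannot redirect the interpreter.
    env.pop("PYTHONHOME", None)
    return env


# Hypothetical project path, used only for the demonstration.
env = activated_env("/tmp/project/.venv", {"PATH": "/usr/bin", "PYTHONHOME": "/opt/py"})
print(env["VIRTUAL_ENV"])
```

A dictionary like this can be passed as env= to subprocess.run to start a tool inside the environment without touching the current shell.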

4. Install Your Dependencies

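Once the environment exists, install your project's dependencies with uv's pip-compatible interface. An environment created by uv does not contain pip itself, so a plain pip install may end up using a pip from outside the environment:

❯ uv pip install -r requirements.txt

uv installs the dependencies into the active environment or, if none is active, into the .venv folder it finds in the current directory.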

You can also switch to a pyproject.toml file to manage your dependencies.

First you have to initialize the project:

❯ uv init

Then, add the dependency:

❯ uv add requests

Tips with virtual environments

When a virtual environment is active, its folder with the executables is on your PATH.

Normally this is .venv/bin inside your project. uv creates this default environment the first time you run a command such as uv add, uv sync, or uv run. The path is added to your $PATH variable when you activate the environment.

But, if you want to choose a different folder, you must set the variable UV_PROJECT_ENVIRONMENT to this path:

❯ mkdir playground
❯ cd playground
❯ /usr/local/bin/python3.12 -m venv .venv/python/3.12
❯ . .venv/python/3.12/bin/activate

✦ ❯ which python
.../Playground/.venv/python/3.12/bin/python

✦ ❯ export UV_PROJECT_ENVIRONMENT=$PWD/.venv/python/3.12

✦ ❯ pip install uv
Collecting uv
  Downloading uv-0.4.25-py3-none-macosx_10_12_x86_64.whl.metadata (11 kB)
Downloading uv-0.4.25-py3-none-macosx_10_12_x86_64.whl (13.2 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 13.2/13.2 MB 16.5 MB/s eta 0:00:00
Installing collected packages: uv
Successfully installed uv-0.4.25

✦ ❯ which uv
.../Playground/.venv/python/3.12/bin/uv

✦ ❯ uv init
Initialized project `playground`

With the default settings, uv looks for the project environment in .venv. If a different environment is active, uv prints a warning and ignores it:

✦ ❯ uv add requests
warning: `VIRTUAL_ENV=.venv/python/3.12` does not match the project environment path `.../.venv/python/3.12` and will be ignored

Use the environment variable to tell uv where the virtual environment is installed.

✦ ❯ export UV_PROJECT_ENVIRONMENT=$PWD/.venv/python/3.12

✦ ❯ uv add requests
Resolved 6 packages in 0.42ms
Installed 5 packages in 8ms
 + certifi==2024.8.30
 + charset-normalizer==3.4.0
 + idna==3.10
 + requests==2.32.3
 + urllib3==2.2.3
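The lookup described above can be modelled in a few lines. This is a sketch of the behaviour, not uv's own code. The function name project_environment and the /work/playground path are assumptions, as is the rule that a relative value is interpreted relative to the project root.

```python
from pathlib import Path


def project_environment(project_root, environ):
    """Model of where the project environment is expected (not uv's own code)."""
    root = Path(project_root)
    configured = environ.get("UV_PROJECT_ENVIRONMENT")
    if not configured:
        return root / ".venv"  # default location
    path = Path(configured)
    # Assumption: a relative value is interpreted relative to the project root.
    return path if path.is_absolute() else root / path


default_path = project_environment("/work/playground", {})
custom_path = project_environment(
    "/work/playground", {"UV_PROJECT_ENVIRONMENT": ".venv/python/3.12"}
)
print(default_path, custom_path)
```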

Tip

Use direnv to automatically set your environment:

Install direnv: https://direnv.net/docs/installation.html

Create an .envrc file in the project folder with this content:

. .venv/python/3.12/bin/activate
export UV_PROJECT_ENVIRONMENT=$PWD/.venv/python/3.12

Allow the .envrc file:

✦ ❯ direnv allow

Common `uv` Commands

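Here are a few more useful commands to keep in mind:

Deactivate the environment: deactivate (a shell function defined by the activation script, not a uv command)
Remove the environment: delete its folder, for example rm -rf .venv (uv remove <package> removes a dependency, not the environment)
List the packages installed in the environment: uv pip list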

Tips for Using `uv` Effectively

Consistent Environment Names: By default, uv uses .venv as the folder name for virtual environments. Stick to this default to keep things consistent across your projects.
Integrate uv into your CI/CD pipeline: Ensure that your automated build tools use the same virtual environment setup by adding uv commands to your pipeline scripts.
Use uv in combination with pyproject.toml: If your project uses pyproject.toml for dependency management, uv sync brings the environment in line with the declared dependencies and the lockfile.
Quick Switching: If you manage multiple Python projects, uv run executes a command in the environment of the current project, so you can switch between projects without worrying about which virtual environment is currently active.
Automate Activation: Combine uv with direnv or add an activation hook in your shell to automatically activate the correct environment when you enter a project folder.

Cheatsheet

`uv` Command Cheatsheet

General Commands

`uv init`	Initializes a new project with a `pyproject.toml`.
`uv venv`	Creates a new virtual environment in the `.venv` directory.
`uv add [package]`	Adds a dependency to `pyproject.toml` and installs it.
`uv remove [package]`	Removes a dependency from the project.
`uv sync`	Installs the project's dependencies into the project environment.
`uv lock`	Creates or updates the lockfile `uv.lock`.
`uv run [command]`	Runs a command or script within the project environment.
`uv pip [pip-command]`	Runs a pip-compatible command against the virtual environment.

Working with Dependencies

`uv pip install [package]`	Installs a Python package in the environment.
`uv pip install -r requirements.txt`	Installs the dependencies listed in `requirements.txt`.
`uv pip uninstall [package]`	Uninstalls a Python package from the environment.
`uv pip freeze`	Outputs a list of installed packages and their versions.
`uv pip list`	Lists all installed packages in the environment.
`uv pip show [package]`	Shows details about a specific installed package.

Environment Management

`uv venv --python 3.12`	Creates the virtual environment with a specific Python version.
`source .venv/bin/activate`	Activates the virtual environment (macOS/Linux; on Windows use `.venv\Scripts\activate`).
`deactivate`	Deactivates the active environment.
`rm -rf .venv`	Deletes the virtual environment of the project.
`uv python list`	Lists the Python versions uv can use or install.
`uv python install 3.12`	Installs a Python version managed by uv.

Cleanup and Miscellaneous

`uv cache clean`	Removes entries from uv's cache.
`uv self update`	Upgrades `uv` itself to the latest version (only when installed with the standalone installer).

Using Python and Pip Inside Virtual Environment

`uv run python`	Runs Python within the project environment.
`uv pip [command]`	Runs a pip-compatible command against the virtual environment.

Helper Commands

`uv python find`	Shows the path of the Python interpreter uv would use.
`uv --version`	Displays the installed uv version.
`uv help`	Displays help about available commands.

Introduction

Large Language Models (LLMs) have revolutionized the field of Natural Language Processing (NLP) by providing powerful capabilities for understanding and generating human language. Open-source LLMs have democratized access to these technologies, allowing developers and researchers to innovate and apply these models in various domains. In this blog post, we will explore Ollama, a framework for working with LLMs, and demonstrate how to load webpages, parse them, build embeddings, and query the content using Ollama.

Understanding Large Language Models (LLMs)

LLMs are neural networks trained on vast amounts of text data to understand and generate human language. They can perform tasks such as translation, summarization, question answering, and more. Well-known models include GPT-3 and BERT, along with openly available models such as GPT-Neo and Llama. These models have diverse applications, from chatbots to automated content generation.

Introducing Ollama

Ollama is an open-source tool for running LLMs on your own machine. It downloads and manages models, lets you customize them, and serves them through a command-line interface and a local HTTP API, making it easier to integrate these models into your projects. With Ollama, you can build applications that understand and generate human language without sending data to a hosted service.

Example

The following example from the ollama documentation demonstrates how to use the LangChain framework in conjunction with the Ollama library to load a web page, process its content, create embeddings, and perform a query on the processed data. Below is a detailed explanation of the script’s functionality and the technologies used.

Technologies Used

LangChain: A framework for building applications powered by large language models (LLMs). It provides tools for loading documents, splitting text, creating embeddings, and querying data.
Ollama: Runs LLMs and embedding models locally. In this script, it generates the embeddings for the text data and answers the question.
BeautifulSoup (bs4): A library used for parsing HTML and XML documents. It’s essential for loading and processing web content.
ChromaDB: A vector database used for storing and querying embeddings. It allows efficient similarity searches.

Code Breakdown

Imports and Setup

The script starts by importing the necessary classes: Ollama, WebBaseLoader, RecursiveCharacterTextSplitter, OllamaEmbeddings, Chroma, and RetrievalQA.

from langchain_community.llms import Ollama

from langchain_community.document_loaders import WebBaseLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.chains import RetrievalQA

Loading the Web Page

The script uses WebBaseLoader to load the content of a webpage. In this case, it loads the text of “The Odyssey” by Homer from Project Gutenberg.

print("- get web page")

loader = WebBaseLoader("https://www.gutenberg.org/files/1727/1727-h/1727-h.htm")
data = loader.load()

Splitting the Document

Due to the large size of the document, it is split into smaller chunks using RecursiveCharacterTextSplitter. This ensures that the text can be processed more efficiently.

print("- split documents")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)
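To make the two parameters concrete, here is a deliberately simplified sketch of fixed-size chunking in plain Python. It is not how RecursiveCharacterTextSplitter works internally: that class prefers to cut at separators such as paragraphs, lines, and spaces, while this hypothetical split_text function cuts at fixed character offsets.

```python
def split_text(text, chunk_size=500, chunk_overlap=0):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = split_text("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij', 'j']
```

With chunk_overlap=0, as in the script, neighbouring chunks share no characters. A small overlap can help when an answer straddles a chunk boundary.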

Creating Embeddings and Storing Them

The script creates embeddings for the text chunks using the Ollama library and stores them in ChromaDB, a vector database. This step involves instantiating an embedding model (nomic-embed-text) and using it to generate embeddings for each text chunk.

print("- create vectorstore")

oembed = OllamaEmbeddings(base_url="http://localhost:11434", model="nomic-embed-text")
vectorstore = Chroma.from_documents(documents=all_splits, embedding=oembed)

Performing a Similarity Search

A question is formulated, and the script uses the vector database to perform a similarity search. It retrieves chunks of text that are semantically similar to the question.

print("- ask for similarities")

question = "Who is Neleus and who is in Neleus' family?"
docs = vectorstore.similarity_search(question)
nrofdocs = len(docs)
print(f"{question}: {nrofdocs}")
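What a similarity search does can be sketched without a vector database. The example below ranks a few texts by cosine similarity to a query vector. The three-dimensional vectors and the sample sentences are made up for illustration; real embeddings have hundreds of dimensions, and the distance metric Chroma actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the k texts whose vectors are most similar to query_vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Toy store of (text, embedding) pairs with invented vectors.
store = [
    ("Neleus was the father of Nestor.", [0.9, 0.1, 0.0]),
    ("Odysseus sailed home to Ithaca.", [0.1, 0.9, 0.1]),
    ("Nestor ruled in Pylos.", [0.7, 0.2, 0.1]),
]
query_vector = [1.0, 0.0, 0.0]  # stands in for the embedded question
best = top_k(query_vector, store, k=2)
print(best)
```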

Creating an Ollama Instance and Defining a Retrieval Chain

The script initializes an instance of the Ollama model and sets up a retrieval-based question-answering (QA) chain. This chain is used to process the question and retrieve the relevant parts of the document.

print("- create ollama instance")
ollama = Ollama(
    base_url='http://localhost:11434',
    model="llama3"
)

print("- get qachain")
qachain=RetrievalQA.from_chain_type(ollama, retriever=vectorstore.as_retriever())

Running the Query

Finally, the script invokes the QA chain with the question and prints the result.

print("- run query")
res = qachain.invoke({"query": question})

print(res['result'])

Result

Now let's look at the impressive result:

Try another example: ask a Wikipedia page

In this example, we are going to use LangChain and Ollama to learn about something just a touch more recent. In August 2023, there was a series of wildfires on Maui. There is no way an LLM trained before that time can know about this, since their training data would not include anything as recent as that.

So we can find the Wikipedia article about the fires and ask questions about the contents.

url = "https://en.wikipedia.org/wiki/2023_Hawaii_wildfires"

question="When was Hawaii's request for a major disaster declaration approved?"

8Jul

Daily AI: Analyse Images with AI

by Ralph Daily, Ollama, Python

General

With open-source tools, it is easy to analyse images.

Just install Ollama, download the llava model and run this command (the prompt "Beschreibe das Bild" is German for "Describe the image"):

❯ ollama run llava:latest "Beschreibe das Bild <path to image>"

Try this image: Statue of Liberty

❯ ollama run llava:latest "Beschreibe das Bild /tmp/statue-liberty-liberty-island-new-york.jpg"

Added image '/tmp/statue-liberty-liberty-island-new-york.jpg'
The image shows the Statue of Liberty, an iconic landmark in New York Harbor. This neoclassical statue is a symbol of freedom and democracy, and it has become a universal symbol of the United States. The statue is situated on Liberty Island, which is accessible via ferries from Manhattan.

In the background, you can see a clear sky with some clouds, indicating good weather. The surrounding area appears to be lush with greenery, suggesting that the photo was taken in spring or summer when vegetation is abundant. There are also people visible at the base of the statue, which gives a sense of scale and demonstrates the size of the monument.
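The same request can be sent from Python through Ollama's local HTTP API. This is a minimal sketch, assuming the service listens on the default address http://localhost:11434 and that the llava model is installed. The helper names build_payload and describe_image are hypothetical. The image is passed as a base64 string in the images list of the /api/generate request.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, prompt, model="llava:latest"):
    """Build the JSON body for a non-streaming generate request with one image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path, prompt="Describe the image",
                   url="http://localhost:11434/api/generate"):
    """Send the image to a locally running Ollama service (needs the service)."""
    with open(path, "rb") as handle:
        payload = build_payload(handle.read(), prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


# Offline demonstration: only the payload is built, nothing is sent.
payload = build_payload(b"\x89PNG", "Describe the image")
print(sorted(payload))
```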

10May

Ollama | Create a ChatGPT Clone with Ollama and HyperDiv

by Ralph HyperDiv, Ollama, Python

In this blog post, we’ll explore how to create a ChatGPT-like application using Hyperdiv and Ollama. Hyperdiv provides a flexible framework for building web applications, while Ollama offers powerful local machine learning capabilities.

We will start with the Hyperdiv GPT-chatbot app template and adapt it to leverage Ollama, which runs locally. This guide will walk you through the necessary steps and code changes to integrate these technologies effectively.

TL;DR

The complete code for this tutorial is here.

Step 1: Setting Up Your Environment

Install Ollama

Download Ollama from https://ollama.com/download.

Install (Windows) or unpack (macOS) the downloaded file. This gets you an Ollama app (which allows you to start the Ollama service) and the ollama command-line tool.

Start the Ollama service by starting the Ollama app.

On macOS, you will see an icon for the Ollama service in the menu bar.

Then, open a terminal and type ollama list. This command displays the installed models.

ollama list

To install a model, type

ollama pull llama3

For our ChatGPT Clone, we will use the llama3 model.

If you want to use another model, then search here: https://ollama.com/library

Clone the HyperDiv Examples Repository

Start by cloning or downloading the Hyperdiv GPT-chatbot app. This app provides a basic structure for a chatbot application, which we will modify to work with Ollama.

Go to your desired local folder to store the sources and type

git clone https://github.com/hyperdiv/hyperdiv-apps

Then, go to the folder hyperdiv-apps/gpt-chatbot

Adapt app to use Ollama backend

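First, we create a client that sends all requests to Ollama. Ollama offers an OpenAI-compatible API under /v1, so the OpenAI client can be reused. The client requires an api_key, but Ollama ignores its value: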

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",
)

Then we modify the request function to use this client.

We change

response = openai.ChatCompletion.create(

to

response = client.chat.completions.create(

The next step is changing the access to the response fields. With the old openai module interface, the response data is a dictionary, so the fields are accessed like this:

chunk["choices"]

The client object returns typed objects instead, so we access the field by name:

chunk.choices

The changes are

for chunk in response:
    message = chunk.choices[0].delta
    # delta.content can be None, for example in the final chunk
    state.current_reply += message.content or ""
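The loop can be tried without a running model. In this sketch the classes Delta, Choice, and Chunk are hypothetical stand-ins that only mimic the shape of the streamed objects. They are not part of the openai package.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Delta:
    content: Optional[str]


@dataclass
class Choice:
    delta: Delta


@dataclass
class Chunk:
    choices: List[Choice]


def collect_reply(chunks):
    """Concatenate the text fragments of a streamed reply, skipping empty ones."""
    reply = ""
    for chunk in chunks:
        reply += chunk.choices[0].delta.content or ""
    return reply


# Simulated stream: two text fragments followed by an empty final chunk.
stream = [
    Chunk([Choice(Delta("Hel"))]),
    Chunk([Choice(Delta("lo"))]),
    Chunk([Choice(Delta(None))]),
]
reply = collect_reply(stream)
print(reply)  # Hello
```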

The last step is selecting the correct model:

model = form.select(
    options=("codellama", "llama2", "llama3", "mistral"),
    value="llama3",
    name="gpt-model",
)

That's it! Save all changes.

Prepare Python environment and run app

Install the required modules:

pip install openai hyperdiv

Run the app:

python start.py

Open the browser at http://localhost:8888

Final Result

The complete code for this tutorial is here.

8Oct

BeautifulSoup | Complete Cheatsheet with Examples

by Ralph BeautifulSoup, Python

Installation

pip install beautifulsoup4

from bs4 import BeautifulSoup

Creating a BeautifulSoup Object

Parse HTML string:

html = "<p>Example paragraph</p>"
soup = BeautifulSoup(html, 'html.parser')

Parse from file:

with open("index.html") as file:
  soup = BeautifulSoup(file, 'html.parser')

BeautifulSoup Object Types

When parsing documents and navigating the parse trees, you will encounter the following main object types:

Tag

A Tag corresponds to an HTML or XML tag in the original document:

soup = BeautifulSoup('<p>Hello World</p>', 'html.parser')
p_tag = soup.p

p_tag.name # 'p'
p_tag.string # 'Hello World'

Tags contain nested Tags and NavigableStrings.

NavigableString

A NavigableString represents text content without tags:

soup = BeautifulSoup('Hello World', 'html.parser')
text = soup.string

text # 'Hello World'
type(text) # bs4.element.NavigableString

BeautifulSoup

The BeautifulSoup object represents the parsed document as a whole. It is the root of the tree:

soup = BeautifulSoup('<html><head></head><body></body></html>', 'html.parser')

soup.name # '[document]'
soup.head # <head></head> Tag element

Comment

Comments in HTML are also available as Comment objects:

import re

soup = BeautifulSoup('<!-- This is a comment -->', 'html.parser')
comment = soup.find(string=re.compile('This is'))
type(comment) # bs4.element.Comment

Knowing these core object types helps when analyzing, searching, and navigating parsed documents.

Searching the Parse Tree

By Name

HTML:

<div>
  <p>Paragraph 1</p>
  <p>Paragraph 2</p>
</div>

Python:

paragraphs = soup.find_all('p')
# <p>Paragraph 1</p>, <p>Paragraph 2</p>

By Attributes

HTML:

<div id="content">
  <p>Paragraph 1</p>
</div>

Python:

div = soup.find(id="content")
# <div id="content">...</div>

By Text

HTML:

<p>This is some text</p>

Python:

text = soup.find(string="This is some text")
# 'This is some text' (a NavigableString, not the tag)
p = text.parent
# <p>This is some text</p>

Searching with CSS Selectors

CSS selectors provide a very powerful way to search for elements within a parsed document.

Some examples of CSS selector syntax:

By Tag Name

Select all <p> tags:

soup.select("p")

By ID

Select element with ID “main”:

soup.select("#main")

By Class Name

Select elements with class “article”:

soup.select(".article")

By Attribute

Select tags with a “data-category” attribute:

soup.select("[data-category]")

Descendant Combinator

Select paragraphs inside divs:

soup.select("div p")

Child Combinator

Select direct children paragraphs:

soup.select("div > p")

Adjacent Sibling

Select an h2 that immediately follows an h1:

soup.select("h1 + h2")

General Sibling

Select every h2 sibling that comes after an h1:

soup.select("h1 ~ h2")

By Text

Select elements containing text (:-soup-contains is a Soup Sieve extension that replaces the deprecated :contains):

soup.select(":-soup-contains('Some text')")

By Attribute Value

Select input with type submit:

soup.select("input[type='submit']")

Pseudo-classes

Select first paragraph:

soup.select("p:first-of-type")

Chaining

Select first article paragraph:

soup.select("article > p:nth-of-type(1)")

Accessing Data

HTML:

<p class="content">Some text</p>

Python:

p = soup.find('p')
p.name # "p"
p.attrs # {"class": ["content"]} (class is a multi-valued attribute)
p.string # "Some text"

The Power of find_all()

The find_all() method is one of the most useful and versatile searching methods in BeautifulSoup.

Returns All Matches

find_all() will find and return a list of all matching elements:

all_paras = soup.find_all('p')

This gives you all paragraphs on a page.

Flexible Queries

You can pass a wide range of queries to find_all():

Name: find_all('p')
Attributes: find_all('a', class_='external')
Text: find_all(string=re.compile('summary'))
Limit: find_all('p', limit=2)

Useful Features

Some useful things you can do with find_all():

Get a count: len(soup.find_all('p'))
Iterate through results: for p in soup.find_all('p'):
Convert to text: [p.get_text() for p in soup.find_all('p')]
Extract attributes: [a['href'] for a in soup.find_all('a')]

Why It’s Useful

In summary, find_all() is useful because:

It returns all matching elements
It supports diverse and powerful queries
It makes extracting and processing result data easy

Whenever you need to get a collection of elements from a parsed document, find_all() will likely be your go-to tool.
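The query styles from this section can be combined in one short script. This is a minimal sketch on a made-up HTML snippet, assuming the beautifulsoup4 package is installed. The variable names are for illustration only.

```python
import re

from bs4 import BeautifulSoup

html = (
    '<a class="external" href="https://example.com">Example</a>'
    '<a href="/local">Local</a>'
    "<p>Short summary here</p><p>Other</p><p>Third</p>"
)
soup = BeautifulSoup(html, "html.parser")

# Filter by tag name and CSS class, then pull out an attribute.
external_links = [a["href"] for a in soup.find_all("a", class_="external")]

# Match text nodes against a regular expression.
summary_strings = [str(s) for s in soup.find_all(string=re.compile("summary"))]

# Stop after the first two matches.
first_two = [p.get_text() for p in soup.find_all("p", limit=2)]

print(external_links, summary_strings, first_two)
```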

Navigating Trees

Traverse up and sideways through related elements.
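A minimal sketch of moving up and sideways from one element, assuming beautifulsoup4 is installed. The HTML and the ids are made up, and the markup contains no whitespace between tags so that every sibling is a tag rather than a text node.

```python
from bs4 import BeautifulSoup

html = "<div id='box'><p id='a'>One</p><p id='b'>Two</p><p id='c'>Three</p></div>"
soup = BeautifulSoup(html, "html.parser")

middle = soup.find("p", id="b")

# Up: the enclosing element.
parent_id = middle.parent["id"]

# Sideways: the neighbouring elements on the same level.
prev_text = middle.find_previous_sibling("p").get_text()
next_text = middle.find_next_sibling("p").get_text()

# Down again: only the direct children of the parent.
child_ids = [child["id"] for child in middle.parent.find_all("p", recursive=False)]

print(parent_id, prev_text, next_text, child_ids)
```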

Modifying the Parse Tree

BeautifulSoup provides several methods for editing and modifying the parsed document tree.

HTML:

<p>Original text</p>

Python:

p = soup.find('p')
p.string = "New text"

Edit Tag Names

Change an existing tag name:

tag = soup.find('span')
tag.name = 'div'

Edit Attributes

Add, modify or delete attributes of a tag:

tag['class'] = 'header' # set attribute
tag['id'] = 'main'

del tag['class'] # delete attribute

Edit Text

Change text of a tag:

tag.string = "New text"

Append text to a tag:

tag.append("Additional text")

Insert Tags

Insert a new tag:

new_tag = soup.new_tag("h1")
tag.insert_before(new_tag)

Delete Tags

Remove a tag entirely:

tag.extract()

Wrap/Unwrap Tags

Wrap another tag around:

tag.wrap(soup.new_tag('div'))

Unwrap its contents:

tag.unwrap()

Modifying the parse tree is very useful for cleaning up scraped data or extracting the parts you need.

Outputting HTML

Input HTML:

<p>Hello World</p>

Python:

print(soup.prettify())

# <p>
#  Hello World
# </p>

Integrating with Requests

Fetch a page:

import requests

res = requests.get("https://example.com")
soup = BeautifulSoup(res.text, 'html.parser')

Parsing Only Parts of a Document

When dealing with large documents, you may want to parse only a fragment rather than the whole thing. BeautifulSoup allows for this using SoupStrainers.

There are a few ways to parse only parts of a document:

Basic Usage

A SoupStrainer takes the same arguments as find_all(); it does not understand CSS selectors. Parse only the table elements:

from bs4 import SoupStrainer

only_tables = SoupStrainer("table")
soup = BeautifulSoup(doc, 'html.parser', parse_only=only_tables)

This will parse only the <table> tags from the document.

By Tag Name

Parse only specific tags:

only_divs = SoupStrainer("div")
soup = BeautifulSoup(doc, parse_only=only_divs)

By Function

Pass a function that decides which strings are kept:

def is_short_string(string):
  return string is not None and len(string) < 20

only_short_strings = SoupStrainer(string=is_short_string)
soup = BeautifulSoup(doc, 'html.parser', parse_only=only_short_strings)

This keeps only the strings shorter than 20 characters and drops the tags around them.

By Attributes

Parse tags that contain specific attributes:

has_data_attr = SoupStrainer(attrs={"data-category": True})
soup = BeautifulSoup(doc, parse_only=has_data_attr)

Multiple Conditions

You can combine multiple conditions in one strainer:

strainer = SoupStrainer("div", id="main")
soup = BeautifulSoup(doc, 'html.parser', parse_only=strainer)

This will parse only <div> tags with the ID "main".

Parsing only parts you need can help reduce memory usage and improve performance when scraping large documents.
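The effect of a strainer is easy to verify on a small document. This is a minimal sketch, assuming beautifulsoup4 is installed and using the built-in html.parser (the html5lib parser does not support parse_only). The sample markup is made up.

```python
from bs4 import BeautifulSoup, SoupStrainer

doc = (
    "<html><body><p>Intro</p><a href='/one'>One</a>"
    "<div><a href='/two'>Two</a></div></body></html>"
)

# Keep only the <a> elements while parsing; everything else is skipped.
only_links = SoupStrainer("a")
soup = BeautifulSoup(doc, "html.parser", parse_only=only_links)

hrefs = [a["href"] for a in soup.find_all("a")]
paragraph = soup.find("p")  # the paragraph was never added to the tree

print(hrefs, paragraph)
```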

Dealing with Encoding

When parsing documents, you may encounter encoding issues. Here are some ways to handle encoding:

Specify at Parse Time

Pass the from_encoding parameter when creating the BeautifulSoup object. It only takes effect when the document is passed as bytes:

soup = BeautifulSoup(doc, 'html.parser', from_encoding='utf-8')

This tells BeautifulSoup which encoding to use when it decodes the document.

Encode Tag Contents

You can encode the contents of a tag:

tag.string.encode("utf-8")

Use this when outputting tag strings.

Encode Entire Document

To encode the entire BeautifulSoup document:

soup.encode("utf-8")

This returns a byte string with the encoded document.

Pretty Print with Encoding

Specify the encoding when pretty printing. With an encoding, prettify() returns bytes instead of a string:

print(soup.prettify(encoding="utf-8"))

Unicode Dammit

BeautifulSoup's UnicodeDammit class can detect the encoding of an incoming document and convert it to Unicode:

from bs4 import UnicodeDammit

dammit = UnicodeDammit(doc)
text = dammit.unicode_markup # the document as a Unicode string
dammit.original_encoding # the encoding that was detected

This converts even poorly encoded documents to Unicode.

Properly handling encoding ensures your scraped data is decoded and output correctly when using BeautifulSoup.

21Sep

Django | Debugging Django-App in VS Code

by Ralph Django, Python

See here how to configure VS Code:

Switch to Run view in VS Code (using the left-side activity bar or F5). You may see the message
“To customize Run and Debug create a launch.json file”.
This means that you don’t yet have a launch.json file containing debug configurations. VS Code can create that for you if you click on the create a launch.json file link:

Select the link and VS Code will prompt for a debug configuration. Select Django from the dropdown and VS Code will populate a new launch.json file with a Django run configuration.
The launch.json file contains a number of debugging configurations, each of which is a separate JSON object within the configuration array.

Scroll down to and examine the configuration with the name “Python: Django”:

{
  "version": "0.2.0",
  "configurations": [
    {
      "name": "Python: Django",
      "type": "python",
      "request": "launch",
      "program": "${workspaceFolder}/manage.py",
      "args": ["runserver"],
      "django": true,
      "justMyCode": true
    }
  ]
}

This configuration tells VS Code to run "${workspaceFolder}/manage.py" using the selected Python interpreter and the arguments in the args list.
Launching the VS Code debugger with this configuration, then, is the same as running python manage.py runserver in the VS Code Terminal with your activated virtual environment. (You can add a port number like "5000" to args if desired.)
The "django": true entry also tells VS Code to enable debugging of Django page templates.

Test the configuration by selecting the Run > Start Debugging menu command, or selecting the green Start Debugging arrow next to the list (F5):

Ctrl+click the http://127.0.0.1:8000/ URL in the terminal output window to open the browser and see that the app is running properly.

Close the browser and stop the debugger when you’re finished. To stop the debugger, use the Stop toolbar button (the red square) or the Run > Stop Debugging command (Shift+F5).

You can now use the Run > Start Debugging at any time to test the app, which also has the benefit of automatically saving all modified files.

23Aug

PySpark: Getting Started

by Ralph Allgemein, Apache Spark, PySpark, Python

PySpark and Jupyter Notebook

export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

Then, run

pyspark

20Jul

Python | Cookbook

by Ralph Cookbook, PIP, Python

Pip

List all available versions of a package

pip install --use-deprecated=legacy-resolver <module>==

wget -q https://pypi.org/pypi/PyJWT/json -O - | python -m json.tool -
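The JSON document returned by PyPI can also be evaluated in Python. This is a minimal sketch: the function names are hypothetical, the sample metadata is made up and heavily trimmed, and the sorting is plain string sorting, not version-aware.

```python
import json
import urllib.request


def versions_from_metadata(metadata):
    """Return the release versions listed in a PyPI JSON API document."""
    return sorted(metadata["releases"].keys())


def fetch_versions(package):
    """Download the metadata for a package from PyPI (needs network access)."""
    url = f"https://pypi.org/pypi/{package}/json"
    with urllib.request.urlopen(url) as response:
        return versions_from_metadata(json.load(response))


# Offline demonstration with a trimmed, made-up metadata document.
sample = {"info": {"name": "demo"}, "releases": {"1.1.0": [], "1.0.0": [], "2.0.0": []}}
versions = versions_from_metadata(sample)
print(versions)  # ['1.0.0', '1.1.0', '2.0.0']
```

Calling fetch_versions("PyJWT") would query the same URL as the wget command.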

Show Pip Configuration

pip config list

Set Pip Cache Folder

pip config set global.cache-dir D:\Temp\Pip\Cache

Location of packages

pip show <module>

Installation

Update all Python Packages with Powershell

pip freeze |

Update Packages

Update requirements.txt and turn every pinned version number into a minimum version (sed -i '' is the macOS/BSD form; GNU sed on Linux uses sed -i without the empty argument)

sed -i '' 's/==/>=/g' requirements.txt
pip install -U -r requirements.txt
pip freeze > requirements.txt

pip install --upgrade --force-reinstall -r requirements.txt

pip install --ignore-installed -r requirements.txt
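The sed step can be written in Python as well, which avoids the differences between sed variants. This is a minimal sketch, and the function name relax_pins is an assumption. Like the sed command, it rewrites every == in the text.

```python
def relax_pins(requirements_text):
    """Turn exact pins (==) into minimum versions (>=), leaving other lines alone."""
    return requirements_text.replace("==", ">=")


original = "requests==2.32.3\nidna>=3.10\n"
relaxed = relax_pins(original)
print(relaxed)
```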

8Jul

FastAPI | Working with FastAPI

by Ralph FastAPI, Python, Tutorial

Installation

FastAPI is built on the following powerful packages:

Starlette for the web parts.
Pydantic for the data parts.

pip install fastapi

Or install FastAPI with all optional components (the quotes keep shells such as zsh from interpreting the brackets):

pip install "fastapi[all]"

pip install "uvicorn[standard]"

Working with Databases

Alembic

Alembic is a lightweight database migration tool for use with the SQLAlchemy Database Toolkit for Python.

pip install alembic
alembic init alembic
alembic list_templates
alembic init --template generic ./scripts

Create a migration script for the table 'account'

alembic revision -m "create account table"

Edit the migration script

def upgrade():
    op.create_table(
        'account',
        sa.Column('id', sa.Integer, primary_key=True),
        sa.Column('name', sa.String(50), nullable=False),
        sa.Column('description', sa.Unicode(200)),
    )

def downgrade():
    op.drop_table('account')

Run the migration

alembic upgrade head

Create a migration script for adding a column

alembic revision -m "Add a column"

Edit the migration script

def upgrade():
    op.add_column('account', sa.Column('last_transaction_date', sa.DateTime))

def downgrade():
    op.drop_column('account', 'last_transaction_date')

Run the migration

alembic upgrade head

11Jun

SAS | Migrate from SAS to Python

by Ralph Python, SAS

Introduction

Cookbook

`proc freq`

proc freq data=mydata;
    tables myvar / nocol nopercent nocum;
run;

mydata.myvar.value_counts().sort_index()

sort by frequency

proc freq order=freq data=mydata;
	tables myvar / nocol nopercent nocum;
run;

mydata.myvar.value_counts()

with missing

proc freq order=freq data=mydata;
    tables myvar / nocol nopercent nocum missing;
run;

mydata.myvar.value_counts(dropna=False)
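The three proc freq variants can be compared side by side on a small table. This is a minimal sketch, assuming pandas is installed. The DataFrame content is made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value.
mydata = pd.DataFrame({"myvar": ["b", "a", "b", None, "c", "b", "a"]})

# Ordered by value, like a plain proc freq.
by_value = mydata.myvar.value_counts().sort_index()

# Ordered by descending frequency, like order=freq.
by_frequency = mydata.myvar.value_counts()

# Missing values get their own row, like the missing option.
with_missing = mydata.myvar.value_counts(dropna=False)

print(by_value.to_dict())
print(by_frequency.index[0], int(with_missing.sum()))
```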

`proc means`

proc means data=mydata n mean std min max p25 median p75;
    var myvar;
run;

mydata.myvar.describe()

more percentiles

proc means data=mydata n mean std min max p1 p5 p10 p25 median p75 p90 p95 p99;
	var myvar;
run;

mydata.myvar.describe(percentiles=[.01, .05, .1, .25, .5, .75, .9, .95, .99])

`data` step

concatenate datasets

data concatenated;
    set mydata1 mydata2;
run;

concatenated = pandas.concat([mydata1, mydata2])

`proc contents`

proc contents data=mydata;
run;

mydata.info()

save output

proc contents noprint data=mydata out=contents;
run;

contents = mydata.dtypes  # column names and types; mydata.info() only prints and returns None

Misc

number of rows in a datastep

* Try this for size: http://www2.sas.com/proceedings/sugi26/p095-26.pdf;

len(mydata)
