DBA Blogs
Hi There,
Regarding all the recent news about the APEX AI Assistant for code generation: does it come out of the box, or do we need to configure and pay for an AI/LLM provider like ChatGPT?
I am referring to your recent video, Build AI-Powered Enterprise Apps Faster with Oracle APEX (https://www.youtube.com/watch?v=qZD8wtn7qoI).
Regards,
Dr. Gyana
We see this error in the daily full=Y export backup:
Processing object type DATABASE_EXPORT/SCHEMA/PROCACT_SCHEMA
ORA-39127: unexpected error from call to TAG: SCHEDULER Calling: SYS.DBMS_SCHED_CLASS_EXPORT.GRANT_EXP obj: SYS.IDX_RB$CLASS_54724 - SCHEDULER CLASS
ORA-01031: insufficient privileges
ORA-06512: at "SYS.DBMS_SCHED_MAIN_EXPORT", line 2601
ORA-06512: at "SYS.DBMS_SCHED_CLASS_EXPORT", line 41
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_SCHED_MAIN_EXPORT", line 2601
ORA-06512: at "SYS.DBMS_SCHED_CLASS_EXPORT", line 41
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_METADATA", line 11144
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
Processing object type DATABASE_EXPORT/EARLY_POST_INSTANCE_IMPCALLOUT/MARKER
But when we checked (as SYSDBA), this object doesn't exist in the container the job runs in:
SYS.IDX_RB$CLASS_54724
No changes were made to the database or to the shell script that runs the Data Pump export backup.
We had seen the same issue on 10 June 2024, but without any changes the next days' export backups ran fine.
Yesterday the same error appeared again.
We are not sure what could be causing it. Here is the export command we use for the daily backup
(the $vars are locations/filenames generated in the script):
expdp $pcsbackup job_name=${jobname} \
  directory=DIR_DBEXP \
  dumpfile=${dmpfile_name} \
  logfile=${logfile_name} \
  full=y \
  cluster=no \
  parallel=4 \
  exclude=STATISTICS \
  exclude=SCHEMA:\" IN \(SELECT USERNAME FROM DBATOOL.EXPDP_EXCLUDE_DWP_BACKUP_TABLES\)\" \
  ENCRYPTION=DATA_ONLY \
  ENCRYPTION_PASSWORD=<OurProdEncPassword> \
  ENCRYPTION_MODE=DUAL \
  COMPRESSION=DATA_ONLY
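As a first diagnostic step, here is a hedged sketch (not a definitive fix): run the lookup below as SYSDBA in the same container the export job connects to. One possibility is that the object named in the ORA-39127 callout existed only briefly while the job ran, which would match the intermittent behaviour you describe.
<code>-- Hypothetical diagnostic query; the object name is taken from the ORA-39127 message above.
SELECT owner, object_name, object_type, created, status
FROM   dba_objects
WHERE  object_name = 'IDX_RB$CLASS_54724';</code>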
Our daily expdp job runs with full=yes and worked OK until last night.
Last night the expdp log showed:
ORA-31693: Table data object
"AUDSYS"."AUD$UNIFIED":"SYS_P23021" failed to load/unload and is being skipped due to error:
ORA-02354: error in exporting/importing data
ORA-08103: object no longer exists
An attempt to manually export the same object, using the same user created for the full database export, fails with:
expdp $pcsbackup cluster=NO parallel=1 job_name=expdp_tbl DUMPFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs_%U.dmp LOGFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs.log directory=dir_dbexp exclude=statistics encryption=data_only ENCRYPTION_PASSWORD=***** ENCRYPTION_MODE=DUAL COMPRESSION=DATA_ONLY full=no tables=AUDSYS.AUD\$UNIFIED\:SYS_P23021
Export: Release 19.0.0.0.0 - Production on Wed Jun 19 09:03:27 2024
Version 19.20.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Starting "PCSBACKUP"."EXPDP_TBL": /********@gecdwp_pcsbackup cluster=NO parallel=1 job_name=expdp_tbl DUMPFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs_%U.dmp LOGFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs.log directory=dir_dbexp exclude=statistics encryption=data_only ENCRYPTION_PASSWORD=******** ENCRYPTION_MODE=DUAL COMPRESSION=DATA_ONLY full=no tables=AUDSYS.AUD$UNIFIED:SYS_P23021
ORA-39166: Object AUDSYS.AUD$UNIFIED was not found or could not be exported or imported.
ORA-31655: no data or metadata objects selected for job
Job "PCSBACKUP"."EXPDP_TBL" completed with 2 error(s) at Wed Jun 19 09:04:08 2024 elapsed 0 00:00:39
We tried different combinations for the table name:
AUDSYS.AUD\$UNIFIED:SYS_P23021 -- and the variant above - same error
expdp $pcsbackup cluster=NO parallel=1 job_name=expdp_tbl DUMPFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs_%U.dmp LOGFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs.log directory=dir_dbexp exclude=statistics encryption=data_only ENCRYPTION_PASSWORD=***** ENCRYPTION_MODE=DUAL COMPRESSION=DATA_ONLY full=no tables=AUDSYS.AUD$UNIFIED:SYS_P23021
Export: Release 19.0.0.0.0 - Production on Wed Jun 19 09:00:37 2024
Version 19.20.0.0.0
Copyright (c) 1982, 2019, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Starting "PCSBACKUP"."EXPDP_TBL": /********@gecdwp_pcsbackup cluster=NO parallel=1 job_name=expdp_tbl DUMPFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs_%U.dmp LOGFILE=expdp_gecdwp_full_2024_0618_222801_failedobjs.log directory=dir_dbexp exclude=statistics encryption=data_only ENCRYPTION_PASSWORD=******** ENCRYPTION_MODE=DUAL COMPRESSION=DATA_ONLY full=no tables=AUDSYS.AUD:SYS_P23021
ORA-39166: Object AUDSYS.AUD was not found or could not be exported or imported.
ORA-31655: no data or metadata objects selected for job
Job "PCSBACKUP"."EXPDP_TBL" completed with 2 error(s) ...
Hi,
In order to improve read performance for a LOB column in my table, I enabled the CACHE option for it.
But when I check the buffer cache using gv$bh (I ran some SELECT queries on the table so the LOB could be cached), I cannot find any entries for my cached LOB.
<code>SELECT *
FROM gv$bh
WHERE lobid =
(SELECT object_id
FROM dba_lobs lob, dba_objects obj
WHERE lob.table_name = 'MY_TABLE' -- it contains only one lob column, so the returned result is unique
AND lob.SEGMENT_NAME = obj.object_name
AND lob.owner = obj.owner);</code>
Is this the way to go, or am I missing something?
Thanks.
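For comparison, here is a hedged sketch of an alternative way to look for the LOB segment's blocks in the buffer cache: match gv$bh.objd against the LOB segment's DATA_OBJECT_ID rather than using the lobid column. The owner and table names are placeholders, and this assumes the LOB data is stored out of line.
<code>SELECT bh.inst_id, COUNT(*) AS cached_blocks
FROM   gv$bh bh
WHERE  bh.objd = (SELECT obj.data_object_id
                  FROM   dba_lobs lob
                  JOIN   dba_objects obj
                         ON  obj.owner       = lob.owner
                         AND obj.object_name = lob.segment_name
                  WHERE  lob.owner      = 'MY_SCHEMA'   -- placeholder owner
                  AND    lob.table_name = 'MY_TABLE')   -- assumes a single LOB column
GROUP  BY bh.inst_id;</code>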
Could you please kindly check and advise on the following: I need to find the break hours between two dates, and also whether those break minutes fall within night hours (10 PM to 6 AM the next day).
Example:
The employee's work schedule is 11-MAY-2012 16:00:00 to 12-MAY-2012 09:00:00.
The breaks are as follows:
8:00PM to 8:30PM - 1st Break
12:30 Am to 1:00 AM - 2nd Break
5:00AM to 05:30 AM - 3rd break
Two of the breaks fall between 10 PM and 6 AM the next day, hence my night-hours break time is 60 minutes.
I am using the following, but is there a better solution?
<code> (SELECT COUNT(1)
FROM (SELECT TO_CHAR ((:SHIFT_START + ((val*4.5)/24)), 'YYYY-MM-DD HH24:MI:SS') TIME ,VAL
FROM (select to_number(val,0) VAL
from (select distinct regexp_substr('1,2,3,4,5','[^,]+',1,level) val
from dual
connect by level <=regexp_count('1,2,3,4,5',',')+1
order by val)
WHERE ROWNUM <=
( FLOOR (
( 24
* ( TO_DATE (
TO_CHAR (:SHIFT_END, 'YYYY-MM-DD hh24:mi'),
'YYYY-MM-DD hh24:mi')
- TO_DATE (
TO_CHAR (:SHIFT_START,
'YYYY-MM-DD hh24:mi'),
'YYYY-MM-DD hh24:mi')))
/ 4))
)
) DT
WHERE TO_DATE (DT.TIME,'YYYY-MM-DD HH24:MI:SS')
BETWEEN TO_DATE(TRUNC(:SHIFT_START),'YYYY-MM-DD HH24:MI:SS')+22/24 AND
TO_DATE(TRUNC(:SHIFT_START),'YYYY-MM-DD HH24:MI:SS')+24/24+6/24
)*0.5
ELSE 0 END NIGHT_BREAK_HOURS</code>
Your kind help in checking and advising on this would be appreciated.
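A simpler approach is to compute, for each break, the overlap between the break interval and the night window, rather than sampling points along the shift. The sketch below is hedged: the BREAKS rows are hard-coded placeholders standing in for your real break data, and the night window is assumed to run from 22:00 on the shift start date to 06:00 the following day.
<code>WITH breaks (break_start, break_end) AS (
  SELECT TO_DATE('11-MAY-2012 20:00','DD-MON-YYYY HH24:MI'),
         TO_DATE('11-MAY-2012 20:30','DD-MON-YYYY HH24:MI') FROM dual UNION ALL
  SELECT TO_DATE('12-MAY-2012 00:30','DD-MON-YYYY HH24:MI'),
         TO_DATE('12-MAY-2012 01:00','DD-MON-YYYY HH24:MI') FROM dual UNION ALL
  SELECT TO_DATE('12-MAY-2012 05:00','DD-MON-YYYY HH24:MI'),
         TO_DATE('12-MAY-2012 05:30','DD-MON-YYYY HH24:MI') FROM dual
),
night AS (
  -- night window: 22:00 on the shift start date to 06:00 the next day
  SELECT TRUNC(TO_DATE('11-MAY-2012 16:00','DD-MON-YYYY HH24:MI')) + 22/24     AS night_start,
         TRUNC(TO_DATE('11-MAY-2012 16:00','DD-MON-YYYY HH24:MI')) + 1 + 6/24  AS night_end
  FROM   dual
)
SELECT ROUND(SUM(
         GREATEST(LEAST(b.break_end, n.night_end)
                - GREATEST(b.break_start, n.night_start), 0)
       ) * 24 * 60) AS night_break_minutes
FROM   breaks b CROSS JOIN night n;</code>
For the sample breaks above this returns 60, matching the expected night-hours break time, since only the second and third breaks overlap the 22:00-06:00 window.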
1) I am using an Oracle XE 21c database and APEX 24.1, deployed on Tomcat 9.0.90 on my laptop.
2) I am trying to connect to my office server using a REST Enabled SQL service, where I have REST-enabled the entire schema.
3) On my laptop, when I try to connect, it gives me the error: "The URL of this remote server does not point to a REST Enabled SQL service. Please check the details of your REST Enabled SQL service definition."
4) When I enter the same URL in a browser, ending with a table name, it shows the data.
5) Before 24.1, meaning in 18.x to 20.x, it was working.
6) Does REST Enabled SQL require the same ORDS version on both sides? Please guide, because the APEX and ORDS versions are different on the two sides (my laptop and the office server I am trying to use remotely).
7) At my office I REST-enabled the schema from SQL Workshop -> RESTful Services and also ran the following script:
BEGIN
ORDS.enable_schema(
p_enabled => TRUE,
p_schema => 'SVM',
p_url_mapping_type => 'BASE_PATH',
p_url_mapping_pattern => 'hr',
p_auto_rest_auth => FALSE
);
COMMIT;
END;
/
Please refer to the attached video:
https://drive.google.com/file/d/1VONL-ngqD53xjnxKLWSzob63qtD-jSdy/view?usp=sharing
The same case appears on apex.oracle.com:
workspace :[redacted]
username:[redacted]
password :[redacted]
Application id :[redacted]
REST Enabled SQL service name: svm
I understand how to use the INLINE pragma before the invocation of a subprogram that should be inlined, but based on the Oracle documentation (https://docs.oracle.com/en/database/oracle/oracle-database/19/lnpls/plsql-optimization-and-tuning.html#GUID-2E78813E-CF29-409D-9F8B-AA24B294BFA2) it should also be possible to use PRAGMA INLINE to affect every invocation of the specified subprogram:
<i>When the INLINE pragma immediately precedes a declaration, it affects: Every invocation of the specified subprogram in that declaration</i>
Unfortunately, when trying to do so, it does not seem to work the way I read the documentation:
<code>
ALTER SESSION SET PLSQL_WARNINGS = 'ENABLE:ALL';
CREATE OR REPLACE PACKAGE foo IS
PROCEDURE bar;
END foo;
/
CREATE OR REPLACE PACKAGE BODY foo IS
PRAGMA INLINE (sub, 'YES');
PROCEDURE sub;
PROCEDURE bar IS
BEGIN
dbms_output.put_line('bar');
sub;
END bar;
PROCEDURE sub IS
BEGIN
dbms_output.put_line('sub');
END sub;
END foo;
/
SELECT * FROM user_errors;
</code>
PLW-05011: pragma INLINE for procedure 'SUB' does not apply to any calls
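For comparison, here is a hedged sketch of the other placement the documentation describes: the pragma immediately precedes the statement that contains the call, rather than a forward declaration at package level. It assumes PLSQL_OPTIMIZE_LEVEL is at least 2 and is offered as something to try, not as a confirmed explanation of the PLW-05011 above.
<code>
CREATE OR REPLACE PACKAGE BODY foo IS
  PROCEDURE sub IS
  BEGIN
    dbms_output.put_line('sub');
  END sub;

  PROCEDURE bar IS
  BEGIN
    dbms_output.put_line('bar');
    PRAGMA INLINE (sub, 'YES');  -- immediately precedes the statement containing the call
    sub;
  END bar;
END foo;
/
</code>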
This video is a step-by-step tutorial to build RAG-powered AI agents using LlamaIndex, CrewAI and OpenAI on your own data.
Code:
<code>
!pip install llama-index-core
!pip install llama-index-readers-file
!pip install llama-index-embeddings-openai
!pip install llama-index-llms-llama-api
!pip install 'crewai[tools]'

import os
from crewai import Agent, Task, Crew, Process
from crewai_tools import LlamaIndexTool
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.llms.openai import OpenAI

reader = SimpleDirectoryReader(input_files=["finance.csv"])
docs = reader.load_data()
docs[1].get_content()

from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

llm = OpenAI(model="gpt-4o")
index = VectorStoreIndex.from_documents(docs)
query_engine = index.as_query_engine(similarity_top_k=5, llm=llm)

query_tool = LlamaIndexTool.from_query_engine(
    query_engine,
    name="Finance Query Tool",
    description="Use this tool to lookup the financial data of products and their sales",
)
query_tool.args_schema.schema()

researcher = Agent(
    role="Senior Market Analyst",
    goal="Uncover insights about product sales trends",
    backstory="""You work at a market research firm.
    Your goal is to understand sales patterns across different product categories.""",
    verbose=True,
    allow_delegation=False,
    tools=[query_tool],
)

writer = Agent(
    role="Product Content Specialist",
    goal="Craft compelling content on product trends",
    backstory="""You are a renowned Content Specialist, known for your insightful and engaging articles.
    You transform complex sales data into compelling narratives.""",
    verbose=True,
    allow_delegation=False,
)

# Create tasks for your agents
task1 = Task(
    description="""Analyze the sales data of top 5 products in the last quarter.""",
    expected_output="Detailed sales report with trends and insights",
    agent=researcher,
)

task2 = Task(
    description="""Using the insights provided, develop an engaging blog post that highlights
    the top-selling products and their market trends. Your post should be informative yet accessible,
    catering to a casual audience. Make it sound cool, avoid complex words.""",
    expected_output="Full blog post of at least 4 paragraphs",
    agent=writer,
)

# Instantiate your crew with a sequential process
crew = Crew(
    agents=[researcher, writer],
    tasks=[task1, task2],
    verbose=2,  # You can set it to 1 or 2 to different logging levels
)

result = crew.kickoff()
print("######################")
print(result)
</code>
This video is a step-by-step tutorial to create an agentic application locally with LangChain and Ollama, with function calling.
Code:
<code>
conda create -n agentic python=3.11 -y && conda activate agentic
pip install langchain-experimental
</code>
<code>
from langchain_experimental.llms.ollama_functions import OllamaFunctions

model = OllamaFunctions(model="llama3:8b", format="json")

model = model.bind_tools(
    tools=[
        {
            "name": "get_current_weather",
            "description": "Get the current weather in a given location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g. San Francisco, CA",
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                    },
                },
                "required": ["location"],
            },
        }
    ],
    function_call={"name": "get_current_weather"},
)

from langchain_core.messages import HumanMessage

model.invoke("what is the weather in Boston?")
</code>
This video is a step-by-step tutorial to install and integrate AgentOps with AutoGen to monitor, test and replay analytics.
Code:
<code>
conda create -n agentops python=3.11 -y && conda activate agentops
pip install pyautogen agentops
export AGENTOPS_API_KEY=""
export OPENAI_API_KEY=""
</code>
<code>
import agentops
import autogen
from autogen import ConversableAgent, UserProxyAgent, config_list_from_json

agentops.init(tags=["fahdmirza"])

import os

llm_config = {
    "config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}],
}

assistant = autogen.ConversableAgent("agent", llm_config=llm_config)
user_proxy = UserProxyAgent("user", code_execution_config=False)

agentops.end_session("Success")
</code>
================================
<code>
import agentops
import autogen
from typing import Annotated, Literal
from autogen import ConversableAgent, register_function
import os

agentops.start_session(tags=["agentictools3"])

Operator = Literal["+", "-", "*", "/"]

def calculator(a: int, b: int, operator: Annotated[Operator, "operator"]) -> int:
    if operator == "+":
        return a + b
    elif operator == "-":
        return a - b
    elif operator == "*":
        return a * b
    elif operator == "/":
        return int(a / b)
    else:
        raise ValueError("Invalid operator")

llm_config = {
    "config_list": [{"model": "gpt-4", "api_key": os.environ["OPENAI_API_KEY"]}],
}

# Create the agent that uses the LLM.
assistant = ConversableAgent(
    name="Assistant",
    system_message="You are a helpful AI assistant. "
    "You can help with simple calculations. "
    "Return 'TERMINATE' when the task is done.",
    llm_config=llm_config,
)

# The user proxy agent is used for interacting with the assistant agent
# and executes tool calls.
user_proxy = ConversableAgent(
    name="User",
    llm_config=False,
    is_termination_msg=lambda msg: msg.get("content") is not None and "TERMINATE" in msg["content"],
    human_input_mode="NEVER",
)

assistant.register_for_llm(name="calculator", description="A simple calculator")(calculator)
user_proxy.register_for_execution(name="calculator")(calculator)

# Register the calculator function to the two agents.
register_function(
    calculator,
    caller=assistant,     # The assistant agent can suggest calls to the calculator.
    executor=user_proxy,  # The user proxy agent can execute the calculator calls.
    name="calculator",    # By default, the function name is used as the tool name.
    description="A simple calculator",  # A description of the tool.
)

# Let the assistant start the conversation. It will end when the user types exit.
user_proxy.initiate_chat(assistant, message="What is (1423 - 123) / 3 + (32 + 23) * 5?")

agentops.end_session("Success")
</code>
This video is a step-by-step tutorial to install Vanna.ai locally with Ollama and your own custom database, to chat with the database and convert text to SQL using AI.
Code:
<code>
import sqlite3
import pandas as pd

df = pd.read_csv('/home/Ubuntu/finance.csv')
conn = sqlite3.connect('financedb.db')
df = pd.read_sql_query("SELECT * FROM finance", conn)

# Print the DataFrame
print(df)

# Close the connection
conn.close()

%pip install 'vanna[chromadb,ollama]'

from vanna.ollama import Ollama
from vanna.chromadb import ChromaDB_VectorStore

class MyVanna(ChromaDB_VectorStore, Ollama):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        Ollama.__init__(self, config=config)

vn = MyVanna(config={'model': 'mistral'})
vn.connect_to_sqlite('financedb.db')

df_ddl = vn.run_sql("SELECT type, sql FROM sqlite_master WHERE sql is not null")
for ddl in df_ddl['sql'].to_list():
    vn.train(ddl=ddl)

# The following are methods for adding training data. Make sure you modify the examples to match your database.
# DDL statements are powerful because they specify table names, column names, types, and potentially relationships.
vn.train(ddl="""
CREATE TABLE IF NOT EXISTS myproducts (
    id INT PRIMARY KEY,
    productname VARCHAR(100),
    Segment INT
)
""")

# Sometimes you may want to add documentation about your business terminology or definitions.
vn.train(documentation="Our business defines deals with various products in various countries related to finance.")

# You can also add SQL queries to your training data. This is useful if you have some queries already laying around.
# You can just copy and paste those from your editor to begin generating new SQL.
vn.train(sql="SELECT * FROM finance WHERE Product='Montana'")

# At any time you can inspect what training data the package is able to reference.
training_data = vn.get_training_data()
training_data

# Whenever you ask a new question, it will find the 10 most relevant pieces of training data
# and use it as part of the LLM prompt to generate the SQL.
vn.ask(question="How many products are sold in Canada?")

from vanna.flask import VannaFlaskApp
app = VannaFlaskApp(vn)
app.run()
</code>
This video is a step-by-step tutorial to install SimpleTuner locally and fine-tune Stable Diffusion 3 Medium on your own images.
Code:
<code>
conda create -n simpletuner python=3.11 -y
git clone https://github.com/bghira/SimpleTuner.git
cd SimpleTuner
pip install https://github.com/turboderp/exllamav2/releases/download/v0.1.5/exllamav2-0.1.5+cu118.torch2.3.1-cp311-cp311-linux_x86_64.whl
pip install transformers
pip install -U huggingface-hub
pip install xformers torchvision torchaudio
curl -sSL https://install.python-poetry.org | python3 -
pip install -U poetry pip
poetry install --no-root
apt -y install git-lfs
mkdir -p datasets
pushd datasets
git clone https://huggingface.co/datasets/ptx0/pseudo-camera-10k
popd
# Get config files from this link: https://github.com/bghira/SimpleTuner/blob/main/documentation/QUICKSTART.md
bash train_sdxl.sh
</code>
This video is a step-by-step tutorial to build from scratch AI-powered database agents to use with your own data.
Code:
<code>
!pip install pyodbc==5.1.0
!pip install tabulate==0.9.0
!pip install openai==1.12.0
!pip install langchain==0.1.6
!pip install langchain-community==0.0.20
!pip install langchain-core==0.1.23
!pip install langchain-experimental==0.0.49
!pip install langchain-openai==0.0.5
!pip install pandas==2.2.2

import os
from IPython.display import Markdown, HTML, display
from langchain_openai import OpenAI
from langchain.agents import create_sql_agent
from langchain.agents.agent_toolkits import SQLDatabaseToolkit
from langchain.sql_database import SQLDatabase
from langchain_openai import AzureChatOpenAI
from sqlalchemy import create_engine
import pandas as pd

file_url = "finance.csv"
df = pd.read_csv(file_url).fillna(value = 0)

database_file_path = "test.db"
engine = create_engine(f'sqlite:///{database_file_path}')
df.to_sql(
    'prodinfo',
    con=engine,
    if_exists='replace',
    index=False
)

MSSQL_AGENT_PREFIX = """
You are an agent designed to interact with a SQL database.

## Instructions:
- Given an input question, create a syntactically correct {dialect} query to run, then look at the results of the query and return the answer.
- Unless the user specifies a specific number of examples they wish to obtain, **ALWAYS** limit your query to at most {top_k} results.
- You can order the results by a relevant column to return the most interesting examples in the database.
- Never query for all the columns from a specific table, only ask for the relevant columns given the question.
- You have access to tools for interacting with the database.
- You MUST double check your query before executing it. If you get an error while executing a query, rewrite the query and try again.
- DO NOT make any DML statements (INSERT, UPDATE, DELETE, DROP etc.) to the database.
- DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE, ONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE.
- Your response should be in Markdown. However, **when running a SQL Query in "Action Input", do not include the markdown backticks**. Those are only for formatting the response, not for executing the command.
- ALWAYS, as part of your final answer, explain how you got to the answer on a section that starts with: "Explanation:". Include the SQL query as part of the explanation section.
- If the question does not seem related to the database, just return "I don\'t know" as the answer.
- Only use the below tools. Only use the information returned by the below tools to construct your query and final answer.
- Do not make up table names, only use the tables returned by any of the tools below.

## Tools:
"""

MSSQL_AGENT_FORMAT_INSTRUCTIONS = """
## Use the following format:

Question: the input question you must answer.
Thought: you should always think about what to do.
Action: the action to take, should be one of [{tool_names}].
Action Input: the input to the action.
Observation: the result of the action.
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer.
Final Answer: the final answer to the original input question.
"""

llm = OpenAI(openai_api_key="")
db = SQLDatabase.from_uri(f'sqlite:///{database_file_path}')
toolkit = SQLDatabaseToolkit(db=db, llm=llm)

QUESTION = """Which was the most popular product in Canada in 2014 in terms of Units Sold?"""

agent_executor_SQL = create_sql_agent(
    prefix=MSSQL_AGENT_PREFIX,
    format_instructions = MSSQL_AGENT_FORMAT_INSTRUCTIONS,
    llm=llm,
    toolkit=toolkit,
    top_k=30,
    verbose=True
)

agent_executor_SQL.invoke(QUESTION)
</code>
This video is a step-by-step tutorial to fine-tune Stable Diffusion 3 Medium locally on your own custom image dataset.
Code:
<code>
conda create -n sdft python=3.11 -y
pip install peft
pip install datasets
pip install huggingface_hub
pip install wandb
pip install bitsandbytes
pip install pillow
pip install git+https://github.com/huggingface/transformers
pip install accelerate
pip install sentencepiece
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
cd examples/dreambooth
pip install -r requirements_sd3.txt
huggingface-cli login
accelerate config default
mkdir /home/Ubuntu/dog
</code>
<code>
from huggingface_hub import snapshot_download

local_dir = "/home/Ubuntu/dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir,
    repo_type="dataset",
    ignore_patterns=".gitattributes",
)
</code>
============
<code>
export MODEL_NAME="stabilityai/stable-diffusion-3-medium-diffusers"
export INSTANCE_DIR="/home/Ubuntu/dog"
export OUTPUT_DIR="trained-sd3-lora"

accelerate launch train_dreambooth_lora_sd3.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --report_to="wandb" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --push_to_hub
</code>
This video is a quick and easy tutorial to install Qwen-Agent locally and create agentic applications with Qwen2 and Ollama for function calling and RAG. Code:
<code>
conda create -n qwenagent python=3.11 -y
pip install -U qwen-agent
</code>
<code>
import json
import os

from qwen_agent.llm import get_chat_model


# Example dummy function hard coded to return the same weather
# In production, this could be your backend API or an external API
def get_current_weather(location, unit='fahrenheit'):
    """Get the current weather in a given location"""
    if 'tokyo' in location.lower():
        return json.dumps({'location': 'Tokyo', 'temperature': '10', 'unit': 'celsius'})
    elif 'san francisco' in location.lower():
        return json.dumps({'location': 'San Francisco', 'temperature': '72', 'unit': 'fahrenheit'})
    elif 'paris' in location.lower():
        return json.dumps({'location': 'Paris', 'temperature': '22', 'unit': 'celsius'})
    else:
        return json.dumps({'location': location, 'temperature': 'unknown'})


def test():
    llm = get_chat_model({
        'model': 'qwen2:7b',
        'model_server': 'http://localhost:11434/v1',  # api_base
        'api_key': 'EMPTY',
    })

    # Step 1: send the conversation and available functions to the model
    messages = [{'role': 'user', 'content': "What's the weather like in San Francisco?"}]
    functions = [{
        'name': 'get_current_weather',
        'description': 'Get the current weather in a given location',
        'parameters': {
            'type': 'object',
            'properties': {
                'location': {
                    'type': 'string',
                    'description': 'The city and state, e.g. San Francisco, CA',
                },
                'unit': {
                    'type': 'string',
                    'enum': ['celsius', 'fahrenheit']
                },
            },
            'required': ['location'],
        },
    }]

    print('# Assistant Response 1:')
    responses = []
    for responses in llm.chat(messages=messages, functions=functions, stream=True):
        print(responses)
    messages.extend(responses)  # extend conversation with assistant's reply

    # Step 2: check if the model wanted to call a function
    last_response = messages[-1]
    if last_response.get('function_call', None):
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        available_functions = {
            'get_current_weather': get_current_weather,
        }  # only one function in this example, but you can have multiple
        function_name = last_response['function_call']['name']
        function_to_call = available_functions[function_name]
        function_args = json.loads(last_response['function_call']['arguments'])
        function_response = function_to_call(
            location=function_args.get('location'),
            unit=function_args.get('unit'),
        )
        print('# Function Response:')
        print(function_response)

        # Step 4: send the info for each function call and function response to the model
        messages.append({
            'role': 'function',
            'name': function_name,
            'content': function_response,
        })  # extend conversation with function response

        print('# Assistant Response 2:')
        for responses in llm.chat(
            messages=messages,
            functions=functions,
            stream=True,
        ):  # get a new response from the model where it can see the function response
            print(responses)


if __name__ == '__main__':
    test()
</code>
This video is a step-by-step tutorial to fine-tune or train your own AI model in any language, on any custom dataset, locally and easily. This example fine-tunes a model on Urdu audio files locally.
Code:
<code>
conda create -n audio python=3.11 -y
conda activate audio
pip install torch
pip install transformers
pip install datasets
pip install huggingface_hub
pip install soundfile
pip install librosa
pip install evaluate jiwer
pip install accelerate -U
pip install tensorboardX
git clone https://github.com/libsndfile/libsndfile.git
cd libsndfile/
sudo apt install autoconf autogen automake build-essential libasound2-dev libflac-dev libogg-dev libtool libvorbis-dev libopus-dev libmp3lame-dev libmpg123-dev pkg-config python3
autoreconf -vif
./configure --enable-werror
sudo make
sudo make check
sudo mkdir /usr/local/lib/python3.8/dist-packages/_soundfile_data/
</code>
<code>
from huggingface_hub.hf_api import HfFolder
HfFolder.save_token('<>')

from datasets import load_dataset, DatasetDict

common_voice = DatasetDict()
common_voice["train"] = load_dataset(
    "mozilla-foundation/common_voice_13_0", "ur", split="train+validation"
)
common_voice["test"] = load_dataset(
    "mozilla-foundation/common_voice_13_0", "ur", split="test"
)
print(common_voice)

common_voice = common_voice.select_columns(["audio", "sentence"])

from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="urdu", task="transcribe"
)

common_voice["train"].features

from datasets import Audio

sampling_rate = processor.feature_extractor.sampling_rate
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=sampling_rate))

def prepare_dataset(example):
    audio = example["audio"]
    example = processor(
        audio=audio["array"],
        sampling_rate=audio["sampling_rate"],
        text=example["sentence"],
    )
    example["input_length"] = len(audio["array"]) / audio["sampling_rate"]
    return example

common_voice = common_voice.map(
    prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=1
)

max_input_length = 30.0

def is_audio_in_length_range(length):
    return length < max_input_length

common_voice["train"] = common_voice["train"].filter(
    is_audio_in_length_range,
    input_columns=["input_length"],
)

import torch
from dataclasses import dataclass
from typing import Any, Dict, List, Union

@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    processor: Any

    def __call__(
        self, features: List[Dict[str, Union[List[int], torch.Tensor]]]
    ) -> Dict[str, torch.Tensor]:
        input_features = [{"input_features": feature["input_features"][0]} for feature in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch

data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)

import evaluate

metric = evaluate.load("wer")

from transformers.models.whisper.english_normalizer import BasicTextNormalizer

normalizer = BasicTextNormalizer()

def compute_metrics(pred):
    pred_ids = pred.predictions
    label_ids = pred.label_ids
    label_ids[label_ids == -100] = processor.tokenizer.pad_token_id
    pred_str = processor.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = processor.batch_decode(label_ids, skip_special_tokens=True)
    wer_ortho = 100 * metric.compute(predictions=pred_str, references=label_str)
    pred_str_norm = [normalizer(pred) for pred in pred_str]
    label_str_norm = [normalizer(label) for label in label_str]
    pred_str_norm = [pred_str_norm[i] for i in range(len(pred_str_norm)) if len(label_str_norm[i]) > 0]
    label_str_norm = [label_str_norm[i] for i in range(len(label_str_norm)) if len(label_str_norm[i]) > 0]
    wer = 100 * metric.compute(predictions=pred_str_norm, references=label_str_norm)
    return {"wer_ortho": wer_ortho, "wer": wer}

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

from functools import partial

model.config.use_cache = False
model.generate = partial(model.generate, language="urdu", task="transcribe", use_cache=True)

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-small-ur",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=1e-5,
    lr_scheduler_type="constant_with_warmup",
    warmup_steps=50,
    max_steps=500,
    gradient_checkpointing=True,
    fp16=True,
    fp16_full_eval=True,
    evaluation_strategy="steps",
    per_device_eval_batch_size=16,
    predict_with_generate=True,
    generation_max_length=225,
    save_steps=500,
    eval_steps=500,
    logging_steps=25,
    report_to=["tensorboard"],
    load_best_model_at_end=True,
    metric_for_best_model="wer",
    greater_is_better=False,
    push_to_hub=True,
)

from transformers import Seq2SeqTrainer

trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"],
    eval_dataset=common_voice["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor,
)
trainer.train()

from datasets import load_dataset
from datasets import Audio

minds = load_dataset("mozilla-foundation/common_voice_13_0", name="ur", split="train")
minds = minds.cast_column("audio", Audio(sampling_rate=16_000))
example = minds[0]
print(example["sentence"])

from transformers import pipeline

pipe = pipeline(task="automatic-speech-recognition", model="./whisper-small-ur")
print(pipe(example["audio"]["array"]))
</code>
This video shows a step-by-step process to fine-tune the Qwen2 LLM locally on a custom dataset for free, easily, using Unsloth on Google Colab. Code:
<code>
%%capture
# Installs Unsloth, Xformers (Flash Attention) and all other packages!
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps xformers "trl<0.9.0" peft accelerate bitsandbytes

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048
dtype = None
load_in_4bit = True
fourbit_models = [
    "unsloth/Qwen2-0.5b-bnb-4bit",
]

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Qwen2-0.5B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
)

model = FastLanguageModel.get_peft_model(
    model,
    r = 16,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",
    random_state = 3407,
    use_rslora = False,
    loftq_config = None,
)

alpaca_prompt = """Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{}

### Input:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token  # Must add EOS_TOKEN

def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    inputs = examples["input"]
    outputs = examples["output"]
    texts = []
    for instruction, input, output in zip(instructions, inputs, outputs):
        text = alpaca_prompt.format(instruction, input, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }
pass

from datasets import load_dataset
dataset = load_dataset("yahma/alpaca-cleaned", split = "train")
dataset = dataset.map(formatting_prompts_func, batched = True,)

from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    args = TrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        # Use num_train_epochs = 1, warmup_ratio for full training runs!
        warmup_steps = 20,
        max_steps = 120,
        learning_rate = 5e-5,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 3407,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()
</code>