[PYTHON - Library] ★PDF to DataFrame★
goAhEAd_29
2024. 4. 11. 15:08
1. The goal is to convert a table inside a PDF file into a pandas DataFrame.
```python
from tabula import read_pdf
import pandas as pd


def read_pdf_table_to_dataframe(pdf_path, page_number):
    # tabula-py can only read tables from a PDF, so make sure your PDF contains tables.
    df_list = read_pdf(pdf_path, pages=page_number, multiple_tables=True)
    # read_pdf returns a list of DataFrames, so concatenate them if there are multiple tables.
    df = pd.concat(df_list, ignore_index=True) if df_list else pd.DataFrame()
    return df


# Specify the path to your PDF and the page number you want to extract the table from.
pdf_path = '젠톡_김지은_2.pdf'  # Change to your PDF file path.
page_number = '8'  # Change to your specific page number.

# Call the function and get the DataFrame.
df = read_pdf_table_to_dataframe(pdf_path, page_number)

# Now you can work with the DataFrame.
print(df.head())
```
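The function above concatenates every table found on the page with `ignore_index=True`, so the result no longer records which table a row came from. If that matters, a minimal sketch using the `keys` argument of `pd.concat` can keep a `table` column instead. `combine_tables` is a hypothetical helper, and the two small frames are made-up stand-ins for `read_pdf` output so the sketch runs without Java or a PDF.

```python
import pandas as pd


def combine_tables(df_list):
    """Concatenate extracted tables, adding a 'table' column with each row's source index."""
    if not df_list:
        return pd.DataFrame()
    # keys= builds an outer index level that numbers the source tables
    combined = pd.concat(
        df_list,
        keys=list(range(len(df_list))),
        names=["table", "row"],
    )
    # Move the table number into a regular column, then renumber the rows 0..n-1
    return combined.reset_index(level="table").reset_index(drop=True)


if __name__ == "__main__":
    # Hypothetical stand-ins for the list that read_pdf(..., multiple_tables=True) returns
    first = pd.DataFrame({"name": ["A", "B"], "score": [90, 85]})
    second = pd.DataFrame({"name": ["C"], "score": [70], "note": ["retest"]})
    print(combine_tables([first, second]))
```

With a real PDF this would be called as `combine_tables(read_pdf(pdf_path, pages=page_number, multiple_tables=True))`. Columns that exist in only some tables (like `note` here) are filled with NaN for the other rows.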
① Install tabula-py (version 2.9.0) with pip: pip install tabula-py==2.9.0. Do not install the package named tabula; it is a different package.
② If read_pdf raises an error at this step, install JPype1:
pip install JPype1
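Since both steps are about having the right packages installed, a quick check of installed package metadata can save a round of debugging. This is a minimal sketch, assuming the PyPI distribution names `tabula-py`, `tabula` and `JPype1`; `installed_version` and `check_pdf_table_setup` are hypothetical helpers. It only reads installed metadata via `importlib.metadata` and does not start Java or call `read_pdf`.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def check_pdf_table_setup():
    """Collect warnings about the packages used in the steps above (hypothetical helper)."""
    warnings = []
    if installed_version("tabula-py") is None:
        warnings.append("tabula-py is missing: pip install tabula-py==2.9.0")
    # The package literally named 'tabula' is unrelated and should not be installed
    if installed_version("tabula") is not None:
        warnings.append("the unrelated 'tabula' package is installed: pip uninstall tabula")
    if installed_version("JPype1") is None:
        warnings.append("JPype1 is not installed: pip install JPype1 if read_pdf reports an error")
    return warnings


if __name__ == "__main__":
    for message in check_pdf_table_setup():
        print(message)
```

An empty list means all three checks passed.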