
Out of the Box


tag:2023

Backlinks

This is a list of pages that link back to the current page.

  • review:2023-01_gpt_in_60_lines_of_numpy
  • review:2023-03_chatgpt4pcg_competition_character-like_level_generation_for_science_birds
  • review:2023-03_multiple_hands_make_light_work_enhancing_quality_and_diversity_using_map-elites_with_multiple_parallel_evolution_strategies
  • review:2023-03_understanding_plasticity_in_neural_networks
  • review:2023-04_generative_agents_interactive_simulacra_of_human_behavior
  • review:2023-04_gymnax_reinforcement_learning_environments_in_jax
  • review:2023-05_deep_reinforcement_learning_with_plasticity_injection
  • review:2023-05_improving_language_model_negotiation_with_self-play_and_in-context_learning_from_ai_feedback
  • review:2023-06_a_technical_report_for_polyglot-ko_open-source_large-scale_korean_language_models
  • review:2023-06_jumanji_a_diverse_suite_of_scalable_reinforcement_learning_environments_in_jax
  • review:2023-06_secrets_of_rlhf_in_large_language_models_part_i_ppo
  • review:2023-07_polylm_an_open_source_polyglot_large_language_model
  • review:2023-08_jiang_chinese_open_foundation_language_model
  • review:2023-08_maintaining_plasticity_in_continual_learning_via_regenerative_regularization
  • review:2023-08_minizero_comparative_analysis_of_alphazero_and_muzero_on_go_othello_and_atari_games
  • review:2023-10_amago_scalable_in-context_reinforcement_learning_for_adaptive_agents
  • review:2023-10_a_general_theoretical_paradigm_to_understand_learning_from_human_preferences
  • review:2023-10_large_language_models_as_generalizable_policies_for_embodied_tasks
  • review:2023-10_mistral_7b
  • review:2023-10_vanishing_gradients_in_reinforcement_finetuning_of_language_models
  • review:2023-11_minimax_efficient_baselines_for_autocurricula_in_jax
  • review:2023-12_batched_low-rank_adaptation_of_foundation_models
  • review:2023-12_diloco_distributed_low-communication_training_of_language_models
  • review:2023-12_direct_preference_optimization_your_language_model_is_secretly_a_reward_model
  • review:2023-12_efficient_large_language_models_a_survey
  • review:2023-12_llm-powered_hierarchical_language_agent_for_real-time_human-ai_coordination
  • review:2023-12_scalable_agent-based_modeling_for_complex_financial_market_simulations
  • review:2023-12_speeding_up_the_gpt_-_kv_cache
  • review:2023-12_unicron_economizing_self-healing_llm_training_at_scale
  • review:2023-12_xland-minigrid_scalable_meta-reinforcement_learning_environments_in_jax
  • review:2024-01_efficient_tool_use_with_chain-of-abstraction_reasoning
  • review:2024-01_enhancing_end-to-end_multi-task_dialogue_systems_a_study_on_intrinsic_motivation_reinforcement_learning_algorithms_for_improved_training_and_adaptability
  • review:2024-03_stop_regressing_training_value_functions_via_classification_for_scalable_deep_rl

Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 4.0 International